As machine learning engineers and researchers, the thing we care about most while developing any application or product is the data. Data is the heart and soul of any machine learning application, and the abundance of big data today makes many machine learning applications possible. In essence, machine learning models are built and taught using data, and teaching a model a certain task can be done with one or more of three main paradigms – Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
Supervised learning relies on labelled data: inputs paired with their expected outputs. The goal is to build a model that learns the mapping from inputs to outputs, captures the underlying relations in the data, and can then predict the outputs for unlabelled data. This kind of learning is very reliable for industrial use, but its drawback is the need for labelled data, which might not be available in all applications, and manually labelling large numbers of examples can be costly.
Unsupervised learning is used when we have unlabelled data. Unsupervised algorithms aim to find the underlying relations and patterns in the unlabelled data; these relations can then be used to cluster or classify the samples.
Simply put, reinforcement learning is used when we have neither labelled inputs nor outputs. A model learns a specific task by trial and error over many iterations in an environment similar to the one it will be deployed in.
In most industries, we tend to use Supervised Learning. The main challenge is that in some applications, data can be scarce. Whenever we encounter this challenge, we usually turn to one of two approaches – Data Augmentation, which perturbs and alters the existing data to increase its amount, or labelling more data. In some applications, however, neither approach helps.
Imagine a task where we need to build a classifier with only one or two samples per class, and each sample is very difficult to find – for example, classifying or detecting ancient Egyptian characters, or a very rare disease in an X-ray or CT scan. This calls for innovative approaches such as Few-Shot Learning (FSL), which is motivated by exactly these factors: the scarcity of data, reducing data collection and computational costs, and learning rare cases.
Few-shot learning (FSL) is a sub-area of machine learning in which the N in “N-Shot Learning” is a small number of examples, usually fewer than five, for new classes of which we only have a minimal number of labelled samples. FSL is still a young research area, but it is already usable in some computer vision tasks.
FSL is used in applications where enough data isn’t available, or where annotating new data would be very costly, for example:
- Computer Vision
  - Character Recognition
  - Image Classification
  - Object Recognition
  - Object Tracking
  - Motion Prediction
  - Action Localization
- Natural Language Processing
  - Machine Translation
  - Sentence Completion
  - User Intent Classification
  - Sentiment Analysis
  - Multi-label Text Classification
- Drug Discovery
- X-Ray Diagnosis and Classification
N-Shot Learning Variations:
- Few-Shot Learning (FSL)
- One-Shot Learning (OSL)
- Zero-Shot Learning (ZSL)
Zero-Shot Learning (ZSL):
This is the most interesting variation, where the model covers totally unseen classes without any labelled training examples. It may sound fictional, but it is real. Imagine a model capable of describing its input precisely – its appearance, properties, functionality, and more. With such descriptions, distinguishing different inputs is no longer a problem. This area is already functional and has real applications, for example face recognition, where a model that can describe faces precisely can recognize people it has never seen before.
One-Shot Learning and Few-Shot Learning:
Similarly, OSL is the case where you have only one labelled example per class, and FSL is where you have about two to five training examples per class.
FSL is usually formulated as the N-way-K-shot classification problem, where our data consists of training examples and query examples.
Training examples (the support set) consist of N class labels with K labelled images for each class, where K is usually less than 10, so we have only N * K labelled samples in total.
Query examples are the samples we want to classify among the N classes.
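To make the N-way-K-shot setup concrete, here is a minimal sketch of sampling one such task. The `sample_episode` helper and the toy string dataset are illustrative assumptions, not part of any specific library:

```python
import random

def sample_episode(dataset, n_way, k_shot, q_queries):
    """Sample an N-way-K-shot task from a dict mapping class label -> samples.

    Hypothetical helper: assumes each class holds at least
    k_shot + q_queries samples.
    """
    classes = random.sample(list(dataset), n_way)        # pick N classes
    support, query = [], []
    for label in classes:
        picks = random.sample(dataset[label], k_shot + q_queries)
        support += [(x, label) for x in picks[:k_shot]]  # K labelled examples per class
        query += [(x, label) for x in picks[k_shot:]]    # examples to classify
    return support, query

# Toy dataset: 5 classes, 20 samples each
data = {c: [f"{c}_{i}" for i in range(20)] for c in "ABCDE"}
support, query = sample_episode(data, n_way=3, k_shot=5, q_queries=2)
print(len(support))  # N * K = 3 * 5 = 15 labelled support samples
print(len(query))    # 3 * 2 = 6 query samples
```

Note that the support set carries only N * K labels in total, which is what makes the problem “few-shot”.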
There are two main approaches to solving FSL problems:
- Data-level Approach (DLA)
- Parameter-level Approach (PLA)
The data-level approach is simple and straightforward: it is used when we don’t have enough data to build a model without over- or underfitting. Many FSL problems are solved by adding more information from a large base dataset. The key property of the base dataset is that it doesn’t include the classes we have in the support set of the few-shot task. For example, if we’re trying to classify a certain type of dog, the base dataset can contain images of many other dogs. More data can also be generated, either by data augmentation or by generative adversarial networks (GANs).
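As a rough sketch of the data-augmentation side of this approach, the toy function below produces several perturbed copies of one sample (a random flip plus a small brightness shift). The function name, the perturbations chosen, and the list-of-rows image format are all illustrative assumptions:

```python
import random

def augment(image, n_copies=4):
    """Return n_copies perturbed variants of a tiny grayscale image.

    `image` is a list of rows of pixel intensities in [0, 1] –
    a stand-in for a real image array.
    """
    variants = []
    for _ in range(n_copies):
        img = [row[:] for row in image]
        if random.random() < 0.5:                   # random horizontal flip
            img = [row[::-1] for row in img]
        shift = random.uniform(-0.05, 0.05)         # small brightness shift
        img = [[min(1.0, max(0.0, p + shift)) for p in row] for row in img]
        variants.append(img)
    return variants

sample = [[0.1, 0.9], [0.4, 0.6]]
augmented = augment(sample)
print(len(augmented))  # 4 extra training examples from a single labelled sample
```

In practice this would be done with a proper augmentation library, but the idea is the same: multiply the effective size of a tiny labelled set.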
From the parameter-level perspective, the model is quite prone to overfitting, since we’re dealing with high-dimensional spaces and a limited amount of training data. Usually, this is addressed with regularization and suitable loss functions.
Moreover, the model’s performance can be enhanced by guiding it through the extensive parameter space: we train our model to find the best route through the parameter space toward optimal results, which is called Meta-Learning.
In Meta-Learning, we have a set of tasks we want to learn, called training tasks. The experience gained from the training tasks is then used to solve the few-shot learning task itself. The training tasks are built from the base dataset. Meta-training consists of a finite number of episodes; in each episode we choose N classes and K support-set images per class from the base dataset, as well as Q query images, also from the base dataset. This way, we build a task similar to our few-shot learning task. The parameters of the model are trained to maximize the accuracy on the Q query images, while meta-testing measures the accuracy of the model on our actual few-shot learning task.
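The episodic structure described above can be sketched as a plain loop. Everything here is a simplification: `train_step` is a hypothetical callback standing in for a real model update, and the dummy majority-vote “model” exists only so the loop runs end to end:

```python
import random

def run_meta_training(base_dataset, n_way, k_shot, q, episodes, train_step):
    """Outline of meta-training: each episode samples an N-way-K-shot task
    from the base dataset, then calls `train_step(support, query)`, which in
    a real system would update the model and return query accuracy."""
    history = []
    for _ in range(episodes):
        classes = random.sample(list(base_dataset), n_way)
        support, query = [], []
        for c in classes:
            picks = random.sample(base_dataset[c], k_shot + q)
            support += [(x, c) for x in picks[:k_shot]]
            query += [(x, c) for x in picks[k_shot:]]
        history.append(train_step(support, query))
    return history

# Dummy stand-in for a learner: always predicts one fixed support class
def dummy_step(support, query):
    labels = [y for _, y in support]
    guess = max(set(labels), key=labels.count)
    return sum(y == guess for _, y in query) / len(query)

base = {c: list(range(30)) for c in "ABCDE"}
accs = run_meta_training(base, n_way=5, k_shot=5, q=3,
                         episodes=10, train_step=dummy_step)
print(len(accs))  # one query-accuracy measurement per episode → 10
```

Meta-testing would reuse the same episode shape, but with the held-out few-shot classes instead of the base dataset.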
The metric-learning technique is a form of meta-learning that learns a distance function over the data rather than learning a direct mapping of the data itself. For example, modern face recognition algorithms are mostly based on metric learning: the models learn distance functions that separate different faces in various embedding spaces.
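A minimal sketch of the metric-learning idea is a nearest-prototype classifier: average each class’s support embeddings into a prototype, then label a query by its smallest distance to a prototype. The 2-D points, class names, and Euclidean distance below are illustrative assumptions standing in for real learned embeddings:

```python
def classify_by_distance(support, query_point):
    """Label a query point by its distance to each class prototype
    (the mean of that class's support embeddings)."""
    by_class = {}
    for point, label in support:
        by_class.setdefault(label, []).append(point)

    def mean(points):
        return [sum(coord) / len(points) for coord in zip(*points)]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    prototypes = {label: mean(pts) for label, pts in by_class.items()}
    return min(prototypes, key=lambda label: dist(prototypes[label], query_point))

# Toy 2-way-2-shot support set with 2-D "embeddings"
support = [([0.0, 0.0], "cat"), ([0.2, 0.1], "cat"),
           ([5.0, 5.0], "dog"), ([5.1, 4.9], "dog")]
print(classify_by_distance(support, [0.3, 0.2]))  # → cat
print(classify_by_distance(support, [4.8, 5.2]))  # → dog
```

Because only the distance function is learned, adding a brand-new class needs nothing more than a few support embeddings for it – which is exactly why this family of methods suits few-shot problems.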
This approach is based on building two models, a meta-learner and a base-learner. The meta-learner learns across episodes, while the base-learner is initialized and trained in each episode by the meta-learner.
The technique applies the following sequence:
- choose a meta-learner model;
- start an episode and initialize the base-learner model;
- train the base-learner on the support set, using the training algorithm defined by the meta-learner;
- have the base-learner predict the query set;
- train the meta-learner’s parameters on the loss resulting from the classification error;
- repeat, depending on the choice of the meta-learner.
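The sequence above can be sketched as a toy loop. This is a heavy simplification under stated assumptions: the “meta-learner” here owns just one parameter (the base-learner’s learning rate), the base-learner is a one-weight linear model, and the multiplicative meta-update rule is an invented heuristic, not a real meta-learning algorithm:

```python
import random

def meta_train(episodes, sample_task, init_lr=0.1):
    """Toy meta-learner/base-learner loop: each episode re-initializes a
    fresh base-learner, trains it on the support set with the meta-learner's
    current learning rate, then nudges that rate based on the query loss."""
    lr = init_lr                                   # meta-learner's parameter
    for _ in range(episodes):
        support, query = sample_task()
        w = 0.0                                    # base-learner re-initialized per episode
        for x, y in support:                       # train base-learner on support set
            w -= lr * 2 * (w * x - y) * x          # gradient step on squared error
        query_loss = sum((w * x - y) ** 2 for x, y in query) / len(query)
        lr *= 0.99 if query_loss > 0.1 else 1.01   # invented meta-update heuristic
    return lr

def toy_task():
    # Support/query drawn from a noisy linear task y ≈ 2x
    pts = [(x, 2 * x + random.uniform(-0.1, 0.1)) for x in [0.5, 1.0, 1.5, 2.0]]
    return pts[:2], pts[2:]

random.seed(0)
final_lr = meta_train(episodes=50, sample_task=toy_task)
print(0.0 < final_lr < 1.0)  # the meta-learned rate stays in a sane range
```

Real parameter-level methods replace both the one-weight model and the heuristic update with neural networks and gradient-based meta-optimization, but the control flow – meta-learner persisting across episodes, base-learner reset inside each – is the same.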
Now, few-shot-learning-enabled AI is feasible in applications that seemed impossible a couple of years ago. Presently, we can build models capable of recognizing hundreds of millions of people using one image per person, classifying vehicles’ make and model right after their release using minimal data, and updating our models online with any new classes. It is also now possible to diagnose, classify, and detect rare anomalies in medical imagery, which has helped improve healthcare worldwide. In a nutshell, few-shot learning is a crucial step toward building systems as skillful as our human minds, using minimal observations to learn new tasks.
Author: Mohamed Tareq Dawoud, Computer Vision Engineer