In this article, I am going to discuss zero-shot learning. In zero-shot learning, we have no sample images of the object we want to recognize. In other words, in an object recognition task, the model never sees a training image of the target class. We are trying to recognize objects from classes that were unseen during training, and that is called zero-shot learning.
Why is there a need for Zero-Shot Learning?
So, the question arises: why do we need zero-shot learning? In traditional object recognition, achieving a good success rate requires many labeled sample images for every object class. We need to collect sample images from different environments and sources to train our classifier, and collecting such images is itself a big challenge. For a higher success rate, the samples must also cover different angles and different environments, some of which may be extreme. So, collecting sample images is the main challenge.
A limited number of sample images
For example, suppose we want to recognize species that live in the deep sea or in other extreme environments where humans cannot easily go. Collecting sample images of such species is not easy. As I said earlier, we need sample images taken from different angles, because images all taken from the same angle lead to poor results. That is why we need many sample images to train our model or classifier, and a limited number of images means a lower success rate.
Need for an expert
Labeling images sometimes requires an expert. For example, labeling an image that shows a tumor requires a medical specialist. Similarly, labeling images of sea species or plants for object recognition requires an expert. Such tasks take a lot of effort in terms of time and cost.
So far we have discussed what zero-shot learning is and why we need it. Now it's time to build a zero-shot learning model step by step.
Zero-Shot Learning - An Approach
Remember that in zero-shot learning we do not use any sample images of the target classes to train our model. In other words, we are going to recognize objects that have never been seen before. To apply any machine learning or deep learning algorithm, we need to represent the data with suitable features. We are going to use the following two representations:
- Image Embedding
- Class Embedding
Image Embedding
Image embedding uses a convolutional network to extract features from an image. We can either use a pre-trained network or train one from scratch. A pre-trained convolutional network is usually the better choice, because its features have already proven effective on large benchmarks such as ImageNet and it saves training time. Here we use the pre-trained VGG16 model to extract image features. It was proposed by the Visual Geometry Group at the University of Oxford. The "16" refers to its 16 weight layers (13 convolutional and 3 fully connected). We could also use VGG19, which has 19 weight layers.
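A minimal sketch of this step in Python, assuming TensorFlow/Keras is installed and the ImageNet weights can be downloaded on first use. Taking the 4096-dimensional output of the `fc2` layer (the last fully connected layer before the classifier) as the image embedding is one common choice, not necessarily the exact layer used in this article, and `build_vgg16_extractor` and `extract_features` are hypothetical helper names. The TensorFlow imports sit inside the functions so the snippet can be loaded even without TensorFlow present.

```python
import numpy as np


def build_vgg16_extractor():
    """Return a Keras model mapping 224x224 RGB images to VGG16 'fc2' features."""
    # Imported lazily so this module loads even when TensorFlow is absent.
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.models import Model

    # Full VGG16 with ImageNet weights (downloaded on first use).
    base = VGG16(weights="imagenet", include_top=True)
    # Cut the network at the second fully connected layer: 4096-d output.
    return Model(inputs=base.input, outputs=base.get_layer("fc2").output)


def extract_features(extractor, images):
    """Compute image embeddings for a batch shaped (n, 224, 224, 3), RGB, values 0-255."""
    from tensorflow.keras.applications.vgg16 import preprocess_input

    # np.array copies, so preprocess_input cannot modify the caller's array in place.
    batch = preprocess_input(np.array(images, dtype="float32"))
    return extractor.predict(batch, verbose=0)  # shape (n, 4096)
```

Usage: `features = extract_features(build_vgg16_extractor(), images)` gives one 4096-dimensional row per image, which is the image embedding used in the later steps.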
Now we have two sets of classes: training classes and zero-shot classes. For the training classes we have sample images, but for the zero-shot classes we have none, so we cannot compute image embeddings for them. We therefore need another data representation that acts as a bridge between the training classes and the zero-shot classes.
Class Embedding
We use class embeddings to represent class labels in vector form (for example, a word embedding of the class name). This vector representation lets us describe a class independently of its images. For the training classes we have both class labels and image samples, so we have both image embeddings and class embeddings. For the zero-shot classes, on the other hand, we have only the class labels because, as discussed earlier, we never see image samples for them.
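To make this concrete, here is a small Python sketch with hand-made toy vectors standing in for real class embeddings; in practice they might be word embeddings of the class names (for example from word2vec or GloVe). The class names, the 4-dimensional vectors and the helper `embedding_matrix` are all hypothetical. Note that every class, including the zero-shot class `zebra`, has a class embedding, while only the training classes would have images.

```python
import numpy as np

# Hypothetical toy class embeddings; real ones would come from pre-trained
# word vectors of the class names or from attribute annotations.
CLASS_EMBEDDINGS = {
    "horse":   [0.9, 0.1, 0.0, 0.2],
    "tiger":   [0.1, 0.9, 0.1, 0.0],
    "dolphin": [0.0, 0.1, 0.9, 0.1],
    "zebra":   [0.7, 0.6, 0.0, 0.2],  # zero-shot class: no images available
}

# The two class sets must not overlap.
TRAIN_CLASSES = ["horse", "tiger", "dolphin"]
ZERO_SHOT_CLASSES = ["zebra"]


def embedding_matrix(class_names):
    """Stack the embeddings of the given classes into an L2-normalised matrix."""
    m = np.array([CLASS_EMBEDDINGS[c] for c in class_names], dtype=float)
    return m / np.linalg.norm(m, axis=1, keepdims=True)


train_emb = embedding_matrix(TRAIN_CLASSES)    # shape (3, 4)
zsl_emb = embedding_matrix(ZERO_SHOT_CLASSES)  # shape (1, 4)
```

Normalising the vectors keeps every class on the same scale, so later distance comparisons are not dominated by classes whose raw vectors happen to be longer.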
How zero-shot learning works
For the training classes, we use the image vectors (image embeddings) and their corresponding class embeddings. The network learns to map each image embedding to the embedding of its class, so that an input image is mapped to a vector in the class-embedding space. After training, in the zero-shot setting, the network produces an output vector for a given input image. We then classify the image by comparing this output vector with the class embeddings of the zero-shot classes.
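The article trains a network for this mapping. As the simplest stand-in, the sketch below swaps in a linear map fitted by ridge regression on synthetic data: the toy class embeddings, the random projection used to fake 8-dimensional "image embeddings", and the helper names `fit_linear_mapping` and `predict_vectors` are all hypothetical.

```python
import numpy as np


def fit_linear_mapping(features, targets, reg=1.0):
    """Ridge regression: W minimising ||features @ W - targets||^2 + reg * ||W||^2."""
    d = features.shape[1]
    gram = features.T @ features + reg * np.eye(d)
    return np.linalg.solve(gram, features.T @ targets)


def predict_vectors(features, weights):
    """Map image embeddings into the class-embedding space."""
    return features @ weights


# Hypothetical toy data: 3 training classes with 4-d class embeddings, and
# 8-d "image embeddings" generated from them by a random projection plus noise.
rng = np.random.default_rng(0)
train_class_emb = np.array([[0.9, 0.1, 0.0, 0.2],
                            [0.1, 0.9, 0.1, 0.0],
                            [0.0, 0.1, 0.9, 0.1]])
projection = rng.normal(size=(4, 8))
labels = rng.integers(0, 3, size=60)  # class index of each training image
X_train = train_class_emb[labels] @ projection + 0.05 * rng.normal(size=(60, 8))
Y_train = train_class_emb[labels]     # target vector for each training image

W = fit_linear_mapping(X_train, Y_train, reg=0.1)  # shape (8, 4)
pred = predict_vectors(X_train, W)                 # shape (60, 4)
```

Only training-class images and their class embeddings enter the fit. Replacing the linear map with a neural network trained on the same targets (for example with a mean-squared-error loss) follows the same idea.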
Below are the steps to perform zero-shot learning:
- Data Collection (divide the data into two sets, one for the training classes and one for the zero-shot classes). Remember that we will not use the images of the zero-shot classes as training samples; we will only use the images of the training classes.
- Image Feature Extraction and Dataset Formation (for feature extraction we use VGG16, a pre-trained network model)
- Data Representation (Class Embedding or Word Embedding)
- Model Training (the model should be structured so that a given input is mapped to an output vector)
- Comparing Scores (compare the output vector with each class embedding to find the best match)
- Zero-Shot Learning Model Evaluation
Finally, we need to evaluate the zero-shot learning model to measure its performance. For evaluation, we use samples from the zero-shot classes; remember that these samples were never used at any stage of model training. For each sample, we take the model's output vector and compute its Euclidean distance to the class embedding of every zero-shot class. The class whose embedding is closest to the output vector is the predicted class.
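A minimal Python sketch of this evaluation, assuming the trained model has already produced output vectors for the zero-shot test images. The zero-shot classes, their toy embeddings and the output vectors below are made-up values for illustration, and `nearest_class` and `top1_accuracy` are hypothetical helper names.

```python
import numpy as np


def nearest_class(pred_vectors, class_embeddings, class_names):
    """For each predicted vector, return the class with the closest embedding (Euclidean)."""
    # distances[i, j] = ||pred_vectors[i] - class_embeddings[j]||
    diffs = pred_vectors[:, None, :] - class_embeddings[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return [class_names[j] for j in distances.argmin(axis=1)]


def top1_accuracy(predicted, actual):
    """Fraction of test images whose predicted class matches the true class."""
    return float(np.mean([p == a for p, a in zip(predicted, actual)]))


# Hypothetical zero-shot classes with toy embeddings; none had training images.
zsl_names = ["zebra", "whale"]
zsl_emb = np.array([[0.7, 0.6, 0.0, 0.2],
                    [0.0, 0.0, 1.0, 0.3]])

# Made-up output vectors of the trained model for three zero-shot test images.
pred = np.array([[0.65, 0.55, 0.05, 0.25],
                 [0.05, 0.10, 0.85, 0.20],
                 [0.10, 0.05, 0.70, 0.40]])
true_labels = ["zebra", "whale", "whale"]

predicted_labels = nearest_class(pred, zsl_emb, zsl_names)
print(predicted_labels, top1_accuracy(predicted_labels, true_labels))
# → ['zebra', 'whale', 'whale'] 1.0
```

Only the zero-shot class embeddings are candidates here, which matches the setting described above; including the training classes as well would turn this into the harder "generalized" zero-shot evaluation.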
Conclusion
Zero-shot learning is a distinctive research area with high potential in computer vision. It can be used in robotics and surveillance cameras, for example, to recognize objects even when we have never seen a sample image of them. In this respect, it is similar to the human visual system.