Let's start with the human brain. The brain constantly predicts the world, anticipating everything we see around us. When we see something, we try to attach a label to that thing (or object). In short, we see things, label them, and remember those objects for later prediction. A Convolutional Neural Network works in a similar way.
Convolutional Neural Network
In a CNN (Convolutional Neural Network) we feed millions of images to a computer algorithm and, by training a model, make it able to predict images it has never seen before. A computer is a machine that knows only numbers: to a computer, every image is just a two-dimensional array of pixel values, and it perceives the image in that form. To make predictions from these images we use an Artificial Neural Network. An artificial neural network simulates how the human brain works and processes information, organizing itself in a brain-like way to perform tasks.
A CNN is a class of deep learning model that uses artificial neural networks (ANNs) in the background to perform Computer Vision tasks such as image labeling, object detection, and object recognition.
Convolutional Neural Network consists of multiple building blocks:
- Convolution Layer (feature extraction)
- Pooling Layer (downsampling the feature map)
- Fully Connected Layer (classification)
These are the building blocks a CNN uses to learn from data. So what is a Deep Convolutional Neural Network? The answer: in a deep CNN we stack multiple convolution layers in the network. Deep learning, in the modern sense, means using many such layers.
Convolution Layer
To extract features from a given input (image) we use convolution, which is a linear operation. To perform convolution we slide a kernel (often called a filter) across the input image and compute the element-wise product between the kernel and the image patch under it, summing the results to obtain one output value. In simple words: the inputs are the image and the filter (kernel), and the result is a feature map.
Example
Suppose we have a 256 x 256 image and a 5 x 5 kernel (filter). The CNN efficiently scans the image with the kernel using a sliding-window technique. How far the filter moves at each step is controlled by the stride, which we discuss below.
Stride
Another important concept in convolution is the stride: the size of the step by which the kernel moves across the input image (a 2-dimensional array). By default the stride is 1, but you can change it for your application. With a stride of 2, the kernel moves two pixels at a time, both vertically and horizontally. A larger stride therefore reduces the size of the output feature map, which is useful for controlling computation.
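The arithmetic above can be sketched in a few lines. This is a minimal helper, assuming no padding and square inputs and kernels, that computes the feature-map size for the 256 x 256 image and 5 x 5 kernel example:

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Spatial size of the feature map a convolution produces
    (valid convolution: no padding, square input and kernel)."""
    return (input_size - kernel_size) // stride + 1

# 256 x 256 image, 5 x 5 kernel, stride 1 -> 252 x 252 feature map
print(conv_output_size(256, 5, stride=1))  # 252
# stride 2 skips every other position -> 126 x 126
print(conv_output_size(256, 5, stride=2))  # 126
```

Notice how doubling the stride roughly halves each spatial dimension of the output.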
How Convolution Works
A typical CNN has multiple convolution layers, and each layer produces multiple feature maps (stored as tensors). Convolution is performed by sliding the filter (kernel) over the input image: at each position, the filter and the underlying image patch are multiplied element-wise and summed, and the resulting values form a feature map. For example, on a 5 x 5 matrix we may use a 3 x 3 filter. We can apply any number of filters to the input image, and each one produces a different feature map (activation map). Remember that convolution is a linear operation, so we pass its output through a non-linear activation function (the most commonly used one is ReLU).
RELU
ReLU stands for Rectified Linear Unit, and it is the most widely used activation function. A major advantage of ReLU over other activation functions is that it does not activate all the neurons at the same time: it makes the output non-linear by setting it to 0 whenever the input is less than 0. You can understand ReLU from the function below.
f(x) = max(0, x)
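In code, the function above is a one-liner, applied element-wise to a whole feature map at once:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x), element-wise."""
    return np.maximum(x, 0)

# negative values become 0; positive values pass through unchanged
print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
```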
Pooling Layer
After the convolution layer, it is common to apply pooling. Pooling reduces the spatial size of the feature maps produced by convolution, which is why it speeds up computation. In other words, the main purpose of pooling is dimensionality reduction, which lowers the computational power needed to process the convolved data. There are multiple types of pooling:
- Max Pooling
- Average Pooling
- Sum Pooling
The most commonly used technique is max pooling, which also helps suppress noise in the output. In practice max pooling tends to perform better than average pooling, which is why it is the usual choice.
Example
If we apply 2 x 2 max pooling to an input, each 2 x 2 window is replaced by a single value: the maximum within that window. A 4 x 4 input therefore becomes a 2 x 2 matrix.
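The example above can be sketched as follows; the 4 x 4 feature map here is made-up data, and the windows are non-overlapping (stride equal to the window size):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value
    in each size x size window."""
    h, w = feature_map.shape
    oh, ow = h // size, w // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = feature_map[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = window.max()
    return out

fm = np.array([[1., 3., 2., 4.],
               [5., 6., 1., 2.],
               [7., 2., 9., 1.],
               [0., 4., 3., 8.]])
print(max_pool(fm))  # [[6. 4.] [7. 9.]]
```

Each entry of the 2 x 2 result is the maximum of one quadrant of the input, so the spatial size is halved in each direction while the strongest activations survive.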
Classification - Fully Connected Layer
After convolution and pooling, we pass the output to the Fully Connected Layer, which accepts only one-dimensional data. To go from 3-D to 1-D we flatten the output (deep learning libraries such as Keras provide a Flatten layer for this). The fully connected layer takes this flattened input (the result of the convolution and pooling layers) and produces an N-dimensional output vector, where each number represents the probability of a certain class.
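The flatten-then-classify step can be sketched in plain NumPy. This is a minimal illustration, not a trained network: the pooled shape (4 feature maps of 2 x 2), the 3 classes, and the random weights are all made-up values:

```python
import numpy as np

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Hypothetical pooled output: 4 feature maps of size 2x2
pooled = rng.random((4, 2, 2))

# Flatten: 3-D (channels x height x width) -> 1-D vector of length 16
flat = pooled.reshape(-1)

# One fully connected layer mapping 16 inputs to N = 3 class scores
weights = rng.random((3, 16))  # illustrative random weights
bias = np.zeros(3)
probs = softmax(weights @ flat + bias)
print(probs.shape)  # (3,) -- one probability per class
print(probs.sum())  # probabilities sum to 1
```

In a real network the weights would be learned during training rather than drawn at random.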
In the last step, we optimize the classification result by tuning the network's weights and other parameters during training.