Convolutional Neural Network



A convolutional neural network is a class of deep neural networks, most commonly applied to analyzing visual imagery.
A Convolutional Neural Network (CNN) is the main technique behind image classification, object detection, image recognition, etc. It takes an input and classifies it: given a random image of a pizza, burger, sandwich, or drink, it assigns the image to one of those categories. To a computer, an image is just an array of pixel values. An image with a 4 * 4 * 3 array means height=4, width=4, dimension=3 (RGB), and 4 * 4 * 1 means height and width of 4 with a single channel (grayscale).
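Here is a minimal NumPy sketch of those two array shapes; the random pixel values are just placeholders for a real image:

```python
import numpy as np

# A 4 x 4 RGB image: height=4, width=4, 3 channels
rgb_image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

# A 4 x 4 grayscale image: height=4, width=4, 1 channel
gray_image = np.random.randint(0, 256, size=(4, 4, 1), dtype=np.uint8)

print(rgb_image.shape)   # (4, 4, 3)
print(gray_image.shape)  # (4, 4, 1)
```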

# CONVOLUTION LAYER :
•    First layer, used to extract features from an input image
•    Preserves the relationship between pixels by learning image features over small squares of input data
•    It takes two inputs: an image matrix and a filter (kernel)



Let's take a 6 * 6 matrix and slide a 3 * 3 filter over it.
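Below is a minimal NumPy sketch of that operation; the helper name `convolve2d`, the input values, and the edge-detection filter are only for illustration. A 6 * 6 input with a 3 * 3 filter and stride 1 produces a 4 * 4 feature map:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide a square kernel over the image and sum the element-wise products."""
    k = kernel.shape[0]
    out_h = (image.shape[0] - k) // stride + 1
    out_w = (image.shape[1] - k) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i * stride:i * stride + k, j * stride:j * stride + k]
            output[i, j] = np.sum(region * kernel)
    return output

image = np.arange(36).reshape(6, 6)           # 6 x 6 input matrix
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])               # 3 x 3 vertical-edge filter
feature_map = convolve2d(image, kernel)
print(feature_map.shape)  # (4, 4)
```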

# Stride:
Stride is a parameter of a convolutional neural network's filter that controls how far the filter moves across the image or video at each step. For example, if the stride is set to 1, the filter moves one pixel, or unit, at a time.


Stride 2 means the filter moves 2 pixels at a time.
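A quick sketch of how the stride changes the output size (the helper `conv_output_size` is just an illustrative name for the standard "valid" convolution formula):

```python
def conv_output_size(n, k, stride):
    """Output size of a valid convolution: floor((n - k) / stride) + 1."""
    return (n - k) // stride + 1

# 6 x 6 input, 3 x 3 filter
print(conv_output_size(6, 3, stride=1))  # 4 -> filter moves 1 pixel at a time
print(conv_output_size(6, 3, stride=2))  # 2 -> filter jumps 2 pixels at a time
```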

# Padding :
•    Pad the picture with zeros (zero-padding) so that the filter fits; a small sketch follows this list
•    Or drop the part of the image where the filter does not fit. This is called valid padding, and it keeps only the valid part of the image.
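A minimal NumPy sketch of zero-padding, using a 6 * 6 example input; padding by 1 pixel on every side is what keeps the output of a 3 * 3 filter at the original 6 * 6 size ("same" padding):

```python
import numpy as np

image = np.ones((6, 6))

# Zero-padding: add 1 pixel of zeros on every side.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (8, 8) -> (8 - 3) + 1 = 6 after a 3 x 3 convolution

# "Valid" padding is simply no padding at all: the output shrinks to 4 x 4.
```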

# Non-Linearity (ReLU):
# ReLU – Rectified Linear Unit
The purpose of applying the rectifier function is to increase the non-linearity in our images. The reason we want to do that is that images are naturally non-linear. When you look at any image, you'll find it contains a lot of non-linear features (e.g. the transition between pixels, the borders, the colors, etc.)
Mainly, the rectifier function removes the negative (black) elements from the image and carries forward only the positive values (the gray and white colors).
The main difference between a non-rectified and a rectified image is the progression of colors.
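A minimal NumPy sketch of the rectifier on a small example feature map; negative values become 0 and positive values pass through unchanged:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: keep positive values, replace negatives with 0."""
    return np.maximum(0, x)

feature_map = np.array([[ 4, -2,  0],
                        [-7,  5, -1],
                        [ 3, -6,  2]])
print(relu(feature_map))
# [[4 0 0]
#  [0 5 0]
#  [3 0 2]]
```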



# Pooling Layer : Reduces the number of parameters when the image is too large.
•    Max pooling
•    Sum pooling
•    Average pooling
Max pooling is a sample-based discretization process. The objective is to down-sample an input representation (image, hidden-layer output matrix, etc.), reducing its dimensionality and allowing for assumptions to be made about features contained in the sub-regions binned.



Let's take an example:
A 4 x 4 matrix represents the initial input, and a 2 x 2 filter runs over the input with a stride of (2, 2). For each region, we take the maximum value of that region and build a new, smaller matrix.
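A minimal NumPy sketch of that example; the helper name `max_pool` and the input values are only for illustration:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Take the maximum of each size x size region, stepping by the stride."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            pooled[i, j] = region.max()
    return pooled

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 5, 6]])
print(max_pool(feature_map))
# [[6. 4.]
#  [7. 9.]]
```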

# Average Pooling:
Average Pooling is different from Max Pooling in the sense that it retains information about the "less important" elements of a block, or pool. Whereas Max Pooling simply throws them away by picking the maximum value, Average Pooling blends them in. This can be useful in situations where that information matters.
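The same sketch as above, assuming the same 4 x 4 example, but taking the mean of each region instead of the maximum:

```python
import numpy as np

def avg_pool(feature_map, size=2, stride=2):
    """Take the average of each size x size region instead of the maximum."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            pooled[i, j] = region.mean()
    return pooled

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 5, 6]])
print(avg_pool(feature_map))
# [[3.75 2.25]
#  [4.   5.25]]
```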


# Sum pooling
Taking the sum of all elements in each region of the feature map is called sum pooling.
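A short NumPy sketch on the same example 4 x 4 feature map, summing each 2 x 2 region:

```python
import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 5, 6]])

# Split the 4 x 4 map into 2 x 2 blocks and sum each block.
sums = feature_map.reshape(2, 2, 2, 2).sum(axis=(1, 3))
print(sums)
# [[15  9]
#  [16 21]]
```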


# Fully Connected Layer:
We flatten our matrix into a vector and feed it into a fully connected layer.
The feature map matrix is converted into a vector (x1, x2, x3, …). With the fully connected layers, we combine these features together to create a model. Finally, an activation function such as softmax or sigmoid classifies the output as burger, pizza, sandwich, drink, etc.
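A minimal NumPy sketch of that last step, assuming a tiny 2 x 2 pooled feature map and random weights as stand-ins for the values a trained network would have learned:

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Pooled feature map from the previous layers (2 x 2 here for brevity).
feature_map = np.array([[6.0, 4.0],
                        [7.0, 9.0]])

# Flatten the matrix into a vector (x1, x2, x3, ...).
x = feature_map.flatten()                  # shape (4,)

# One fully connected layer: 4 inputs -> 4 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))                # weights (learned during training)
b = np.zeros(4)                            # biases

scores = W @ x + b
probs = softmax(scores)
classes = ["burger", "pizza", "sandwich", "drink"]
print(classes[int(np.argmax(probs))], probs)
```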




# This covers the very basics: what a CNN is and how it works.
Some of it is collected from Wikipedia and blogs, and some is my own understanding.... 🎉 😋

