Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are a class of deep learning algorithms used for computer vision and voice recognition problems.

The role of a CNN is to perform feature selection on an image, reducing it to a form that is easier to process without losing too many features.

There are several advantages of CNNs:

Here is an RGB (red, green, blue) image:

Convolution Layer - The Kernel

The GIF above shows the convolution process, which turns a $5\times5\times1$ matrix into a $3\times3\times1$ matrix.
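
The drop from 5 to 3 follows from the usual output-size rule. As a worked sketch, assume no padding and a stride of $s = 1$ (the text does not state either), with input width $n = 5$ and kernel width $k = 3$:

$\text{output size} = \frac{n - k}{s} + 1 = \frac{5 - 3}{1} + 1 = 3$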

The dimensions of the image are defined as:

$\text{dim} = \text{rows} \times \text{columns} \times \text{channels}$

where the number of channels is 3 for an RGB image and 1 for a grayscale image.
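
To make the rows, columns, channels convention concrete, here is a minimal NumPy sketch. The array name `image`, its 4x4 size, and the pixel values are assumptions made up for the illustration.

```python
import numpy as np

# Hypothetical 4x4 RGB image laid out as rows x columns x channels.
# All pixels start black (0, 0, 0).
image = np.zeros((4, 4, 3), dtype=np.uint8)

# Paint the top-left pixel pure red: R=255, G=0, B=0.
image[0, 0] = [255, 0, 0]

rows, columns, channels = image.shape
print(rows, columns, channels)
```

A grayscale image would use the same layout with a single channel, for example shape `(4, 4, 1)`.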

The kernel we use is a $3\times3\times1$ matrix:

$kernel = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$

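Each entry of the convolved image is therefore obtained by multiplying a $3\times3$ portion of the image element-wise with the kernel and summing the products. Note that this is not an ordinary matrix product. The kernel then slides to the next portion of the image and the step is repeated.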
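
A minimal NumPy sketch of this step slides the kernel over the image, multiplies each patch element-wise with the kernel, and sums the products. The pixel values of the $5\times5$ image are an assumed example, since they are not listed in the text. The function name `convolve2d_valid` and the default stride of 1 are also assumptions for illustration.

```python
import numpy as np

# Assumed 5x5x1 example image (values chosen for illustration only).
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])

# The 3x3x1 kernel from the text.
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

def convolve2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image without padding."""
    k_rows, k_cols = kernel.shape
    out_rows = (image.shape[0] - k_rows) // stride + 1
    out_cols = (image.shape[1] - k_cols) // stride + 1
    output = np.zeros((out_rows, out_cols), dtype=image.dtype)
    for i in range(out_rows):
        for j in range(out_cols):
            r, c = i * stride, j * stride
            patch = image[r:r + k_rows, c:c + k_cols]
            # Element-wise product of the patch and the kernel, then sum.
            output[i, j] = np.sum(patch * kernel)
    return output

convolved = convolve2d_valid(image, kernel)
print(convolved)
```

Strictly speaking, the loop computes a cross-correlation, because the kernel is not flipped. That is how deep learning libraries typically implement convolution layers, and for a symmetric kernel like this one the two operations give the same result.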

Pooling Layer

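There are two types of pooling: Max Pooling and Average Pooling. Like the convolution layer, a pooling layer slides a window over its input and reduces its spatial size. Max Pooling returns the largest value in each window, while Average Pooling returns the mean of the values in each window.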

Max Pooling also acts as a noise suppressant: by keeping only the strongest activation in each window, it discards the weaker, noisy activations and so de-noises while reducing dimensionality. Average Pooling, on the other hand, reduces dimensionality by averaging, so noisy activations are smoothed but still contribute to the output. For this reason Max Pooling is often preferred in practice, although which method works better depends on the task.
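
The difference between the two methods can be sketched in NumPy as follows. The 4x4 feature map, the helper name `pool2d`, and the choice of non-overlapping 2x2 windows are assumptions for illustration.

```python
import numpy as np

# Hypothetical 4x4 feature map (values chosen for illustration only).
feature_map = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
], dtype=float)

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size)."""
    rows, cols = x.shape[0] // size, x.shape[1] // size
    # Reshape so that axes 1 and 3 index the positions inside each window.
    blocks = x[:rows * size, :cols * size].reshape(rows, size, cols, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)
print(avg_pooled)
```

Both calls shrink the 4x4 input to 2x2. Max pooling keeps only the largest value in each window, whereas average pooling blends all four values.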

Adding a fully connected layer is an easy way to learn non-linear combinations of the features produced by the convolution and pooling layers.
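
As a rough sketch of what such a layer computes, the pooled feature map is flattened into a vector, multiplied by a weight matrix, and passed through a non-linearity. The input values, the layer size of 3 units, and the random weights are all assumptions. In a trained network the weights and biases would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 pooled feature map coming from the earlier layers.
pooled = np.array([[6.0, 5.0], [7.0, 9.0]])

# Flatten to a vector so that every feature connects to every output unit.
x = pooled.reshape(-1)  # shape (4,)

# Randomly initialised weights and zero biases for 3 output units
# (placeholders only; training would adjust them).
weights = rng.normal(size=(3, x.size))
bias = np.zeros(3)

# Weighted sum of all inputs followed by a non-linearity (ReLU here).
hidden = np.maximum(0.0, weights @ x + bias)
print(hidden.shape)
```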

Activation Function

ReLU is defined as $f(x) = \max(0, x)$. We use ReLU because, if $-1$ marks the least relevant feature and $1$ the most relevant, setting every negative value to zero makes the convolved matrix sparser without losing too much information.
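
A short NumPy sketch of the function applied to a small matrix follows. The input values are made up for illustration.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: negative values become 0, the rest pass through."""
    return np.maximum(0, x)

# Hypothetical convolved values in the range [-1, 1].
convolved = np.array([
    [-1.0, 0.5],
    [0.2, -0.3],
])

activated = relu(convolved)
print(activated)
```

Here the two negative entries become zero and the positive entries are unchanged, which is the sparsifying effect described above.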