Convolutional neural networks

Convolutional neural networks (ConvNets) have proven effective in image classification and object recognition. They are a class of neural networks that has been successful at identifying objects and traffic signs, and they have brought significant improvements in face recognition. ConvNets are among the most widely used tools for image recognition and object classification. Optimizing existing networks and building improved ones requires a thorough understanding of how convolutional neural networks work.

The first ConvNet, named LeNet, was presented by Yann LeCun in 1990. This architecture provides the basic structure of a convolutional neural network and was designed to recognize characters. It has been applied to tasks such as zip code and handwritten digit recognition. The basic ConvNet architecture includes the following operations:

  • Convolution – This operation is based on the mathematical convolution operation. Its main purpose in the ConvNet architecture is to extract features from the input. A convolutional layer slides a kernel (feature detector) over the input image. This produces a new matrix called a feature map, in which the spatial relationship between pixels is preserved. The kernel weights are learned during training. Different filters detect different features; for example, particular filters perform edge detection, sharpening, or blurring.
  • Non-linearity – A non-linearity is applied after every convolution operation. The original architecture used the sigmoid or tanh function. More recently, the most common choice is ReLU (Rectified Linear Unit), which replaces every negative value in the feature map with zero.
  • Subsampling (pooling) – Subsampling reduces the dimensionality of the feature maps while preserving most of the important information. There are several types of pooling, such as max pooling and average pooling. The original LeNet used a form of average subsampling. Max pooling, which is common in later ConvNets, has shown good results.
  • Fully connected layer (classifier) – This is a multi-layer perceptron, typically with a softmax activation function in its output layer. The convolution and pooling layers produce many high-level features of the input image. The fully connected layers combine these features and classify the input into the classes defined by the training data.
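The four operations above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LeNet itself. The helper names (`conv2d`, `relu`, `max_pool`, `dense`, `softmax`), the 6x6 input, the hand-picked vertical-edge kernel, and the classifier weights are all hypothetical values, chosen to keep the arithmetic easy to follow. In a real ConvNet, the kernel and classifier weights would be learned during training. Like most deep learning libraries, `conv2d` here computes cross-correlation, meaning the kernel is not flipped.

```python
import math

def conv2d(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image (stride 1, no padding)
    and sum the elementwise products at each position. Like most ConvNet libraries,
    this computes cross-correlation, i.e. the kernel is not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Non-linearity: replace every negative value in the feature map with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2, stride=2):
    """Subsampling: keep only the largest value in each size x size window."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, stride)
        ]
        for i in range(0, len(feature_map) - size + 1, stride)
    ]

def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 6x6 grayscale input: a bright vertical stripe in columns 2-3.
image = [[0, 0, 10, 10, 0, 0] for _ in range(6)]
# Hypothetical hand-picked vertical-edge detector (3x3 kernel); normally learned.
kernel = [[1, 0, -1] for _ in range(3)]

feature_map = conv2d(image, kernel)   # 4x4 feature map
activated = relu(feature_map)         # negative values set to zero
pooled = max_pool(activated)          # 2x2 after 2x2 max pooling
flat = [v for row in pooled for v in row]

# Hypothetical classifier weights for two classes ("stripe", "no stripe").
weights = [[0.02] * 4, [-0.02] * 4]
biases = [0.0, 0.0]
probs = softmax(dense(flat, weights, biases))
print(pooled, probs)
```

With this input, the convolution produces a 4x4 feature map in which every row is [-30, -30, 30, 30]. ReLU zeroes the negative half, and 2x2 max pooling reduces the result to [[0, 30], [0, 30]]. The four pooled values are then flattened and classified by the dense layer and softmax.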

These four operations are the basic building blocks of convolutional neural networks. A simple ConvNet typically contains two convolution layers, each followed by a pooling layer, and then one or more fully connected layers. In one such example, the first convolutional layer uses 3 filters and the second uses 6 filters. The second layer therefore extracts 6 feature maps, which are pooled and passed to the fully connected layers.
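The layer arrangement just described can be traced by computing feature-map sizes layer by layer. This sketch assumes a hypothetical 32x32 single-channel input, 5x5 kernels with stride 1 and no padding, and 2x2 max pooling with stride 2. Only the filter counts (3 and 6) come from the text above. The helper names `conv_output_size`, `pool_output_size`, and `trace_shapes` are illustrative. The sketch uses the standard output-size formula for a convolution, (size - kernel + 2 * padding) // stride + 1.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution: (size - kernel + 2*padding) // stride + 1."""
    return (size - kernel + 2 * padding) // stride + 1

def pool_output_size(size, window=2, stride=2):
    """Spatial size after pooling with the given window and stride."""
    return (size - window) // stride + 1

def trace_shapes(input_size, filters_per_layer, kernel=5):
    """Return (layer, maps, height, width) for each conv/pool stage, plus the
    number of values flattened into the fully connected layers.
    Assumes square inputs and hypothetical 5x5 kernels by default."""
    size = input_size
    shapes = []
    for n_filters in filters_per_layer:
        size = conv_output_size(size, kernel)
        shapes.append(("conv", n_filters, size, size))
        size = pool_output_size(size)
        shapes.append(("pool", n_filters, size, size))
    flat = filters_per_layer[-1] * size * size
    return shapes, flat

# Hypothetical 32x32 input; 3 filters in the first conv layer, 6 in the second.
shapes, flat_features = trace_shapes(32, [3, 6])
for layer in shapes:
    print(layer)
print("features passed to fully connected layers:", flat_features)
```

Under these assumptions, the spatial size shrinks from 32 to 28 (conv), 14 (pool), 10 (conv), and 5 (pool). The network therefore passes 6 x 5 x 5 = 150 values to the fully connected layers. Each of the 6 filters in the second convolution spans all 3 input maps. The trace only tracks spatial size and the number of maps, not the weights.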

The basic convolutional neural network has shown tremendous success in classifying characters and objects in input images. Optimizing feature extraction and combining techniques suited to particular cases could bring further benefits and improve the network's results.