What are Convolutional Neural Networks?

A convolutional neural network (CNN) is a deep learning algorithm used mainly for image recognition tasks. It is a type of neural network built from multiple layers of nodes. CNNs are widely regarded as one of the most effective neural network architectures for image recognition and for processing pixel data. As a result, they are used in many computer vision (CV) applications, such as facial recognition and self-driving cars.

What is a neural network?

A neural network is a type of machine learning algorithm whose architecture is made of several node layers: an input layer, an output layer, and one or more hidden layers. Each node has its own weights and threshold and is connected to nodes in the next layer. The output of a node is passed on only if it exceeds the threshold value.
When the threshold is exceeded, the node is activated and sends data to the next layer of the network. In this way, neural networks pass data from layer to layer, with each layer depending on the output of the previous one.
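
The thresholding behaviour described above can be sketched in a few lines of Python. This is a simplified illustration, not a full network: it assumes a node fires when its weighted sum exceeds its threshold and otherwise outputs 0.0, and the function names, weights, and thresholds are hypothetical values chosen for the example.

```python
def node_output(inputs, weights, threshold):
    """Return the weighted sum if it exceeds the threshold, else 0.0 (node inactive)."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum if weighted_sum > threshold else 0.0


def layer_output(inputs, layer_weights, thresholds):
    """Evaluate every node of one layer on the same inputs."""
    return [node_output(inputs, w, t) for w, t in zip(layer_weights, thresholds)]


# Hypothetical two-node hidden layer fed by three input values.
inputs = [1.0, 0.5, -1.0]
hidden_weights = [[0.5, 1.0, 0.25], [0.25, -1.0, 0.5]]
hidden_thresholds = [0.5, 0.0]

hidden = layer_output(inputs, hidden_weights, hidden_thresholds)
print(hidden)  # [0.75, 0.0]: the first node fires, the second stays inactive
```

Practical networks usually replace the hard threshold with a smooth activation function, but the layer-by-layer flow of data is the same.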

ConvNets / CNNs

Convolutional neural networks, also called CNNs or ConvNets, are a class of feed-forward neural networks. Before CNNs, feature extraction for computer vision tasks was done manually and was therefore time-consuming. CNNs offer a more scalable approach to feature extraction and to image recognition tasks such as image classification and object recognition.
CNNs use principles from linear algebra, in particular matrix multiplication, to recognize patterns in an image. These operations are computationally expensive, so machine learning practitioners typically train CNNs on graphics processing units (GPUs).

How does a CNN work?

As mentioned earlier, neural networks are composed of various layers. In the case of CNNs, the layers can be categorized into three major types.

Convolutional layer

The major computational tasks of the CNN occur in the convolutional layer. It is often called the core building block of a convolutional network.
The basic components of this layer are the input data, a filter (also called a kernel or feature detector), and a feature map. The filter is moved across the image one receptive field at a time, over multiple iterations, to check whether a particular feature is present. This process of detecting features in an image is called convolution. Each application of the filter calculates a dot product between the input pixels in the receptive field and the filter.
The output of this operation is a feature map, a grid of these dot products. Feature maps are also called convolved features. The final output of the convolutional layer is thus a numerical representation of the image, which is used to identify patterns in it.
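
As a minimal sketch of the operation described above, the following pure-Python function slides a filter over an image and records one dot product per position. It assumes a stride of 1 and no padding, and, as in most deep learning libraries, it does not flip the kernel. The image and kernel values are hypothetical and chosen so that the result is easy to check by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    return the feature map of dot products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Dot product between the kernel and the receptive field at (r, c).
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is largest in the middle column, exactly where the filter sits on top of the edge, which is what "detecting a feature" means in practice.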

Pooling layer

The pooling layer performs downsampling, a form of dimensionality reduction. It reduces the number of parameters in its input, keeping only the most important values, which lowers complexity and improves the network’s efficiency. It also helps reduce overfitting. There are two main types of pooling:

  • Max pooling – In max pooling, the filter selects the maximum value in the receptive field and passes it to the output array.
  • Average pooling – In average pooling, the filter sends the average value of the receptive field to the output array.
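
Both pooling types can be illustrated with one small function. This is a sketch under stated assumptions: the windows do not overlap (the stride equals the window size), and the function name `pool2d` and the feature map values are hypothetical.

```python
def pool2d(feature_map, size, mode="max"):
    """Downsample with non-overlapping size x size windows (stride == size)."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values of the current receptive field.
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            elif mode == "average":
                row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max' or 'average'")
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 2],
    [0, 2, 8, 6],
    [4, 2, 2, 4],
]

print(pool2d(feature_map, 2, "max"))      # [[7, 4], [4, 8]]
print(pool2d(feature_map, 2, "average"))  # [[4.0, 2.0], [2.0, 5.0]]
```

In both cases a 4x4 input shrinks to 2x2, so the next layer has a quarter as many values to process.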

Fully connected layer

A fully connected (FC) layer is the final layer, where the image classification itself is carried out. Each node in this layer is directly connected to every node in the previous layer. The output of the FC layer is usually a set of probabilities between 0 and 1, one per class, derived using a softmax activation function.
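
A rough sketch of this final step is shown below. It assumes the feature maps have already been flattened into a plain list of numbers, and the weights, biases, and the three-class setup are hypothetical values picked for illustration.

```python
import math


def fully_connected(features, weights, biases):
    """Each output node combines every input feature (one weight per connection)."""
    return [
        sum(f * w for f, w in zip(features, node_weights)) + b
        for node_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities between 0 and 1 that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened features and weights for three classes.
features = [1.0, 2.0]
weights = [[1.0, 0.5], [0.5, 0.25], [0.0, 0.0]]
biases = [0.0, 0.0, 0.0]

scores = fully_connected(features, weights, biases)  # [2.0, 1.0, 0.0]
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

The class with the highest probability (here class 0) is reported as the prediction.
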
The input first passes through one or more convolutional layers, each of which may be followed by a pooling layer. The final layer is the fully connected layer, which provides the output that identifies the object.
As the image data goes through these layers, the patterns identified get progressively larger and more complex, until the last layer identifies the object as a whole.
Each layer tries to identify a particular feature of the input image with the help of a corresponding filter. Filters in early layers detect simple features, such as edges, and filters in later layers detect increasingly complex ones. The output of each layer is thus a feature map that comes progressively closer to representing a recognizable object, and it is finally passed to the FC layer, where the image classification takes place.

History and breakthroughs

While research on image classification has been going on since the 1950s and 1960s, CNNs were first conceptualized in the 1980s. They originate in the work of Kunihiko Fukushima, in “Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position” (published in 1980), and of Yann LeCun, in a zip code recognition system published in 1989.
The neocognitron invented by Kunihiko Fukushima was based on discoveries made earlier by David Hubel and Torsten Wiesel. Hubel and Wiesel studied the visual systems of cats and monkeys and presented their theory of how neurons in the visual cortex process visual information. They were awarded the Nobel Prize in Physiology or Medicine for their findings. These findings inspired Fukushima to develop the neocognitron, which was designed to emulate the neurons of the visual cortex. The neocognitron was able to recognize basic characters.
“LeNet5”, demonstrated by Yann LeCun in 1994, was a breakthrough in the field of CNNs. It is often regarded as the classic CNN architecture.
There have been several variations of the CNN architecture since then, and many advanced techniques have also been incorporated into CNNs.

The adoption of fast graphics processing units (GPUs) in the 2000s further accelerated the growth of CNNs. In 2004, K.S. Oh and K. Jung showed that a neural network implemented on a GPU could run 20 times faster than an equivalent CPU implementation.
AlexNet, developed by researchers from the University of Toronto, was another breakthrough: it won the 2012 ImageNet computer vision contest with a top-5 accuracy of about 85%.
Shortly before that, in 2011, Dan Ciresan and colleagues had further accelerated CNN training on GPUs using backpropagation.
Besides GPUs, the less popular Intel Xeon Phi coprocessors are also sometimes used for training CNNs using a parallelization method. Controlled Hogwild with Arbitrary Order of Synchronization (CHAOS) is one such method using the Intel Xeon Phi.

Applications and benefits

CNNs find their biggest use in computer vision, the field of automated image processing. Computer vision is an AI-based technology that uses machine learning methods to interpret images, videos, and other forms of visual input. It is widely used in many facets of everyday life, including marketing, healthcare, retail, automotive, security and surveillance, and more.
Besides image recognition, CNNs can also classify audio and signal data, and they can be applied to time series as well as image data. Image recognition nevertheless remains their most popular use, given how well they perform compared with other methods.
The overall design of a CNN can be compared to how the human brain works. Like the neurons in our brain, a CNN has many nodes that process data. Just as the visual cortex of our brain deals with visual input, a CNN is arranged so that its nodes together cover the entire visual field, which avoids the problems that earlier neural networks faced.
Before CNNs, images had to be reduced in resolution or fed in pieces into a neural network. A CNN avoids this issue and can process the image as a whole. Even audio and speech applications perform better with CNNs than with traditional neural networks.
To summarize, here are some of the advantages that CNNs provide over regular neural networks:

  • CNNs are easy to scale and can process larger and more complex images than traditional neural networks.
  • They help reduce the problem of overfitting.
  • They use parameter sharing: the same filter weights are reused at every position of the input, instead of each connection having its own weight. This allows for more efficient computation than in traditional neural networks.
  • CNN results are generally more accurate on image data, which makes CNNs well suited for computer vision systems.
  • They help avoid manual feature extraction.
  • CNNs can be built on top of existing, pretrained networks and retrained for a new task. This widens their scope without a matching increase in computational complexity or cost.
  • CNN models are easy to deploy and can run even on smartphones.
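
The efficiency gain from parameter sharing in the list above can be made concrete with a little arithmetic. The sketch below counts the trainable parameters of a fully connected layer and of a convolutional layer. The input size (224x224 RGB), the 64 units or filters, and the 3x3 kernel are hypothetical figures chosen for illustration, and the function names are not from any library.

```python
def dense_params(input_height, input_width, channels, units):
    """Weights plus biases for a fully connected layer on the flattened image."""
    inputs = input_height * input_width * channels
    return inputs * units + units


def conv_params(kernel_height, kernel_width, channels, filters):
    """Weights plus biases for a convolutional layer. The count does not depend
    on the image size, because each filter's weights are reused at every position."""
    return kernel_height * kernel_width * channels * filters + filters


# Hypothetical 224x224 RGB input: 64 dense units versus 64 filters of size 3x3.
print(dense_params(224, 224, 3, 64))  # 9633856
print(conv_params(3, 3, 3, 64))       # 1792
```

Under these assumptions the convolutional layer needs several thousand times fewer parameters than the fully connected one.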

Some of the common applications of CNNs and computer vision include:

  • Healthcare: Used in diagnostics and anomaly detection in patient data. It is also applied in specific healthcare fields, such as radiology and cancer research
  • Automotive: Used in self-driving cars to process visual inputs
  • Social media: Facial recognition and tagging in photos can be performed with the help of CNNs
  • Retail: Allows for visual search in eCommerce platforms
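  • Facial recognition: Law enforcement can make use of CNN-based facial recognition models. Generative adversarial networks (GANs), which are often built from convolutional layers, are sometimes used to generate additional training images for such models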
  • Audio processing: Virtual assistants can be developed and enhanced with the help of the audio processing capabilities provided by CNNs
  • Drug discovery: AtomNet is a CNN-based deep learning model that helps predict how candidate drug molecules interact with their biological targets

Challenges and concerns

Even though CNNs have proven to be one of the most powerful neural network algorithms available today, they also have their limitations. A CNN can recognize patterns in an image, but it stops there and cannot understand the image’s contents. It is still nowhere near human cognition, which can derive multiple interpretations of an image.
For instance, a CNN may interpret an image of a toddler with their father simply as an image of a middle-aged man and a toddler. Humans can process more: they can figure out the context and the action, and even perceive the relationship between the two subjects in the image.
CNNs have yet to achieve such a level of intelligence, which can pose problems in practical applications. For this reason, CNNs cannot be relied on alone for moderating image content on social media. For instance, a CNN model once identified a statue as inappropriate and flagged it for nudity on Facebook.
In the future, the algorithm may be further refined to emulate human intelligence more accurately.
Some of the concerns that a CNN developer has to take into account for certain applications are summarized below:

  • Self-driving cars
    Object detection for self-driving cars is more complex than image classification, as the cars must perform localization, obstacle avoidance, and path planning. They must detect imminent collisions and alert the system to take the necessary action. This is a complex problem, as the CNN must both classify the object and return the position of the object’s bounding box. The network designer must therefore minimize false alarms and needs a huge volume of labeled training data. One source of such labeled data is Google’s reCAPTCHA system, where users are asked to categorize objects like traffic lights, cars, hydrants, and so on.
  • Text classification
    Text classification with CNNs requires different preprocessing steps from image classification. It has also been found that sentences longer than the width of the input matrix cannot be processed properly.
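
The fixed input width mentioned for text classification is commonly handled by truncating long sentences and padding short ones. The sketch below shows that workaround as an assumption, not as a method described in this article; the function name `fit_to_width`, the `<pad>` token, and the width of 5 are hypothetical.

```python
def fit_to_width(tokens, width, pad_token="<pad>"):
    """Return exactly `width` tokens: truncate long inputs, right-pad short ones."""
    if len(tokens) > width:
        return tokens[:width]  # anything beyond the fixed width is discarded
    return tokens + [pad_token] * (width - len(tokens))


short = fit_to_width(["cnns", "classify", "text"], 5)
long = fit_to_width("a sentence that is longer than the input width".split(), 5)
print(short)  # ['cnns', 'classify', 'text', '<pad>', '<pad>']
print(long)   # ['a', 'sentence', 'that', 'is', 'longer']
```

The second call shows the limitation directly: the words after the fifth token never reach the network.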

Conclusion

CNNs have proven to be one of the best neural network algorithms for identifying images and performing subsequent AI-based operations. AI-based image classification has come a long way, from recognizing just a few characters to identifying many objects quickly enough for autonomous driving. But it has not yet achieved the level of intelligence that would warrant a fully automated intelligent system comparable to the human brain. CNNs have brought technology a step closer to that goal, and further refinement may bring them closer still.

FAQs on Convolutional Neural Network

What is a neural network?

Neural networks are computer algorithms designed to recognize patterns. They are useful tools to help us classify and cluster data. After they have been correctly trained, they can quickly group data that has not been categorized or labeled, helping save time.

Where are Convolutional Neural Networks used?

CNNs find their biggest use in computer vision, the field of automated image processing. Computer vision is widely used in many facets of everyday life, including marketing, healthcare, retail, automotive, security and surveillance, and more. Besides image recognition, CNNs can also classify audio and signal data, and they can be applied to time series as well as image data. Image recognition nevertheless remains their most popular use, given how well they perform compared with other methods.

What are ConvNets?

Convolutional neural networks, also called CNNs or ConvNets, are a class of feed-forward neural networks.

How many layers are present in CNNs?

A CNN has three main types of layers: the convolutional layer, the pooling layer, and the fully connected layer.