Recurrent Neural Networks (RNNs) are machine learning models whose architecture is designed so that each step of the network receives the output of the previous step as part of its input. Unlike a traditional neural network, where inputs and outputs are treated independently of one another, an RNN connects the output of one step to the input of the next.
This architecture is useful when dealing with problems such as word prediction in a sentence where previous words are required to predict the next word. RNN achieves this output-to-input transition with the help of concepts such as hidden layer and hidden state. These neural networks are said to have a ‘memory’ where the information collected so far is remembered.
Recurrent neural networks are similar to how our brain works. In addition to passing data forward from one node to the next, they also retain some form of memory between steps, similar to short-term memory. This cognitive approach to processing information draws an interesting parallel to the ongoing discussion on human intelligence vs artificial intelligence.
In a traditional neural network, the activations of each layer depend only on the current input. An RNN turns these independent activations into dependent ones: the hidden state produced at one time step is passed along as an input to the next time step, and the same weights are reused at every step. Thus some form of memory is preserved across the sequence, and the number of parameters that must be learned is considerably reduced.
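To make this shared-weight recurrence concrete, here is a minimal NumPy sketch of a single recurrent step; the layer sizes, random initialization, and toy sequence are illustrative assumptions rather than values from this article.

```python
import numpy as np

# The same weight matrices (W_xh, W_hh) are reused at every time step,
# and the hidden state h carries information forward -- the network's "memory".
input_size, hidden_size = 8, 16                      # illustrative sizes
W_xh = np.random.randn(hidden_size, input_size) * 0.1
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One step of the recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Unroll over a toy sequence of 5 time steps.
h = np.zeros(hidden_size)
for x_t in np.random.randn(5, input_size):
    h = rnn_step(x_t, h)                             # h now depends on all earlier inputs
```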
Some of the activation functions commonly used in RNN layers are sigmoid, tanh, and ReLU.
This type of model is well suited to sequential data. Because the order in which sequential data is presented carries extra significance, the model needs some form of memory to hold that ordering information, and RNNs provide exactly that. RNNs have long been among the best-performing models for sequential data.
As for training and assigning the proper weights, a back-propagation algorithm is used.
Training an RNN model is therefore quite different from training a regular neural network. Roughly, the steps involved in training an RNN machine-learning model are: feed the input to the network one time step at a time, compute the current hidden state from the current input and the previous hidden state, and repeat this across the whole sequence until a final output is produced.
Once the final output is calculated, it is compared with the target output, and an error is generated. This error is back-propagated through the network to fine-tune the weights of the recurrent layers. Doing so can sometimes cause difficulties for RNNs when the gradients become too large or too small. Because the back-propagation is applied across time steps, the technique is generally called Backpropagation Through Time (BPTT).
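As a rough illustration of this training loop, the sketch below runs one forward pass and one BPTT update using PyTorch's built-in RNN layer; the layer sizes, random data, and learning rate are placeholder assumptions.

```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
readout = nn.Linear(16, 1)
optimizer = torch.optim.SGD(
    list(model.parameters()) + list(readout.parameters()), lr=0.01
)
loss_fn = nn.MSELoss()

x = torch.randn(4, 10, 8)                  # batch of 4 sequences, 10 time steps each
target = torch.randn(4, 1)

output, h_n = model(x)                     # forward pass, unrolled over all time steps
prediction = readout(output[:, -1, :])     # read the final hidden state
loss = loss_fn(prediction, target)         # compare with the target output

optimizer.zero_grad()
loss.backward()                            # backpropagation through time (BPTT)
optimizer.step()                           # fine-tune the recurrent layer weights
```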
Tip: Train your Recurrent Neural Network models efficiently by using high-quality data that can be provided by clickworker's Datasets for Machine Learning.
The structure of an RNN is quite different from that of other neural networks. Most traditional neural networks are feed-forward, with data flowing in a single, linear direction; an RNN adds recurrent connections that feed the hidden state back into the network, and it is trained using back-propagation through time.
The hidden state of the RNN holds some information about the previous state and thus maintains a form of memory within the neural network.
The basic difference between a regular feed-forward neural network and a recurrent neural network is the route of information flow. In a regular feed-forward neural network, information flows only in one way and does not pass through a node a second time. But with RNNs, the information may be passed through the same node more than once, and the information flow is not strictly a straight route.
A good way to demonstrate how an RNN works is to discuss it with an example application. If you feed a regular feed-forward network a word, say 'peacock,' the model processes each letter independently, and by the time it reaches the fourth letter it has no memory of the previous ones, so it cannot predict what the next letter should be. With an RNN, the previous characters are remembered by an internal memory mechanism, so the model can predict the next letter based on its training.
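The sketch below illustrates this letter-by-letter behavior with PyTorch's RNNCell; the tiny vocabulary, embedding size, and hidden size are assumptions chosen only for the example.

```python
import torch
import torch.nn as nn

chars = sorted(set("peacock"))                    # toy vocabulary
char_to_idx = {c: i for i, c in enumerate(chars)}

embed = nn.Embedding(len(chars), 8)
cell = nn.RNNCell(input_size=8, hidden_size=16)
to_logits = nn.Linear(16, len(chars))

h = torch.zeros(1, 16)                            # the internal memory
for c in "peacoc":                                # feed the letters one by one
    x = embed(torch.tensor([char_to_idx[c]]))
    h = cell(x, h)                                # h summarizes all letters seen so far

next_char_scores = to_logits(h)                   # scores for the letter after "peacoc"
```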
RNNs find great use in time-series prediction problems because they can retain information from one step of the network to the next. Since a plain RNN can remember previous inputs only over a limited number of steps, it is said to have a short-term memory; the Long Short-Term Memory (LSTM) variant discussed below extends this ability.
RNNs can be used alongside CNNs (convolutional neural networks) to optimize the results further. The recurrent part helps to expand the effective pixel neighborhood and thus improves the final results.
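One common way to pair the two (a simplified, assumed setup rather than a specific published architecture) is to let a CNN extract features from each frame of a sequence and an RNN aggregate those features over time, as in the PyTorch sketch below; all layer sizes are placeholders.

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(                       # per-frame feature extractor
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),                          # -> 16 features per frame
)
rnn = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

frames = torch.randn(2, 5, 3, 32, 32)      # 2 sequences of 5 RGB frames
b, t = frames.shape[:2]
features = cnn(frames.reshape(b * t, 3, 32, 32)).reshape(b, t, 16)
sequence_summary, _ = rnn(features)        # the RNN aggregates features over time
```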
Recurrent neural networks were first conceptualized by David Rumelhart in 1986, while a related architecture, the Hopfield network, had been introduced earlier by John Hopfield in 1982. Since then, there have been several developments in the RNN architecture, the most significant being the LSTM (Long Short-Term Memory) network developed in 1997 by Hochreiter and Schmidhuber.
LSTM is now a popular network used in applications such as speech recognition, handwriting recognition, machine translation, language modeling, and multilingual language processing. It is also used in Google Android for its text-to-speech synthesis application.
Training an RNN can be challenging, given the many back-propagation-through-time passes needed to settle the weights of the recurrent layers. It is a time-consuming process.
RNNs also suffer from exploding and vanishing gradient problems. As mentioned earlier, an RNN uses back-propagation through time and calculates a gradient with each pass to adjust the nodes' weights. But as these gradients are propagated back through many states, they can keep shrinking until they effectively reach zero (vanishing), or, conversely, grow too large to handle (exploding). The exploding gradient issue can be handled by clipping gradients at a threshold value above which they cannot grow, but this solution is sometimes considered to degrade quality and is therefore not always preferred.
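The thresholding idea mentioned above is usually implemented as gradient clipping. A minimal PyTorch sketch, with an arbitrary threshold of 1.0 and a placeholder loss, might look like this:

```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

output, _ = model(torch.randn(4, 10, 8))
loss = output.pow(2).mean()                # placeholder loss, for illustration only
optimizer.zero_grad()
loss.backward()
# Cap the overall gradient norm so it cannot explode during BPTT.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```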
A standard RNN also does not take future inputs into account when making its decisions and can therefore suffer from inaccurate predictions.
Several variations of RNN have been developed, each focusing on a particular problem to be solved or an optimization to be achieved. Two major variants developed to deal with the challenges described above are the LSTM and GRU networks:
This type of RNN is designed to retain the relevant information flowing through the neural network with the help of function layers called gates. The memory blocks used in this type of network are called cells, and this is where the information is stored. The gates handle the memory manipulation, retaining relevant information while discarding what is irrelevant. Three gates are used in LSTM networks: the forget gate, the input gate, and the output gate.
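The following is a minimal sketch of a single LSTM step showing how the forget, input, and output gates interact with the cell state; the weights and sizes are illustrative assumptions.

```python
import torch

hidden_size, input_size = 16, 8
Wf, Wi, Wo, Wc = (torch.randn(hidden_size, input_size + hidden_size) * 0.1 for _ in range(4))
bf = bi = bo = bc = torch.zeros(hidden_size)

def lstm_step(x_t, h_prev, c_prev):
    z = torch.cat([x_t, h_prev])
    f = torch.sigmoid(Wf @ z + bf)         # forget gate: what to drop from the cell
    i = torch.sigmoid(Wi @ z + bi)         # input gate: what new information to store
    o = torch.sigmoid(Wo @ z + bo)         # output gate: what to expose as hidden state
    c_tilde = torch.tanh(Wc @ z + bc)      # candidate cell content
    c = f * c_prev + i * c_tilde           # updated cell state (the memory block)
    h = o * torch.tanh(c)
    return h, c

h, c = torch.zeros(hidden_size), torch.zeros(hidden_size)
h, c = lstm_step(torch.randn(input_size), h, c)
```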
LSTM finds its use in applications such as speech recognition, handwriting recognition, machine translation, and language modeling.
LSTM deals with the vanishing gradient problem very effectively and is better at handling noise, continuous values, and distributed data values when compared to a regular RNN.
There have also been several variations of the basic LSTM architecture, with improvements to the cell designs and gate layers.
Even though LSTMs offer a great improvement over regular RNNs, they also have certain drawbacks, such as longer training times and higher memory requirements.
The Gated Recurrent Unit (GRU) network is another variation of the basic RNN. It also uses gates but does not have the separate internal cell state seen in the LSTM network. The two gates used are the update gate and the reset gate.
The GRU is often used as a lighter alternative to the LSTM: it is faster to train and less memory intensive, and it also handles the vanishing gradient problem efficiently with the help of its update and reset gate mechanisms. However, it does not generally surpass the accuracy produced by LSTM networks.
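A small PyTorch sketch comparing the two layers (sizes are placeholders): the GRU returns only a hidden state, with no separate cell state, and ends up with fewer parameters than the equivalent LSTM.

```python
import torch
import torch.nn as nn

x = torch.randn(4, 10, 8)                  # batch of 4 toy sequences

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

_, (h_lstm, c_lstm) = lstm(x)              # hidden state AND cell state
_, h_gru = gru(x)                          # hidden state only

print(sum(p.numel() for p in lstm.parameters()))   # the GRU total is the smaller one
print(sum(p.numel() for p in gru.parameters()))
```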
In bidirectional RNNs, the nodes can gather information from both previous states and future inputs to compute the current state.
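A minimal PyTorch sketch of a bidirectional RNN (sizes are placeholders): the layer runs one pass forward and one backward over the sequence and concatenates both hidden states at every time step.

```python
import torch
import torch.nn as nn

birnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
x = torch.randn(4, 10, 8)

output, h_n = birnn(x)
print(output.shape)   # torch.Size([4, 10, 32]) -- forward and backward states concatenated
print(h_n.shape)      # torch.Size([2, 4, 16]) -- final state of each direction
```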
Besides these popular network architectures, RNNs can be broadly classified into the following types based on how their inputs and outputs are connected: one-to-one, one-to-many, many-to-one, and many-to-many.
RNNs are a distinctive kind of neural network in that they carry a form of memory with them. They are trained with back-propagation through time and therefore face challenges such as exploding and vanishing gradients, but advanced variants such as the LSTM help solve these issues and are highly preferred in applications such as speech synthesis, sentence prediction, translation, music generation, and more. RNNs are integral to many AI applications, such as the chatbots in use today.
Despite the widespread use of RNNs, they still have their limitations when dealing with long-range dependencies where data relations are several steps apart.