Related questions:
– What does ‘sequence data’ mean? Discuss the different types
– What is Long Short-Term Memory (LSTM)?
– Compare the different Sequence models (RNN, LSTM, GRU, and Transformers)
Overview of RNN
A Recurrent Neural Network (RNN) is a type of neural network architecture designed specifically for processing sequential data such as time series and natural language text. Because later values in a sequence are often highly correlated with earlier ones, traditional feed-forward networks, which treat each input independently, are poor choices for making predictions on this type of data.
In a Recurrent Neural Network, at each step of the sequence a neuron receives as input both the data from the current step and the activation produced at the previous step. The network is thus able to keep prior values in its memory and use them to help predict later values of the sequence. Outputs are generated through forward propagation and the weight parameters are updated through backpropagation, much as in a regular neural network, but an RNN shares the same weights across all time steps rather than initializing different weights for each position in the sequence.

(Figure source: Stanford NLP course)
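To make the recurrence described above concrete, here is a minimal sketch of a single RNN step in NumPy. The weight names (W_xh, W_hh, b_h), the tanh activation, and the sizes are illustrative assumptions rather than anything specified in the text; the key point is that the same weights are reused at every step of the sequence.

```python
import numpy as np

# Illustrative sizes; these are assumptions, not taken from the text
input_size, hidden_size = 8, 16

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (recurrent) weights
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One recurrence step: combine the current input with the previous hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# The same W_xh, W_hh and b_h are reused at every step of the sequence.
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):  # a toy sequence of 5 input vectors
    h = rnn_step(x_t, h)
print(h.shape)  # (16,)
```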
RNN Architecture
The fundamental architecture of an RNN consists of the following components (a short code sketch after the list shows how they fit together):
- Input Layer: The input layer receives the sequential data. In natural language processing tasks, for example, each element of the sequence could correspond to a word or a character.
- Hidden Layer: The hidden layer is the core component of an RNN. It maintains an internal (hidden) state that evolves as the network processes each element of the sequence. Because the hidden state captures information from previous time steps, it is responsible for preserving context and modeling dependencies within the sequence.
- Recurrent Connection: This is a crucial feature of RNNs: a set of weights that feeds the hidden state from the previous time step back into the hidden layer at the current step. This loop allows the RNN to maintain and update its hidden state as it processes each element of the sequence, enabling it to remember and use information from the past.
- Output Layer: The output layer is responsible for producing predictions or outputs based on the information in the hidden state. The number of neurons in this layer depends on the specific task. For instance, in a language model, it might be a softmax layer to predict the next word in a sequence.
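Below is a minimal sketch, under the same illustrative assumptions, of how the four components fit together for a language-model-style task: each input vector passes through the shared recurrent cell, and a softmax output layer produces a distribution over a toy vocabulary at every step. The names, sizes, and the tanh/softmax choices are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, embed_size, hidden_size = 20, 8, 16  # illustrative sizes

# Parameters shared across every time step
W_xh = rng.normal(scale=0.1, size=(hidden_size, embed_size))   # input layer -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent connection
W_hy = rng.normal(scale=0.1, size=(vocab_size, hidden_size))   # hidden -> output layer
b_h, b_y = np.zeros(hidden_size), np.zeros(vocab_size)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def rnn_forward(inputs):
    """Run the RNN over a sequence; return a per-step output distribution."""
    h = np.zeros(hidden_size)                      # initial hidden state
    outputs = []
    for x_t in inputs:                             # input layer: one vector per sequence element
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)   # hidden state update (recurrence)
        outputs.append(softmax(W_hy @ h + b_y))    # output layer: next-token distribution
    return outputs

probs = rnn_forward(rng.normal(size=(4, embed_size)))  # toy sequence of 4 embeddings
print(len(probs), probs[0].shape)  # 4 (20,)
```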
Conclusion:
RNNs are characterized by their ability to process sequential data by maintaining a hidden state that evolves over time. However, basic RNNs have notable limitations, chief among them the vanishing gradient problem: as gradients are propagated back through many time steps they shrink repeatedly, so the learning signal from distant inputs fades and long-range dependencies become difficult to capture. To address these issues, more advanced RNN variants, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have been developed, offering improved memory and gradient flow for more effective sequence modeling.
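The vanishing-gradient limitation mentioned above can be illustrated numerically. During backpropagation through time the gradient is multiplied once per step by the recurrent Jacobian, whose size is bounded by the recurrent weights times the activation's derivative (at most 1 for tanh), so it tends to shrink exponentially with sequence length. The weight scale, the 0.9 stand-in for the tanh derivative, and the sequence length below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
hidden_size, steps = 16, 50
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent weights

# Backpropagation through time multiplies the gradient by the recurrent Jacobian
# once per step; here it is approximated by W_hh.T scaled by a tanh'-like factor.
grad = np.ones(hidden_size)
for t in range(steps):
    grad = W_hh.T @ (0.9 * grad)  # 0.9 stands in for the tanh derivative (<= 1)
    if (t + 1) % 10 == 0:
        print(f"step {t + 1:2d}: ||grad|| = {np.linalg.norm(grad):.2e}")
# The norm decays exponentially, so early time steps receive almost no learning signal.
```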
Video Explanation
- The playlist consists of lecture videos by Prof. Chris Manning from the Stanford NLP course (graduate level). In the first video, Prof. Manning introduces Neural Dependency Parsing and Language Models, which serves as a great build-up to the introduction of Recurrent Neural Networks (RNNs). (Jump to 1:12 hrs to watch the RNN part directly.) (Runtime: ~7 mins for RNN)
- The second video covers the training of RNNs as well as their advantages and limitations. (Runtime: ~49 mins for RNN)