Related questions:
– Briefly describe the architecture of a Recurrent Neural Network (RNN)
– What are Language Models? Discuss the evolution of Language Models over time

A Recurrent Neural Network (RNN) is a type of neural network architecture specifically designed for processing sequential data, such as time series or natural language. The advantages and disadvantages of an RNN model are listed below:
Advantages of RNN:
- Effective for Sequence Modeling: RNNs are specifically designed for processing sequential data, making them well suited to tasks such as natural language processing, time series analysis, and speech recognition.
- Memory for Contextual Information: RNNs summarize previous inputs in their hidden state, which acts as a form of memory. This enables RNNs to capture and use context from earlier time steps and to model dependencies and relationships within a sequence.
- Variable-Length Inputs: RNNs can process sequences of variable length, unlike traditional feed-forward networks that require fixed-size input vectors. This makes them particularly useful when input sizes vary across examples.
- Parameter Sharing: The same weights (W) are used at every time step, making the model size independent of sequence length and keeping memory and compute requirements modest (see the sketch after this list).
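The points above can be made concrete with a minimal NumPy sketch of a vanilla (Elman-style) RNN forward pass. All names and sizes here are illustrative assumptions, not from the original text; the point is that one set of weights is reused at every time step, and the hidden state carries context forward:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence of input vectors.

    The same weights (W_xh, W_hh, b_h) are reused at every time step,
    so the model size does not grow with sequence length.
    """
    h = np.zeros(W_hh.shape[0])  # initial hidden state: the network's "memory"
    for x in xs:                 # sequences of any length work
        # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b): new state mixes the
        # current input with the context carried over from earlier steps
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h                     # final state summarizes the whole sequence

# Illustrative sizes: hidden size 4, input size 3, sequence length 5
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(4, 3))
W_hh = rng.normal(scale=0.1, size=(4, 4))
b_h = np.zeros(4)
sequence = [rng.normal(size=3) for _ in range(5)]
print(rnn_forward(sequence, W_xh, W_hh, b_h))
```

Note that `sequence` could have any number of steps without changing the weights, which is exactly the variable-length and parameter-sharing properties described above.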
Disadvantages of RNN:
- Vanishing Gradient Problem: One of the most significant drawbacks of basic RNNs. Gradients can become extremely small as they are backpropagated through time, which limits the network's ability to capture long-range dependencies (a numerical illustration follows this list).
- Exploding Gradient Problem: RNNs can also suffer from gradients that grow exceptionally large during training, causing numerical instability. Exploding gradients are easier to detect and manage than vanishing ones, typically via gradient clipping.
- Limited Memory: Traditional RNNs have limited memory capacity and struggle to carry information across many time steps. This is problematic for long sequences, where the network may "forget" important information from earlier steps.
- Recency Bias and Lack of Global Context: Following from the point above, the influence of earlier inputs diminishes as the sequence progresses, so the network becomes biased toward more recent data and struggles to capture global context.
- Difficulty with Parallelization: RNNs process data sequentially, which makes parallelization across time steps difficult and training slow. As a result, RNNs cannot take full advantage of modern hardware such as GPUs designed for parallel processing.
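To make the gradient problems concrete, here is a small NumPy sketch (matrix scales and step counts are illustrative assumptions). Backpropagation through time multiplies one Jacobian per time step, so the gradient norm shrinks or grows roughly geometrically with sequence length; the sketch uses a linear recurrence h_t = W_hh h_{t-1}, where the Jacobian of h_T with respect to h_0 is simply W_hh raised to the number of steps:

```python
import numpy as np

def bptt_gradient_norm(W_hh, steps):
    """Norm of the Jacobian dh_T/dh_0 = W_hh^steps for h_t = W_hh h_{t-1}.

    With a tanh nonlinearity the effect is similar but additionally
    damped by the tanh derivative, making vanishing even more likely.
    """
    grad = np.eye(W_hh.shape[0])
    for _ in range(steps):
        grad = W_hh @ grad  # one Jacobian factor per time step
    return np.linalg.norm(grad)

def clip_by_norm(g, max_norm=5.0):
    """Gradient clipping: the standard mitigation for exploding gradients."""
    norm = np.linalg.norm(g)
    return g if norm <= max_norm else g * (max_norm / norm)

rng = np.random.default_rng(0)
small = rng.normal(scale=0.3, size=(4, 4))  # spectral radius < 1: gradients vanish
large = rng.normal(scale=0.8, size=(4, 4))  # spectral radius > 1: gradients explode
for steps in (1, 10, 50):
    print(steps, bptt_gradient_norm(small, steps), bptt_gradient_norm(large, steps))
```

Gated architectures such as LSTMs and GRUs address the vanishing side with additive cell updates and gating; clipping, as sketched above, is the usual fix for the exploding side.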
Video Explanation
- The playlist consists of lecture videos by Prof. Chris Manning from the Stanford NLP course (graduate level). The first video introduces RNNs, and the second goes deeper into the topic: training, advantages, and limitations of RNNs (runtime: ~49 mins for the RNN material).