
What do you mean by vanishing gradient and why is that a problem?


The vanishing gradient problem can occur when training deep neural networks (DNNs): the gradients of the loss function with respect to the model’s parameters become extremely small (close to zero) as they are backpropagated through the layers, most severely in the layers closest to the input. Because each weight update is proportional to its gradient, the affected weights barely change. As a result, training may stagnate or become extremely slow, making it difficult for the network to learn complex patterns in the data.
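To make the shrinking effect concrete, here is a minimal sketch in plain Python. It assumes a simplified chain of layers in which backpropagation multiplies the gradient by one scalar factor per layer; the factor 0.25, the depth of 10, the learning rate, and the helper name `backpropagated_scale` are illustrative assumptions, not values from any particular network.

```python
def backpropagated_scale(layer_factors):
    """Return the running product of per-layer factors, output side first.

    Each factor stands in for (weight * activation derivative) of one layer
    in a simplified scalar chain. This is an illustrative assumption, not a
    full backpropagation implementation.
    """
    scales = []
    scale = 1.0
    for factor in layer_factors:
        scale *= factor
        scales.append(scale)
    return scales


depth = 10                   # hypothetical number of layers
per_layer_factor = 0.25      # hypothetical factor smaller than 1
scales = backpropagated_scale([per_layer_factor] * depth)

learning_rate = 0.1          # hypothetical learning rate
# Size of the update reaching the layer closest to the input,
# relative to a gradient of magnitude 1 at the output.
first_layer_update = learning_rate * scales[-1]

for layer, s in enumerate(scales, start=1):
    print(f"{layer} layers back: gradient scaled by {s:.3e}")
print(f"update for the earliest layer: {first_layer_update:.3e}")
```

Under these assumptions the gradient reaching the earliest layer is scaled by 0.25 to the power 10, roughly one millionth of its size at the output, so its weight update is negligible.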

Title: Illustrating saturation region and vanishing gradient problem (derivative close to 0) for a Sigmoid activation function
Source: “Vanishing and Exploding Gradients in Neural Network Models” article by Katherine (Yi) Li

Activation functions such as sigmoid and hyperbolic tangent (tanh) saturate: for inputs of large magnitude their derivatives approach zero, which makes them more prone to the vanishing gradient problem in DNN training. ReLU and its variants can alleviate the problem because they do not saturate for positive inputs. The derivative of ReLU is 1 for positive inputs and 0 for negative inputs. During backpropagation, the gradients of the lower layers are obtained by multiplying many such derivatives together. Along paths of active units the ReLU factors are exactly 1, so the gradient does not shrink the way it does with sigmoid or tanh, which leads to more effective and faster training.
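The comparison can be sketched with the standard derivative formulas for the three activations. The helper `chained` is a hypothetical simplification: it assumes every layer sees the same pre-activation value and ignores the weights, so it isolates only the contribution of the activation derivatives.

```python
import math


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)); its maximum is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2; its maximum is 1 at x = 0."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained(grad_fn, x, depth):
    """Product of `depth` identical activation derivatives.

    Simplifying assumption: every layer has the same pre-activation x
    and the weights are ignored.
    """
    return grad_fn(x) ** depth


x, depth = 2.0, 10  # illustrative values
for name, fn in [("sigmoid", sigmoid_grad), ("tanh", tanh_grad), ("relu", relu_grad)]:
    print(f"{name:8s} derivative at x={x}: {fn(x):.4f}, "
          f"after {depth} layers: {chained(fn, x, depth):.3e}")
```

With a positive pre-activation, the sigmoid and tanh products collapse toward zero as depth grows, while the ReLU product stays at 1. For a negative pre-activation the ReLU derivative is 0, so this sketch only illustrates the behavior for active units.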

Title: Comparing gradients of Sigmoid, tanh and ReLU
Source: “Advantages of ReLU over Sigmoid” thread, Stack Exchange

Other techniques used to alleviate the vanishing gradient problem include: (a) careful weight initialization schemes such as Xavier initialization and He initialization, (b) batch normalization, and (c) skip connections and residual connections.
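As an illustration of item (a), the two initialization schemes can be sketched in plain Python using their commonly cited formulas: Xavier (Glorot) uniform draws weights from a uniform range with limit sqrt(6 / (fan_in + fan_out)), and He normal draws from a zero-mean Gaussian with standard deviation sqrt(2 / fan_in). The function names, layer sizes, and seed below are assumptions chosen for the example; deep learning frameworks ship their own implementations.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier (Glorot) uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


def he_normal(fan_in, fan_out, rng):
    """He normal: N(0, std^2) with std = sqrt(2 / fan_in), often paired with ReLU."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]


rng = random.Random(0)              # fixed seed so the sketch is repeatable
fan_in, fan_out = 256, 128          # hypothetical layer sizes
w_xavier = xavier_uniform(fan_in, fan_out, rng)
w_he = he_normal(fan_in, fan_out, rng)

# The sample variance of the He weights should be close to 2 / fan_in.
flat = [w for row in w_he for w in row]
mean = sum(flat) / len(flat)
variance = sum((w - mean) ** 2 for w in flat) / len(flat)
print(f"target variance {2.0 / fan_in:.5f}, sample variance {variance:.5f}")
```

Both schemes scale the initial weights to the layer width so that signal and gradient magnitudes stay roughly constant from layer to layer at the start of training, rather than shrinking with depth.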
