Activation functions transform the weighted sum of a node's inputs plus its bias into the node's output, which is what allows each node to learn part of a complex function. The most basic activation function is the linear (identity) one, which simply passes that weighted sum through unchanged. No matter how many layers or units are present in the network, a network that uses a linear activation function at every node is nothing more than a standard linear model, because a composition of linear maps is itself linear. Much of the power of Neural Networks is therefore derived from using nonlinear activation functions at each node.
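The collapse of stacked linear layers into a single linear model can be checked numerically. The following is a minimal sketch rather than a definitive implementation: the layer sizes, the random weights, and the names two_layer_linear and single_linear are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network: 3 inputs -> 4 hidden units -> 2 outputs.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def two_layer_linear(x):
    """Forward pass with the identity (linear) activation at every node."""
    h = W1 @ x + b1       # hidden layer, no nonlinearity applied
    return W2 @ h + b2    # output layer, no nonlinearity applied

# The same mapping collapsed into one weight matrix and one bias vector.
W = W2 @ W1
b = W2 @ b1 + b2

def single_linear(x):
    """Equivalent standard linear model."""
    return W @ x + b

x = rng.normal(size=3)
same = np.allclose(two_layer_linear(x), single_linear(x))
print(same)
```

Under these assumptions the two functions agree on any input up to floating-point error. Placing a nonlinearity between the two layers breaks this equivalence, which is why nonlinear activations add expressive power.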
The ReLU, or Rectified Linear Unit, is one such nonlinear activation function, and it is a widely used default choice for the hidden layers of Artificial Neural Networks. It is very simply defined as:
f(x) = max(0, x)
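meaning that if the input is less than 0, it outputs 0, and if it is larger than 0, it outputs the input unchanged. There are some alternate formulations of the ReLU that make slight modifications, such as the Leaky ReLU, which instead of outputting 0 for negative inputs, multiplies them by a small positive slope (0.01 is a common choice) and so produces a small negative number. The ReLU activation is often preferred because it mitigates the vanishing gradient problem, which often occurs in the training of deep Neural Networks: its gradient is exactly 1 for every positive input, so the gradient does not shrink as it is propagated backward through active units. It does not by itself prevent exploding gradients, which are usually handled by other means such as careful weight initialization or gradient clipping.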
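The ReLU and Leaky ReLU described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation: the function names, the negative_slope parameter name, its 0.01 default, and the convention of using a gradient of 0 at an input of exactly 0 are all assumptions.

```python
import numpy as np

def relu(x):
    """ReLU: 0 for negative inputs, the input itself otherwise."""
    return np.maximum(0.0, x)

def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: negative inputs are scaled by a small slope (assumed 0.01)."""
    return np.where(x > 0, x, negative_slope * x)

def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise (0 chosen at x == 0)."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
relu_out = relu(x)          # negative entries become 0
leaky_out = leaky_relu(x)   # negative entries shrink to 1% of their value
grad_out = relu_grad(x)     # 1 wherever the unit is active

print(relu_out)
print(leaky_out)
print(grad_out)
```

The gradient array shows why active ReLU units pass gradients backward unchanged: wherever the input is positive the derivative is exactly 1, whereas saturating activations have derivatives below 1 that shrink the gradient at every layer.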