The “gradient” in Gradient Boosting Machine refers to the concept of gradient descent. Gradient descent works by iteratively updating model parameters, such as the coefficients of a regression equation, until the cost function that measures the specified error satisfies a convergence criterion. In the case of GBM, gradient descent uses the residuals, the differences between predicted and actual values on each iteration, to fit each subsequent decision tree, so that the algorithm moves in the direction that minimizes the loss criterion.
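The residual-fitting loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes squared-error loss (whose negative gradient is exactly the residual), one-dimensional inputs, and depth-one "stump" trees; the function names `fit_stump` and `gradient_boost` are hypothetical.

```python
# Minimal gradient-boosting sketch with squared-error loss.
# Each round fits a depth-1 "stump" to the current residuals
# (the negative gradient of 0.5 * (y - pred)^2) and adds a
# shrunken copy of it to the ensemble.

def fit_stump(x, residuals):
    """Find the split threshold and leaf means that best fit the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue  # skip splits that leave one side empty
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    base = sum(y) / len(y)  # initial prediction: the mean of y
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the squared-error loss.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Move predictions a small step in the direction that reduces loss.
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)

# Usage: fit a simple step function; predictions approach the targets.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
model = gradient_boost(x, y)
```

After 50 rounds, each iteration's stump has chipped away at the remaining residual, so `model(2)` lands close to 1.0 and `model(5)` close to 3.0; the learning rate shrinks each tree's contribution, which is what makes the descent gradual.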