Key hyperparameters for a GBM are the number of trees, the learning rate, and the maximum depth:
- Number of Trees (iterations): As in a Random Forest, this sets the maximum number of iterations used to train the ensemble and retains many of the same properties as the corresponding Random Forest hyperparameter. As with other iterative algorithms, the training and validation deviance or accuracy can be plotted at each iteration, and the validation error often begins to rise after a certain number of iterations. Alternatively, this value can be set high and combined with an early stopping criterion that terminates training when the out-of-sample deviance fails to improve by a specified threshold for a given number of consecutive iterations.
- Learning Rate (shrinkage): The learning rate scales the contribution of each new tree to the ensemble, controlling how aggressively the algorithm corrects the errors made in previous iterations. Large values accelerate the learning process but can quickly result in overfitting if the number of iterations is not constrained. There is usually an interaction between these two hyperparameters: when the number of trees is large, the learning rate is typically set to a lower value so the algorithm compensates for its mistakes gradually rather than learning so quickly that it models noise in the data.
- Maximum Depth: The depth controls the complexity of the individual decision trees in the ensemble. Larger values allow more complex trees to be created but also increase the risk of overfitting. Smaller values are usually preferred, since the idea of boosting is to combine many simple decision trees into a single powerful learner. All of these hyperparameters can be tuned simultaneously using a grid search with cross-validation.
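The early stopping behavior described above can be sketched with scikit-learn's `GradientBoostingClassifier`: set `n_estimators` high, and let `n_iter_no_change` and `tol` halt training once a held-out validation score stops improving. The dataset and parameter values here are illustrative assumptions, not recommendations.

```python
# Sketch: early stopping for gradient boosting (scikit-learn), on a
# synthetic dataset chosen purely for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

gbm = GradientBoostingClassifier(
    n_estimators=1000,        # deliberately large upper bound on iterations
    learning_rate=0.1,
    max_depth=3,
    validation_fraction=0.2,  # internal hold-out set used for early stopping
    n_iter_no_change=10,      # stop after 10 iterations without improvement
    tol=1e-4,                 # minimum improvement counted as progress
    random_state=42,
)
gbm.fit(X_train, y_train)

# n_estimators_ reports how many trees were actually fitted before stopping
print(gbm.n_estimators_, gbm.score(X_test, y_test))
```

In practice the model typically stops well short of the 1000-tree ceiling, which is the point: the ceiling is a safety bound, and the stopping criterion picks the effective number of iterations from the validation curve.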