There are three key parameters that can be tweaked: (a) number of trees, (b) number of features, and (c) subsample size.
- Number of trees (or iterations): This controls the maximum number of decision trees built in the ensemble. Increasing this number tends to reduce prediction variance, but it also increases training time, and there is usually a point at which the validation error levels off. An early stopping criterion can be implemented to terminate training once the holdout error improves by less than some epsilon threshold over a given number of iterations.
- Number of features: This controls the maximum number of features that can be considered when building each decision tree. If there are p features, choosing a random subset of m features, where m < p, is where random forest derives its power to reduce prediction variance: trees built on different feature subsets are less correlated with one another, so averaging their predictions cancels more of their individual error. Common choices for m include log(p) or sqrt(p).
- Subsample size: This limits the proportion of the total observations used to build each decision tree in the ensemble. As this value approaches the full size of the dataset, the trees become more similar to one another, since each is trained on nearly the same data, and the variance-reducing benefit of averaging diminishes. A value that is too small is more likely to underfit the individual trees, especially when the number of features is large.
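The interplay of these three hyperparameters can be sketched with a toy, pure-Python forest of depth-1 decision stumps. This is an illustrative sketch, not a reference implementation: the function names (`train_forest`, `best_stump`, `predict`) and parameter names (`n_trees`, `max_features`, `sample_frac`) are invented for this example and do not come from any particular library.

```python
import random
from collections import Counter

def best_stump(X, y, feats):
    """Exhaustively pick the (feature, threshold) split with the fewest
    misclassifications; fall back to a constant prediction if no split works."""
    best = None
    for f in feats:
        for t in sorted(set(row[f] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            lpred = Counter(left).most_common(1)[0][0]
            rpred = Counter(right).most_common(1)[0][0]
            err = sum(yi != lpred for yi in left) + sum(yi != rpred for yi in right)
            if best is None or err < best[0]:
                best = (err, f, t, lpred, rpred)
    if best is None:  # every candidate feature is constant in this subsample
        c = Counter(y).most_common(1)[0][0]
        return (None, 0.0, c, c)
    return best[1:]

def train_forest(X, y, n_trees=25, max_features=None, sample_frac=0.8, seed=0):
    """Grow n_trees stumps, each on a bootstrap subsample of
    sample_frac * len(X) rows and a random subset of max_features features."""
    rng = random.Random(seed)
    p = len(X[0])
    if max_features is None:
        max_features = max(1, int(p ** 0.5))  # the common sqrt(p) default
    n_sub = max(1, int(sample_frac * len(X)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(n_sub)]  # subsample rows
        feats = rng.sample(range(p), max_features)           # subsample features
        forest.append(best_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, x):
    """Classify x by majority vote across the ensemble."""
    votes = [lp if (f is None or x[f] <= t) else rp for f, t, lp, rp in forest]
    return Counter(votes).most_common(1)[0][0]
```

Raising `n_trees` stabilizes the majority vote at the cost of training time, while shrinking `max_features` and `sample_frac` makes the trees see more different views of the data, trading a little individual-tree accuracy for lower correlation between trees.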