Machine Learning Resources

What are the key hyperparameters for a Random Forest model?

There are three key hyperparameters that can be tuned: (a) number of trees, (b) number of features, and (c) subsample size.

  • Number of Trees (or iterations): This sets the number of decision trees built in the ensemble. Adding trees continues to reduce prediction variance, but it also increases training time, and there is usually a point at which the validation error levels off. An early stopping criterion can be implemented to terminate training once the holdout error improves by less than an epsilon threshold over a set number of consecutive iterations.
  • Number of Features: This controls the maximum number of features considered as split candidates when growing each decision tree (in the standard algorithm, a fresh random subset is drawn at every split). If there are p features, choosing a subset of size m, where m < p, decorrelates the trees, which is where the power of random forest lies in reducing prediction variance. Common choices for m include log(p) or sqrt(p).
  • Subsample Size: This limits the proportion of the total observations that is sampled to build each decision tree in the ensemble. As the value approaches the original size of the dataset, the trees become more similar to one another, since each is built on a very similar dataset, and averaging them reduces variance less. A value that is too small is more likely to underfit each individual tree, especially if the number of features is large.
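
The three hyperparameters above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a full random forest: the helper names (candidate_feature_counts, draw_subsample, stopping_point), the base-2 logarithm, the default epsilon, and the patience value are all assumptions made for the example, not part of any library.

```python
import math
import random


def candidate_feature_counts(p):
    """Common choices for m (features tried) given p total features, p >= 1.

    Assumption: log(p) is taken base 2 and values are rounded to integers.
    """
    return {
        "sqrt": max(1, round(math.sqrt(p))),
        "log2": max(1, round(math.log2(p))),
    }


def draw_subsample(n_rows, fraction, rng):
    """Row indices for one tree, sampled with replacement.

    `fraction` is the subsample size as a proportion of the dataset.
    """
    size = max(1, int(fraction * n_rows))
    return [rng.randrange(n_rows) for _ in range(size)]


def stopping_point(val_errors, epsilon=1e-3, patience=3):
    """Number of trees at which to stop adding more.

    val_errors[i] is the holdout error of an ensemble with i + 1 trees.
    Training stops once the error has improved by less than `epsilon`
    for `patience` consecutive additions.
    """
    stalled = 0
    for i in range(1, len(val_errors)):
        improvement = val_errors[i - 1] - val_errors[i]
        stalled = stalled + 1 if improvement < epsilon else 0
        if stalled >= patience:
            return i + 1
    return len(val_errors)


# Hypothetical holdout errors after 1, 2, ..., 7 trees.
errors = [0.30, 0.20, 0.15, 0.1495, 0.1493, 0.1492, 0.1492]
print(candidate_feature_counts(100))
print(len(draw_subsample(1000, 0.5, random.Random(0))))
print(stopping_point(errors))
```

In scikit-learn's RandomForestClassifier, these settings correspond roughly to the n_estimators, max_features, and max_samples parameters; that estimator has no built-in early stopping, so a rule like the one above would have to be applied by hand.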
