Related Questions:
– What is the difference between Decision Trees, Bagging and Random Forest?
– What are the advantages and disadvantages of using a Decision Tree?
– How does a decision tree create splits from continuous features?
– How does pruning a tree work?
A decision tree is a supervised machine learning algorithm used for both classification and regression tasks. It is a graphical representation of a set of decisions and their possible consequences. As the name suggests, a decision tree has a tree-like structure composed of nodes, branches, and leaves. The internal nodes represent tests on the input features, the branches represent the decisions based on those tests, and the leaves represent the outcomes or predictions. Let’s explain this using an example.
Decision tree explained using examples
Classification example:
[Figure: a small decision tree whose root node splits on bank account link]
Regression example:
[Figure: a small decision tree whose root node splits on years of experience]
Decision trees can be used in an analogous way to build a prediction mechanism for real-world supervised learning problems. Given a dataset with a target variable and several candidate predictors, a decision tree is constructed by first identifying the predictor, and the corresponding splitting criterion, that is most predictive of the target. A decision tree is drawn upside down, with its root at the top. This first decision criterion forms the root node of the tree, which in the classification example is bank account link, and in the regression example, years of experience. The tree then grows by creating additional splits until every observation is allocated to a leaf node.
How is a Decision Tree model built?
The decision tree model is built by recursively splitting the dataset into subsets, choosing at each step the feature and split point that provide the highest information gain (for classification tasks) or the greatest reduction in variance (for regression tasks). Information gain and variance reduction are both calculated from the impurity of the resulting subsets, where impurity measures how mixed the classes or target values in a subset are.
The most common impurity measures used for classification tasks are Gini impurity and entropy, while mean squared error (MSE) is used for regression tasks.
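As a minimal sketch (the function names here are illustrative, not from any library), these measures can be computed directly from the labels in a subset:

import numpy as np

def gini(y):
    # Gini impurity: 1 minus the sum of squared class proportions.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(y):
    # Entropy: -sum(p * log2(p)) over the class proportions.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right, impurity=gini):
    # Impurity of the parent minus the size-weighted impurity of the children.
    n = len(parent)
    child = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - child

def variance_reduction(parent, left, right):
    # Regression analogue: the MSE of predicting a subset's mean equals its variance.
    n = len(parent)
    child = (len(left) / n) * np.var(left) + (len(right) / n) * np.var(right)
    return np.var(parent) - child

# Example: splitting [0, 0, 1, 1] into two pure halves removes all impurity.
y = np.array([0, 0, 1, 1])
print(gini(y))                            # 0.5
print(information_gain(y, y[:2], y[2:]))  # 0.5, the maximum possible here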
Once the dataset is split, the process is repeated for each subset until a stopping criterion is reached. The stopping criterion could be a maximum depth of the tree, a minimum number of samples required to split a node, or a minimum impurity decrease required to split a node.
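In scikit-learn, for example, these stopping criteria map directly onto hyperparameters. A minimal sketch, using the library's built-in iris dataset:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each argument below is one of the stopping criteria described above.
clf = DecisionTreeClassifier(
    criterion="gini",            # impurity measure: "gini" or "entropy"
    max_depth=3,                 # maximum depth of the tree
    min_samples_split=10,        # minimum samples required to split a node
    min_impurity_decrease=0.01,  # minimum impurity decrease required to split
)
clf.fit(X, y)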
Pruning to improve tree performance
Pruning is a technique that can enhance the performance of a tree. By eliminating branches that rely on less important features, the complexity of the tree is reduced. This reduction in complexity helps to combat overfitting and ultimately improves the tree’s ability to make accurate predictions.
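One widely used post-pruning method is cost-complexity pruning, which scikit-learn exposes through the ccp_alpha parameter. A sketch of how the pruning strength affects tree size and test accuracy (the iris data is reused purely for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compute the sequence of effective alphas for cost-complexity pruning.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# A larger ccp_alpha prunes more branches, trading training fit for simplicity.
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  "
          f"test accuracy={pruned.score(X_test, y_test):.3f}")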
How to predict using a Decision Tree model?
To predict the target variable for a new observation, the model traverses the tree starting from the root node, following at each internal node the branch whose condition the observation’s feature values satisfy, until a leaf node is reached. The prediction is then the majority class (classification) or the average value (regression) of the training samples in that leaf node.
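A minimal sketch of that traversal over a hand-built tree (the Node class, its field names, and the example tree are all illustrative assumptions, not part of any library):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None      # index of the feature tested at this node
    threshold: Optional[float] = None  # split threshold for that feature
    left: Optional["Node"] = None      # subtree taken when feature value <= threshold
    right: Optional["Node"] = None     # subtree taken when feature value > threshold
    prediction: Optional[str] = None   # set only on leaf nodes

def predict_one(node, x):
    # Walk from the root to a leaf, following the branch each condition selects.
    while node.prediction is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction

# Illustrative tree: the root splits on feature 0, its right child on feature 1.
tree = Node(feature=0, threshold=5.0,
            left=Node(prediction="class A"),
            right=Node(feature=1, threshold=2.5,
                       left=Node(prediction="class B"),
                       right=Node(prediction="class C")))

print(predict_one(tree, [6.1, 2.0]))  # feature 0 > 5.0, feature 1 <= 2.5 -> "class B"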
Advantages and Disadvantages of using a Decision Tree model
The decision tree model has several advantages, including its interpretability, its ability to handle both categorical and numerical data, and its ability to capture nonlinear relationships. However, it is prone to overfitting if the tree is too deep or the stopping criterion is not chosen carefully. To overcome this, ensemble methods built on decision trees, such as random forest and gradient boosting, are often used instead of a single tree.
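For instance, swapping a single tree for a random forest is a one-line change in scikit-learn. A sketch comparing the two with cross-validation (again using the iris data only for illustration):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# An unconstrained single tree tends to overfit; averaging many randomized
# trees (a random forest) usually generalizes better.
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("forest:", cross_val_score(forest, X, y, cv=5).mean())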