## AIML.com

# What is a Decision Tree? Explain the concept and working of a Decision tree model

A decision tree is a supervised machine learning algorithm used for both classification and regression tasks. It represents a sequence of decisions and their possible outcomes as a tree-like structure composed of nodes, branches, and leaves. Each internal node tests an input feature, each branch corresponds to one outcome of that test, and each leaf holds the final outcome or prediction. Let’s explain this using an example.
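Because every path from the root to a leaf is just a chain of feature tests, a small decision tree can be written directly as nested if/else rules. The sketch below is a hypothetical, hand-built tree: the function name `will_play_outside`, the features `outlook` and `humidity`, and the threshold of 70 are illustrative assumptions, not learned from any data.

```python
# A hypothetical, hand-built decision tree written as nested if/else rules.
# Feature names, categories, and thresholds are illustrative assumptions only.
def will_play_outside(outlook: str, humidity: float) -> str:
    """Classify a day as 'yes' or 'no' by walking from the root to a leaf."""
    if outlook == "sunny":          # root node: test on the 'outlook' feature
        if humidity > 70.0:         # internal node: test on 'humidity'
            return "no"             # leaf: prediction
        return "yes"                # leaf: prediction
    elif outlook == "overcast":     # another branch of the root test
        return "yes"                # leaf: prediction
    else:                           # remaining branch, e.g. 'rainy'
        return "no"                 # leaf: prediction


print(will_play_outside("sunny", 85.0))     # no
print(will_play_outside("overcast", 90.0))  # yes
```

A learned tree has the same shape; the difference is that the algorithm chooses which feature to test at each node and where to put the thresholds.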

### Decision tree explained using examples

Classification example:

Regression example:

Decision trees can be used in an analogous way to build a prediction mechanism for real-world supervised learning problems. Given a dataset with a target variable and several candidate predictors, a decision tree is constructed by first identifying the predictor and corresponding splitting criterion that is most predictive of the target. By convention, a decision tree is drawn upside down, with its root at the top. This first splitting criterion forms the root node of the tree: in the classification example it is the bank account link, and in the regression example it is years of experience. The algorithm then continues to create additional splits until every observation has been allocated to a leaf node.

### How is a Decision Tree model built?

The decision tree model is built by recursively splitting the dataset into subsets based on the feature (and split point) that provides the largest information gain (for classification tasks) or the largest reduction in variance (for regression tasks). Both quantities are computed from the impurity of the resulting subsets, where impurity measures how mixed the classes or values in a subset are: the gain from a split is the parent node's impurity minus the size-weighted average impurity of its child nodes. The most common impurity measures for classification tasks are Gini impurity and entropy, while mean squared error (MSE) is typically used for regression tasks.
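The impurity measures and the "gain" of a split can be written in a few lines. This is a minimal sketch using only the standard library; the function names `gini`, `entropy`, `mse`, and `impurity_decrease` are our own, and the toy labels and values are made up for illustration.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2; 0 for a pure node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k); 0 for a pure node."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def mse(values):
    """Mean squared error around the node mean, i.e. the node's variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children.

    With entropy this is the information gain; with mse it is the
    variance reduction."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted


labels = ["a", "a", "b", "b"]
print(gini(labels), entropy(labels))                               # 0.5 1.0
print(impurity_decrease(labels, ["a", "a"], ["b", "b"], entropy))  # 1.0
values = [1.0, 1.0, 5.0, 5.0]
print(impurity_decrease(values, [1.0, 1.0], [5.0, 5.0], mse))      # 4.0
```

A split that separates the two classes perfectly removes all of the parent's impurity, which is why its information gain equals the parent's entropy (1 bit here).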

Once the dataset is split, the process is repeated for each subset until a stopping criterion is reached. The stopping criterion could be a maximum depth of the tree, a minimum number of samples required to split a node, or a minimum impurity decrease required to split a node. The depth of a decision tree refers to the length of the longest path from the root node to a leaf node.
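The recursive procedure and its stopping criteria can be sketched as follows for a classification tree with numeric features and binary "feature <= threshold" splits. This is a simplified illustration, not any library's implementation: the parameter names `max_depth`, `min_samples_split`, and `min_impurity_decrease` mirror the criteria above, but their exact semantics here (for example, an unweighted impurity decrease) are our own assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature, threshold, impurity_decrease) of the best split, or None."""
    n, parent, best = len(y), gini(y), None
    for f in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so both children are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            dec = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
            if best is None or dec > best[2]:
                best = (f, t, dec)
    return best


def build_tree(X, y, depth=0, max_depth=3, min_samples_split=2,
               min_impurity_decrease=0.0):
    """Recursively split (X, y) until a stopping criterion is met."""
    leaf = {"leaf": Counter(y).most_common(1)[0][0]}  # majority class
    # Stopping criteria: depth limit, too few samples, or an already pure node.
    if depth >= max_depth or len(y) < min_samples_split or gini(y) == 0.0:
        return leaf
    split = best_split(X, y)
    if split is None or split[2] < min_impurity_decrease:
        return leaf
    f, t, _ = split
    params = dict(max_depth=max_depth, min_samples_split=min_samples_split,
                  min_impurity_decrease=min_impurity_decrease)
    parts = {}
    for side, keep in (("left", lambda v: v <= t), ("right", lambda v: v > t)):
        idx = [i for i in range(len(y)) if keep(X[i][f])]
        parts[side] = build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 depth + 1, **params)
    return {"feature": f, "threshold": t, **parts}


def tree_depth(node):
    """Length of the longest path from this node down to a leaf."""
    if "leaf" in node:
        return 0
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))


X = [[1], [2], [3], [4], [5], [6]]
y = ["a", "a", "a", "b", "b", "b"]
tree = build_tree(X, y)
print(tree)
print(tree_depth(tree))  # 1
```

On this toy data the best split is "feature 0 <= 3", which produces two pure children, so the tree stops growing after one split even though `max_depth` would allow more.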

### Pruning to improve tree performance

Pruning improves a tree's performance by reducing its complexity: branches that add little predictive value, often those that split on less important features, are removed. A simpler tree is less likely to overfit the training data and therefore tends to make more accurate predictions on unseen data.

There are primarily two ways to do pruning:

(a) Pre-pruning (Early Stopping): Stops growing the tree before it perfectly fits the training set, based on stopping criteria such as those discussed in the section above (maximum depth, minimum number of samples per split, or minimum impurity decrease).

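(b) Post-pruning (e.g., Cost-complexity pruning): Grows the full tree first and then removes branches or nodes whose contribution to accuracy does not justify their added complexity. Cost-complexity pruning, for instance, selects the subtree that minimizes the training error plus a penalty proportional to the number of leaves, with the penalty strength typically chosen by cross-validation.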
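For a fixed penalty strength α, cost-complexity pruning looks for the subtree T that minimizes R(T) + α·|leaves(T)|. Because this cost is a sum over leaves, it can be minimized bottom-up: a branch is collapsed into a leaf whenever that is no more costly than keeping its best pruned children. The sketch below assumes a hypothetical tree in which each node already stores `error` (R(t), the training misclassifications if the node were a leaf) and `prediction`; all numbers are made up. In practice, libraries such as scikit-learn expose this idea through the `ccp_alpha` parameter of their tree estimators.

```python
def prune(node, alpha):
    """Return (subtree, cost, n_leaves) minimizing R(T) + alpha * |leaves(T)|.

    node["error"] is R(t): the training errors made if t were a leaf.
    """
    if "left" not in node:  # a leaf: its error plus one leaf penalty
        return node, node["error"] + alpha, 1
    left, left_cost, left_leaves = prune(node["left"], alpha)
    right, right_cost, right_leaves = prune(node["right"], alpha)
    subtree_cost = left_cost + right_cost
    leaf_cost = node["error"] + alpha
    if leaf_cost <= subtree_cost:  # collapsing is no worse: prune the branch
        return {"error": node["error"], "prediction": node["prediction"]}, leaf_cost, 1
    return {**node, "left": left, "right": right}, subtree_cost, left_leaves + right_leaves


def leaf(error, prediction):
    """Helper to build a hypothetical leaf node."""
    return {"error": error, "prediction": prediction}


# Hypothetical fully grown tree; errors are misclassified training samples.
full_tree = {
    "error": 40, "prediction": "no",
    "left": {"error": 10, "prediction": "no",
             "left": leaf(2, "no"), "right": leaf(5, "yes")},
    "right": {"error": 12, "prediction": "yes",
              "left": leaf(4, "no"), "right": leaf(3, "yes")},
}

for alpha in (0, 4, 6, 20):
    _, cost, n_leaves = prune(full_tree, alpha)
    print(alpha, n_leaves, cost)
# prints:
# 0 4 14
# 4 3 29
# 6 2 34
# 20 1 60
```

As α grows, the pruned tree shrinks from four leaves down to a single root leaf, which illustrates the trade-off between training error and tree size that α controls.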

### How to predict using a Decision Tree model?

To predict the target variable for a new observation, the model traverses the tree starting from the root node: at each internal node it follows the branch whose condition is satisfied by the observation's feature values, until a leaf node is reached. The prediction is then the majority class (classification) or the average target value (regression) of the training samples in that leaf node.
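The traversal can be sketched as a simple loop over a tree stored as nested dictionaries. In this minimal sketch, the dictionary layout (`feature`, `threshold`, `left`, `right`, and a `samples` list in each leaf holding the training targets that reached it) and the toy trees are hypothetical assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def predict(node, x, task="classification"):
    """Walk from the root to a leaf, then summarize that leaf's training targets."""
    while "samples" not in node:  # internal node: follow the matching branch
        if x[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    if task == "classification":
        return Counter(node["samples"]).most_common(1)[0][0]  # majority class
    return mean(node["samples"])  # average target value


# Hypothetical trees with made-up leaf contents.
clf_tree = {"feature": 0, "threshold": 2.5,
            "left": {"samples": ["no", "no", "yes"]},
            "right": {"samples": ["yes", "yes"]}}
reg_tree = {"feature": 0, "threshold": 5.0,
            "left": {"samples": [40.0, 44.0, 48.0]},
            "right": {"feature": 1, "threshold": 0.5,
                      "left": {"samples": [60.0, 64.0]},
                      "right": {"samples": [80.0, 90.0]}}}

print(predict(clf_tree, [1.0]))                     # no
print(predict(reg_tree, [3.0, 1.0], "regression"))  # 44.0
print(predict(reg_tree, [8.0, 1.0], "regression"))  # 85.0
```

Storing only the summary (the majority class or the mean) in each leaf would make prediction cheaper; keeping the samples here just makes the two prediction rules explicit.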

The decision tree model has several advantages, including interpretability, the ability to handle both categorical and numerical data, and the ability to capture nonlinear relationships. However, it is prone to overfitting if the tree is grown too deep or the stopping criteria are not chosen carefully. Besides pruning, a common remedy is to use ensemble methods such as random forests and gradient boosting, which combine many decision trees into a single, more robust model.