What are the options for reporting feature importance from a decision-tree based model?

For decision-tree based methods, feature importance is typically measured in one of two ways. The most common approach is based on how much each attribute contributes to the construction of each decision tree during training: a feature is credited with the reduction in impurity it produces at each split where it is used, and the most important features tend to appear at split points near the top of a tree. The specific purity/impurity measure depends on the loss function (for example, variance reduction in regression, Gini impurity or entropy in classification), but the same intuition holds in both settings. An overall importance for each feature is obtained by averaging its importance across all trees in the ensemble.
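As a minimal sketch of the impurity-based approach, the snippet below fits a gradient-boosted ensemble on a synthetic dataset (the dataset and model settings are illustrative assumptions, not prescribed by the discussion above) and reads the averaged per-feature importances:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative data: 10 features, of which 5 are actually informative
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)

model = GradientBoostingClassifier(random_state=0)
model.fit(X, y)

# Impurity-based importances: one nonnegative value per feature,
# normalized so they sum to 1 across the ensemble
for i, imp in enumerate(model.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```

Because the values are normalized, they are best read as relative rankings rather than absolute effect sizes.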

Feature importances can be extracted from a fitted GBM in most software packages; in Python's scikit-learn, for example, they are exposed through the feature_importances_ attribute. An alternative is a permutation-based approach: after the model is fit, the values of each attribute are randomly shuffled in turn, and the most influential features are those whose shuffling causes the largest drop in model performance. Because the permutation method relies only on model predictions, it is model-agnostic, which makes it a valuable tool for interpreting black-box machine learning models.
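The permutation approach can be sketched with scikit-learn's permutation_importance utility; the dataset, model, and number of repeats below are illustrative choices, and importances are measured on a held-out set so the drop in performance reflects generalization rather than training fit:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative data: 10 features, of which 5 are actually informative
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature column n_repeats times and record the average
# drop in test-set score; larger drops indicate more influential features
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}")
```

Note that the same call works for any fitted estimator with a score method, which is what makes the technique model-agnostic.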