Machine Learning Resources

What are the most common transformations when the target variable is not normally distributed?

The most common transformation for a skewed response variable is to take its logarithm. For a right-skewed variable such as income, the logarithm compresses the long right tail, so the transformed values are spread more symmetrically around the median, as expected for a Gaussian distribution. Another approach is a power transformation such as the Box-Cox transformation, which requires a strictly positive response. This method searches a range of exponents to find the value that makes the transformed variable most nearly Gaussian. Upon finding the power λ that makes the distribution most normal, the transformation applied is of the form:

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln y, & \lambda = 0
\end{cases}
$$
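As a rough illustration of the log and Box-Cox transformations, here is a minimal sketch using NumPy and SciPy. The data are simulated (a hypothetical income-like variable drawn from a lognormal distribution), and `boxcox_manual` is an illustrative helper, not a library function. Note that `scipy.stats.boxcox` picks the exponent by maximizing a log-likelihood rather than by scanning a fixed grid, so it is one way, not the only way, to carry out the search.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical right-skewed, strictly positive response (income-like data).
y = rng.lognormal(mean=10.0, sigma=0.8, size=2000)

# Log transformation.
y_log = np.log(y)

# Box-Cox: with no exponent supplied, SciPy returns the transformed data
# together with the lambda that maximizes the log-likelihood.
y_bc, lam = stats.boxcox(y)


def boxcox_manual(values, lam):
    """Apply the Box-Cox formula directly (illustrative helper)."""
    if lam == 0:
        return np.log(values)
    return (values**lam - 1.0) / lam


# Sample skewness before and after transforming; values near 0 are more symmetric.
print("skew raw:    ", stats.skew(y))
print("skew log:    ", stats.skew(y_log))
print("skew Box-Cox:", stats.skew(y_bc), "lambda:", lam)
```

Because the simulated data are lognormal, the fitted λ should land close to 0, which is the case where Box-Cox reduces to the plain logarithm.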

When applying any transformation, it is important to remember that inference must be interpreted in the transformed units, and in the case of complex power transformations, interpretation becomes less clear. However, a special case is the log-log model, in which both the response and the predictor variable are log-transformed. The slope coefficient then has a convenient interpretation: it is approximately the percentage change in Y associated with a 1 percent change in X, which is referred to as elasticity.
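The elasticity interpretation of the log-log model can be sketched on simulated data. Everything here is an assumption made for illustration: the data-generating relationship, the value `true_elasticity = 0.7`, and the use of `np.polyfit` as a simple least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed data-generating process: Y = 3 * X^0.7 * multiplicative noise.
true_elasticity = 0.7
x = rng.uniform(1.0, 100.0, size=500)
noise = rng.normal(0.0, 0.05, size=500)
y = 3.0 * x**true_elasticity * np.exp(noise)

# Fit log(Y) = intercept + slope * log(X) by ordinary least squares.
# np.polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)

# The slope estimates the elasticity: the approximate percentage change
# in Y associated with a 1 percent change in X.
print(f"estimated elasticity: {slope:.3f}")
print(f"estimated scale factor: {np.exp(intercept):.3f}")
```

On the log scale the power relationship becomes linear, so the fitted slope should recover a value close to the assumed 0.7.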
