Machine Learning Resources

Suppose there are a large number of predictors, p. What is the best approach to find out whether any of the p predictors is helpful in predicting the response y?

Suppose Null hypothesis (H0) is:

β1 = β2 = … = βp = 0

And Alternate hypothesis (Ha) is:

At least one of the βi is not equal to 0

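Best approach: To find out whether any of the p predictors is helpful in predicting y, fit a multiple linear regression and use the overall F-statistic to test H0. (This approach requires p < n. When p > n, the least squares model cannot be fit, so the F-statistic is unavailable and high-dimensional methods are needed instead.)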
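For reference, the overall F-statistic for a least squares fit with an intercept is usually written as below. The symbols TSS (total sum of squares), RSS (residual sum of squares), and n (number of observations) are not defined in the answer above and are introduced here as assumptions.

```latex
F = \frac{(\mathrm{TSS} - \mathrm{RSS}) / p}{\mathrm{RSS} / (n - p - 1)},
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2,
\quad
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

Under H0 and the usual normal-error assumptions, F follows an F-distribution with p and n - p - 1 degrees of freedom, so a value well above 1 is evidence against H0. The denominator also shows why p < n is needed: n - p - 1 must be positive.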

Side note: individual t-statistics can be misleading in this scenario

Suppose p is large, say p = 200, and none of the predictors (x1, …, xp) is associated with the response y, i.e. the null hypothesis above is true. Even so, about 5% of the p-values for the individual predictors will fall below 0.05 by chance, which is roughly 10 of the 200. These predictors have no real predictive power; their low p-values are chance results. Therefore, if we rely on the individual t-statistics and p-values to decide whether any predictor is useful, we are very likely to draw the wrong conclusion.
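The arithmetic behind this can be sketched in a few lines of Python. The function names are hypothetical, and the second calculation assumes the p tests are independent, which is a simplification that real regression coefficients only approximately satisfy.

```python
def expected_false_positives(p: int, alpha: float = 0.05) -> float:
    """Expected number of p-values below alpha when H0 is true for all p predictors."""
    return p * alpha


def prob_at_least_one_false_positive(p: int, alpha: float = 0.05) -> float:
    """Chance that at least one of p tests falls below alpha under H0.

    Assumption: the p tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** p


p = 200
print("expected small p-values:", expected_false_positives(p))
print("P(at least one small p-value):", prob_at_least_one_false_positive(p))
```

With p = 200 the expected count is about 10, and under the independence assumption it is almost certain that at least one predictor looks significant purely by chance.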

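The F-statistic adjusts for the number of predictors: if H0 is true, there is only a 5% chance that it produces a p-value below 0.05, no matter how many predictors there are. It therefore does not suffer from the above problem.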
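The contrast between the two tests can be checked with a small simulation. The sketch below is an illustration only, assuming NumPy and SciPy are available. The function name `simulate_null`, the sample size n = 1000, and the random seed are hypothetical choices, not part of the original answer. It generates a response that is unrelated to 200 predictors, then reports the overall F-test p-value alongside the number of individual t-test p-values below 0.05.

```python
import numpy as np
from scipy import stats


def simulate_null(n=1000, p=200, seed=0, alpha=0.05):
    """Fit OLS on pure-noise data and compare the F-test with individual t-tests."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)  # y is independent of X, so H0 is true

    # Least squares fit with an intercept column.
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    df_resid = n - p - 1

    # Overall F-statistic and its p-value.
    f_stat = ((tss - rss) / p) / (rss / df_resid)
    f_pvalue = float(stats.f.sf(f_stat, p, df_resid))

    # Individual two-sided t-tests for each coefficient.
    sigma2 = rss / df_resid
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    t_pvalues = 2 * stats.t.sf(np.abs(beta / se), df_resid)
    n_small = int((t_pvalues[1:] < alpha).sum())  # skip the intercept

    return f_stat, f_pvalue, n_small


f_stat, f_pvalue, n_small = simulate_null()
print("F-statistic:", f_stat)
print("F-test p-value:", f_pvalue)
print("individual p-values below 0.05:", n_small)
```

Across repeated runs with different seeds, the count of small t-test p-values should hover around 10, while the F-test p-value should fall below 0.05 in only about 5% of runs. Exact numbers depend on the seed.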
