Suppose the null hypothesis (H0) is:
$\beta_1 = \beta_2 = \cdots = \beta_p = 0$
And the alternative hypothesis (Ha) is:
At least one $\beta_i$ is not equal to 0.
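Best approach: to determine whether any of the p predictors is useful in predicting y, use the F-statistic. (This works when p < n. When p > n there are more coefficients than observations, so least squares, and therefore the F-statistic, cannot be used; high-dimensional methods are needed instead.)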
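A minimal sketch of the overall F-statistic, F = ((TSS - RSS)/p) / (RSS/(n - p - 1)), in Python with NumPy. The function name `f_statistic` and the simulated data are hypothetical illustrations, and the model is assumed to include an intercept. The code returns only the statistic; if SciPy is available, `scipy.stats.f.sf(F, p, n - p - 1)` would give the corresponding p-value.

```python
import numpy as np

def f_statistic(X: np.ndarray, y: np.ndarray) -> float:
    """Overall F-statistic for H0: beta_1 = ... = beta_p = 0 (hypothetical helper)."""
    n, p = X.shape
    if n <= p + 1:
        # With p >= n - 1 there are no residual degrees of freedom left
        raise ValueError("need n > p + 1 for the least-squares F-test")
    # Add an intercept column so beta_0 is left unrestricted by H0
    X1 = np.column_stack([np.ones(n), X])
    beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = float(np.sum((y - X1 @ beta_hat) ** 2))  # residual sum of squares
    tss = float(np.sum((y - y.mean()) ** 2))       # total sum of squares
    return ((tss - rss) / p) / (rss / (n - p - 1))

# Usage on simulated data (assumed, for illustration only)
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
y_null = rng.standard_normal(500)                     # unrelated to X: H0 true
y_signal = 2.0 * X[:, 0] + rng.standard_normal(500)   # depends on X[:, 0]: H0 false
print(f_statistic(X, y_null))    # should be near 1
print(f_statistic(X, y_signal))  # should be far above 1
```

Under H0 the statistic is expected to be close to 1; when at least one predictor matters, (TSS - RSS)/p grows and F becomes much larger than 1.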
Side note: why not just look at the individual t-statistics? In this scenario they can be misleading.
If p is large, say p = 200, and none of the predictors $X_1, \ldots, X_p$ is actually associated with the response y (i.e. the null hypothesis above is true), we still expect about 5% of the individual p-values, roughly 10 of the 200, to fall below 0.05 purely by chance. These variables have no real predictive power; their low p-values are chance results. Therefore, if we use the individual t-statistics and their p-values to decide which variables are predictive, we are very likely to wrongly conclude that at least one of them is.
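The arithmetic behind the 5% figure can be checked directly. Under H0 each individual p-value is uniformly distributed on [0, 1], so the expected number below 0.05 is p × 0.05 = 10 however the tests are correlated. The probability that at least one falls below 0.05 is computed here assuming the 200 tests are independent, which is a simplifying assumption: t-tests within one regression are generally correlated.

```python
# Under H0 each individual p-value is Uniform(0, 1),
# so each one falls below alpha with probability alpha.
p = 200
alpha = 0.05

# Expected count of "significant" predictors by chance (linearity of expectation,
# no independence needed): 200 * 0.05 = 10
expected_false_positives = p * alpha

# ASSUMPTION: the p tests are treated as independent for this probability
prob_at_least_one = 1 - (1 - alpha) ** p

print(f"expected 'significant' predictors by chance: {expected_false_positives:.0f}")
print(f"P(at least one p-value < {alpha}): {prob_at_least_one:.5f}")
```

Even under this rough independence assumption, seeing at least one "significant" predictor is almost certain when p = 200.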
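Because the F-statistic adjusts for the number of predictors p, it does not suffer from this problem: if H0 is true, there is only a 5% chance that the F-statistic yields a p-value below 0.05, regardless of how large p is.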
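A small simulation sketch of this contrast, assuming NumPy is available. Here y is generated independently of 200 standard-normal predictors (n = 1000), so H0 holds by construction. The function `simulate_null` and its parameters are hypothetical. Individual two-sided p-values use the normal approximation to the t distribution (reasonable here because the residual degrees of freedom are 799) rather than an exact t CDF.

```python
import math
import numpy as np

def simulate_null(n: int = 1000, p: int = 200, alpha: float = 0.05, seed: int = 0):
    """Hypothetical helper: fit OLS when H0 is true; return (# t-tests with p < alpha, F)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)                 # independent of every predictor: H0 holds
    X1 = np.column_stack([np.ones(n), X])      # intercept + p predictors
    beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta_hat
    df = n - p - 1                             # residual degrees of freedom
    rss = float(resid @ resid)
    sigma2_hat = rss / df                      # unbiased estimate of the error variance
    # Standard errors: sqrt(sigma^2 * diag((X1^T X1)^{-1}))
    se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X1.T @ X1)))
    t_stats = beta_hat[1:] / se[1:]            # skip the intercept
    # ASSUMPTION: normal approximation to t(df) for the two-sided p-value
    p_values = [math.erfc(abs(t) / math.sqrt(2)) for t in t_stats]
    n_significant = sum(pv < alpha for pv in p_values)
    tss = float(np.sum((y - y.mean()) ** 2))
    f_stat = ((tss - rss) / p) / (rss / df)
    return n_significant, f_stat

n_sig, f = simulate_null()
print(f"individual t-tests with p < 0.05: {n_sig} of 200")
print(f"overall F-statistic: {f:.3f}")
```

Across repeated runs, roughly 10 of the 200 individual t-tests typically come out "significant", while the F-statistic stays close to 1, its expected value under H0.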