### What problems would arise from using a regular linear regression to model a binary outcome?

Predicted values would not be constrained to the range of [0,1], resulting in predictions that are not valid probabilities.
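As a quick illustration (a NumPy-only sketch on synthetic data), fitting ordinary least squares to a 0/1 target readily produces fitted values outside [0, 1]:

```python
import numpy as np

# Synthetic binary outcome: y flips from 0 to 1 as x crosses 5
x = np.linspace(0, 10, 50)
y = (x > 5).astype(float)

# Ordinary least squares fit via the normal equation
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)

preds = X @ beta
print(preds.min(), preds.max())  # below 0 and above 1: not valid probabilities
```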


Non-Negative Least Squares (NNLS) adds a constraint to the least squares equation that all coefficient estimates must be greater than or equal to zero.
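A minimal sketch using SciPy's `nnls` (assuming SciPy is available; the design matrix and targets here are made up for illustration):

```python
import numpy as np
from scipy.optimize import nnls

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
b = np.array([2.0, 1.0, -1.0])

# Minimizes ||A x - b||_2 subject to x >= 0
coef, residual_norm = nnls(A, b)
print(coef)  # every coefficient estimate is non-negative
```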

If any of the assumptions of linear regression are violated, the model may not be reliable for either inference or prediction.

High-influence points are observations that, as the name suggests, most strongly influence the fitted regression equation; removing one of them would noticeably change the coefficient estimates.

A high leverage point specifically refers to an observation in which the value of a predictor is considered to be extreme in the feature space.
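Leverage can be read off the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ. A NumPy sketch on synthetic data with one deliberately extreme predictor value:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 20), [8.0]])  # last point is extreme in x
X = np.column_stack([np.ones_like(x), x])

# Hat matrix: H = X (X'X)^{-1} X'; its diagonal gives each point's leverage
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)
print(leverage[-1], leverage[:-1].max())  # the extreme point has the highest leverage
```

Note that the leverages always sum to the number of model parameters (here 2), so a single point absorbing a large share of that total is a red flag.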

An outlier is a general term for an observation that lies far from most other data points; in a regression setting, it usually refers to a point whose response value is far from what the model predicts.

ANOVA is a special case of regression when all of the independent variables are categorical.
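This can be seen by regressing the response on dummy-coded group indicators: the coefficients come out as the group means, which is exactly what one-way ANOVA compares. A NumPy sketch with made-up data:

```python
import numpy as np

y = np.array([3.0, 4.0, 5.0,   # group A
              7.0, 9.0,        # group B
              1.0, 2.0, 3.0])  # group C
groups = np.array([0, 0, 0, 1, 1, 2, 2, 2])

# Full dummy coding with no intercept: one indicator column per group
X = np.zeros((len(y), 3))
X[np.arange(len(y)), groups] = 1.0

beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta)  # the coefficients are the group means: 4.0, 8.0, 2.0
```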

In matrix form, the vector of coefficient estimates is derived using the formula β̂ = (XᵀX)⁻¹XᵀY, where X is the design matrix whose rows correspond to the observations and columns to the features, and Y is the vector of target values.
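A NumPy sketch of that formula on simulated data (in practice, `np.linalg.solve` or `lstsq` is preferred over explicitly inverting XᵀX for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
true_beta = np.array([1.0, 2.0, -3.0])
Y = X @ true_beta + rng.normal(scale=0.1, size=100)

# Normal equation: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
print(beta_hat)  # close to the true coefficients
```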

While the original variable will likely be correlated with a higher-order term constructed from the same variable (e.g., x and x²), this multicollinearity is not a cause for concern.
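A quick simulated check (made-up coefficients 1, 2, and 0.5): x and x² are strongly correlated over a positive range, yet the polynomial fit still recovers the coefficients:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 200)  # a positive range makes x and x^2 highly correlated
r = np.corrcoef(x, x**2)[0, 1]
print(r)  # well above 0.9

# The fit is still identifiable: recover the coefficients of y = 1 + 2x + 0.5x^2
y = 1 + 2 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=200)
X = np.column_stack([np.ones_like(x), x, x**2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta)
```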
