In some classification contexts, it may be of greater interest to obtain predicted probabilities of class membership rather than simply the labels themselves.
Logistic regression uses a logistic loss function, where the cost for a single observation is: cost(h(x), y) = -y log(h(x)) - (1 - y) log(1 - h(x)), with h(x) the predicted probability of success and y the true label (0 or 1).
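The per-observation cost above can be sketched directly; only one of the two terms is nonzero for any given label, so a confident correct prediction is cheap and a confident wrong one is expensive:

```python
import math

def logistic_loss(p, y):
    """Cost for one observation: -log(p) when y == 1, -log(1 - p) when y == 0."""
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

print(logistic_loss(0.9, 1))  # small cost: confident and correct (about 0.105)
print(logistic_loss(0.9, 0))  # large cost: confident and wrong (about 2.30)
```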
Pros: Through a simple transformation, much of the interpretability and intuition of linear regression is preserved.
The equivalent of the overall F-test in linear regression is the deviance test in logistic regression.
When predictions are passed through the sigmoid, the least squares cost function becomes non-convex in a binary classification setting, meaning gradient descent could get stuck in a local rather than global minimum and thus fail to optimize the loss.
Each β is interpreted as the change in the log odds of a success for a 1-unit increase in the corresponding predictor, holding all other variables constant.
Logistic regression relates the log odds to a weighted combination of the predictors, with the coefficients as weights: log(p / (1 - p)) = β0 + β1x1 + ... + βkxk.
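The coefficient interpretation can be checked numerically: a 1-unit increase in a predictor multiplies the odds by exp(β). This minimal sketch uses made-up coefficient values (beta0 = -1.0, beta1 = 0.5), not estimates from any real fit:

```python
import math

def predict_proba(x, beta0=-1.0, beta1=0.5):
    """Invert the log odds with the sigmoid; the beta values are illustrative."""
    log_odds = beta0 + beta1 * x
    return 1 / (1 + math.exp(-log_odds))

def odds(p):
    return p / (1 - p)

# A 1-unit increase in x multiplies the odds by exp(beta1)
ratio = odds(predict_proba(3.0)) / odds(predict_proba(2.0))
print(round(ratio, 4), round(math.exp(0.5), 4))  # the two values agree
```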
The odds, or the ratio of the probability of success to that of failure, can only take on positive values.
With ordinary linear regression, predicted values would not be constrained to the range [0, 1], resulting in predictions that are not valid probabilities.
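To illustrate the point above, here is a sketch with a hypothetical fitted line (intercept -0.5, slope 0.3, both made up): the raw linear prediction can exceed 1, while the sigmoid of the same quantity always lands in (0, 1):

```python
import math

# Hypothetical linear model: intercept -0.5, slope 0.3, evaluated at x = 10
linear_prediction = -0.5 + 0.3 * 10   # 2.5, not a valid probability

# Squashing through the sigmoid constrains the output to (0, 1)
probability = 1 / (1 + math.exp(-linear_prediction))
print(linear_prediction, round(probability, 3))
```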
Logistic regression requires the following assumption: independence of the observations.