Why are coefficients estimated through Maximum Likelihood Estimation (MLE) instead of Least Squares?
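In a binary classification setting the model's prediction passes through a sigmoid, and the least squares cost applied to that output is non-convex in the coefficients. A gradient-based optimizer can therefore get stuck in a local minimum or stall on a flat region instead of reaching the global minimum. The negative log-likelihood that MLE minimizes (the log loss) is convex in the coefficients, so any minimum the optimizer finds is the global one.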
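The convexity claim can be checked numerically. The following is a minimal sketch, assuming a one-coefficient model with a single training point (x = 1, y = 1); the function names, grid, and step size are illustrative choices, not part of the original answer. It estimates the curvature of each loss with a central second difference: a convex function never has negative curvature.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squared_loss(w, x=1.0, y=1.0):
    # Least squares cost applied to the sigmoid output.
    return (sigmoid(w * x) - y) ** 2


def log_loss(w, x=1.0, y=1.0):
    # Negative log-likelihood of a Bernoulli label (the quantity MLE minimizes).
    p = sigmoid(w * x)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def second_difference(f, w, h=1e-3):
    # Central finite-difference estimate of the second derivative f''(w).
    return (f(w + h) - 2.0 * f(w) + f(w - h)) / (h * h)


# Assumed grid of coefficient values from -6 to 6 in steps of 0.5.
grid = [i * 0.5 for i in range(-12, 13)]

squared_curvature = [second_difference(squared_loss, w) for w in grid]
log_curvature = [second_difference(log_loss, w) for w in grid]

# Convexity requires non-negative curvature everywhere on the grid.
squared_is_convex = all(c >= 0 for c in squared_curvature)
log_is_convex = all(c >= 0 for c in log_curvature)

print("squared loss convex on grid:", squared_is_convex)
print("log loss convex on grid:", log_is_convex)
```

On this toy problem the squared loss has negative curvature wherever the sigmoid output is below 1/3, while the log loss has positive curvature everywhere. This demonstrates non-convexity only; it does not by itself exhibit a spurious local minimum, which depends on the particular dataset.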