Hinge loss penalizes misclassifications in proportion to how badly they miss: the cost increases linearly as the decision function output moves further from the true label, while points classified correctly with sufficient margin incur zero cost. This property is one of the reasons SVM performs well on many data sets, as it drives the hyperplane toward margins that separate the classes as cleanly as possible. As can be seen in the graphs above, hinge loss is non-differentiable at the hinge point; the optimization problem remains convex, but it is no longer smooth, so it requires subgradient methods or other specialized solvers. Logistic, or cross-entropy, loss is smooth everywhere and also yields predicted probabilities rather than just class labels, which is why it underlies logistic regression. In practice, SVM is usually preferred over logistic regression when the decision boundary is non-linear or many variable transformations would be required; for simpler problems where direct probability estimates are desired, logistic regression is often the better choice.
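The contrast between the two losses can be made concrete with a minimal sketch. The snippet below assumes labels in {-1, +1} and a raw decision function output f (not in the original text), and shows that hinge loss is exactly zero for a confidently correct prediction while logistic loss is small but never zero, and that both grow roughly linearly for badly wrong predictions.

```python
import math

def hinge_loss(y, f):
    # Hinge loss: zero once the signed margin y*f reaches 1,
    # growing linearly as the output moves further onto the
    # wrong side. y is the true label in {-1, +1}; f is the
    # decision function output.
    return max(0.0, 1.0 - y * f)

def logistic_loss(y, f):
    # Logistic (cross-entropy) loss: smooth and strictly
    # positive everywhere, differentiable at every f.
    return math.log(1.0 + math.exp(-y * f))

# A confidently correct prediction costs nothing under hinge
# loss but still incurs a small logistic loss.
print(hinge_loss(+1, 2.5))      # 0.0
print(logistic_loss(+1, 2.5))   # ~0.079

# A badly wrong prediction is penalized by both, approximately
# linearly in the margin violation.
print(hinge_loss(+1, -3.0))     # 4.0
print(logistic_loss(+1, -3.0))  # ~3.049
```

Note that because hinge loss ignores points beyond the margin, only a subset of training points (the support vectors) determines the SVM solution, whereas every point contributes to the logistic regression fit.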

For information on less common loss functions used in classification, see https://en.wikipedia.org/wiki/Loss_functions_for_classification.