What is Max Absolute Scaler? Compare it with MinMax Normalization? Why scaling to [-1, 1] might be better than [0, 1] scaling?
What is the problem with storing sparse two-dimensional training data (feature_vector x n_sample)? What is a space optimal way to store such a matrix?