What is the problem with storing sparse two-dimensional training data (feature_vector x n_sample)? What is a space optimal way to store such a matrix?