What is the Max Absolute Scaler? How does it compare with Min-Max normalization? Why might scaling to [-1, 1] be better than scaling to [0, 1]?
What is the problem with storing sparse two-dimensional training data (feature_vector x n_sample) as a dense matrix? What is a space-efficient way to store such a matrix?
Briefly describe the architecture of a Recurrent Neural Network (RNN) and how it addresses the shortcomings of traditional Neural Networks.
What is an activation function, and what are some of the most common choices for activation functions?
Explain the basic architecture and training process of a Neural Network model. Briefly discuss the key hyper-parameters.
What is the difference between probability and non-probability sampling, and what are some example methodologies for each?
What is the difference between a Probability Mass Function (PMF), Probability Density Function (PDF), and Cumulative Distribution Function (CDF)?