What are different ways to impute missing values for a feature? 

  • Mean Imputation: Replace the missing values with the mean of the non-missing observations. If the distribution is skewed, the median can be imputed instead. Because this approach ignores the correlation structure between features, it may be a poor choice when the data are not missing completely at random.
  • Mode Imputation: Replace the missing values with the most frequently occurring value among the non-missing observations. This is the analog of mean imputation for categorical features, and it faces the same issues if the missingness is systematic.
  • Extreme Value Imputation: Replace the missing values with an arbitrary value located at the far end of the feature's distribution, for example 999. This is not recommended for a linear model, which would treat the placeholder as a real magnitude, but it sometimes works well with decision tree algorithms: if the missingness itself has predictive power, a tree can split on the extreme value and exploit that signal.
  • Nearest Neighbor Imputation: Replace the missing values with the average of the observation’s k nearest neighbors. Unlike simple mean or mode imputation, this approach accounts for the correlation structure of the features; however, it can be sensitive to outliers and is more time-consuming on large datasets.
  • Expectation-Maximization: Use the iterative EM algorithm to impute missing observations using the “most likely” values based on the complete observations. Like nearest neighbor imputation, this approach accounts for the correlation structure of the data; however, it is more difficult to implement in standard software packages.
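
To make the simpler strategies above concrete, here is a minimal sketch in plain Python covering mean, median, mode, extreme value, and nearest neighbor imputation. It is an illustration under stated assumptions rather than a reference implementation: missing values are represented as None, the helper names (impute_constant, impute_knn, and so on) are hypothetical, and the nearest neighbor version assumes every column other than the target is fully observed. EM is left out because it does not fit in a short example.

```python
import math
import statistics


def impute_constant(values, fill):
    """Replace every None entry with a fixed fill value."""
    return [fill if v is None else v for v in values]


def impute_mean(values):
    """Fill missing entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    return impute_constant(values, statistics.mean(observed))


def impute_median(values):
    """Fill missing entries with the median (more robust under skew)."""
    observed = [v for v in values if v is not None]
    return impute_constant(values, statistics.median(observed))


def impute_mode(values):
    """Fill missing entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    return impute_constant(values, statistics.mode(observed))


def impute_knn(rows, target, k=2):
    """Fill None in column `target` with the mean of that column over the
    k nearest rows, using Euclidean distance on the other columns.
    Assumption: all non-target columns are fully observed."""
    donors = [r for r in rows if r[target] is not None]

    def dist(a, b):
        return math.sqrt(
            sum((a[j] - b[j]) ** 2 for j in range(len(a)) if j != target)
        )

    filled = []
    for r in rows:
        r = list(r)  # copy so the input rows are not mutated
        if r[target] is None:
            nearest = sorted(donors, key=lambda d: dist(r, d))[:k]
            r[target] = statistics.mean(d[target] for d in nearest)
        filled.append(r)
    return filled


# Hypothetical toy data: one skewed numeric feature with a missing entry.
income = [30.0, 32.0, None, 34.0, 200.0]
print(impute_mean(income))
print(impute_median(income))
print(impute_constant(income, 999.0))  # extreme value imputation
print(impute_mode(["red", None, "blue", "red"]))

# Second column is missing for the third row; the first column is complete.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, None], [10.0, 100.0]]
print(impute_knn(rows, target=1, k=2))
```

On this toy data the outlier of 200 pulls the mean fill up to 74.0, while the median fill is 33.0, which shows why the median is preferred for skewed features. The nearest neighbor fill uses the first column to pick its donors, so the imputed value reflects the relationship between the two columns instead of the overall column average.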