What are the different categories of missing data?

  • Missing Completely at Random (MCAR): The probability that a value is missing is unrelated to any variable, observed or unobserved, so there is nothing systematic about the missing values. Mathematically, the missing observations and the complete observations are assumed to be drawn from the same underlying distribution. In this case it is usually safe to exclude the observations with missing data entirely, at the cost of a smaller sample, or to use a simple imputation technique such as the mean of the data, keeping in mind that mean imputation understates the variable's variance.
  • Missing at Random (MAR): Here the missing observations no longer come from the same distribution as the complete observations; instead, the probability of missingness is a function of an observed attribute within the data. For example, if a researcher at a university is using Gender and SAT Score to predict 1st Year GPA, and women are more likely to take the SAT than men, the missing SAT scores are MAR, because the missingness depends on Gender, which is observed. Because the missing data can bias the results, it may be necessary to adjust for the attribute that is believed to be correlated with the missingness.
  • Missing Not at Random (MNAR): In this case, the missingness is believed to be systematically associated with data that is not observed or collected, which can include the missing value itself. In the example of predicting 1st Year GPA, if students from lower socioeconomic brackets were less likely to take the SAT and socioeconomic status was not recorded, this would be an example of MNAR. The missing data can therefore bias any conclusions reached, and unlike the MAR case, no simple weighting adjustment on an observed independent variable can undo the bias.
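
The three categories can be illustrated with a small simulation. The sketch below is only an illustration under invented assumptions: the effect sizes, the missingness probabilities, and the names `missingness_bias` and `gender_adjusted_mean` are hypothetical and do not come from any real SAT data. It generates synthetic SAT scores that depend on an observed attribute (gender) and an unrecorded one (socioeconomic status), hides scores under each mechanism, and reports how far the mean of the remaining scores drifts from the true mean, both before and after a simple adjustment for gender.

```python
import numpy as np


def gender_adjusted_mean(sat, gender, observed):
    """Average the observed within-gender means, weighted by full-sample gender shares."""
    total = 0.0
    for g in (0, 1):
        share = np.mean(gender == g)  # gender is known for everyone
        total += share * sat[observed & (gender == g)].mean()
    return total


def missingness_bias(n=50_000, seed=0):
    """Return the bias of the mean SAT score under MCAR, MAR, and MNAR.

    All numbers below are made up for illustration only.
    """
    rng = np.random.default_rng(seed)
    gender = rng.integers(0, 2, size=n)  # 0 = man, 1 = woman; always observed
    ses = rng.normal(size=n)             # socioeconomic factor; never recorded
    sat = 1000 + 100 * gender + 100 * ses + rng.normal(scale=50, size=n)
    true_mean = sat.mean()

    u = rng.random(n)
    observed = {
        # MCAR: every score has the same 70% chance of being observed.
        "MCAR": u < 0.7,
        # MAR: the chance depends only on the observed attribute (gender).
        "MAR": u < np.where(gender == 1, 0.9, 0.5),
        # MNAR: the chance depends on the unrecorded socioeconomic factor.
        "MNAR": u < 1.0 / (1.0 + np.exp(-2.0 * ses)),
    }

    return {
        name: {
            "naive": float(sat[mask].mean() - true_mean),
            "adjusted": float(gender_adjusted_mean(sat, gender, mask) - true_mean),
        }
        for name, mask in observed.items()
    }


if __name__ == "__main__":
    for name, bias in missingness_bias().items():
        print(f"{name}: naive bias = {bias['naive']:+.1f}, "
              f"gender-adjusted bias = {bias['adjusted']:+.1f}")
```

Under these assumptions, the naive mean should be close to unbiased for MCAR, biased for MAR until the gender adjustment is applied, and biased for MNAR with or without the adjustment, since the attribute driving the missingness is not available to adjust for.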