AIML.com

What is the difference between probability and non-probability sampling, and what are some example methodologies for each?

Because it is rarely feasible to collect data from an entire population, it is important to use a sampling mechanism that yields observations representative of that population. At a high level, sampling can be divided into probability sampling and non-probability sampling. In probability sampling, units are selected by a random process in which each unit in the population has a known, nonzero probability of being selected. The most common types of probability sampling include:

  • Simple Random Sampling: Each unit of the population has an equal probability of being chosen. The selection is often done with a random number generator. For example, if the population is all students at a university, a simple random sample could be formed by randomly drawing student IDs from the list of all enrolled students.
  • Stratified Sampling: A stratified sample identifies an attribute that divides the population into groups (strata), such as gender, race/ethnicity, or geographic state, and then samples randomly within each group so that each one is represented in the sample in proportion to its share of the population. For example, if 50% of students at a college are from the United States, 25% are from China, and 25% are from India, a stratified sample would be formed by randomly sampling students of each nationality so that the sample keeps roughly the same 50/25/25 split.
  • Cluster Sampling: In a cluster sample, the population is divided into subunits (clusters) that are ideally independent of one another and each representative of the population as a whole. A subset of clusters is chosen at random, which can be thought of as a simple random sample at the cluster level, and then every member of the chosen clusters is sampled. For example, if students at a university are spread across the campus dorms regardless of major or class level, a cluster sample could be conducted by randomly choosing a subset of dorms and then sampling all students who live in those buildings.
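
The three schemes above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names, the 400-student population, the eight dorms, and the 10% sampling fraction are all assumptions made up for the example. Only the 50/25/25 nationality split comes from the text.

```python
import random
from collections import defaultdict

def simple_random_sample(units, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(units, n)

def stratified_sample(units, stratum_of, fraction, rng):
    """Sample the same fraction from every stratum (proportional allocation)."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(units, cluster_of, n_clusters, rng):
    """Randomly pick whole clusters, then keep every unit inside them."""
    clusters = defaultdict(list)
    for unit in units:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical population: 400 students with the 50/25/25 split from the text,
# assigned to 8 made-up dorms independently of nationality.
rng = random.Random(0)
countries = ["United States"] * 200 + ["China"] * 100 + ["India"] * 100
students = [
    {"id": i, "country": country, "dorm": f"Dorm {i % 8}"}
    for i, country in enumerate(countries)
]

srs = simple_random_sample(students, 40, rng)
strat = stratified_sample(students, lambda s: s["country"], 0.10, rng)
clus = cluster_sample(students, lambda s: s["dorm"], 2, rng)

print(len(srs), len(strat), len(clus))
```

With these assumed numbers, the stratified sample always contains exactly 20, 10, and 10 students from the three countries, whereas the simple random sample only matches those proportions on average. The cluster sample contains every student from two randomly chosen dorms.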

In non-probability sampling, units are not selected by a random, probabilistic process. There are many ways such a sample could be formed. For example, a convenience sample could be obtained by sampling all students in a certain course, which is likely to be biased towards students in a certain major or class level. A voluntary response sample could be formed by sending a survey to all or part of the population and using whoever replies, with no guarantee that the profile of respondents matches that of the population. In general, probability sampling is preferred because it gives more confidence that the results will generalize to the population. However, when it is not possible to select units in a probabilistic fashion, non-probability sampling is often used to collect data in practice.
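
To make the bias concern concrete, the sketch below compares a convenience sample (everyone enrolled in one course) with a simple random sample of the same size. All of the data is hypothetical: the 1,000-student population, the 20% share of engineering majors, and the course roster are invented purely to illustrate the effect.

```python
import random

rng = random.Random(42)

# Hypothetical population: 1,000 students, 20% of them engineering majors.
population = [
    {"id": i, "major": "engineering" if i < 200 else "other"}
    for i in range(1000)
]

# Hypothetical roster for an engineering elective:
# 90 engineering majors and 10 students from other majors.
course_roster = population[:90] + population[200:210]

def share_engineering(sample):
    """Fraction of the sample made up of engineering majors."""
    return sum(s["major"] == "engineering" for s in sample) / len(sample)

convenience = course_roster                  # no random selection involved
random_sample = rng.sample(population, 100)  # simple random sample, same size

print(f"population:  {share_engineering(population):.2f}")
print(f"convenience: {share_engineering(convenience):.2f}")
print(f"random:      {share_engineering(random_sample):.2f}")
```

In this made-up setup the convenience sample reports a 90% engineering share against a true value of 20%, while the random sample's estimate should typically land close to the population figure.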
