Which Sampling Methods Are Most Commonly Used in Research?

I analyzed a random sample of 9,830 full-text research papers, uploaded to PubMed Central between the years 2016 and 2021, to check the popularity of different sampling methods and assess their correlation with the quality of research.

I used the BioC API to download the data (see the References section below).
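
As a rough illustration of this step, the sketch below builds a request URL for the BioC PMC web service and pulls the passage text out of a BioC JSON document. The URL pattern and the JSON layout (documents, then passages, then text) are assumptions about the public service, the function names are hypothetical, and this is not the script that was used for the analysis.

```python
import json
import urllib.request

# Assumed URL pattern of the BioC PMC web service; check the service
# documentation before relying on it.
BIOC_URL = ("https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/"
            "pmcoa.cgi/BioC_json/{article_id}/unicode")


def build_bioc_url(article_id):
    """Return the request URL for one article (for example, a PMC ID)."""
    return BIOC_URL.format(article_id=article_id)


def extract_full_text(collection):
    """Join the text of every non-empty passage in a BioC collection (dict)."""
    passages = []
    for document in collection.get("documents", []):
        for passage in document.get("passages", []):
            text = passage.get("text", "")
            if text:
                passages.append(text)
    return "\n".join(passages)


def fetch_article(article_id):
    """Download one article and return its full text (needs network access)."""
    with urllib.request.urlopen(build_bioc_url(article_id)) as response:
        payload = json.load(response)
    # Assumption: the service may wrap the collection in a one-item list.
    if isinstance(payload, list):
        payload = payload[0]
    return extract_full_text(payload)
```

Keeping the parsing in `extract_full_text` separate from the download makes it possible to test the text extraction on a saved file without calling the service.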

Here’s a summary of the key findings:

1. In the 596 articles that mentioned the sampling method used, probability sampling methods were more popular than non-probability sampling methods. Specifically, cluster sampling was the most used probability sampling method, and convenience sampling was the most used non-probability sampling method.

2. The 8-year trend shows that the use of non-probability sampling methods is increasing, and their popularity surpassed that of probability sampling methods in the past 2 years.

3 & 4. The use of a probability versus non-probability sampling method was not significantly associated with the quality of a research article (measured by journal impact factor), nor with the number of citations it received.

5. As expected, some study designs tend to use certain sampling methods more than others. For instance, a case report study always uses a non-probability sampling method, as opposed to an observational design study, which tends to use a probability sampling method.

1. Most used sampling methods overall

Out of 9,830 research papers, only 596 mentioned the sampling method used (6.06%).

Probability sampling methods (simple random sampling, systematic sampling, stratified sampling, and cluster sampling) were mentioned in 306 articles compared to 257 for non-probability sampling methods (convenience sampling, quota sampling, and network sampling).
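
The article does not describe how mentions were detected, so the following is only one possible approach: a keyword search over the full text that labels each paper by the kind of method it names. The function names are hypothetical, and grouping multistage sampling with the probability methods is an assumption of this sketch.

```python
import re

# Method names follow the article; placing multistage sampling in the
# probability group is an assumption of this sketch.
PROBABILITY = ["simple random sampling", "systematic sampling",
               "stratified sampling", "cluster sampling",
               "multistage sampling"]
NON_PROBABILITY = ["convenience sampling", "quota sampling",
                   "network sampling"]


def find_methods(text):
    """Return the set of sampling methods named in the text."""
    lowered = text.lower()
    found = set()
    for method in PROBABILITY + NON_PROBABILITY:
        # Allow any whitespace (including line breaks) between the words.
        pattern = r"\b" + r"\s+".join(map(re.escape, method.split())) + r"\b"
        if re.search(pattern, lowered):
            found.add(method)
    return found


def classify(text):
    """Label a paper by the kind(s) of sampling method it mentions."""
    methods = find_methods(text)
    has_prob = any(m in PROBABILITY for m in methods)
    has_nonprob = any(m in NON_PROBABILITY for m in methods)
    if has_prob and has_nonprob:
        return "both"
    if has_prob:
        return "probability"
    if has_nonprob:
        return "non-probability"
    return "not mentioned"
```

A literal keyword search like this misses paraphrases such as "participants were randomly selected", which may partly explain why so few papers appear to mention a sampling method.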

The following table shows the number of articles that mentioned each of the 8 sampling methods:

| Rank | Sampling method | Number of mentions (in a total of 596 articles) | Percent |
|---|---|---|---|
| 1 | Convenience sampling | 208 | 34.9% |
| 2 | Cluster sampling | 132 | 22.1% |
| 3 | Simple random sampling | 84 | 14.1% |
| 4 | Multistage sampling | 71 | 11.9% |
| 5 | Stratified sampling | 67 | 11.2% |
| 6 | Systematic sampling | 57 | 9.6% |
| 7 | Network sampling | 42 | 7.0% |
| 8 | Quota sampling | 11 | 1.8% |
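
The percentages in the table can be recomputed from the counts, as in this minimal sketch. The counts and the total of 596 come from the table; the variable and function names are only illustrative.

```python
TOTAL_ARTICLES = 596  # articles that mentioned a sampling method

MENTIONS = {
    "Convenience sampling": 208,
    "Cluster sampling": 132,
    "Simple random sampling": 84,
    "Multistage sampling": 71,
    "Stratified sampling": 67,
    "Systematic sampling": 57,
    "Network sampling": 42,
    "Quota sampling": 11,
}


def percent_of_articles(count, total=TOTAL_ARTICLES):
    """Share of the articles that mention a method, in percent."""
    return round(100 * count / total, 1)


# Print the methods from most to least mentioned.
ranked = sorted(MENTIONS.items(), key=lambda item: item[1], reverse=True)
for rank, (method, count) in enumerate(ranked, start=1):
    print(f"{rank}. {method}: {count} ({percent_of_articles(count)}%)")
```

The counts add up to 672, which is more than 596, presumably because some articles mention more than one method. For the same reason the percentages sum to more than 100%.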

2. Top sampling methods over the past 8 years

The following plot shows the proportion of articles that used probability and non-probability sampling methods from 2014 to 2021:

[Figure: trend in the use of probability versus non-probability sampling methods over the past 8 years]

Non-probability sampling methods are getting more popular over the years, surpassing probability sampling methods in the past 2 years.

3. Is the sampling method used in a study a good indicator of its research quality?

To answer this question, I used a logistic regression model to predict the sampling method used (probability vs non-probability sampling) using the journal impact factor as a predictor (assuming that the journal impact factor is a reasonable proxy for research quality).

Here’s the output of the model:

| Variable | Coefficient | Standard Error | P-value |
|---|---|---|---|
| (Intercept) | -0.02233 | 0.20757 | 0.914 |
| Journal Impact Factor | 0.08974 | 0.06857 | 0.191 |

Since the coefficient of journal impact factor is not statistically significant (p = 0.191), the data provide no evidence that the type of sampling method used is related to the quality of the research paper.
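
The p-values in the table above can be checked from the coefficient and its standard error. The sketch below assumes the software reported a two-sided Wald z-test, which is the usual default for logistic regression output. The function name is hypothetical.

```python
import math


def wald_test(coefficient, standard_error):
    """Return the z statistic and two-sided p-value for one coefficient."""
    z = coefficient / standard_error
    # Two-sided tail area of the standard normal distribution:
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Values taken from the journal impact factor row of the table.
z, p = wald_test(0.08974, 0.06857)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Applied to the intercept row (-0.02233 and 0.20757), the same function gives a p-value that matches the reported 0.914 after rounding.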

4. Is the citation count for an article affected by the sampling method used?

In order to answer this question, I used a logistic regression model to predict the sampling method used (probability vs non-probability sampling) using the citation count as a predictor.

Here’s the output of the model:

| Variable | Coefficient | Standard Error | P-value |
|---|---|---|---|
| (Intercept) | 0.154821 | 0.106134 | 0.145 |
| Citation Count | 0.006122 | 0.005413 | 0.258 |

Since the coefficient of citation count is not statistically significant (p = 0.258), the data provide no evidence that the type of sampling method used is related to the number of citations the article receives.
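
A logistic regression coefficient is easier to interpret as an odds ratio. The sketch below converts the citation count coefficient and its standard error into an odds ratio with an approximate 95% Wald interval. The function name is hypothetical, and because the article does not say which sampling category was coded as 1, the odds refer to whichever category that was.

```python
import math


def odds_ratio_ci(coefficient, standard_error, z_crit=1.96):
    """Odds ratio and approximate 95% Wald interval for a logit coefficient."""
    lower = math.exp(coefficient - z_crit * standard_error)
    upper = math.exp(coefficient + z_crit * standard_error)
    return math.exp(coefficient), lower, upper


# Values taken from the citation count row of the table.
ratio, lower, upper = odds_ratio_ci(0.006122, 0.005413)
print(f"odds ratio per extra citation: {ratio:.4f} "
      f"(95% CI {lower:.4f} to {upper:.4f})")
```

The interval contains 1, which agrees with the non-significant p-value: each additional citation changes the estimated odds by well under 1%.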

5. Do studies with different designs use different sampling methods?

The following plot shows the difference in the proportion of studies that used probability vs non-probability sampling methods for different study designs:

[Figure: difference in the sampling method used across study designs]

For example, the share of case-control studies that used probability sampling methods was 8 percentage points higher than the share that used non-probability methods, and 100% of case reports used non-probability sampling methods.

Looking at this plot, we see that case reports, pilot studies, and randomized trials tend to use non-probability sampling methods, and case-control, cross-sectional, and cohort studies tend to use probability sampling methods.
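
The quantity plotted for each study design can be sketched as a difference between two shares. The counts below are hypothetical, chosen only to mirror the two examples in the text, and the sketch assumes each study is classified as using exactly one type of sampling method.

```python
def proportion_gap(n_probability, n_non_probability):
    """Share of studies using probability sampling minus the share using
    non-probability sampling, in percentage points."""
    total = n_probability + n_non_probability
    if total == 0:
        raise ValueError("no studies for this design")
    return 100 * (n_probability - n_non_probability) / total


# Hypothetical counts: (probability, non-probability) per study design.
counts = {"case-control": (27, 23), "case report": (0, 12)}

for design, (n_prob, n_nonprob) in counts.items():
    print(f"{design}: {proportion_gap(n_prob, n_nonprob):+.0f} points")
```

A positive value means the design leans toward probability sampling, and a value of -100 means every study of that design used a non-probability method.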

References

  • Comeau DC, Wei CH, Islamaj Doğan R, and Lu Z. PMC text mining subset in BioC: about 3 million full text articles and growing, Bioinformatics, btz070, 2019.
