Checking the Popularity of 125 Statistical Tests and Models

I analyzed the methods sections of 43,110 randomly selected research papers uploaded to PubMed Central between 2016 and 2021 to measure the popularity of 125 statistical methods in medical research.

I used the BioC API to download the articles (see the References section below).
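To give an idea of what this step looked like, here is a minimal Python sketch of fetching one article and pulling out its methods text through the BioC API. The endpoint format, the example PMC ID, and the exact JSON layout are assumptions for illustration; refer to the BioC reference cited in the References section for the authoritative details.

```python
# Hypothetical sketch: download one PMC Open Access article in BioC JSON and
# keep only the passages tagged as belonging to the methods section.
import requests

# Assumed endpoint format for the BioC API (see the References section).
BIOC_URL = "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmcid}/unicode"

def fetch_article(pmcid: str) -> dict:
    """Fetch one article as a BioC JSON collection and return its first document."""
    response = requests.get(BIOC_URL.format(pmcid=pmcid), timeout=30)
    response.raise_for_status()
    data = response.json()
    collection = data[0] if isinstance(data, list) else data  # layout may vary
    return collection["documents"][0]

def methods_text(document: dict) -> str:
    """Concatenate the text of passages whose section type mentions 'method'."""
    return " ".join(
        passage["text"]
        for passage in document.get("passages", [])
        if "method" in passage.get("infons", {}).get("section_type", "").lower()
    )

if __name__ == "__main__":
    article = fetch_article("PMC6954103")  # placeholder PMC ID for illustration
    print(methods_text(article)[:300])
```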

Here’s a summary of the key findings:

The most popular statistical tests in research articles are:

  1. Student’s t-test: Used to compare the mean of a population to a theoretical value, or compare means between 2 populations.
  2. Chi-square test: Used to compare proportions between groups, i.e. to test the association between 2 categorical variables.
  3. Mann-Whitney U test: Used to compare medians between 2 populations.
  4. One-way ANOVA and Kruskal-Wallis test: Used to compare more than 2 populations (one-way ANOVA compares means; the Kruskal-Wallis test is its rank-based, non-parametric counterpart).
  5. Kaplan-Meier estimator: Used to estimate the survival function when analyzing time-to-event data.
  6. Log-rank test: Used to compare survival curves between 2 or more groups.

The most popular statistical models in research articles are:

  1. Logistic regression: Used to study the relationship between 1 or more predictor variables and 1 binary outcome variable.
  2. Linear regression: Used to study the relationship between 1 or more predictor variables and 1 continuous outcome variable.
  3. Cox regression: Used to study the relationship between 1 or more predictor variables and the survival time of patients.

Top statistical methods overall

To visualize the popularity of all 125 statistical tests and models, I created a Fisher-shaped word cloud, a cluster of words in which the most popular methods appear in larger, bolder fonts:

[Figure: Word cloud showing the most used statistical methods in research]
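For readers curious about the visualization itself, here is a minimal sketch of how such a frequency-weighted word cloud can be produced with the third-party wordcloud package; the counts are a small subset of the table below, and the Fisher-shaped outline (which would be supplied through the package’s mask parameter) is omitted for brevity.

```python
# Illustrative sketch: a basic word cloud weighted by mention counts.
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# A small subset of the mention counts reported in the table below.
mention_counts = {
    "Student's t-test": 12831,
    "Chi-square test": 10437,
    "Mann-Whitney U test": 8063,
    "Logistic regression": 6482,
    "One-way ANOVA": 5020,
    "Linear regression": 4460,
}

cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate_from_frequencies(mention_counts)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```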

And here’s a table for those of you who prefer to look at numbers:

| Rank | Statistical Test/Model | Number of Mentions (in 43,110 Articles) | Mentions in Percent |
|------|------------------------|------------------------------------------|---------------------|
| 1 | Student’s t-test | 12,831 | 29.76% |
| 2 | Chi-square test | 10,437 | 24.21% |
| 3 | Mann-Whitney U test | 8,063 | 18.70% |
| 4 | Logistic regression | 6,482 | 15.04% |
| 5 | One-way ANOVA | 5,020 | 11.64% |
| 6 | Linear regression | 4,460 | 10.35% |
| 7 | Kaplan–Meier estimator | 3,778 | 8.76% |
| 8 | Kruskal-Wallis test | 3,479 | 8.07% |
| 9 | Cox regression | 3,164 | 7.34% |
| 10 | Log-rank test | 3,066 | 7.11% |
| 11 | Fisher’s exact test | 2,578 | 5.98% |
| 12 | Bayesian methods | 2,370 | 5.50% |
| 13 | Shapiro–Wilk test | 2,180 | 5.06% |
| 14 | Bonferroni correction | 1,921 | 4.46% |
| 15 | Kolmogorov–Smirnov test | 1,894 | 4.39% |
| 16 | Tukey’s HSD test | 1,834 | 4.25% |
| 17 | Wilcoxon signed-rank test | 1,454 | 3.37% |
| 18 | Paired t-test | 1,388 | 3.22% |
| 19 | Likelihood-ratio test | 1,245 | 2.89% |
| 20 | Dunnett’s test | 1,074 | 2.49% |
| 21 | ANCOVA | 894 | 2.07% |
| 22 | Repeated measures ANOVA | 882 | 2.05% |
| 23 | Factor analysis | 803 | 1.86% |
| 24 | Neural networks | 629 | 1.46% |
| 25 | Levene’s test | 576 | 1.34% |
| 26 | Non-linear regression | 490 | 1.14% |
| 27 | Random forest | 454 | 1.05% |
| 28 | Friedman test | 450 | 1.04% |
| 29 | Support vector machines | 449 | 1.04% |
| 30 | Stepwise regression | 428 | 0.99% |
| 31 | K-means clustering | 424 | 0.98% |
| 32 | Poisson regression | 399 | 0.93% |
| 33 | McNemar’s test | 375 | 0.87% |
| 34 | Hosmer–Lemeshow test | 340 | 0.79% |
| 35 | Wald test | 334 | 0.77% |
| 36 | Z-test | 317 | 0.74% |
| 37 | Meta-regression | 315 | 0.73% |
| 38 | One sample t-test | 296 | 0.69% |
| 39 | Lasso regression | 289 | 0.67% |
| 40 | Duncan’s new multiple range test | 268 | 0.62% |
| 41 | Cochran’s Q test | 259 | 0.60% |
| 42 | PERMANOVA | 254 | 0.59% |
| 43 | Bartlett’s test | 252 | 0.58% |
| 44 | Welch’s t-test | 250 | 0.58% |
| 45 | Linear discriminant analysis | 250 | 0.58% |
| 46 | MANOVA | 236 | 0.55% |
| 47 | Omnibus test | 215 | 0.50% |
| 48 | K-nearest neighbors | 178 | 0.41% |
| 49 | Ordinal regression | 170 | 0.39% |
| 50 | Tree models | 162 | 0.38% |
| 51 | Mantel test | 162 | 0.38% |
| 52 | Naive Bayes | 159 | 0.37% |
| 53 | Binomial test | 156 | 0.36% |
| 54 | Partial least squares discriminant analysis | 151 | 0.35% |
| 55 | Analysis of similarities | 146 | 0.34% |
| 56 | Negative binomial regression | 140 | 0.32% |
| 57 | Mauchly’s sphericity test | 125 | 0.29% |
| 58 | Principal component analysis | 124 | 0.29% |
| 59 | Continuity correction | 121 | 0.28% |
| 60 | Holm–Bonferroni method | 119 | 0.28% |
| 61 | Factorial ANOVA | 111 | 0.26% |
| 62 | Mixed ANOVA | 105 | 0.24% |
| 63 | Kaiser–Meyer–Olkin test | 100 | 0.23% |
| 64 | Gradient boosting | 93 | 0.22% |
| 65 | Cochran-Mantel-Haenszel test | 84 | 0.19% |
| 66 | Polynomial regression | 70 | 0.16% |
| 67 | Elastic net | 66 | 0.15% |
| 68 | Ridge regression | 60 | 0.14% |
| 69 | Sign test | 60 | 0.14% |
| 70 | Item-total correlation test | 53 | 0.12% |
| 71 | Median test | 44 | 0.10% |
| 72 | Jonckheere–Terpstra test | 44 | 0.10% |
| 73 | Quantile regression | 40 | 0.09% |
| 74 | Partial least squares regression | 40 | 0.09% |
| 75 | Score test | 39 | 0.09% |
| 76 | Grubbs’s test | 35 | 0.08% |
| 77 | Brown–Forsythe test | 34 | 0.08% |
| 78 | Anderson-Darling test | 34 | 0.08% |
| 79 | Nemenyi test | 32 | 0.07% |
| 80 | Beta regression | 31 | 0.07% |
| 81 | Durbin–Watson test | 28 | 0.06% |
| 82 | Sobel test | 24 | 0.06% |
| 83 | Hausman test | 21 | 0.05% |
| 84 | Tobit regression | 14 | 0.03% |
| 85 | Separation test | 12 | 0.03% |
| 86 | Goodman and Kruskal’s gamma | 11 | 0.03% |
| 87 | Quasi-Poisson regression | 10 | 0.02% |
| 88 | Vuong’s closeness test | 10 | 0.02% |
| 89 | Wald–Wolfowitz runs test | 9 | 0.02% |
| 90 | Jarque-Bera test | 9 | 0.02% |
| 91 | Location test | 9 | 0.02% |
| 92 | Breusch–Pagan test | 7 | 0.02% |
| 93 | Shapiro–Francia test | 7 | 0.02% |
| 94 | Phillips–Perron test | 5 | 0.01% |
| 95 | Cramér–von Mises test | 5 | 0.01% |
| 96 | Fay and Wu’s H | 5 | 0.01% |
| 97 | Kuiper’s test | 5 | 0.01% |
| 98 | Randomness test | 4 | 0.01% |
| 99 | White test | 3 | 0.01% |
| 100 | Park test | 3 | 0.01% |
| 101 | Sargan–Hansen test | 3 | 0.01% |
| 102 | Chauvenet’s criterion | 3 | 0.01% |
| 103 | Hoeffding’s independence test | 3 | 0.01% |
| 104 | Dixon’s Q test | 2 | 0.00% |
| 105 | Ramsey RESET test | 2 | 0.00% |
| 106 | Sequential probability ratio test | 1 | 0.00% |
| 107 | Scheirer–Ray–Hare test | 1 | 0.00% |
| 108 | Durbin test | 1 | 0.00% |
| 109 | Cuzick–Edwards test | 1 | 0.00% |
| 110 | Cochran’s C test | 1 | 0.00% |
| 111 | Multinomial test | 1 | 0.00% |
| 112 | Van der Waerden test | 1 | 0.00% |
| 113 | Tukey’s test of additivity | 0 | 0.00% |
| 114 | Lepage test | 0 | 0.00% |
| 115 | Hartley’s test | 0 | 0.00% |
| 116 | Glejser test | 0 | 0.00% |
| 117 | GRIM test | 0 | 0.00% |
| 118 | Siegel–Tukey test | 0 | 0.00% |
| 119 | Tukey–Duckworth test | 0 | 0.00% |
| 120 | Information matrix test | 0 | 0.00% |
| 121 | Breusch–Godfrey test | 0 | 0.00% |
| 122 | Goldfeld–Quandt test | 0 | 0.00% |
| 123 | Squared ranks test | 0 | 0.00% |
| 124 | Principal components regression | 0 | 0.00% |
| 125 | ABX test | 0 | 0.00% |

Popularity of normality tests

Normality tests are used to determine whether the data follow a normal distribution, an assumption required by many statistical tests and models.

The most used normality test was the Shapiro-Wilk test (mentioned in 5.06% of research papers), followed by the Kolmogorov-Smirnov test (4.39%), the Anderson-Darling test (0.08%), the Jarque-Bera test (0.02%), and the Cramér–von Mises test (0.01%).
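For reference, here is a minimal sketch of how the two most commonly mentioned normality tests can be run in Python with SciPy; the sample data are simulated purely for illustration.

```python
# Illustrative sketch: Shapiro-Wilk and Kolmogorov-Smirnov normality tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=200)  # simulated, roughly normal data

# Shapiro-Wilk: the null hypothesis is that the sample comes from a normal distribution.
shapiro_stat, shapiro_p = stats.shapiro(sample)

# Kolmogorov-Smirnov against a normal distribution with parameters estimated from the sample.
ks_stat, ks_p = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))

print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
print(f"Kolmogorov-Smirnov p-value: {ks_p:.3f}")
```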

Popularity of machine learning algorithms

Machine learning algorithms are divided into 2 classes:

  1. Supervised learning algorithms
  2. Unsupervised learning algorithms

1. Supervised learning algorithms are models used to predict an outcome given 1 or more predictors. The most popular supervised algorithms in the data were:

  • Neural networks (mentioned in 1.46% of research papers)
  • Non-linear regression (1.14%)
  • Random forest (1.05%)
  • Support vector machines (1.04%)
  • Lasso regression (0.67%)
  • Classification and regression trees (0.38%)
  • Naïve Bayes (0.37%)
  • Gradient boosted models (0.22%)
  • Ridge regression (0.14%)

Note that these models were far less popular than inferential models such as linear and logistic regression (10.35% and 15.04% respectively).

2. Unsupervised learning algorithms are methods used to discover patterns in, or group, unlabeled data (i.e. cases where we don’t have an outcome variable). The most popular methods in this category were (a short code sketch contrasting the two classes follows this list):

  • Factor analysis (mentioned in 1.86% of research papers)
  • K-means clustering (0.98%)
  • Principal component analysis (0.29%)
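Here is that sketch, using scikit-learn with synthetic data; the specific models (a random forest and K-means clustering) are chosen only because they appear in the lists above.

```python
# Illustrative sketch: supervised vs. unsupervised learning on synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))             # 100 observations, 4 predictor variables
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # a binary outcome variable

# Supervised learning: predict the labeled outcome y from the predictors X.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("Predicted outcomes:", clf.predict(X[:5]))

# Unsupervised learning: group the same observations without any outcome variable.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster labels:", km.labels_[:5])
```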

Popularity of Bayesian methods

Bayesian methods were mentioned in only 5.5% of research papers, and there was no sign of an increasing Bayesian trend between 2016 and 2021.

This, however, does not reflect the importance of Bayesian methods. In fact, some of the best books on statistics, such as Regression and Other Stories by Gelman, Hill, and Vehtari, incorporate at least some form of Bayesian thinking even when teaching frequentist statistics.

Challenges I faced while analyzing text data for this study

In this bonus section, I thought it would be interesting to share some of the challenges I had to deal with while searching the methods sections of these 43,110 research papers for mentions of the statistical methods used.

All of the problems mentioned below were handled with appropriate regular expressions: sequences of symbols and characters used to search the text for a particular pattern (corresponding to a statistical test/model).

1. Different spellings

For instance:

  • HosmerLemeshow, Hosmer Lemeshow, Hosmer and Lemeshow, Hosmer & Lemeshow, etc.
  • K-nearest neighbors, K-nearest neighbours (British English spelling), KNN.
  • Bonferroni correction, Bonferroni method, Bonferroni’s method
  • Chi square test, Chi-squared test, or χ2 test
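To make this concrete, here is a minimal sketch of the kind of regular expressions that can catch such spelling variants; the patterns are illustrative and not the exact expressions used in the study.

```python
# Illustrative regular expressions for two of the spelling-variant examples above.
import re

# Matches "Hosmer-Lemeshow", "Hosmer Lemeshow", "Hosmer and Lemeshow",
# "Hosmer & Lemeshow", and the concatenated "HosmerLemeshow".
hosmer_lemeshow = re.compile(r"hosmer\s*(?:-|and|&)?\s*lemeshow", re.IGNORECASE)

# Matches "chi square test", "chi-square test", "chi-squared test", and "χ2 test".
chi_square = re.compile(r"(?:chi[-\s]?squared?|χ\s?2)\s+test", re.IGNORECASE)

methods_text = "Calibration was assessed with the Hosmer & Lemeshow test and a χ2 test."
print(bool(hosmer_lemeshow.search(methods_text)))  # True
print(bool(chi_square.search(methods_text)))       # True
```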

2. Misspellings

Several incorrect spellings were surprisingly common, for instance:

| Incorrect Spelling | Number of Occurrences | Correct Spelling |
|--------------------|-----------------------|------------------|
| Cochrane Q test | 743 | Cochran Q test |
| Kolmogorov-Smirnoff | 40 | Kolmogorov-Smirnov |
| Kruskal-Willis | 7 | Kruskal-Wallis |

3. Split words

For instance, when searching for the number of papers that reported the use of linear regression, it would be incomplete to search only for the phrase “linear regression”, since the model is sometimes reported as “linear and logistic regression were used”. It would also be wrong to search only for the word “linear”, since “linear discriminant analysis” is a different statistical method from linear regression.
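As an illustration of how this can be approached, the sketch below shows a simplified pattern that counts “linear regression” even when the phrase is split by another model name; it is not the exact expression used in the analysis.

```python
# Illustrative pattern for the split-word problem.
import re

# Matches "linear regression" and "linear and logistic regression",
# but not "linear discriminant analysis".
linear_regression = re.compile(
    r"\blinear(?:,?\s+(?:and|or)\s+\w+(?:[-\s]\w+)?)?\s+regressions?\b",
    re.IGNORECASE,
)

print(bool(linear_regression.search("Linear and logistic regression were used.")))    # True
print(bool(linear_regression.search("A linear regression model was fitted.")))        # True
print(bool(linear_regression.search("Linear discriminant analysis was performed.")))  # False
```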

4. Alternative names

For instance, Student’s t-test is also known as the independent t-test, the independent-samples t-test, and the two-sample t-test. This complicates the analysis, as it requires knowing all the synonyms of every statistical test.

Although I did my best to find these synonyms, if you notice anything missing or want to report an error, please email me using the form on the contact page.

References

  • Comeau DC, Wei CH, Islamaj Doğan R, and Lu Z. PMC text mining subset in BioC: about 3 million full text articles and growing, Bioinformatics, btz070, 2019.
