I analyzed a random sample of 76,147 full-text research papers uploaded to PubMed Central between 2016 and 2021 to measure the popularity of programming languages among medical researchers.
I downloaded the articles with the BioC API (see the References section below). Only 12,086 of them mentioned the use of at least 1 programming language.
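A minimal sketch of what such a download-and-scan step could look like is shown here. It rests on several assumptions: the URL pattern is the one I understand the BioC API for PMC to use, and the helper names (`bioc_url`, `fetch_article`, `passage_texts`, `languages_mentioned`) and the keyword patterns are hypothetical illustrations, not the exact matching rules behind the numbers in this article.

```python
import json
import re
import urllib.request

# Assumed URL pattern of the BioC API for PMC (JSON format, Unicode encoding).
BIOC_URL = (
    "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/"
    "pmcoa.cgi/BioC_json/{pmcid}/unicode"
)

# Hypothetical keyword patterns; a real analysis would need more careful rules,
# especially for a one-letter name such as "R".
LANGUAGE_PATTERNS = {
    "R": re.compile(r"\bR (?:software|package|version|statistical|Core Team)"),
    "Matlab": re.compile(r"\bmatlab\b", re.IGNORECASE),
    "Python": re.compile(r"\bPython\b"),
}


def bioc_url(pmcid):
    """Build the request URL for one article, e.g. pmcid='PMC1234567'."""
    return BIOC_URL.format(pmcid=pmcid)


def fetch_article(pmcid):
    """Download one article as parsed JSON (needs network access)."""
    with urllib.request.urlopen(bioc_url(pmcid)) as response:
        return json.load(response)


def passage_texts(node):
    """Yield every string stored under a "text" key, at any nesting depth.

    Walking the structure recursively avoids assuming an exact JSON layout.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "text" and isinstance(value, str):
                yield value
            else:
                yield from passage_texts(value)
    elif isinstance(node, list):
        for item in node:
            yield from passage_texts(item)


def languages_mentioned(text):
    """Return the set of languages whose pattern occurs in the text."""
    return {name for name, pattern in LANGUAGE_PATTERNS.items() if pattern.search(text)}
```

For one article, `languages_mentioned(" ".join(passage_texts(fetch_article(pmcid))))` would give the set of languages it mentions; an article counts toward the 12,086 when that set is not empty.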
Results
R was the most used programming language overall, mentioned in 69.69% of the 12,086 research papers that mentioned a programming language, followed by Matlab (21.31%) and Python (8.98%).
The 6-year trend showed that the popularity of R and Python is increasing (+1.59% and +1.21%, respectively), whereas Matlab is declining (-2.01%).
Here’s a table of the top 14 programming languages used in medical research:
Ranking | Programming Language | Number of Mentions (Total: 12086) | Mentions (in Percent) | 6-Year Trend |
---|---|---|---|---|
1 | R | 8423 | 69.69% | +1.59% |
2 | Matlab | 2575 | 21.31% | -2.01% |
3 | Python | 1085 | 8.98% | +1.21% |
4 | Java | 309 | 2.56% | -0.24% |
5 | Perl | 301 | 2.49% | -0.32% |
6 | C++ | 151 | 1.25% | -0.23% |
7 | JavaScript | 97 | 0.80% | – |
8 | SQL | 94 | 0.78% | – |
9 | PHP | 87 | 0.72% | – |
10 | Visual Basic | 40 | 0.33% | – |
11 | C# | 25 | 0.21% | – |
12 | FORTRAN | 25 | 0.21% | – |
13 | Julia | 8 | 0.07% | – |
14 | Maple | 6 | 0.05% | – |
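
Each percentage in the table is the number of mentions divided by the 12,086 articles that mention at least one language. Because one article can mention several languages, the counts add up to more than 12,086 and the percentages to more than 100%. The short check below recomputes the top three rows from the counts in the table; the function name `percent` is just an illustrative choice.

```python
TOTAL_ARTICLES = 12086  # articles mentioning at least one language

# Counts copied from the first three rows of the table.
mentions = {"R": 8423, "Matlab": 2575, "Python": 1085}


def percent(count, total=TOTAL_ARTICLES):
    """Share of articles mentioning a language, in percent (2 decimals)."""
    return round(100 * count / total, 2)


shares = {language: percent(count) for language, count in mentions.items()}
print(shares)
```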
⚠ How was the trend calculated?
The 6-year trend is the linear regression coefficient (reported in percent) obtained by regressing “the percent of articles that mention a particular programming language each year” onto the “years” variable. I calculated this trend only for programming languages with more than 100 mentions over the 6 years, because with fewer mentions the coefficient would reflect noise more than a real trend.
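
The calculation described above can be sketched as an ordinary least-squares slope of the yearly percentages against the year. This is only an illustration: the function names are hypothetical, and the yearly figures in the example are made up, not data from this analysis.

```python
MIN_MENTIONS = 100  # the trend is only reported above this many mentions


def ols_slope(years, percents):
    """Least-squares slope of percents regressed on years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(percents) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, percents))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def six_year_trend(years, percents, total_mentions):
    """Return the slope, or None when there are too few mentions to trust it."""
    if total_mentions <= MIN_MENTIONS:
        return None
    return ols_slope(years, percents)


years = [2016, 2017, 2018, 2019, 2020, 2021]
# Made-up yearly percentages, for illustration only.
example_percents = [5.0, 6.0, 7.5, 8.0, 9.5, 10.0]

print(six_year_trend(years, example_percents, total_mentions=500))
print(six_year_trend(years, example_percents, total_mentions=40))  # None
```

A positive slope means the share of articles mentioning the language grew on average from one year to the next; a negative slope means it shrank.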
References
- Comeau DC, Wei CH, Islamaj Doğan R, and Lu Z. PMC text mining subset in BioC: about 3 million full text articles and growing, Bioinformatics, btz070, 2019.