I analyzed a random sample of 76,147 full-text research papers uploaded to PubMed Central between 2016 and 2021 to measure the popularity of programming languages among medical researchers.
I downloaded the articles with the BioC API (see the References section below). Only 12,086 of them mentioned the use of at least one programming language.
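For readers who want to try something similar, here is a minimal sketch of fetching one article from the BioC API and flattening it to plain text. It is not the script used for this analysis. The URL pattern follows the public BioC documentation as I understand it, and the JSON layout (`documents`, `passages`, `text`) and the function names are assumptions to check against a real response.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the BioC RESTful service for PMC open-access articles.
BIOC_BASE = "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi"


def bioc_url(article_id, fmt="json", encoding="unicode"):
    """Build the request URL for one article (a PMID or a PMC ID)."""
    return f"{BIOC_BASE}/BioC_{fmt}/{article_id}/{encoding}"


def full_text(collection):
    """Join the text of every passage in a parsed BioC JSON collection."""
    # Some responses wrap the collection in a one-element list (assumption).
    if isinstance(collection, list):
        collection = collection[0]
    passages = (
        passage
        for document in collection.get("documents", [])
        for passage in document.get("passages", [])
    )
    return "\n".join(passage.get("text", "") for passage in passages)


def fetch_article_text(article_id):
    """Download one article and return its full text (needs network access)."""
    with urlopen(bioc_url(article_id), timeout=30) as response:
        return full_text(json.load(response))
```

Calling `fetch_article_text` with a PMC ID returns one string per article, which can then be searched for language names.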
Results
Among the 12,086 papers that mentioned a programming language, R was the most used, appearing in 69.69% of them, followed by Matlab (21.31%) and Python (8.98%).
The 6-year trend showed that the popularity of R and Python increased (+1.59% and +1.21%, respectively), whereas Matlab declined by 2.01%.
Here’s a table of the top 14 programming languages used in medical research:
| Ranking | Programming Language | Number of Mentions (Total: 12086) | Mentions (in Percent) | 6-Year Trend |
|---|---|---|---|---|
| 1 | R | 8423 | 69.69% | +1.59% |
| 2 | Matlab | 2575 | 21.31% | -2.01% |
| 3 | Python | 1085 | 8.98% | +1.21% |
| 4 | Java | 309 | 2.56% | -0.24% |
| 5 | Perl | 301 | 2.49% | -0.32% |
| 6 | C++ | 151 | 1.25% | -0.23% |
| 7 | JavaScript | 97 | 0.80% | – |
| 8 | SQL | 94 | 0.78% | – |
| 9 | PHP | 87 | 0.72% | – |
| 10 | Visual Basic | 40 | 0.33% | – |
| 11 | C# | 25 | 0.21% | – |
| 12 | FORTRAN | 25 | 0.21% | – |
| 13 | Julia | 8 | 0.07% | – |
| 14 | Maple | 6 | 0.05% | – |
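The percentages in the table can be reproduced from the mention counts with a few lines of Python. The counts and the total are copied from the table; the variable and function names are only illustrative.

```python
# Mention counts copied from the table above.
TOTAL_PAPERS = 12086
mentions = {
    "R": 8423, "Matlab": 2575, "Python": 1085, "Java": 309, "Perl": 301,
    "C++": 151, "JavaScript": 97, "SQL": 94, "PHP": 87, "Visual Basic": 40,
    "C#": 25, "FORTRAN": 25, "Julia": 8, "Maple": 6,
}


def percent_of_papers(count, total=TOTAL_PAPERS):
    """Share of papers mentioning a language, in percent, to 2 decimals."""
    return round(100 * count / total, 2)


shares = {language: percent_of_papers(n) for language, n in mentions.items()}

for language, share in shares.items():
    print(f"{language:<13}{share:>6.2f}%")

# The counts add up to more than the number of papers.
print("Sum of mentions:", sum(mentions.values()))
```

The counts sum to 13,226, which is more than 12,086, so the percentages add up to more than 100%. This is consistent with some papers mentioning more than one language.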
⚠ How was the trend calculated?
The 6-year trend is the linear regression coefficient (reported in percent) obtained by regressing the percentage of articles that mention a given programming language each year on the year. It was calculated only for programming languages with more than 100 mentions over the 6 years, because with fewer mentions the number would reflect noise more than the trend.
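As a rough sketch of that calculation, the slope of an ordinary least-squares line can be computed directly. The yearly percentages below are made up for illustration because the per-year figures are not listed in this article, and the helper names are hypothetical. The slope is presumably best read as percentage points per year.

```python
MIN_MENTIONS = 100  # threshold described above


def linear_trend(years, percents):
    """Least-squares slope of yearly percentages regressed on the year."""
    n = len(years)
    mean_year = sum(years) / n
    mean_percent = sum(percents) / n
    covariance = sum(
        (year - mean_year) * (percent - mean_percent)
        for year, percent in zip(years, percents)
    )
    variance = sum((year - mean_year) ** 2 for year in years)
    return covariance / variance


def trend_or_none(total_mentions, years, percents):
    """Return the slope only for languages with more than 100 mentions."""
    if total_mentions <= MIN_MENTIONS:
        return None  # too few mentions: the slope would mostly be noise
    return linear_trend(years, percents)


years = [2016, 2017, 2018, 2019, 2020, 2021]
# Hypothetical yearly shares (in percent), NOT data from the analysis.
hypothetical_percents = [5.0, 6.0, 7.5, 8.0, 9.5, 11.0]

slope = trend_or_none(1085, years, hypothetical_percents)
print(f"Trend: {slope:+.2f} percentage points per year")
```

A language with 97 mentions, such as JavaScript in the table, would get `None` here, which corresponds to the "–" entries in the trend column.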
References
- Comeau DC, Wei CH, Islamaj Doğan R, and Lu Z. PMC text mining subset in BioC: about 3 million full text articles and growing, Bioinformatics, btz070, 2019.