Programming Languages Popularity in 12,086 Research Papers

I analyzed a random sample of 76,147 full-text research papers uploaded to PubMed Central between 2016 and 2021 to measure the popularity of programming languages among medical researchers.

I downloaded the articles with the BioC API (see the References section below). Of the 76,147 papers, only 12,086 mentioned the use of at least one programming language.
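To make this step concrete, here is a minimal sketch of how one article could be fetched and scanned for language mentions. It is not the code used for this analysis. The URL pattern is my understanding of the BioC-PMC RESTful service and should be checked against the NCBI documentation. The names `fetch_bioc`, `full_text`, `languages_mentioned`, and the regular expressions in `LANGUAGE_PATTERNS` are hypothetical, since the article does not describe how mentions were detected.

```python
import json
import re
import urllib.request

# Assumed URL pattern of the BioC-PMC RESTful service; verify it against the NCBI docs.
BIOC_URL = (
    "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/"
    "pmcoa.cgi/BioC_json/{article_id}/unicode"
)

# Hypothetical patterns: the article does not say how mentions were detected.
# "R" needs context words, otherwise any capital R in the text would match.
LANGUAGE_PATTERNS = {
    "R": re.compile(r"\bR (?:software|package|version|statistical|Core Team)"),
    "Matlab": re.compile(r"\bmatlab\b", re.IGNORECASE),
    "Python": re.compile(r"\bpython\b", re.IGNORECASE),
}


def fetch_bioc(article_id, timeout=30):
    """Download one article as BioC JSON (requires network access)."""
    url = BIOC_URL.format(article_id=article_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def full_text(bioc):
    """Join the text of every passage in a BioC JSON collection (or a list of them)."""
    collections = bioc if isinstance(bioc, list) else [bioc]
    return "\n".join(
        passage.get("text", "")
        for collection in collections
        for document in collection.get("documents", [])
        for passage in document.get("passages", [])
    )


def languages_mentioned(text):
    """Return the set of language names whose pattern occurs in the text."""
    return {name for name, pattern in LANGUAGE_PATTERNS.items() if pattern.search(text)}
```

A paper would count toward the 12,086 if `languages_mentioned(full_text(fetch_bioc(article_id)))` returned a non-empty set. Simple patterns like these will produce false positives (for example, "python" the animal), so a real pipeline would need manual checks.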

Results

R was the most used programming language overall, mentioned in 69.69% of research papers, followed by Matlab (21.31%) and Python (8.98%).

The 6-year trend showed that the popularity of R and Python is increasing (+1.59% and +1.21%, respectively), whereas Matlab showed a decline of 2.01%.

Here’s a table of the top 14 programming languages used in medical research:

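| Ranking | Programming Language | Number of Mentions (Total: 12,086) | Mentions (in Percent) | 6-Year Trend |
|---|---|---|---|---|
| 1 | R | 8423 | 69.69% | +1.59% |
| 2 | Matlab | 2575 | 21.31% | -2.01% |
| 3 | Python | 1085 | 8.98% | +1.21% |
| 4 | Java | 309 | 2.56% | -0.24% |
| 5 | Perl | 301 | 2.49% | -0.32% |
| 6 | C++ | 151 | 1.25% | -0.23% |
| 7 | JavaScript | 97 | 0.80% | |
| 8 | SQL | 94 | 0.78% | |
| 9 | PHP | 87 | 0.72% | |
| 10 | Visual Basic | 40 | 0.33% | |
| 11 | C# | 25 | 0.21% | |
| 12 | FORTRAN | 25 | 0.21% | |
| 13 | Julia | 8 | 0.07% | |
| 14 | Maple | 6 | 0.05% | |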
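The percent column is each language's mention count divided by the 12,086 papers that mentioned at least one language. A short sketch that reproduces the top three values from the counts in the table (the function name `percent_of_papers` is just illustrative):

```python
TOTAL_PAPERS = 12086  # papers mentioning at least one programming language

# Mention counts for the top three languages, taken from the table.
mentions = {"R": 8423, "Matlab": 2575, "Python": 1085}


def percent_of_papers(count, total=TOTAL_PAPERS):
    """Share of papers mentioning a language, in percent, rounded to 2 decimals."""
    return round(100 * count / total, 2)


for language, count in mentions.items():
    print(f"{language}: {percent_of_papers(count):.2f}%")
```

The percentages add up to more than 100% because a single paper can mention several languages.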

⚠ How was the trend calculated?
The 6-year trend is the linear regression coefficient (reported in percent) obtained by regressing the percent of articles that mention a particular programming language each year on the year. The trend was calculated only for programming languages with more than 100 mentions over the past 6 years; with fewer mentions, the coefficient would reflect noise more than an actual trend.
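The trend calculation can be sketched as an ordinary least-squares slope of yearly percentages against the year. This is an illustration under stated assumptions, not the code used for this analysis: the yearly percentages below are made up, and the names `trend_slope`, `trend_or_none`, and `MIN_MENTIONS` are hypothetical.

```python
MIN_MENTIONS = 100  # the trend is reported only above this many total mentions


def trend_slope(years, percents):
    """Ordinary least-squares slope of percents regressed on years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(percents) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, percents))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def trend_or_none(total_mentions, years, percents):
    """Return the slope, or None when there are too few mentions to trust it."""
    if total_mentions <= MIN_MENTIONS:
        return None
    return trend_slope(years, percents)


years = [2016, 2017, 2018, 2019, 2020, 2021]
# Made-up yearly percentages for illustration only, NOT the article's data.
example_percents = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

print(trend_or_none(500, years, example_percents))  # 1.0
print(trend_or_none(40, years, example_percents))   # None
```

In this made-up example the share rises by exactly one point each year, so the slope is 1.0. A language with only 40 mentions gets no trend, matching the empty cells in the table.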

References

  • Comeau DC, Wei CH, Islamaj Doğan R, and Lu Z. PMC text mining subset in BioC: about 3 million full text articles and growing, Bioinformatics, btz070, 2019.
