I analyzed a random sample of 96,685 full-text research papers, uploaded to PubMed Central between the years 2016 and 2021, in order to answer the question:
How many references should you cite when writing a research article?
I used the BioC API to download the data (see the References section below).
Here’s a summary of the key findings:
1- The average number of references for a research paper is 45, with 90% of research papers having between 8 and 102 references. However, this number depends a lot on study design. For instance, a systematic review typically has 49 references while a case report has only 24.
2- As a rule of thumb, consider citing 1 reference for every 95 words (or 4 sentences).
3- Deeper research tends to show up in the reference list: high-quality articles (those published in high-impact journals) usually have about 5 more references than the median article.
How many references does a typical article have?
The histogram below shows that most research papers have between 25 and 50 references, and only a few exceed 100:
Because the distribution is right-skewed (a few papers have very long reference lists that pull the mean upward), the median number of references is a more reliable measure of the typical paper than the mean. Here are a few other numbers that summarize the data:
Statistic | Value |
---|---|
Sample Size | 96,685 research papers |
Mean | 45.07 references |
Minimum | 1 reference |
25th Percentile | 25 references |
50th Percentile (Median) | 39 references |
75th Percentile | 56 references |
Maximum | 911 references |
From this table we can conclude that:
The median research paper has 39 references, and 50% of papers have between 25 and 56 references. Across the whole sample, the number of references ranges from 1 to 911.
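For readers who want to produce the same kind of summary for their own collection of papers, here is a minimal sketch using only Python's standard library. The function name `summarize_reference_counts` and the toy data are hypothetical, not the code or data used for this article, and the percentile interpolation method is an assumption, since the table above does not say how the percentiles were computed.

```python
import statistics


def summarize_reference_counts(counts):
    """Return the statistics shown in the summary table for a list of reference counts."""
    # quantiles(n=4) returns the three quartile cut points (25th, 50th, 75th percentile).
    # method="inclusive" is an assumption; other interpolation methods give
    # slightly different values on small samples.
    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "sample_size": len(counts),
        "mean": statistics.mean(counts),
        "min": min(counts),
        "p25": q1,
        "median": median,
        "p75": q3,
        "max": max(counts),
    }


# Hypothetical toy data, NOT the article's dataset.
toy_counts = [1, 12, 25, 30, 39, 41, 56, 60, 97]
summary = summarize_reference_counts(toy_counts)
print(summary)
```

In this toy sample the single large value (97) pulls the mean above the median, which is the same right-tail effect described above.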
Next, let’s see if the number of references depends on the study design.
Should the study design influence the number of references you use?
The table below shows that:
1- Secondary study designs (systematic reviews and meta-analyses) have the highest number of references (median = 49), which is to be expected as these articles review a large body of information.
2- Experimental, quasi-experimental and analytical designs typically have between 35 and 39 references.
3- Descriptive designs (case reports and case series) have the lowest number of references (median ≈ 25), which also makes sense as these describe the clinical story of a single (or a few) patient(s) and generally have a very short literature review section.
Study Design | Article Count (Total: 16,321) | Median Number of References |
---|---|---|
Meta-Analysis | 1,952 | 49 |
Systematic Review | 884 | 49 |
Quasi-Experiment | 166 | 39 |
Cohort Study | 5,589 | 37 |
Randomized Controlled Trial | 1,137 | 37 |
Cross-Sectional Study | 3,811 | 36 |
Pilot Study | 794 | 36 |
Case-Control Study | 486 | 35 |
Case Series | 195 | 26 |
Case Report | 1,307 | 24 |
How often should you cite in a research paper?
Some journals may specify the maximum number of citations allowed. For instance, Nature allows articles to have at most 30 references in the main text [Source]. So make sure to check the author guidelines of the journal where you want to submit.
That being said, we often ask ourselves: am I relying too much on outside sources, or maybe too little? So I would argue that it is useful to know, for a given article length, how many references to cite.
If we measure the length of all the articles in our dataset combined and divide it by the total number of references, we get the following numbers:
On average, 1 reference is cited for every 95 words, which works out to 1 reference for every 4 sentences. In terms of paragraphs, an article has approximately 1.5 references per paragraph.
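The calculation described above (total length of all articles divided by total number of references) can be sketched as follows. This is an illustration under assumptions: the function names and the toy numbers are hypothetical, and only the figure of 95 words per reference comes from the analysis.

```python
def words_per_reference(word_counts, reference_counts):
    """Pooled ratio: total words across all articles divided by total references."""
    total_references = sum(reference_counts)
    if total_references == 0:
        raise ValueError("the sample contains no references")
    return sum(word_counts) / total_references


def estimate_references(word_count, words_per_ref=95):
    """Rule-of-thumb estimate: about 1 reference per 95 words (figure from the article)."""
    return round(word_count / words_per_ref)


# Hypothetical toy articles, NOT the article's dataset.
toy_word_counts = [1900, 3800, 5700]
toy_reference_counts = [20, 40, 60]

ratio = words_per_reference(toy_word_counts, toy_reference_counts)
estimate = estimate_references(4000)
print(ratio, estimate)
```

With the rule of thumb, a 4,000-word manuscript gives an estimate of 42 references. Treat this as a rough starting point rather than a target, since the ratio is an average over very different kinds of articles.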
Here’s a table that shows the median number of references cited for each word count category:
Article Size (Word Count) | Median Number of References |
---|---|
(1000, 2000] | 15 |
(2000, 3000] | 28 |
(3000, 4000] | 36 |
(4000, 5000] | 44 |
(5000, 6000] | 51 |
(6000, 7000] | 57 |
(7000, 8000] | 63 |
(8000, 9000] | 67 |
(9000, 10000] | 72 |
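The intervals in the table are half-open: (1000, 2000] excludes 1,000 words and includes 2,000. A minimal sketch of how articles could be grouped this way and summarized by their median is shown below. The function names and the toy data are hypothetical, and this is not necessarily how the table above was produced.

```python
import math
import statistics
from collections import defaultdict


def word_count_bin(word_count, width=1000):
    """Return the (low, high] interval of the given width that contains word_count."""
    # Rounding up places a boundary value such as 2000 in (1000, 2000],
    # not in (2000, 3000].
    high = math.ceil(word_count / width) * width
    return (high - width, high)


def median_references_by_bin(articles, width=1000):
    """articles: iterable of (word_count, reference_count) pairs."""
    grouped = defaultdict(list)
    for word_count, reference_count in articles:
        grouped[word_count_bin(word_count, width)].append(reference_count)
    # Sort by interval so the result reads like the table, shortest articles first.
    return {interval: statistics.median(refs) for interval, refs in sorted(grouped.items())}


# Hypothetical toy data, NOT the article's dataset.
toy_articles = [(1500, 14), (2000, 16), (2001, 27), (2900, 29), (2500, 31)]
medians_by_bin = median_references_by_bin(toy_articles)
print(medians_by_bin)
```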
Does using more references make your article better?
Hypothesis 1: Citing more sources is usually associated with more in-depth research, so we would expect high-quality articles to include a higher number of references.
Hypothesis 2: Some experts believe that a good writing habit is to keep the number of references to a minimum (see: Essentials of Writing Biomedical Research Papers by Mimi Zeiger), so according to this hypothesis, high-quality articles should have, on average, fewer references.
Let’s find out what researchers are doing in practice and which hypothesis our data support.
In order to answer the question, I collected the journal impact factor (JIF) for 71,579 articles and divided the dataset into 2 groups:
- research papers published in low impact journals (JIF ≤ 3): this subset consisted of 34,758 articles
- research papers published in high impact journals (JIF > 3): this subset consisted of 36,821 articles
After controlling for study design, the group with JIF ≤ 3 had a median number of references of 37, while the group with JIF > 3 had a median of 44.
Remember that the median article overall had 39 references (as we saw above), so based on these results, we can conclude that:
High-quality articles, in general, have about 5 more references than the median article. So a comprehensive literature review and a more in-depth discussion section can make the difference between a good and an excellent research article.
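The basic comparison behind this result can be sketched as below: split the articles at JIF = 3 and take the median number of references in each group. This is a simplified illustration with hypothetical names and toy data. In particular, it does not control for study design, because the method used for that adjustment is not described here.

```python
import statistics

JIF_THRESHOLD = 3  # cutoff between low and high impact journals used in the article


def median_references_by_impact(articles, threshold=JIF_THRESHOLD):
    """articles: iterable of (journal_impact_factor, reference_count) pairs."""
    low, high = [], []
    for jif, reference_count in articles:
        # JIF <= threshold counts as low impact, JIF > threshold as high impact.
        (high if jif > threshold else low).append(reference_count)
    return {
        "low_impact": statistics.median(low) if low else None,
        "high_impact": statistics.median(high) if high else None,
    }


# Hypothetical toy data, NOT the article's dataset.
toy_articles = [(1.2, 30), (2.8, 37), (3.0, 40), (3.1, 41), (5.4, 44), (9.9, 60)]
medians = median_references_by_impact(toy_articles)
print(medians)
```

A fuller analysis would repeat this comparison within each study design (or use a regression with study design as a covariate) so that, for example, a higher share of systematic reviews in one group does not drive the difference.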
References
- Comeau DC, Wei CH, Islamaj Doğan R, and Lu Z. PMC text mining subset in BioC: about 3 million full text articles and growing, Bioinformatics, btz070, 2019.