Ruminating…

Random thoughts about conducting scientific research, supervising students, and «toys» that make creative activities even more fun…

How many citations are actually a lot of citations?

In a previous blog post, I suggested to my younger colleagues that while they should not care so much about the impact factor of the journals they publish in (as long as those journals are well read in their respective fields of research), they should care quite a lot about their papers being cited, and cited by others, not just self-cited!

A few months ago, I was listening to the introductory talk for a prestigious award from our national organization when one statement hit me: a physicist with 2000 or more citations is part of the 1% most cited physicists worldwide. There might have been a bit more to that statement, but let's work with it.

In fact, if you search for highly cited research on Google or any of your favorite search engines, you will find that this question (and many related ones) is the subject of intense research in itself. It seems we humans like to be able to put a number and a rank on things. We also like to establish a hierarchy among things, even human beings. 😉

So according to Wikipedia, from January 2000 to February 2010, the top 1% of researchers in physics had about 2073 citations or more, i.e. slightly fewer than 210 citations per year over a 10-year period. That threshold changes with the field of research, of course, and can be much lower in other fields.

My first impression was that this seems low. However, as one digs deeper, one notices that most published manuscripts are not cited at all… Nature had a nice piece in 2014 about the TOP 100 papers. The infographic below is part of the article and is quite telling.

[Infographic: Nature's top 100 papers, 30.10.2014]

First, almost 44% of all published manuscripts are never cited. If you have even 1 citation for a manuscript, you are already (almost!) in the top half (top 55.8%). With 10 or more citations, your work is in the top 24% of the most cited work worldwide; this increases to the top 1.8% as you reach 100 or more citations. Main take-home message: the average number of citations per manuscript is clearly below 10!

I also found a great blog post by Scott Weingart who, by restricting his analysis to a single journal, Scientometrics, got very similar results: 50% of all papers published in that journal have fewer than 4 citations, and 70% fewer than 7.

Out of curiosity, I decided to look at my own publications. Two tools are available: Google Scholar and Thomson Reuters Web of Science. These data are presented in the figure below. As expected, the number of manuscripts that gather over 100 citations is small. In my case it hovers between 2% and 5% depending on the tool used; both give over 60% for 10 or more citations. In other words, 2 to 5 manuscripts out of 100 get to be in the top 1.8% most cited manuscripts, and 60 out of the same 100 are in the top quarter. Based on the discussion above, I suppose this is a good sign…

[Figure: fraction of my manuscripts by citation count, Google Scholar vs Web of Science]
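For readers who want to try this kind of self-analysis, here is a minimal sketch (the citation counts below are hypothetical, not my actual record) that buckets per-paper citation counts against the thresholds discussed above:

```python
# Fraction of one's papers clearing the thresholds from the Nature
# infographic (>= 10 citations ~ top 24% worldwide, >= 100 ~ top 1.8%).
citations = [312, 145, 88, 41, 23, 17, 12, 9, 6, 4, 2, 1, 0, 0]  # hypothetical counts

n = len(citations)
frac_10 = sum(c >= 10 for c in citations) / n
frac_100 = sum(c >= 100 for c in citations) / n

print(f"{frac_10:.0%} of papers have 10+ citations")   # 50%
print(f"{frac_100:.0%} of papers have 100+ citations")
```

Both Google Scholar and Web of Science let you export per-paper citation counts, so the list above can be replaced with your own record.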

Still in Nature, there was yet another piece about the TOP 1% in science. Here it is claimed that fewer than 1% of all researchers have published consistently every year between 1996 and 2011, but those few who have done so command a “market share” of (i.e. are authors on) 41% of all manuscripts published in the same period (original manuscript in PLOS ONE).

There is, of course, Thomson Reuters, which has its own version of the “World's Most Influential Scientific Minds” for a given year (e.g. 2014) and of highly cited researchers. Here “hot papers” are defined as being in the top 0.1% (you've read correctly) by citations for their field of research, and “influential researchers” are those having the largest numbers of these “hot papers” (generally 15+). Before you ask: no, I am not listed among the 3500 or so researchers described there…

Now, having said all that, further analysis of all of these “metrics” seems to indicate that there is only a weak correlation between the top 1% of highly cited researchers and Nobel Prize winners. As noted in the first Nature article cited above, many of the most highly cited papers are about tools or methods rather than the fundamental scientific discoveries made with these tools or methods…

Here is a quick one that came up in a discussion among friends: two researchers each have a paper with 200+ citations; for the purpose of this blog post, let's say exactly 200 citations. One is a paper published in what is considered a top journal, say Science or Nature. The other is a manuscript published in a low-impact-factor journal (5-year IF of 2). Which of these two papers has had:

  • The greatest impact on science?
  • On their field of research?

Looking forward to reading you!

21 thoughts on “How many citations are actually a lot of citations?”

Pingback: Towards systematically better returns from science funding – Tempus Fortium

Pingback: Can A Paper Garner Numerous Reads Despite Minimal Citations?

Thank you Luc 🙏 . Very interesting.

I have 23 citations on one of my papers. How can I know my percentile? If 10 citations put me in the top 24%, what about 23?

Pingback: How to read scientific papers quickly and effectively | Molekule Blog

Pingback: Publications and citations for EB2 NIW green card » EB2 NIW Info

So let's say I have 50 citations in Google Scholar as of Feb 23, 2022. What would my possible rank be considering published papers/patents/books? Here is the link https://scholar.google.com/citations?user=pO4oG1kAAAAJ&hl=en Thanks.

These statistics are on a per-paper basis using Web of Science (not Google Scholar). The 50 citations you are referring to are for all the documents under your name found by Google Scholar.

Pingback: What Is A Good Number Of Citations - Best College Portal

Pingback: How to read scientific papers quickly and effectively – My Blog

Where did you get the reference to this quote “10 or more citations, your work is now in the top 24% of the most cited work worldwide”?

Hi Carole, you can extract it from the infographic in the post or read the background materials cited.

The Web of Science Highly Cited Researchers are those who have a certain number of papers (not books) that are in the top 1% (not 0.1%) of all cited papers during that year (e.g. among papers from 2015). And those researchers are in the top 0.1% of all researchers/professors. So it's top 1% in terms of citations for the highly cited papers, and then top 0.1% in terms of all researchers (that latter number may be approximate). This is over a 10-year period. It's hard to achieve because you basically need papers that are getting about 50 Google Scholar citations per year, which usually means about 20-25 Web of Science citations (Web of Science is stricter), to land a paper in the top 1%. And you need about six of those during a given decade to be in the top 0.1% and get the Highly Cited Researcher award. Top 0.1% means in the top 6000 (out of about 6 million professors and other researchers worldwide).

Pingback: Analysis of publication impact in predatory-journal – Nature – Ruminating…

Interesting post and knowledge. Now I feel proud that a couple of my papers went beyond 55 citations.

Very interesting article. I was indeed looking for this information!

Pingback: Another year is over… – Ruminating…

Nice article Luc. A minor observation is that those highly cited papers tend to be slanted towards the biological sciences, which also tend to have a fairly high number of publications. Putting individual statistics aside, I wonder if citation count is loosely correlated with the number of sub-disciplines.

Being an ‘expert’ is remarkably different from having a high h-index, citation count, or other publication metrics; it has more to do with things like invited lectures and publications in reports, books, guidance documents, and the like, which may not be subject to indexing.

I wholeheartedly agree that, for younger researchers, quantity is important, without forsaking quality. And things like blog posts, comments, etc. all count! Later in one's academic career, one can be more picky about where to publish.

My experience is that shooting for numbers only is not the way to go. I am not saying that the number of publications is not important, since both for promotion and for obtaining grants you do need a certain number of them. However, if your publications are not cited, you can have many hundreds and it does not matter: you are having no impact on your field.

In grant committees, my experience is that beyond a few publications per year, it does not matter so much whether you have 18, 22, or 25 in the last 5 years. It will be held against you if you have only 2 or 3, however, unless those are 2-3 Science or Nature papers. (Of course, if you have 50+ all in good journals, you are not only a “middle” author on these, and you can show that your papers have impact, you will certainly get praised.)

I would also say that you have to pick carefully where you publish. And here you are right: your “forum” depends a lot on the path you are following: top research journals in your field vs. more applied (clinical, for us) journals vs. reports, and so on. It also depends, I guess, on whether you need to show social impact, academic impact, clinical impact…

There is no simple formula for sure 😉

Very interesting post. It seems like a small number (2000 being the 1% threshold) but I don't have any papers close to those citation numbers. A couple of years ago I posted a fairly simple analysis of my own citation record, because I thought it would be interesting to see how my papers performed relative to the impact factor of the journal they were in. Most did pretty well, getting more citations than the journal's impact factor would predict. Anyway, I thought that would be an interesting addition to your own self-analysis (though it's complicated by the fact that the impact factor of journals varies over time). Here's the post if you're interested: http://jasonya.com/wp/publication-impact-factor-analysis/

Thanks Jason!

American Psychological Association

Appropriate Level of Citation

The number of sources you cite in your paper depends on the purpose of your work. For most papers, cite one or two of the most representative sources for each key point. Literature review papers, however, typically include a more exhaustive list of references.

Provide appropriate credit to the source (e.g., by using an in-text citation) whenever you do the following:

  • paraphrase (i.e., state in your own words) the ideas of others
  • directly quote the words of others
  • refer to data or data sets
  • reprint or adapt a table or figure, even images from the internet that are free or licensed in the Creative Commons
  • reprint a long text passage or commercially copyrighted test item

Avoid both undercitation and overcitation. Undercitation can lead to plagiarism and/or self-plagiarism. Overcitation can be distracting and is unnecessary.

For example, it is considered overcitation to repeat the same citation in every sentence when the source and topic have not changed. Instead, when paraphrasing a key point in more than one sentence within a paragraph, cite the source in the first sentence in which it is relevant and do not repeat the citation in subsequent sentences as long as the source remains clear and unchanged.

Figure 8.1 in Chapter 8 of the Publication Manual provides an example of an appropriate level of citation.

Determining the appropriate level of citation is covered in the seventh edition APA Style manuals: the Publication Manual Section 8.1 and the Concise Guide Section 8.1.

Related handouts

  • In-Text Citation Checklist (PDF, 227KB)
  • Six Steps to Proper Citation (PDF, 112KB)

From the APA Style blog

How to cite your own translations

If you translate a passage from one language into another on your own in your paper, your translation is considered a paraphrase, not a direct quotation.

Key takeaways from the Psi Chi webinar So You Need to Write a Literature Review

This blog post describes key tasks in writing an effective literature review and provides strategies for approaching those tasks.

How to cite a work with a nonrecoverable source

In most cases, nonrecoverable sources such as personal emails, nonarchived social media livestreams (or deleted and unarchived social media posts), classroom lectures, unrecorded webinars or presentations, and intranet sources should be cited only in the text as personal communications.

The “outdated sources” myth

The “outdated sources” myth is that sources must have been published recently, such as the last 5 to 10 years. There is no timeliness requirement in APA Style.

From COVID-19 to demands for social justice: Citing contemporary sources for current events

The guidance in the seventh edition of the Publication Manual makes the process of citing contemporary sources found online easier than ever before.

Citing classical and religious works

A classical or religious work is cited as either a book or a webpage, depending on what version of the source you are using. This post includes details and examples.

Academic Writer—APA’s essential teaching resource for higher education instructors

Academic Writer’s advanced authoring technology and digital learning tools allow students to take a hands-on approach to learning the scholarly research and writing process.

APA Style webinar on citing works in text

Attend the webinar, “Citing Works in Text Using Seventh Edition APA Style,” on July 14, 2020, to learn the keys to accurately and consistently citing sources in APA Style.

University of Illinois Chicago

Measuring Your Impact: Impact Factor, Citation Analysis, and other Metrics: Citation Analysis

Citation Analysis

About Citation Analysis

What is Citation Analysis?

The process whereby the impact or "quality" of an article is assessed by counting the number of times other authors mention it in their work.

Citation analysis involves counting the number of times an article is cited by other works to measure the impact of a publication or author. The caveat, however, is that no single citation analysis tool collects all publications and their cited references. For a thorough analysis of the impact of an author or a publication, one needs to look in multiple databases to find all possible cited references. A number of resources are available at UIC that identify cited works, including: Web of Science, Scopus, Google Scholar, and other databases with limited citation data.

Citation Analysis - Why use it?

To find out how much impact a particular article or author has had, by showing which other authors cited the work within their own papers. The h-index is one specific method that uses citation analysis to determine an individual's impact.

Web of Science

Web of Science provides citation counts for articles indexed within it. It indexes over 10,000 journals in the arts, humanities, sciences, and social sciences.

  • Enter the name of the author in the top search box (e.g. Smith JT).
  • Select Author from the drop-down menu on the right.
  • To ensure accuracy for popular names, enter Univ Illinois in the middle search box, then select “Address” from the field drop-down menu on the right. (You might have to add the second search box by clicking "add another field" before you enter the address.)
  • Click on Search.
  • A list of publications by that author name will appear. To the right of each citation, the number of times the article has been cited will appear. Click the number next to "Times Cited" to view the articles that have cited your article.

Scopus provides citation counts for articles indexed within it (limited to articles written in 1996 and after). It indexes over 15,000 journals from over 4,000 international publishers across the disciplines.

  • Once in Scopus, click on the Author search tab.
  • Enter the name of the author in the search box.  If you are using initials for the first and/or middle name, be sure to enter periods after the initials (e.g. Smith J.T.). 
  • To ensure accuracy if it is a popular name, you may enter University of Illinois in the affiliation field.  
  • If more than one profile appears, click on your profile (or the profile of the person you are examining). 
  • Once you click on the author's profile, a list of the publications will appear, and to the right of each citation, the number of times the article has been cited will appear.
  • Click the number to view the articles that have cited your article

Dimensions (UIC does not subscribe, but parts are free to use)

  • Indexes over 28,000 journals
  • Does not display the h-index, but it can be calculated manually (or, for faculty, found in MyActivities)
  • Includes an Altmetric score

Google Scholar

Google Scholar provides citation counts for articles found within Google Scholar. Depending on the discipline and the cited article, it may find more cited references than Web of Science or Scopus because, overall, Google Scholar indexes more journals and more publication types than other databases. Google Scholar is not specific about what is included in its tool, but information is available on how Google obtains its content. Limiting searches to only publications by a specific author name is complicated in Google Scholar. Using Google Scholar Citations and creating your own profile will make it easy for you to create a list of your publications included in Google Scholar. Using your Google Scholar Citations account, you can see the citation counts for your publications and have GS calculate your h-index. (You can also search Google Scholar by author name and the title of an article to retrieve citation information for a specific article.)

  • Using your Google (Gmail) account, create a profile of all your articles captured in Google Scholar. Follow the prompts on the screen to set up your profile. Once complete, this will show all the times your articles have been cited by other documents in Google Scholar, and your h-index will be provided. It's your choice whether you make your profile public or private, but if you make it public, you can link to it from your own web pages.

Try Harzing's Publish or Perish tool to examine published works by a specific author more selectively.

Databases containing limited citation counts:

  • PubMed Central
  • Science Direct
  • SciFinder Scholar

About the H-index

The h-index is an index to quantify an individual's scientific research output (J.E. Hirsch). It attempts to measure both the scientific productivity and the apparent scientific impact of a scientist, and is based on the set of the researcher's most cited papers and the number of citations they have received in other people's publications (Wikipedia). A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np − h) papers have at most h citations each.
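The definition above translates directly into code. Here is a minimal sketch (the citation counts in the example are hypothetical):

```python
def h_index(citations):
    """Largest h such that h of the papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:  # the paper at this rank still "supports" h = rank
            h = rank
        else:
            break
    return h

print(h_index([25, 8, 5, 3, 3, 1, 0]))  # 3: three papers have 5+ citations, but not four with 4+
```

Sorting in descending order makes the definition easy to check rank by rank: h is the last rank at which the paper's citation count is still at least as large as its rank.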

Find your h-index at:

Below are instructions for obtaining your h-index from Web of Science, Scopus, and Google Scholar.

Web of Science provides citation counts for articles indexed within it. It indexes over 12,000 journals in the arts, humanities, sciences, and social sciences. To find an author's h-index in WOS:

  • To ensure accuracy for popular names, add an additional search box, enter "Univ Illinois", and select “Address” from the field drop-down menu on the right.
  • Click on Citation Report in the right-hand corner of the results page. The h-index is on the right of the screen.
  • If more than one profile appears, click on your profile (or the profile of the person you are examining). Under the Research section, you will see the h-index listed.
  • If you have worked at more than one place, your name may appear twice with 2 separate h-index ratings. Select the check box next to each relevant profile, and click Show Documents.

  Google Scholar

  • Create a Google Scholar profile as described above; once complete, your citation counts and h-index will be displayed on your profile page (see Albert Einstein's profile for an example).
  • Harzing’s Publish or Perish (POP) 
  • Publish or Perish searches Google Scholar. After searching by your name, deselect from the list of retrieved articles those that you did not author. Your h-index will appear at the top of the tool. Note: this tool must be downloaded before use.
  • Last Updated: Jun 14, 2024 1:10 PM
  • URL: https://researchguides.uic.edu/if

How Many References Should a Research Paper Have? Study of 96,685 Articles

I analyzed a random sample of 96,685 full-text research papers, uploaded to PubMed Central between the years 2016 and 2021, in order to answer the question:

How many references should you cite when writing a research article?

I used the BioC API to download the data (see the References section below).

Here’s a summary of the key findings

1- The average number of references for a research paper is 45, with 90% of research papers having between 8 and 102 references. However, this number depends a lot on study design. For instance, a systematic review typically has 49 references while a case report has only 24.

2- As a rule of thumb, consider citing 1 reference for every 95 words (or 4 sentences) .

3- The amount of research you do will be reflected in the number of references you cite, since high-quality articles usually have 5 more references than the median.

How many references does a typical article have?

The histogram below shows that most research papers have between 25 and 50 references, and only a few exceed 100:

[Histogram: most articles have between 0 and 100 references; only a few have more than 100.]

Because the distribution has a long right tail, the median number of references is a more reliable metric than the mean. Here are a few other numbers that summarize the data:

| Statistic                | Value                  |
|--------------------------|------------------------|
| Sample size              | 96,685 research papers |
| Mean                     | 45.07 references       |
| Minimum                  | 1 reference            |
| 25th percentile          | 25 references          |
| 50th percentile (median) | 39 references          |
| 75th percentile          | 56 references          |
| Maximum                  | 911 references         |

From this table we can conclude that:

The median research paper has 39 references, and 50% of papers have between 25 and 56 references. In this sample, articles had as few as 1 reference and as many as 911.

Next, let’s see if the number of references depends on the study design.

Should the study design influence the number of references you use?

The table below shows that:

1- Secondary study designs (systematic reviews and meta-analyses) have the highest number of references (median = 49), which is to be expected as these articles review a large body of information.

2- Experimental, quasi-experimental and analytical designs typically have between 35 and 39 references.

3- Descriptive designs (case reports and case series) have the lowest number of references (median ≈ 25), which also makes sense as these describe the clinical story of a single (or a few) patient(s) and generally have a very short literature review section.

| Study Design                | Article Count | Median References |
|-----------------------------|---------------|-------------------|
| Meta-Analysis               | 1,952         | 49                |
| Systematic Review           | 884           | 49                |
| Quasi-Experiment            | 166           | 39                |
| Cohort Study                | 5,589         | 37                |
| Randomized Controlled Trial | 1,137         | 37                |
| Cross-Sectional Study       | 3,811         | 36                |
| Pilot Study                 | 794           | 36                |
| Case-Control Study          | 486           | 35                |
| Case Series                 | 195           | 26                |
| Case Report                 | 1,307         | 24                |

(Total article count: 16,321.)

How often should you cite in a research paper?

Some journals may specify the maximum number of citations allowed. For instance, Nature allows articles to have at most 30 references in the main text [ Source ]. So make sure to check the authorship guidelines of the journal where you want to submit.

That being said, it is often the case that we ask ourselves: am I taking too much information from outside sources? Or maybe too little? So I would argue that it is useful to know, for a given article size, how many references to cite.

If we measure the length of all the articles in our dataset combined and divide it by the total number of references, we get the following numbers:

On average, 1 reference is cited for every 95 words, that is 1 reference for every 4 sentences. In terms of paragraphs, an article has approximately 1.5 references for each paragraph.

Here’s a table that shows the median number of references cited for each word count category:

| Article Size (Word Count) | Median References |
|---------------------------|-------------------|
| (1000, 2000]              | 15                |
| (2000, 3000]              | 28                |
| (3000, 4000]              | 36                |
| (4000, 5000]              | 44                |
| (5000, 6000]              | 51                |
| (6000, 7000]              | 57                |
| (7000, 8000]              | 63                |
| (8000, 9000]              | 67                |
| (9000, 10000]             | 72                |
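The rule of thumb stated earlier (roughly one reference per 95 words) can be sketched as a quick estimator. This is my own illustration, not part of the original analysis, and note that the simple per-word rule overshoots the per-bracket medians for longer articles, which cite less densely:

```python
def estimated_references(word_count, words_per_reference=95):
    """Rough rule of thumb: about one reference per 95 words of text."""
    return round(word_count / words_per_reference)

for words in (2000, 4000, 8000):
    print(f"{words} words -> ~{estimated_references(words)} references")
# 2000 words -> ~21, 4000 -> ~42, 8000 -> ~84 (vs. medians of 15, 36-44, and 63-67)
```

For a tighter estimate at a given length, reading the median straight off the table above is more faithful to the data than the per-word rule.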

Does using more references make your article better?

Hypothesis 1: Citing more sources is usually associated with more in-depth research; therefore, we would expect high-quality articles to include a higher number of references.

Hypothesis 2: Some experts believe that a good writing habit is to keep the number of references to a minimum (see: Essentials of Writing Biomedical Research Papers by Mimi Zeiger), so according to this hypothesis, high-quality articles should have, on average, fewer references.

Let’s find out what researchers are doing in practice and which hypothesis our data support.

In order to answer the question, I collected the journal impact factor (JIF) for 71,579 articles and divided the dataset into 2 groups:

  • research papers published in low impact journals (JIF ≤ 3): this subset consisted of 34,758 articles
  • research papers published in high impact journals (JIF > 3): this subset consisted of 36,821 articles

After controlling for study design, the group with JIF ≤ 3 had a median number of references of 37, while the group with JIF > 3 had a median of 44.

Remember that the median article overall had 39 references (as we saw above), so based on these results, we can conclude that:

High-quality articles, in general, have about 5 more references than the median article. So a comprehensive literature review and a more in-depth discussion section can make the difference between a good and an excellent research article.

References

  • Comeau DC, Wei CH, Islamaj Doğan R, and Lu Z. PMC text mining subset in BioC: about 3 million full text articles and growing, Bioinformatics, btz070, 2019.

Further reading

  • How Old Should References Be? Based on 3,823,919 Examples
  • Statistical Software Popularity in 40,582 Research Papers
  • Programming Languages Popularity in 12,086 Research Papers
  • Length of a Conclusion Section: Analysis of 47,810 Examples
  • How Long Should a Research Paper Be? Data from 61,519 Examples
University of Washington Libraries
Citing Sources: What are citations and why should I use them?

What is a citation?

Citations are a way of giving credit when certain material in your work came from another source. A citation also gives your readers the information necessary to find that source again; it provides an important roadmap to your research process. Whenever you use sources such as books, journals, or websites in your research, you must give credit to the original author by citing the source.

Why do researchers cite?

Scholarship is a conversation, and scholars use citations not only to give credit to original creators and thinkers, but also to add strength and authority to their own work. By citing their sources, scholars are placing their work in a specific context to show where they “fit” within the larger conversation. Citations are also a great way to leave a trail intended to help others who may want to explore the conversation or use the sources in their own work.

In short, citations

(1) give credit

(2) add strength and authority to your work

(3) place your work in a specific context

(4) leave a trail for other scholars

"Good citations should reveal your sources, not conceal them. They should honestly reflect the research you conducted." (Lipson 4)

Lipson, Charles. "Why Cite?"  Cite Right: A Quick Guide to Citation Styles--MLA, APA, Chicago, the Sciences, Professions, and More . Chicago: U of Chicago, 2006. Print.

What does a citation look like?

Different subject disciplines call for citation information to be written in a very specific order, with specific capitalization and punctuation. There are therefore many different style formats. Three popular citation formats are MLA Style (for humanities articles), and APA or Chicago (for social sciences articles).

MLA style (print journal article):  

Whisenant, Warren A. "How Women Have Fared as Interscholastic Athletic Administrators Since the Passage of Title IX." Sex Roles Vol. 49.3 (2003): 179-182.

APA style (print journal article):

Whisenant, W. A. (2003) How Women Have Fared as Interscholastic Athletic Administrators Since the Passage of Title IX. Sex Roles , 49 (3), 179-182.

Chicago style (print journal article):

Whisenant, Warren A. "How Women Have Fared as Interscholastic Athletic Administrators Since the Passage of Title IX." Sex Roles 49, no. 3 (2003): 179-182.

No matter which style you use, all citations require the same basic information:

  • Author or Creator
  • Container (e.g., Journal or magazine, website, edited book)
  • Date of creation or publication
  • Publisher 

You are most likely to have easy access to all of your citation information when you first find the source. Take note of this information up front, and it will be much easier to cite it effectively later.

  • Last Updated: May 1, 2024 12:48 PM
  • URL: https://guides.lib.uw.edu/research/citations

Wordvice

How Many References in a Research Paper?

Junior researchers frequently wonder how many references should be included in their research papers. The common response? “As many as you need.” What exactly does that mean? While we admit there are very few hard-set rules regarding this issue, in this article, we will try to provide more concrete guidelines that will help you assess whether you have enough references in your paper.

Before we do so, let us briefly explain why references matter and whether the number of references you include can project certain perceptions about the quality of your work. There is such a thing as having too many or too few.

Why are research paper references and citations necessary?

References show that you have carefully reviewed the relevant literature and are now contributing something novel to the academic community. You establish authority and credibility when you can critically assess other literature and distinguish your findings from previous works (if any exist). We emphasize "critically assess" in the last sentence because references are only as good as the use you make of them in your research. Therefore, the famous adage "quality over quantity" is the key to deciding how many references are sufficient.

Likewise, citing your references within the research paper itself (in the form of academic citations ) is crucial in any academic work that makes assertions based on external studies. Failing to cite your sources can result in plagiarism, which, even if accidental, can have devastating consequences for academic researchers hoping to publish their work or finish graduate school.

Number of Sources Used Can Impact Perceptions of Quality

We would be remiss if we didn’t tell you that being at either extreme (having too few or too many references) can reflect poorly on your intellectual aptitude and your study’s validity. Here’s why:

  • If you don’t have enough references, particularly on a topic familiar to a wide audience, readers may think that you haven’t done enough research into the existing literature. Surely someone else has thought about related topics or used similar techniques. If you were sloppy in your due diligence, readers will wonder whether your paper is worth reading: what is novel and valuable about it, and were you just as sloppy in conducting your study? The answers to these questions need to be evident.
  • Additionally, readers might be concerned that you may have plagiarized by failing to properly cite information. Unless you’re John Nash, who cited only two texts in his seminal  26-page PhD thesis  (one of which was to his prior work), ensure that you’ve properly researched the relevant papers and included appropriate citations! In particular, make sure that you have found, read, and included all the latest publications on your topic before finalizing and submitting your own paper; if the drafting process took some time, new literature might have come out in the meantime, and you don’t want to give the editor the impression that you are not on top of the newest developments.
  • If you have too many references, readers may wonder if you did any original research at all. Unless you’re writing a literature review, your paper’s primary focus should be on your investigation and findings. Don’t bury your hard work under strings of citations and discussion regarding other works. Show your readers what you’ve discovered and how the new information you present fits into or departs from the academic community’s current understanding of your topic.

Additionally, let us highlight the difference between the number of references versus citations. References are the source materials; therefore, each reference should be listed only once in your references section. Citations are meant to identify the source of the information you use in your paper. You can cite a reference multiple times. Therefore, the number of citations you have is typically larger than the number of references an average paper includes. The opposite situation should never happen!
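The arithmetic above (each source appears once in the reference list but may be cited many times, so the citation count is always at least the reference count) can be illustrated with a toy sketch; the citation keys below are invented for illustration:

```python
from collections import Counter

# Hypothetical in-text citation keys, in the order they appear in a paper.
citations = ["Smith2019", "Lee2021", "Smith2019", "Chen2020", "Smith2019", "Lee2021"]

# Each source is listed only once in the reference list, however often it is cited.
references = sorted(set(citations))
counts = Counter(citations)

print(f"{len(citations)} citations, {len(references)} references")
for ref in references:
    print(f"  {ref}: cited {counts[ref]}x")

# The citation count can never be smaller than the reference count.
assert len(citations) >= len(references)
```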

Key Factors Influencing the Number of References You Use

The following are some of the many factors that may influence the number of references you use:

  • The number of references required for a paper will depend largely on your work’s purpose . For example, literature and systematic reviews are surveys of existing studies. Therefore, their reference lists will be more exhaustive than those of research papers whose primary focus is the current authors’ findings. Indeed, if you examine many journals’ author guidelines , you’ll note that journals have a higher maximum reference limit for review articles than original research papers.
  • The length of your reference list will also depend on your research paper’s subject matter . For example, if you are writing about a field that is less studied (such as a subfield of neuroparasitology) you may discover that there aren’t many papers to cite. Similarly, newer fields will have fewer published papers that can be referenced. If you find yourself in this situation, review the references used by relevant current literature and see if you can expand your research, and thus your reference list, with valuable content from there.
  • Another factor will be your  institution or journal’s requirements . If you are preparing a dissertation or thesis, double-check your department’s requirements. While rare, they may have specific limits. More commonly, journals restrict the number of references due to printing constraints.
  • It may happen that you don’t have  access to certain literature  that could have served as a reference. In such a situation, you may wish to look for an institution that may be able to provide you access to that literature for the purposes of reviewing the content or contact one of the authors directly and ask for a copy.
  • Given that more papers are being published than ever before in most fields, it is likely that reference lists will grow longer simply because there are more data and discussions of existing data available to cite . Keep track of changes to the size of reference lists in publications related to your field.
  • Finally, a paper’s length bears some correlation with the number of references.

So how many references should be included?

Below, we provide tips on how to decide if you have enough resources. We also provide some general reminders on how to effectively use references. After all, references are meant to enhance your paper while still maintaining your research as the focal point.

Use academic journals as a guide

  • One way to gauge how many references you should have is to survey academic journals for your article type in your field. Review their author guidelines for limits on the number of references for your article type, and make sure your reference list complies with those journal restrictions.
  • Read recent articles relevant to your topic; check how many references other authors have included in their papers for the same article type as yours, and how frequently those works were cited per page.
  • Keep in mind that the above methods will give you an estimate of how many references you should include overall but will not tell you how many citations you’ll need per page. The latter is impossible to state simply because certain sections may have no citations at all (the results section , for example).

Statistics regarding the number of references and citations

To give you a general idea, the following are some estimates from a couple of studies that examined the citation characteristics of articles published in various disciplines.

According to  Milojević’s study  encompassing research in the fields of astronomy, mathematics, robotics, ecology, and economics, the highest and average number of references per article page were as follows:

  • Ecology: highest, ~58; average references per page, 6;
  • Math and robotics: highest, ~28; average references per page, <1; and
  • Economics: highest, ~32; average references per page, >1 but <2.

The above findings were based on data from the first 20 years of the period the author examined, and reference counts in some fields have increased since then. Thus, make sure to examine your target journal’s most recent and relevant publications for a better idea of how many references to include for the specific type of article you plan to write.

In another study by Falagas et al. (2013),  medical journals averaged  29 references for articles that were 7.88 pages long (as printed in journals).

Finally, although the sample size was small (63 journals), Gali Halevi observed the following citation trends across a broader range of disciplines.

  • The average number of references per article was highest for the social sciences, physics and astronomy, and arts & humanities (roughly 54 references per article).
  • On the other hand, health professions and earth and planetary sciences had the fewest references per article at an average of 8 and 17 references, respectively.
  • Math and engineering averaged roughly 29 references per article.
  • Biochemistry, genetics, molecular biology, and the other biological sciences averaged 51.
  • Hard and natural sciences more frequently cited recent literature while social sciences and math were likely to include older sources.

Note that the Halevi study is limited in size, does not factor in article type, and does little to account for variance across different fields and journals. For example, more review articles may have been sampled for some fields than for others. With that said, the figures above offer a rough benchmark.

At the end of the day, please keep in mind the requirements of your institution or target journal and the general trends for your specific article type (by examining the most recent relevant publications).

For additional information regarding journal restrictions on the number of references, read this article on ways to grow your publication list .

Some Dos and Don’ts for Reference Citation

  • Don’t repeat references within a reference list.
  • Don’t repeatedly cite yourself. Make sure to balance your discussion with external literature citations.
  • Be careful about citing old references. The rule of thumb is to go back at most five to six years. Exceptions to this rule should be reserved for “seminal” works relevant to explaining what prompted your research. Roughly 85% of all cited works should be less than five years old.
  • Be careful not to cite several references in one place without discussing the relevance of each work to your research. In other words, don’t say, “We referred to previous studies in this field (1-7)” unless you later explain how each of references 1-7 applies to your discussion.
  • Confirm the quality of the work you cite. Are there any ethical issues regarding the paper that would disqualify it as a good source? Do your references come from reputable sources such as respected journals rather than random blogs and website links? Remember that your analysis is only as good as the verifiable information you use to conduct your research.
  • One of the main purposes of citing existing literature is to show the “knowledge gap” regarding your topic. Therefore, make sure the works you reference naturally lead readers to wonder about the research question you address in your paper. To explain further, think about your favorite fictional story. A successfully written story only reveals the background information needed for the reader to follow along in the story. You’ll rarely see an author waste time writing about how the main character stubbed his toe one day while going to work unless that event relates to an important aspect of the story. Similarly, the references you cite should support the story building you create in your research paper.
  • Don’t completely ignore the paper that could disprove your hypothesis. You want to show objectivity and that you took a balanced and unbiased approach to conducting your research. Mention the potentially conflicting evidence and explain why you believe it is flawed or inapplicable to your research.
  • In qualitative research papers, you may have fewer references.
  • Anything you cite in your paper should be listed in the references section (or reference list). Anything listed as a reference should have been quoted or paraphrased in the text. If either rule is violated, something is wrong.
  • Finally, remember that a paper will typically have more citations in the Introduction section and Discussion section than in other parts.

Wordvice Academic Resources

If you need help with paraphrasing text from the sources you cite to avoid plagiarism, with different citation styles , or with finding the perfect journal to submit your paper to, then have a look at our collection of articles on the Wordvice academic resources website . We also recommend our professional English editing and proofreading services , including paper editing and academic editing services to ensure that your writing is free of errors before submitting your manuscript to a journal.

Becker Medical Library
Quantifying the Impact of My Publications: Citations


What are Citations?

A citation is a reference to a specific work cited in another work. In most instances, the work is a scholarly work such as a peer-reviewed journal article.

Databases and Resources for Citations

  • CINAHL CINAHL offers a Cited References feature that provides a list of citations that cite a particular publication in the CINAHL database.
  • Google Scholar Google Scholar allows for searching for a particular work or author and retrieves results from a variety of sources, such as peer-reviewed papers, theses, books, meeting abstracts, websites, gray literature, PPTs, and other formats. It also allows users to search within cited works: select the "Cited by" link and then the "Search within articles citing...XYZ work" checkbox to discover the impact of a work. A related feature, Google Scholar Citations, allows users to create a personal publication profile and generate citation reports. See the Establishing Your Author Name tab.
  • PsycINFO PsycINFO offers a Cited Reference tool that allows for location of cited works indexed in PsycINFO. Use the Cited References feature to find a work, then select Find Citing Articles.
  • Science Direct ScienceDirect is a database that contains over 25% of the world’s full-text and bibliographic information in science, technology, and medicine (journals and books), published by Elsevier and other publishers. ScienceDirect offers a search feature that displays cited references (journal articles or books) for a specific author or publication.
  • SCOPUS SCOPUS provides evidence of collaboration with other scientists and organizations, publication productivity, cited-by works, the h-index and also includes tools for creating citation graphs and Wordle maps. Scopus also offers a Citation Tracker feature that allows for a listing of articles, web sources and patents that cite a particular publication since 1996. The Citation Tracker feature also allows for generation of a Citation Overview chart and for removal of self-citations from the overall citation counts.
  • Web of Science Web of Science provides evidence of collaboration with other scientists and institutions, publication productivity, cited-by works, the h-index and also includes tools for creating citation maps. A Cited Reference Search will provide the number of times a particular publication has been cited in the Web of Science database since its publication. According to Clarivate Analytics, a Cited Reference Search will “. . . discover how a known idea or innovation has been confirmed, applied, improved, extended, or corrected.”
  • PubMed Central PMC offers a feature that allows for location of citing works in PMC. From the record page for a work, use the Cited By Other Articles in PMC feature to find other citing works in PMC.
  • Publish or Perish Publish or Perish is a software program that retrieves and analyzes academic citations. It uses Google Scholar to obtain the raw citations, then analyzes these and presents a variety of statistics.
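Several of the tools above (Scopus, Web of Science, Publish or Perish) report the h-index: the largest h such that an author has h publications with at least h citations each. A minimal sketch of the computation, with invented citation counts:

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites >= rank:
            h = rank  # the rank-th most-cited paper still has >= rank citations
        else:
            break
    return h

# Five papers cited 10, 8, 5, 4, and 3 times give an h-index of 4:
print(h_index([10, 8, 5, 4, 3]))  # 4
```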

What is Citation Analysis?

Citation analysis is the examination of the works that cite an individual work (or a group of works). It provides contextual understanding of the nature, purpose, and motivation of the citing authors, and it is a traditional tool for measuring impact using publication data. Some characteristics of citing works that can be used for narrative purposes include:

  • affiliations represented by the authors of the citing works (academia, industry, government, etc.)
  • countries represented by the authors of the citing works
  • languages of the citing works
  • subject focus or discipline of the citing works
  • document types of citing works

The inherent assumption is that significant publications will demonstrate a high citation count. In some instances, citation analysis may not provide the entire story of the impact of a research project. Issues are:

  • Do numbers tell a story?
  • Is citation analysis meaningful for non-academia audiences such as the public, policy-makers, clinical trial participants, healthcare providers, or grant funding agencies?
  • Are citations indicative or predictive of translational or clinical applications?
  • Do citations provide a full narrative of meaningful health outcomes?

How Do My Citations Rank?

There are two resources for determining how the number of citations to a work compares to other cited works.

Elsevier Scopus provides traditional citation counts, the Field Weighted Citation Impact (FWCI) and benchmarking.

  • Number of times a work has been cited.
  • The FWCI is the ratio of the article's citations to the average number of citations received by all similar articles over a three-year window. Each discipline makes an equal contribution to the metric, which eliminates differences in researcher citation behavior. A FWCI greater than 1.00 means the article is more cited than expected according to the average.
  • Citation benchmarking shows how citations received by this article compare with the average for similar articles. 99th percentile is high, and indicates an article in the top 1% globally. It takes into account the year of publication, document type and disciplines associated with its source.
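As a sketch of the FWCI arithmetic described above (the real metric uses Scopus's own pools of similar articles; the numbers here are invented for illustration):

```python
def fwci(article_citations, peer_citation_counts):
    """Field-Weighted Citation Impact: an article's citations divided by the
    average citations of comparable articles (same field, year, document type)."""
    expected = sum(peer_citation_counts) / len(peer_citation_counts)
    return article_citations / expected

# An article with 30 citations whose peers average 20 is cited
# 1.5x as often as expected for its field (FWCI > 1.00):
print(fwci(30, [15, 20, 25]))  # 1.5
```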

Clarivate Analytics Essential Science Indicators (ESI) is another resource. ESI is a compilation of science performance statistics and science trends data using data from Clarivate Analytics Web of Science . ESI provides Field Baseline tables based on specific research fields for citation analysis: Citation Rates and Percentiles . Each table contains citation data for the past ten years.

As an example, the Citation Rates Table displays data on the average citation rates of papers within the scientific fields over each of the past 10 years. The calculation is number of citations / number of papers , where papers is defined as regular scientific articles, review articles, proceedings papers, and research notes. The Percentiles Table displays levels of citation activity. The larger the minimum number of citations, the smaller the peer group.
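A minimal sketch of the two table calculations just described, using invented citation counts for ten papers in one field and year:

```python
# Invented citation counts for ten papers in one field and year.
citation_counts = [0, 0, 1, 2, 3, 5, 8, 13, 40, 128]

# Citation Rates table entry: number of citations / number of papers.
avg_rate = sum(citation_counts) / len(citation_counts)
print(f"average citations per paper: {avg_rate:.1f}")  # 20.0

# Percentiles table entry: the minimum citation count needed to be in the
# top 10% -- the larger this threshold, the smaller the peer group.
top_decile = sorted(citation_counts, reverse=True)[: len(citation_counts) // 10]
print(f"top-10% threshold: {min(top_decile)}")  # 128
```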

  • Last Updated: Jun 24, 2024 7:38 AM
  • URL: https://beckerguides.wustl.edu/impactofpublications
 Research in International Management  

How many references is enough?

Some reflections on why more references isn't always better, but how strategic referencing might help

Anne-Wil Harzing - Sat 30 May 2020 10:40 (updated Sat 15 Apr 2023 07:23)

Like most senior academics, I review a lot of papers, both for journals where I am on the editorial board and for my junior colleagues in the context of my staff development work at Middlesex . Many of these papers have sentences followed (or interrupted in the middle) by a long, sometimes very long, string of references. Most of these papers have well over 100 references in total, which might even lead the paper to be inaccurately classified as a review article by the Web of Science . So I started wondering: how many references is enough?

More isn't always better

My "rule of thumb" has always been to use a maximum of three references to support a particular statement. The role of a literature review is to provide a targeted review of the literature.  In my view, there are several reasons why it is wise not to use too many references:

  • It really disturbs the flow of the paper.
  • It may provide an implicit signal that your work isn't very interesting or important, as so much has already been published in the field. This means you need to work harder to convince the reviewers you are making a unique contribution.
  • It shows that you may not be able to distinguish what the key works in a particular field are and thus are not really that well versed in the literature.
  • It does make you look like a bit of a student, as having lots of references is quite typical of PhD theses. It might therefore lead the reviewer to think you are an immature academic writer, which is something you want to avoid.
  • The more references you use (beyond the ones that are really needed to substantiate your argument), the higher the risk that one of the reviewers knows the work you cite better than you do and disagrees that this work supports the statement(s) you are making. This might lead the reviewers to be more critical of the rest of the paper as well.

The other side of the coin

On the other hand, there are also good reasons to not be stingy with references.

  • You need to demonstrate your thorough knowledge of the field by citing enough of the key references. If the field has been very active, this might mean referencing quite a lot of studies.
  • You need to cite the "key people" in a particular field for both intrinsic (this is what is expected in a literature review) and extrinsic reasons (they might be your reviewers and might be annoyed if you have missed their work). However, unless they are intrinsically important, don't "slip in" some references to people you think might be your reviewers by adding them to a block of other references. I have lost count of the number of times a reference to my work really had nothing to do with the content (see also Are referencing errors undermining our scholarship and credibility? ).
  • You need to show you are part of the journal "conversation" (see also Why does my paper get a desk-reject time and again? ). Again, do this only when the references are intrinsically important. Editors are not stupid; they can easily spot that you have added references to their journal at the last minute after being rejected at your preferred journal.

General recommendations

As always, looking at your target journal will give you a better feel for the preferences in your sub-discipline (for details of this principle see the slides here: Middlesex University Summer 2019 writing boot-camp ). When doing a final read of the paper, you might also want to consider deleting references that you cite only once and only together with a lot of others; these are clearly not essential to your paper. For further guidelines, refer to these slightly edited responses to the question "How many references do I need?" at https://academia.stackexchange.com/ (a very useful site). For general guidance on when to use references, see one of my first white papers: Writing coursework assignments

Answer 1: The short and somewhat unsatisfactory answer is: enough

There is no formal limit, but obviously too many becomes impractical. If you could cite a huge number of references for a single statement (sentence), it is normal to pick one or possibly a few, using a format indicating that these references are just examples: "(e.g., Smith et al., 1943; Turner and Anthony, 1963)", or "[some statement] by, for example, Smith et al. (1943) and Turner and Anthony (1963)." Exactly when it is reasonable to show examples and when one actually has to show all references is a matter of context. If, for example, you have a series of references that together build up some matter, where none is more important than the others and none summarizes the others, it could be necessary to list them all regardless of how many there are. I suggest you look at a number of papers of a similar type (literature review) to the one you are writing to see how others handle such instances. [...] The main point of this is to know when it is sufficient to list only (good) examples rather than all possible references. This is of course a matter of training: learning to assess when which format is appropriate, and whether papers simply duplicate each other (from whatever viewpoint you reference them) or each contribute something unique that merits a reference.

Answer 2: Don't over do it

A literature review in an article is meant as a general reference, so the reader can get "up to speed" on the state of the art of the topic under discussion. In your thesis, you have to show that you are able to search the literature, that you understand it, and that you can extract the important information. If you cite every single article, you are not accomplishing any of these. On the one hand, the reader will not know which are the most relevant articles for your work. On the other hand, anyone can gather all the articles published in a subfield in the last couple of years and write a sentence about each, based on the abstract and the figures, in just a few days. In short, show that you have comprehended the literature by finding the most informative subset of articles.

Want to know more about referencing?

I can highly recommend this article by Mathieu Mandard in European Management Review, who distinguishes four key motives for referencing: epistemic, rhetoric, symbolic, and economic.

  • Mandard, M. (2021). On the shoulders of giants? Motives to cite in management research. European Management Review , DOI: 10.1111/emre.12495

Related blogposts

  • The four P's of getting published
  • The four C's of getting cited
  • How to write for US journals with non-US data
  • Useful resources when preparing for journal submission
  • Why does my paper get a desk-reject time and again?
  • Strange journal invitations popping up in my inbox every day
  • Are referencing errors undermining our scholarship and credibility?
  • Last impressions count too! The importance of conclusions

Find the resources on my website useful?

I cover all the expenses of operating my website privately. If you enjoyed this post and want to support me in maintaining my website, consider buying a copy of one of my books (see below) or  supporting the Publish or Perish software .

Copyright © 2023 Anne-Wil Harzing . All rights reserved. Page last modified on Sat 15 Apr 2023 07:23


Research Paper Summary: How much should be citations versus original thought?

I'm finalizing a single-paragraph summary of a multi-page research paper for school, and I was concerned when I noticed that most of my summary was in my own words. I mean that most of what I wrote was not quoting the original paper's author but rather summarizing and re-forming the paper's ideas myself. Of course, I included quotations with proper citations and references, but those only accounted for maybe 10%-20% of the summary.

Should a summary of a research paper include mostly citations and paraphrasing from the original author or rather be summarized in the words of the one writing the summary with citations here and there?

  • academic-writing


3 Answers

When I was a graduate student teaching undergraduates how to write research papers, the real problem was over-quoting. Students would quote or paraphrase large amounts of other people's work and not do much original writing; their papers were more a matter of stringing quotes together into something more or less coherent.

A (good) research paper is analysis. You've read other people's work and use it to inform your discussion. Depending on the field and the wishes of your teacher, you include a small or large number of quotes and ideas (properly cited), but you're still explaining it in your own words.

A summary of a research paper is a condensation of this idea. As the one grading papers, I would always prefer a well-informed discussion in the author's own words, peppered with other people's quotes and ideas as needed. While none of the papers I wrote or graded had summaries that were separate, they all had introductions and conclusions. These are the parts most in the author's words. The body of the paper is the place to set out the arguments, which require citations.

Show your teacher that you understand the research you did by pulling it together into an original paper.

My rule of thumb has always been, don't quote anything unless you are going to comment on the quotation. If all you are doing is citing an information source to support an assertion, make the assertion in your own words and cite the source in a footnote.

But a summary is not about proving or supporting anything. It is a statement of what the argument of the paper is. It is the job of the rest of the paper to support the conclusions. It is the job of the summary to briefly sketch the main argument and conclusion so that a reader can quickly tell if the paper is relevant to their own work. I can't see why you would quote or footnote anything in a summary.

In writing anything, you always have to keep in mind what the reader's purpose is in reading it. The reader's purpose in reading a research paper is twofold: to discover its conclusions, and to test its data and method to determine whether to accept its conclusions. For this second part, they need access to the research material via quotations or footnotes.

The reader's purpose in reading a summary of a research paper is simply and solely to determine if the paper is relevant to their work and therefore worth their time. They are not, at this stage, questioning its conclusions, data, or method, only its relevance. So they don't need access to the supporting evidence, and so they don't need quotations or footnotes.

All that said, different institutions have different standards, not all of which make actual sense, but all of which you have to follow if you write for that institution. So look up what the standards and practices are for the institution you are writing for and follow them.

Poetry and Precision

When I taught ENGL 100 and Technical Communication, I advised students to quote only for Poetry or Precision: if there's no other way to say the sentence without losing details (precision), or the source says something in an especially apt way (poetry), then quote. Also, quote as little as possible -- aim for an integrated quote, quoting only the specific phrase or sentence needed.

I rather like the UNC Writing Center's handouts on working with sources: https://writingcenter.unc.edu/tips-and-tools/ Some strong ones are Quoting (includes verbs of attribution) and the How We Cite/Why We Cite videos.

The graphic here on Bloom's Taxonomy can also give an idea of why a lot of summary/paraphrase is a good idea: it shows that you're not just finding and regurgitating information, but truly comprehending it and becoming ready to analyze/evaluate/synthesize it into a product of your own mind.


Finding How Many Times an Article Has Been Cited, and by Which Articles (aka Cited Reference Searching)

Several UMGC Library databases and other resources can be used to find how many times a particular article has been cited and to find a list of the documents that have cited the article. These resources include Scopus, Emerald Insight, Google Scholar, ProQuest databases, and EBSCO databases.

Scroll down the page to learn how to find citation information in each of these resources.

Scopus

Scopus's most complete citation information is for documents published during or after 1970.

To use Scopus to find how many documents have cited a given article, enter information about the article into the database, as shown in the screen shot below. It's generally sufficient to enter just the first few words of an article's title -- which should be put between quotation marks so that the words will be searched for as a phrase -- into the search box and to then select the Article Title option from the field selection menu next to the search box. You may also want to click on the + icon next to the field selection menu to generate a new line with a new search box into which you can enter the first author's last name and then select the First Author option. (In this case, the article of interest is Robert M. Grant's 1996 article from Strategic Management Journal entitled "Toward a knowledge-based theory of the firm.")


The number in the "Cited by" column shows the number of times that the article has been cited by other documents that are indexed by Scopus. In this case, when the screen shot below was taken, it showed that Grant's article had been cited by 8,494 documents indexed by Scopus:


The number of citing documents is a hyperlink, so you can click on the 8494 to see a list of all 8,494 documents that have cited Grant's article.

Emerald Insight

To use Emerald Insight to find how many documents have cited a given article, go to the Advanced Search page and enter information about the article into the database, as shown in the screen shot below. It's generally sufficient to enter just the first few words of an article's title -- which should be put between quotation marks so that the words will be searched for as a phrase -- into one search box, selecting Title from the field selection menu next to the search box. You may also want to click on the "Add Row" button to generate a new line with a new search box into which you can enter an author's last name -- which should also be put between quotation marks if the last name consists of two or more words -- and select Contributor from the field selection menu next to the search box. (In this case, the article of interest is Jeroen P. J. de Jong and Deanne N. Den Hartog's 2007 article from European Journal of Innovation Management entitled "How leaders influence employees' innovative behaviour.")


From the search results page, click on the ALTMETRICS link for the article.


You will then be taken to a page that will show how much attention the article has received on sources such as Twitter, Facebook, etc. Click on the Dimensions citations tab to find how many publications cite the article.


The View all 426 publications that cite this research output on Dimensions sentence is a hyperlink, so you can click on the sentence to see a list of all 426 documents that have cited de Jong and Den Hartog's article.

Google Scholar

To use Google Scholar to find how many documents have cited a given article, enter information about the article into the search engine's search box, as shown in the screen shot below. It's generally sufficient to enter just the first few words of an article's title -- which should be put between quotation marks so that the words will be searched for as a phrase -- along with an author's last name. Google uses an implied AND, so there's no need to put AND between an article's title and an author's last name. (In this case, the article of interest is Sara L. Rynes, Barry Gerhart, and Kathleen A. Minette's 2004 article from Human Resource Management entitled "The importance of pay in employee motivation: Discrepancies between what people say and what they do.")


The number of citations for the article will be shown below the article's information on the search results page. In this case, when the screen shot was taken, Google Scholar indicated that the article had been cited by 451 documents that are indexed by Google Scholar:


The number of citing documents is a hyperlink, so you can click on the Cited by 451 to see a list of all 451 documents that have cited Rynes, Gerhart, and Minette's article.

ProQuest databases (e.g., ABI/INFORM Collection)

To use ProQuest databases to find how many documents have cited a given article, enter information about the article into a ProQuest database (e.g., ABI/INFORM Collection ), as shown in the screen shot below. It's generally sufficient to enter just the first few words of an article's title -- which should be put between quotation marks so that the words will be searched for as a phrase -- into one search box, selecting the Document title - TI option from the field selection menu next to the search box. You may also want to enter an author's last name in another search box and select the Author - AU option from the field selection menu next to the search box. (In this case, the article of interest is Fariza H. Rusly, James L. Corner, and Peter Sun's 2012 article from Journal of Knowledge Management entitled "Positioning change readiness in knowledge management research.")


The number of citations for the article will be shown below the article's information on the search results page. In this case, when the screen shot was taken, ABI/INFORM Collection indicated that the article had been cited by 15 documents that are indexed by the database:


The number of citing documents is a hyperlink, so you can click on the Cited by (15) link to see a list of all 15 documents that have cited Rusly, Corner, and Sun's article.

The References entry indicates how many articles Rusly, Corner, and Sun's article cites (i.e., how many entries are on their own article's reference list).

EBSCO databases (e.g., Business Source Complete, PsycINFO, etc.)

Certain EBSCO databases, including Business Source Complete and PsycINFO , provide citation information for articles. To use those databases to find how many documents have cited a given article, enter information about the article into the database, as shown in the screen shot below. It's generally sufficient to enter just the first few words of an article's title -- which should be put between quotation marks so that the words will be searched for as a phrase -- into one search box, selecting the TI Title option. You may also want to enter an author's last name in another search box and select the AU Author option. (In this case, the article of interest is Dan Wheatley's 2012 article from New Technology, Work & Employment entitled "Good to be home? Time-use and satisfaction levels among home-based teleworkers.")


The number of citations for the article will be shown on the search results page. In this case, when the screen shot was taken, Business Source Complete indicated that the article had been cited by 1 document indexed by the database:


The number of citing documents is a hyperlink, so you can click on the Times Cited in this Database (1) link to see the 1 document that has cited Wheatley's article.

The Cited References entry indicates how many articles Wheatley's article cites (i.e., how many entries are on his own article's reference list).

If you have any questions about this information, please use the UMGC Library's Ask a Librarian service to receive assistance.

Frequently asked questions

How many quotes should I use?

In scientific subjects, the information itself is more important than how it was expressed, so quoting should generally be kept to a minimum. In the arts and humanities, however, well-chosen quotes are often essential to a good paper.

In social sciences, it varies. If your research is mainly quantitative, you won’t include many quotes, but if it’s more qualitative, you may need to quote from the data you collected.

As a general guideline, quotes should take up no more than 5–10% of your paper. If in doubt, check with your instructor or supervisor how much quoting is appropriate in your field.

Frequently asked questions: Citing sources

A scientific citation style is a system of source citation that is used in scientific disciplines. Some commonly used scientific citation styles are:

  • Chicago author-date, CSE, and Harvard, used across various sciences
  • ACS, used in chemistry
  • AMA, NLM, and Vancouver, used in medicine and related disciplines
  • AAA, APA, and ASA, commonly used in the social sciences

There are many different citation styles used across different academic disciplines, but they fall into three basic approaches to citation:

  • Parenthetical citations: Including identifying details of the source in parentheses, usually the author’s last name and the publication date, plus a page number if available (author-date). The publication date is occasionally omitted (author-page).
  • Numerical citations: Including a number in brackets or superscript, corresponding to an entry in your numbered reference list.
  • Note citations: Including a full citation in a footnote or endnote, which is indicated in the text with a superscript number or symbol.

A source annotation in an annotated bibliography fulfills a similar purpose to an abstract: they’re both intended to summarize the approach and key points of a source.

However, an annotation may also evaluate the source, discussing the validity and effectiveness of its arguments. Even if your annotation is purely descriptive, you may have a different perspective on the source from the author and highlight different key points.

You should never just copy text from the abstract for your annotation, as doing so constitutes plagiarism.

Most academics agree that you shouldn’t cite Wikipedia as a source in your academic writing, and universities often have rules against doing so.

This is partly because of concerns about its reliability, and partly because it’s a tertiary source. Tertiary sources are things like encyclopedias and databases that collect information from other sources rather than presenting their own evidence or analysis. Usually, only primary and secondary sources are cited in academic papers.

A Wikipedia citation usually includes the title of the article, “Wikipedia” and/or “Wikimedia Foundation,” the date the article was last updated, and the URL.

In APA Style, you’ll give the URL of the current revision of the article so that you’re sure the reader accesses the same version as you.

There’s some disagreement about whether Wikipedia can be considered a reliable source. Because it can be edited by anyone, many people argue that it’s easy for misleading information to be added to an article without the reader knowing.

Others argue that because Wikipedia articles cite their sources, and because they are worked on by so many editors, misinformation is generally removed quickly.

However, most universities state that you shouldn’t cite Wikipedia in your writing.

Hanging indents are used in reference lists in various citation styles to allow the reader to easily distinguish between entries.

You should apply a hanging indent to your reference entries in APA, MLA, and Chicago style.

A hanging indent is used to indent all lines of a paragraph except the first.

When you create a hanging indent, the first line of the paragraph starts at the border. Each subsequent line is indented 0.5 inches (1.27 cm).

APA and MLA style both use parenthetical in-text citations to cite sources and include a full list of references at the end, but they differ in other ways:

  • APA in-text citations include the author name, date, and page number (Taylor, 2018, p. 23), while MLA in-text citations include only the author name and page number (Taylor 23).
  • The APA reference list is titled “References,” while MLA’s version is called “Works Cited.”
  • The reference entries differ in terms of formatting and order of information.
  • APA requires a title page, while MLA requires a header instead.

A parenthetical citation in Chicago author-date style includes the author’s last name, the publication date, and, if applicable, the relevant page number or page range in parentheses. Include a comma after the year, but not after the author’s name.

For example: (Swan 2003, 6)

To automatically generate accurate Chicago references, you can use Scribbr’s free Chicago reference generator.

APA Style distinguishes between parenthetical and narrative citations.

In parenthetical citations, you include all relevant source information in parentheses at the end of the sentence or clause: “Parts of the human body reflect the principles of tensegrity (Levin, 2002).”

In narrative citations, you include the author’s name in the text itself, followed by the publication date in parentheses: “Levin (2002) argues that parts of the human body reflect the principles of tensegrity.”

In a parenthetical citation in MLA style, include the author’s last name and the relevant page number or range in parentheses.

For example: (Eliot 21)

A parenthetical citation gives credit in parentheses to a source that you’re quoting or paraphrasing. It provides relevant information such as the author’s name, the publication date, and the page number(s) cited.

How you use parenthetical citations will depend on your chosen citation style. It will also depend on the type of source you are citing and the number of authors.

APA does not permit the use of ibid. This is because APA in-text citations are parenthetical and there’s no need to shorten them further.

Ibid. may be used in Chicago footnotes or endnotes.

Write “Ibid.” alone when you are citing the same page number and source as the previous citation.

When you are citing the same source, but a different page number, use ibid. followed by a comma and the relevant page number(s). For example:

  • Ibid., 40–42.

Only use ibid. if you are directing the reader to a previous full citation of a source.

Ibid. only refers to the previous citation. Therefore, you should only use ibid. directly after a citation that you want to repeat.

Ibid. is an abbreviation of the Latin “ibidem,” meaning “in the same place.” Ibid. is used in citations to direct the reader to the previous source.

Signal phrases can be used in various ways and can be placed at the beginning, middle, or end of a sentence.

To use signal phrases effectively, include:

  • The name of the scholar(s) or study you’re referencing
  • An attributive tag such as “according to” or “argues that”
  • The quote or idea you want to include

Different citation styles require you to use specific verb tenses when using signal phrases.

  • APA Style requires you to use the past or present perfect tense when using signal phrases.
  • MLA and Chicago require you to use the present tense when using signal phrases.

Signal phrases allow you to give credit for an idea or quote to its author or originator. This helps you to:

  • Establish the credentials of your sources
  • Display your depth of reading and understanding of the field
  • Position your own work in relation to other scholars
  • Avoid plagiarism

A signal phrase is a group of words that ascribes a quote or idea to an outside source.

Signal phrases distinguish the cited idea or argument from your own writing and introduce important information, including the source of the material that you are quoting, paraphrasing, or summarizing. For example:

“Cognitive psychologist Steven Pinker (1994) insists that humans possess an innate faculty for comprehending grammar.”

If you’re quoting from a text that paraphrases or summarizes other sources and cites them in parentheses, APA and Chicago both recommend retaining the citations as part of the quote. However, MLA recommends omitting citations within a quote:

  • APA: Smith states that “the literature on this topic (Jones, 2015; Sill, 2019; Paulson, 2020) shows no clear consensus” (Smith, 2019, p. 4).
  • MLA: Smith states that “the literature on this topic shows no clear consensus” (Smith, 2019, p. 4).

Footnote or endnote numbers that appear within quoted text should be omitted in all styles.

If you want to cite an indirect source (one you’ve only seen quoted in another source), either locate the original source or use the phrase “as cited in” in your citation.

To present information from other sources in academic writing, it’s best to paraphrase in most cases. This shows that you’ve understood the ideas you’re discussing and incorporates them into your text smoothly.

It’s appropriate to quote when:

  • Changing the phrasing would distort the meaning of the original text
  • You want to discuss the author’s language choices (e.g., in literary analysis)
  • You’re presenting a precise definition
  • You’re looking in depth at a specific claim

To paraphrase effectively, don’t just take the original sentence and swap out some of the words for synonyms. Instead, try:

  • Reformulating the sentence (e.g., change active to passive, or start from a different point)
  • Combining information from multiple sentences into one
  • Leaving out information from the original that isn’t relevant to your point
  • Using synonyms where they don’t distort the meaning

The main point is to ensure you don’t just copy the structure of the original text, but instead reformulate the idea in your own words.

“Et al.” is an abbreviation of the Latin term “et alia,” which means “and others.” It’s used in source citations to save space when there are too many authors to name them all.

Guidelines for using “et al.” differ depending on the citation style you’re following.

To insert endnotes in Microsoft Word, follow the steps below:

  • Click on the spot in the text where you want the endnote to show up.
  • In the “References” tab at the top, select “Insert Endnote.”
  • Type whatever text you want into the endnote.

If you need to change the type of notes used in a Word document from footnotes to endnotes, or the other way around, follow these steps:

  • Open the “References” tab, and click the arrow in the bottom-right corner of the “Footnotes” section.
  • In the pop-up window, click on “Convert…”
  • Choose the option you need, and click “OK.”

To insert a footnote automatically in a Word document:

  • Click on the point in the text where the footnote should appear
  • Select the “References” tab at the top and then click on “Insert Footnote”
  • Type the text you want into the footnote that appears at the bottom of the page

Footnotes are notes indicated in your text with numbers and placed at the bottom of the page. They’re used to provide:

  • Citations (e.g., in Chicago notes and bibliography)
  • Additional information that would disrupt the flow of the main text

Be sparing in your use of footnotes (other than citation footnotes), and consider whether the information you’re adding is relevant for the reader.

Footnotes appear at the bottom of the page they refer to. This is convenient for the reader but may cause your text to look cluttered if there are a lot of footnotes.

Endnotes appear all together at the end of the whole text. This may be less convenient for the reader but reduces clutter.

Both footnotes and endnotes are used in the same way: to cite sources or add extra information. You should usually choose one or the other to use in your text, not both.

An in-text citation is an acknowledgement you include in your text whenever you quote or paraphrase a source. It usually gives the author’s last name, the year of publication, and the page number of the relevant text. In-text citations allow the reader to look up the full source information in your reference list and see your sources for themselves.

If you are reusing content or data you used in a previous assignment, make sure to cite yourself. You can cite yourself just as you would cite any other source: simply follow the directions for that source type in the citation style you are using.

Keep in mind that reusing your previous work can be considered self-plagiarism, so make sure you ask your professor or consult your university’s handbook before doing so.

A credible source should pass the CRAAP test and follow these guidelines:

  • The information should be up to date and current.
  • The author and publication should be a trusted authority on the subject you are researching.
  • The sources the author cited should be easy to find, clear, and unbiased.
  • For a web source, the URL and layout should signify that it is trustworthy.

Peer review is a process of evaluating submissions to an academic journal. Utilizing rigorous criteria, a panel of reviewers in the same subject area decides whether to accept each submission for publication. For this reason, academic journals are often considered among the most credible sources you can use in a research project, provided that the journal itself is trustworthy and well-regarded.

Academic dishonesty can be intentional or unintentional, ranging from something as simple as claiming to have read something you didn’t to copying your neighbor’s answers on an exam.

You can commit academic dishonesty with the best of intentions, such as helping a friend cheat on a paper. Severe academic dishonesty can include buying a pre-written essay or the answers to a multiple-choice test, or falsifying a medical emergency to avoid taking a final exam.

Academic dishonesty refers to deceitful or misleading behavior in an academic setting. Academic dishonesty can occur intentionally or unintentionally, and varies in severity.

It can encompass paying for a pre-written essay, cheating on an exam, or committing plagiarism . It can also include helping others cheat, copying a friend’s homework answers, or even pretending to be sick to miss an exam.

Academic dishonesty doesn’t just occur in a classroom setting, but also in research and other academic-adjacent fields.

To apply a hanging indent to your reference list or Works Cited list in Word or Google Docs, follow the steps below.

Microsoft Word:

  • Highlight the whole list and right click to open the Paragraph options.
  • Under Indentation > Special, choose Hanging from the dropdown menu.
  • Set the indent to 0.5 inches or 1.27cm.

Google Docs:

  • Highlight the whole list and click on Format > Align and indent > Indentation options.
  • Under Special indent, choose Hanging from the dropdown menu.

When the hanging indent is applied, for each reference, every line except the first is indented. This helps the reader see where one entry ends and the next begins.

For a published interview (whether in video, audio, or print form), you should always include a citation, just as you would for any other source.

For an interview you conducted yourself, formally or informally, you often don’t need a citation and can just refer to it in the text or in a footnote, since the reader won’t be able to look it up anyway. MLA, however, still recommends including citations for your own interviews.

The main elements included in a newspaper interview citation across APA, MLA, and Chicago style are the names of the interviewer and interviewee, the interview title, the publication date, the name of the newspaper, and a URL (for online sources).

The information is presented differently in different citation styles. One key difference is that APA advises listing the interviewer in the author position, while MLA and Chicago advise listing the interviewee first.

The elements included in a newspaper article citation across APA, MLA, and Chicago style are the author name, the article title, the publication date, the newspaper name, and the URL if the article was accessed online.

In APA and MLA, the page numbers of the article appear in place of the URL if the article was accessed in print. No page numbers are used in Chicago newspaper citations.

Untitled sources (e.g. some images) are usually cited using a short descriptive text in place of the title. In APA Style, this description appears in brackets: [Chair of stained oak]. In MLA and Chicago styles, no brackets are used: Chair of stained oak.

For social media posts, which are usually untitled, quote the initial words of the post in place of the title: the first 160 characters in Chicago, or the first 20 words in APA. E.g. Biden, J. [@JoeBiden]. “The American Rescue Plan means a $7,000 check for a single mom of four. It means more support to safely.”

MLA recommends quoting the full post for something short like a tweet, and just describing the post if it’s longer.

The main elements included in image citations across APA, MLA, and Chicago style are the name of the image’s creator, the image title, the year (or more precise date) of publication, and details of the container in which the image was found (e.g. a museum, book, website).

In APA and Chicago style, it’s standard to also include a description of the image’s format (e.g. “Photograph” or “Oil on canvas”). This sort of information may be included in MLA too, but is not mandatory.

The main elements included in a lecture citation across APA, MLA, and Chicago style are the name of the speaker, the lecture title, the date it took place, the course or event it was part of, and the institution it took place at.

For transcripts or recordings of lectures/speeches, other details like the URL, the name of the book or website, and the length of the recording may be included instead of information about the event and institution.

The main elements included in a YouTube video citation across APA, MLA, and Chicago style are the name of the author/uploader, the title of the video, the publication date, and the URL.

The format in which this information appears is different for each style.

All styles also recommend using timestamps as a locator in the in-text citation or Chicago footnote.

Each annotation in an annotated bibliography is usually between 50 and 200 words long. Longer annotations may be divided into paragraphs.

The content of the annotation varies according to your assignment. An annotation can be descriptive, meaning it just describes the source objectively; evaluative, meaning it assesses its usefulness; or reflective, meaning it explains how the source will be used in your own research.

Any credible sources on your topic can be included in an annotated bibliography. The exact sources you cover will vary depending on the assignment, but you should usually focus on collecting journal articles and scholarly books. When in doubt, utilize the CRAAP test!

An annotated bibliography is an assignment where you collect sources on a specific topic and write an annotation for each source. An annotation is a short text that describes and sometimes evaluates the source.

The elements included in journal article citations across APA, MLA, and Chicago style are the name(s) of the author(s), the title of the article, the year of publication, the name of the journal, the volume and issue numbers, the page range of the article, and, when accessed online, the DOI or URL.

In MLA and Chicago style, you also include the specific month or season of publication alongside the year, when this information is available.

In APA, MLA, and Chicago style citations for sources that don’t list a specific author (e.g. many websites), you can usually list the organization responsible for the source as the author.

If the organization is the same as the website or publisher, you shouldn’t repeat it twice in your reference:

  • In APA and Chicago, omit the website or publisher name later in the reference.
  • In MLA, omit the author element at the start of the reference, and cite the source title instead.

If there’s no appropriate organization to list as author, you will usually have to begin the citation and reference entry with the title of the source instead.

The main elements included in website citations across APA, MLA, and Chicago style are the author, the date of publication, the page title, the website name, and the URL. The information is presented differently in each style.

When you want to cite a specific passage in a source without page numbers (e.g. an e-book or website), all the main citation styles recommend using an alternate locator in your in-text citation. You might use a heading or chapter number, e.g. (Smith, 2016, ch. 1).

In APA Style, you can count the paragraph numbers in a text to identify a location by paragraph number. MLA and Chicago recommend that you only use paragraph numbers if they’re explicitly marked in the text.

For audiovisual sources (e.g. videos), all styles recommend using a timestamp to show a specific point in the video when relevant.

The abbreviation “et al.” (Latin for “and others”) is used to shorten citations of sources with multiple authors.

“Et al.” is used in APA in-text citations of sources with 3+ authors, e.g. (Smith et al., 2019). It is not used in APA reference entries.

Use “et al.” for 3+ authors in MLA in-text citations and Works Cited entries.

Use “et al.” for 4+ authors in a Chicago in-text citation, and for 10+ authors in a Chicago bibliography entry.
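These thresholds are mechanical enough to automate. A minimal sketch (the function name and the two-author formatting are our own simplifications; APA, for instance, actually uses "&" rather than "and" inside parenthetical citations):

```python
def in_text_authors(authors, style):
    """Shorten an author list for an in-text citation using each style's
    "et al." threshold: APA and MLA from 3 authors, Chicago from 4."""
    thresholds = {"apa": 3, "mla": 3, "chicago": 4}
    if len(authors) >= thresholds[style]:
        return f"{authors[0]} et al."
    if len(authors) == 1:
        return authors[0]
    return ", ".join(authors[:-1]) + " and " + authors[-1]

print(in_text_authors(["Smith", "Jones", "Lee"], "apa"))      # Smith et al.
print(in_text_authors(["Smith", "Jones", "Lee"], "chicago"))  # Smith, Jones and Lee
```

Note that reference-list rules differ from in-text rules (e.g. Chicago bibliographies only switch to "et al." at 10+ authors), so a full implementation would need a separate threshold table per context.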

Check if your university or course guidelines specify which citation style to use. If the choice is left up to you, consider which style is most commonly used in your field.

  • APA Style is the most popular citation style, widely used in the social and behavioral sciences.
  • MLA style is the second most popular, used mainly in the humanities.
  • Chicago notes and bibliography style is also popular in the humanities, especially history.
  • Chicago author-date style tends to be used in the sciences.

Other more specialized styles exist for certain fields, such as Bluebook and OSCOLA for law.

The most important thing is to choose one style and use it consistently throughout your text.

The main elements included in all book citations across APA, MLA, and Chicago style are the author, the title, the year of publication, and the name of the publisher. A page number is also included in in-text citations to highlight the specific passage cited.

In Chicago style and in the 6th edition of APA Style, the location of the publisher is also included, e.g. London: Penguin.

A block quote is a long quote formatted as a separate “block” of text. Instead of using quotation marks, you place the quote on a new line, and indent the entire quote to mark it apart from your own words.

The rules for when to apply block quote formatting depend on the citation style:

  • APA block quotes are 40 words or longer.
  • MLA block quotes are more than 4 lines of prose or 3 lines of poetry.
  • Chicago block quotes are longer than 100 words.
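These thresholds can be expressed as a small rule. A sketch (the helper name and keyword arguments are illustrative, not from any style guide):

```python
def needs_block_quote(style, words=0, lines=0, poetry=False):
    """Decide whether a quote should be formatted as a block quote:
    APA: 40 words or longer; MLA: more than 4 lines of prose or
    3 lines of poetry; Chicago: longer than 100 words."""
    if style == "apa":
        return words >= 40
    if style == "mla":
        return lines > (3 if poetry else 4)
    if style == "chicago":
        return words > 100
    raise ValueError(f"unknown style: {style}")

print(needs_block_quote("apa", words=45))              # True
print(needs_block_quote("mla", lines=3, poetry=True))  # False
```

Note the boundary differences: APA's threshold is inclusive (exactly 40 words is a block quote), while Chicago's is exclusive (exactly 100 words is not).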

In academic writing, there are three main situations where quoting is the best choice:

  • To analyze the author’s language (e.g., in a literary analysis essay)
  • To give evidence from primary sources
  • To accurately present a precise definition or argument

Don’t overuse quotes; your own voice should be dominant. If you just want to provide information from a source, it’s usually better to paraphrase or summarize.

Every time you quote a source, you must include a correctly formatted in-text citation. This looks slightly different depending on the citation style.

For example, a direct quote in APA is cited like this: “This is a quote” (Streefkerk, 2020, p. 5).

Every in-text citation should also correspond to a full reference at the end of your paper.

A quote is an exact copy of someone else’s words, usually enclosed in quotation marks and credited to the original author or speaker.

The DOI is usually clearly visible when you open a journal article on an academic database. It is often listed near the publication date, and includes “doi.org” or “DOI:”. If the database has a “cite this article” button, this should also produce a citation with the DOI included.

If you can’t find the DOI, you can search on Crossref using information like the author, the article title, and the journal name.
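Crossref also offers a public REST API for such lookups. A minimal sketch of building a query URL from bibliographic details (the helper name is ours; `query.bibliographic`, `query.author`, and `query.container-title` are documented Crossref query parameters, and no request is made here):

```python
from urllib.parse import urlencode

def crossref_search_url(title, author=None, journal=None, rows=5):
    """Build a Crossref works-search URL for locating a DOI; each match
    in the returned JSON includes a 'DOI' field."""
    params = {"query.bibliographic": title, "rows": rows}
    if author:
        params["query.author"] = author
    if journal:
        params["query.container-title"] = journal
    return "https://api.crossref.org/works?" + urlencode(params)

print(crossref_search_url("Attention is all you need", author="Vaswani"))
```

Fetching that URL with any HTTP client returns candidate records ranked by relevance, from which the DOI of the best match can be read.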

A DOI is a unique identifier for a digital document. DOIs are important in academic citation because they are more permanent than URLs, ensuring that your reader can reliably locate the source.

Journal articles and ebooks can often be found on multiple different websites and databases. The URL of the page where an article is hosted can be changed or removed over time, but a DOI is linked to the specific document and never changes.

When a book’s chapters are written by different authors, you should cite the specific chapter you are referring to.

When all the chapters are written by the same author (or group of authors), you should usually cite the entire book, but some styles include exceptions to this.

  • In APA Style, single-author books should always be cited as a whole, even if you only quote or paraphrase from one chapter.
  • In MLA Style, if a single-author book is a collection of stand-alone works (e.g. short stories), you should cite the individual work.
  • In Chicago Style, you may choose to cite a single chapter of a single-author book if you feel it is more appropriate than citing the whole book.

Articles in newspapers and magazines can be primary or secondary depending on the focus of your research.

In historical studies, old articles are used as primary sources that give direct evidence about the time period. In social and communication studies, articles are used as primary sources to analyze language and social relations (for example, by conducting content analysis or discourse analysis).

If you are not analyzing the article itself, but only using it for background information or facts about your topic, then the article is a secondary source.

A fictional movie is usually a primary source. A documentary can be either primary or secondary depending on the context.

If you are directly analyzing some aspect of the movie itself – for example, the cinematography, narrative techniques, or social context – the movie is a primary source.

If you use the movie for background information or analysis about your topic – for example, to learn about a historical event or a scientific discovery – the movie is a secondary source.

Whether it’s primary or secondary, always properly cite the movie in the citation style you are using. Learn how to create an MLA movie citation or an APA movie citation.

To determine if a source is primary or secondary, ask yourself:

  • Was the source created by someone directly involved in the events you’re studying (primary), or by another researcher (secondary)?
  • Does the source provide original information (primary), or does it summarize information from other sources (secondary)?
  • Are you directly analyzing the source itself (primary), or only using it for background information (secondary)?

Some types of source are nearly always primary: works of art and literature, raw statistical data, official documents and records, and personal communications (e.g. letters, interviews). If you use one of these in your research, it is probably a primary source.

Primary sources are often considered the most credible in terms of providing evidence for your argument, as they give you direct evidence of what you are researching. However, it’s up to you to ensure the information they provide is reliable and accurate.

Always make sure to properly cite your sources to avoid plagiarism.

Common examples of secondary sources include academic books, journal articles, reviews, essays, and textbooks.

Anything that summarizes, evaluates or interprets primary sources can be a secondary source. If a source gives you an overview of background information or presents another researcher’s ideas on your topic, it is probably a secondary source.

Common examples of primary sources include interview transcripts, photographs, novels, paintings, films, historical documents, and official statistics.

Anything you directly analyze or use as first-hand evidence can be a primary source, including qualitative or quantitative data that you collected yourself.

The Scribbr Citation Generator is developed using the open-source Citation Style Language (CSL) project and Frank Bennett’s citeproc-js. It’s the same technology used by dozens of other popular citation tools, including Mendeley and Zotero.

You can find all the citation styles and locales used in the Scribbr Citation Generator in our publicly accessible repository on GitHub.

Our team helps students graduate by offering:

  • A world-class citation generator
  • Plagiarism Checker software powered by Turnitin
  • Innovative Citation Checker software
  • Professional proofreading services
  • Over 300 helpful articles about academic writing, citing sources, plagiarism, and more

Scribbr specializes in editing study-related documents . We proofread:

  • PhD dissertations
  • Research proposals
  • Personal statements
  • Admission essays
  • Motivation letters
  • Reflection papers
  • Journal articles
  • Capstone projects

Scribbr’s Plagiarism Checker is powered by elements of Turnitin’s Similarity Checker , namely the plagiarism detection software and the Internet Archive and Premium Scholarly Publications content databases .

The add-on AI detector is powered by Scribbr’s proprietary software.

Predicting citation impact of academic papers across research areas using multiple models and early citations

  • Open access
  • Published: 25 June 2024


  • Fang Zhang 1 , 2 &
  • Shengli Wu   ORCID: orcid.org/0000-0003-2008-1736 1 , 3  

As the volume of scientific literature expands rapidly, accurately gauging and predicting the citation impact of academic papers has become increasingly imperative. Citation counts serve as a widely adopted metric for this purpose. While numerous researchers have explored techniques for projecting papers’ citation counts, a prevalent constraint lies in the utilization of a singular model across all papers within a dataset. This universal approach, suitable for small, homogeneous collections, proves less effective for large, heterogeneous collections spanning various research domains, thereby curtailing the practical utility of these methodologies. In this study, we propose a pioneering methodology that deploys multiple models tailored to distinct research domains and integrates early citation data. Our approach encompasses instance-based learning techniques to categorize papers into different research domains and distinct prediction models trained on early citation counts for papers within each domain. We assessed our methodology using two extensive datasets sourced from DBLP and arXiv. Our experimental findings affirm that the proposed classification methodology is both precise and efficient in classifying papers into research domains. Furthermore, the proposed prediction methodology, harnessing multiple domain-specific models and early citations, surpasses four state-of-the-art baseline methods in most instances, substantially enhancing the accuracy of citation impact predictions for diverse collections of academic papers.


Introduction

The rapid advancement of science and technology has led to a staggering increase in the number of academic publications produced globally each year (Zhu & Ban, 2018 ). In this ever-growing landscape, effectively evaluating the impact of research papers has become a critical issue (Castillo et al., 2007 ; Chakraborty et al., 2014 ; Li et al., 2019 ; Yan et al., 2011 ). Citation count, which measures the frequency with which a paper is referenced by other works, is widely recognized as the most prevalent metric for assessing the influence of academic papers, authors, and institutions (Bu et al., 2021 ; Cao et al., 2016 ; Lu et al., 2017 ; Redner, 1998 ; Stegehuis et al., 2015 ; Wang et al., 2021 ). Building upon the foundation of citation counts, numerous additional measures have been proposed to quantify research impact from various perspectives (Braun et al., 2006 ; Egghe, 2006 ; Garfield, 1972 , 2006 ; Hirsch, 2005 ; Persht, 2009 ; Yan & Ding, 2010 ).

Predicting the impact of scientific papers has garnered significant research attention due to its profound implications (Abramo et al., 2019 ; Abrishami & Aliakbary, 2019 ; Bai et al., 2019 ; Cao et al., 2016 ; Chen & Zhang, 2015 ; Li et al., 2019 ; Liu et al., 2020 ; Ma et al., 2021 ; Ruan et al., 2020 ; Su, 2020 ; Wang et al., 2013 , 2021 , 2023 ; Wen et al., 2020 ; Xu et al., 2019 ; Yan et al., 2011 ; Yu et al., 2014 ; Zhao & Feng, 2022 ; Zhu & Ban, 2018 ). See the “Citation count prediction” section for a more detailed discussion. Accurately forecasting the future citation impact of academic papers, particularly those recently published, offers invaluable benefits to various stakeholders within the research ecosystem: it would help researchers find potentially high-impact papers and interesting research topics at an earlier stage, and it would also help institutions, government agencies, and funding bodies evaluate published papers, researchers, and project proposals, among others.

For large and diverse collections encompassing papers from various research areas, a one-size-fits-all approach to citation impact prediction may be inadequate. Even within a broad field like Computing, sub-fields such as Theoretical Computing, Artificial Intelligence, Systems, and Applications can exhibit distinct citation patterns. Previous studies have demonstrated that citation dynamics can vary significantly across research areas, journals, and researchers in different age groups, among other factors (Kelly, 2015 ; Levitt & Thelwall, 2008 ; Mendoza, 2021 ; Milz & Seifert, 2018 ). To illustrate this point, let us consider an example from the DBLP dataset used in our study. Figure 1a depicts the average citation distributions of papers in three research areas: Cryptography, Artificial Intelligence, and Software Engineering. We can observe striking differences in their citation patterns:

  • Software Engineering papers consistently attract relatively few citations over time, without a pronounced peak in their citation curve.
  • Artificial Intelligence papers garner the highest citation counts among the three areas. Their citation curve rises rapidly, peaking around year 4, followed by a gradual decline until year 7, after which the decrease becomes more precipitous.
  • Cryptography papers exhibit a steadily increasing citation trend over the first 10 years, reaching a peak around year 11, followed by a slow decline in citations thereafter.

Figure 1: Citation patterns in different research areas or different classes of the same research area

These divergent citation patterns across research areas highlight the limitations of employing a single, universal model for citation impact prediction. In light of these observations, a more effective strategy would be to segment papers into distinct groups based on their research areas and develop tailored prediction models for each group. By accounting for the unique citation characteristics of different domains, such a group-specific modelling approach has the potential to significantly enhance the accuracy and reliability of citation impact predictions, particularly for large and heterogeneous collections of academic papers.

Citation patterns are not solely determined by research areas but are also influenced by the quality and intrinsic characteristics of individual papers. Even within the same research area, the citation dynamics of papers can vary considerably (Garfield, 2006 ; Wang et al., 2021 ; Yan & Ding, 2010 ). High-impact papers may exhibit significantly different citation trajectories compared to average or low-impact works. Accounting for these differences by employing multiple models tailored to papers with varying citation potential could further improve prediction performance. Figure 1b illustrates this phenomenon using an example from the Embedded & Real-Time Systems research area. All papers in this domain can be categorized into four classes based on their cumulative citation counts (cc) over 15 years: cc < 10, 10 ≤ cc < 50, 50 ≤ cc < 100, and cc ≥ 100. The general pattern observed for all the curves is that they initially increase for a few years and then decrease afterwards. However, the peak point varies depending on the total number of citations: papers with higher citation counts take more years to reach their peak. This finding suggests that class-based prediction can be a viable approach for our prediction task, as it accounts for the varying peak times across the citation count classes.
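The four-way split used in this example can be written as a tiny helper (a sketch using the class boundaries quoted above; the function name and integer labels are our own):

```python
def citation_class(cc):
    """Map a cumulative 15-year citation count to one of four classes:
    0: cc < 10, 1: 10 <= cc < 50, 2: 50 <= cc < 100, 3: cc >= 100."""
    for label, upper in enumerate((10, 50, 100)):
        if cc < upper:
            return label
    return 3

print([citation_class(cc) for cc in (3, 25, 70, 400)])  # [0, 1, 2, 3]
```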

If the papers are not already classified, then a classification system encompassing multiple categories is needed, along with an automated method for allocating each paper to one or more suitable categories. For a large collection of papers, both the effectiveness and the efficiency of the allocation method are crucial factors to consider.

Taking into account all the observations mentioned earlier, we propose MM, a prediction method based on Multiple Models tailored for different research areas and citation counts, to predict the future citation counts of a paper. This work makes the following contributions:

  • A new instance-based learning method is introduced to classify papers into a given number of research areas. Paper content (titles and abstracts) and citations are considered separately, and an ensemble-based method then makes the final decision. Experiments with the DBLP dataset demonstrate that the proposed method achieves excellent classification performance.
  • A prediction method for paper citation counts is proposed. For any paper, a suitable prediction model is chosen based on its research area and early citation history, so that each paper is matched with a fitting model.
  • Experiments with two datasets show that the proposed prediction method outperforms the four baseline methods in this study.

The remainder of this article is structured as follows: the “Related work” section reviews related work on citation count prediction and classification of academic papers. The “Methodology” section describes the proposed method in detail. The “Experimental settings and results” section presents the experimental settings, procedures, and results, along with an analysis of the findings. Finally, the “Conclusion” section concludes the paper.

Related work

In this work, the primary task is citation count prediction of papers, while classification of scientific papers serves as an additional task that may be required for the prediction task. Accordingly, we review some related work on citation count prediction and classification of academic papers separately in the following sections.

  • Citation count prediction

In the literature, there are numerous papers on predicting the citation counts of scientific papers. These methods can be categorized into three groups based on the information used for prediction.

The first group relies solely on the paper’s citation history as input. Wang et al. ( 2013 ) developed a model called WSB to predict the total number of citations a paper will receive, assuming its earlier citation data is known. Cao et al. ( 2016 ) proposed a data analytic approach to predict the long-term citation count of a paper using its short-term (three years after publication) citation data. Given a large collection of papers C with long citation histories, for a paper p with a short citation history, they matched it with a group of papers in C with similar early citation data and then used those papers in C to predict p’s later citation counts. Abrishami and Aliakbary ( 2019 ) proposed a long-term citation prediction method called NNCP based on Recurrent Neural Network (RNN) and the sequence-to-sequence model. Their dataset comprised papers published in five authoritative journals: Nature, Science, NEJM (The New England Journal of Medicine), Cell, and PNAS (Proceedings of the National Academy of Sciences). Wang et al. ( 2021 ) introduced a nonlinear predictive combination model, NCFCM, that utilized multilayer perceptron (MLP) to combine WSB and an improved version of AVR for predicting citation counts.
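The matching idea behind Cao et al.'s approach can be illustrated with a rough nearest-neighbour sketch (our own simplification, not their actual implementation): given a paper's first few years of citations, find the corpus papers with the most similar early histories and average their later counts.

```python
import math

def predict_later_citations(early, corpus, k=3):
    """Predict a paper's later yearly citation counts from its early
    history: find the k corpus papers whose early years are closest
    (Euclidean distance) and average their later years."""
    neighbours = sorted(corpus, key=lambda p: math.dist(early, p[:len(early)]))[:k]
    horizon = len(neighbours[0]) - len(early)
    return [sum(p[len(early) + t] for p in neighbours) / k for t in range(horizon)]

# Yearly citation histories of three (toy) well-observed papers:
corpus = [[1, 2, 3, 4], [1, 2, 4, 6], [10, 20, 30, 40]]
print(predict_later_citations([1, 2], corpus, k=2))  # [3.5, 5.0]
```

A production version would of course match on a much larger corpus and could weight neighbours by similarity rather than averaging them uniformly.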

The second group uses not only the citation data but also some other extracted features from the paper or the wider academic network for the prediction task. Yu et al. ( 2014 ) adopted a stepwise multiple regression model using four groups of 24 features, including paper, author, publication, and citation-related features. Bornmann et al. ( 2014 ) took the percentile approach of Hazen ( 1914 ), considering the journal’s impact and other variables such as the number of authors, cited references, and pages. Castillo et al. ( 2007 ) used information about past papers written by the same author(s). Chen and Zhang ( 2015 ) applied Gradient Boosting Regression Trees (GBRT) with six paper content features and 10 author features. Bai et al. ( 2019 ) made long-term predictions using the Gradient Boosting Decision Tree (GBDT) model with five features, including the citation count within 5 years after publication, authors’ impact factor, h-index, Q value, and the journal's impact factor. Akella et al. ( 2021 ) exploited 21 features derived from social media shares, mentions, and reads of scientific papers to predict future citations with various machine learning models, such as Random Forest, Decision Tree, Gradient Boosting, and others. Xu et al. ( 2019 ) extracted 22 features from heterogeneous academic networks and employed a Convolutional Neural Network (CNN) to capture the complex nonlinear relationship between early network features and the final cumulative citation count. Ruan et al. ( 2020 ) employed a four-layer BP neural network to predict the 5th year citation counts of papers, using a total of 30 features, including paper, author, publication, reference, and early citation-related features. By extracting high-level semantic features from metadata text, Ma et al. ( 2021 ) adopted a neural network to consider both semantic information and the early citation counts to predict long-term citation counts. Wang et al. ( 2023 ) applied neural network technology to a heterogeneous network including author and paper information. Huang et al. ( 2022 ) argued that citations should not be treated equally, as the citing text and the section in which the citation occurs significantly impact its importance. Thus, they applied deep learning models to perform fine-grained citation prediction: not just the citation count for the whole paper but the citation count occurring in each section.

The third group uses other types of information beyond those mentioned above. To investigate the impact of peer-reviewing data on prediction performance, Li et al. ( 2019 ) adopted a neural network prediction model, incorporating an abstract-review match method and a cross-review match mechanism to learn deep features from peer-reviewing texts. Combining these learned features with breadth features (topic distribution, topic diversity, publication year, number of authors, and average author h-index), they employed a multilayer perceptron (MLP) to predict citation counts. Li et al. ( 2022 ) also utilized peer-reviewing text for prediction, using an aspect-aware capsule network. Zhao and Feng ( 2022 ) proposed an end-to-end deep learning framework called DeepCCP, which takes an early citation network as input and predicts the citation count using both GRU and CNN, instead of extracting features.

Citation counts of a paper can be affected by many factors, such as research area, paper type, and the age, sex, and other characteristics of the authors (Andersen & Nielsen, 2018 ; Mendoza, 2021 ; Thelwall, 2020 ). Levitt and Thelwall ( 2008 ) compared patterns of annual citations of highly cited papers across six research areas. To our knowledge, Abramo et al. ( 2019 ) is the only work that uses multiple regression models for prediction, with one model for each subject category, and it is the most relevant to our work. However, there are two major differences. First, we propose a paper classification method in this paper, while no paper classification is required in Abramo et al. ( 2019 ). Second, we apply multiple models for papers in each category, whereas only one model per category is used in Abramo et al. ( 2019 ).

Classification of scientific papers

Classification of scientific papers becomes a critical issue when organizing and managing an increasing number of publications through computerized solutions. In previous research, typically, meta-data such as title, abstract, keywords, and citations of papers were used for this task, while full text was not considered due to its unavailability in most situations.

Various machine learning methods, such as K-Nearest Neighbors (Lukasik et al., 2013 ; Waltman & Van Eck, 2012 ), K-means (Kim & Gil, 2019 ), and Naïve Bayes (Eykens et al., 2021 ), have been applied. Recently, deep neural network models, such as Convolutional Neural Networks (Daradkeh et al., 2022 ; Rivest et al., 2021 ), Recurrent Neural Networks (Hoppe et al., 2021 ; Semberecki & Maciejewski, 2017 ), and pre-trained language models (Hande et al., 2021 ; Kandimalla et al., 2020 ), have also been utilized.

One key issue is the classification system to be used. There are many different classification systems. Both Thomson Reuters’ Web of Science database (WoS) and Elsevier’s Scopus database have their own general classification systems, covering many subjects/research areas. Some systems focus on one particular subject, such as the Medical Subject Headings (MeSH), the Physics and Astronomy Classification Scheme (PACS), the Chemical Abstracts Sections, the Journal of Economic Literature (JEL) classification, and the ACM Computing Classification System.

Based on the WoS classification system, Kandimalla et al. ( 2020 ) applied a deep attentive neural network (DANN) to a collection of papers from the WoS database for the classification task. It was assumed that each paper belonged to only one category, and only abstracts were used.

Zhang et al. ( 2022 ) compared three classification systems: Thomson Reuters’ Web of Science, the Fields of Research provided by Dimensions, and the Subjects Classification provided by Springer Nature. The second was generated automatically by machine learning methods, while the other two were created manually by human experts. Significant inconsistencies were found between the machine-generated and human-generated systems.

Rather than using an existing classification system, some researchers build their own classification system using the collection to be classified or other resources such as Wikipedia.

Shen et al. ( 2018 ) organized scientific publications into a hierarchical concept structure of up to six levels. The first two levels (similar to areas and sub-areas) were manually selected, while the others were automatically generated. Wikipedia pages were used to represent the concepts. Each publication or concept was represented as an embedding vector, thus the similarity between a publication and a concept could be calculated by the cosine similarity of their vector representations. It is a core component for the construction of the Microsoft Academic Graph.
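For reference, the similarity computation mentioned here is straightforward once embeddings are available (a generic sketch; the embedding models used by Shen et al. are not reproduced):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors: the dot product
    divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(round(cosine_similarity([1.0, 2.0], [2.0, 4.0]), 3))  # 1.0
```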

In the same vein as Shen et al. ( 2018 ), Toney-Wails and Dunham ( 2022 ) also used Wikipedia pages to represent concepts and build the classification system. Both publications and concepts were represented as embedding vectors. Their database contains more than 184 million documents in English and more than 44 million documents in Chinese.

Mendoza et al. ( 2022 ) presented a benchmark corpus and a classification system as well, which can be used for the academic paper classification task. The classification system used is the 36 subjects defined in the UK Research Excellence Framework (REF). According to Cressey and Gibney ( 2014 ), this exercise is the largest overall assessment of university research outputs ever undertaken globally. The 191,000 submissions to REF 2014 comprise a very good data set because every paper was manually categorized by experts when submitted.

Liu et al. ( 2022 ) described the NLPCC 2022 Task 5 Track 1, a multi-label classification task for scientific literature, where one paper may belong to multiple categories simultaneously. The data set, crawled from the American Chemical Society’s publication website, comprises 95,000 papers’ meta-data, including titles and abstracts. A hierarchical classification system, with a maximum of three levels, was also defined.

As we can see, the classification problem of academic papers is quite complicated. Many classification systems and classification methods are available; moreover, classification systems and classification methods are related to each other. The major goal of this work is to perform citation count prediction of published papers, for which classification of papers is a basic requirement. For example, considering the DBLP dataset, which includes over four million papers, special consideration is required to perform the classification task effectively and efficiently. We used the classification system from CSRankings, which includes a set of four categories (research areas) and 26 sub-categories in total. A group of top venues is identified for each sub-category; however, many more venues in DBLP are not assigned to any category. We used all the papers from the recommended venues in the CSRankings system as representative papers of a given research area. An instance-based learning approach was used to measure the semantic similarity between the target paper and all the papers in a particular area, and a decision could then be made based on the similarity scores that the target paper obtained for all research areas. Besides, citation data between the target paper and all the papers in those recommended venues is also considered. Unlike previously proposed classification methods, this instance-based learning approach suits our purpose well. See the “Methodology” section for more details.

Methodology

This research aims to predict the number of citations of academic papers in the next couple of years based on their metadata, including title, abstract, and citation data since publication. The main idea of our approach is that, for each paper, we use a specific model to make the prediction depending on its research area and early citation count. There are two key issues: academic paper classification and citation count prediction. We detail them one by one in the following subsections.

Computing classification system

To carry out the classification task of academic papers, a suitable classification system is required. There are many classification systems available for natural science, social science, humanities, or specific branches of science or technology. Since one of the datasets used in this study is DBLP, which includes over four million papers on computer science so far, we will focus our discussion on classification systems and methods for computer science.

In computer science, quite a few classification systems are available. For example, both the Association for Computing Machinery (ACM) and the China Computer Federation (CCF) define their own classification systems. However, neither is very suitable for our purpose. The ACM’s classification system is quite elaborate, but it does not provide any representative venues for any of the research areas. The CCF defines 10 categories and recommends dozens of venues in each category; however, some journals and conferences publish papers in more than one category but are recommended in only one. For instance, both IEEE Transactions on Knowledge and Data Engineering and Data and Knowledge Engineering publish papers on Information Systems and Artificial Intelligence, but they are only recommended in the Database/Data Mining/Content Retrieval category.

In this research, we used the classification system from CSRankings. This system divides computer science into four areas: AI, System, Theory, and Interdisciplinary Areas. Each area is further divided into several sub-areas, totalling 26 sub-areas. We flatten these 26 sub-areas for classification, ignoring the four general areas at level one. One benefit of using this system is that it lists several key venues for every sub-area. For example, three venues are identified for Computer Vision: CVPR (IEEE Conference on Computer Vision and Pattern Recognition), ECCV (European Conference on Computer Vision), and ICCV (IEEE International Conference on Computer Vision). This is very useful for the paper classification task, as we discuss next.

Paper classification

For this research, we need a classification algorithm that can perform the classification task for all the papers in the DBLP dataset effectively and efficiently.

Although many classification methods have been proposed, we could not find one that suits our case well. Therefore, we developed our own approach. Using the classification system of CSRankings, we assume that all the papers published in the identified venues belong to the given research area; we refer to these as seed papers. For all the non-seed papers, we need to decide the areas to which they belong. This is done by considering three aspects together: content, references, and citations. We consider content first.

The collection of all the seed papers, denoted as C , was indexed using the search engine Lucene (Footnote 3) with the BM25 model. Both titles and abstracts were used in the indexing process. Each research area \({a}_{k}\) is represented by all its seed papers C ( \({a}_{k}\) ). For a given non-seed paper p , we use its title and abstract as a query to search for similar papers in C . Each seed paper s then obtains a score (the similarity between p and s )
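Written out, this is the standard BM25 scoring function as implemented in Lucene (our reconstruction, using the variables defined below; \(\left|s\right|\) denotes the length of document s ):

$$sim\left(p,s\right)=\sum_{{t}_{j}\in {T}_{p}\cap {T}_{s}} idf\left({t}_{j}\right)\cdot \frac{f\left({t}_{j},s\right)\cdot \left({b}_{2}+1\right)}{f\left({t}_{j},s\right)+{b}_{2}\cdot \left(1-{b}_{1}+{b}_{1}\cdot \frac{\left|s\right|}{AL(C)}\right)}$$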

in which b 1 and b 2 are two parameters (set to 0.75 and 1.2, respectively, the default values of Lucene, in the experiments), T s is the set of all the terms in s , \(AL(C)\) is the average length of all the documents in C , \(f\left({t}_{j},s\right)\) is the term frequency of \({t}_{j}\) in s , and \(idf\left({t}_{j}\right)\) is the inverse document frequency of \({t}_{j}\) in the collection C of all the seed papers. \(idf\left({t}_{j}\right)\) is defined as
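In the standard BM25 formulation (our reconstruction), \(idf\left({t}_{j}\right)\) takes the form

$$idf\left({t}_{j}\right)=\mathrm{log}\frac{\left|C\right|-\left|C({t}_{j})\right|+0.5}{\left|C({t}_{j})\right|+0.5}$$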

in which \(\left|C\right|\) is the number of papers in \(C\) , and \(\left|C({t}_{j})\right|\) is the number of papers in C satisfying the condition that \({t}_{j}\) appears in them. For a paper p and a research area \({a}_{k}\) , we can calculate the average similarity score between p and all the seed papers in C ( \({a}_{k}\) ) as
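Written out, the average is

$$sim\left(p,{a}_{k}\right)=\frac{1}{\left|C({a}_{k})\right|}\sum_{s\in C({a}_{k})}sim\left(p,s\right)$$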

where C ( \({a}_{k}\) ) is the collection of seed papers in area \({a}_{k}\) .

We also consider citations between \(p\) and the papers in C . Citations in the two directions are considered separately: \(citingNum\left( p,{a}_{k}\right)\) denotes the number of papers in C ( \({a}_{k}\) ) that p cites, and \(citedNum\left(p,{a}_{k}\right)\) denotes the number of papers in C ( \({a}_{k}\) ) that cite p . To combine the three features, normalization is required. For example, \(sim\left({p,a}_{k}\right)\) can be normalized by
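A natural choice (our reconstruction, consistent with the definition of \(RA\) below) is sum normalization across the 26 areas:

$$si{m}_{norm}\left(p,{a}_{k}\right)=\frac{sim\left(p,{a}_{k}\right)}{\sum_{a\in RA}sim\left(p,a\right)}$$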

in which \(RA\) is the set of 26 research areas. \(citingNum\left(p,{a}_{k}\right)\) and \(citedNum\left(p,{a}_{k}\right)\) can be normalized similarly. Then we let
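Following the description that surrounds it, the combined score (Eq. 5) has the form (our reconstruction):

$$score\left(p,{a}_{k}\right)={\beta }_{1}\cdot si{m}_{norm}\left(p,{a}_{k}\right)+{\beta }_{2}\cdot citingNu{m}_{norm}\left(p,{a}_{k}\right)+{\beta }_{3}\cdot citedNu{m}_{norm}\left(p,{a}_{k}\right)$$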

for any \({a}_{k}\in RA\) , in which \({\beta }_{1}\) , \({\beta }_{2}\) , and \({\beta }_{3}\) are three parameters. Applying Eq.  5 to \(p\) and all 26 research areas, we obtain a score for each area, and p is assigned to the research area \({a}_{k}\) for which \(score\left(p,{a}_{k}\right)\) is the largest. The values of \({\beta }_{1}\) , \({\beta }_{2}\) , and \({\beta }_{3}\) are decided by Euclidean distance with multiple linear regression on a training dataset (Wu et al., 2023 ). Compared with similar methods such as Stacking with MLS and StackingC, this method achieves comparable performance while being much more efficient, making it well suited to large-scale datasets.

In this study, we assume that each paper belongs to exactly one research area. If required, the method can be modified to support multi-label classification, so that a paper may belong to more than one research area at the same time: we may set a reasonable threshold \(\tau\) , and for any testing paper \(p\) and research area \({a}_{k}\) , if \(score\left(p,{a}_{k}\right)>\tau\) , then paper \(p\) belongs to research area \({a}_{k}\) . However, this is beyond the scope of this research, and we leave it for further study.

In summary, the proposed classification algorithm instance-based learning (IBL) is sketched as follows:

[Figure a: pseudocode of algorithm IBL]
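As an illustration, the scoring and assignment step of IBL can be sketched in a few lines of Python. The sum-to-one normalization, the toy feature values, and the parameter values below are our own assumptions, not taken from the paper:

```python
# Sketch of the IBL scoring step (Eq. 5). Feature values and the
# sum-to-one normalization are illustrative assumptions.

def normalize(feature):
    """Normalize an {area: value} dict so the values sum to 1."""
    total = sum(feature.values())
    if total == 0:
        return {a: 0.0 for a in feature}
    return {a: v / total for a, v in feature.items()}

def classify(sim, citing_num, cited_num, b1, b2, b3):
    """Assign a paper to the area with the largest combined score (Eq. 5)."""
    s, g, d = normalize(sim), normalize(citing_num), normalize(cited_num)
    score = {a: b1 * s[a] + b2 * g[a] + b3 * d[a] for a in sim}
    return max(score, key=score.get)

# Toy example with three research areas.
sim    = {"CV": 8.0, "NLP": 2.0, "Theory": 0.0}   # content similarity
citing = {"CV": 3,   "NLP": 1,   "Theory": 0}     # seed papers this paper cites
cited  = {"CV": 1,   "NLP": 0,   "Theory": 0}     # seed papers citing this paper
print(classify(sim, citing, cited, 0.4, 0.3, 0.3))  # -> CV
```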

Citation count prediction

We observed that papers in the same research area may have different citation patterns, so it is better to treat them using multiple prediction models rather than one unified model. Therefore, we divide all the papers in a research area into up to 10 groups according to the number of citations they obtained in the first m years. For the papers considered in a specific research area, we count the number of citations obtained during a certain period and use cc ( i ) to denote the number of papers cited i times, where i ranges from 0 to n .

A threshold of 100 is set. We consider the values of cc (0), cc (1),…, cc ( n ) in order. If cc (0) is greater than or equal to the threshold, we create a group with those papers that received zero citations. Otherwise, we combine cc (0) with cc (1), and if the sum is still less than the threshold, we continue adding the next value cc (2), and so on, until the cumulative sum reaches or exceeds the threshold. At this point, we create a group with all the papers contributing to that cumulative sum. We then move on to the next unassigned value of cc ( i ) and repeat the process, creating new groups until all papers are assigned to a group. The last group may contain fewer than 100 papers, but it is still considered a valid group.
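The grouping procedure just described can be sketched as follows (a minimal Python version; the cap of 10 groups mentioned earlier is omitted for brevity):

```python
def make_groups(cc, threshold=100):
    """Group citation-count bins cc[0..n] so that each group covers at
    least `threshold` papers (the last group may be smaller). Returns a
    list of (lo, hi) inclusive ranges of early citation counts."""
    groups, lo, total = [], 0, 0
    for i, count in enumerate(cc):
        total += count
        if total >= threshold:
            groups.append((lo, i))
            lo, total = i + 1, 0
    if total > 0:  # leftover papers form the final, possibly small, group
        groups.append((lo, len(cc) - 1))
    return groups

cc = [120, 60, 50, 30, 15, 8]   # e.g. 120 papers with 0 citations, ...
print(make_groups(cc))          # -> [(0, 0), (1, 2), (3, 5)]
```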

A regression model is trained for each of these groups for prediction. In the training dataset, all the papers are classified by research area and have a known citation history of up to t years. For all the papers belonging to a group \({g}_{i}\) inside a research area \({a}_{k}\) , we put their information together. Consider
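the linear model (reconstructed from the variable definitions that follow)

$$c^{\prime}_{t}={w}_{0}{c}_{0}+{w}_{1}{c}_{1}+\dots +{w}_{m}{c}_{m}+b$$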

where \({c}_{0}\) , \({c}_{1}\) , …, \({c}_{m}\) , and \({c}_{t}\) are the citation counts of the papers involved up to years 0, 1,…, m , and in year t ( t  ≥  m ). We can train the weights \({w}_{0}\) , \({w}_{1}\) ,…, \({w}_{m}\) , and b for this group by multiple linear regression, using \({c}_{0}\) , \({c}_{1}\) , …, \({c}_{m}\) as independent variables and \({c}_{t}\) as the target variable. The same applies to all other groups and research areas.
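A minimal sketch of the per-group training step, fitting the weights via the normal equations (pure Python for illustration; in practice a library routine would be used, and the toy data below is invented):

```python
def fit_linear(X, y):
    """Least-squares fit of y = X w + b via the normal equations,
    solved with Gaussian elimination (illustrative only)."""
    A = [list(row) + [1.0] for row in X]         # append a bias column
    n, d = len(A), len(A[0])
    # Build the normal equations: (A^T A) w = A^T y
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(d)]
         for i in range(d)]
    v = [sum(A[k][i] * y[k] for k in range(n)) for i in range(d)]
    for col in range(d):                         # forward elimination
        p = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        v[col], v[p] = v[p], v[col]
        for r in range(col + 1, d):
            f = M[r][col] / M[col][col]
            for c in range(col, d):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    w = [0.0] * d
    for r in range(d - 1, -1, -1):               # back substitution
        w[r] = (v[r] - sum(M[r][c] * w[c] for c in range(r + 1, d))) / M[r][r]
    return w[:-1], w[-1]                         # weights, bias

# Early citation counts (c0, c1) of papers in one group, and counts in year t.
X = [[0, 2], [1, 3], [2, 5], [0, 1]]
y = [4, 7, 12, 2]                # here y happens to equal c0 + 2*c1
w, b = fit_linear(X, y)
c_pred = w[0] * 1 + w[1] * 4 + b  # prediction for a new paper with (1, 4)
```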

To predict the future citation counts of a paper, we need to decide which research area and group the paper belongs to; then the corresponding model can be chosen for the prediction. Algorithm MM is sketched as follows:

[Figure b: pseudocode of algorithm MM]
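The prediction step of MM can be sketched as a dispatch over (area, group) models. Using the cumulative early citation count as the grouping key, as well as the `models` and `groups` structures below, are our own illustrative assumptions:

```python
# Sketch of prediction with algorithm MM: pick the model for the paper's
# research area and early-citation group, then apply it.

def find_group(groups, early_total):
    """Return the index of the group whose range covers early_total."""
    for idx, (lo, hi) in enumerate(groups):
        if lo <= early_total <= hi:
            return idx
    return len(groups) - 1  # counts above the last boundary -> last group

def predict(models, groups, area, early_counts):
    """early_counts = (c0, ..., cm); returns predicted citations in year t."""
    g = find_group(groups[area], early_counts[-1])
    w, b = models[(area, g)]
    return sum(wi * ci for wi, ci in zip(w, early_counts)) + b

# Toy per-area group boundaries and per-group (weights, bias) models.
groups = {"CV": [(0, 0), (1, 3), (4, 9)]}
models = {("CV", 0): ([0.5, 0.5], 0.2),
          ("CV", 1): ([0.8, 1.1], 0.5),
          ("CV", 2): ([1.0, 1.4], 1.0)}
print(predict(models, groups, "CV", (2, 3)))  # group 1: 0.8*2 + 1.1*3 + 0.5
```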

Note that classification and citation count prediction are two separate tasks. To perform citation count prediction, every paper involved must have a research area label. This requirement can be satisfied in different ways. For example, the WoS system has a list of journals, each assigned to one or two research areas, and all the papers published in those journals are classified by the journals publishing them. In arXiv, an open-access repository of scientific papers, every paper is assigned a research area label by the authors when uploading it. When performing citation count prediction on such datasets, nothing else needs to be done. For papers in DBLP, however, no such classification exists, so it is necessary to classify them in some way before citation count prediction can be performed. In this study, we proposed an instance-based learning approach, which provides an efficient and effective solution to this problem.

Experimental settings and results

Datasets

Two datasets were used for this study. One is a DBLP dataset, and the other is an arXiv dataset.

We downloaded a DBLP dataset (Tang et al., 2008 ) (Footnote 4). It contains 4,107,340 papers in computer science and 36,624,464 citations from 1961 to 2019. For every paper, the dataset provides metadata such as title, abstract, references, authors and their affiliations, publication year, the venue in which the paper was published, and citations since publication. Several subsets of it were used in this study.

For the classification part, we used two subsets of the dataset. The first ( C 1 ) contains all the papers published in the 72 recommended venues of CSRankings between 1965 and 2019, 191,727 papers in total; C 1 is used as the seed papers for the 26 research areas. The second subset ( C 2 ) includes 1300 papers, 50 for each research area, randomly selected from a group of 54 conferences and journals and judged manually; C 2 is used to test the proposed classification method.

For the prediction part, we also used two subsets: one for training and the other for testing. The training dataset ( C 3 ) includes selected papers published between 1990 and 1994, and the testing dataset ( C 4 ) includes selected papers published in 1995. For the papers published between 1990 and 1994 or in 1995, we removed those that did not receive any citation and those with incomplete information. After this processing, we obtained 38,247 papers for dataset C 3 and 9967 papers for dataset C 4 .

We also downloaded an arXiv dataset (Saier & Färber, 2020 ) (Footnote 5). It contains 1,043,126 papers in many research areas, including Physics, Mathematics, Computer Science, and others, with 15,954,664 citations from 1991 to 2022. For every paper, metadata such as title, abstract, references, authors and affiliations, publication year, and citations since publication is provided. Importantly, each paper is given a research area label by its authors; therefore, there is no need to classify papers when using this dataset for citation count prediction. Two subsets were generated for this study: one for training and the other for testing. The training dataset ( C 5 ) includes all the papers published between 2008 and 2013, and the testing dataset ( C 6 ) includes all the papers published in 2014. There are 5876 papers in dataset C 5 and 1471 papers in dataset C 6 .

Classification results

In the CSRankings classification system, there are 26 research areas in total, and a few top venues are recommended for each of them. We assume that all the papers published in those recommended venues belong only to the corresponding research area. For example, the three conferences CVPR, ECCV, and ICCV are recommended for Computer Vision; we assume that all the papers published in these three conferences belong to the Computer Vision research area and no other.

To evaluate the proposed method, we used a set of 1300 non-seed papers ( C 2 ), 50 for each research area, all labelled manually. In Eq.  5 , three parameters need to be trained; therefore, we divided the 1300 papers into two equal partitions of 650, each including the same number of papers from every research area, and performed two-fold cross-validation. Table 1 shows the average performance.

We can see that all three features, content similarity ( \(sim\) ), citations to other papers ( \(citingNum\) ), and citations by others ( \(citedNum\) ), are useful for the classification task. Roughly speaking, citations in both directions ( \(citingNum+citedNum\) ) and content similarity ( \(sim\) ) have the same discriminative ability. Considering the three features together, we obtain an accuracy, or an F-measure, approaching 0.8. We are satisfied with this solution. On the one hand, its classification performance is good compared with other methods in the same category, e.g., (Ambalavanan & Devarakonda, 2020 ; Kandimalla et al., 2020 ). In Kandimalla et al. ( 2020 ), F-scores across 81 subject categories are between 0.5 and 0.8 (see Fig.  1 in that paper). In Ambalavanan and Devarakonda ( 2020 ), the four models ITL, Cascade Learner, Ensemble-Boolean, and Ensemble-FFN obtain F-scores of 0.553, 0.753, 0.628, and 0.477, respectively, on the Marshall dataset (see Table  4 in their paper). Although these results may not be directly comparable because the datasets differ, they indicate that our method is competitive. On the other hand, our method can be implemented very efficiently: once the seed papers are indexed, a large collection of papers can be processed very quickly with very few resources. The method is thus very scalable.

Setting for the prediction task

For the proposed method MM, we set the number of groups in each research area to 10 for the DBLP dataset and 5 for the arXiv dataset, mainly because the arXiv dataset is smaller and has fewer papers in each research area.

Apart from MM, five baseline prediction methods were used for comparison:

Mean of early years (MEY). A simple prediction function that returns the average of a paper’s early citations as its predicted citations in the future (Abrishami & Aliakbary, 2019 ).

AVR. Assume there is a collection of papers with known citation history as the training dataset. For a given paper, this method finds a group of most similar papers in the training set in terms of their early citations (those with the minimal sum of squared citation count differences over the years), and then uses the average citations of those similar papers in the subsequent years as the predicted citation counts of the paper (Cao et al., 2016 ).

RNN. Adopts a recurrent neural network to predict papers’ future citation counts based on their early citation data (Abrishami & Aliakbary, 2019 ).

OLS. Linear regression is used for the prediction model (Abramo et al., 2019 ). There are four variants. OLS_res and OLS_log use only early citations as independent variables, while OLS2_res and OLS2_log use both early citations and journal impact factors. OLS_res and OLS2_res apply a linear transformation to the early citations, while OLS_log and OLS2_log apply a logarithmic transformation.

NCFCM. Adopts a neural network to predict papers’ future citation counts based on early citation data and the outputs of two simple prediction models (Wang et al., 2021 ).

Evaluation metrics

Two popular metrics are used to evaluate the proposed method and compare it with the baselines: mean square error (MSE) and the coefficient of determination (R 2 ). For a given set of actual values Y \(=\{{y}_{1},{y}_{2},\dots ,{y}_{n}\}\) and set of predicted values \(\widehat{Y\boldsymbol{ }}=\{{\widehat{y}}_{1},{\widehat{y}}_{2},\dots ,{\widehat{y}}_{n}\}\) , MSE and R 2 are defined as follows:
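These are the standard definitions:

$$MSE=\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2},\qquad {R}^{2}=1-\frac{\sum_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}{\sum_{i=1}^{n}{\left({y}_{i}-\overline{y}\right)}^{2}}$$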

where \(\overline{y}\) is the average of all n actual values. MSE measures the deviation of the predicted values from the actual values, so smaller values of MSE are desirable. R 2 measures the correlation between the predicted and actual values; its value is between 0 and 1, where R 2  = 0 means no correlation and R 2  = 1 means a perfect positive correlation between the predicted and actual values, so larger values of R 2 are desirable.
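For concreteness, the two metrics in Python (the toy values are invented):

```python
def mse(y, y_hat):
    """Mean squared error between actual y and predicted y_hat."""
    return sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Coefficient of determination R^2."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

y, y_hat = [3, 5, 8], [2, 5, 9]
print(mse(y, y_hat))  # (1 + 0 + 1) / 3 ≈ 0.667
print(r2(y, y_hat))   # ≈ 0.842
```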

Evaluation results

Evaluation was carried out on two aspects: overall performance for all the papers, and performance for the 100 most highly cited papers.

Overall prediction performance

For papers with 0–5 years of citation history, we predict their citation counts in the three subsequent years. The results are shown in Tables 2 , 3 , 4 , 5 , 6 , 7 . “Zero years of early citation data” means that the prediction was made in the same year the paper was published; “one year of early citation data” means that the prediction was made in the year after the paper was published. Numbers in bold indicate the best performance.

One can see that MM performs the best in most cases. In a few cases, OLS2_res performs the best; this is because OLS2_res considers both the paper’s early citation history and the journal’s impact factor, and the latter is not considered by any other method, which gives OLS2_res an advantage, especially when the citation history is very short. In a few cases, RNN performs the best on the arXiv dataset, and in one case AVR and NCFCM tied for first place in R 2 . Because linear regression is used in both OLS_res and MM, a comparison between them shows that dividing papers into multiple research areas is a very useful strategy for obtaining better prediction performance. See the “ Ablation Study of MM ” section for further experiments and analysis.

Prediction performance of highly cited papers

An important application of citation prediction is the early detection of highly cited papers (Abrishami & Aliakbary, 2019 ). Therefore, we evaluate the performance of the proposed method and its competitors in predicting highly cited papers. Based on the total citation counts in 2000 (DBLP) and in 2019 (arXiv), the 100 most cited papers were selected for prediction. For all the papers involved, we compute the MSE between the predicted and actual citation counts. The results are shown in Table  8 .

From Table  8 , one can see that MM performs better than all the others, except when k  = 0 (i.e., zero years of early citation data), where OLS2_res performs slightly better than MM on the DBLP dataset. In all other cases, MM outperforms the competing methods.

Ablation study of MM

MM takes two main factors into consideration: research area and early citation counts. It is desirable to find out how these two factors affect prediction performance. Another factor is the number of groups into which each research area is divided. To find out the impact of these features on prediction performance, we define some variants that implement none or only some of the features of MM.

MM-RA (RA). A variant of the MM algorithm that considers only the research area, not early citation counts.

MM-CC (CC). A variant of the MM algorithm that considers only early citation counts.

MM-5. A variant of the MM algorithm that divides the papers in each research area into 5 instead of 10 groups.

MEY. The simplest variant of MM; it considers neither research area nor early citation counts.

Now let us look at how these variants perform compared with the original algorithm. See Tables 9 , 10 , 11 , 12 , 13 , 14 , 15 for the results. It is not surprising that MM performs better than the variants RA, CC, and MEY, while MEY, the variant with neither of the two components, performs the worst in predicting citation counts. This demonstrates that both components, research area and early citation counts, are useful for prediction performance, whether used separately or in combination. However, their usefulness is not the same: the performance of CC is not as good as RA when k  = 0 and k  = 1, but better than RA when k  > 1. Understandably, this indicates that the research area is a more useful resource than citation history when the citation history is short, but the latter becomes increasingly useful as the citation history grows longer.

When applying the standard MM algorithm, we divide all the papers in one research area into 10 groups based on the number of citations they obtain in the early years. MM-5 reduces the number of groups from 10 to 5 simply by combining neighbouring pairs of groups into one. MM is better than MM-5 in most cases and on average, although the difference between them is small in most cases. It is noticeable, however, that MM-5 performs better than MM in two cases, mainly because some of the groups are very small, and predictions based on such small groups are not very accurate.

Impact of classification on MM

For the DBLP dataset, some papers were classified automatically through the venues in which they were published, while many others were classified by the classification method IBL. It is interesting to compare these two groups on the prediction task. The results are shown in Table  16 . We can see that the group of non-seed papers gets better prediction results than the group of seed papers by a clear margin in all cases. This demonstrates that the two methods IBL and MM work together well to achieve good prediction results. On the other hand, such a result is a little surprising: why does the non-seed group perform better than the seed group? One major reason is that, for the citation count prediction task, MSE values and citation counts have a strong positive correlation. In this case, there are 2346 seed papers ( C 7 ), whose average citation count is 6.339, and 7621 non-seed papers ( C 8 ), whose average citation count is 3.085. These two groups are not directly comparable because of the difference in average citation counts. Note that C 4  =  C 7  +  C 8 (see the “ Datasets ” section for C 4 ’s definition).

To make the comparison fair to both parties, we select a subgroup from each of them by adding a restriction: the papers must have obtained a citation count in the range [10, 20] by the year 2000. We obtain 318 papers ( C 9 ) from the seed paper group and 418 papers ( C 10 ) from the non-seed paper group. Coincidentally, the average citation counts of the two sub-groups are the same: 13.443. This time, the two groups are well matched for a comparison. Table 17 shows the results. Not surprisingly, the MSE value pairs are very close. This demonstrates that the prediction is equally good for papers classified by our classification algorithm IBL and for papers categorized by recommended top venues. It also implies that IBL performs the classification properly.

Conclusions

Different from previous studies, this paper applies multiple models to predict the citation counts of papers in the next couple of years, where each model fits a specific research area and the early citation history of the paper in question. The rationale behind this is that, in general, papers in different research areas and with different early citation counts have their own citation patterns. To verify the prediction performance of the proposed method, we tested it on two datasets taken from DBLP and arXiv. The experimental results show that the proposed MM method outperforms all the baseline methods in most cases on two tasks: overall prediction for a large collection of papers and prediction for a group of highly cited papers.

As an important component of prediction for research papers, we have also presented a novel instance-based learning model for the classification of research papers. By predefining a small group of seed papers in each category, the proposed method can classify new papers very efficiently with good accuracy.

As future work, we will incorporate other types of information, such as publication venues and author information, to further improve prediction performance. We also plan to explore deep learning methods for research paper classification; for example, such methods could be used to compare the content similarity of two research papers.

Footnotes

1. https://ref.ac.uk/2014/

2. https://CSRankings.org

3. https://lucene.apache.org

4. https://www.aminer.cn/

5. http://doi.org/10.5281/zenodo.3385851

Abramo, G., D’Angelo, C., & Felici, G. (2019). Predicting publication long-term impact through a combination of early citations and journal impact factor. Journal of Informetrics, 13 (1), 32–49.

Abrishami, A., & Aliakbary, S. (2019). Predicting citation counts based on deep neural network learning techniques. Journal of Informetrics, 13 (2), 485–499.

Akella, A., Alhoori, H., Kondamudi, P., et al. (2021). Early indicators of scientific impact: Predicting citations with altmetrics. Journal of Informetrics, 15 (2), 101128.

Ambalavanan, A. K., & Devarakonda, M. V. (2020). Using the contextual language model BERT for multi-criteria classification of scientific articles. Journal of Biomedical Informatics, 112 , 103578.

Andersen, J. P., & Nielsen, M. W. (2018). Google Scholar and Web of Science: Examining gender differences in citation coverage across five scientific disciplines. Journal of Informetrics, 12 (3), 950–959.

Bai, X., Zhang, F., & Lee, I. (2019). Predicting the citations of scholarly paper. Journal of Informetrics, 13 (1), 407–418.

Bornmann, L., Leydesdorff, L., & Wang, J. (2014). How to improve the prediction based on citation impact percentiles for years shortly after the publication data? Journal of Informetrics, 8 (1), 175–180.

Braun, T., Glänzel, W., & Schubert, A. (2006). Hirsch-type index for journals. Scientometrics, 69 (1), 169–173.

Bu, Y., Lu, W., Wu, Y., Chen, H., & Huang, Y. (2021). How wide is the citation impact of scientific publications? A cross-discipline and large-scale analysis. Information Processing & Management, 58 (1), 102429.

Cao, X., Chen, Y., & Liu, K. (2016). A data analytic approach to quantifying scientific impact. Journal of Informetrics, 10 (2), 471–484.

Castillo, C., Donato, D., & Gionis, A. (2007). Estimating number of citations using author reputation. String processing and information retrieval (pp. 107–117). Berlin: Springer.

Chakraborty, T., Kumar, S., Goyal, P., Ganguly, N., & Mukherjee, A. (2014). Towards a stratified learning approach to predict future citation counts. In  IEEE/ACM joint conference on digital libraries  (pp. 351–360). IEEE.

Chen, J., & Zhang, C. (2015). Predicting citation counts of papers. In  2015 IEEE 14th international conference on cognitive informatics & cognitive computing (ICCI* CC)  (pp. 434–440). IEEE.

Cressey, D., & Gibney, E. (2014). UK releases world’s largest university assessment. Nature . https://doi.org/10.1038/nature.2014.16587

Daradkeh, M., Abualigah, L., Atalla, S., & Mansoor, W. (2022). Scientometric analysis and classification of research using convolutional neural networks: A case study in data science and analytics. Electronics, 11 (13), 2066.

Egghe, L. (2006). Theory and practice of the g-index. Scientometrics, 69 (1), 131–152.

Eykens, J., Guns, R., & Engels, T. (2021). Fine-grained classification of social science journal articles using textual data: A comparison of supervised machine learning approaches. Quantitative Science Studies, 2 (1), 89–110.

Garfield, E. (1972). Citation analysis as a tool in journal evaluation. Science, 178 (4060), 471–479.

Garfield, E. (2006). The history and meaning of the journal impact factor. JAMA, 295 (1), 90–93.

Hande, A., Puranik, K., Priyadharshini, R., & Chakravarthi, B. (2021). Domain identification of scientific articles using transfer learning and ensembles. PAKDD, 2021 , 88–97.

Hazen, A. (1914). Storage to be provided in impounding reservoirs for municipal water supply. Transactions of American Society of Civil Engineers, 77 (1914), 1539–1640.

Hirsch, J. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Science of the United States of America, 102 (46), 16569–16572.

Hoppe, F., Dessì, D., & Sack, H. (2021). Deep learning meets knowledge graphs for scholarly data classification. WWW (companion Volume), 2021 , 417–421.

Huang, S., Huang, Y., Bu, Y., et al. (2022). Fine-gained citation count prediction via a transformer-based model with among-attention mechanism. Information Processing & Management, 59 (2), 102799.

Kandimalla, B., Rohatgi, S., Wu, J., & Lee Giles, C. (2020). Large scale subject category classification of scholarly papers with deep attentive neural networks. Frontiers in Research Metrics and Analytics, 5 , 600382.

Kelly, M. (2015). Citation patterns of engineering, statistics, and computer science researchers: An internal and external citation analysis across multiple engineering subfields. College and Research Libraries, 76 (7), 859–882.

Kim, S., & Gil, J. (2019). Research paper classification systems based on TF-IDF and LDA schemes. Human-Centric Computing and Information Sciences, 9 , 30.

Levitt, J. M., & Thelwall, M. (2008). Patterns of annual citation of highly cited articles and the prediction of their citation ranking: A comparison across subjects. Scientometrics, 77 (1), 41–60.

Li, S., Zhao, W. X., Yin, E. J., & Wen, J. R. (2019). A neural citation count prediction model based on peer review text. In  Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP)  (pp. 4914–4924).

Li, S., Li, Y., Zhao, W., et al. (2022). Interpretable aspect-aware capsule network for peer review based citation count prediction. ACM Transaction on Information System, 40 (1), 1–29.

Liu, L., Yu, D., Wang, D., et al. (2020). Citation count prediction based on neural Hawkes model. IEICE Transactions on Information and Systems, 103 (11), 2379–2388.

Liu, M., Zhang, H., Tian, Y., et al. (2022). Overview of NLPCC2022 shared task 5 track 1: Multi-label classification for scientific literature. NLPCC, 2 (2022), 320–327.

Lu, C., Ding, Y., & Zhang, C. (2017). Understanding the impact change of a highly cited article: A content-based citation analysis. Scientometrics, 112 (3), 927–945.

Lukasik, M., Kusmierczyk, T., Bolikowski, L., & Nguyen, H. (2013). Hierarchical, multi-label classification of scholarly publications: Modifications of ML-KNN algorithm. Intelligent Tools for Building a Scientific Information Platform, 2013 , 343–363.

Ma, A., Liu, Y., Xu, X., et al. (2021). A deep learning based citation count prediction model with paper metadata semantic features. Scientometrics, 126 (2), 6803–6823.

Mendoza, Ó. E., Kusa, W., El-Ebshihy, A., Wu, R., Pride, D., Knoth, P., Herrmannova, D., Piroi, F., Pasi, G. & Hanbury, A. (2022). Benchmark for research theme classification of scholarly documents. In  Proceedings of the third workshop on scholarly document processing  (pp. 253–262).

Mendoza, M. (2021). Differences in citation patterns across areas, article types and age groups of researchers. Publications, 9 (4), 47.

Milz, T., & Seifert, C. (2018). Who cites what in computer science? Analysing citation patterns across conference rank and gender. TPDL, 2018 , 321–325.

Fersht, A. (2009). The most influential journals: Impact factor and Eigenfactor. Proceedings of the National Academy of Sciences, 106 (17), 6883–6884.

Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. European Physical Journal B, 4 (2), 131–134.

Rivest, M., Vignola-Gagné, E., & Archambault, É. (2021). Article-level classification of scientific publications: A comparison of deep learning, direct citation and bibliographic coupling. PLoS ONE, 16 (5), e0251493.

Ruan, X., Zhu, Y., Li, J., et al. (2020). Predicting the citation counts of individual papers via a BP neural network. Journal of Informetrics, 14 (3), 101039.

Saier, T., & Färber, M. (2020). UnarXive: A large scholarly data set with publications’ full-text, annotated in-text citations, and links to metadata. Scientometrics, 125 , 3085–3108.

Semberecki, P., & Maciejewski, H. (2017). Deep learning methods for subject text classification of articles. FedCSIS, 2017 , 357–360.

Shen, Z., Ma, H., & Wang, K. (2018). A web-scale system for scientific knowledge exploration. ACL, 4 , 87–92.

Stegehuis, C., Litvak, N., & Waltman, L. (2015). Predicting the long-term citation impact of recent publications. Journal of Informetrics, 9 (3), 642–657.

Su, Z. (2020). Prediction of future citation count with machine learning and neural network. In  2020 Asia-Pacific conference on image processing, electronics and computers (IPEC)  (pp. 101–104). IEEE.

Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., & Su, Z. (2008). Arnetminer: Extraction and mining of academic social networks. In  Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining  (pp. 990–998).

Thelwall, M. (2020). Gender differences in citation impact for 27 fields and six English-speaking countries 1996–2014. Quantitative Science Studies, 1 (2), 599–617.

Toney, A., & Dunham, J. (2022). Multi-label classification of scientific research documents across domains and languages. In  Proceedings of the third workshop on scholarly document processing  (pp. 105–114).

Waltman, L., & van Eck, N. (2012). A new methodology for constructing a publication-level classification system of science. Journal of the American Society for Information Science and Technology, 63 (12), 2378–2392.

Wang, B., Wu, F., & Shi, L. (2023). AGSTA-NET: Adaptive graph spatiotemporal attention network for citation count prediction. Scientometrics, 128 (1), 511–541.

Wang, D., Song, C., & Barabasi, A. (2013). Quantifying long-term scientific impact. Science, 342 (6154), 127–132.

Wang, K., Shi, W., Bai, J., et al. (2021). Prediction and application of article potential citations based on nonlinear citation-forecasting combined model. Scientometrics, 126 (8), 6533–6550.

Wen, J., Wu, L., & Chai, J. (2020). Paper citation count prediction based on recurrent neural network with gated recurrent unit. In  2020 IEEE 10th international conference on electronics information and emergency communication (ICEIEC)  (pp. 303–306). IEEE.

Wu, S., Li, J., & Ding, W. (2023). A geometric framework for multiclass ensemble classifiers. Machine Learning, 112 (12), 4929–4958.

Xu, J., Li, M., Jiang, J., et al. (2019). Early prediction of scientific impact based on multi-bibliographic features and convolutional neural network. IEEE ACCESS, 7 , 92248–92258.

Yan, R., Tang, J., Liu, X., Shan, D., & Li, X. (2011). Citation count prediction: learning to estimate future citations for literature. In  Proceedings of the 20th ACM international conference on Information and knowledge management  (pp. 1247–1252).

Yan, E., & Ding, Y. (2010). Weighted citation: An indicator of an article’s prestige. Journal of the American Society for Information Science and Technology, 61 (8), 1635–1643.

Yu, T., Yu, G., Li, P. Y., & Wang, L. (2014). Citation impact prediction for scientific papers using stepwise regression analysis. Scientometrics, 101 , 1233–1252.

Zhang, L., Sun, B., Shu, F., & Huang, Y. (2022). Comparing paper level classifications across different methods and systems: an investigation of Nature publications. Scientometrics, 127 (12), 7633–7651.

Zhao, Q., & Feng, X. (2022). Utilizing citation network structure to predict paper citation counts: A deep learning approach. Journal of Informetrics, 16 (1), 101235.

Zhu, X. P., & Ban, Z. (2018). Citation count prediction based on academic network features. In  2018 IEEE 32nd international conference on advanced information networking and applications (AINA)  (pp. 534-541). IEEE.

Funding

No funding was received for conducting this study.

Author information

Authors and affiliations

School of Computing Science, Jiangsu University, Zhenjiang, China

Fang Zhang & Shengli Wu

School of Education, Hubei University of Arts and Science, Xiangyang, China

School of Computing, Ulster University, Belfast, UK


Contributions

All authors contributed to the study conception and design. Data collection, programming and analysis were performed by Fang Zhang. The first draft of the manuscript was written by Shengli Wu, and all authors read and approved the final manuscript.

Corresponding author

Correspondence to Shengli Wu.

Ethics declarations

Competing interests

The authors have no relevant financial and non-financial interests to disclose.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Zhang, F., Wu, S. Predicting citation impact of academic papers across research areas using multiple models and early citations. Scientometrics (2024). https://doi.org/10.1007/s11192-024-05086-0


Received: 30 August 2023

Accepted: 13 June 2024

Published: 25 June 2024

DOI: https://doi.org/10.1007/s11192-024-05086-0


  • Multiple models
  • Classification of academic papers


What is the average number of references in a PhD thesis in the humanities?

How many references on average are typical for a PhD Thesis?

More specifically, I'm looking for a comparison of the average number of references for an engineering PhD thesis and PhD theses in other fields.

I'm looking for references or data on the subject.



  • 5 I'm speaking from what I've been told as a grad student, not from a position of authority, but... as many as it takes to show that you've done an exhaustive review of the literature surrounding the subject and are the (new) expert in your chosen area. –  tonysdg Commented Sep 21, 2015 at 2:45
  • 2 Related: How many references should be included in the bibliography of a Master thesis? –  ff524 Commented Sep 21, 2015 at 3:35
  • 3 As always, PhD Comics has the answer to all your questions: phdcomics.com/comics/archive/phd091615s.gif –  boscovich Commented Sep 21, 2015 at 6:15
  • Somehow related, see this blog post and plot: Average dissertation and thesis length (by field). He shares R code. However, the analysis is being done on a single university repository. You can use a similar methodology to calculate the number of references. –  Piotr Migdal Commented Sep 21, 2015 at 8:41
  • Also: phdcomics.com/comics.php?f=1823 (but it is for papers, not - theses). –  Piotr Migdal Commented Sep 25, 2015 at 15:27

5 Answers

As alluded to in the comments, there is no generic "right" answer: the number will depend strongly on the nature of your thesis, the state of the literature related to it, and your personal scholastic style. Nevertheless, in the spirit of engineering approximations, I will supply you with some rough guidelines.

As a lower bound, I would be startled if it was possible to place most theses within the context of pre-existing work in less than about 20 references. In certain contexts such as an obscure corner of pure mathematics perhaps this might be possible, but in most cases work does not take place in a vacuum, and other people will have either used similar techniques or cared about similar problems in the past.

As an upper bound, for a thesis with on the order of 200 or more references, I would be concerned that the student hasn't been doing enough work on their own original research and has instead invested too much time and energy in building a literature review rather than a thesis. Again, there will likely be cases where an extremely high number of references is reasonable, but these are likely to be unusual.

Notice the extremely wide range and suspiciously round numbers in my estimates: there really is no "right" answer for how many references there should be in general , but perhaps these bounds will help shape one's initial impressions.


  • 9 "upper bound... 200 or more": In a corpus of PhD theses in Agricultural Botany or Agricultural and Food Economics at the University of Reading, average number of citations per thesis were 248.8 and 333.5, respectively. ( Source ) (I don't know how many unique references these citations were to, though) –  ff524 Commented Sep 21, 2015 at 3:15
  • 2 "lower bound...less than about 20 citations": as a fun data point, John Nash's PhD thesis had 2 references. –  ff524 Commented Sep 21, 2015 at 3:39
  • @ff524 Apologies for any confusion between citation and reference in my original writing; I've clarified. The citation data point is interesting, but doesn't answer the references question---I certainly tend to see many citations to key references. Nash, I would definitely count as an unusual data point. :-) –  jakebeal Commented Sep 21, 2015 at 3:46
  • 4 I think your upper bound is too low. In energy studies, I expect to see 150-250 references. 200 is not unusual for a lit review. –  410 gone Commented Sep 21, 2015 at 6:22
  • Nash referenced two other sources - one of which was his own. :D –  fgysin Commented Sep 21, 2015 at 9:32
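jakebeal's rough bounds lend themselves to a trivial sanity check. A minimal sketch in Python (the 20 and 200 thresholds are his illustrative round numbers from the answer above, not a standard):

```python
# Heuristic from the answer above: fewer than ~20 references, or more than
# ~200, is worth a second look. Illustrative thresholds, not a rule.
LOWER_BOUND = 20
UPPER_BOUND = 200

def flag_reference_count(n_refs: int) -> str:
    """Classify a thesis bibliography size against the heuristic bounds."""
    if n_refs < LOWER_BOUND:
        return "unusually few: check literature coverage"
    if n_refs > UPPER_BOUND:
        return "unusually many: check balance of review vs. original work"
    return "within the typical range"
```

Nash's famous two-reference thesis, mentioned in the comments, would of course be flagged as unusually few.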

A quick look at the mostly Mechanical Engineering theses I have on my computer at the moment gives an average of just over 100 references, with a range of approximately 80-150.

I haven't been able to find any quantifiable data, but I feel this is about right. Some people will have significantly more, and a few will have fewer.

I also suspect there are significant variations between subjects, with arts and social sciences having significantly more references and more theoretical subjects having slightly fewer.
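The quick averaging described above is easy to automate when the theses' bibliographies are available as BibTeX files. A minimal sketch, assuming a hypothetical directory of one `.bib` file per thesis:

```python
import re
import statistics
from pathlib import Path

def count_bib_entries(bib_text: str) -> int:
    """Count BibTeX entries such as @article{...} or @phdthesis{...}.

    Note: this also counts @string/@comment directives, so it slightly
    over-counts; good enough for a rough average.
    """
    return len(re.findall(r"@\w+\s*\{", bib_text))

def reference_stats(bib_dir: str) -> tuple[float, int, int]:
    """Return (mean, min, max) reference counts over all .bib files in a directory."""
    counts = [count_bib_entries(p.read_text(errors="ignore"))
              for p in Path(bib_dir).glob("*.bib")]
    return statistics.mean(counts), min(counts), max(counts)
```

Run over a corpus of mechanical engineering theses, `reference_stats` would reproduce the "average of just over 100, range roughly 80-150" kind of summary given above.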


In essence, the number should be small enough to prove that the major composition of your thesis is your own work, yet large enough to satisfy the committee during your thesis defence.

As stated above, there is no actual upper or lower bound per se . The required number of references is more of a hidden factor that is highly dependent on your doctoral committee. Thus it would be best to consult your advisor and, if permitted, members of your doctoral committee. In addition, you ought to look through some PhD dissertations completed by others in your field to get a good idea of the average number, if that is what you really want.


It really depends on the subject. Imagine you are writing a thesis on three complex, inter-related medical conditions, and you include a well-sustained hypothesis for further study. It would not be very hard to reach 300 or more citations. You know that people reading your work will not be specialists in all three subjects at the same time, so you have to provide a good and comprehensive introduction to each subject, so that everyone can understand the results and discussion. This alone can take 200 or more references, some of which will also be used in the discussion.

Now imagine that two of these medical conditions have recent definitions and guidelines that are completely different from those of two years ago, and you have to contrast your results with papers using the old guidelines as well as the few papers that use the new ones. Now you have to explain both sets of guidelines and the ways in which they differ. Furthermore, the pathophysiologic models for the interaction between these conditions are complex, spanning the systemic, organ, cellular, and molecular levels, and you have to explain some of them in order to establish your hypothesis. This is where I'm headed... around 320 references and counting...


For me, the use of references should not limit the creativity of the doctoral candidate. Rather, they should be the springboard that gets the candidate to produce more than what they take in. Numerically, I expect each single reference to be matched with at most two pages of the candidate's own work. This would mean that a dissertation of 300 pages should not have used more than 150 references. The number will vary according to the length of the dissertation. The higher the number of references, the more limited the originality of the work and the poorer the creativity displayed.




