
Systematic Reviews

  • Types of Literature Reviews

What Makes a Systematic Review Different from Other Types of Reviews?


Reproduced from Grant, M. J. and Booth, A. (2009), A typology of reviews: an analysis of 14 review types and associated methodologies. Health Information & Libraries Journal, 26: 91–108. doi:10.1111/j.1471-1842.2009.00848.x

Critical review
  • Description: Aims to demonstrate writer has extensively researched literature and critically evaluated its quality. Goes beyond mere description to include degree of analysis and conceptual innovation. Typically results in hypothesis or model
  • Search: Seeks to identify most significant items in the field
  • Appraisal: No formal quality assessment. Attempts to evaluate according to contribution
  • Synthesis: Typically narrative, perhaps conceptual or chronological
  • Analysis: Significant component: seeks to identify conceptual contribution to embody existing or derive new theory

Literature review
  • Description: Generic term: published materials that provide examination of recent or current literature. Can cover wide range of subjects at various levels of completeness and comprehensiveness. May include research findings
  • Search: May or may not include comprehensive searching
  • Appraisal: May or may not include quality assessment
  • Synthesis: Typically narrative
  • Analysis: Analysis may be chronological, conceptual, thematic, etc.

Mapping review/systematic map
  • Description: Map out and categorize existing literature from which to commission further reviews and/or primary research by identifying gaps in research literature
  • Search: Completeness of searching determined by time/scope constraints
  • Appraisal: No formal quality assessment
  • Synthesis: May be graphical and tabular
  • Analysis: Characterizes quantity and quality of literature, perhaps by study design and other key features. May identify need for primary or secondary research

Meta-analysis
  • Description: Technique that statistically combines the results of quantitative studies to provide a more precise effect of the results
  • Search: Aims for exhaustive, comprehensive searching. May use funnel plot to assess completeness
  • Appraisal: Quality assessment may determine inclusion/exclusion and/or sensitivity analyses
  • Synthesis: Graphical and tabular with narrative commentary
  • Analysis: Numerical analysis of measures of effect assuming absence of heterogeneity

Mixed studies review/mixed methods review
  • Description: Refers to any combination of methods where one significant component is a literature review (usually systematic). Within a review context it refers to a combination of review approaches, for example combining quantitative with qualitative research or outcome with process studies
  • Search: Requires either very sensitive search to retrieve all studies or separately conceived quantitative and qualitative strategies
  • Appraisal: Requires either a generic appraisal instrument or separate appraisal processes with corresponding checklists
  • Synthesis: Typically both components will be presented as narrative and in tables. May also employ graphical means of integrating quantitative and qualitative studies
  • Analysis: Analysis may characterise both literatures and look for correlations between characteristics or use gap analysis to identify aspects present in one literature but missing in the other

Overview
  • Description: Generic term: summary of the [medical] literature that attempts to survey the literature and describe its characteristics
  • Search: May or may not include comprehensive searching (depends whether systematic overview or not)
  • Appraisal: May or may not include quality assessment (depends whether systematic overview or not)
  • Synthesis: Synthesis depends on whether systematic or not. Typically narrative but may include tabular features
  • Analysis: Analysis may be chronological, conceptual, thematic, etc.

Qualitative systematic review/qualitative evidence synthesis
  • Description: Method for integrating or comparing the findings from qualitative studies. It looks for ‘themes’ or ‘constructs’ that lie in or across individual qualitative studies
  • Search: May employ selective or purposive sampling
  • Appraisal: Quality assessment typically used to mediate messages not for inclusion/exclusion
  • Synthesis: Qualitative, narrative synthesis
  • Analysis: Thematic analysis, may include conceptual models

Rapid review
  • Description: Assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research
  • Search: Completeness of searching determined by time constraints
  • Appraisal: Time-limited formal quality assessment
  • Synthesis: Typically narrative and tabular
  • Analysis: Quantities of literature and overall quality/direction of effect of literature

Scoping review
  • Description: Preliminary assessment of potential size and scope of available research literature. Aims to identify nature and extent of research evidence (usually including ongoing research)
  • Search: Completeness of searching determined by time/scope constraints. May include research in progress
  • Appraisal: No formal quality assessment
  • Synthesis: Typically tabular with some narrative commentary
  • Analysis: Characterizes quantity and quality of literature, perhaps by study design and other key features. Attempts to specify a viable review

State-of-the-art review
  • Description: Tends to address more current matters in contrast to other combined retrospective and current approaches. May offer new perspectives
  • Search: Aims for comprehensive searching of current literature
  • Appraisal: No formal quality assessment
  • Synthesis: Typically narrative, may have tabular accompaniment
  • Analysis: Current state of knowledge and priorities for future investigation and research

Systematic review
  • Description: Seeks to systematically search for, appraise and synthesise research evidence, often adhering to guidelines on the conduct of a review
  • Search: Aims for exhaustive, comprehensive searching
  • Appraisal: Quality assessment may determine inclusion/exclusion
  • Synthesis: Typically narrative with tabular accompaniment
  • Analysis: What is known; recommendations for practice. What remains unknown; uncertainty around findings, recommendations for future research

Systematic search and review
  • Description: Combines strengths of critical review with a comprehensive search process. Typically addresses broad questions to produce ‘best evidence synthesis’
  • Search: Aims for exhaustive, comprehensive searching
  • Appraisal: May or may not include quality assessment
  • Synthesis: Minimal narrative, tabular summary of studies
  • Analysis: What is known; recommendations for practice. Limitations

Systematized review
  • Description: Attempts to include elements of systematic review process while stopping short of systematic review. Typically conducted as postgraduate student assignment
  • Search: May or may not include comprehensive searching
  • Appraisal: May or may not include quality assessment
  • Synthesis: Typically narrative with tabular accompaniment
  • Analysis: What is known; uncertainty around findings; limitations of methodology

Umbrella review
  • Description: Specifically refers to review compiling evidence from multiple reviews into one accessible and usable document. Focuses on broad condition or problem for which there are competing interventions and highlights reviews that address these interventions and their results
  • Search: Identification of component reviews, but no search for primary studies
  • Appraisal: Quality assessment of studies within component reviews and/or of reviews themselves
  • Synthesis: Graphical and tabular with narrative commentary
  • Analysis: What is known; recommendations for practice. What remains unknown; recommendations for future research

  • Last Updated: Jul 23, 2024 3:40 PM
  • URL: https://guides.library.ucla.edu/systematicreviews

Charles Sturt University

Literature Review: Types of literature reviews


Types of literature reviews

The type of literature review you write will depend on your discipline and whether you are a researcher writing your PhD, publishing a study in a journal or completing an assessment task in your undergraduate study.

A literature review for a subject in an undergraduate degree will not be as comprehensive as the literature review required for a PhD thesis.

An undergraduate literature review may be in the form of an annotated bibliography or a narrative review of a small selection of literature, for example ten relevant articles. If you are asked to write a literature review, and you are an undergraduate student, be guided by your subject coordinator or lecturer.

The common types of literature reviews will be explained in the pages of this section.

  • Narrative or traditional literature reviews
  • Critically Appraised Topic (CAT)
  • Scoping reviews
  • Annotated bibliographies

These are not the only types of literature reviews that can be conducted. The terms "review" and "literature" can be confusing and are often used in the wrong context. Grant and Booth (2009) attempt to clear up this confusion by discussing 14 review types, the methodology associated with each, and the advantages and disadvantages of each review.

Grant, M. J. and Booth, A. (2009), A typology of reviews: an analysis of 14 review types and associated methodologies. Health Information & Libraries Journal, 26, 91–108. doi:10.1111/j.1471-1842.2009.00848.x

What's the difference between reviews?

Researchers, academics, and librarians all use various terms to describe different types of literature reviews, and there is often inconsistency in the ways the types are discussed. Here are a couple of simple explanations.

  • The image below describes common review types in terms of speed, detail, risk of bias, and comprehensiveness:

Description of the differences between review types in image form

"Schematic of the main differences between the types of literature review" by Brennan, M. L., Arlt, S. P., Belshaw, Z., Buckley, L., Corah, L., Doit, H., Fajt, V. R., Grindlay, D., Moberly, H. K., Morrow, L. D., Stavisky, J., & White, C. (2020). Critically Appraised Topics (CATs) in veterinary medicine: Applying evidence in clinical practice. Frontiers in Veterinary Science, 7 , 314. https://doi.org/10.3389/fvets.2020.00314 is licensed under CC BY 3.0

  • The table below lists four of the most common types of review, as adapted from a widely used typology of fourteen types of reviews (Grant & Booth, 2009).

Literature review
  • Description: Identifies and reviews published literature on a topic, which may be broad. Typically employs a narrative approach to reporting the review findings. Can include a wide range of related subjects.
  • Timeframe: 1 - 4 weeks
  • Number of reviewers: 1

Rapid review
  • Description: Assesses what is known about an issue by using a systematic review method to search and appraise research and determine best practice.
  • Timeframe: 2 - 6 months
  • Number of reviewers: 2

Scoping review
  • Description: Assesses the potential scope of the research literature on a particular topic. Helps determine gaps in the research. (See the page in this guide on  .)
  • Timeframe: 1 - 4 weeks
  • Number of reviewers: 1 - 2

Systematic review
  • Description: Seeks to systematically search for, appraise, and synthesise research evidence so as to aid decision-making and determine best practice. Can vary in approach, and is often specific to the type of study, which includes studies of effectiveness, qualitative research, economic evaluation, prevalence, aetiology, or diagnostic test accuracy.
  • Timeframe: 8 months to 2 years
  • Number of reviewers: 2 or more

Grant, M.J. & Booth, A. (2009). A typology of reviews: An analysis of 14 review types and associated methodologies. Health Information & Libraries Journal, 26(2), 91-108. https://doi.org/10.1111/j.1471-1842.2009.00848.x

See also the Library's Literature Review guide.

Critically Appraised Topic (CAT)

For information on conducting a Critically Appraised Topic (CAT), see:

Callander, J., Anstey, A. V., Ingram, J. R., Limpens, J., Flohr, C., & Spuls, P. I. (2017). How to write a Critically Appraised Topic: evidence to underpin routine clinical practice. British Journal of Dermatology (1951), 177(4), 1007-1013. https://doi.org/10.1111/bjd.15873

  • Last Updated: Aug 11, 2024 4:07 PM
  • URL: https://libguides.csu.edu.au/review


Research-Methodology

Types of Literature Review

There are many types of literature review. The choice of a specific type depends on your research approach and design. The following types of literature review are the most popular in business studies:

Narrative literature review, also referred to as traditional literature review, critiques the literature and summarizes the body of literature on a topic. A narrative review also draws conclusions about the topic and identifies gaps or inconsistencies in a body of knowledge. You need a sufficiently focused research question to conduct a narrative literature review.

Systematic literature review requires a more rigorous and well-defined approach than most other types of literature review. A systematic literature review is comprehensive and details the timeframe within which the literature was selected. Systematic literature reviews can be divided into two categories: meta-analysis and meta-synthesis.

When you conduct a meta-analysis, you take findings from several studies on the same subject and analyze them using standardized statistical procedures. In a meta-analysis, patterns and relationships are detected and conclusions are drawn. Meta-analysis is associated with a deductive research approach.
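
As a rough illustration of what "standardized statistical procedures" can look like, the sketch below pools per-study effect sizes with inverse-variance (fixed-effect) weighting, which is one common approach among several. The function name `pool_fixed_effect` and the three study values are hypothetical and chosen only for illustration.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-study effect sizes."""
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("need one standard error per effect size")
    # Each study is weighted by the inverse of its variance (1 / SE^2),
    # so more precise studies contribute more to the pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Approximate 95% confidence interval using the normal quantile 1.96.
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical effect sizes and standard errors from three studies.
effects = [0.30, 0.50, 0.40]
std_errors = [0.10, 0.20, 0.10]
pooled, pooled_se, ci = pool_fixed_effect(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

A fixed-effect model assumes the studies estimate one shared underlying effect. When studies differ substantially, a random-effects model is usually considered instead.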

Meta-synthesis, on the other hand, is based on non-statistical techniques. It integrates, evaluates and interprets the findings of multiple qualitative research studies. A meta-synthesis literature review is usually conducted when following an inductive research approach.

Scoping literature review, as its name implies, is used to identify the scope or coverage of a body of literature on a given topic. It has been noted that “scoping reviews are useful for examining emerging evidence when it is still unclear what other, more specific questions can be posed and valuably addressed by a more precise systematic review.” [1] The main difference between the systematic and scoping types of literature review is that a systematic literature review is conducted to answer more specific research questions, whereas a scoping literature review is conducted to explore a more general research question.

Argumentative literature review, as the name implies, examines literature selectively in order to support or refute an argument, deeply embedded assumption, or philosophical problem already established in the literature. It should be noted that the potential for bias is a major shortcoming of the argumentative literature review.

Integrative literature review reviews, critiques, and synthesizes secondary data about a research topic in an integrated way such that new frameworks and perspectives on the topic are generated. If your research does not involve primary data collection and data analysis, then using an integrative literature review will be your only option.

Theoretical literature review focuses on a pool of theory that has accumulated in regard to an issue, concept, theory, or phenomenon. Theoretical literature reviews play an instrumental role in establishing what theories already exist, the relationships between them, and to what degree existing theories have been investigated, and in developing new hypotheses to be tested.

Early in the literature review chapter, you need to specify the type of literature review you chose and justify your choice. Your choice of a specific type of literature review should be based upon your research area, research problem and research methods. You can also briefly discuss the other popular types of literature review mentioned above to illustrate your awareness of them.

[1] Munn, Z. et al. (2018) “Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach” BMC Medical Research Methodology

  John Dudovskiy

How to Write a Literature Review | Guide, Examples, & Templates

Published on January 2, 2023 by Shona McCombes. Revised on September 11, 2023.

What is a literature review? A literature review is a survey of scholarly sources on a specific topic. It provides an overview of current knowledge, allowing you to identify relevant theories, methods, and gaps in the existing research that you can later apply to your paper, thesis, or dissertation topic.

There are five key steps to writing a literature review:

  • Search for relevant literature
  • Evaluate sources
  • Identify themes, debates, and gaps
  • Outline the structure
  • Write your literature review

A good literature review doesn’t just summarize sources: it analyzes, synthesizes, and critically evaluates to give a clear picture of the state of knowledge on the subject.

Table of contents

  • What is the purpose of a literature review?
  • Examples of literature reviews
  • Step 1 – Search for relevant literature
  • Step 2 – Evaluate and select sources
  • Step 3 – Identify themes, debates, and gaps
  • Step 4 – Outline your literature review’s structure
  • Step 5 – Write your literature review
  • Free lecture slides
  • Other interesting articles
  • Frequently asked questions

What is the purpose of a literature review?

When you write a thesis, dissertation, or research paper, you will likely have to conduct a literature review to situate your research within existing knowledge. The literature review gives you a chance to:

  • Demonstrate your familiarity with the topic and its scholarly context
  • Develop a theoretical framework and methodology for your research
  • Position your work in relation to other researchers and theorists
  • Show how your research addresses a gap or contributes to a debate
  • Evaluate the current state of research and demonstrate your knowledge of the scholarly debates around your topic.

Writing literature reviews is a particularly important skill if you want to apply for graduate school or pursue a career in research. We’ve written a step-by-step guide that you can follow below.

Examples of literature reviews

Writing literature reviews can be quite challenging! A good starting point could be to look at some examples, depending on what kind of literature review you’d like to write.

  • Example literature review #1: “Why Do People Migrate? A Review of the Theoretical Literature” (Theoretical literature review about the development of economic migration theory from the 1950s to today.)
  • Example literature review #2: “Literature review as a research methodology: An overview and guidelines” (Methodological literature review about interdisciplinary knowledge acquisition and production.)
  • Example literature review #3: “The Use of Technology in English Language Learning: A Literature Review” (Thematic literature review about the effects of technology on language acquisition.)
  • Example literature review #4: “Learners’ Listening Comprehension Difficulties in English Language Learning: A Literature Review” (Chronological literature review about how the concept of listening skills has changed over time.)

Step 1 – Search for relevant literature

Before you begin searching for literature, you need a clearly defined topic.

If you are writing the literature review section of a dissertation or research paper, you will search for literature related to your research problem and questions.

Make a list of keywords

Start by creating a list of keywords related to your research question. Include each of the key concepts or variables you’re interested in, and list any synonyms and related terms. You can add to this list as you discover new keywords in the process of your literature search.

  • Social media, Facebook, Instagram, Twitter, Snapchat, TikTok
  • Body image, self-perception, self-esteem, mental health
  • Generation Z, teenagers, adolescents, youth

Search for relevant sources

Use your keywords to begin searching for sources. Some useful databases to search for journals and articles include:

  • Your university’s library catalogue
  • Google Scholar
  • Project Muse (humanities and social sciences)
  • Medline (life sciences and biomedicine)
  • EconLit (economics)
  • Inspec (physics, engineering and computer science)

You can also use Boolean operators (AND, OR, NOT) to help narrow down your search.
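
A minimal sketch of how keyword lists like the ones above could be combined into a Boolean search string: synonyms within one concept are joined with OR, and the concepts are then joined with AND. The helper `build_query` is hypothetical, and the exact syntax a database accepts (quotation marks, parentheses, operator case) varies, so treat the output as a starting point and not as a ready-made query.

```python
def build_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for terms in concepts:
        # Quote multi-word terms so they are searched as exact phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Keyword lists based on the example above (shortened for readability).
concepts = [
    ["social media", "Instagram", "TikTok"],
    ["body image", "self-esteem"],
    ["teenagers", "adolescents"],
]
query = build_query(concepts)
print(query)
```

Adding more synonyms to a group broadens the search, while adding another concept group narrows it.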

Make sure to read the abstract to find out whether an article is relevant to your question. When you find a useful book or article, you can check the bibliography to find other relevant sources.

Step 2 – Evaluate and select sources

You likely won’t be able to read absolutely everything that has been written on your topic, so it will be necessary to evaluate which sources are most relevant to your research question.

For each publication, ask yourself:

  • What question or problem is the author addressing?
  • What are the key concepts and how are they defined?
  • What are the key theories, models, and methods?
  • Does the research use established frameworks or take an innovative approach?
  • What are the results and conclusions of the study?
  • How does the publication relate to other literature in the field? Does it confirm, add to, or challenge established knowledge?
  • What are the strengths and weaknesses of the research?

Make sure the sources you use are credible, and make sure you read any landmark studies and major theories in your field of research.

Take notes and cite your sources

As you read, you should also begin the writing process. Take notes that you can later incorporate into the text of your literature review.

It is important to keep track of your sources with citations to avoid plagiarism. It can be helpful to make an annotated bibliography, where you compile full citation information and write a paragraph of summary and analysis for each source. This helps you remember what you read and saves time later in the process.
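
If you keep your notes electronically, one way to structure an annotated bibliography is as a list of small records, each holding the full citation plus your summary and analysis. The sketch below is only an illustration: the `AnnotatedSource` class, its field names, and the sample annotation text are hypothetical, and a reference manager may serve the same purpose.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSource:
    citation: str   # full reference in your chosen citation style
    summary: str    # what the source says
    analysis: str   # how it relates to your research question
    keywords: list = field(default_factory=list)


def format_entry(source):
    """Render one annotated-bibliography entry as plain text."""
    return f"{source.citation}\n    {source.summary} {source.analysis}"


entry = AnnotatedSource(
    citation=(
        "Grant, M. J., & Booth, A. (2009). A typology of reviews: an analysis "
        "of 14 review types and associated methodologies. Health Information "
        "& Libraries Journal, 26, 91-108."
    ),
    summary="Describes 14 review types and their associated methodologies.",
    analysis="Useful for deciding which review type fits a project.",  # sample note
    keywords=["review types", "methodology"],
)
print(format_entry(entry))
```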

Step 3 – Identify themes, debates, and gaps

To begin organizing your literature review’s argument and structure, be sure you understand the connections and relationships between the sources you’ve read. Based on your reading and notes, you can look for:

  • Trends and patterns (in theory, method or results): do certain approaches become more or less popular over time?
  • Themes: what questions or concepts recur across the literature?
  • Debates, conflicts and contradictions: where do sources disagree?
  • Pivotal publications: are there any influential theories or studies that changed the direction of the field?
  • Gaps: what is missing from the literature? Are there weaknesses that need to be addressed?

This step will help you work out the structure of your literature review and (if applicable) show how your own research will contribute to existing knowledge.

  • Most research has focused on young women.
  • There is an increasing interest in the visual aspects of social media.
  • But there is still a lack of robust research on highly visual platforms like Instagram and Snapchat—this is a gap that you could address in your own research.

Step 4 – Outline your literature review’s structure

There are various approaches to organizing the body of a literature review. Depending on the length of your literature review, you can combine several of these strategies (for example, your overall structure might be thematic, but each theme is discussed chronologically).

Chronological

The simplest approach is to trace the development of the topic over time. However, if you choose this strategy, be careful to avoid simply listing and summarizing sources in order.

Try to analyze patterns, turning points and key debates that have shaped the direction of the field. Give your interpretation of how and why certain developments occurred.

Thematic

If you have found some recurring central themes, you can organize your literature review into subsections that address different aspects of the topic.

For example, if you are reviewing literature about inequalities in migrant health outcomes, key themes might include healthcare policy, language barriers, cultural attitudes, legal status, and economic access.

Methodological

If you draw your sources from different disciplines or fields that use a variety of research methods, you might want to compare the results and conclusions that emerge from different approaches. For example:

  • Look at what results have emerged in qualitative versus quantitative research
  • Discuss how the topic has been approached by empirical versus theoretical scholarship
  • Divide the literature into sociological, historical, and cultural sources

Theoretical

A literature review is often the foundation for a theoretical framework. You can use it to discuss various theories, models, and definitions of key concepts.

You might argue for the relevance of a specific theoretical approach, or combine various theoretical concepts to create a framework for your research.

Step 5 – Write your literature review

Like any other academic text, your literature review should have an introduction, a main body, and a conclusion. What you include in each depends on the objective of your literature review.

The introduction should clearly establish the focus and purpose of the literature review.

Depending on the length of your literature review, you might want to divide the body into subsections. You can use a subheading for each theme, time period, or methodological approach.

As you write, you can follow these tips:

  • Summarize and synthesize: give an overview of the main points of each source and combine them into a coherent whole
  • Analyze and interpret: don’t just paraphrase other researchers — add your own interpretations where possible, discussing the significance of findings in relation to the literature as a whole
  • Critically evaluate: mention the strengths and weaknesses of your sources
  • Write in well-structured paragraphs: use transition words and topic sentences to draw connections, comparisons and contrasts

In the conclusion, you should summarize the key findings you have taken from the literature and emphasize their significance.

When you’ve finished writing and revising your literature review, don’t forget to proofread thoroughly before submitting.

Free lecture slides

This article has been adapted into lecture slides that you can use to teach your students about writing a literature review.

Scribbr slides are free to use, customize, and distribute for educational purposes.

Other interesting articles

If you want to know more about the research process, methodology, research bias, or statistics, make sure to check out some of our other articles with explanations and examples.

Methodology

  • Sampling methods
  • Simple random sampling
  • Stratified sampling
  • Cluster sampling
  • Likert scales
  • Reproducibility

 Statistics

  • Null hypothesis
  • Statistical power
  • Probability distribution
  • Effect size
  • Poisson distribution

Research bias

  • Optimism bias
  • Cognitive bias
  • Implicit bias
  • Hawthorne effect
  • Anchoring bias
  • Explicit bias

Frequently asked questions

A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question.

It is often written as part of a thesis, dissertation, or research paper, in order to situate your work in relation to existing knowledge.

There are several reasons to conduct a literature review at the beginning of a research project:

  • To familiarize yourself with the current state of knowledge on your topic
  • To ensure that you’re not just repeating what others have already done
  • To identify gaps in knowledge and unresolved problems that your research can address
  • To develop your theoretical framework and methodology
  • To provide an overview of the key findings and debates on the topic

Writing the literature review shows your reader how your work relates to existing research and what new insights it will contribute.

The literature review usually comes near the beginning of your thesis or dissertation. After the introduction, it grounds your research in a scholarly field and leads directly to your theoretical framework or methodology.

A literature review is a survey of credible sources on a topic, often used in dissertations, theses, and research papers. Literature reviews give an overview of knowledge on a subject, helping you identify relevant theories and methods, as well as gaps in existing research. Literature reviews are set up similarly to other academic texts, with an introduction, a main body, and a conclusion.

An annotated bibliography is a list of source references that has a short description (called an annotation) for each of the sources. It is often assigned as part of the research process for a paper.

McCombes, S. (2023, September 11). How to Write a Literature Review | Guide, Examples, & Templates. Scribbr. Retrieved September 27, 2024, from https://www.scribbr.com/dissertation/literature-review/


Literature Review: The What, Why and How-to Guide — Introduction

What are Literature Reviews?

So, what is a literature review? "A literature review is an account of what has been published on a topic by accredited scholars and researchers. In writing the literature review, your purpose is to convey to your reader what knowledge and ideas have been established on a topic, and what their strengths and weaknesses are. As a piece of writing, the literature review must be defined by a guiding concept (e.g., your research objective, the problem or issue you are discussing, or your argumentative thesis). It is not just a descriptive list of the material available, or a set of summaries." Taylor, D. The literature review: A few tips on conducting it. University of Toronto Health Sciences Writing Centre.

Goals of Literature Reviews

What are the goals of creating a literature review? A literature review could be written to accomplish different aims:

  • To develop a theory or evaluate an existing theory
  • To summarize the historical or existing state of a research topic
  • To identify a problem in a field of research

Baumeister, R. F., & Leary, M. R. (1997). Writing narrative literature reviews. Review of General Psychology, 1(3), 311-320.

What kinds of sources require a Literature Review?

  • A research paper assigned in a course
  • A thesis or dissertation
  • A grant proposal
  • An article intended for publication in a journal

All these instances require you to collect what has been written about your research topic so that you can demonstrate how your own research sheds new light on the topic.

Types of Literature Reviews

What kinds of literature reviews are written?

Narrative review: This type of review describes the current state of the research on a specific topic and offers a critical analysis of the literature reviewed. Studies are grouped by research or theoretical categories, and themes and trends, strengths and weaknesses, and gaps are identified. The review ends with a conclusion section that summarizes the findings on the state of the research and the gaps identified and, if applicable, explains how the author's research will address those gaps and expand knowledge on the topic.

  • Example: Predictors and Outcomes of U.S. Quality Maternity Leave: A Review and Conceptual Framework: 10.1177/08948453211037398

Systematic review: "The authors of a systematic review use a specific procedure to search the research literature, select the studies to include in their review, and critically evaluate the studies they find." (p. 139). Nelson, L. K. (2013). Research in Communication Sciences and Disorders. Plural Publishing.

  • Example: The effect of leave policies on increasing fertility: a systematic review: 10.1057/s41599-022-01270-w

Meta-analysis: "Meta-analysis is a method of reviewing research findings in a quantitative fashion by transforming the data from individual studies into what is called an effect size and then pooling and analyzing this information. The basic goal in meta-analysis is to explain why different outcomes have occurred in different studies." (p. 197). Roberts, M. C., & Ilardi, S. S. (2003). Handbook of Research Methods in Clinical Psychology. Blackwell Publishing.

  • Example: Employment Instability and Fertility in Europe: A Meta-Analysis: 10.1215/00703370-9164737

Meta-synthesis: "Qualitative meta-synthesis is a type of qualitative study that uses as data the findings from other qualitative studies linked by the same or related topic." (p. 312). Zimmer, L. (2006). Qualitative meta-synthesis: A question of dialoguing with texts. Journal of Advanced Nursing, 53(3), 311-318.

  • Example: Women’s perspectives on career successes and barriers: A qualitative meta-synthesis: 10.1177/05390184221113735

Literature Reviews in the Health Sciences

  • UConn Health subject guide on systematic reviews: an explanation of the different review types used in health sciences literature, as well as tools to help you find the right review type
  • Last Updated: Sep 21, 2022 2:16 PM
  • URL: https://guides.lib.uconn.edu/literaturereview


Literature Reviews

Literature reviews summarize and analyze what has been written on a particular topic and identify gaps or disagreements in the scholarly work on that topic.

Literature Reviews within a Scholarly Work

Within a scholarly work, the literature review situates the current work within the larger scholarly conversation and emphasizes how that particular scholarly work contributes to the conversation on the topic. The literature review portion may be as brief as a few paragraphs focusing on a narrow topic area.

When writing this type of literature review, it's helpful to start by identifying sources most relevant to your research question. A citation tracking database such as Web of Science can also help you locate seminal articles on a topic and find out who has more recently cited them. See "Your Literature Search" for more details.

Literature Reviews as a Scholarly Work

A literature review may itself be a scholarly publication and provide an analysis of what has been written on a particular topic without contributing original research. These types of literature reviews can help keep people updated on a field and help scholars choose a research topic that fills gaps in the knowledge on that topic. Common types include:

Systematic Review

Systematic literature reviews follow specific procedures, in some ways similar to setting up an experiment, so that future scholars can replicate the same steps. They are also helpful for evaluating data published over multiple studies. Thus, these are common in the medical field and may be used by healthcare providers to help guide diagnosis and treatment decisions. Cochrane Reviews are one example of this type of literature review.

Semi-Systematic Review

When a systematic review is not feasible, a semi-systematic review can help synthesize research on a topic or how a topic has been studied in different fields (Snyder 2019). Rather than focusing on quantitative data, this review type identifies themes, theoretical perspectives, and other qualitative information related to the topic. These types of reviews can be particularly helpful for a historical topic overview, for developing a theoretical model, and for creating a research agenda for a field (Snyder 2019). As with systematic reviews, a search strategy must be developed before conducting the review.

Integrative Review

An integrative review is less systematic and can be helpful for developing a theoretical model or reconceptualizing a topic. As Snyder (2019) notes, "This type of review often requires a more creative collection of data, as the purpose is usually not to cover all articles ever published on the topic but rather to combine perspectives and insights from different fields or research traditions" (p. 336).

  • Systematic review. Purpose: synthesize and compare evidence. Characteristics: quantitative, comprehensive for a specific area, systematic search strategy, informs policy/practice. Disciplines: health sciences, social sciences, STEM.
  • Semi-systematic review. Purpose: overview a research area and changes over time. Characteristics: quantitative or qualitative, less detailed/thorough search strategy, identifies themes or research gaps, develops a theoretical model, or provides a history of the field. Disciplines: all.
  • Integrative review. Purpose: synthesize literature to develop new perspectives or theories. Characteristics: qualitative, non-systematic search strategy, combines ideas from different fields, focus on creating new frameworks or theories by critiquing previous ideas. Disciplines: social sciences, humanities.

Source: Snyder, H. (2019). Literature review as a research methodology: An overview and guidelines. Journal of Business Research, 104, 333-339. doi: 10.1016/j.jbusres.2019.07.039

  • Last Updated: Sep 6, 2024 5:10 PM
  • URL: https://libguides.und.edu/literature-reviews

Duke University Libraries

Literature Reviews


Types of reviews and examples

Choosing a review type.


Narrative review

Definition:

"A term used to describe a conventional overview of the literature, particularly when contrasted with a systematic review" (Booth et al., 2012, p. 265).

Characteristics:

  • Provides examination of recent or current literature on a wide range of subjects
  • Varying levels of completeness / comprehensiveness, non-standardized methodology
  • May or may not include comprehensive searching, quality assessment or critical appraisal

Mitchell, L. E., & Zajchowski, C. A. (2022). The history of air quality in Utah: A narrative review. Sustainability, 14(15), 9653. https://doi.org/10.3390/su14159653

Booth, A., Papaioannou, D., & Sutton, A. (2012). Systematic approaches to a successful literature review. London: SAGE Publications Ltd.

Rapid review

"An assessment of what is already known about a policy or practice issue...using systematic review methods to search and critically appraise existing research" (Grant & Booth, 2009, p. 100).

  • Assessment of what is already known about an issue
  • Similar to a systematic review but within a time-constrained setting
  • Typically employs methodological shortcuts, increasing risk of introducing bias, includes basic level of quality assessment
  • Best suited for issues needing quick decisions and solutions (i.e., policy recommendations)

Learn more about the method:

Khangura, S., Konnyu, K., Cushman, R., Grimshaw, J., & Moher, D. (2012). Evidence summaries: the evolution of a rapid review approach. Systematic Reviews, 1(1), 1-9. https://doi.org/10.1186/2046-4053-1-10

Virginia Commonwealth University Libraries. (2021). Rapid Review Protocol.

Quarmby, S., Santos, G., & Mathias, M. (2019). Air quality strategies and technologies: A rapid review of the international evidence. Sustainability, 11(10), 2757. https://doi.org/10.3390/su11102757

Grant, M. J., & Booth, A. (2009). A typology of reviews: an analysis of 14 review types and associated methodologies. Health Information & Libraries Journal, 26(2), 91-108. https://doi.org/10.1111/j.1471-1842.2009.00848.x

Mapping review

Developed and refined by the Evidence for Policy and Practice Information and Co-ordinating Centre (EPPI-Centre), this review "map[s] out and categorize[s] existing literature on a particular topic, identifying gaps in research literature from which to commission further reviews and/or primary research" (Grant & Booth, 2009, p. 97).

  • Although mapping reviews are sometimes called scoping reviews, the key difference is that mapping reviews focus on a review question rather than a topic.
  • Mapping reviews are "best used where a clear target for a more focused evidence product has not yet been identified" (Booth, 2016, p. 14).
  • Mapping review searches are often quick and are intended to provide a broad overview.
  • Mapping reviews can take different approaches to which types of literature the search focuses on.

Cooper, I. D. (2016). What is a "mapping study?". Journal of the Medical Library Association: JMLA, 104(1), 76–78. https://doi.org/10.3163/1536-5050.104.1.013

Miake-Lye, I. M., Hempel, S., Shanman, R., & Shekelle, P. G. (2016). What is an evidence map? A systematic review of published evidence maps and their definitions, methods, and products. Systematic Reviews, 5(1), 1-21. https://doi.org/10.1186/s13643-016-0204-x

Tainio, M., Andersen, Z. J., Nieuwenhuijsen, M. J., Hu, L., De Nazelle, A., An, R., ... & de Sá, T. H. (2021). Air pollution, physical activity and health: A mapping review of the evidence. Environment International, 147, 105954. https://doi.org/10.1016/j.envint.2020.105954

Booth, A. (2016). EVIDENT Guidance for Reviewing the Evidence: a compendium of methodological literature and websites. ResearchGate. https://doi.org/10.13140/RG.2.1.1562.9842

Grant, M. J., & Booth, A. (2009). A typology of reviews: an analysis of 14 review types and associated methodologies. Health Information & Libraries Journal, 26(2), 91-108. https://doi.org/10.1111/j.1471-1842.2009.00848.x

Scoping review

"A type of review that has as its primary objective the identification of the size and quality of research in a topic area in order to inform subsequent review" (Booth et al., 2012, p. 269).

  • Main purpose is to map out and categorize existing literature, identify gaps in literature—great for informing policy-making
  • Search comprehensiveness determined by time/scope constraints, could take longer than a systematic review
  • No formal quality assessment or critical appraisal

Learn more about the methods:

Arksey, H., & O'Malley, L. (2005). Scoping studies: towards a methodological framework. International Journal of Social Research Methodology, 8(1), 19-32. https://doi.org/10.1080/1364557032000119616

Levac, D., Colquhoun, H., & O’Brien, K. K. (2010). Scoping studies: Advancing the methodology. Implementation Science: IS, 5, 69. https://doi.org/10.1186/1748-5908-5-69

Example:

Rahman, A., Sarkar, A., Yadav, O. P., Achari, G., & Slobodnik, J. (2021). Potential human health risks due to environmental exposure to nano-and microplastics and knowledge gaps: A scoping review. Science of the Total Environment, 757, 143872. https://doi.org/10.1016/j.scitotenv.2020.143872

Umbrella review

A review that "[compiles] evidence from multiple...reviews into one accessible and usable document" (Grant & Booth, 2009, p. 103). While originally intended to be a compilation of Cochrane reviews, it now generally refers to any kind of evidence synthesis.

  • Compiles evidence from multiple reviews into one document
  • Often defines a broader question than is typical of a traditional systematic review

Choi, G. J., & Kang, H. (2022). The umbrella review: a useful strategy in the rain of evidence. The Korean Journal of Pain, 35(2), 127–128. https://doi.org/10.3344/kjp.2022.35.2.127

Aromataris, E., Fernandez, R., Godfrey, C. M., Holly, C., Khalil, H., & Tungpunkom, P. (2015). Summarizing systematic reviews: Methodological development, conduct and reporting of an umbrella review approach. International Journal of Evidence-Based Healthcare, 13(3), 132–140. https://doi.org/10.1097/XEB.0000000000000055

Rojas-Rueda, D., Morales-Zamora, E., Alsufyani, W. A., Herbst, C. H., Al Balawi, S. M., Alsukait, R., & Alomran, M. (2021). Environmental risk factors and health: An umbrella review of meta-analyses. International Journal of Environmental Research and Public Health, 18(2), 704. https://doi.org/10.3390/ijerph18020704

Meta-analysis

A meta-analysis is a "technique that statistically combines the results of quantitative studies to provide a more precise effect of the result" (Grant & Booth, 2009, p. 98).

  • Statistical technique for combining results of quantitative studies to provide more precise effect of results
  • Aims for exhaustive, comprehensive searching
  • Quality assessment may determine inclusion/exclusion criteria
  • May be conducted independently or as part of a systematic review
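To make "statistically combines the results" concrete, here is a minimal sketch of one common pooling approach, inverse-variance (fixed-effect) weighting. It is an illustration only: the guide does not prescribe a particular method, the function name `pool_fixed_effect` is hypothetical, and the effect sizes and variances are made-up numbers. A fixed-effect model also assumes the studies share one true effect, which real meta-analyses need to check.

```python
import math


def pool_fixed_effect(effects, variances):
    """Pool per-study effect sizes with inverse-variance (fixed-effect) weights.

    Returns the pooled effect, its standard error, and an approximate
    95% confidence interval (normal approximation).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need at least one study and one variance per effect")
    # More precise studies (smaller variance) receive larger weights.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    ci = (pooled - 1.96 * std_error, pooled + 1.96 * std_error)
    return pooled, std_error, ci


# Hypothetical effect sizes and sampling variances for three studies.
effects = [0.30, 0.10, 0.25]
variances = [0.04, 0.01, 0.02]

pooled, std_error, ci = pool_fixed_effect(effects, variances)
print(f"pooled effect = {pooled:.3f}, SE = {std_error:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

In this made-up data the pooled standard error is smaller than that of any single study, which is the sense in which pooling gives "a more precise effect."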

Berman, N. G., & Parker, R. A. (2002). Meta-analysis: Neither quick nor easy. BMC Medical Research Methodology, 2(1), 10. https://doi.org/10.1186/1471-2288-2-10

Hites, R. A. (2004). Polybrominated diphenyl ethers in the environment and in people: a meta-analysis of concentrations. Environmental Science & Technology, 38(4), 945–956. https://doi.org/10.1021/es035082g

Systematic review

A systematic review "seeks to systematically search for, appraise, and [synthesize] research evidence, often adhering to the guidelines on the conduct of a review" provided by discipline-specific organizations, such as the Cochrane Collaboration (Grant & Booth, 2009, p. 102).

  • Aims to compile and synthesize all known knowledge on a given topic
  • Adheres to strict guidelines, protocols, and frameworks
  • Time-intensive and often takes months to a year or more to complete
  • The most commonly referenced type of evidence synthesis; the term is sometimes misused as a blanket label for other types of reviews

Gascon, M., Triguero-Mas, M., Martínez, D., Dadvand, P., Forns, J., Plasència, A., & Nieuwenhuijsen, M. J. (2015). Mental health benefits of long-term exposure to residential green and blue spaces: a systematic review. International Journal of Environmental Research and Public Health, 12(4), 4354–4379. https://doi.org/10.3390/ijerph120404354

Systematized review

"Systematized reviews attempt to include one or more elements of the systematic review process while stopping short of claiming that the resultant output is a systematic review" (Grant & Booth, 2009, p. 102). When a systematic review approach is adapted to produce a more manageable scope, while still retaining the rigor of a systematic review such as risk of bias assessment and the use of a protocol, this is often referred to as a structured review (Huelin et al., 2015).

  • Typically conducted by postgraduate or graduate students
  • Often assigned by instructors to students who don't have the resources to conduct a full systematic review

Salvo, G., Lashewicz, B. M., Doyle-Baker, P. K., & McCormack, G. R. (2018). Neighbourhood built environment influences on physical activity among adults: A systematized review of qualitative evidence. International Journal of Environmental Research and Public Health, 15(5), 897. https://doi.org/10.3390/ijerph15050897

Huelin, R., Iheanacho, I., Payne, K., & Sandman, K. (2015). What’s in a name? Systematic and non-systematic literature reviews, and why the distinction matters. https://www.evidera.com/resource/whats-in-a-name-systematic-and-non-systematic-literature-reviews-and-why-the-distinction-matters/

Flowchart of review types

  • Review Decision Tree - Cornell University: For more information, check out Cornell's review methodology decision tree.
  • LitR-Ex.com - Eight literature review methodologies: Learn more about 8 different review types (incl. Systematic Reviews and Scoping Reviews) with practical tips about strengths and weaknesses of different methods.
  • Last Updated: Sep 17, 2024 1:24 PM
  • URL: https://guides.library.duke.edu/litreviews



Evidence Synthesis, Systematic Review Services : Literature Review Types, Taxonomies


Choosing a Literature Review Methodology

Growing interest in evidence-based practice has driven an increase in review methodologies. Your choice of review methodology (or literature review type) will be informed by the intent (purpose, function) of your research project and the time and resources of your team. 

  • Decision Tree (What Type of Review is Right for You?): Developed by Cornell University Library staff, this "decision-tree" guides the user to a handful of review guides given time and intent.

Types of Evidence Synthesis*

Critical Review - Aims to demonstrate writer has extensively researched literature and critically evaluated its quality. Goes beyond mere description to include degree of analysis and conceptual innovation. Typically results in hypothesis or model.

Mapping Review (Systematic Map) - Map out and categorize existing literature from which to commission further reviews and/or primary research by identifying gaps in research literature.

Meta-Analysis - Technique that statistically combines the results of quantitative studies to provide a more precise effect of the results.

Mixed Studies Review (Mixed Methods Review) - Refers to any combination of methods where one significant component is a literature review (usually systematic). Within a review context it refers to a combination of review approaches for example combining quantitative with qualitative research or outcome with process studies.

Narrative (Literature) Review - Broad, generic term - Refers to an examination and general synthesis of the research literature, often with a wide scope; completeness and comprehensiveness may vary. Does not follow an established protocol.

Overview - Generic term: summary of the [medical] literature that attempts to survey the literature and describe its characteristics.

Qualitative Systematic Review or Qualitative Evidence Synthesis - Method for integrating or comparing the findings from qualitative studies. It looks for ‘themes’ or ‘constructs’ that lie in or across individual qualitative studies.

Rapid Review - Assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research.

Scoping Review or Evidence Map - Preliminary assessment of potential size and scope of available research literature. Aims to identify nature and extent of research.

State-of-the-art Review - Tend to address more current matters in contrast to other combined retrospective and current approaches. May offer new perspectives on issue or point out area for further research.

Systematic Review - Seeks to systematically search for, appraise and synthesize research evidence, often adhering to guidelines on the conduct of a review. (An emerging subset includes Living Reviews or Living Systematic Reviews - A [review or] systematic review which is continually updated, incorporating relevant new evidence as it becomes available.)

Systematic Search and Review - Combines strengths of critical review with a comprehensive search process. Typically addresses broad questions to produce ‘best evidence synthesis.’

Umbrella Review - Specifically refers to review compiling evidence from multiple reviews into one accessible and usable document. Focuses on broad condition or problem for which there are competing interventions and highlights reviews that address these interventions and their results.

*Apart from some qualifying description for "Narrative (Literature) Review", these definitions are provided in Grant & Booth's "A Typology of Reviews: An Analysis of 14 Review Types and Associated Methodologies."

Literature Review Types/Typologies, Taxonomies

Grant, M. J., and A. Booth. "A Typology of Reviews: An Analysis of 14 Review Types and Associated Methodologies." Health Information and Libraries Journal 26.2 (2009): 91-108. DOI: 10.1111/j.1471-1842.2009.00848.x

Munn, Zachary, et al. "Systematic Review or Scoping Review? Guidance for Authors When Choosing between a Systematic or Scoping Review Approach." BMC Medical Research Methodology, vol. 18, no. 1, Nov. 2018, p. 143. DOI: 10.1186/s12874-018-0611-x

Sutton, A., et al. "Meeting the Review Family: Exploring Review Types and Associated Information Retrieval Requirements." Health Information and Libraries Journal 36.3 (2019): 202-22. DOI: 10.1111/hir.12276

  • Last Updated: Sep 27, 2024 11:50 AM
  • URL: https://researchguides.library.wisc.edu/literature_review

Literature Reviews


Selecting a Review Type

More Review Types

This article by Sutton & Booth (2019) explores 48 distinct types of literature reviews.

Which Review is Right for You?

The Right Review tool asks questions about your literature review process and plans. It offers a qualitative and a quantitative option. On completion, you are given a recommended review type.

Defining the scope of your review

You'll want to think about the kind of review you are doing. Is it a selective or comprehensive review? Is the review part of a larger work or a stand-alone work?

For example, if you're writing the Literature Review section of a journal article, that's a selective review which is part of a larger work. Alternatively, if you're writing a review article, that's a comprehensive review which is a stand-alone work. Thinking about this will help you develop the scope of the review.

This exercise will help define the scope of your Literature Review, setting the boundaries for which literature to include and which to exclude.

A FEW GENERAL CONSIDERATIONS WHEN DEFINING SCOPE

  • Which populations to investigate: this can include gender, age, socio-economic status, race, geographic location, etc., if the research area includes humans.
  • What years to include: if researching the legalization of medicinal cannabis, you might only look at the previous 20 years; but if researching dolphin mating practices, you might extend the range by many more decades.
  • Which subject areas: if researching artificial intelligence, subject areas could be computer science, robotics, or health sciences.
  • How many sources: a selective review for a class assignment might only need ten, while a comprehensive review for a dissertation might include hundreds. There is no one right answer.
  • There will be many other considerations that are more specific to your topic.

Most databases will allow you to limit years and subject areas, so look for those tools while searching. See the Searching Tips tab for information on how to use these tools.

Four common types of reviews

LITERATURE REVIEW

  • Often used as a generic term to describe any type of review
  • More precise definition: Published materials that provide an examination of published literature. Can cover wide range of subjects at various levels of comprehensiveness.
  • Identifies gaps in research, explains importance of topic, hypothesizes future work, etc.
  • Usually written as part of a larger work like a journal article or dissertation

SCOPING REVIEW

  • Conducted to address broad research questions with the goal of understanding the extent of research that has been conducted.
  • Provides a preliminary assessment of the potential size and scope of available research literature. It aims to identify the nature and extent of research evidence (usually including ongoing research).
  • Doesn't assess the quality of the literature gathered (i.e., the presence of literature on a topic shouldn’t be conflated with the quality of that literature)
  • "Preparing scoping reviews for publication using methodological guides and reporting standards" is a great article to read on scoping reviews

SYSTEMATIC REVIEW

  • Common in the health sciences (Taubman Health Sciences Library guide to Systematic Reviews)
  • Goal: collect all literature that meets specific criteria (methodology, population, treatment, etc.) and then appraise its quality and synthesize it
  • Follows strict protocol for literature collection, appraisal and synthesis
  • Typically performed by research teams
  • Takes 12-18 months to complete
  • Often written as a stand-alone work

META-ANALYSIS

  • Goes one step further than a systematic review by statistically combining the results of quantitative studies to provide a more precise effect of the results. 
  • Last Updated: Sep 17, 2024 12:03 PM
  • URL: https://guides.lib.umich.edu/litreview

Harvey Cushing/John Hay Whitney Medical Library


YSN Doctoral Programs: Steps in Conducting a Literature Review


What is a literature review?

A literature review is an integrated analysis, not just a summary, of scholarly writings and other relevant evidence related directly to your research question. That is, it represents a synthesis of the evidence that provides background information on your topic and shows an association between the evidence and your research question.

A literature review may be a stand-alone work or the introduction to a larger research paper, depending on the assignment. Rely heavily on the guidelines your instructor has given you.

Why is it important?

A literature review is important because it:

  • Explains the background of research on a topic.
  • Demonstrates why a topic is significant to a subject area.
  • Discovers relationships between research studies/ideas.
  • Identifies major themes, concepts, and researchers on a topic.
  • Identifies critical gaps and points of disagreement.
  • Discusses further research questions that logically come out of the previous studies.

APA7 Style resources


APA Style Blog - for those harder to find answers

1. Choose a topic. Define your research question.

Your literature review should be guided by your central research question.  The literature represents background and research developments related to a specific research question, interpreted and analyzed by you in a synthesized way.

  • Make sure your research question is not too broad or too narrow.  Is it manageable?
  • Begin writing down terms that are related to your question. These will be useful for searches later.
  • If you have the opportunity, discuss your topic with your professor and your classmates.

2. Decide on the scope of your review

How many studies do you need to look at? How comprehensive should it be? How many years should it cover? 

  • This may depend on your assignment.  How many sources does the assignment require?

3. Select the databases you will use to conduct your searches.

Make a list of the databases you will search. 

Where to find databases:

  • Use the tabs on this guide
  • Find other databases in the Nursing Information Resources web page
  • More on the Medical Library web page
  • ... and more on the Yale University Library web page

4. Conduct your searches to find the evidence. Keep track of your searches.

  • Use the key words in your question, as well as synonyms for those words, as terms in your search. Use the database tutorials for help.
  • Save the searches in the databases. This saves time when you want to redo, or modify, the searches. It is also helpful to use as a guide if the searches are not finding any useful results.
  • Review the abstracts of research studies carefully. This will save you time.
  • Use the bibliographies and references of research studies you find to locate others.
  • Check with your professor, or a subject expert in the field, if you are missing any key works in the field.
  • Ask your librarian for help at any time.
  • Use a citation manager, such as EndNote, as the repository for your citations. See the EndNote tutorials for help.
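The first tip above, combining key words with their synonyms, can be sketched as a small helper that builds a Boolean search string. This is a hedged illustration, not a feature of any particular database: the function name `build_query` and the example terms are hypothetical, and the convention of joining synonyms with OR and concepts with AND is an assumption that should be checked against each database's own search syntax.

```python
def build_query(concepts):
    """Build a Boolean search string from groups of synonyms.

    Each inner list holds synonyms for one concept. Synonyms are joined
    with OR, concepts are joined with AND, and multi-word phrases are quoted.
    """
    groups = []
    for terms in concepts:
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical concepts taken from a research question.
concepts = [
    ["parental leave", "maternity leave"],
    ["fertility", "birth rate"],
]

query = build_query(concepts)
print(query)
# ("parental leave" OR "maternity leave") AND (fertility OR "birth rate")
```

Keeping the concept lists in a file alongside the generated query is one simple way to keep track of searches so they can be rerun or modified later.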

5. Review the literature

Some questions to help you analyze the research:

  • What was the research question of the study you are reviewing? What were the authors trying to discover?
  • Was the research funded by a source that could influence the findings?
  • What were the research methodologies? Analyze its literature review, the samples and variables used, the results, and the conclusions.
  • Does the research seem to be complete? Could it have been conducted more soundly? What further questions does it raise?
  • If there are conflicting studies, why do you think that is?
  • How are the authors viewed in the field? Has this study been cited? If so, how has it been analyzed?

Tips: 

  • Review the abstracts carefully.  
  • Keep careful notes so that you may track your thought processes during the research process.
  • Create a matrix of the studies for easy analysis, and synthesis, across all of the studies.
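
A matrix of the studies can be as simple as a table with one row per study and one column per question you ask of each study. The sketch below is one possible layout, not a required format: the column names are assumptions loosely based on the questions above, and the two studies are placeholders.

```python
import csv
import io

# Hypothetical matrix columns; adapt them to your own review question.
COLUMNS = ["citation", "research_question", "sample", "methods", "key_findings", "limitations"]


def new_row(**cells):
    """Return a matrix row, leaving any column not supplied as an empty string."""
    unknown = set(cells) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown column(s): {sorted(unknown)}")
    return {name: cells.get(name, "") for name in COLUMNS}


def column(rows, name):
    """Read one column down the matrix to compare a single aspect across studies."""
    return [row[name] for row in rows]


def to_csv(rows):
    """Serialize the matrix as CSV text so it can be opened in a spreadsheet."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Placeholder studies, for illustration only.
matrix = [
    new_row(citation="Study A (placeholder)", methods="survey"),
    new_row(citation="Study B (placeholder)", methods="interviews"),
]
print(column(matrix, "methods"))
print(to_csv(matrix))
```

Reading across a row summarizes one study; reading down a column supports synthesis across all of the studies.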
  • Last Updated: Jun 20, 2024 9:08 AM
  • URL: https://guides.library.yale.edu/YSNDoctoral

How to Conduct a Literature Review: Types of Literature Reviews

Types of Literature Reviews

Literature reviews are pervasive throughout various academic disciplines, and thus you can adopt various approaches to effectively organize and write your literature review.  The University of Southern California created a summarized list of the various types of literature reviews, reprinted here:

  • Argumentative Review
  • Integrative Review Considered a form of research that reviews, critiques, and synthesizes representative literature on a topic in an integrated way such that new frameworks and perspectives on the topic are generated. The body of literature includes all studies that address related or identical hypotheses. A well-done integrative review meets the same standards as primary research in regard to clarity, rigor, and replication.
  • Historical Review Few things rest in isolation from historical precedent. Historical reviews are focused on examining research throughout a period of time, often starting with the first time an issue, concept, theory, or phenomenon emerged in the literature, then tracing its evolution within the scholarship of a discipline. The purpose is to place research in a historical context to show familiarity with state-of-the-art developments and to identify the likely directions for future research.
  • Methodological Review A review does not always focus on what someone said [content], but how they said it [method of analysis]. This approach provides a framework of understanding at different levels (i.e., those of theory, substantive fields, research approaches, and data collection and analysis techniques), enables researchers to draw on a wide variety of knowledge ranging from the conceptual level to practical documents for use in fieldwork in the areas of ontological and epistemological consideration, quantitative and qualitative integration, sampling, interviewing, data collection and data analysis, and helps highlight many ethical issues which we should be aware of and consider as we go through our study.
  • Systematic Review This form consists of an overview of existing evidence pertinent to a clearly formulated research question, which uses pre-specified and standardized methods to identify and critically appraise relevant research, and to collect, report, and analyse data from the studies that are included in the review. Typically it focuses on a very specific empirical question, often posed in a cause-and-effect form, such as "To what extent does A contribute to B?"
  • Theoretical Review The purpose of this form is to concretely examine the corpus of theory that has accumulated in regard to an issue, concept, theory, or phenomenon. The theoretical literature review helps establish what theories already exist, the relationships between them, and to what degree the existing theories have been investigated, and helps develop new hypotheses to be tested. Often this form is used to help establish a lack of appropriate theories or reveal that current theories are inadequate for explaining new or emerging research problems. The unit of analysis can focus on a theoretical concept or a whole theory or framework.
  • Last Updated: Aug 9, 2024 11:12 AM
  • URL: https://libguides.jsu.edu/literaturereview

How to Conduct a Literature Review: A Guide for Graduate Students

What is a Literature Review?

A literature review summarizes and synthesizes material on a research topic. It provides a summary of previous research and provides context for the material presented in your thesis. The literature review is your opportunity to show what you understand about your topic area, and to distinguish previous research from the work you are doing. For example, your thesis may be building on an existing theory or model and extending it in a new direction. It's important to provide context for your project by providing a roadmap to previous literature.

Purpose of a Literature Review

  • Identifies gaps in current knowledge
  • Helps you to avoid reinventing the wheel by discovering the research already conducted on a topic
  • Sets the background on what has been explored on a topic so far
  • Increases your breadth of knowledge in your area of research
  • Helps you identify seminal works in your area
  • Allows you to provide the intellectual context for your work and position your research with other, related research
  • Provides you with opposing viewpoints
  • Helps you to discover research methods which may be applicable to your work
  • Greenfield (2002). Research Methods for Postgraduates. 2nd ed. London: Arnold.
  • Grant, M. J. and Booth, A. (2009). A typology of reviews: an analysis of 14 review types and associated methodologies. Health Information & Libraries Journal, 26, 91–108. doi:10.1111/j.1471-1842.2009.00848.x

Literature Reviews: An Overview for Graduate Students

This section is adapted from The Literature Review, by Charles Sturt University Library. Available: https://libguides.csu.edu.au/review.

  • Last Updated: Aug 12, 2024 4:07 PM
  • URL: https://instr.iastate.libguides.com/gradlitrev

PLoS Comput Biol, v.9(7); 2013 Jul

Ten Simple Rules for Writing a Literature Review

Marco Pautasso

1 Centre for Functional and Evolutionary Ecology (CEFE), CNRS, Montpellier, France

2 Centre for Biodiversity Synthesis and Analysis (CESAB), FRB, Aix-en-Provence, France

Literature reviews are in great demand in most scientific fields. Their need stems from the ever-increasing output of scientific publications [1] . For example, compared to 1991, in 2008 three, eight, and forty times more papers were indexed in Web of Science on malaria, obesity, and biodiversity, respectively [2] . Given such mountains of papers, scientists cannot be expected to examine in detail every single new paper relevant to their interests [3] . Thus, it is both advantageous and necessary to rely on regular summaries of the recent literature. Although recognition for scientists mainly comes from primary research, timely literature reviews can lead to new synthetic insights and are often widely read [4] . For such summaries to be useful, however, they need to be compiled in a professional way [5] .

When starting from scratch, reviewing the literature can require a titanic amount of work. That is why researchers who have spent their career working on a certain research issue are in a perfect position to review that literature. Some graduate schools are now offering courses in reviewing the literature, given that most research students start their project by producing an overview of what has already been done on their research issue [6] . However, it is likely that most scientists have not thought in detail about how to approach and carry out a literature review.

Reviewing the literature requires the ability to juggle multiple tasks, from finding and evaluating relevant material to synthesising information from various sources, from critical thinking to paraphrasing, evaluating, and citation skills [7] . In this contribution, I share ten simple rules I learned working on about 25 literature reviews as a PhD and postdoctoral student. Ideas and insights also come from discussions with coauthors and colleagues, as well as feedback from reviewers and editors.

Rule 1: Define a Topic and Audience

How to choose which topic to review? There are so many issues in contemporary science that you could spend a lifetime of attending conferences and reading the literature just pondering what to review. On the one hand, if you take several years to choose, several other people may have had the same idea in the meantime. On the other hand, only a well-considered topic is likely to lead to a brilliant literature review [8] . The topic must at least be:

  • interesting to you (ideally, you should have come across a series of recent papers related to your line of work that call for a critical summary),
  • an important aspect of the field (so that many readers will be interested in the review and there will be enough material to write it), and
  • a well-defined issue (otherwise you could potentially include thousands of publications, which would make the review unhelpful).

Ideas for potential reviews may come from papers providing lists of key research questions to be answered [9] , but also from serendipitous moments during desultory reading and discussions. In addition to choosing your topic, you should also select a target audience. In many cases, the topic (e.g., web services in computational biology) will automatically define an audience (e.g., computational biologists), but that same topic may also be of interest to neighbouring fields (e.g., computer science, biology, etc.).

Rule 2: Search and Re-search the Literature

After having chosen your topic and audience, start by checking the literature and downloading relevant papers. Five pieces of advice here:

  • keep track of the search items you use (so that your search can be replicated [10] ),
  • keep a list of papers whose pdfs you cannot access immediately (so as to retrieve them later with alternative strategies),
  • use a paper management system (e.g., Mendeley, Papers, Qiqqa, Sente),
  • define early in the process some criteria for exclusion of irrelevant papers (these criteria can then be described in the review to help define its scope), and
  • do not just look for research papers in the area you wish to review, but also seek previous reviews.

The chances are high that someone will already have published a literature review (Figure 1), if not exactly on the issue you are planning to tackle, at least on a related topic. If there are already a few or several reviews of the literature on your issue, my advice is not to give up, but to carry on with your own literature review,

  • discussing in your review the approaches, limitations, and conclusions of past reviews,
  • trying to find a new angle that has not been covered adequately in the previous reviews, and
  • incorporating new material that has inevitably accumulated since their appearance.

[Figure 1] The bottom-right situation (many literature reviews but few research papers) is not just a theoretical situation; it applies, for example, to the study of the impacts of climate change on plant diseases, where there appear to be more literature reviews than research studies [33].

When searching the literature for pertinent papers and reviews, the usual rules apply:

  • be thorough,
  • use different keywords and database sources (e.g., DBLP, Google Scholar, ISI Proceedings, JSTOR Search, Medline, Scopus, Web of Science), and
  • look at who has cited past relevant papers and book chapters.
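
Using different keywords usually means combining synonyms for each concept. A common pattern, sketched below, is to join synonyms with OR and then join the concepts with AND. The function names are hypothetical, and the exact Boolean and phrase syntax differs between databases, so treat the output as a starting point to adapt rather than a string every source will accept.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def build_query(concepts):
    """OR the synonyms within each concept, then AND the concepts together."""
    groups = []
    for synonyms in concepts:
        terms = [quote(t) for t in synonyms if t.strip()]
        if terms:  # skip concepts with no usable terms
            groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


# Example concepts chosen for illustration only.
query = build_query([
    ["climate change", "global warming"],
    ["plant disease", "plant pathogen"],
])
print(query)
# ("climate change" OR "global warming") AND ("plant disease" OR "plant pathogen")
```

Keeping the concept lists in one place also gives you a record of the search terms used, which helps when the search has to be replicated.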

Rule 3: Take Notes While Reading

If you read the papers first, and only afterwards start writing the review, you will need a very good memory to remember who wrote what, and what your impressions and associations were while reading each single paper. My advice is, while reading, to start writing down interesting pieces of information, insights about how to organize the review, and thoughts on what to write. This way, by the time you have read the literature you selected, you will already have a rough draft of the review.

Of course, this draft will still need much rewriting, restructuring, and rethinking to obtain a text with a coherent argument [11] , but you will have avoided the danger posed by staring at a blank document. Be careful when taking notes to use quotation marks if you are provisionally copying verbatim from the literature. It is advisable then to reformulate such quotes with your own words in the final draft. It is important to be careful in noting the references already at this stage, so as to avoid misattributions. Using referencing software from the very beginning of your endeavour will save you time.

Rule 4: Choose the Type of Review You Wish to Write

After having taken notes while reading the literature, you will have a rough idea of the amount of material available for the review. This is probably a good time to decide whether to go for a mini- or a full review. Some journals are now favouring the publication of rather short reviews focusing on the last few years, with a limit on the number of words and citations. A mini-review is not necessarily a minor review: it may well attract more attention from busy readers, although it will inevitably simplify some issues and leave out some relevant material due to space limitations. A full review will have the advantage of more freedom to cover in detail the complexities of a particular scientific development, but may then be left in the pile of the very important papers “to be read” by readers with little time to spare for major monographs.

There is probably a continuum between mini- and full reviews. The same point applies to the dichotomy of descriptive vs. integrative reviews. While descriptive reviews focus on the methodology, findings, and interpretation of each reviewed study, integrative reviews attempt to find common ideas and concepts from the reviewed material [12] . A similar distinction exists between narrative and systematic reviews: while narrative reviews are qualitative, systematic reviews attempt to test a hypothesis based on the published evidence, which is gathered using a predefined protocol to reduce bias [13] , [14] . When systematic reviews analyse quantitative results in a quantitative way, they become meta-analyses. The choice between different review types will have to be made on a case-by-case basis, depending not just on the nature of the material found and the preferences of the target journal(s), but also on the time available to write the review and the number of coauthors [15] .
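
To make the idea of analysing quantitative results "in a quantitative way" concrete, the sketch below shows one widely used approach, inverse-variance fixed-effect pooling. This is an illustration chosen here, not a method prescribed by the text above; it assumes every study estimates the same underlying effect, and the function names and input numbers are made up.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Pool study effects with inverse-variance weights (fixed-effect model).

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns the pooled effect and its standard error.
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect, and at least one study")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


def confidence_interval(estimate, std_error, z=1.96):
    """Approximate 95% interval under a normal approximation (z = 1.96)."""
    return estimate - z * std_error, estimate + z * std_error


# Invented effect sizes and standard errors, for illustration only.
pooled, pooled_se = fixed_effect_pool([0.2, 0.4], [0.1, 0.1])
low, high = confidence_interval(pooled, pooled_se)
print(f"pooled effect {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
```

When studies differ substantially from one another, a random-effects model is often considered instead; that choice is outside the scope of this sketch.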

Rule 5: Keep the Review Focused, but Make It of Broad Interest

Whether your plan is to write a mini- or a full review, it is good advice to keep it focused [16], [17]. Including material just for the sake of it can easily lead to reviews that are trying to do too many things at once. The need to keep a review focused can be problematic for interdisciplinary reviews, where the aim is to bridge the gap between fields [18]. If you are writing a review on, for example, how epidemiological approaches are used in modelling the spread of ideas, you may be inclined to include material from both parent fields, epidemiology and the study of cultural diffusion. This may be necessary to some extent, but in this case a focused review would only deal in detail with those studies at the interface between epidemiology and the spread of ideas.

While focus is an important feature of a successful review, this requirement has to be balanced with the need to make the review relevant to a broad audience. This square may be circled by discussing the wider implications of the reviewed topic for other disciplines.

Rule 6: Be Critical and Consistent

Reviewing the literature is not stamp collecting. A good review does not just summarize the literature, but discusses it critically, identifies methodological problems, and points out research gaps [19] . After having read a review of the literature, a reader should have a rough idea of:

  • the major achievements in the reviewed field,
  • the main areas of debate, and
  • the outstanding research questions.

It is challenging to achieve a successful review on all these fronts. A solution can be to involve a set of complementary coauthors: some people are excellent at mapping what has been achieved, some others are very good at identifying dark clouds on the horizon, and some have instead a knack at predicting where solutions are going to come from. If your journal club has exactly this sort of team, then you should definitely write a review of the literature! In addition to critical thinking, a literature review needs consistency, for example in the choice of passive vs. active voice and present vs. past tense.

Rule 7: Find a Logical Structure

Like a well-baked cake, a good review has a number of telling features: it is worth the reader's time, timely, systematic, well written, focused, and critical. It also needs a good structure. With reviews, the usual subdivision of research papers into introduction, methods, results, and discussion does not work or is rarely used. However, a general introduction of the context and, toward the end, a recapitulation of the main points covered and take-home messages make sense also in the case of reviews. For systematic reviews, there is a trend towards including information about how the literature was searched (database, keywords, time limits) [20] .

How can you organize the flow of the main body of the review so that the reader will be drawn into and guided through it? It is generally helpful to draw a conceptual scheme of the review, e.g., with mind-mapping techniques. Such diagrams can help recognize a logical way to order and link the various sections of a review [21] . This is the case not just at the writing stage, but also for readers if the diagram is included in the review as a figure. A careful selection of diagrams and figures relevant to the reviewed topic can be very helpful to structure the text too [22] .

Rule 8: Make Use of Feedback

Reviews of the literature are normally peer-reviewed in the same way as research papers, and rightly so [23] . As a rule, incorporating feedback from reviewers greatly helps improve a review draft. Having read the review with a fresh mind, reviewers may spot inaccuracies, inconsistencies, and ambiguities that had not been noticed by the writers due to rereading the typescript too many times. It is however advisable to reread the draft one more time before submission, as a last-minute correction of typos, leaps, and muddled sentences may enable the reviewers to focus on providing advice on the content rather than the form.

Feedback is vital to writing a good review, and should be sought from a variety of colleagues, so as to obtain a diversity of views on the draft. This may lead in some cases to conflicting views on the merits of the paper, and on how to improve it, but such a situation is better than the absence of feedback. A diversity of feedback perspectives on a literature review can help identify where the consensus view stands in the landscape of the current scientific understanding of an issue [24] .

Rule 9: Include Your Own Relevant Research, but Be Objective

In many cases, reviewers of the literature will have published studies relevant to the review they are writing. This could create a conflict of interest: how can reviewers report objectively on their own work [25] ? Some scientists may be overly enthusiastic about what they have published, and thus risk giving too much importance to their own findings in the review. However, bias could also occur in the other direction: some scientists may be unduly dismissive of their own achievements, so that they will tend to downplay their contribution (if any) to a field when reviewing it.

In general, a review of the literature should neither be a public relations brochure nor an exercise in competitive self-denial. If a reviewer is up to the job of producing a well-organized and methodical review, which flows well and provides a service to the readership, then it should be possible to be objective in reviewing one's own relevant findings. In reviews written by multiple authors, this may be achieved by assigning the review of the results of a coauthor to different coauthors.

Rule 10: Be Up-to-Date, but Do Not Forget Older Studies

Given the progressive acceleration in the publication of scientific papers, today's reviews of the literature need awareness not just of the overall direction and achievements of a field of inquiry, but also of the latest studies, so as not to become out-of-date before they have been published. Ideally, a literature review should not identify as a major research gap an issue that has just been addressed in a series of papers in press (the same applies, of course, to older, overlooked studies (“sleeping beauties” [26] )). This implies that literature reviewers would do well to keep an eye on electronic lists of papers in press, given that it can take months before these appear in scientific databases. Some reviews declare that they have scanned the literature up to a certain point in time, but given that peer review can be a rather lengthy process, a full search for newly appeared literature at the revision stage may be worthwhile. Assessing the contribution of papers that have just appeared is particularly challenging, because there is little perspective with which to gauge their significance and impact on further research and society.

Inevitably, new papers on the reviewed topic (including independently written literature reviews) will appear from all quarters after the review has been published, so that there may soon be the need for an updated review. But this is the nature of science [27] – [32] . I wish everybody good luck with writing a review of the literature.

Acknowledgments

Many thanks to M. Barbosa, K. Dehnen-Schmutz, T. Döring, D. Fontaneto, M. Garbelotto, O. Holdenrieder, M. Jeger, D. Lonsdale, A. MacLeod, P. Mills, M. Moslonka-Lefebvre, G. Stancanelli, P. Weisberg, and X. Xu for insights and discussions, and to P. Bourne, T. Matoni, and D. Smith for helpful comments on a previous draft.

Funding Statement

This work was funded by the French Foundation for Research on Biodiversity (FRB) through its Centre for Synthesis and Analysis of Biodiversity data (CESAB), as part of the NETSEED research project. The funders had no role in the preparation of the manuscript.

Literature Review: Types of Literature Reviews

Types of Literature Reviews

It is important to think of knowledge in a given field as consisting of three layers.

  • First, there are the primary studies that researchers conduct and publish.
  • Second, there are the reviews of those studies, which summarize and offer new interpretations built from and often extending beyond the original studies.
  • Third, there are the perceptions, conclusions, opinions, and interpretations that are shared informally that become part of the lore of the field.

In composing a literature review, it is important to note that it is often this third layer of knowledge that is cited as "true" even though it often has only a loose relationship to the primary studies and secondary literature reviews.

Given this, while literature reviews are designed to provide an overview and synthesis of pertinent sources you have explored, there are several approaches to how they can be done, depending upon the type of analysis underpinning your study. Listed below are definitions of types of literature reviews:

Argumentative Review      This form examines literature selectively in order to support or refute an argument, deeply embedded assumption, or philosophical problem already established in the literature. The purpose is to develop a body of literature that establishes a contrarian viewpoint. Given the value-laden nature of some social science research [e.g., educational reform; immigration control], argumentative approaches to analyzing the literature can be a legitimate and important form of discourse. However, note that they can also introduce problems of bias when they are used to make summary claims of the sort found in systematic reviews.

Integrative Review      Considered a form of research that reviews, critiques, and synthesizes representative literature on a topic in an integrated way such that new frameworks and perspectives on the topic are generated. The body of literature includes all studies that address related or identical hypotheses. A well-done integrative review meets the same standards as primary research in regard to clarity, rigor, and replication.

Historical Review      Few things rest in isolation from historical precedent. Historical reviews are focused on examining research throughout a period of time, often starting with the first time an issue, concept, theory, or phenomenon emerged in the literature, then tracing its evolution within the scholarship of a discipline. The purpose is to place research in a historical context to show familiarity with state-of-the-art developments and to identify the likely directions for future research.

Methodological Review      A review does not always focus on what someone said [content], but how they said it [method of analysis]. This approach provides a framework of understanding at different levels (i.e. those of theory, substantive fields, research approaches, and data collection and analysis techniques), enables researchers to draw on a wide variety of knowledge ranging from the conceptual level to practical documents for use in fieldwork in the areas of ontological and epistemological consideration, quantitative and qualitative integration, sampling, interviewing, data collection and data analysis, and helps highlight many ethical issues which we should be aware of and consider as we go through our study.

Systematic Review      This form consists of an overview of existing evidence pertinent to a clearly formulated research question, which uses pre-specified and standardized methods to identify and critically appraise relevant research, and to collect, report, and analyze data from the studies that are included in the review. Typically it focuses on a very specific empirical question, often posed in a cause-and-effect form, such as "To what extent does A contribute to B?"

Theoretical Review      The purpose of this form is to concretely examine the corpus of theory that has accumulated in regard to an issue, concept, theory, or phenomenon. The theoretical literature review helps establish what theories already exist, the relationships between them, and to what degree the existing theories have been investigated, and helps develop new hypotheses to be tested. Often this form is used to help establish a lack of appropriate theories or reveal that current theories are inadequate for explaining new or emerging research problems. The unit of analysis can focus on a theoretical concept or a whole theory or framework.

* Kennedy, Mary M. "Defining a Literature." Educational Researcher 36 (April 2007): 139-147.

All content is from The Literature Review, created by Dr. Robert Labaree, USC.

  • Last Updated: Sep 10, 2024 11:32 AM
  • URL: https://uscupstate.libguides.com/Literature_Review

Systematic Reviews: Methods & Resources

Many organizations have created guidelines to standardize reporting of analytical research. See some of the main ones below. The NIH offers a useful chart of Research Reporting Guidelines, and you can find over 500 on the EQUATOR Network.

  • PRISMA Guidelines: Gold-standard guideline on how to perform and write up a systematic review and/or meta-analysis of the outcomes reported in multiple clinical trials of therapeutic interventions.
  • AHRQ's Methods Guide for Effectiveness and Comparative Effectiveness Reviews
  • Synthesis without meta-analysis (SWiM) in systematic reviews: Campbell, M. (2020). Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ, 368. Guideline on how to analyze evidence for a narrative review, to provide a recommendation based on heterogeneous study types.
  • Methods Manual for Community Guide Systematic Reviews: Community Preventive Services Task Force (2021). The Methods Manual for Community Guide Systematic Reviews. Public health prevention systematic review guidelines.
  • Planning Worksheet for Structured Literature Reviews: Cornell University Library (2019). A basic framework for a literature review.
  • STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) statement
  • MOOSE Reporting Guidelines for Meta-analyses of Observational Studies: Brooke BS, Schwartz TA, Pawlik TM. MOOSE Reporting Guidelines for Meta-analyses of Observational Studies. JAMA Surg. 2021;156(8):787–788. doi:10.1001/jamasurg.2021.0522

Tools and Guidance

  • Right Review: Flowchart to help you choose the proper review methodology for your project.
  • Systematic Review Accelerator: Catalog of tools that support various tasks within the systematic review and wider evidence synthesis process. Tools include the 'Polyglot Search Translator'.
  • Institute of Medicine. (2011). Finding What Works in Health Care: Standards for Systematic Reviews. Washington, DC: National Academies.
  • Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals: International Committee of Medical Journal Editors (2022).
  • Cochrane Handbook for Systematic Reviews of Interventions
  • Joanna Briggs Institute (JBI) Manual for Evidence Synthesis: Provides guidance on how to analyze both quantitative and qualitative research.
  • How to do a systematic review: Pollock, A., & Berge, E. (2018). International Journal of Stroke, 13(2), 138–156. https://doi.org/10.1177/1747493017743796
  • Cochrane Qualitative & Implementation Methods Group. (2019). Training resources.
  • Meeting the review family: exploring review types and associated information retrieval requirements: Sutton, A., Clowes, M., Preston, L., & Booth, A. (2019). Health Information and Libraries Journal, 36(3), 202–222. https://doi.org/10.1111/hir.12276

Software tools for systematic reviews

  • Covidence Available for free to GW affiliates, this is a popular tool for facilitating screening decisions, used by the Cochrane Collaboration. Register for an account.
  • Statistical software available at Himmelfarb: SPSS, SAS, Stata, NVivo, Atlas.ti, and MATLAB
  • REDCap Software for creating forms for research surveys, data collection, or data extraction.
  • SRDR tool from AHRQ Free, web-based and has a training environment, tutorials, and example templates of systematic review data extraction forms
  • RevMan 5 Review Manager (RevMan) is Cochrane's bespoke software for writing Cochrane Reviews.
  • Rayyan Free, web-based tool for collecting and screening citations. Multiple reviewers can screen the same records while blinded to each other's decisions.
  • GRADEpro Free web application to create, manage, and share summaries of research evidence (called Evidence Profiles and Summary of Findings Tables) for reviews or guidelines. It uses the GRADE criteria to evaluate each paper under review.
  • DistillerSR Needs subscription. Create coded data extraction forms from templates.
  • EPPI Reviewer Needs subscription. Like DistillerSR, tool for text mining, data clustering, classification and term extraction
  • SUMARI Needs subscription. Qualitative data analysis.
  • Dedoose Needs subscription. Cloud-based qualitative data analysis tool, similar to NVivo in that it can be used to code interview transcripts and identify word co-occurrence.

Forest Plot Generators

  • Meta-Essentials a free set of workbooks designed for Microsoft Excel that, based on your input, automatically produce meta-analyses including Forest Plots. Produced for Erasmus University Rotterdam joint research institute.
  • Neyeloff, Fuchs & Moreira Another set of Excel worksheets and instructions to generate a Forest Plot. Published as Neyeloff, J.L., Fuchs, S.C. & Moreira, L.B. Meta-analyses and Forest plots using a Microsoft Excel spreadsheet: step-by-step guide focusing on descriptive data analysis. BMC Res Notes 5, 52 (2012). https://doi.org/10.1186/1756-0500-5-52
  • For R programmers, instructions are at https://cran.r-project.org/web/packages/forestplot/vignettes/forestplot.html and the R package source can be downloaded from https://github.com/gforge/forestplot
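
To show what these generators draw, here is a minimal text-only sketch in Python. It is not taken from any of the tools listed above: the function name `text_forest_plot`, the axis range, and the three studies are hypothetical and chosen only for illustration. Each row marks the confidence interval with `-`, the line of no effect with `|`, and the point estimate with `*`.

```python
def text_forest_plot(studies, lo=-1.0, hi=1.0, width=41):
    """Render (label, effect, ci_low, ci_high) rows as an ASCII forest plot."""
    def col(x):
        # Map a value on [lo, hi] to a column index, clamping out-of-range values.
        x = min(max(x, lo), hi)
        return round((x - lo) / (hi - lo) * (width - 1))

    lines = []
    for label, effect, ci_low, ci_high in studies:
        row = [" "] * width
        for c in range(col(ci_low), col(ci_high) + 1):
            row[c] = "-"                 # confidence interval
        row[col(0.0)] = "|"              # line of no effect
        row[col(effect)] = "*"           # point estimate
        lines.append(f"{label:<10}{''.join(row)}")
    return "\n".join(lines)


# Hypothetical studies: (label, effect estimate, lower CI bound, upper CI bound)
example_studies = [
    ("Study A", 0.30, 0.10, 0.50),
    ("Study B", -0.10, -0.40, 0.20),
    ("Study C", 0.20, 0.00, 0.40),
]
print(text_forest_plot(example_studies))
```

An interval that crosses the `|` column is one whose confidence interval includes no effect. The dedicated tools above additionally scale markers by study weight and add a pooled estimate, which this sketch leaves out.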

  • Last Updated: Sep 27, 2024 10:38 AM
  • URL: https://guides.himmelfarb.gwu.edu/systematic_review

  • Himmelfarb Health Sciences Library
  • 2300 Eye St., NW, Washington, DC 20037
  • Phone: (202) 994-2962
  • https://himmelfarb.gwu.edu

University of Texas

Systematic Reviews & Evidence Synthesis Methods

Types of Reviews

Not sure what type of review you want to conduct?

There are many types of reviews (narrative reviews, scoping reviews, systematic reviews, integrative reviews, umbrella reviews, rapid reviews, and others), and it's not always straightforward to choose which type to conduct. The Review Navigator tools listed below ask a series of questions to guide you through the various kinds of reviews and help you determine the best choice for your research needs.

  • Which review is right for you? (Univ. of Manitoba)
  • What type of review is right for you? (Cornell)
  • Review Ready Reckoner - Assessment Tool (RRRsAT)
  • A typology of reviews: an analysis of 14 review types and associated methodologies. by Grant & Booth
  • Meeting the review family: exploring review types and associated information retrieval requirements | Health Info Libr J, 2019
| Label | Description | Search | Appraisal | Synthesis | Analysis |
| --- | --- | --- | --- | --- | --- |
| Critical review | Aims to demonstrate writer has extensively researched literature and critically evaluated its quality. Goes beyond mere description to include degree of analysis and conceptual innovation. Typically results in hypothesis or model | Seeks to identify most significant items in the field | No formal quality assessment. Attempts to evaluate according to contribution | Typically narrative, perhaps conceptual or chronological | Significant component: seeks to identify conceptual contribution to embody existing or derive new theory |
| Literature review | Generic term: published materials that provide examination of recent or current literature. Can cover wide range of subjects at various levels of completeness and comprehensiveness. May include research findings | May or may not include comprehensive searching | May or may not include quality assessment | Typically narrative | Analysis may be chronological, conceptual, thematic, etc. |
| Mapping review/systematic map | Map out and categorize existing literature from which to commission further reviews and/or primary research by identifying gaps in research literature | Completeness of searching determined by time/scope constraints | No formal quality assessment | May be graphical and tabular | Characterizes quantity and quality of literature, perhaps by study design and other key features. May identify need for primary or secondary research |
| Meta-analysis | Technique that statistically combines the results of quantitative studies to provide a more precise effect of the results | Aims for exhaustive, comprehensive searching. May use funnel plot to assess completeness | Quality assessment may determine inclusion/exclusion and/or sensitivity analyses | Graphical and tabular with narrative commentary | Numerical analysis of measures of effect assuming absence of heterogeneity |
| Mixed studies review/mixed methods review | Refers to any combination of methods where one significant component is a literature review (usually systematic). Within a review context it refers to a combination of review approaches for example combining quantitative with qualitative research or outcome with process studies | Requires either very sensitive search to retrieve all studies or separately conceived quantitative and qualitative strategies | Requires either a generic appraisal instrument or separate appraisal processes with corresponding checklists | Typically both components will be presented as narrative and in tables. May also employ graphical means of integrating quantitative and qualitative studies | Analysis may characterise both literatures and look for correlations between characteristics or use gap analysis to identify aspects absent in one literature but missing in the other |
| Overview | Generic term: summary of the [medical] literature that attempts to survey the literature and describe its characteristics | May or may not include comprehensive searching (depends whether systematic overview or not) | May or may not include quality assessment (depends whether systematic overview or not) | Synthesis depends on whether systematic or not. Typically narrative but may include tabular features | Analysis may be chronological, conceptual, thematic, etc. |
| Qualitative systematic review/qualitative evidence synthesis | Method for integrating or comparing the findings from qualitative studies. It looks for ‘themes’ or ‘constructs’ that lie in or across individual qualitative studies | May employ selective or purposive sampling | Quality assessment typically used to mediate messages not for inclusion/exclusion | Qualitative, narrative synthesis | Thematic analysis, may include conceptual models |
| Rapid review | Assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research | Completeness of searching determined by time constraints | Time-limited formal quality assessment | Typically narrative and tabular | Quantities of literature and overall quality/direction of effect of literature |
| Scoping review | Preliminary assessment of potential size and scope of available research literature. Aims to identify nature and extent of research evidence (usually including ongoing research) | Completeness of searching determined by time/scope constraints. May include research in progress | No formal quality assessment | Typically tabular with some narrative commentary | Characterizes quantity and quality of literature, perhaps by study design and other key features. Attempts to specify a viable review |
| State-of-the-art review | Tend to address more current matters in contrast to other combined retrospective and current approaches. May offer new perspectives on issue or point out area for further research | Aims for comprehensive searching of current literature | No formal quality assessment | Typically narrative, may have tabular accompaniment | Current state of knowledge and priorities for future investigation and research |
| Systematic review | Seeks to systematically search for, appraise and synthesis research evidence, often adhering to guidelines on the conduct of a review | Aims for exhaustive, comprehensive searching | Quality assessment may determine inclusion/exclusion | Typically narrative with tabular accompaniment | What is known; recommendations for practice. What remains unknown; uncertainty around findings, recommendations for future research |
| Systematic search and review | Combines strengths of critical review with a comprehensive search process. Typically addresses broad questions to produce ‘best evidence synthesis’ | Aims for exhaustive, comprehensive searching | May or may not include quality assessment | Minimal narrative, tabular summary of studies | What is known; recommendations for practice. Limitations |
| Systematized review | Attempt to include elements of systematic review process while stopping short of systematic review. Typically conducted as postgraduate student assignment | May or may not include comprehensive searching | May or may not include quality assessment | Typically narrative with tabular accompaniment | |

Reproduced from Grant MJ, Booth A. A typology of reviews: an analysis of 14 review types and associated methodologies. Health Info Libr J. 2009 Jun;26(2):91-108. doi: 10.1111/j.1471-1842.2009.00848.x

  • Last Updated: Sep 24, 2024 9:42 AM
  • URL: https://guides.lib.utexas.edu/systematicreviews

Different Types of Literature Review: Which One Fits Your Research?

By Laura Brown on 13th October 2023

You may not know that there are several kinds of literature review. As your academic career progresses, you will encounter these classifications and may need to use more than one of them. If they are new to you, don't worry: this article covers them in detail.

Literature reviews can be grouped into roughly 14 types according to their objectives, their methodologies, and the way they approach and analyse existing literature. Of those 14, 4 are considered major types. Before we look at each one and how it is used in academic work, let's first cover the basics of the literature review.

Demystifying 14 Different Types of Literature Reviews

What is a Literature Review?

A literature review is a critical and systematic summary and evaluation of existing research. It is an essential component of academic and research work, providing an overview of the current state of knowledge in a particular field.

In simple terms, a literature review is a large, organised summary of the important research, books, and articles about a particular topic or question. Scholars and researchers write one to show what is already known about that topic. It is like a snapshot of current understanding in a field.

It serves several specific purposes in research:

  • Provides a comprehensive understanding of existing research on a topic.
  • Identifies gaps, trends, and inconsistencies in the literature.
  • Contextualises your own research within the broader academic discourse.
  • Supports the development of theoretical frameworks or research hypotheses.

4 Major Types Of Literature Review

The four major types are the narrative review, systematic review, meta-analysis, and scoping review. They are considered major because they are the go-to methods in academic and research circles, the classic tools in the researcher's toolbox. They have earned that reputation because each has a distinctive style of literature review introduction, clear steps, and specific qualities that make it useful for different research needs.

1. Narrative Review

Narrative reviews present a well-structured narrative that reads like a cohesive story, providing a comprehensive overview of a specific topic. These reviews often incorporate historical context and offer a broad understanding of the subject matter, making them valuable for researchers looking to establish a foundational understanding of their area of interest. They are particularly useful when a historical perspective or a broad context is necessary to comprehend the current state of knowledge in a field.

2. Systematic Review

Systematic reviews are renowned for their methodological rigour. They involve a meticulously structured process that includes the systematic selection of relevant studies, comprehensive data extraction, and a critical synthesis of their findings. This systematic approach is designed to minimise bias and subjectivity, making systematic reviews highly reliable and objective. They are considered the gold standard for evidence-based research as they provide a clear and rigorous assessment of the available evidence on a specific research question.

3. Meta-Analysis

Meta-analysis is a powerful method for researchers who prefer a quantitative and statistical perspective. It involves the statistical synthesis of data from various studies, allowing researchers to draw more precise and generalisable conclusions by combining data from multiple sources. Meta-analyses are especially valuable when the aim is to quantitatively measure the effect size or impact of a particular intervention, treatment, or phenomenon.
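
To make "combining data from multiple sources" concrete, here is a minimal Python sketch of one common approach, inverse-variance (fixed-effect) pooling, in which each study is weighted by the inverse of its squared standard error. This is an illustration only, not a method prescribed by this article: the function name `pool_fixed_effect` and the three effect sizes are hypothetical, and real analyses often use random-effects models and dedicated software instead.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate with a 95% confidence interval."""
    # More precise studies (smaller standard error) receive larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical effect sizes and standard errors from three studies.
effects = [0.30, 0.10, 0.20]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se, ci = pool_fixed_effect(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With these made-up inputs the pooled effect is about 0.23 with a standard error of about 0.067, smaller than the standard error of any single study. That reduction is the sense in which pooling gives a "more precise" conclusion.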

4. Scoping Review

Scoping reviews are invaluable tools, especially for researchers in the early stages of exploring a topic. These reviews aim to map the existing literature, identifying gaps and helping clarify research questions. Scoping reviews provide a panoramic view of the available research, which is particularly useful when researchers are embarking on exploratory studies or trying to understand the breadth and depth of a subject before conducting more focused research.

Different Types Of Literature Review In Research

There are several more approaches to conducting a literature review. Let’s explore these classifications quickly.

5. Critical Review

Critical reviews provide an in-depth evaluation of existing literature, scrutinising sources for their strengths, weaknesses, and relevance. They offer a critical perspective, often highlighting gaps in the research and areas for further investigation.

6. Theoretical Review

Theoretical reviews are centred around exploring and analysing the theoretical frameworks, concepts, and models present in the literature. They aim to contribute to the development and refinement of theoretical perspectives within a specific field.

7. Integrative Review

Integrative reviews synthesise a diverse range of studies, drawing connections between various research findings to create a comprehensive understanding of a topic. These reviews often bridge gaps between different perspectives and provide a holistic overview.

8. Historical Review

Historical reviews focus on the evolution of a topic over time, tracing its development through past research, events, and scholarly contributions. They offer valuable context for understanding the current state of research.

9. Methodological Review

Among the different kinds of literature reviews, methodological reviews delve into the research methods and methodologies employed in existing studies. Researchers assess these approaches for their effectiveness, validity, and relevance to the research question at hand.

10. Cross-Disciplinary Review

Cross-disciplinary reviews explore a topic from multiple academic disciplines, emphasising the diversity of perspectives and insights that each discipline brings. They are particularly useful for interdisciplinary research projects and uncovering connections between seemingly unrelated fields.

11. Descriptive Review

Descriptive reviews provide an organised summary of existing literature without extensive analysis. They offer a straightforward overview of key findings, research methods, and themes present in the reviewed studies.

12. Rapid Review

Rapid reviews expedite the literature review process, focusing on summarising relevant studies quickly. They are often used for time-sensitive projects where efficiency is a priority, typically by limiting the completeness of the search and appraisal rather than abandoning quality checks altogether.

13. Conceptual Review

Conceptual reviews concentrate on clarifying and developing theoretical concepts within a specific field. They address ambiguities or inconsistencies in existing theories, aiming to refine and expand conceptual frameworks.

14. Library Research

Library research reviews rely primarily on library and archival resources to gather and synthesise information. They are often employed in historical or archive-based research projects, utilising library collections and historical documents for in-depth analysis.

Each type of literature review serves distinct purposes and comes with its own set of strengths and weaknesses, allowing researchers to choose the one that best suits their research objectives and questions.

Choosing the Ideal Literature Review Approach in Academics

To conduct your research properly, you need to choose the right type of literature review. Here are 8 tips to help you select the type best suited to your research.

  • Clarify Your Research Goals: Begin by defining your research objectives and what you aim to achieve with the literature review. Are you looking to summarise existing knowledge, identify gaps, or analyse specific data?
  • Understand Different Review Types: Familiarise yourself with different kinds of literature reviews, including systematic reviews, narrative reviews, meta-analyses, scoping reviews, and integrative reviews. Each serves a different purpose.
  • Consider Available Resources: Assess the resources at your disposal, including time, access to databases, and the volume of literature on your topic. Some review types may be more resource-intensive than others.
  • Alignment with Research Question: Ensure that the chosen review type aligns with your research question or hypothesis. Some types are better suited for answering specific research questions than others.
  • Scope and Depth: Determine the scope and depth of your review. For a broad overview, a narrative review might be suitable, while a systematic review is ideal for an in-depth analysis.
  • Consult with Advisors: Seek guidance from your academic advisors or mentors. They can provide valuable insights into which review type best fits your research goals and resources.
  • Consider Research Field Standards: Different academic fields have established standards and preferences for different forms of literature review. Familiarise yourself with what is common and accepted in your field.
  • Pilot Review: Consider conducting a small-scale pilot review of the literature to test the feasibility and suitability of your chosen review type before committing to a larger project.

Bonus Tip: Crafting an Effective Literature Review

Now that you have seen the types of literature review and how to choose between them, here are some bonus tips for structuring the literature review of a dissertation.

  • Clearly Define Your Research Question: Start with a well-defined and focused research question to guide your literature review.
  • Thorough Search Strategy: Develop a comprehensive search strategy to ensure you capture all relevant literature.
  • Critical Evaluation: Assess the quality and credibility of the sources you include in your review.
  • Synthesise and Organise: Summarise the key findings and organise the literature into themes or categories.
  • Maintain a Systematic Approach: If conducting a systematic review, adhere to a predefined methodology and reporting guidelines.
  • Engage in Continuous Review: Regularly update your literature review to incorporate new research and maintain relevance.

Some Useful Tools And Resources For You

Effective literature reviews demand a range of tools and resources to streamline the process.

  • Reference management software like EndNote, Zotero, and Mendeley helps organise, store, and cite sources, saving time and ensuring accuracy.
  • Academic databases such as PubMed, Google Scholar, and Web of Science provide access to a vast array of scholarly articles, with advanced search and citation tracking features.
  • Research guides from universities and libraries offer tips and templates for structuring reviews.
  • Research networks like ResearchGate and Academia.edu facilitate collaboration and access to publications. Literature review templates and research workshops provide additional support.

Some Common Mistakes To Avoid

Avoid these common mistakes when crafting literature reviews.

  • Unclear research objectives result in unfocused reviews, so start with well-defined questions.
  • Biased source selection can compromise objectivity, so include diverse perspectives.
  • Never neglect referencing; proper citation and referencing are essential for academic integrity.
  • Don’t overlook older literature, which provides foundational insights.
  • Be mindful of scope creep, where the review drifts from the research question; stay disciplined to maintain focus and relevance.

Summing Up The Various Types Of Literature Review

As we conclude this classification of fourteen distinct approaches to conducting literature reviews, it’s clear that the world of research offers a multitude of avenues for understanding, analysing, and contributing to existing knowledge.

Whether you’re a seasoned scholar or a student beginning your academic journey, the choice of review type should align with your research objectives and the nature of your topic. The versatility of these approaches empowers you to tailor your review to the demands of your project.

Remember, your research endeavours have the potential to shape the future of knowledge, so choose wisely and dive into the world of literature reviews with confidence and purpose. Happy reviewing!

Laura Brown

Laura Brown, a senior content writer who writes actionable blogs at Crowd Writer.

Marshall University

SOC 200 - Sims: What are Literature Reviews?

What Are Literature Reviews?

A literature review provides an overview of a topic, and is something most of you have encountered at one time or another. It is usually an article, or a section of an article,* that compiles and summarizes published materials (books, articles, etc.) which provide an examination of recent or current literature on a chosen topic.

Review articles can cover a wide range of subject matter at various levels of completeness and comprehensiveness based on analyses of literature that may include research findings. The review may reflect the state of the art. It also includes reviews as a literary form.

As a publication type, it is an article or book published after examination of previously published material on a subject. It may be comprehensive to various degrees, and the time range of material scrutinized may be broad or narrow, although the reviews most often desired are reviews of the current literature. The textual material examined may be equally broad and can encompass, in medicine specifically, clinical material as well as experimental research or case reports.

State-of-the-art reviews tend to address more current matters. A review of the literature must be differentiated from a HISTORICAL ARTICLE on the same subject, but a review of historical literature is also within the scope of this publication type.

* Lit reviews aren't always obviously labeled "literature review"; they may be embedded within sections such as the introduction or background. 

Example Literature Review:

  • Dance therapy for individuals with Parkinson’s disease: improving quality of life Notice how the introduction and subheadings provide background on the topic and describe why it's important. Studies that convey a similar idea are grouped together. Limitations of some studies are addressed as a way of showing the significance of the research topic.

Types of Literature Reviews

  • Systematic review
  • Meta-analysis
  • Integrative Review
  • Scoping review
  • Rapid review
  • Umbrella review
  • Systematized Review

A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria in order to answer a specific research question. It uses explicit, systematic methods that are selected with a view to minimizing bias, thus providing more reliable findings from which conclusions can be drawn and decisions made (Antman 1992, Oxman 1993).

The key characteristics of a systematic review are:

  • a clearly stated set of objectives with pre-defined eligibility criteria for studies;
  • an explicit, reproducible methodology;
  • a systematic search that attempts to identify all studies that would meet the eligibility criteria;
  • an assessment of the validity of the findings of the included studies, for example through the assessment of risk of bias; and
  • a systematic presentation, and synthesis, of the characteristics and findings of the included studies.

From Cochrane Handbook, 1.2.2

  • How to do a systematic review (NSU)
  • Summarizing systematic reviews: methodological development, conduct and reporting of an umbrella review approach

Many, but not all, systematic reviews contain meta-analyses. Meta-analysis is the use of statistical methods to summarise the results of independent studies. By combining information from all relevant studies, meta-analyses can provide more precise estimates of the effects of health care than those derived from the individual studies included within a review. Meta-analyses also facilitate investigations of the consistency of evidence across studies, and the exploration of differences across studies (Cochrane Handbook, 1.2.2). More information on meta-analyses can be found in Cochrane Handbook, Chapter 9.

A meta-analysis goes beyond critique and integration and conducts secondary statistical analyses on the outcomes of similar studies. It is a systematic review that uses quantitative methods to synthesize and summarize the results.

An advantage of a meta-analysis is that it allows research findings to be evaluated more objectively than a purely narrative summary does. Not all topics, however, have sufficient research evidence to allow a meta-analysis to be conducted. In that case, an integrative review is an appropriate strategy.
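
The "consistency of evidence across studies" mentioned above is commonly summarised with Cochran's Q statistic and the I² index, which estimates the percentage of variation in effect estimates that is due to between-study differences rather than chance. The Python sketch below is illustrative only: the function name `heterogeneity` and the input values are hypothetical, and it is not drawn from the Cochrane Handbook text quoted here.

```python
def heterogeneity(effects, std_errors):
    """Return Cochran's Q, its degrees of freedom, and I-squared (as a percentage)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is truncated at zero when Q is smaller than its degrees of freedom.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical effect sizes with equal standard errors.
q, df, i_squared = heterogeneity([0.10, 0.50, 0.30], [0.10, 0.10, 0.10])
print(f"Q = {q:.2f} on {df} degrees of freedom, I-squared = {i_squared:.1f}%")
```

For these made-up inputs Q is about 8 on 2 degrees of freedom, giving an I² of about 75%, which would usually be read as substantial inconsistency between studies.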

An integrative review summarizes past research and draws overall conclusions from the body of literature on a particular topic. The body of literature comprises all studies that address related or identical hypotheses. In a properly executed integrative review, the effects of subjectivity are minimized through carefully applied criteria for evaluation. A well-done integrative review meets the same standards as primary research in regard to clarity, rigor and replication.

"In general, scoping reviews are commonly used for ‘reconnaissance’ – to clarify working definitions and conceptual boundaries of a topic or field. Scoping reviews are therefore particularly useful when a body of literature has not yet been comprehensively reviewed, or exhibits a complex or heterogeneous nature not amenable to a more precise systematic review of the evidence. While scoping reviews may be conducted to determine the value and probable scope of a full systematic review, they may also be undertaken as exercises in and of themselves to summarize and disseminate research findings, to identify research gaps, and to make recommendations for the future research."

From Peters, MD, Godfrey, CM, Khalil, H, McInerney, P, Parker, D & Soares, CB 2015, 'Guidance for conducting systematic scoping reviews', International Journal of Evidence-Based Healthcare, vol. 13, no. 3, pp. 141-146:
  • Guidance for conducting systematic scoping reviews (methodology paper)

A rapid review is an assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research. "Rapid reviews have emerged as a streamlined approach to synthesizing evidence in a timely manner - typically for the purpose of informing emergent decisions faced by decision makers in health care settings."

Khangura, S, Konnyu, K, Cushman, R, Grimshaw, J & Moher, D 2012, 'Evidence summaries: The evolution of a rapid review approach', Systematic Reviews, vol. 1, no. 1, p. 10.

An umbrella review is a synthesis of existing reviews that includes only the highest level of evidence, such as systematic reviews and meta-analyses. It specifically refers to a review compiling evidence from multiple reviews into one accessible and usable document. Umbrella reviews focus on a broad condition or problem for which there are competing interventions, and highlight reviews that address these interventions and their results.

Methodology paper: Aromataris, E, Fernandez, R, Godfrey, CM, Holly, C, Khalil, H & Tungpunkom, P 2015, 'Summarizing systematic reviews: Methodological development, conduct and reporting of an umbrella review approach', Int J Evid Based Healthc, vol. 13, no. 3, pp. 132-140.

A systematized review attempts to include elements of the systematic review process while stopping short of the systematic review. Systematized reviews are typically conducted as a postgraduate student assignment, in recognition that they are not able to draw upon the resources required for a full systematic review (such as two reviewers).

Special Thanks

Special thanks to Dr. Julie Sarpy, PhD, MSLS, MA, AHIP, for permission to reuse content from her Medical Sciences guide. Dr. Sarpy is a Reference and Instruction Librarian at the Martin and Gail Press Health Professions Division Library, a Liaison to the Dr. Kiran C. Patel College of Allopathic Medicine, and an Adjunct Assistant Professor with the Department of Medical Education at the Dr. Kiran C. Patel College of Osteopathic Medicine at Nova Southeastern University in Ft. Lauderdale, Florida.

  • Last Updated: Sep 27, 2024 3:57 PM
  • URL: https://libguides.marshall.edu/soc200-sims
  • Open access
  • Published: 27 September 2024

Decision threshold models in medical decision making: a scoping literature review

  • Andrew Scarffe 1 , 2 , 3 ,
  • Alison Coates 1 ,
  • Kevin Brand 1 &
  • Wojtek Michalowski 1  

BMC Medical Informatics and Decision Making volume 24, Article number: 273 (2024)


Decision thresholds play an important role in medical decision-making. Individual decision-making differences may be attributable to differences in subjective judgments or cognitive processes that are captured through the decision thresholds. This systematic scoping review sought to characterize the literature on non-expected utility decision thresholds in medical decision-making by identifying commonly used theoretical paradigms and contextual and subjective factors that inform decision thresholds.

A structured search designed around three concepts—individual decision-maker, decision threshold, and medical decision—was conducted in MEDLINE (Ovid) and Scopus databases from inception to July 2023. ProQuest (Dissertations and Theses) database was searched to August 2023. The protocol, developed a priori, was registered on Open Science Framework and PRISMA-ScR guidelines were followed for reporting on this study. Titles and abstracts of 1,618 articles and the full texts for the 228 included articles were reviewed by two independent reviewers. 95 articles were included in the analysis. A single reviewer used a pilot-tested data collection tool to extract study and author characteristics, article type, objectives, theoretical paradigm, contextual or subjective factors, decision-maker, and type of medical decision.

Of the 95 included articles, 68 identified a theoretical paradigm in their approach to decision thresholds. The most common paradigms included regret theory, hybrid theory, and dual processing theory. Contextual and subjective factors that influence decision thresholds were identified in 44 articles.

Conclusions

Our scoping review is the first to systematically characterize the available literature on decision thresholds within medical decision-making. This study offers an important characterization of the literature through the identification of the theoretical paradigms for non-expected utility decision thresholds. Moreover, this study provides insight into the various contextual and subjective factors that have been documented within the literature to influence decision thresholds, as well as how these factors juxtapose with theoretical paradigms.


Decision thresholds play an important and understudied role in decision-making. Often, decision thresholds are viewed as the “linchpin” between evidence and decision-making [ 1 , 2 ]. The concept of a decision threshold is often familiar to those with a basic understanding of the judicial system where standards of proof assume different evidentiary cut-offs such as, “‘beyond a reasonable doubt’ (typical of criminal court) and the less-stringent ‘preponderance of evidence’ (typical of a civil court)” [ 1 ]. Decision thresholds, be it for individuals or society, reflect the perceived value of the consequences (e.g., convicting an innocent person to prison versus letting a guilty person walk free) [ 1 , 3 , 4 , 5 , 6 , 7 ]; effectively, a decision threshold dichotomizes a decision into taking or not taking an action [ 8 , 9 , 10 , 11 ]. Generally, differing decisions are attributed to individual differences in subjective judgements that are attributed to different decision thresholds [ 12 ]. Differences in assessments of the benefits and harms of particular decisions, or subjective risk-assessment judgements, help to explain the variation in decision thresholds [ 13 , 14 ]. Within the context of medical decision-making, Djulbegovic and colleagues assert that the “development of [the] threshold model is considered as one of the most important advances in medical decision-making” [ 15 ]. Understanding the ways that individuals incorporate subjective judgements and values into decision thresholds [ 7 ] and the cognitive mechanisms they employ to inform decisions could suggest opportunities to understand variation within medical decision-making [ 8 , 13 ].

While arguably of central interest to the decision-making process, surprisingly little research is dedicated to decision thresholds [ 16 ]. Within the existing literature, several different theoretical paradigms have been identified as underpinning how an individual might arrive at a decision threshold [ 9 ]. A narrative review by Djulbegovic and colleagues [ 14 ] summarized theoretical paradigms underpinning decision thresholds that inform medical decision-making: 1) expected utility theory (EUT) models [ 10 , 11 ], 2) regret based decision models [ 8 , 17 , 18 , 19 , 20 , 21 ], 3) dual processing/dual system models [ 12 , 22 ], and 4) other models [ 14 ]. Djulbegovic and colleagues argued that theoretical paradigms for decision thresholds have not been adequately explored or employed empirically, nor has the concept been meaningfully integrated into theoretical investigations of clinical decision-making [ 14 ]. Despite the perceived importance of decision thresholds in medical decision-making, the various papers bearing upon the topic have yet to be systematically reviewed. While EUT threshold models are well established within the medical decision-making literature (see Pauker and Kassirer [ 10 , 11 ]) [ 14 ], a considerable body of literature recognizes that in practice different underpinning paradigms appear to be plausible and, on occasion, better matches to practice [ 14 , 23 , 24 ]. EUT and non-EUT paradigms differ in how they weigh probabilities and scale utilities [ 2 ]. Typically, an EUT approach does not account for contextual and subjective factors that inform an individual’s decision-making, whereas non-EUT paradigms often account for affect, emotion, values, preferences and other contextual/ situational factors [ 2 ].
We chose to exclude EUT decision threshold paradigms from our systematic scoping review because a considerable body of literature has recognized that people do not always make decisions with a goal of maximizing expected utility [ 23 , 24 , 25 , 26 ]. For example, Djulbegovic and colleagues (2014) investigated which decision threshold paradigms (i.e., EUT, regret and dual-processing theory) most accurately reflected physicians’ actions for pulmonary embolisms and acute myeloid leukemia [ 25 ]. Results from this study suggested that the EUT paradigm for decision threshold was a weaker predictor of physician decisions than regret-based and dual processing theory paradigms (the latter of which was the best predictor) [ 25 ].

We employed a scoping review approach to systematically review alternative theoretical paradigms underpinning decision thresholds. We seek to categorize the non-EUT paradigms (i.e., those that do not share the same general structure as described in Pauker and Kassirer) [ 10 , 11 ]. This scoping review adds to the body of knowledge by summarizing the most recent research on decision thresholds within medical decision-making, and by extending this corpus of knowledge to include literature on the paradigms that physicians and non-physicians (e.g., allied health professionals, lay-persons/ patients, etc.) use for non-EUT decision thresholds in medical decision-making. The following two objectives drove the review described in this paper:

Objective 1: Identify and categorize what theoretical paradigms have been developed to establish or estimate decision threshold(s) within medical decision-making.

Objective 2: Identify what consideration an individual’s subjective judgment(s) (e.g., attitudes, emotions, preferences, risk perceptions, values) have been given in a context of decision threshold(s) within medical decision-making.

Following Joanna Briggs Institute (JBI) methodology [ 27 ], we conducted a scoping review to systematically identify the non-EUT theoretical paradigms underpinning medical decision thresholds. The protocol, developed a priori, was registered on Open Science Framework [ 28 ] and PRISMA-ScR guidelines [ 29 ] were followed for reporting on this study. Our methods are summarized here. Full methodological details are included in the supplementary text.

Data sources

We designed a search strategy around three concepts: individual decision-maker, decision threshold, and medical decision. We sought to identify articles where the term “decision threshold” and other like terms appeared within the title or abstract of an article in the context of “medical decisions” or other like terms. The search was piloted in MEDLINE (Ovid) and subsequently translated for Scopus search syntax. Both searches were completed in July, 2023. The ProQuest (Dissertations and Theses) database was searched in August 2023 using the phrase “decision threshold”. Results of the search were deduplicated using the Zotero Duplicate Merger Plug-In ( https://www.zotero.org/ ). We used Covidence software ( https://www.covidence.org/ ) to facilitate article review and selection.

Study selection

We selected articles published in English that acknowledged theoretical paradigms for decision thresholds, and where medical or health related decisions were made by individuals (e.g., physicians, allied health professionals, lay individuals, etc.). We excluded articles that applied decision threshold models at the policy level (e.g., cost-effectiveness analysis and willingness to pay thresholds).

Since we wished to categorize the theories that have been conceptualized as underpinning decision thresholds (objective 1), our screening criteria were deliberately broad. As the literature on decision thresholds transcends disciplinary boundaries, we kept the title and abstract screening criteria purposefully conceptual rather than concrete to allow for fields using different terms to capture the same or similar ideas. We piloted screening criteria, summarized in Table  1 , in a subset of studies prior to implementation and achieved high interrater agreement. Titles and abstracts of articles identified in our search were screened by two independent reviewers (AS and AC) and disagreements were discussed and resolved through consensus. Full text of articles passing initial screening were reviewed by the two independent reviewers (AS and AC) using the full text screening criteria summarized in Table  1 following a pilot screening process where, again, high inter-rater agreement was achieved. Disagreements on full text screening assessments were resolved through a consensus discussion.

Reference lists of all articles included after full-text review were screened using title and abstract criteria. Potentially suitable articles were assessed using full-text inclusion and exclusion criteria.

A total of 2,358 articles were identified through database and reference list searching. After de-duplication, 1,618 articles entered the two-stage screening process. Ultimately, 95 articles were included in the review. Details of study selection are depicted in Fig.  1 .

Fig. 1 PRISMA flow diagram

Data extraction

A data extraction form designed to capture details about each article was pilot tested by a single researcher (AS) on a subset of included articles. Team discussions helped to streamline the form. Extracted characteristics included: author details, title of the journal, country of study (where applicable), the objective of the article, article type (e.g., empirical, conceptual/ theoretical, review, etc.), theoretical foundations (if any), contextual/ subjective factors (if any), types of decision threshold (e.g., treat/ not treat; test/ not test, etc.), decision threshold mathematical formulation (if any), the decision-maker (e.g., physician, nurse, patient, guardian/ caregiver, etc.), type of medical decision (e.g., coronary artery disease, stroke, COPD, etc.), authors’ conclusions, as well as relevant references (see Identifying Additional Literature ). All data extraction was performed by a single researcher (AS); team discussions were called for difficult or ambiguous extractions.

Identifying additional literature

The reference lists of all articles that met the full-text inclusion criteria were scanned using title and abstract criteria and potentially suitable articles were assessed using full-text inclusion/exclusion criteria. Data was charted from included articles as detailed above.

Synthesis of results

Selected characteristics of the included articles are tabulated in the supplementary text. Extracted data were evaluated thematically to identify the theoretical paradigms represented in the literature. We narratively summarized the literature within each paradigm and illustrate the distribution of records within each paradigm in tabular form. We also tabulated summary data from articles that discussed contextual and subjective factors associated with decision thresholds and summarize these narratively.

Of the 95 included articles, 27 discussed the concept of decision thresholds in medical decision-making applications but did not reference or explicitly state a theoretical paradigm as conceptually underpinning the decision threshold, while the remaining 68 did. Here we discuss these remaining 68 articles (Table  2 ).

The most common ex-ante theoretical paradigms (i.e., estimating a decision threshold prior to the decision being made) used to determine an individual’s decision threshold are regret theory, hybrid theory, and dual processing theory. The most common ex-post methodological approach (i.e., estimating a decision threshold after a decision has been made) used to estimate an individual’s decision threshold involves regression techniques (e.g., ordinal, logistic, etc.).

Regret theory

Regret theory is the most common theoretical paradigm for non-EUT decision thresholds (mentioned or discussed in 24 of 68 articles, ~ 35%). Shortly after regret theory was first propounded [ 23 , 24 , 89 ], Feinstein [ 35 ] tailored the concept to the medical decision-making setting by identifying regret as a factor in “qualitative decision analysis”, which he referred to as the “chagrin factor” [ 35 ]. Djulbegovic and colleagues [ 8 ] propose two different regret-based processes for specifying decision thresholds: 1) they revise the EUT model to incorporate regret, and 2) they propose the concept of “acceptable regret” (i.e., described as the “level of regret which the decision maker can comfortably tolerate” [ 8 ]), which simplifies to a distinct decision threshold equation. In the case of dual processing theory, the concept of regret is incorporated into another theoretical paradigm [ 12 ].

Hybrid theory

The hybrid theory is another common paradigm for non-EUT decision thresholds, discussed in 16 of 68 articles in our review (~ 24%). Although hybrid theory sometimes shares similar structures to EUT decision thresholds, as it is often a slightly modified version of the EUT decision threshold [ 43 ], it merits separate consideration as a non-EUT decision threshold paradigm. What constitutes a “hybrid theory” is poorly defined despite relatively frequent use of this term. For example, Djulbegovic and colleagues [ 43 ] articulate a hybrid theory paradigm that incorporates a mechanism by which concepts such as a patient’s “relative value” for specific outcomes can modify the expected value calculation (i.e., commonly incorporating utilities) to arrive at the optimal clinical decision [ 43 ]. Alternatively, other scholars have stated that, within the hybrid model, individuals are believed to “use a kind of intuitive threshold they cannot explicate but that is based on their knowledge and/or perception about the harm and benefit of a treatment” [ 51 ]. In broad terms, the literature describes the hybrid threshold model as retaining “the original EUT formulation but invites the decision-maker to weigh health outcomes (morbidities and mortalities) differently when they occur in patients with and without disease” [ 14 ].

Verp and Heckerling [ 41 ], the earliest proponents of a hybrid theory approach to decision thresholds (in the context of medical decision-making), describe a decision threshold approach that considers patients’ preferences for pre-natal testing [ 41 ]. Basinga and colleagues [ 51 ] provide an example of how a hybrid threshold model would calculate the associated decision threshold, incorporating a weighted value of morbidity with respect to mortality and a weighted value of provoked death relative to natural death to reflect a decision-maker’s preference. Within other hybrid theories, the element of patient choice is incorporated into the threshold equation [ 76 , 79 ]. For example, Djulbegovic and colleagues [ 79 ] propose a modified threshold formulation where the quotient of absolute risk of a major adverse event occurring and relative risk ratio is multiplied by a constant that “refers to the patient’s relative value of avoiding treatment harms[…] with respect to the impact of disease without treatment” [ 79 ].
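The verbal description of the modified threshold formulation can be written out as a short calculation. The following is a minimal sketch that follows only the wording above (a quotient multiplied by a constant); the function name, parameter names and input values are hypothetical and are not taken from the cited article, which should be consulted for the exact definitions of each term.

```python
def hybrid_threshold(harm_risk: float,
                     relative_risk_ratio: float,
                     relative_value: float = 1.0) -> float:
    """Sketch of a preference-weighted threshold.

    harm_risk            absolute risk of a major adverse event (0 to 1)
    relative_risk_ratio  the relative risk term used as the divisor (assumed > 0)
    relative_value       patient's relative value of avoiding treatment harms
                         (1.0 means harms and disease impact are valued equally)
    """
    if not 0.0 <= harm_risk <= 1.0:
        raise ValueError("harm_risk must lie between 0 and 1")
    if relative_risk_ratio <= 0.0:
        raise ValueError("relative_risk_ratio must be positive")
    if relative_value < 0.0:
        raise ValueError("relative_value must be non-negative")
    # Quotient of harm risk and the relative risk term, scaled by the constant.
    return relative_value * harm_risk / relative_risk_ratio


# Hypothetical inputs, for illustration only.
baseline = hybrid_threshold(harm_risk=0.02, relative_risk_ratio=0.4)
harm_averse = hybrid_threshold(harm_risk=0.02, relative_risk_ratio=0.4,
                               relative_value=2.0)
print(f"baseline threshold: {baseline:.3f}")
print(f"harm-averse threshold: {harm_averse:.3f}")
```

In this sketch, a patient who places more weight on avoiding treatment harms (a larger constant) ends up with a proportionally higher threshold.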

Dual processing theory

Six of the reviewed articles adopted a theoretical paradigm of dual processing or dual systems theory (hereafter, dual processing theory). Djulbegovic and colleagues [ 12 ] were the first to apply dual processing theory to medical decision thresholds. In this paradigm, it is presumed that people make decisions by drawing on a combination of “type I system” reasoning (i.e., affectively driven, fast, intuitive) and “type II system” reasoning (i.e., analytical, calculative, deliberative) [ 12 ]. Djulbegovic and colleagues [ 61 ] posit that physicians do use the threshold model to inform their decisions, and they claim that the dual processing model may best explain physicians’ decision thresholds [ 61 ].

Of the six articles that identify dual processing theory as a theoretical underpinning for decision thresholds, two explored dual processing theory from a theoretical or conceptual perspective using hypothetical vignettes to illustrate its potential application [ 12 , 22 ], three articles were reviews [ 9 , 14 , 77 ], and one article reported on the explanatory power of dual processing theory relative to EUT and regret theory decision threshold models [ 61 ].

Information theory

Information theory is used infrequently in the medical decision-making literature and is most commonly used in fields such as economics or engineering [ 90 ]. We identified two articles that used information theory in relation to decision thresholds. Within information theory, uncertainty, or “entropy” as it is known in the field of thermodynamics [ 39 ], can be expressed in terms of the benefit to risk ratio of a particular therapy [ 39 ], effectively incorporating a measure of choice [ 39 ]. The articles by Asch and colleagues [ 36 ] and Djulbegovic and colleagues [ 39 ] theoretically incorporate information theory into the classic EUT model by Pauker and Kassirer [ 10 , 11 ].

Signal detection theory

We identified four articles that referenced Signal Detection Theory (SDT) relative to medical decision thresholds. SDT extends the concept of decision thresholds by providing a mechanism to contrast “signal” or “hit” (i.e., true positives and false negatives) and “noise” or “miss” (i.e., false positives and true negatives) relative to a decision [ 13 , 54 , 58 , 72 ].

Three of the four articles used SDT as a methodology to elicit an individual’s decision threshold when presented with hypothetical vignettes [ 13 , 54 , 58 ]. In the fourth article, Hozo and colleagues [ 72 ] addressed the role of SDT from a theoretical perspective and identified the link between decision threshold models and SDT, fast-and-frugal decision trees (FFT) and evidence accumulation theory (EAT) [ 72 ].
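As an illustration of how SDT can be used to elicit a threshold from vignette responses, the sketch below computes the two standard equal-variance Gaussian SDT indices, sensitivity (d') and response criterion (c), from the four outcome counts. The counts are invented, the function name is hypothetical, and the log-linear correction for extreme rates is an assumption of this sketch; the reviewed articles may have used different estimators.

```python
from statistics import NormalDist


def sdt_indices(true_pos: int, false_neg: int,
                false_pos: int, true_neg: int) -> tuple[float, float]:
    """Return (d_prime, criterion) under the equal-variance Gaussian model."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Log-linear correction keeps both rates strictly between 0 and 1.
    rate_signal = (true_pos + 0.5) / (true_pos + false_neg + 1)
    rate_noise = (false_pos + 0.5) / (false_pos + true_neg + 1)
    d_prime = z(rate_signal) - z(rate_noise)
    criterion = -0.5 * (z(rate_signal) + z(rate_noise))
    return d_prime, criterion


# Hypothetical responses of one clinician to 100 vignettes
# (50 with disease present, 50 with disease absent).
d_prime, criterion = sdt_indices(true_pos=20, false_neg=30,
                                 false_pos=2, true_neg=48)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

A positive criterion indicates a conservative responder who requires more evidence before acting, which corresponds to a higher decision threshold; a negative criterion indicates the opposite.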

Regression (Logistic/ Ordinal)

Our review identified 18 articles referencing a regression model for decision thresholds (~ 27%). Linear or logistic regression was a common method to estimate decision thresholds. Unlike many of the other theories, which inform decision thresholds in an ex-ante fashion, thresholds identified through regression (commonly, logistic or ordinal) are almost exclusively determined ex-post.

Within articles that use a regression, Eisenberg and Hershey [ 31 ] are commonly cited for their four-step method for calculating the test and test-treatment thresholds within medical decision-making. Young and colleagues [ 33 ] build on Eisenberg and Hershey [ 31 ] and propose three different approaches to estimating the decision threshold: 1) modal distribution, 2) the unweighted midpoint, 3) the weighted midpoint. Plasencia and colleagues [ 38 ] elaborate on the logistic function as proposed by Hartz and colleagues [ 32 ], and reduce the dependencies on the approach proposed by Young and colleagues [ 33 ]. Specifically, Plasencia and colleagues [ 38 ] provide a regression paradigm that can account “for other factors that may influence disease probability, or threshold, at which a particular proportion of physicians makes a decision” [ 38 ]. More recently, Ebell and colleagues [ 66 ] modify the logistic model proposed by Plasencia and colleagues [ 38 ] to estimate thresholds for both testing and treatment decisions. They explore testing and treatment thresholds for influenza, acute coronary syndrome, pneumonia, deep vein thrombosis, and urinary tract infection [ 66 ]. Ebell and colleagues [ 66 ] are commonly cited for their use of online clinical vignettes, which (at the time) provided a novel way to explore test and treatment thresholds for common conditions.
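The idea of reading an ex-post threshold off a fitted logistic curve can be sketched as follows. The sketch assumes a simple model in which the log-odds of a physician acting is linear in the stated disease probability; this model form, the function name and the coefficients are hypothetical illustrations and do not reproduce the specifications used by Plasencia and colleagues or Ebell and colleagues.

```python
import math


def threshold_at_proportion(intercept: float, slope: float,
                            proportion: float = 0.5) -> float:
    """Disease probability at which `proportion` of physicians act.

    Assumes logit(P(act)) = intercept + slope * disease_probability,
    with coefficients already estimated from vignette responses.
    """
    if slope == 0.0:
        raise ValueError("slope must be non-zero")
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    target_logit = math.log(proportion / (1.0 - proportion))
    # Invert the linear predictor to recover the disease probability.
    return (target_logit - intercept) / slope


# Hypothetical fitted coefficients, for illustration only.
median_threshold = threshold_at_proportion(intercept=-3.0, slope=10.0)
p90_threshold = threshold_at_proportion(intercept=-3.0, slope=10.0,
                                        proportion=0.9)
print(f"50% of physicians act at p = {median_threshold:.2f}")
print(f"90% of physicians act at p = {p90_threshold:.2f}")
```

Additional covariates (e.g., physician characteristics) would add terms to the linear predictor and shift the recovered threshold; a value outside the 0 to 1 range signals that the fitted curve never reaches the requested proportion within the plausible range of disease probabilities.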

Net-Benefit

We identified three articles that specifically explored the concept of decision thresholds through a net-benefit approach. Although commonly the net-benefit approach is synonymous with EUT, the articles included in this review adopt a modified net-benefit approach. For example, Glasziou and Irwig [ 40 ] propose a net-benefit approach to decision thresholds that incorporates a weighting function for specific patient values for specific outcomes. This approach shares similar properties to typical EUT models but does not explicitly reference EUT [ 40 ]. Vickers and colleagues [ 60 , 70 ] explore a net-benefit approach relative to the selection of diagnostic tests and incorporate a decision threshold approach into the net-benefit equation alongside specificity and sensitivity.
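To make the link between a decision threshold and net benefit concrete, the sketch below uses the net-benefit expression commonly associated with decision curve analysis, in which false positives are weighted by the odds of the threshold probability. Treating this as the formulation used in the cited articles is an assumption, and the test characteristics and prevalence are invented for illustration.

```python
def net_benefit(sensitivity: float, specificity: float,
                prevalence: float, threshold: float) -> float:
    """Net benefit per patient of acting on a positive test result."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1.0 - specificity) * (1.0 - prevalence)
    # Harm of a false positive relative to the benefit of a true positive.
    weight = threshold / (1.0 - threshold)
    return true_pos_rate - false_pos_rate * weight


# Hypothetical test compared with treating everyone, at a 10% threshold.
with_test = net_benefit(sensitivity=0.8, specificity=0.9,
                        prevalence=0.2, threshold=0.1)
treat_all = net_benefit(sensitivity=1.0, specificity=0.0,
                        prevalence=0.2, threshold=0.1)
print(f"net benefit using the test: {with_test:.3f}")
print(f"net benefit treating all:   {treat_all:.3f}")
```

Repeating the comparison across a range of thresholds shows for which decision-makers, characterized by their threshold, the test adds value over treating everyone or no one.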

Other theoretical paradigms

Our review identified several other theoretical paradigms (i.e., classified as “Other” in Table  2 ); however, each of these paradigms was associated with just one publication and therefore is not discussed in greater detail here. These “other” approaches include linear information theory [ 36 ], info-gap theory [ 55 ], generalized linear receiver operator characteristic (GROC) curves [ 37 ], rituals [ 49 ], social judgement theory [ 13 ], general assessment and decision making model [ 13 ], systems dynamics [ 69 ], fast-and-frugal decision trees [ 72 ], evidence accumulation theory [ 72 ], therapeutic risk thresholds [ 74 ], and the smooth ambiguity model [ 85 ].

Contextual and subjective factors

Our review sought to identify what consideration an individual’s (patient or provider) subjective judgment(s) (e.g., attitudes, emotions, preferences, risk perceptions, values) have been given in a context of decision threshold(s) within medical decision-making (i.e., Objective 2).

Of the 95 articles included in the scoping review (Fig.  1 ), 44 consider contextual or subjective factors that may influence an individual’s decision threshold (~ 46%). Contextual and subjective factors were infrequently grounded within a theoretical paradigm for decision thresholds (24 of 44 articles, ~ 55%). Of the articles that included a theoretical paradigm, regression techniques were most commonly used (11 of 24 articles, ~ 46%) to determine decision thresholds in an ex-post fashion and to identify the factors associated with heterogeneous decision thresholds. Where theoretical paradigms were not cited or regression techniques were not employed, studies relied on a variety of parametric (e.g., t-tests, ANOVAs) and non-parametric (e.g., Wilcoxon signed-rank test) tests to quantify the decision threshold.

Contextual factors

Two broad categories of contextual factors were discussed in the literature as potentially influencing an individual’s decision threshold: person-specific factors, and occupation-related factors (see Table  3 ). We categorize “person-specific factors” as those that provide contextual understanding of the individual (e.g., age, gender, country, education level, patient vs. physician). The studies with “person-specific factors” capture both physicians and lay individuals. Alternatively, the “occupation related factors” are specific to studies that have investigated the influence of various contextual factors relative to physician decision thresholds.

Across the identified articles there were mixed results regarding the statistical significance of several contextual factors associated with an individual’s decision threshold. Interpretation of these various contextual factors can be found in the supplementary text. We also highlight that, aside from regression models, signal detection theory [ 13 , 54 , 58 ] and regret theory [ 64 , 65 , 86 ] were the most common theoretical paradigms cited relative to contextual factors influencing decision thresholds.

Subjective factors

While we identified several subjective factors that influence decision thresholds, their potential influence relative to decision thresholds was mentioned but not necessarily quantified. We categorize these factors in three broad categories: 1) health related factors, 2) personal factors, and 3) perceptive factors (see Table  4 ).

Similar to the findings associated with contextual factors, the statistical significance associated with various subjective factors was inconsistent across studies. With a greater frequency than with contextual factors, studies discussed (but did not measure) the potential influence that subjective factors have on an individual’s decision threshold. Aside from regression models, we observe that hybrid theory was often used as a theoretical paradigm relative to subjective factors and decision thresholds (5 articles [ 41 , 62 , 67 , 81 , 85 ]). A plausible explanation for the use of hybrid theory is that it incorporates decision-maker preferences and choices into the estimation of the decision threshold.

Classification of theoretical paradigms: descriptive, normative and prescriptive

Theoretical models for decision-making can be classified as descriptive, normative, or prescriptive (see Table  5 ) [ 2 , 110 , 111 , 112 ]. Descriptive and prescriptive approaches provide an alternative view of “rationality” compared to the normative approach which is most often consistent with EUT [ 113 ].

The decision-theoretical classification of theoretical paradigms provides guidance as to how the theoretical paradigms ought to be evaluated [ 114 ]. In our review, included studies often commented on the duality or plurality of possible classifications of their theoretical paradigm, emphasizing how the authors conceptualized their contribution and sought to evaluate their study. For example, Djulbegovic and colleagues (1999) discuss the concept that “acceptable regret may have prescriptive as well as descriptive value” [ 8 ] and Asch and colleagues (1990) noted that their threshold models, informed by information theory, “are neither solely descriptive nor solely normative” [ 36 ]. While the regression method for decision thresholds may be commonly considered a descriptive decision-theoretical classification, it can also be considered normative owing to its connection to EUT. Dual processing theory has typically been characterized as a descriptive theory [ 12 ], yet hybrid theory, essentially a version of dual processing theory [ 14 ], has been previously classified as a prescriptive theory [ 51 ]. To further complicate efforts to categorize these models, not all authors reflect on or identify the decision-theoretical classification they associate with their research. Although classifying models and paradigms as descriptive, prescriptive, or normative could augment our understanding of their potential applications, accurately doing so will require future researchers to reflect and report on this in their publications.

Regret theory: a larger body of literature

We have identified regret theory as the most common non-EUT paradigm used to inform decision thresholds within the context of medical decision-making. Importantly, regret theory extends beyond the medical decision-making literature; regret research appears in many different fields, including economics, psychology, medicine, law, and organizational behaviour, to name a few [ 115 ].

Regret is often identified as a driver of ‘irrational’ decision-making given its tendency to influence people to make decisions that are inconsistent with EUT [ 116 ]. Under the traditional approach to decision-making, decisions are expected to be driven by EUT [ 117 ] where a ‘rational’ decision is one that maximizes the expected utility of the final assets [ 23 , 24 ]. Other scholars maintain that, because anticipated regret causes people to think more elaborately before making their decision, regret can also induce rational decision-making [ 118 ].

Regret is a cognitively based emotion that refers to the affective reaction to unfavourable outcomes [ 119 , 120 ]. Specifically, regret can be measured as “the difference between the utility of the action taken and the utility of the action that in retrospect should have been taken” [ 19 ]. Unlike other emotions, regret is uniquely tied to decision-making [ 115 ]. Regret can be experienced, ex-post, once the outcomes of a decision have become known, or can be anticipated, ex-ante, before the decision is made, in which case it reflects how a person anticipates feeling if an undesirable outcome were to occur [ 118 , 121 ]. Within medical decision-making most decisions cannot be reversed (e.g., surgery cannot be undone, a vaccine cannot be ungiven) and consequently decision-making is informed by anticipatory (ex-ante) regret [ 25 ]. Although decision thresholds are not specifically referenced, some of the early research on regret in medical decision-making was led by Ritov and Baron on vaccine hesitancy and omission bias [ 122 , 123 ].
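The quoted definition of regret as a difference in utilities can be illustrated with a short calculation. This is a minimal sketch: the function name, the action labels and the utility values are hypothetical, and the sign convention (regret reported as a non-negative quantity) is an assumption of the sketch rather than something specified in the quoted source.

```python
from typing import Mapping


def regret(utilities: Mapping[str, float], action_taken: str) -> float:
    """Utility of the best action in hindsight minus utility of the action taken.

    `utilities` holds the utility of each available action given the
    outcome that actually occurred.
    """
    best_in_hindsight = max(utilities.values())
    return best_in_hindsight - utilities[action_taken]


# Hypothetical utilities once it is known that the patient had the disease.
utilities_given_disease = {"treat": 0.9, "do not treat": 0.4}

print(regret(utilities_given_disease, "do not treat"))  # regret of omission
print(regret(utilities_given_disease, "treat"))         # no regret
```

Anticipated (ex-ante) regret can be sketched in the same way by averaging this quantity over the possible outcomes, weighted by their probabilities.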

Regression: is it a novel theoretical paradigm?

Regression models are commonly employed to estimate decision thresholds in an ex-post fashion (i.e., after the decision has been made) in the medical decision-making literature. Within our scoping review we classified “regression” as a theoretical paradigm for decision thresholds to capture the articles that discuss decision thresholds and use regression models within the context of a medical decision. However, on a conceptual level, we maintain that “regression” should not necessarily be considered a theoretical paradigm that is distinct from EUT. To the extent that our inclusion and exclusion criteria were established a priori (as is appropriate for scoping review methodology), we could not exclude these articles because they do not explicitly reference an EUT theoretical paradigm. These articles used regression as a method of analysis to estimate an individual’s decision threshold. To this end, the articles that leverage regression models explore various factors that may inform a decision threshold or use scales to derive a quantifiable decision threshold, but do not propose a novel theoretical construct (beyond EUT) to understand such a threshold.

Hybrid theory, net-benefit, information theory and personal preferences

Our review identified several different theoretical paradigms for decision thresholds. The “hybrid theory”, “information theory” and “net-benefit” paradigms are distinctive in their own right but often have similarities. Importantly, each of these paradigms is referred to as a distinctive theoretical paradigm within the literature. However, within each of these theoretical paradigms the incorporation of the concept of “choice” or “values” was a common theme. To this end, while the underpinning mathematical equations used to derive the decision threshold may differ across these theoretical paradigms, the consideration of an individual’s preferences (choices) differentiates them from regret theory, dual processing theory, signal detection theory and regression techniques.

Outside of the medical decision-making literature (e.g., sociology, cognition, etc.), the dual processing theory is “widely accepted as a dominant explanation of cognitive processes that characterizes human decision-making” [ 12 ]. Within the fields of sociology and cognition the dual processing approach to decision-making is recognized as the dominant mechanism for reasoning [ 124 , 125 , 126 , 127 ]. Specifically, the dual processing theory adds a decision threshold that is calculated through a combination of affective reasoning (i.e., “type I”) and analytical reasoning (i.e., “type II”) [ 12 , 22 ]. Effectively, the DPM incorporates “type I system” reasoning by using anticipated regret as a proxy of the affect or emotion that is commonly used in “type I system” reasoning [ 12 ]. The DPM also incorporates “type II system” reasoning through an EUT approach [ 12 ]. It is the combination of “type I” and “type II” system reasoning to estimate a decision threshold that differentiates dual processing theory from regret theory and EUT. We note again the claim that the dual processing theory of decision thresholds is more consistent with physician decision-making than thresholds determined by EUT and regret theory [ 25 ]. It is also worth highlighting that the hybrid theory is generally considered a version of the dual processing theory of decision-threshold analysis [ 14 ]. In this review we classified “dual processing theory” and “hybrid theory” as distinct theoretical paradigms for decision thresholds. However, if we accepted these theoretical paradigms as one and the same, then dual processing theory would become the most frequently used theoretical paradigm for non-EUT decision thresholds within medical decision-making. It is important to note that in the literature dual processing theory and hybrid theory are recognized as distinct theoretical paradigms.
Dual processing theory incorporates “type I” and “type II” system reasoning, whereas hybrid theory suggests that decision-makers should incorporate weightings or preferences relative to the identified possible outcomes and does not consider a balance between “type I” and “type II” system reasoning. To this end, hybrid theory could potentially be incorporated into dual processing theory by way of the “type I” system reasoning.
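As a point of reference for the EUT-based “type II” component described above, the classic expected-utility treatment threshold of Pauker and Kassirer (listed in the references) recommends treatment when the probability of disease exceeds H / (B + H), where B is the net benefit of treating a patient who has the disease and H is the net harm of treating a patient who does not. The sketch below illustrates only this EUT threshold. It does not reproduce the dual processing model, whose regret-based “type I” adjustment is not specified in this section, and the function and parameter names are hypothetical.

```python
def eut_treatment_threshold(net_benefit, net_harm):
    """Classic EUT treatment threshold: p_t = H / (B + H).

    net_benefit (B): utility gained by treating a patient who has the disease.
    net_harm (H): utility lost by treating a patient who does not have it.
    Both are assumed to be expressed on the same utility scale.
    """
    if net_benefit <= 0 or net_harm < 0:
        raise ValueError("expected net_benefit > 0 and net_harm >= 0")
    return net_harm / (net_benefit + net_harm)

def recommend_treatment(p_disease, net_benefit, net_harm):
    """Treat when the probability of disease exceeds the EUT threshold."""
    return p_disease > eut_treatment_threshold(net_benefit, net_harm)

# Illustrative values only: treating is three times as beneficial to a
# diseased patient as it is harmful to a non-diseased one.
example_threshold = eut_treatment_threshold(net_benefit=3.0, net_harm=1.0)
print(f"EUT treatment threshold: {example_threshold:.2f}")
```

The threshold falls as the benefit-to-harm ratio rises, which is why a paradigm that re-weights perceived harms or benefits (for example, through anticipated regret) shifts the threshold away from the EUT value.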

Contextual and subjective factors: limited theoretical underpinning

This scoping review identified several contextual and subjective factors that have been reported in the literature to inform or influence an individual’s decision threshold. Importantly, the evidence on almost every contextual and subjective factor is inconsistent, and the effects of the identified factors on decision thresholds are uncertain. For example, five studies identified that age was a significant factor in determining an individual’s decision thresholds [ 34 , 78 , 82 , 91 , 92 ], whereas three other studies found age not to be a significant factor [ 93 , 95 , 128 ]. Part of the heterogeneity observed in the literature on the contextual and subjective factors influencing decision thresholds may be a result of poor theoretical conceptualization of decision thresholds. In the majority of studies where contextual or subjective factors were explored, there was little, if any, theorization of the decision threshold; instead, there was a reliance on regression models. Consequently, contextual and subjective factors may have different levels of influence depending on how the decision threshold is theorized.

Contributions

The 2015 narrative review by Djulbegovic and colleagues [ 14 ] offered a comprehensive description of the research on decision thresholds in medical decision-making at that time. A decade later, we decided to re-examine the full extent of this research using a rigorous and systematic approach. Notably, the literature on decision thresholds in medical decision-making has matured and almost doubled in volume since 2013 – a limiting date used by Djulbegovic and colleagues in their review [ 14 ].

Djulbegovic and colleagues identified regret theory, hybrid theory, and dual processing theory as the theoretical paradigms for physicians’ decision thresholds. While this finding remains true today, our review also revealed that several other theoretical paradigms for decision thresholds were used by physicians and lay individuals. We also identified various contextual and subjective factors that may influence an individual’s decision thresholds.

Limitations

Although our scoping review followed best methodological practices, it still has a few limitations. First, our review was limited to articles in the English language that were indexed on SCOPUS, MEDLINE (Ovid), or ProQuest (Theses and Dissertations). Consequently, it is possible that our review failed to identify relevant articles that were not written in English and/or not indexed on any of the identified databases. Second, while our search strings and selection criteria were intentionally broad and used the most frequently employed terms, decision thresholds can be referred to by a wide variety of names (e.g., test threshold, diagnostic threshold, isolation/quarantine threshold, thresholds for specific diseases, etc.). Thus, our review may have failed to identify relevant articles; however, we believe this is likely a small minority of articles which would not meaningfully change the overall findings of our systematic scoping review. Third, our review excluded decision thresholds that pertained to clinical outcome assessments, clinical guidelines, or prediction models (e.g., studies that sought to quantify the safety and accuracy of a diagnostic clinical aid) as well as policy thresholds (e.g., willingness-to-pay thresholds), as they are not informative of how individuals make decisions. Fourth, within the included articles we note that there is a concentration of authors who are repeatedly included in the review. For example, Dr. Benjamin Djulbegovic is listed as an author on 21 of the 68 articles that discussed various theoretical paradigms for decision thresholds. While we do not perceive this to be a methodological limitation of our scoping review, the concentration of authors is an important consideration relative to the intellectual heterogeneity within the corpus of the literature. Fifth, our review did not attempt to interrogate the intricate differences between the different theoretical paradigms as it was beyond the scope of our review.
It is possible that some of the theoretical paradigms may share similar structures to EUT paradigms (e.g., regression, information theory, etc.). However, to the extent that these theoretical paradigms are uniquely identified within the literature, they warrant identification as non-EUT theoretical paradigms within this review. Sixth, our review included articles that were indexed as of July 2023. Consequently, there may be new articles published since that time that are not captured in our review; ultimately, this is an unavoidable limitation of any systematic scoping review within an active and evolving field. Finally, our review had a relatively narrow focus on medical decision-making. Given that decision thresholds are explored within other contexts (e.g., insurance [ 129 , 130 , 131 ], risk perception [ 132 , 133 , 134 ], law [ 3 , 4 , 5 , 6 , 7 , 135 ], econometrics [ 110 ], etc.), it is possible that there are additional theoretical paradigms and contextual/subjective factors (i.e., identified in other bodies of literature) beyond those discussed within this review.

Implications for practice and future research

From the perspective of clinical practice, decision thresholds have been called the “linchpin” between evidence-based medicine and decision-making [ 2 ]. Yet, despite their significant role in clinical decision-making, there is a lack of consensus on which theoretical paradigm should be used to determine patient or physician decision thresholds [ 136 ]. Consequently, greater consideration should be given to choosing the theoretical paradigm used to inform ex-ante decision thresholds. From a practical perspective, if a physician can better understand a patient’s implicit decision threshold (e.g., perhaps through a regret theory or dual processing theory lens), they can improve the patient’s decision-making by focusing on specific anticipated regrets of false positives or false negatives.

From a future research perspective, we have several recommendations. First, authors should be cognisant of the multiplicity of like-terms that are often used to reference decision thresholds (e.g., diagnostic thresholds, treatment thresholds, testing thresholds, etc.); this multiplicity can only be resolved through a convergence within the academic community to adopt consistent terminology. We propose “decision thresholds” as a term sufficiently flexible to capture the breadth and scope of related terms. Consistency in language allows for easier identification of literature and may help to avoid duplicative or redundant research effort. Second, we encourage authors to consider, and identify, within which decision-theoretical classification their paradigm should be interpreted (i.e., descriptively, normatively, and/or prescriptively). Third, we call for more research on how decision thresholds inform individual medical decisions. Specifically, it would be advantageous for future scholarship to examine which theoretical paradigm for decision thresholds best explains an individual’s decision threshold for medical decisions. This would help to narrow the number of theoretical paradigms that are used to inform decision threshold analyses [ 25 , 71 ]. Finally, we encourage additional methodological research on the contextual and subjective factors that inform an individual’s decision threshold; additional research is required to better understand the impact these factors might have on decision thresholds.

To our knowledge, this is the first review of non-EUT decision thresholds used in medical decision-making that adopts a rigorous systematic search and reporting methodology (i.e., PRISMA-ScR). As the body of literature on the role of decision thresholds in medical decision-making continues to grow, this study offers a critical, systematic characterization of the existing literature. Importantly, this study will help authors of future scholarship to appropriately situate their work within the body of literature and to leverage appropriate and relevant theoretical paradigms to underpin their understanding of decision thresholds.

Regret theory, hybrid theory, and dual processing theory were identified as the most common theoretical paradigms that are used to inform an individual’s ex-ante decision threshold, but other theories have been introduced in recent years. Further, although a substantial set of studies examine contextual and subjective factors that impact decision thresholds, we note considerable heterogeneity in the reported effect of these factors. We also observe a striking infrequency of theoretical grounding in these studies.

Availability of data and materials

The search strategies used to inform this scoping review are available in the supplementary text. Summaries of included articles are also available in the supplementary text.

Brand KP, Finkel AM. A Decision-Analytic Approach to Addressing the Evidence About Football and Chronic Traumatic Encephalopathy. Semin Neurol. 2019;16:s-0039-1688484.

Djulbegović B, Hozo I. Threshold decision-making in clinical medicine: with practical application to hematology and oncology. Cham: Springer International Publishing AG; 2023. (Cancer Treatment and Research Series; v.189).

Dekay ML. The Difference between Blackstone-Like Error Ratios and Probabilistic Standards of Proof. Law Soc Inq. 1996;62:95–132.

Weiss C. Expressing scientific uncertainty. Law Probab Risk. 2003;2:25–46.

Weiss C. Scientific Uncertainty and Science-Based Precaution. Int Environ Agreem Polit Law Econ. 2003;3(2):137–66.

Dekay ML, Patiño-Echeverri D, Fischbeck PS. Better safe than sorry: Precautionary reasoning and implied dominance in risky decisions. J Behav Decis Mak. 2009;22(3):338–61.

Dekay ML, Small MJ, Fischbeck PS, Farrow RS, Cullen A, Kadane JB, et al. Risk-based decision analysis in support of precautionary policies. J Risk Res. 2002;5(4):391–417.

Djulbegovic B, Hozo I, Schwartz A, McMasters KM. Acceptable regret in medical decision making. Med Hypotheses. 1999;53(3):253–9.

Djulbegovic B, Hamm RM, Mayrhofer T, Hozo I, Van den Ende J. Rationality, practice variation and person-centred health policy: a threshold hypothesis. J Eval Clin Pract. 2015;21(6):1121–4.

Pauker S, Kassirer J. The threshold approach to clinical decision making. N Engl J Med. 1980;302(20):1109–17.

Pauker SG, Kassirer JP. Therapeutic Decision Making: A Cost-Benefit Analysis. N Engl J Med. 1975;293(5):229–34.

Djulbegovic B, Hozo I, Beckstead J, Tsalatsanis A, Pauker SG. Dual processing model of medical decision-making. BMC Med Inform Decis Mak. 2012;12(1):94.

Cheyne H, Dalgleish L, Tucker J, Kane F, Shetty A, McLeod S, et al. Risk assessment and decision making about in-labour transfer from rural maternity care: a social judgment and signal detection analysis. BMC Med Inform Decis Mak. 2012;12:122.

Djulbegovic B, van den Ende J, Hamm RM, Mayrhofer T, Hozo I, Pauker SG, et al. When is rational to order a diagnostic test, or prescribe treatment: the threshold model as an explanation of practice variation. Eur J Clin Invest. 2015;45(5):485–93.

Djulbegovic B, Hozo I, Mayrhofer T, van den Ende J, Guyatt G. The threshold model revisited. J Eval Clin Pract. 2019;25(2):186–95.

Felder S, Mayrhofer T. Threshold analysis in the presence of both the diagnostic and the therapeutic risk. Eur J Health Econ. 2018;19(7):1019–26.

Djulbegovic B, Tsalatsanis A, Mhaskar R, Hozo I, Miladinovic B, Tuch H. Eliciting regret improves decision making at the end of life. Eur J Cancer. 2016;68:27–37.

Djulbegovic M, Beckstead J, Elqayam S, Reljic T, Kumar A, Paidas C, et al. Thinking Styles and Regret in Physicians. Antonietti A, editor. PLOS ONE. 2015;10(8):e0134038.

Hozo I, Tsalatsanis A, Djulbegovic B. Expected utility versus expected regret theory versions of decision curve analysis do generate different results when treatment effects are taken into account. J Eval Clin Pract. 2018;24(1):65–71.

Hozo I, Djulbegovic B. When is diagnostic testing inappropriate or irrational? Acceptable regret approach. Med Decis Mak Int J Soc Med Decis Mak. 2008;28(4):540–53.

Tsalatsanis A, Hozo I, Vickers A, Djulbegovic B. A regret theory approach to decision curve analysis: A novel method for eliciting decision makers’ preferences and decision-making. BMC Med Inform Decis Mak. 2010;10(51):1–14.

Tsalatsanis A, Hozo I, Kumar A, Djulbegovic B. Dual Processing Model for Medical Decision-Making: An Extension to Diagnostic Testing. Brock G, editor. PLOS ONE. 2015;10(8):e0134800.

Bell DE. Regret in Decision Making under Uncertainty. Oper Res. 1982;30(5):961–81.

Loomes G, Sugden R. Regret Theory: An Alternative Theory of Rational Choice Under Uncertainty. Econ J. 1982;92(368):805.

Djulbegovic B, Elqayam S, Reljic T, Hozo I, Miladinovic B, Tsalatsanis A, et al. How do physicians decide to treat: an empirical evaluation of the threshold model. BMC Med Inform Decis Mak. 2014;14(1):47.

Kahneman D, Tversky A. The Psychology of Preferences. Sci Am. 1982;246(1):160–73.

Peters MDJ, Godfrey CM, Khalil H, McInerney P, Parker D, Soares CB. Guidance for conducting systematic scoping reviews. Int J Evid Based Healthc. 2015;13(3):141–6.

Scarffe AD, Coates A. Approaches to decision threshold models in medical decision-making: A scoping review protocol. OSF. 2023 Aug 29; Available from: https://osf.io/7dxr9/?view_only= .

Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann Intern Med. 2018 Oct 2 [cited 2023 Jul 29];169(7):467–73. Available from: https://www.acpjournals.org/doi/10.7326/M18-0850 .

Christensen-Szalanski JJ, Diehr PH, Bushyhead JB, Wood RW. Two studies of good clinical judgment. Med Decis Mak Int J Soc Med Decis Mak. 1982;2(3):275–83.

Eisenberg JM, Hershey JC. Derived thresholds. Determining the diagnostic probabilities at which clinicians initiate testing and treatment. Med Decis Mak Int J Soc Med Decis Mak. 1983;3(2):155–68.

Hartz A, McKinney WP, Centor R, Krieg A, Simms G, Henck S. Stochastic thresholds. Med Decis Mak Int J Soc Med Decis Mak. 1986;6(3):145–8.

Young MJ, Eisenberg JM, Williams SV, Hershey JC. Comparing aggregate estimates of derived thresholds for clinical decisions. Health Serv Res. 1986;20(6 Pt 1):763–80.

Young M, Fried LS, Eisenberg J, Hershey J, Williams S. Do cardiologists have higher thresholds for recommending coronary arteriography than family physicians? Health Serv Res. 1987;22(5):623.

Feinstein AR. The “Chagrin Factor” and Qualitative Decision Analysis. Arch Intern Med. 1985;145(7):1257–9.

Asch DA, Patton JP, Hershey JC. Knowing for the sake of knowing: the value of prognostic information. Med Decis Mak Int J Soc Med Decis Mak. 1990;10(1):47–57.

Sainfort F. Evaluation of medical technologies: a generalized ROC analysis. Med Decis Mak Int J Soc Med Decis Mak. 1991;11(3):208–20.

Plasencia CM, Alderman BW, Baron AE, Rolfs RT, Boyko EJ. A method to describe physician decision thresholds and its application in examining the diagnosis of coronary artery disease based on exercise treadmill testing. Med Decis Mak Int J Soc Med Decis Mak. 1992;12(3):204–12.

Djulbegovic B, Hozo I, Abdomerovic I, Hozo S. Diagnostic entropy as a function of therapeutic benefit/risk ratio. Med Hypotheses. 1995;45(5):503–9.

Glasziou PP, Irwig LM. An evidence based approach to individualising treatment. BMJ. 1995;311(7016):1356–9.

Verp MS, Heckerling PS. Use of decision analysis to evaluate patients’ choices of diagnostic prenatal test. Am J Med Genet. 1995;58(4):337–44.

Hozo I, Djulbegovic B. Using the Internet to calculate clinical action thresholds. Comput Biomed Res Int J. 1999;32(2):168–85.

Djulbegovic B, Hozo I, Lyman GH. Linking evidence-based medicine therapeutic summary measures to clinical decision analysis. MedGenMed Medscape Gen Med. 2000;2(1):E6.

Van Hoe L, Miserez M. Effectiveness of imaging studies in acute appendicitis: A simplified decision model. Eur J Emerg Med. 2000;7:25–40.

McAlister FA, O’Connor AM, Wells G, Grover SA, Laupacis A. When should hypertension be treated? The different perspectives of Canadian family physicians and patients. CMAJ Can Med Assoc J J Assoc Medicale Can. 2000;163(4):403–8.

Coenen S, Van Royen P, Vermeire E, Hermann I, Denekens J. Antibiotics for coughing in general practice: a qualitative decision analysis. Fam Pract. 2000;17(5):380–5.

Sinclair JC, Cook RJ, Guyatt GH, Pauker SG, Cook DJ. When should an effective treatment be used? Derivation of the threshold number needed to treat and the minimum event rate for treatment. J Clin Epidemiol. 2001;54(3):253–62.

Cotler SJ, Patil R, McNutt RA, Speroff T, Banaad-Omiotek G, Ganger DR, et al. Patients’ values for health states associated with hepatitis C and physicians’ estimates of those values. Am J Gastroenterol. 2001;96(9):2730–6.

Sonnenberg A. Personal view: cost and benefit of medical rituals in gastroenterology. Aliment Pharmacol Ther. 2004;20(9):939–42.

Ng AK, Li S, Neuberg D, Silver B, Weeks J, Mauch P. Factors influencing treatment recommendations in early-stage Hodgkin’s disease: a survey of physicians. Ann Oncol Off J Eur Soc Med Oncol. 2004;15(2):261–9.

Basinga P, Moreira J, Bisoffi Z, Bisig B, Van den Ende J. Why Are Clinicians Reluctant to Treat Smear-Negative Tuberculosis? An Inquiry about Treatment Thresholds in Rwanda. Med Decis Making. 2007;27(1):53–60.

Hozo I, Schell MJ, Djulbegovic B. Decision-Making When Data and Inferences Are Not Conclusive: Risk-Benefit and Acceptable Regret Approach. Semin Hematol. 2008;45(3):150–9.

Moreira J, Bisig B, Muwawenimana P, Basinga P, Bisoffi Z, Haegeman F, et al. Weighing Harm in Therapeutic Decisions of Smear-Negative Pulmonary Tuberculosis. Med Decis Making. 2009;29(3):380–90.

Thompson C, Dalgleish L, Bucknall T, Estabrooks C, Hutchinson AM, Fraser K, et al. The effects of time pressure and experience on nurses’ risk assessment decisions: a signal detection analysis. Nurs Res. 2008;57(5):302–11.

Ben-Haim Y, Zacksenhouse M, Keren C, Dacso CC. Do we know how to set decision thresholds for diabetes? Med Hypotheses. 2009;73(2):189–93.

Boland MV, Lehmann HP. A new method for determining physician decision thresholds using empiric, uncertain recommendations. BMC Med Inform Decis Mak. 2010;10:20.

Tsalatsanis A, Barnes LE, Hozo I, Djulbegovic B. Extensions to regret-based decision curve analysis: an application to hospice referral for terminal patients. BMC Med Inform Decis Mak. 2011;11:77.

Mohan D, Rosengart MR, Farris C, Fischhoff B, Angus DC, Barnato AE. Sources of non-compliance with clinical practice guidelines in trauma triage: a decision science study. Implement Sci IS. 2012;7:103.

Pines JM, Lessler AL, Ward MJ, Mark CD. The mortality benefit threshold for patients with suspected pulmonary embolism. Acad Emerg Med. 2012;19(9):E1109–13.

Vickers AJ, Cronin AM, Gonen M. A simple decision analytic solution to the comparison of two binary diagnostic tests. Stat Med. 2013;32(11):1865–76.

Djulbegovic B, Elqayam S, Reljic T, Hozo I, Miladinovic B, Tsalatsanis A, et al. How do physicians decide to treat: an empirical evaluation of the threshold model. BMC Med Inform Decis Mak. 2014;14(1):47.

Felder S, Mayrhofer T. Risk preferences: consequences for test and treatment thresholds and optimal cutoffs. Med Decis Mak Int J Soc Med Decis Mak. 2014;34(1):33–41.

Hernandez JM, Tsalatsanis A, Humphries LA, Miladinovic B, Djulbegovic B, Velanovich V. Defining optimum treatment of patients with pancreatic adenocarcinoma using regret-based decision curve analysis. Ann Surg. 2014;259(6):1208–14.

Sreeramareddy CT, Rahman M, Harsha Kumar HN, Shah M, Hossain AM, Sayem MA, et al. Intuitive weights of harm for therapeutic decision making in smear-negative pulmonary Tuberculosis: an interview study of physicians in India, Pakistan and Bangladesh. BMC Med Inform Decis Mak. 2014;14:67.

Cucchetti A, Djulbegovic B, Tsalatsanis A, Vitale A, Hozo I, Piscaglia F, et al. When to perform hepatic resection for intermediate-stage hepatocellular carcinoma. Hepatol Baltim Md. 2015;61(3):905–14.

Ebell MH, Locatelli I, Senn N. A novel approach to the determination of clinical decision thresholds. Evid Based Med. 2015;20(2):41–7.

Sonnenberg A. Ignorance isn’t bliss: why patients become angry. Eur J Gastroenterol Hepatol. 2015;27(6):619–22.

Courbage C, Rey B. Decision Thresholds and Changes in Risk for Preventive Treatment. Health Econ. 2016;25(1):111–24.

Sheldrick RC, Breuer DJ, Hassan R, Chan K, Polk DE, Benneyan J. A system dynamics model of clinical decision thresholds for the detection of developmental-behavioral disorders. Implement Sci IS. 2016;11(1):156.

Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ Online. 2016;352. Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84959328652&doi=10.1136%2fbmj.i6&partnerID=40&md5=fe1ef082c28f9376c332111191d28ead .

Tsalatsanis A, Hozo I, Djulbegovic B. Acceptable regret model in the end-of-life setting: Patients require high level of certainty before forgoing management recommendations. Eur J Cancer. 2017;75:159–66.

Hozo I, Djulbegovic B, Luan S, Tsalatsanis A, Gigerenzer G. Towards theory integration: Threshold model as a link between signal detection theory, fast-and-frugal trees and evidence accumulation theory. J Eval Clin Pract. 2017;23(1):49–65.

Ebell MH, Locatelli I, Mueller Y, Senn N, Morgan K. Diagnosis and treatment of community-acquired pneumonia in patients with acute cough: a quantitative study of decision thresholds in primary care. Br J Gen Pract. 2018;68(676):e765–74.

Fujii Y, Osaki Y. Regret-sensitive treatment decisions. Health Econ Rev. 2018;8(1):14.

Djulbegovic B, Hozo I, Mayrhofer T, van den Ende J, Guyatt G. The threshold model revisited. J Eval Clin Pract. 2019;25(2):186–95.

Boyles TH, Lynen L, Seddon JA. Decision-making in the diagnosis of tuberculous meningitis. Tuberculous Meningitis International Research Consortium, editor. Wellcome Open Res. 2020;5:11.

De Alencastro L, Locatelli I, Clair C, Ebell MH, Senn N. Correlation of clinical decision-making with probability of disease: A web-based study among general practitioners. PloS One. 2020;15(10):e0241210.

Djulbegovic M, Lee AI, Chen K. Which patients with unprovoked venous thromboembolism should receive extended anticoagulation with direct oral anticoagulants? A systematic review, network meta-analysis, and decision analysis. J Eval Clin Pract. 2020;26(1):7–17.

Patel BS, Steinberg E, Pfohl SR, Shah NH. Learning decision thresholds for risk stratification models from aggregate clinician behavior. J Am Med Inform Assoc JAMIA. 2021;28(10):2258–64.

Courbage C, Peter R. On the effect of uncertainty on personal vaccination decisions. Health Econ. 2021;30(11):2937–42.

van Overbeeke E, Hauber B, Michelsen S, Peerlinck K, Lambert C, Hermans C, et al. Patient preferences for gene therapy in haemophilia: Results from the PAVING threshold technique survey. Haemophilia. 2021;27(6):957–66.

Cai X, Ebell MH, Geyer RE, Thompson M, Gentile NL, Lutz B. The impact of a rapid home test on telehealth decision-making for influenza: a clinical vignette study. BMC Prim Care. 2022;23(1):75.

Wen FH, Chou WC, Hou MM, Su PJ, Shen WC, Chen JS, et al. Associations of surrogates’ death-preparedness states with decisional conflict and heightened decisional regret over cancer patients’ last 6 months of life. Psychooncology. 2022;31(9):1502–9.

Sevim D, Felder S. Decision Thresholds for Medical Tests Under Ambiguity Aversion. Front Health Serv. 2022;2:825315.

Cucchetti A, Djulbegovic B, Crippa S, Hozo I, Sbrancia M, Tsalatsanis A, et al. Regret affects the choice between neoadjuvant therapy and upfront surgery for potentially resectable pancreatic cancer. Reg-PanC study group, editor. Surg U S. 2023;173(6):1421–7.

Djulbegovic B, Hozo I, Lizarraga D, Guyatt G. Decomposing clinical practice guidelines panels’ deliberation into decision theoretical constructs. J Eval Clin Pract. 2023;29(3):459–71.

Taylor SP, Weissman GE, Kowalkowski M, Admon AJ, Skewes S, Xia Y, et al. A Quantitative Study of Decision Thresholds for Initiation of Antibiotics in Suspected Sepsis. Med Decis Mak. 2023;43(2):175–82.

Bell DE. Risk Premiums for Decision Regret. Manag Sci. 1983;29(10):1156–66.

Diamond GA, Hirsch M, Forrester JS, Staniloff HM, Vas R, Halpern SW, et al. Application of information theory to clinical diagnostic testing. The electrocardiographic stress test. Circulation. 1981;63(4):915–21.

Steel N. Thresholds for taking antihypertensive drugs in different professional and lay groups: questionnaire survey. BMJ. 2000;320(7247):1446–7.

Brundage MD, Feldman-Stewart D, Cosby R, Gregg R, Dixon P, Youssef Y, et al. Cancer patients’ attitudes toward treatment options for advanced non-small cell lung cancer: implications for patient education and decision support. Patient Educ Couns. 2001;45(2):149–57.

Cahan A, Gilon D, Manor O, Paltiel O. Probabilistic reasoning and clinical decision-making: do doctors overestimate diagnostic probabilities? QJM Mon J Assoc Physicians. 2003;96(10):763–9.

Douglas H. Weighing Complex Evidence in a Democratic Society. Kennedy Inst Ethics J. 2012;22(2):139–62.

Nair T, Savulescu J, Everett J, Tonkens R, Wilkinson D. Settling for second best: when should doctors agree to parental demands for suboptimal medical treatment? J Med Ethics. 2017;43(12):831–40.

Lahaye S, Regpala S, Lacombe S, Sharma M, Gibbens S, Ball D, et al. Evaluation of patients’ attitudes towards stroke prevention and bleeding risk in atrial fibrillation. Thromb Haemost. 2014;111(3):465–73.

van der Keylen P, Zeschick N, Schlenz AR, Kuhlein T. Treatment thresholds and minimal clinically important effect sizes of antiosteoporotic medication-Survey among physicians and lay persons in Germany. PloS One. 2022;17(8):e0272985.

Winkenwerder W, Levy BD, Eisenberg JM, Williams SV, Young MJ, Hershey JC. Variation in physicians’ decision-making thresholds in management of a sexually transmitted disease. J Gen Intern Med. 1993;8(7):369–73.

Hanson LC, Danis M, Garrett JM, Mutran E. Who decides? Physicians’ willingness to use life-sustaining treatment. Arch Intern Med. 1996;156(7):785–9.

Connors GR, Siner JM. Clinical Reasoning and Risk in the Intensive Care Unit. Clin Chest Med. 2015;36(3):449–59.

Di Stefano LM, Wood K, Mactier H, Bates SE, Wilkinson D. Viability and thresholds for treatment of extremely preterm infants: survey of UK neonatal professionals. Arch Dis Child Fetal Neonatal Ed. 2021;106(6):F596–602.

Stojan JN, Daniel M, Hartley S, Gruppen L. Dealing with uncertainty in clinical reasoning: A threshold model and the roles of experience and task framing. Med Educ. 2022;56(2):195–201.



Acknowledgements

This manuscript is based on research presented in a chapter of AS’s doctoral dissertation at the University of Ottawa. The authors would also like to thank the anonymous peer-reviewers of this manuscript whose intellectual reflections contributed positively to this paper.

AS would like to acknowledge the financial support of the following scholarships: Mitacs Accelerate Entrepreneur Graduate Scholarship, the Ontario Graduate Scholarship, the Queen Elizabeth II Graduate Scholarship for Science and Technology, and the University of Ottawa. AC would like to acknowledge the financial support from the following scholarships: Canada Graduate Scholarship -Doctoral (CGS D) from the Social Sciences and Humanities Research Council (SSHRC), and the University of Ottawa. KB was the primary investigator for the Mitacs Accelerate Entrepreneur Graduate Scholarship. All funding agreements ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing this manuscript.

Author information

Authors and affiliations.

Telfer School of Management, University of Ottawa, Ottawa, ON, Canada

Andrew Scarffe, Alison Coates, Kevin Brand & Wojtek Michalowski

Ottawa Hospital Research Institute, The Ottawa Hospital, Ottawa, ON, Canada

Andrew Scarffe

Bob Gaglardi School of Business and Economics, Thompson Rivers University, Kamloops, BC, Canada


Study concept and design: AS, AC, KB, WM; acquisition of data: AS, AC; critical revision of manuscript: AS, AC, KB, WM; final approval of the version to be published: AS, AC, KB, WM; agreement to be accountable for all aspects of the work: AS, AC, KB, WM.

Corresponding author

Correspondence to Andrew Scarffe.

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article.

Scarffe, A., Coates, A., Brand, K. et al. Decision threshold models in medical decision making: a scoping literature review. BMC Med Inform Decis Mak 24, 273 (2024). https://doi.org/10.1186/s12911-024-02681-2


Received: 19 June 2024

Accepted: 12 September 2024

Published: 27 September 2024

DOI: https://doi.org/10.1186/s12911-024-02681-2


  • Decision-making
  • Decision thresholds
  • Medical decision-making
  • Scoping review

BMC Medical Informatics and Decision Making

ISSN: 1472-6947

  • Open access
  • Published: 28 September 2024

A framework for human evaluation of large language models in healthcare derived from literature review

  • Thomas Yu Chow Tam 1   na1 ,
  • Sonish Sivarajkumar   ORCID: orcid.org/0000-0003-4173-1988 2   na1 ,
  • Sumit Kapoor 3 ,
  • Alisa V. Stolyar 1 ,
  • Katelyn Polanska 1 ,
  • Karleigh R. McCarthy 1 ,
  • Hunter Osterhoudt 1 ,
  • Xizhi Wu 1 ,
  • Shyam Visweswaran   ORCID: orcid.org/0000-0002-2079-8684 4 , 5 ,
  • Sunyang Fu   ORCID: orcid.org/0000-0003-1691-5179 6 ,
  • Piyush Mathur 7 , 8 ,
  • Giovanni E. Cacciamani 9 ,
  • Cong Sun 10 ,
  • Yifan Peng   ORCID: orcid.org/0000-0001-9309-8331 10 &
  • Yanshan Wang   ORCID: orcid.org/0000-0003-4433-7839 1 , 2 , 4 , 5 , 11  

npj Digital Medicine volume 7, Article number: 258 (2024)


  • Health care
  • Medical research

With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection, and recruitment of evaluators, frameworks and metrics, evaluation process, and statistical analysis type. Our literature review of 142 studies shows gaps in reliability, generalizability, and applicability of current human evaluation practices. To overcome such significant obstacles to healthcare LLM developments and deployments, we propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.


Introduction

Due to the remarkable ability to generate coherent responses to questions and requests, generative artificial intelligence (GenAI), specifically large language models (LLMs) such as proprietary LLMs (e.g., GPT-4 1 ) and open-source LLMs (e.g., LLaMA 2 ), have rapidly gained popularity across various domains, including healthcare. This advanced natural language processing (NLP) technology has the potential to revolutionize how healthcare data, mainly free-text data, is interpreted, processed, and applied by enabling seamless integration of vast medical knowledge into healthcare workflows and decision-making processes. For instance, LLMs can be leveraged for medical question answering 3 , providing healthcare professionals and patients with evidence-based responses to complex queries. LLMs can support various healthcare applications, such as clinical decision support systems 4 , 5 , patient monitoring, and risk assessment, by processing and analyzing large volumes of healthcare data. Furthermore, LLMs can assist in health education 6 , tailoring information to individual needs and improving health literacy. As GenAI capabilities advance, LLMs are poised to play an increasingly pivotal role in improving patient care through personalized medicine and enhancing healthcare processes. Therefore, their effective evaluation is critical.

For NLP technologies, quantitative evaluation metrics such as Bilingual Evaluation Understudy (BLEU) for machine translation 7 and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) for summarization 8 have been employed, along with benchmarks like Holistic Evaluation of Language Models (HELM) 9 , for comprehensive automatic evaluation. While quantitative evaluation metrics such as accuracy, the F measure, and the area under the receiver operating characteristic (AUROC) curve offer statistical measures of accuracy, they cannot fully evaluate the generative nature of LLMs and do not assess the clinical utility and accuracy needed for deployment in healthcare 10 , 11 . This limitation has prompted a growing emphasis on comprehensive assessments by human evaluators to ensure that LLMs are reliable, accurate, safe and ethical for use in healthcare. Recent suggestions of using LLMs to evaluate LLM outputs are problematic 12 , particularly considering the questionable quality of summarization and the presence of misinformation in LLMs 9 . Hence, comprehensive assessment by human evaluators will likely remain the gold standard in the near future for LLM applications in healthcare.
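To make concrete what an overlap-based automatic metric measures, the following is a minimal sketch of a unigram-overlap F-measure in the spirit of ROUGE-1. It is a simplified illustration and not the official ROUGE implementation; the function name `unigram_f1`, the whitespace tokenization, and the example sentences are assumptions made here.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F-measure between a reference and a candidate text.

    Simplified, ROUGE-1-style illustration: lowercased whitespace tokens,
    no stemming and no stopword handling.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, invented for illustration.
reference = "take one tablet twice daily with food"
candidate = "take one tablet daily"
print(round(unigram_f1(reference, candidate), 3))  # 8/11, i.e. about 0.727
```

In this invented example the candidate still scores about 0.73 although it drops the word "twice", which changes the dose. That is the kind of clinically relevant error that a surface-overlap score alone does not flag.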

Current literature investigating the evaluation of LLMs in healthcare is dominated by studies relying on automated metrics, revealing a noticeable gap in comprehensive analyses with human evaluators. Wei et al. 13 reviewed 60 articles that used ChatGPT’s responses to medical questions to assess the performance of ChatGPT in medical Question-Answering (QA). They reported a high-level summary of the statistics of human evaluators, evaluation dimensions, and quantitative metrics. Park et al. 14 examined 55 articles on the use of LLMs in medical applications, of which 36 incorporated human evaluation. While they gave representative examples, they did not provide a systematic summary of the evaluation dimensions and metrics used in those studies. They also acknowledged the lack of a standardized evaluation framework and proposed improvements in study methods and reporting. Yuan et al. 15 reviewed the use of LLMs as healthcare assistants and introduced various models and evaluation methods, with a short subsection on expert evaluation; however, their review does not systematically survey human evaluation. Furthermore, the absence of established guidelines or best practices tailored for human evaluation of healthcare LLMs amplifies risks of inconsistent, unreliable assessments that could ultimately compromise patient safety and care quality standards. Awasthi et al. 16 reviewed key LLMs and key evaluation metrics and proposed a human evaluation method with five factors; however, the method is not specifically designed for healthcare. Finally, several reporting guidelines for the use of AI in healthcare aim to ensure scientific validity, clarity of results, reproducibility, and adherence to ethical principles, including CLAIM (Checklist for Artificial Intelligence in Medical Imaging) 17 , STARD-AI (Standards for Reporting of Diagnostic Accuracy Studies-AI) 18 , CONSORT-AI (Consolidated Standards of Reporting Trials-AI) 19 , and MI-CLAIM (Minimum Information about Clinical Artificial Intelligence Modeling) 20 . However, none of these guidelines specifically address the standards and reporting of human evaluation of LLMs in healthcare.

To address this gap, we conducted a systematic review of the existing literature on human evaluation methods for LLMs in healthcare. Our primary objectives are:

To identify and analyze studies reporting human evaluations of LLMs across a diverse range of medical domains, tasks, and specialties.

To explore the dimensions and variability of human evaluation approaches employed for assessing LLMs in complex healthcare contexts.

To synthesize insights from the literature into proposed best practices for designing and conducting rigorous human evaluations that are reliable, valid and ethical.

To provide actionable guidelines for developing standardized human evaluation frameworks for healthcare uses of LLMs.

Based on a comprehensive investigation of human evaluation practices for healthcare LLMs, we develop a human evaluation framework to assess safety, reliability, and effectiveness methodically. Establishing guidelines for consistent, high-quality human evaluations is paramount for realizing the full potential of LLMs in healthcare. Our findings are intended to serve as a foundation for catalyzing further research into this underexplored yet critical area at the intersection of GenAI and medicine.

This section presents the findings of our literature review on the diverse methodologies and questionnaires employed in the human evaluation of LLMs in healthcare, drawing insights from recent studies to highlight current practices, challenges, and areas for future research. Supplementary Fig. 1 shows the distribution of the LLMs evaluated in the reviewed studies.

Healthcare applications of LLMs

Figure 1 illustrates the distribution of healthcare applications for LLMs that underwent human evaluation, providing insights into the diverse range of healthcare domains where these models are being utilized. Clinical Decision Support (CDS) emerges as the most prevalent application, accounting for 28.1% of the categorized tasks. This is followed by medical education and examination at 24.8%, patient education at 19.6%, and medical question answering at 15.0%. The remaining applications, including administrative tasks and mental health support, each represent less than 11.8% of the total. This distribution highlights the focus of researchers and healthcare professionals on leveraging LLMs to enhance decision-making, improve patient care, and facilitate education and communication in various medical specialties.

figure 1

The reviewed studies showcased a diverse range of healthcare applications for LLMs from bench to bedside and beyond, each aiming to enhance different aspects of patient care and clinical practice, biomedical and health sciences research, and education.

As illustrated in Fig. 1, CDS was the most prevalent application, accounting for 28.1% of the categorized tasks. Studies such as Lechien et al. 21 and Seth et al. 22 provide illustrative examples of how LLMs can improve accuracy and reliability in real-time patient monitoring and diagnosis, respectively. In Lechien et al., forty clinical cases, i.e., the medical history and clinical examination of patients consulting at the Otolaryngology-Head and Neck Surgery department, were submitted to ChatGPT for differential diagnosis, management, and treatment suggestions. Expert evaluators rated ChatGPT’s performance with the Ottawa Clinic Assessment Tool, judged its primary and other differential diagnoses to be plausible in 90% of the cases, and argued that it can be “a promising adjunctive tool in laryngology and head and neck surgery practice”. In Seth et al., six questions regarding the diagnosis and management of Carpal Tunnel Syndrome (CTS) were posed to ChatGPT to simulate a patient-physician consultation, and the authors suggested that the responses were capable of providing “validated medical information on CTS to non-medical individuals” 22 . Milliard et al. scrutinized the efficacy of ChatGPT-generated answers for the management of bloodstream infection, setting up comparisons against the plan suggested by infectious disease consultants based on literature and guidelines 23 . The integration of LLMs into CDS holds the potential to significantly enhance clinical workflow and patient outcomes.

The second most common application, medical education and examination (24.8%), was explored by researchers like Yaneva et al. 24 , Wu et al. 25 , and Ghosh et al. 26 . Yaneva et al. 24 evaluated the performance of LLMs on medical licensing examinations, such as the United States Medical Licensing Examination (USMLE), suggesting their potential in medical education. Ghosh et al. 26 took this a step further, demonstrating through statistical analysis that LLMs can address higher-order problems related to medical biochemistry.

Patient education, the third most prevalent application (19.6%), was investigated by studies such as Choi et al. 27 , in which they assessed ChatGPT as a self-learning tool in pharmacology, and Kavadella et al. 28 , in which they assessed ChatGPT for undergraduate dental education. In addition, Baglivo et al. conducted a feasibility study and evaluated the use of AI Chatbots in providing complex medical answers related to vaccinations and offering valuable educational support, even outperforming medical students in both direct and scenario-based question-answering tasks 29 . Alapati et al. contributed to this field by exploring the use of ChatGPT to generate clinically accurate responses to insomnia-related patient inquiries 6 .

Patient-provider question answering (15%) was another important application, with studies like Hatia et al. 30 and Ayers et al. 3 taking the lead. Hatia et al. analyzed the performance of ChatGPT in delivering accurate orthopedic information for patients, thus proposing it as a replacement for informed consent 31 . Ayers et al. conducted a comparative study, employing qualitative and quantitative methods to enhance our understanding of LLM effectiveness in generating accurate and empathetic responses to patient questions posed in an online forum 3 .

In the field of translational research, Peng et al. 32 and Xie et al. 33 provide insightful contributions. Peng et al. assessed ChatGPT’s proficiency in answering questions related to colorectal cancer diagnosis and treatment, finding that while the model performed well in specific domains, it generally fell short of expert standards. Xie et al. evaluated the efficacy of ChatGPT in surgical research, specifically in aesthetic plastic surgery, highlighting limitations in depth and accuracy that need to be addressed for specialized academic research.

The studies by Tang et al. 34 , Moramarco et al. 35 , Bernstein et al. 36 , and Hirosawa et al. 37 underscore the expanding role of LLMs in medical evidence compilation, diagnostic proposals, and clinical determinations. Tang et al. used a t-test to compare the correctness of medical evidence compiled by ChatGPT against that compiled by healthcare practitioners. Moramarco et al. used chi-square tests to detect differences in the ease and clarity of patient-oriented clinical records crafted by LLMs. Bernstein et al. used the McNemar test to compare the accuracy and reliability of diagnostic suggestions from LLMs and ophthalmologists. Hirosawa et al. compared LLM diagnoses with gold-standard physician diagnoses, focusing on differential diagnosis accuracy.
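As an illustration of one of the paired tests mentioned above, the following is a minimal sketch of the exact (binomial) form of the McNemar test, which compares two raters or systems judged on the same cases. It is not the analysis code of any reviewed study; the function name `mcnemar_exact` and the counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: cases where only system A (e.g., the LLM) was correct
    c: cases where only system B (e.g., the clinician) was correct
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Probability of a split at least as lopsided as the observed one, one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, not taken from any reviewed study.
p_value = mcnemar_exact(b=8, c=2)
print(p_value)
```

Only the discordant pairs enter the test; cases where both systems are right or both are wrong carry no information about which one is more accurate.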

Medical specialties

To ensure a thorough analysis of medical specialties, we have adopted the classifications defined by the 24 certifying boards of the American Board of Medical Specialties 38 . Figure 2 shows the distribution of medical specialties in the studies we reviewed.

figure 2

The literature review revealed a diverse range of medical specialties leveraging LLMs, with Radiology the leading specialty. Urology and General Surgery also emerged as prominent specialties, along with Plastic Surgery, Otolaryngology, Ophthalmology, Orthopedic Surgery and Psychiatry, while other specialties had fewer than 5 articles each. This distribution highlights the broad interest and exploration of LLMs across various medical domains, indicating the potential for transformative impacts in multiple areas of healthcare, and the need for comprehensive human evaluation in these areas.

As illustrated in Fig. 2 , the literature review revealed a diverse range of medical specialties leveraging LLMs, with Radiology leading ( n  = 12). Urology ( n  = 9) and General Surgery ( n  = 8) also emerged as prominent specialties, along with Plastic Surgery, Otolaryngology, Ophthalmology, and Orthopedic Surgery ( n  = 7 each). Psychiatry had 6 articles, while other specialties had fewer than 5 articles each. This distribution highlights the broad interest and exploration of LLMs across various medical domains, indicating the potential for transformative impacts in multiple areas of healthcare, and the need for comprehensive human evaluation in these areas.

In Radiology, human evaluation plays a crucial role in assessing the quality and accuracy of generated reports. Human evaluation in Radiology also extends to assessing the clinical utility and interpretability of LLM outputs, ensuring they align with radiology practices 39 . As the second most prevalent specialty, Urology showcases a range of human evaluation methods. Most evaluation methods are applied in patient-centric applications, such as patient education and disease management, which often utilize user satisfaction surveys, feedback forms, and usability assessments to gauge the effectiveness of LLM-based interventions. In the General Surgery specialty, human evaluation focuses on the practical application of LLMs in pre-operative planning, surgical simulations, and post-operative care. Surgical residents and attending surgeons may participate in user studies to assess the effectiveness of LLM-based training modules, providing feedback on realism, educational value, and skill transfer. Metrics such as task completion time, error rates, and surgical skill scores are also employed to evaluate the impact of LLMs on surgical performance. In Plastic Surgery and Otolaryngology specialties, human evaluation often revolves around patient satisfaction and aesthetic outcome, for instance, to gather feedback on LLM-assisted cosmetic and reconstructive surgery planning.

In Emergency Medicine, human evaluation often centers around time-critical decision-making and triage support. Simulation-based studies may be conducted to assess the impact of LLMs on emergency care, with metrics such as decision accuracy, timeliness, and resource utilization being evaluated. Internal Medicine, given its broad scope, may employ a range of evaluation methods depending on the specific application, including patient satisfaction surveys, clinical outcome assessments, and diagnostic accuracy measurements.

Evaluation design

The evaluation of LLMs in healthcare demands a comprehensive and multifaceted approach that reflects the complexities of medical specialties and clinical tasks. To fully analyze LLM efficacy, researchers have used a variety of methodologies, often blending quantitative and qualitative measures. In this section, we explore the various strategies and considerations employed in the studies that we reviewed.

Evaluation principles and dimensions: QUEST - five principles of evaluation dimensions

We categorized the evaluation methods in the reviewed articles into 17 dimensions that are further grouped into five principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence, which we name using the acronym QUEST. Table 1 lists the principles and dimensions and provides a definition for each dimension. Most of the definitions were adapted from the meanings provided by the Merriam-Webster English Dictionary. Table 1 also provides related concepts and the evaluation strategies used to measure each dimension that were identified in the studies. Table 2 outlines examples of questions used to evaluate each dimension of LLM responses, aligned with QUEST principles.

Quality of Information examines the multi-dimensional quality of the information provided by the LLM response, including its accuracy, relevance, currency, comprehensiveness, consistency, agreement, and usefulness; Understanding and Reasoning explores the ability of the LLM to understand the prompt and to reason logically in its response; Expression Style and Persona measures the response style of the LLM in terms of clarity and empathy; Safety and Harm concerns the safety dimensions of the LLM response: bias, harm, self-awareness, and fabrication, falsification, or plagiarism; and Trust and Confidence considers the trust and satisfaction the user ascribes to the LLM response.
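The grouping described above can be written down as a simple mapping, which also makes it easy to check that the five principles cover 17 dimensions. This is only a sketch: the dimension labels are taken from the paragraph above and may differ slightly from the wording in Table 1, and the variable name `QUEST` is chosen here for illustration.

```python
# Principles and dimensions as listed in the text above; the exact labels
# used in Table 1 of the paper may differ slightly.
QUEST = {
    "Quality of Information": [
        "accuracy", "relevance", "currency", "comprehensiveness",
        "consistency", "agreement", "usefulness",
    ],
    "Understanding and Reasoning": ["understanding", "reasoning"],
    "Expression Style and Persona": ["clarity", "empathy"],
    "Safety and Harm": [
        "bias", "harm", "self-awareness",
        "fabrication, falsification, or plagiarism",
    ],
    "Trust and Confidence": ["trust", "satisfaction"],
}

# The acronym is formed from the first letter of each principle, in order.
acronym = "".join(name[0] for name in QUEST)
n_dimensions = sum(len(dims) for dims in QUEST.values())
print(acronym, n_dimensions)  # QUEST 17
```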

Evaluation checklist

Among the reviewed studies, only a limited number specified and reported the checklists they created for the human evaluators. When performing the evaluation task, it is imperative that the human evaluators are aligned with the study design regarding how the evaluation is to be performed, and that they check against this checklist while evaluating LLM responses. For example, for the dimension Accuracy, an evaluation checklist should clearly explain (1) the options available to evaluators (such as a Likert scale of 1–5, with 1 being inaccurate and 5 being accurate); and (2) the definition of each option. However, due to the lack of reporting, it is unclear whether the reviewed studies provided adequate training and evaluation examples to align the understanding of the recruited human evaluators.
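One way to make such a checklist explicit is to store, for each dimension, the allowed options together with the definition shown to evaluators. The sketch below assumes a 1–5 Likert scale for Accuracy as in the example above; the class name `ChecklistItem` and the wording of the definitions are hypothetical and are not taken from any reviewed study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ChecklistItem:
    dimension: str
    options: Dict[int, str]  # Likert score -> definition shown to evaluators

    def validate(self, score: int) -> int:
        """Return the score if it is a defined option, otherwise raise."""
        if score not in self.options:
            raise ValueError(f"{score} is not a valid option for {self.dimension}")
        return score


# Hypothetical definitions; only the 1 = inaccurate, 5 = accurate anchors
# come from the text above.
accuracy = ChecklistItem(
    dimension="Accuracy",
    options={
        1: "Inaccurate: the response is factually wrong",
        2: "Mostly inaccurate: major factual errors",
        3: "Mixed: some correct and some incorrect statements",
        4: "Mostly accurate: minor errors only",
        5: "Accurate: no factual errors identified",
    },
)

print(accuracy.validate(4))
```

Keeping the definitions next to the scores means every evaluator sees the same anchor text, and out-of-range ratings are rejected at entry rather than discovered during analysis.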

Evaluation samples

While the above dimensions and checklists provide human evaluators the concrete qualities to evaluate, another key consideration is evaluation samples, i.e., the text responses output by the LLMs. In particular, we examined the number and the variability of samples evaluated by human evaluators in the reviewed articles.

Sample size is critical to ensuring the comprehensiveness of the evaluation, and having more samples is naturally considered better. However, sample size is limited by a combination of constraints such as the number of evaluation dimensions, the complexity of the evaluation process, the number of evaluators, and the funding. Figure 3 shows the distribution of the aggregate sample sizes in the reviewed studies. The majority of studies have 100 or fewer LLM outputs for human evaluation, but we do observe one outlier study with 2995 samples by Moramarco et al. 35 . They designed the evaluation to be completed using Amazon Mechanical Turk (MTurk), an online crowdsourcing service. As the authors noted, MTurk has limitations in controlling the reading age and language capabilities of the annotators, necessitating a larger sample size to account for variability in the annotations. A total of 2995 sentences were evaluated in this study, with each sentence being evaluated seven times by different evaluators.

figure 3

The left panel shows the distribution of sample size for all studies while the right panel depicts the distribution for studies with 1–100 sample(s).

Sample variability is important to ensure diversity and generalizability in the human evaluation of LLMs. Depending on the availability of data and/or applications, while most questions/prompts in the reviewed studies are patient-agnostic, such as “why am I experiencing sudden blurring of vision?”, a subset of the reviewed studies incorporated patient population variability into their experiments and evaluated the quality of LLM samples in different subgroups. Specifically, using prompt templates, these studies prompted the LLMs with different patient-specific information from sources such as patients’ clinical notes from electronic health records 21 , 40 or clinician-prepared clinical vignettes 41 , 42 . This variability allows researchers to evaluate and compare the LLMs' performance across different patient subpopulations, characterized by their symptoms, diagnoses, demographic information, and other factors.
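The idea of a prompt template filled with patient-specific information can be sketched as follows. The template text, the field names, and the vignettes are hypothetical and invented for illustration; they are not taken from any of the reviewed studies.

```python
from string import Template

# Hypothetical template; placeholders are filled from one vignette at a time.
PROMPT = Template(
    "Patient: $age-year-old $sex presenting with $symptom.\n"
    "Question: What is the most likely diagnosis and the next step?"
)

# Hypothetical vignettes that vary demographics while holding the symptom fixed.
vignettes = [
    {"age": 34, "sex": "woman", "symptom": "sudden blurring of vision"},
    {"age": 71, "sex": "man", "symptom": "sudden blurring of vision"},
]

prompts = [PROMPT.substitute(v) for v in vignettes]
for prompt in prompts:
    print(prompt, end="\n\n")
```

Holding the question constant and varying only the patient fields lets evaluators compare responses across subgroups rather than across differently worded prompts.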

Selection and recruitment of human evaluators

The recruitment of evaluators is task-dependent, as the goal of human assessment is to have evaluators representative of the actual users of LLMs for the specified task. Based on the reviewed articles, there are mainly two types of evaluators, expert and non-expert. Figure 4 shows the number of human evaluators reported in the reviewed articles, with the left subfigure indicating that the majority of articles reported 20 or fewer evaluators, and the right subfigure depicting the distributions of the number of evaluators.

figure 4

The left panel shows the distribution of the number of human evaluators for all studies while the right panel depicts the distribution for studies with 1–20 human evaluator(s).

In clinical or clinician-facing tasks, the majority of the evaluators are recruited within the same institution, and their qualifications for the task, such as education level, medical specialty, years of clinical experience, and position, are reported in detail. Some studies also describe the demographics of the evaluators, such as the country (e.g., Singhal et al. 10 ). As shown in Fig. 4, a majority of studies have fewer than 20 evaluators, with only 3 studies recruiting more than 50 evaluators 43 , 44 , 45 .

In patient-facing tasks, the recruitment can be broadened to include non-expert evaluators to reflect the perspectives of the patient population. Non-expert evaluators are in general less costly and more readily available, which makes large-scale evaluation more feasible. For example, Moramarco et al. 35 evaluated the user-friendliness of LLM-generated responses by recruiting a variety of evaluators online via the crowdsourcing platform MTurk. However, the authors did not report how many evaluators were recruited. In Singhal et al. 10 , in addition to expert evaluation, 5 evaluators without medical background from India were recruited to evaluate the helpfulness and actionability of the LLMs’ responses.

Generally, studies with non-expert evaluation use fewer evaluation dimensions but more evaluators than studies with expert evaluation, suggesting a potential tradeoff between the depth and breadth of human evaluation.

Human evaluators and sample sizes for specific healthcare applications

We also investigated the relationship between the number of human evaluators and sample sizes for different healthcare applications in the reviewed studies. Table 3 shows the median, mean, and standard deviation (S.D.) values of the number of evaluation samples and human evaluators for each healthcare application. Despite their high-risk nature, studies on CDS applications have the lowest median number of human evaluators and the second lowest median number of evaluation samples. A possible reason could be that CDS requires more qualified human evaluators who are more difficult and expensive to recruit. Patient-facing applications (e.g., patient education and patient-provider question answering), on the other hand, have larger numbers of both evaluation samples and human evaluators. It is notable that the variability in sample sizes across the reviewed studies is very high.

Figure 5 shows an inverse relationship between evaluation sample size and the number of human evaluators as reported in the reviewed studies. This points to a potential challenge in recruiting a large number of evaluators who have the capacity and/or capability to evaluate a high quantity of samples.

Figure 5. An inverse relationship is found between evaluation sample size and the number of human evaluators as reported in the reviewed studies. This exhibits a potential challenge in recruiting a large number of evaluators who have the capacity and/or capability to evaluate a high quantity of samples.

Evaluation tools

The evaluation of LLMs in healthcare relies on a range of evaluation tools to assess their responses and overall performance. A key aspect of this evaluation is the assessment of narrative coherence and logical reasoning, which often involves binary variables to determine how well the model incorporates internal and external information 40 . The evaluation also extends to identifying errors or limitations in the model’s responses, categorized into logical, informational, and statistical errors. Analyzing these errors provides valuable insights into areas where the LLMs may need improvement or further training.

The Likert scale is another widely adopted tool in human evaluations of LLMs, with formats ranging from simple binary scales to more nuanced 5-point or 7-point scales 40 . Likert scales are common in questionnaires, allowing evaluators to rate model outputs on dimensions such as quality, empathy, and bedside manner. This approach captures subjective judgments on the “human-like” qualities of model responses, which are essential in patient-facing applications. These scales allow participants to express degrees of agreement or satisfaction with LLM outputs, providing a quantitative measure that can be easily analyzed while capturing subtleties in perception and experience. For example, 4-point Likert-like scales allow evaluators to differentiate between completely accurate and partially accurate answers, offering a more detailed understanding of the LLM’s performance 46 . Additionally, 5-point Likert scales have been used to capture evaluators’ perceptions of the quality of simplified medical reports generated by LLMs 47 , including factual correctness, simplicity, understandability, and the potential for harm. By employing these evaluation tools, researchers can quantitatively analyze the performance of LLMs and conduct downstream statistical analysis.

Comparative analysis

The reviewed studies often employ comparative analyses, comparing LLM outputs against human-generated responses, other LLM-generated outputs, or established clinical guidelines. This direct comparison allows for a quantitative and qualitative assessment of the evaluation dimensions such as accuracy, relevance, adherence to medical standards, and more, as exhibited by the LLMs. The distribution of comparison analyses used in the studies is provided in Supplementary Fig. 2 . By treating human responses or guidelines as benchmarks, researchers can identify areas where LLMs excel or require improvement. Notably, 20% ( n  = 29) of the studies incorporated a unique approach by comparing LLM-generated outputs with those of other LLMs. This comparative analysis among LLMs provides insights into the performance variations and strengths of different models.

For instance, Agarwal et al. examined differences in response accuracy and relevance between ChatGPT-3.5 and Claude-2 using repeated-measures analysis of variance (ANOVA), focusing on diverse clinical query categories 48 . Wilhelm et al. compared the performance of four LLMs (Claude-1, GPT-3.5, Command-Xlarge-nightly, and Bloomz) using ANOVA to investigate statistical differences in the therapy guidance produced by each model 49 .

Gilson et al. thoroughly examined ChatGPT outputs on scenarios drawn from the USMLE 50 . Ayers et al. compared responses from ChatGPT to those supplied by physicians on Reddit’s “Ask Doctors” threads, using chi-square tests to establish whether notable differences existed in advice quality and relevance, and highlighted instances where ChatGPT converged with or diverged from human expert replies 3 .

Some of the reviewed studies also consider the importance of testing LLMs in both controlled and real-world scenarios. Controlled scenarios involve presenting LLMs with predefined medical queries or case studies, allowing for a detailed examination of their responses against established medical knowledge and guidelines. In contrast, real-world scenarios test the practical utility and integration of LLMs into live clinical environments, providing insights into their effectiveness within actual healthcare workflows.

Blinded vs. unblinded

A prominent feature of human evaluations is the use of blinded assessments, in which evaluators do not know whether a response was generated by an LLM or a human; in unblinded assessments, evaluators are aware of the source. Blinding prevents preconceived biases from influencing evaluators’ judgments, so that assessments rest solely on the content and quality of the responses and allow a more accurate comparison of LLM performance against human benchmarks. This approach is particularly valuable when assessing the quality and relevance of LLM outputs in direct relation to human expertise.

In the reviewed studies, a mixed approach to blinding was observed. Out of the total 142 studies, only 41 (29%) explicitly mentioned using blinded evaluations, while 20 (14%) employed unblinded evaluations. Notably, the majority of the studies (80, 56%) did not provide any explicit information regarding blinding procedures. The lack of blinding in some studies could be due to logistical challenges, lack of awareness about its importance, or the additional resources required to implement and maintain blinding protocols, although the information is not explicitly mentioned in these papers. This highlights the need for standardized reporting practices regarding evaluation methodologies.

Among the studies employing blinded assessments, the approaches also vary significantly. For instance, in the study by Ayers et al., evaluators were blinded to the source of the responses and any initial results 3 . In contrast, Dennstadt et al. utilized blinded evaluations specifically for multiple-choice questions, determining the proportion of correct answers provided by the LLM 51 . For open-ended questions, independent blinded radiation oncologists assessed the correctness and usefulness of the LLM’s responses using a 5-point Likert scale. To strengthen the reliability and validity of human evaluation studies and enable more robust assessments of LLM performance, we recommend that future studies consistently implement and report blinding procedures in their evaluation methodologies. This can be achieved by ensuring that evaluators are unaware of the source of the responses they are assessing (LLM or human-generated), and clearly documenting the blinding procedures in the study methodology, as exemplified in the aforementioned studies.
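
As a minimal sketch of how such blinding could be prepared in practice, the following Python snippet shuffles each LLM/human response pair and replaces the source with neutral labels. The dictionary keys (`question`, `llm`, `human`), the labels "A"/"B", and the function name are hypothetical choices for illustration and are not taken from any of the reviewed studies.

```python
import random


def blind_responses(pairs, seed=0):
    """Hide the source of paired responses behind neutral labels.

    pairs: list of dicts with hypothetical keys "question", "llm", "human".
    Returns (blinded_items, answer_key). Only the blinded items go to the
    evaluators; the answer key stays with the study coordinator.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    blinded, key = [], {}
    for idx, item in enumerate(pairs):
        sources = [("llm", item["llm"]), ("human", item["human"])]
        rng.shuffle(sources)  # randomize which source is shown as "A"
        labels = ["A", "B"]
        blinded.append({
            "item_id": idx,
            "question": item["question"],
            "responses": {lab: text for lab, (_, text) in zip(labels, sources)},
        })
        key[idx] = {lab: src for lab, (src, _) in zip(labels, sources)}
    return blinded, key


# Illustrative placeholder data only.
pairs = [{"question": "Q1", "llm": "LLM answer", "human": "Clinician answer"}]
blinded, key = blind_responses(pairs)
```

Keeping the answer key separate from the evaluation forms also makes the blinding procedure straightforward to document in the study methodology.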

Statistical analysis

After collecting the ratings from evaluators, various statistical techniques are employed in the literature for analyzing the evaluation results. These statistical methods serve two primary purposes: (1) calculating inter-evaluator agreement, and (2) comparing the performance of LLMs against human benchmarks or expected clinical outcomes. Table 4 shows an overview of the top 11 statistical analyses conducted in the reviewed studies. To help researchers decide which statistical analysis method to choose, we provide a decision tree in Fig. 6 .

Figure 6. Decision tree for choosing a statistical analysis method.

The choice of specific statistical tests is based on the type of data and the evaluation objectives within the context of each study. Parametric tests such as t -tests and ANOVA are chosen when the data are normally distributed and the goal is to compare means between groups, ensuring that the means of different groups are statistically analyzed to identify significant differences. Non-parametric tests like the Mann–Whitney U test and Kruskal–Wallis test are used when the data do not meet normality assumptions, providing robust alternatives for comparing medians or distributions between groups for ordinal or non-normally distributed data. Chi-Square and Fisher’s Exact tests are suitable for analyzing categorical data and assessing associations or goodness-of-fit between observed and expected frequencies, making them appropriate for evaluating the fit between LLM-generated medical evidence and clinical guidelines. Measures like Cohen’s Kappa and ICC are utilized to assess inter-rater reliability, ensuring that the agreement between evaluators is not due to chance and enhancing the reliability of the evaluation results.
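
The rules of thumb in this paragraph can be sketched as a small lookup function. This is a simplified paraphrase of the text above, not a reproduction of the decision tree in Fig. 6; the function name, argument names, and the `small_counts` switch for preferring Fisher's exact test are assumptions made for illustration.

```python
def suggest_test(data_type, goal="compare", n_groups=2,
                 normal=False, small_counts=False):
    """Suggest a statistical test from coarse properties of the ratings.

    data_type: "continuous", "ordinal", or "categorical".
    goal: "compare" (differences between groups) or "agreement"
          (inter-rater reliability).
    """
    if goal == "agreement":
        return "Cohen's kappa or ICC"
    if data_type == "categorical":
        # Assumption: Fisher's exact test is preferred when counts are small.
        return "Fisher's exact test" if small_counts else "chi-square test"
    if data_type == "continuous" and normal:
        # Parametric comparison of means.
        return "t-test" if n_groups == 2 else "ANOVA"
    if data_type in ("continuous", "ordinal"):
        # Non-parametric alternatives for ordinal or non-normal data.
        return "Mann-Whitney U test" if n_groups == 2 else "Kruskal-Wallis test"
    raise ValueError(f"unknown data_type: {data_type!r}")


# Example: 5-point Likert ratings compared across three LLMs.
print(suggest_test("ordinal", n_groups=3))  # Kruskal-Wallis test
```

Such a function only narrows the options; the assumptions of the chosen test still need to be checked against the actual data.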

Ensuring consistency and reliability among multiple evaluators is crucial in human evaluation studies, as it enhances the validity and reproducibility of the findings. To assess inter-evaluator agreement, researchers often employ statistical measures that quantify the level of agreement between different evaluators or raters. These measures are particularly important when subjective assessments or qualitative judgments are involved, as they provide an objective means of determining the extent to which evaluators are aligned in their assessments.

Measures such as Cohen’s Kappa, the Intraclass Correlation Coefficient (ICC), and Krippendorff’s Alpha are commonly used to calculate inter-evaluator agreement, sometimes alongside t -tests. The agreement measures take into account the possibility of agreement occurring by chance and provide a standardized metric for quantifying the level of agreement between evaluators. Schmidt et al. determined statistical significance in radiologic reporting using basic p -values 39 , while studies like Sorin et al. and Elyoseph et al. 52 used ICC to assess rater agreement and diagnostic capabilities. Sallam et al. 53 and Varshney et al. 54 used a combination of t -tests and Cohen’s kappa to identify potential sources of disagreement, such as ambiguity in the evaluation guidelines or differences in interpretations.
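
To make the chance correction concrete, the following is a minimal pure-Python sketch of Cohen's kappa for two raters, computed as (observed agreement − expected agreement) / (1 − expected agreement). The ratings are made-up binary labels for illustration; an established statistics package would normally be used in a real study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty item set")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical binary ratings (1 = acceptable, 0 = not acceptable).
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(rater_1, rater_2)
```

In this example the raters agree on 8 of 10 items, but half of that agreement is expected by chance, so kappa comes out at about 0.6 rather than 0.8.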

Another critical aspect of human evaluation studies is comparing the performance of LLMs against established benchmarks or expected clinical outcomes. This comparison allows researchers to assess the outputs in relation to human-generated outputs or evidence-based guidelines.

Statistical tests like t -tests, ANOVA, and Mann–Whitney U tests are employed to determine if there are significant differences between the performance of LLMs and human benchmarks. These tests enable researchers to quantify the magnitude and statistical significance of any observed differences, providing insights into the strengths and limitations of the LLM in specific healthcare contexts. Wilhelm et al. applied ANOVA and pairwise t -tests for therapy recommendation differences 49 , and Tang et al. utilized the Mann–Whitney U test for medical evidence retrieval tasks under non-normal distribution conditions 34 . Liu et al. combined the Mann–Whitney–Wilcoxon test and the Kruskal–Wallis test to evaluate the reviewer ratings for the AI-generated suggestions 5 . Bazzari and Bazzari chose the Mann–Whitney U test to compare LLM effectiveness in telepharmacy against traditional methods when faced with non-normal sample distributions 46 . Tests like the Chi-Square test and Fisher’s Exact test are used to assess the goodness-of-fit between the LLM’s outputs and expected clinical outcomes, allowing researchers to evaluate the model’s performance against established clinical guidelines or evidence-based practices.
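
As an illustration of how a rank-based comparison treats ordinal ratings, the sketch below computes the Mann–Whitney U statistics for two groups in pure Python, using average ranks for ties. It returns only the U statistics and does not compute a p-value; the ratings are hypothetical 5-point Likert scores, not data from any reviewed study.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples, averaging tied ranks."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                      key=lambda t: t[0])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j to cover the run of values tied with combined[i].
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


# Hypothetical Likert ratings for LLM vs. human-written responses.
llm_ratings = [4, 5, 5, 3, 4]
human_ratings = [2, 3, 3, 4, 2]
print(mann_whitney_u(llm_ratings, human_ratings))  # (22.0, 3.0)
```

Because the statistic depends only on ranks, it does not assume that the distance between "agree" and "strongly agree" equals the distance between other adjacent scale points.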

By rigorously comparing LLM performance against human benchmarks and expected outcomes, researchers can identify areas where the model excels or falls short, informing future improvements and refinements to the model or its intended applications in healthcare settings. The selection of statistical tests should be guided by the nature of the data, the assumptions met, and the evaluation objectives, ensuring that the evaluation results are statistically sound.

Specialized frameworks

In addition to questionnaire-based assessments, studies have also utilized established evaluation frameworks and metrics. Frameworks like SERVQUAL, PEMAT-P, and the SOLO structure have been applied to organize the assessment of LLM performance comprehensively (Table 5 ). Various metrics, including accuracy rates, user satisfaction indices, and ethical compliance rates, have been employed to quantify and compare the performance of LLMs against defined standards.

The SERVQUAL model, a five-dimension framework, was employed by Choi et al. to assess the service quality of ChatGPT in providing accessible medical information to patients with kidney cancer, with responses from urologists and urological oncologists surveyed using this framework 55 . Such studies shed light on the potential and limitations of LLMs in direct patient interactions and learning gains.

In addition to generic evaluation scales, some studies employ specialized questionnaires designed to assess specific aspects of LLM performance, such as factual consistency, medical harmfulness, and coherence. The DISCERN instrument, a validated tool for judging the quality of written consumer health information, has been adapted in several studies to evaluate the trustworthiness and quality of information provided by LLMs. However, these specialized frameworks do not cover all metrics and fail to provide a comprehensive method of evaluation across all QUEST dimensions.

QUEST human evaluation framework

Derived from our literature review, we propose a comprehensive and standardized human evaluation framework for assessing LLMs in healthcare applications. Named the QUEST Human Evaluation Framework, it adheres to the QUEST dimensions and is designed for broad adoption by the community. Figure 7 systematically outlines the framework’s three phases: Planning, Implementation and Adjudication, and Scoring and Review.

Figure 7. The QUEST Human Evaluation Framework is derived from our literature review and is a comprehensive and standardized human evaluation framework for assessing LLMs in healthcare applications. It adheres to the QUEST dimensions and is designed for broad adoption by the community. It entails three phases, namely Planning, Implementation and Adjudication, and Scoring and Review.

Planning phase

In practice, any LLM implemented for a specific use case in healthcare must address a well-defined problem. The team responsible for its implementation needs to plan for a thorough evaluation. There are four fundamental considerations when planning this evaluation:

Goals of the model: Define the objectives the model aims to achieve.

Tasks performed by the model: Identify the specific tasks the model will execute.

Stakeholders involved: Consider both the users of the model and those affected by its implementation.

Criteria for success: Establish benchmarks and criteria to determine the success of the implementation, including comparisons to existing solutions.

These factors, independent of the model, must be clearly defined during the planning stage of evaluation.

Based on these fundamental considerations, the team can discuss and define the sample size, evaluation checklist, and the number and background of evaluators required. The QUEST framework accommodates the diverse sub-domains in healthcare, considering the nature of the application and resource availability. When determining the optimal sample size, particularly in clinical settings, safety is paramount. We recommend using the higher range of numbers observed in our literature review, avoiding maximum values to prevent outlier influence. For CDS and patient-provider question answering, a larger sample size of at least 130 is recommended, depending on the scope and uncertainty of the questions. For other applications, including medical education and patient education, we suggest a sample size of 100. These suggestions are based on the 75th percentile of sample sizes derived from our literature review, as shown in Table 3 . Similarly, for choosing the number and background of evaluators, we recommend involving a larger team of six evaluators for clinical applications due to their immediate implications for patient health. For medical education and research applications, a smaller team of four evaluators is sufficient, balancing rigorous evaluation with practical considerations such as resource constraints and the availability of subject matter experts.
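
The numeric recommendations in this paragraph can be captured as planning defaults, sketched below. The application and setting identifiers (`"cds"`, `"patient_provider_qa"`, `"clinical"`, and so on) and the function names are hypothetical labels chosen for illustration; only the numbers come from the text above.

```python
# Applications for which the larger minimum sample size is recommended.
LARGE_SAMPLE_APPS = {"cds", "patient_provider_qa"}

# Recommended evaluator team size by setting.
EVALUATORS_BY_SETTING = {"clinical": 6, "education": 4, "research": 4}


def recommended_sample_size(application):
    """Minimum number of evaluation samples for a given application."""
    return 130 if application in LARGE_SAMPLE_APPS else 100


def recommended_evaluators(setting):
    """Suggested number of human evaluators for a given setting."""
    if setting not in EVALUATORS_BY_SETTING:
        raise ValueError(f"unknown setting: {setting!r}")
    return EVALUATORS_BY_SETTING[setting]


# Example: planning a clinical decision support (CDS) evaluation.
plan = {
    "samples": recommended_sample_size("cds"),
    "evaluators": recommended_evaluators("clinical"),
}
print(plan)  # {'samples': 130, 'evaluators': 6}
```

These values are starting points; a team would adjust them for the scope and uncertainty of the questions and for resource availability.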

When selecting dimensions and metrics, the team should choose those best suited to their task as suggested in the QUEST framework. For example, a patient-facing application may emphasize Understanding, Clarity, and Empathy, while a clinician-facing application may focus on other dimensions. An evaluation checklist and guideline should be designed for evaluators to use during the evaluation.

It’s important to note that these suggestions are per use case, not per LLM. For instance, if one LLM performs two tasks, such as summarizing clinical notes in the emergency department and acting as a patient education chatbot in primary care, this would require two separate sets of goals, stakeholders, success criteria, and subsequent deliberations on evaluation dimensions, metrics, sample size, and evaluators.

Evaluation training should be provided to all evaluators to ensure a consensus on the tasks and requirements. Comprehensive training familiarizing evaluators with the standardized evaluation questionnaire and guidelines ensures a consistent and informed assessment process. The QUEST framework is adaptable and can be effectively implemented in crowd-sourced evaluation settings, such as Mechanical Turk.

Implementation and adjudication phase

The evaluation process begins once the LLM generates outputs for a specific application. Survey tools, forms, or spreadsheets can be used for efficient data collection. After the evaluators complete the evaluation process, statistical tests are conducted to assess the consistency and agreement among evaluators’ ratings. Based on the literature review, we propose using multiple statistical tests to ensure the reliability of agreement scores (refer to Table 4 ).

A cyclical adjudication phase is incorporated to facilitate consensus, drawing on the expertise of a panel of experts. During this phase, the evaluation guidelines can be updated based on insights gained. Reviewers are re-trained according to the new guidelines, and the evaluation process is repeated until consensus is reached, such as achieving a Cohen’s kappa value of 0.7 or above.
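
The cyclical structure of this phase can be sketched as a loop that repeats until the agreement threshold is met. In the sketch below, `evaluate_round` is a hypothetical callback standing in for one full cycle (guideline update, re-training, re-rating, and computing agreement); the `max_rounds` cap is an added assumption so that the loop always terminates, and the agreement scores are made up.

```python
def adjudicate(evaluate_round, threshold=0.7, max_rounds=5):
    """Repeat evaluation rounds until agreement reaches the threshold.

    evaluate_round(round_no) must return the agreement score (for example
    Cohen's kappa) obtained in that round.
    Returns (rounds_needed, history); rounds_needed is None if the
    threshold was not reached within max_rounds.
    """
    history = []
    for round_no in range(1, max_rounds + 1):
        score = evaluate_round(round_no)
        history.append(score)
        if score >= threshold:
            return round_no, history
    return None, history


# Illustrative only: simulated agreement improving over three rounds.
simulated_scores = {1: 0.52, 2: 0.66, 3: 0.74}
rounds_needed, history = adjudicate(lambda r: simulated_scores[r], max_rounds=3)
print(rounds_needed, history)  # 3 [0.52, 0.66, 0.74]
```

Recording the agreement score of every round, as `history` does here, also documents how the guidelines converged.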

Scoring and review phase

Once consensus is achieved, the final score for each dimension is calculated using either mean or median scores, aggregating the evaluators’ ratings. This scoring approach provides a comprehensive overview of the LLM’s performance. To ensure a holistic evaluation, these human assessment results are compared with automatic evaluation metrics, such as the F1 measure and AUROC. This comparison sets the human evaluation alongside automatically computed measures, offering a rounded perspective on the strengths and limitations of the LLMs.
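
A minimal sketch of the per-dimension aggregation, using Python's standard `statistics` module, is shown below. The dimension names and ratings are hypothetical, and each list is assumed to hold one rating per evaluator.

```python
from statistics import mean, median


def score_dimensions(ratings, method="median"):
    """Aggregate evaluators' ratings into one score per dimension.

    ratings: dict mapping a dimension name to a list of ratings.
    method: "mean" or "median".
    """
    aggregators = {"mean": mean, "median": median}
    if method not in aggregators:
        raise ValueError("method must be 'mean' or 'median'")
    aggregate = aggregators[method]
    return {dim: aggregate(values) for dim, values in ratings.items()}


# Hypothetical 5-point ratings from six evaluators.
ratings = {
    "accuracy": [4, 5, 4, 3, 4, 5],
    "harm": [1, 1, 2, 1, 1, 1],
}
print(score_dimensions(ratings, method="median"))
# {'accuracy': 4.0, 'harm': 1.0}
```

The median is less sensitive to a single outlying evaluator, whereas the mean reflects every rating; which one is reported should be stated alongside the scores.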

Case studies

In this section, we provide two case studies applying the QUEST framework in the Emergency Medicine specialty of a healthcare system to showcase the considerations needed for effective evaluation of LLMs in clinical use. Emergency departments (EDs) are the first point of contact for patients requiring urgent medical attention; care there involves summarizing key patient information, generating possible diagnoses, and providing initial stabilization for patients with a wide variety of medical problems. The healthcare team needs to have a high level of expertise and make rapid management decisions. LLMs, by virtue of their ability to understand natural language, could be very useful to the ED team by improving the efficiency of the triage workflow, making it quicker, more accurate, and less susceptible to fatigue, human error, and biases. Figure 8 provides a side-by-side visual summary of how the QUEST framework is applied in two ED use cases: clinical note summarization and triage decision support. Details are provided below.

Figure 8. Two use cases, clinical note summarization and patient triage applications, are provided as examples to showcase the applicability of the QUEST Human Evaluation Framework in different applications in the healthcare system. A detailed summary is provided for each step in the three phases: Planning, Implementation and Adjudication, and Scoring and Review.

Use Case 1: human evaluation of LLMs for clinical note summarization in emergency department

Suppose a healthcare system wishes to evaluate a few LLMs for clinical note summarization to streamline clinical documentation and reduce physician burden. We will apply the QUEST framework to the human evaluation of LLMs in this application. First, during the Planning Phase, the team needs to consider four key elements:

Goals of the model: The primary objective of implementing LLMs for clinical note summarization is to streamline the clinical documentation process, thereby significantly reducing the documentation burden on physicians.

Tasks performed by the model: The model will perform several critical tasks to achieve its objectives. It will extract key clinical information from detailed patient notes, including diagnoses, treatments, and outcomes. Following extraction, the model will generate concise and accurate summaries that retain all essential information. These summaries will be consistently formatted according to the healthcare system’s documentation standards.

Stakeholders involved: The primary users of the model are physicians who will benefit from reduced documentation time and improved workflow efficiency. Secondary users include nurses and allied health professionals who will refer to the summarized notes for patient care coordination. Patients are indirectly affected as improved documentation can lead to better quality of care and more efficient clinical workflows.

Criteria for success: Success will be measured according to several key criteria. Accuracy is paramount; the summaries must accurately reflect the key points of the original clinical notes without omitting critical information.

After defining these four considerations, the team determined the sample size. Following the QUEST framework, they decided on a minimum sample size of 130 notes. Since the dataset lacks demographic information, they extracted such data from the notes where feasible to ensure sufficient variability that matches the population.

An important aspect of the QUEST framework is the dimensions and evaluation checklist. The committee identified dimensions most relevant to the emergency department (ED) settings where the model will be tested. Clinicians led the discussion, prioritizing Accuracy and Comprehensiveness under the principle “Quality of Information” while considering Currency less important. For metrics, they decided on a binary (presence or absence) metric for the “Bias” dimension and the AHRQ’s harm scale for the “Harm” dimension to capture ample details for categorizing the LLM outputs.

Recruiting evaluators for a clinical use case, as suggested by the QUEST framework, requires a minimum of seven evaluators. The committee determined that evaluators should have a suitable background reflective of the users, namely physicians, nurses, and other specialists. Since this use case is clinician-facing only, patient evaluators are not needed but would be included in other settings. The committee reached out to the ED and recruited seven evaluators, including four physicians (one attending, two residents, and one fellow), two nurses, and one pharmacist.

Based on the QUEST framework, the evaluation involves five steps: developing the evaluation guideline, training evaluators, initial evaluation with a subset, discussion and adjudication, and fine-tuning the evaluation guideline. The committee developed an initial evaluation guideline and provided two hours of training to the evaluators. A sample of ten outputs with the original medical records was provided for the first round of evaluation. After discussing inter-annotator agreement to finalize and standardize the evaluation process, the guidelines were updated. The final evaluation was conducted by the seven evaluators over a two-week period. Evaluators’ comments and scoring results were collected using an application built by the informatics team, and statistical analysis was performed by the committee. The final results report was then submitted to hospital management for an in-depth review before hospital-wide implementation.

Use Case 2: human evaluation of LLMs for patient triage in emergency department

Suppose the Chief Medical Informatics Officer (CMIO) of a large academic health system wishes to evaluate LLMs as a decision support tool for ED teams in the triage of patients across all busy EDs in the health system. During the Planning Phase, the team needs to consider four key elements:

Goals of the model: The primary objective of implementing LLMs as a decision support tool is to enhance the efficiency and accuracy of patient triage in busy EDs.

Tasks performed by the model: The LLMs will use patients’ chief complaints, relevant medical history, vital signs, and physical examination findings to make triage decisions (emergent, non-emergent, self-care at home) and provide recommendations for the next steps in management.

Stakeholders involved: The primary users of the model are ED physicians, nurses, and triage staff, who will benefit from enhanced decision support and streamlined triage processes. Secondary stakeholders include patients, who will receive more timely and accurate triage, and hospital administration, which will benefit from improved ED efficiency and patient throughput.

Criteria for success: Success will be measured through several key criteria, with accuracy in triage decisions being paramount. The model must provide reliable and timely triage recommendations that align with clinical assessments.

A total of 130 triage cases will be sampled and verified to be representative of ED visits for expert evaluation. Triage experts will perform their assessments and make triage decisions independently of the LLM output, serving as the gold standard. Three experienced ED physicians and three nurses, different from the triage experts, will independently evaluate and compare the LLM outputs in terms of triage decisions with the gold standard and their own assessments. The core dimensions to be evaluated will include accuracy, agreement, comprehensiveness, currency, logical reasoning, fabrication, empathy, bias, harm, and trust, as outlined by the QUEST framework. Additionally, the evaluators will assess the LLMs’ recommendations for the next management steps on a 5-point Likert scale (strongly agree, agree, neither agree nor disagree, disagree, strongly disagree).
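
Two of the quantities in this plan lend themselves to a short sketch: the proportion of LLM triage decisions that match the gold standard, and a numeric coding of the 5-point Likert responses. The 1-to-5 coding and the function names are assumptions for illustration, and the example decisions are made up.

```python
# Assumed numeric coding of the 5-point Likert scale (1 = lowest agreement).
LIKERT_5 = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def triage_accuracy(llm_decisions, gold_decisions):
    """Proportion of LLM triage decisions matching the gold standard."""
    if len(llm_decisions) != len(gold_decisions) or not gold_decisions:
        raise ValueError("decision lists must be non-empty and equal in length")
    matches = sum(l == g for l, g in zip(llm_decisions, gold_decisions))
    return matches / len(gold_decisions)


def encode_likert(responses):
    """Convert Likert labels to their numeric codes (case-insensitive)."""
    return [LIKERT_5[r.strip().lower()] for r in responses]


# Hypothetical decisions for four triage cases.
llm = ["emergent", "non-emergent", "self-care at home", "emergent"]
gold = ["emergent", "emergent", "self-care at home", "emergent"]
print(triage_accuracy(llm, gold))                       # 0.75
print(encode_likert(["Agree", "strongly disagree"]))    # [4, 1]
```

Once coded numerically, the Likert responses can be aggregated per dimension or compared between groups with rank-based tests.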

The evaluations by the ED physicians will be completed using a standardized questionnaire (listed in Supplementary Table 1 ). If there is any disagreement among the evaluators’ results, adjudication will be performed until a consensus is reached. The final results of the evaluation will be collected and analyzed across various dimensions using appropriate statistical tests, and final evaluation scores for each dimension will be generated. The final results of the evaluation will be presented to the CMIO and ED leadership. This comprehensive evaluation will guide the decision on whether to implement the LLMs system-wide, ensuring it meets the healthcare system’s standards for accuracy, efficiency, and usability.

LLMs have become integral to various clinical applications due to their ability to generate text in response to user queries. Despite recent enthusiasm about the potential of LLMs and GenAI in many healthcare systems, the inner workings of these models remain opaque; in other words, they are still “black boxes”. The articles we reviewed reveal that evaluations of these “black box” models typically involve manual testing through human evaluation, which underscores a significant issue: the lack of traceability, reliability, and trust. Critical details such as the origins of the text sources, the reasoning processes within the generated text, and the reliability of the evidence for medical use are often not transparent. Furthermore, the traditional NLP evaluations, commonly used in well-defined tasks like Information Extraction (IE) and Question Answering (QA), prove to be suboptimal for assessing LLMs. This inadequacy stems from the novelty of the text generated by LLMs, which traditional NLP evaluation methods struggle to handle effectively. As the use of LLMs in medicine increases, the need for appropriate evaluation frameworks which align with human values becomes more pronounced.

To address these challenges, we have proposed guidelines for human evaluation of LLMs. However, these guidelines also have limitations, constrained by the scale of human evaluation, the size of the samples reviewed, and the measures used, all of which can affect the depth and breadth of the assessments. Adding to these challenges is the predominance of proprietary models developed by major technology firms. The healthcare sector often faces constraints in computational resources, which limits the ability of informatics researchers to thoroughly study LLMs. This situation calls for a collaborative effort among the medical community, computer scientists, and major tech companies to develop comprehensive evaluation methods that can improve the quality and reliability of LLMs for clinical use. Our hope is to bridge these gaps and foster a synergy that could lead to more robust, transparent, and accountable LLMs in healthcare, ensuring they meet the high standards necessary for clinical application.

This literature review has a few limitations. First, it may have missed relevant articles published after February 22, 2024. Because this field is evolving rapidly, significant advancements and new findings have likely emerged since then. Second, the review is limited to articles written in English. Articles written in other languages may also provide valuable information and insights. Third, the search strings and databases we used might not have been comprehensive enough, potentially introducing bias into the findings. Fourth, this review does not consider articles that apply LLMs in domains other than healthcare, which may contain valuable insights regarding human evaluation.

The QUEST framework was developed from the literature review and has several potential limitations as well. First, the implementation of the QUEST framework may still vary across clinical specialties and institutions. While we propose the framework to be generally applicable across healthcare domains and to accommodate the varying limiting factors in different application settings, i.e., clinical, education, and research, constraints among subspecialties and/or institutions may make adherence to the framework difficult and limit the actual outcome of the standardization efforts it advocates. Institutional policies, technology infrastructures, resources, and the availability of trained informatics personnel can vary widely, further complicating implementation. Second, the dimensions of the QUEST framework, although comprehensive, might not fully capture the nuances of every use case or scenario. The evaluation of LLMs in healthcare is inherently task-specific. Users should view this framework as a foundational starting point rather than a definitive solution. It is crucial to think critically about how the framework can be adapted and applied to meet the unique demands of their specific clinical applications. We encourage readers to consider combining our framework with task-specific frameworks depending on the use case. Third, the QUEST framework focuses only on human evaluation and does not consider automatic evaluation. While human evaluation provides critical insights into qualitative aspects of performance, automatic quantitative evaluation methods can offer valuable metrics and scalability. We fully acknowledge the utility that automatic evaluation can bring to any experiment or deployment of LLMs. Therefore, striking a balance between human evaluation and automatic quantitative evaluation is essential to uncover insights potentially missed by either approach.

There are opportunities to unify automatic and human evaluation across various applications in the healthcare domain. While there are efforts to align automatic evaluation metrics with human evaluation and to improve the correlation between them, such as those of Krishna et al. and Moramarco et al. 56,57, they are limited to specific applications and tasks such as clinical note summarization and note generation.
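
One common way to quantify how well an automatic metric tracks human judgment is a rank correlation between the two sets of scores. The following is a minimal sketch, assuming each LLM output has one automatic metric score and one human rating; the function names and the toy values are illustrative assumptions and are not taken from the cited studies.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy values invented for illustration only (not data from any study).
metric_scores = [0.21, 0.35, 0.30, 0.52, 0.48]  # e.g. an overlap-based metric
human_ratings = [2, 3, 3, 5, 4]                 # e.g. a 5-point Likert rating
rho = spearman(metric_scores, human_ratings)
print(f"Spearman rho = {rho:.3f}")
```

A value near 1 would suggest that the metric orders outputs much as the human raters do; it says nothing about whether either ordering reflects clinical correctness.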

There are also technical advancements in multimodal LLMs and text-to-image foundational models such as CLIP 58. These models could be instrumental in generating synthetic medical images (text-to-image) and performing clinical diagnoses (image-to-text or recordings-to-text). However, their potential applications warrant separate research efforts.

Further, the ultimate goal of any implementation of LLMs is to improve actual patient outcomes or scientific understanding; how best to connect evaluation to the impact of LLMs in achieving these goals is left as future work. While our evaluation framework aims to be future-proof, emerging techniques in LLMs and AI might require additions to it. For example, emerging automatic and/or human evaluation methods for LLMs in the general domain may be helpful in developing automatic and/or human evaluation in the healthcare domain, and vice versa.

Finally, applying LLMs and other emerging technologies to automate or scale human evaluation processes in low-resource settings remains an open research question. Although there have been attempts to use LLMs to complement human evaluation, success has been limited, highlighting the need for further exploration in this area.

Data sources and search strategies

This is a scoping review that adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) to ensure a rigorous and replicable methodology (Fig. 9 and the corresponding checklist in Supplementary Table 2). Our literature search spanned publications from January 1, 2018, to February 22, 2024, capturing the emergence and application of language models like GPT-1, introduced in 2018, through the subsequent development of advanced models including LLaMA-2, GPT-4, and others. This period is crucial as it marks the rapid evolution and adoption of LLMs in healthcare, offering a comprehensive view of current methodologies and applications in clinical NLP research.

figure 9

The initial search yielded 795 articles after applying language and publication year filters. Exclusion criteria were set to omit article types irrelevant to our research aims, resulting in 688 potentially relevant articles. To ascertain focus on LLMs in healthcare, articles underwent a two-stage screening process. The first stage involved title and abstract screening to identify articles explicitly discussing human evaluation of LLMs within healthcare contexts. The second stage involved a full-text review, emphasizing methodological detail, particularly regarding human evaluation of LLMs, and their applicability to healthcare. Due to accessibility issues, 42 articles were excluded, resulting in a final selection of 142 articles for the comprehensive literature review.

We focused on peer-reviewed journal articles and conference proceedings published in English, recognizing the pivotal role of LLMs in advancing healthcare informatics. The search focused on PubMed to ensure broad coverage of the healthcare literature. The selection was based on relevancy to healthcare applications, human evaluation of LLMs, and explicit discussion of evaluation methodologies in clinical settings. Our search strategy included terms related to “Generative Large Language Models,” “Human Evaluation,” and “Healthcare,” combined in various iterations to capture the breadth and depth of the studies in question.

The inclusion and exclusion criteria for the article search are detailed below. Table 6 lists the search queries and corresponding results, and Table 7 shows the search terms we excluded from the final search queries because they returned a large proportion of false-positive results. This review includes publications available in the PubMed database, written in English, and published from 2018 to 2024, because major LLM designs such as GPT-1 have been released since 2018. We excluded the following article types from the search: Comment, Preprint, Editorial, Letter, Review, Scientific Integrity Review, Systematic Review, News, Newspaper Article, Published Erratum.
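
The criteria above can be sketched as a Boolean PubMed query string. This is an illustration only, not the queries actually used (those are listed in Table 6): the concept terms below are hypothetical placeholders, and the helper names `any_of` and `build_query` are assumptions. The field tags `[tiab]`, `[dp]`, `[la]`, and `[pt]` are standard PubMed search tags for title/abstract, publication date, language, and publication type.

```python
# Article types excluded from the search, as listed in the text.
EXCLUDED_PUBLICATION_TYPES = [
    "Comment", "Preprint", "Editorial", "Letter", "Review",
    "Scientific Integrity Review", "Systematic Review", "News",
    "Newspaper Article", "Published Erratum",
]

# Hypothetical placeholder terms for the three concepts named in the text;
# the real search terms are given in Table 6 of the article.
CONCEPT_GROUPS = [
    ["large language model", "ChatGPT"],   # generative LLMs (placeholder)
    ["human evaluation"],                  # human evaluation (placeholder)
    ["healthcare", "clinical"],            # healthcare (placeholder)
]


def any_of(terms, tag):
    """OR together quoted terms that share one PubMed field tag."""
    return "(" + " OR ".join(f'"{term}"[{tag}]' for term in terms) + ")"


def build_query(concept_groups, start, end, language, excluded_types):
    """AND the concept groups, restrict by date and language, drop types."""
    concepts = " AND ".join(any_of(terms, "tiab") for terms in concept_groups)
    date_range = f"{start}:{end}[dp]"
    language_filter = f"{language}[la]"
    exclusions = any_of(excluded_types, "pt")
    return f"({concepts} AND {date_range} AND {language_filter}) NOT {exclusions}"


query = build_query(
    CONCEPT_GROUPS,
    start="2018/01/01",
    end="2024/02/22",
    language="english",
    excluded_types=EXCLUDED_PUBLICATION_TYPES,
)
print(query)
```

Keeping the exclusions in a single `NOT (...)` group makes it straightforward to rerun the search with one article type restored and see how many results that type accounts for.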

Article selection

We excluded the article types stated above because we intentionally sought articles with in-depth discussion of actual experiments with LLMs and human evaluation of the LLM outputs. The excluded article types are commentary or summary in nature and do not include experiments. While preprints may include experiments and discussion, they are excluded because they have not been peer-reviewed. Specifically, the keyword search returned 1191 studies; successively applying the publication language, publication period, and publication type restrictions reduced the number of studies to 911, 795, and 688, respectively.

To ascertain focus on LLMs in healthcare, articles underwent a two-stage screening process. The first stage involved title and abstract screening to identify articles explicitly discussing human evaluation of LLM applications within healthcare contexts. We also excluded studies that examine only non-generative pretrained language models such as BERT 59 and RoBERTa 60, as well as multimodal studies such as image-to-text or text-to-image applications of generative LLMs. The second stage involved a full-text review, emphasizing methodological detail, particularly regarding human evaluation of LLMs, and their applicability to healthcare. Due to accessibility issues, 42 articles were excluded, resulting in a final selection of 142 articles for the comprehensive literature review.
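
The selection counts reported above can be tallied with a short script. This is a minimal sketch using only the numbers stated in the text; the stage labels are paraphrases, and the last step assumes that the 42 inaccessible articles are counted among the 688 articles that entered screening, which the text does not state explicitly.

```python
# Record counts after each search restriction, as reported in the text.
stages = [
    ("keyword search", 1191),
    ("English-language filter", 911),
    ("publication period filter", 795),
    ("publication type filter", 688),
]


def removed_per_stage(stages):
    """Return (stage name, records removed by that stage) for each filter."""
    return [
        (later_name, earlier_count - later_count)
        for (_, earlier_count), (later_name, later_count)
        in zip(stages, stages[1:])
    ]


removed = removed_per_stage(stages)
for name, count in removed:
    print(f"{name}: removed {count}")

final_included = 142   # articles in the final selection
inaccessible = 42      # articles excluded due to accessibility issues

# Records dropped between the end of filtering and the final selection.
removed_after_filters = stages[-1][1] - final_included

# Assumption (not stated in the text): the 42 inaccessible articles are
# part of the 688, so the rest were excluded by the two screening stages.
removed_by_screening = removed_after_filters - inaccessible

print(f"removed after filters: {removed_after_filters}")
print(f"of which by screening (under the assumption): {removed_by_screening}")
```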

Data availability

This study is a scoping review, and it does not generate any new data. Questions regarding data access should be addressed to the corresponding author.

Code availability

This study is a scoping review and does not use any code. A software tool is currently being developed to support the use of QUEST for assessing large language models (LLMs) in healthcare. It will be available upon request by emailing the corresponding author.

Ouyang, L. et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35 , 27730–27744 (2022).

Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at https://arxiv.org/abs/2307.09288 (2023).

Ayers, J. W. et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern. Med. 183 , 589–596 (2023).

Chari, S. et al. Informing clinical assessment by contextualizing post-hoc explanations of risk prediction models in type-2 diabetes. Artif. Intell. Med. 137 , 102498 (2023).

Liu, S. et al. Using AI-generated suggestions from ChatGPT to optimize clinical decision support. J. Am. Med. Inform. Assoc. JAMIA 30 , 1237–1245 (2023).

Alapati, R. et al. Evaluating insomnia queries from an artificial intelligence chatbot for patient education. J. Clin. Sleep. Med. JCSM. Publ. Am. Acad. Sleep. Med. 20 , 583–594 (2024).

Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. Bleu: a method for automatic evaluation of machine translation. In Proc. 40th Annual Meeting of the Association for Computational Linguistics (eds. Isabelle, P., Charniak, E. & Lin, D.) 311–318 (Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 2002). https://doi.org/10.3115/1073083.1073135 .

Lin, C.-Y. ROUGE: a package for automatic evaluation of summaries. Text Summarization Branches Out (Association for Computational Linguistics, Barcelona, Spain, 2004).

Liang, P. et al. Holistic evaluation of language models. CoRR https://doi.org/10.48550/arXiv.2211.09110 (2022).

Singhal, K. et al. Large language models encode clinical knowledge. Nature 620 , 172–180 (2023).

Sivarajkumar, S., Kelley, M., Samolyk-Mazzanti, A., Visweswaran, S. & Wang, Y. An empirical evaluation of prompting strategies for large language models in zero-shot clinical natural language processing: algorithm development and validation study. JMIR Med. Inf. 12 , e55318 (2024).

Chiang, C.-H. & Lee, H. Can large language models be an alternative to human evaluations? In Proc. the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (eds. Rogers, A., Boyd-Graber, J. & Okazaki, N.) 15607–15631 (Association for Computational Linguistics, Toronto, Canada, 2023). https://doi.org/10.18653/v1/2023.acl-long.870 .

Wei, Q. et al. Evaluation of ChatGPT-generated medical responses: a systematic review and meta-analysis. J. Biomed. Inform. 151 , 104620 (2024).

Park, Y.-J. et al. Assessing the research landscape and clinical utility of large language models: a scoping review. BMC Med. Inform. Decis. Mak. 24 , 72 (2024).

Yuan, M. et al. Large language models illuminate a progressive pathway to artificial intelligent healthcare assistant. Med. Plus 1 , 100030 (2024).

Awasthi, R. et al. HumanELY: Human evaluation of LLM yield, using a novel web-based evaluation tool. 2023.12.22.23300458 Preprint at https://doi.org/10.1101/2023.12.22.23300458 (2023).

Mongan, J., Moy, L. & Kahn, C. E. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): a guide for authors and reviewers. Radiol. Artif. Intell. 2 , e200029 (2020).

Sounderajah, V. et al. Developing a reporting guideline for artificial intelligence-centred diagnostic test accuracy studies: the STARD-AI protocol. BMJ Open 11 , e047709 (2021).

Martindale, A. P. L. et al. Concordance of randomised controlled trials for artificial intelligence interventions with the CONSORT-AI reporting guidelines. Nat. Commun. 15 , 1619 (2024).

Norgeot, B. et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat. Med. 26 , 1320–1324 (2020).

Lechien, J. R., Georgescu, B. M., Hans, S. & Chiesa-Estomba, C. M. ChatGPT performance in laryngology and head and neck surgery: a clinical case-series. Eur. Arch. Oto-Rhino-Laryngol. J. Eur. Fed. Oto-Rhino-Laryngol. Soc. EUFOS Affil. Ger. Soc. Oto-Rhino-Laryngol. - Head. Neck Surg. 281 , 319–333 (2024).

Seth, I. et al. Exploring the role of a large language model on carpal tunnel syndrome management: an observation study of ChatGPT. J. Hand Surg. 48 , 1025–1033 (2023).

Maillard, A. et al. Can Chatbot artificial intelligence replace infectious diseases physicians in the management of bloodstream infections? A prospective cohort study. Clin. Infect. Dis. Publ. Infect. Dis. Soc. Am. 78 , 825–832 (2024).

Yaneva, V., Baldwin, P., Jurich, D. P., Swygert, K. & Clauser, B. E. Examining ChatGPT performance on USMLE sample items and implications for assessment. Acad. Med. J. Assoc. Am. Med. Coll. 99 , 192–197 (2024).

Wu, Y. et al. Evaluating the performance of the language model ChatGPT in responding to common questions of people with epilepsy. Epilepsy Behav. 151 , 109645 (2024).

Ghosh, A. & Bir, A. Evaluating ChatGPT’s ability to solve higher-order questions on the competency-based medical education curriculum in medical biochemistry. Cureus 15 , e37023 (2023).

Choi, W. Assessment of the capacity of ChatGPT as a self-learning tool in medical pharmacology: a study using MCQs. BMC Med. Educ. 23 , 864 (2023).

Kavadella, A., Dias da Silva, M. A., Kaklamanos, E. G., Stamatopoulos, V. & Giannakopoulos, K. Evaluation of ChatGPT’s real-life implementation in undergraduate dental education: mixed methods study. JMIR Med. Educ. 10 , e51344 (2024).

Baglivo, F. et al. Exploring the possible use of AI chatbots in public health education: feasibility study. JMIR Med. Educ. 9 , e51421 (2023).

Hatia, A. et al. Accuracy and completeness of ChatGPT-generated information on interceptive orthodontics: a multicenter collaborative study. J. Clin. Med. 13 , 735 (2024).

Kienzle, A., Niemann, M., Meller, S. & Gwinner, C. ChatGPT may offer an adequate substitute for informed consent to patients prior to total knee arthroplasty-yet caution is needed. J. Pers. Med. 14 , 69 (2024).

Peng, W. et al. Evaluating AI in medicine: a comparative analysis of expert and ChatGPT responses to colorectal cancer questions. Sci. Rep. 14 , 2840 (2024).

Xie, Y., Seth, I., Rozen, W. M. & Hunter-Smith, D. J. Evaluation of the artificial intelligence chatbot on breast reconstruction and its efficacy in surgical research: a case study. Aesthet. Plast. Surg. 47 , 2360–2369 (2023).

Tang, L. et al. Evaluating large language models on medical evidence summarization. NPJ Digit. Med. 6 , 158 (2023).

Moramarco, F. et al. Towards more patient friendly clinical notes through language models and ontologies. AMIA Annu. Symp. Proc. AMIA Symp. 2021 , 881–890 (2021).

Bernstein, I. A. et al. Comparison of ophthalmologist and large language model Chatbot responses to online patient eye care questions. JAMA Netw. Open 6 , e2330320 (2023).

Hirosawa, T. et al. Diagnostic accuracy of differential-diagnosis lists generated by generative pretrained transformer 3 Chatbot for clinical vignettes with common chief complaints: a pilot study. Int. J. Environ. Res. Public. Health 20 , 3378 (2023).

Medical Specialties & Subspecialties | ABMS. American Board of Medical Specialties https://www.abms.org/member-boards/specialty-subspecialty-certificates/ . Accessed 19 Sep, 2024.

Schmidt, S., Zimmerer, A., Cucos, T., Feucht, M. & Navas, L. Simplifying radiologic reports with natural language processing: a novel approach using ChatGPT in enhancing patient understanding of MRI results. Arch. Orthop. Trauma Surg. 144 , 611–618 (2024).

Truhn, D. et al. A pilot study on the efficacy of GPT-4 in providing orthopedic treatment recommendations from MRI reports. Sci. Rep. 13 , 20159 (2023).

Sorin, V. et al. Large language model (ChatGPT) as a support tool for breast tumor board. NPJ Breast Cancer 9 , 44 (2023).

Allahqoli, L., Ghiasvand, M. M., Mazidimoradi, A., Salehiniya, H. & Alkatout, I. Diagnostic and management performance of ChatGPT in obstetrics and gynecology. Gynecol. Obstet. Invest. 88 , 310–313 (2023).

Yapar, D., Demir Avcı, Y., Tokur Sonuvar, E., Eğerci, Ö. F. & Yapar, A. ChatGPT’s potential to support home care for patients in the early period after orthopedic interventions and enhance public health. Jt. Dis. Relat. Surg. 35 , 169–176 (2024).

Huespe, I. A. et al. Clinical research with large language models generated writing—clinical research with AI-assisted writing (CRAW) study. Crit. Care Explor. 5 , e0975 (2023).

Shao, C.-Y. et al. Appropriateness and comprehensiveness of using ChatGPT for perioperative patient education in thoracic surgery in different language contexts: survey study. Interact. J. Med. Res. 12 , e46900 (2023).

Bazzari, F. H. & Bazzari, A. H. Utilizing ChatGPT in telepharmacy. Cureus 16 , e52365 (2024).

Qu, R. W., Qureshi, U., Petersen, G. & Lee, S. C. Diagnostic and management applications of ChatGPT in structured otolaryngology clinical scenarios. OTO Open 7 , e67 (2023).

Agarwal, M., Goswami, A. & Sharma, P. Evaluating ChatGPT-3.5 and Claude-2 in answering and explaining conceptual medical physiology multiple-choice questions. Cureus 15 , e46222 (2023).

Wilhelm, T. I., Roos, J. & Kaczmarczyk, R. Large language models for therapy recommendations across 3 clinical specialties: comparative study. J. Med. Internet Res. 25 , e49324 (2023).

Gilson, A. et al. How does ChatGPT perform on the United States Medical Licensing Examination (USMLE)? The implications of large language models for medical education and knowledge assessment. JMIR Med. Educ. 9 , e45312 (2023).

Dennstädt, F. et al. Exploring capabilities of large language models such as ChatGPT in radiation oncology. Adv. Radiat. Oncol. 9 , 101400 (2024).

Elyoseph, Z., Hadar-Shoval, D., Asraf, K. & Lvovsky, M. ChatGPT outperforms humans in emotional awareness evaluations. Front. Psychol. 14 , 1199058 (2023).

Sallam, M., Al-Salahat, K. & Al-Ajlouni, E. ChatGPT performance in diagnostic clinical microbiology laboratory-oriented case scenarios. Cureus 15 , e50629 (2023).

Varshney, D., Zafar, A., Behera, N. K. & Ekbal, A. Knowledge grounded medical dialogue generation using augmented graphs. Sci. Rep. 13 , 3310 (2023).

Choi, J. et al. Availability of ChatGPT to provide medical information for patients with kidney cancer. Sci. Rep. 14 , 1542 (2024).

Krishna, K. et al. LongEval: guidelines for human evaluation of faithfulness in long-form summarization. In Proc. 17th Conference of the European Chapter of the Association for Computational Linguistics (eds. Vlachos, A. & Augenstein, I.) 1650–1669 (Association for Computational Linguistics, Dubrovnik, Croatia, 2023). https://doi.org/10.18653/v1/2023.eacl-main.121 .

Moramarco, F. et al. Human evaluation and correlation with automatic metrics in consultation note generation. In Proc. 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (eds. Muresan, S., Nakov, P. & Villavicencio, A.) 5739–5754 (Association for Computational Linguistics, 2022).

Radford, A. et al. Learning transferable visual models from natural language supervision. Preprint at https://doi.org/10.48550/arXiv.2103.00020 (2021).

Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (eds. Burstein, J., Doran, C. & Solorio, T.) 4171–4186 (Association for Computational Linguistics, 2019).

Liu, Y. et al. Roberta: a robustly optimized BERT pretraining approach. Preprint at https://arxiv.org/abs/1907.11692 (2019).

Draschl, A. et al. Are ChatGPT’s free-text responses on periprosthetic joint infections of the hip and knee reliable and useful? J. Clin. Med. 12 , 6655 (2023).

Khlaif, Z. N. et al. The potential and concerns of using AI in scientific research: ChatGPT performance evaluation. JMIR Med. Educ. 9 , e47049 (2023).

Rogasch, J. M. M. et al. ChatGPT: can you prepare my patients for [18F]FDG PET/CT and explain my reports? J. Nucl. Med. 64 , 1876–1879 (2023).

Sallam, M., Barakat, M. & Sallam, M. A Preliminary Checklist (METRICS) to standardize the design and reporting of studies on generative artificial intelligence–based models in health care education and practice: development study involving a literature review. Interact. J. Med. Res. 13 , e54704 (2024).

Jenko, N. et al. An evaluation of AI generated literature reviews in musculoskeletal radiology. Surg. J. R. Coll. Surg. Edinb. Irel . 00008–8 (2024) https://doi.org/10.1016/j.surge.2023.12.005 .

Deiana, G. et al. Artificial intelligence and public health: evaluating ChatGPT responses to vaccination myths and misconceptions. Vaccines 11 , 1217 (2023).

Roosan, D. et al. Effectiveness of ChatGPT in clinical pharmacy and the role of artificial intelligence in medication therapy management. J. Am. Pharm. Assoc. 64 , 422–428.e8 (2024).

Ayub, I., Hamann, D., Hamann, C. R. & Davis, M. J. Exploring the potential and limitations of chat generative pre-trained transformer (ChatGPT) in generating board-style dermatology questions: a qualitative analysis. Cureus 15 , e43717 (2023).

An, Y., Fang, Q. & Wang, L. Enhancing patient education in cancer care: Intelligent cancer patient education model for effective communication. Comput. Biol. Med. 169 , 107874 (2024).

Babayiğit, O., Tastan Eroglu, Z., Ozkan Sen, D. & Ucan Yarkac, F. Potential use of ChatGPT for patient information in periodontology: a descriptive pilot study. Cureus 15 , e48518 (2023).

Gordon, E. B. et al. Enhancing patient communication With Chat-GPT in radiology: evaluating the efficacy and readability of answers to common imaging-related questions. J. Am. Coll. Radiol. 21 , 353–359 (2024).

Kuşcu, O., Pamuk, A. E., Sütay Süslü, N. & Hosal, S. Is ChatGPT accurate and reliable in answering questions regarding head and neck cancer? Front. Oncol . 13 , 1256459 (2023).

Iannantuono, G. M. et al. Comparison of large language models in answering immuno-oncology questions: a cross-sectional study. Oncologist oyae009 https://doi.org/10.1093/oncolo/oyae009 (2024).

Zhou, Y., Moon, C., Szatkowski, J., Moore, D. & Stevens, J. Evaluating ChatGPT responses in the context of a 53-year-old male with a femoral neck fracture: a qualitative analysis. Eur. J. Orthop. Surg. Traumatol. Orthop. Traumatol. 34 , 927–955 (2024).

Lahat, A., Shachar, E., Avidan, B., Glicksberg, B. & Klang, E. Evaluating the utility of a large language model in answering common patients’ gastrointestinal health-related questions: are we there yet? Diagnostics 13 , 1950 (2023).

Cadamuro, J. et al. Potentials and pitfalls of ChatGPT and natural-language artificial intelligence models for the understanding of laboratory medicine test results. An assessment by the European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) Working Group on Artificial Intelligence (WG-AI). Clin. Chem. Lab. Med. 61 , 1158–1166 (2023).

Nachalon, Y., Broer, M. & Nativ-Zeltzer, N. Using ChatGPT to generate research ideas in dysphagia: a pilot study. Dysphagia https://doi.org/10.1007/s00455-023-10623-9 (2023).

Yun, J. Y., Kim, D. J., Lee, N. & Kim, E. K. A comprehensive evaluation of ChatGPT consultation quality for augmentation mammoplasty: a comparative analysis between plastic surgeons and laypersons. Int. J. Med. Inf. 179 , 105219 (2023).

Sallam, M. et al. ChatGPT output regarding compulsory vaccination and COVID-19 Vaccine conspiracy: a descriptive study at the outset of a paradigm shift in online search for information. Cureus 15 , e35029 (2023).

Hristidis, V., Ruggiano, N., Brown, E. L., Ganta, S. R. R. & Stewart, S. ChatGPT vs Google for queries related to dementia and other cognitive decline: comparison of results. J. Med. Internet Res. 25 , e48966 (2023).

Al-Sharif, E. M. et al. Evaluating the accuracy of ChatGPT and Google BARD in fielding oculoplastic patient queries: a comparative study on artificial versus human intelligence. Ophthal. Plast. Reconstr. Surg. https://doi.org/10.1097/IOP.0000000000002567 (2024).

Kaneda, Y., Namba, M., Kaneda, U. & Tanimoto, T. Artificial intelligence in childcare: assessing the performance and acceptance of ChatGPT responses. Cureus 15 , e44484 (2023).

Song, H. et al. Evaluating the performance of different large language models on health consultation and patient education in urolithiasis. J. Med. Syst. 47 , 125 (2023).

Chee, J., Kwa, E. D. & Goh, X. ‘Vertigo, likely peripheral’: the dizzying rise of ChatGPT. Eur. Arch. Oto-Rhino-Laryngol. J. Eur. Fed. Oto-Rhino-Laryngol. Soc. EUFOS Affil. Ger. Soc. Oto-Rhino-Laryngol. Head. Neck Surg. 280 , 4687–4689 (2023).

Hillmann, H. A. K. et al. Accuracy and comprehensibility of chat-based artificial intelligence for patient information on atrial fibrillation and cardiac implantable electronic devices. Eur. Eur. Pacing Arrhythm. Card. Electrophysiol. J. Work. Groups Card. Pacing Arrhythm. Card. Cell. Electrophysiol. Eur. Soc. Cardiol . 26 , euad369 (2023).

Currie, G., Robbie, S. & Tually, P. ChatGPT and patient information in nuclear medicine: GPT-3.5 Versus GPT-4. J. Nucl. Med. Technol. 51 , 307–313 (2023).

Tie, X. et al. Personalized impression generation for PET reports using large language models. J. Imaging Inform. Med . https://doi.org/10.1007/s10278-024-00985-3 (2024).

Madrid-García, A. et al. Harnessing ChatGPT and GPT-4 for evaluating the rheumatology questions of the Spanish access exam to specialized medical training. Sci. Rep. 13 , 22129 (2023).

Cankurtaran, R. E., Polat, Y. H., Aydemir, N. G., Umay, E. & Yurekli, O. T. Reliability and usefulness of ChatGPT for inflammatory bowel diseases: an analysis for patients and healthcare professionals. Cureus 15 , e46736 (2023).

Sievert, M. et al. Risk stratification of thyroid nodules: assessing the suitability of ChatGPT for text-based analysis. Am. J. Otolaryngol. 45 , 104144 (2024).

Gobira, M. et al. Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation. Rev. Assoc. Med. Bras. (1992) 69 , e20230848 (2023).

Saibene, A. M. et al. Reliability of large language models in managing odontogenic sinusitis clinical scenarios: a preliminary multidisciplinary evaluation. Eur. Arch. Oto-Rhino-Laryngol. J. Eur. Fed. Oto-Rhino-Laryngol. Soc. EUFOS Affil. Ger. Soc. Oto-Rhino-Laryngol. Head. Neck Surg. 281 , 1835–1841 (2024).

Giannakopoulos, K., Kavadella, A., Aaqel Salim, A., Stamatopoulos, V. & Kaklamanos, E. G. Evaluation of the performance of generative AI large language models ChatGPT, Google Bard, and Microsoft Bing Chat in supporting evidence-based dentistry: comparative mixed methods study. J. Med. Internet Res. 25 , e51580 (2023).

Parasuraman, A., Berry, L. L. & Zeithaml, V. A. Refinement and reassessment of the SERVQUAL scale. J. Retail. 67 , 420 (1991).

Shoemaker, S. J., Wolf, M. S. & Brach, C. Development of the Patient Education Materials Assessment Tool (PEMAT): a new measure of understandability and actionability for print and audiovisual patient information. Patient Educ. Couns . 96 , 395–403 (2014).

Cheong, R. C. T. et al. Artificial intelligence chatbots as sources of patient education material for obstructive sleep apnoea: ChatGPT versus Google Bard. Eur. Arch. Otorhinolaryngol. 281 , 985–993 (2024).

Biggs, J. B. & Collis, K. F. Evaluating the Quality of Learning: The SOLO Taxonomy (Structure of the Observed Learning Outcome) (Academic Press, 2014).

Sinha, R. K., Roy, A. D., Kumar, N., Mondal, H. & Sinha, R. Applicability of ChatGPT in assisting to solve higher order problems in pathology. Cureus 15 , e35237 (2023).

Wang, R. Y. & Strong, D. M. Beyond accuracy: what data quality means to data consumers. J. Manag. Inf. Syst. 12 , 5–33 (1996).

Riedel, M. et al. ChatGPT’s performance in German OB/GYN exams–paving the way for AI-enhanced medical education and clinical practice. Front. Med . 10 , 1296615 (2023).

Sallam, M., Barakat, M. & Sallam, M. METRICS: establishing a preliminary checklist to standardize design and reporting of artificial intelligence-based studies in healthcare. JMIR Prepr . 10 , (2023).

Sallam, M., Barakat, M. & Sallam, M. Pilot testing of a tool to standardize the assessment of the quality of health information generated by artificial intelligence-based models. Cureus 15 , e49373 (2023).

Charnock, D., Shepperd, S., Needham, G. & Gann, R. DISCERN: an instrument for judging the quality of written consumer health information on treatment choices. J. Epidemiol. Community Health 53 , 105–111 (1999).

Seth, I. et al. Comparing the efficacy of large language models ChatGPT, BARD, and Bing AI in providing information on rhinoplasty: an observational study. Aesthet. Surg. J. Open Forum. 5 , ojad084 (2023).

Mu, X. et al. Comparison of large language models in management advice for melanoma: Google’s AI BARD, BingAI and ChatGPT. Ski. Health Dis. 4 , e313 (2024).

Xie, Y., Seth, I., Hunter‐Smith, D. J., Rozen, W. M. & Seifman, M. A. Investigating the impact of innovative AI chatbot on post‐pandemic medical education and clinical assistance: a comprehensive analysis. ANZ J. Surg. 94 , 68–77 (2024).

Anastasio, A. T., Mills, F. B. IV, Karavan, M. P. Jr & Adams, S. B. Jr Evaluating the quality and usability of artificial intelligence–generated responses to common patient questions in foot and ankle surgery. Foot Ankle Orthop. 8 , 24730114231209919 (2023).

Chou, R. et al. AHRQ Series Paper 4: assessing harms when comparing medical interventions: AHRQ and the Effective Health-Care Program. J. Clin. Epidemiol. 63 , 502–512 (2010).

Acknowledgements

Research reported in this article was supported by the University of Pittsburgh Momentum Funds, and the National Institutes of Health awards UL1 TR001857, U24 TR004111, and R01 LM014306. The sponsors had no role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; and in the decision to submit the paper for publication.

Author information

These authors contributed equally: Thomas Yu Chow Tam, Sonish Sivarajkumar.

Authors and Affiliations

Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA

Thomas Yu Chow Tam, Alisa V. Stolyar, Katelyn Polanska, Karleigh R. McCarthy, Hunter Osterhoudt, Xizhi Wu & Yanshan Wang

Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA

Sonish Sivarajkumar & Yanshan Wang

Department of Critical Care Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA

Sumit Kapoor

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA

Shyam Visweswaran & Yanshan Wang

Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA

Department of Clinical and Health Informatics, Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX, USA

Department of Anesthesiology, Cleveland Clinic, Cleveland, OH, USA

Piyush Mathur

BrainX AI ReSearch, BrainX LLC, Cleveland, OH, USA

Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA

Giovanni E. Cacciamani

Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA

Cong Sun & Yifan Peng

Hillman Cancer Center, University of Pittsburgh Medical Center, Pittsburgh, PA, USA

Yanshan Wang

Contributions

T.Y.C.T. and S.S. conceptualized, designed, and organized this study, analyzed the results, and wrote, reviewed, and revised the paper. S.K., A.V.S., K.P., K.R.M., H.O., and X.W. analyzed the results, and wrote, reviewed, and revised the paper. S.V., S.F., P.M., G.C., C.S., and Y.P. wrote, reviewed, and revised the paper. Y.W. conceptualized, designed, and directed this study, wrote, reviewed, and revised the paper.

Corresponding author

Correspondence to Yanshan Wang.

Ethics declarations

Competing interests.

P.M. has ownership and equity in BrainX, LLC, Y.W. has ownership and equity in BonafideNLP, LLC, and S.V. has ownership and equity in Kvatchii, Ltd., READE.ai, Inc., and ThetaRho, Inc. The other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Tam, T.Y.C., Sivarajkumar, S., Kapoor, S. et al. A framework for human evaluation of large language models in healthcare derived from literature review. npj Digit. Med. 7, 258 (2024). https://doi.org/10.1038/s41746-024-01258-7

Received: 04 May 2024

Accepted: 11 September 2024

Published: 28 September 2024

DOI: https://doi.org/10.1038/s41746-024-01258-7

Non-invasive approaches to hydration assessment: a literature review

  • Published: 26 September 2024
  • Volume 52, article number 132 (2024)

  • Achraf Tahar 1,
  • Hadil Zrour 1,
  • Stéphane Dupont 2 &
  • Agnieszka Pozdzik 3, 4

Traditional hydration assessment methods, while accurate, are often invasive and impractical for routine monitoring. In response, innovative non-invasive techniques such as bioelectrical impedance analysis (BIA), electrodermal activity (EDA), electrocardiogram (ECG) monitoring, and urine color charts have emerged, offering greater comfort and accessibility for patients. These methods use various types of sensors to capture a range of bio-signals, followed by machine learning-based classification or regression methods, providing real-time feedback on hydration status, which is crucial for effective management and prevention of urinary stones. This review explores the principles, applications, and efficacy of these non-invasive techniques, highlighting their potential to transform hydration monitoring in clinical and everyday settings. By facilitating improved patient compliance and enabling proactive hydration management, these approaches align with contemporary trends in personalized healthcare. This article presents a literature review on non-invasive approaches to hydration assessment, focusing on their significance in preventing kidney stone disease and enhancing kidney health.

Data availability

No datasets were generated or analysed during the current study.

Abbreviations

Bioelectrical impedance analysis

Electrodermal activity

Electrocardiogram

Kidney stone disease

Extracellular water

Intracellular water

Galvanic skin response

Bidirectional long short-term memory

K-nearest neighbors

Monitoring my dehydration

Heart rate variability

Before exercise

Post-exercise

After hydration

Standard deviation of RR intervals

Root mean square of successive RR interval differences

Support vector machine

Electric potential sensing

Body mass index

Hydration monitor

Photoplethysmography

Dehydration body monitor

Lotan Y, Daudon M, Bruyère F, Talaska G, Strippoli G, Johnson RJ, Tack I (2013) Impact of fluid intake in the prevention of urinary system diseases: a brief review. Curr Opin Nephrol Hypertens 22:S1

Williams JC, Gambaro G, Rodgers A et al (2021) Urine and stone analysis for the investigation of the renal stone former: a consensus conference. Urolithiasis 49:1–16

Pozdzik A, Grillo V, Sakhaee K (2024) Gaps in kidney stone disease management: from clinical theory to patient reality. Urolithiasis 52:61

Dawson CH, Tomson CR (2012) Kidney stone disease: pathophysiology, investigation and medical treatment. Clin Med 12:467–471

Wang J-S, Chiang H-Y, Chen H-L, Flores M, Navas-Acien A, Kuo C-C (2022) Association of water intake and hydration status with risk of kidney stone formation based on NHANES 2009–2012 cycles. Public Health Nutr 25:2403–2414

Dello Russo M, Formisano A, Lauria F et al (2023) Dietary Diversity and its association with diet quality and health status of European children, adolescents, and adults: results from the I.Family study. Foods 12:4458

Courbebaisse M, Travers S, Bouderlique E, Michon-Colin A, Daudon M, De Mul A, Poli L, Baron S, Prot-Bertoye C (2023) Hydration for adult patients with nephrolithiasis: specificities and current recommendations. Nutrients 15:4885

Gamage KN, Jamnadass E, Sulaiman SK, Pietropaolo A, Aboumarzouk O, Somani BK (2020) The role of fluid intake in the prevention of kidney stone disease: a systematic review over the last two decades. Turk J Urol 46:S92–S103

Mohammedin AS, AlSaid AH, Almalki AM, Alsaiari AR, Alghamdi FN, Jalalah AA, Alghamdi AF, Jatoi N-A (2022) Assessment of hydration status and blood pressure in a tertiary care hospital at Al-Khobar. Cureus 14:e27706

Sawka MN, Latzka WA, Matott RP, Montain SJ (1998) Hydration effects on temperature regulation. Int J Sports Med 19(Suppl 2):S108-110

Thornton SN (2016) Increased hydration can be associated with weight loss. Front Nutr. https://doi.org/10.3389/fnut.2016.00018

Tamborino F, Cicchetti R, Mascitti M et al (2024) Pathophysiology and main molecular mechanisms of urinary stone formation and recurrence. Int J Mol Sci 25:3075

Barley OR, Chapman DW, Abbiss CR (2020) Reviewing the current methods of assessing hydration in athletes. J Int Soc Sports Nutr 17:52

Alaslani R, Perzhilla L, Rahman MMU, Laleg-Kirati T-M, Al-Naffouri TY (2024) You can monitor your hydration level using your smartphone camera. https://doi.org/10.48550/arXiv.2402.07467

Noor Azhar M, Bustam A, Naseem FS, Shuin SS, Md Yusuf MH, Hishamudin NU, Poh K (2023) Improving the reliability of smartphone-based urine colorimetry using a colour card calibration method. Digit Health 9:20552076231154684

AlDisi R, Bader Q, Bermak A (2022) Hydration assessment using the bio-impedance analysis method. Sensors 22:6350

O’Brien C, Young AJ, Sawka MN (2002) Bioelectrical impedance to estimate changes in hydration status. Int J Sports Med 23:361–366

Liaqat S, Dashtipour K, Rizwan A, Usman M, Shah SA, Arshad K, Assaleh K, Ramzan N (2022) Personalized wearable electrodermal sensing-based human skin hydration level detection for sports, health and wellbeing. Sci Rep 12:3715

Kulkarni N, Compton C, Luna J, Alam MAU (2020) Monitoring my dehydration: a non-invasive dehydration alert system using electrodermal activity. https://doi.org/10.48550/arXiv.2009.13626

Rizwan A, Abu Ali N, Zoha A, Ozturk M, Alomaniy A, Imran M, Abbasi Q (2020) Non-invasive hydration level estimation in human body using galvanic skin response. IEEE Sens J. https://doi.org/10.1109/JSEN.2020.2965892

Alvarez A, Severeyn E, Velásquez J, Wong S, Perpiñan G, Huerta M (2019) Machine learning methods in the classification of the athletes dehydration. In: 2019 IEEE Fourth Ecuad. Tech. Chapters Meet. ETCM. pp 1–5

Rendon-Morales E, Roggen D, Prance H, Prance RJ (2015) Towards the correlation between human hydration and the electrical activity of the heart using Electric Potential Sensors. In: 2015 IEEE Sens. Appl. Symp. SAS. pp 1–5

Kaveh A, Chung W (2013) Classification of hydration status using electrocardiogram and machine learning. In: AIP Conference Proceedings (Vol. 1559, No. 1, pp. 240-249). American Institute of Physics

Mengistu Y, Pham M, Manh Do H, Sheng W (2016) AutoHydrate: a wearable hydration monitoring system. In: 2016 IEEERSJ Int. Conf. Intell. Robots Syst. IROS. pp 1857–1862

Rodin D, Shapiro Y, Pinhasov A, Kreinin A, Kirby M (2022) An accurate wearable hydration sensor: real-world evaluation of practical use. PLoS ONE 17:e0272646

Reljin N, Malyuta Y, Zimmer G, Mendelson Y, Blehar DJ, Darling CE, Chon KH (2018) Automatic Detection of dehydration using support vector machines. In: 2018 14th Symp. Neural Netw. Appl. NEUREL. pp 1–6

Chew N, Noor Azhar AM, Bustam A, Azanan MS, Wang C, Lum LCS (2020) Assessing dehydration status in dengue patients using urine colourimetry and mobile phone technology. PLoS Negl Trop Dis 14:e0008562

Mentes JC, Wakefield B, Culp K (2006) Use of a urine color chart to monitor hydration status in nursing home residents. Biol Res Nurs 7:197–203

Conroy DE, West AB, Brunke-Reese D, Thomaz E, Streeper NM (2020) Just-in-time adaptive intervention to promote fluid consumption in patients with kidney stones. Health Psychol Off J Div Health Psychol Am Psychol Assoc 39:1062–1069

Streeper NM, Fairbourn JD, Marks J, Thomaz E, Ram N, Conroy DE (2023) Feasibility of mini sipIT behavioral intervention to increase urine volume in patients with kidney stones. Urology 179:39–43

Conroy DE, Marks J, Cutshaw A, Ram N, Thomaz E, Streeper NM (2024) Promoting fluid intake to increase urine volume for kidney stone prevention: protocol for a randomized controlled efficacy trial of the sipIT intervention. Contemp Clin Trials 138:107454

Gray M, Birkenfeld JS, Butterworth I (2023) Noninvasive monitoring to detect dehydration: are we there yet? Annu Rev Biomed Eng 25:23–49

Johnson KB, Wei W, Weeraratne D, Frisse ME, Misulis K, Rhee K, Zhao J, Snowdon JL (2021) Precision medicine, AI, and the future of personalized health care. Clin Transl Sci 14:86–93

Scales CD, Desai AC, Harper JD et al (2021) Prevention of urinary stones with hydration (PUSH): design and rationale of a clinical trial. Am J Kidney Dis Off J Natl Kidney Found 77:898-906.e1

Aksenov LI, Streeper NM, Scales CD (2024) Leveraging behavioral modification technology for the prevention of kidney stones. Curr Opin Urol 34:14–19

Greenhalgh T, Thorne S, Malterud K (2018) Time to challenge the spurious hierarchy of systematic over narrative reviews? Eur J Clin Invest 48:e12931

Kamran F, Le VC, Frischknecht A, Wiens J, Sienko KH (2021) Noninvasive estimation of hydration status in athletes using wearable sensors and a data-driven approach based on orthostatic changes. Sensors 21:4469

Samoni S, Bonilla-Reséndiz LI (2019) Noninvasive methods of fluid status assessment in critically ill patients. Clinical Publishing, pp. 821–825.e2

Jaffrin MY, Morel H (2008) Body fluid volumes measurements by impedance: a review of bioimpedance spectroscopy (BIS) and bioimpedance analysis (BIA) methods. Med Eng Phys 30:1257–1269

Bari DS, Rammoo MNS, Aldosky HYY, Jaqsi MK, Martinsen ØG (2023) The five basic human senses evoke electrodermal activity. Sensors 23:8181

“Tobii Customer Portal.” [Online]. Available: https://connect.tobii.com . Accessed 06 Aug 2024

“GSR Devices, GSR Signals, Metrics and Applications | MM.” [Online]. Available: https://www.ashokcharan.com/Marketing-Analytics/bm-galvanic-skin-response.php#gsc.tab=0 . Accessed 06 Aug 2024

“Electrocardiogram (EKG/ECG),” Cleveland Clinic. [Online]. Available: https://my.clevelandclinic.org/health/diagnostics/16953-electrocardiogram-ekg . Accessed 07 Aug 2024

Sarvazyan AP, Tsyuryupa SN, Calhoun M, Utter A (2016) Acoustical method of whole-body hydration status monitoring. Acoust Phys 62:514–522

Mengistu Y, Pham M, Do HM, Sheng W (2016) AutoHydrate: a wearable hydration monitoring system. In: 2016 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1857–1862. https://doi.org/10.1109/IROS.2016.7759295

Quarti-Trevano F, Seravalle G, Dell’Oro R, Mancia G, Grassi G (2021) Autonomic cardiovascular alterations in chronic kidney disease: effects of dialysis, kidney transplantation, and renal denervation. Curr Hypertens Rep 23:10

de Moraes J et al (2018) Advances in photopletysmography signal analysis for biomedical applications. Sensors. https://doi.org/10.3390/s18061894

Feng Y, Fang G, Qu C, Cui S, Geng X, Gao D, Qin F, Zhao J (2022) Validation of urine colour L*a*b* for assessing hydration amongst athletes. Front Nutr. https://doi.org/10.3389/fnut.2022.997189

Belasco R, Edwards T, Munoz AJ, Rayo V, Buono MJ (2020) The effect of hydration on urine color objectively evaluated in CIE L*a*b* color space. Front Nutr 7:576974

“The science of nutrition and healthy eating: Week 3: 2 | OpenLearn - Open University.” Accessed: Aug. 27, 2024. [Online]. Available: https://www.open.edu/openlearn/mod/oucontent/view.php?id=72178&section=2 . Accessed 25 Aug 2024

Calhoun MC, Utter A, McAnulty SR, McBride JM, Zwetsloot J, Austin M, Mehlhorn JD, Sommerfield L, Tsyuryupa S, Sarvazyan A (2015) Validity of an acoustic method to assess whole-body hydration status. Proc Meet Acoust 23:020001

Kreinin A (2017) Study details | analysis of sweat secretion and body dehydration monitoring | ClinicalTrials.gov. https://clinicaltrials.gov/study/NCT03229109?cond=NCT03229109&rank=1 . Accessed 7 Sep 2024

Marks J, E. Conroy D, M. Streeper N (2023) CLINICAL TRIALS SipIT behavioral intervention clinical trial to increase fluid intake for kidney stone prevention - American Urological Association. https://auanews.net/issues/articles/2023/october-extra-2023/clinical-trials-sipit-behavioral-intervention-clinical-trial-to-increase-fluid-intake-for-kidney-stone-prevention . Accessed 7 Sep 2024

Author information

Authors and affiliations.

Department of Research, Development and Innovation, Renal Care and Research Srl, Rue Saint Martin 35, 1457, Walhain, Nil Saint Vicent, Belgium

Achraf Tahar & Hadil Zrour

Artificial Intelligence Research Unit (MAIA), Department of Computer Science, University of Mons, Avenue Maistriau15, 7000, Mons, Belgium

Stéphane Dupont

Kidney Stone Clinic, University Hospital Brugmann, Place A. Van Gehuchtenplein 4, 1020, Brussels, Belgium

Agnieszka Pozdzik

Faculty of Medicine, Université Libre de Bruxelles (ULB), Route de Lennik 808, 1070, Brussels, Belgium

Contributions

A.T. and A.P. wrote the main manuscript text. A.T. prepared figures. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Achraf Tahar or Agnieszka Pozdzik.

Ethics declarations

Conflict of interest.

H.Z. is currently employed at Renal Care & Research. A.P. serves on the board of Renal Care & Research.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Tahar, A., Zrour, H., Dupont, S. et al. Non-invasive approaches to hydration assessment: a literature review. Urolithiasis 52, 132 (2024). https://doi.org/10.1007/s00240-024-01630-y

Received: 03 September 2024

Accepted: 09 September 2024

Published: 26 September 2024

DOI: https://doi.org/10.1007/s00240-024-01630-y

  • Kidney stones
  • Water intake
  • Mobile application
  • Smart bottle

  • Systematic Review
  • Open access
  • Published: 27 September 2024

The dental needs of children with Epidermolysis Bullosa and service delivery: a scoping review

  • Z. Smith, ORCID: orcid.org/0000-0002-5575-1165 1,
  • S. Nath, ORCID: orcid.org/0000-0001-8714-7264 2,
  • M. Javanmard, ORCID: orcid.org/0000-0003-3675-9871 1 &
  • Y. Salamon, ORCID: orcid.org/0000-0002-1710-8056 1

BMC Oral Health volume 24, Article number: 1131 (2024)

Epidermolysis Bullosa (EB) is a genetic condition with fragility of the skin and oral mucosal lining requiring appropriate care and management by dental health professionals. The objective of this scoping review was to comprehensively examine the specialised dental needs of children with Epidermolysis Bullosa and map evidence towards the type, availability, and accessibility of specialised dental care services within various health care systems.

This scoping review was conducted using the JBI Methodology framework for scoping reviews. Five databases were used to source relevant literature: MEDLINE, Embase, Dentistry & Oral Sciences Source, Scopus, and Web of Science during the period 1963–2022.

Thirty-three published case reports covering 45 participants were identified. They describe the dental care and management of children aged 0–12 years diagnosed with EB, in Australian and international health care contexts. The findings reveal the need for greater awareness amongst health professionals of the management and specialised dental care needs of these children, and the need for further research and care pathways for children with EB.

There is a dearth of evidence which examines the dental needs of children, in particular referral pathways and timely access to dental health services and professionals. Dentists play an important role in monitoring and providing individualised and specialised oral care and treatment to the child with EB. It is vital that dentists as well as the wider multidisciplinary team have knowledge and understanding of the EB condition in meeting the specialised needs and management of these children.

Introduction

Epidermolysis Bullosa (EB) is a rare inherited disease in which the skin and mucosal membranes are damaged by minor trauma. The condition has thirty reported subtypes across four main classifications based on blister formation: EB Simplex (EBS), Junctional EB (JEB), Dystrophic EB (DEB) and Kindler EB (KEB) [1]. EB ranges from mild to severe and affects an estimated 500,000 people globally [1]. The condition is incurable and affects people from birth, with chronic fragility of the skin, blistering, ulceration, and trauma to the skin and mucosal membranes caused by minor injury, rubbing, friction, and heat [2, 3]. Babies born with this condition are commonly referred to as ‘butterfly children’ because their skin is thin, fragile, and translucent, like a butterfly’s wings [1]. Children with EB have been reported to experience traumatic stress reactions not only from their medical treatments but also from interactions with the health professionals providing painful treatments [4, 5]. Similarly, the daily treatment and management of EB has been reported to place strain on individuals, their families and those providing care [4, 5, 6].

Depending on the type of EB, the eyes, nails, and hair can also be affected, in addition to the mouth, gums, throat, oesophagus, stomach, and bladder [1]. For children, blistering and trauma to the oral mucosa can impair their ability to eat and to maintain healthy weight, nutrition, growth, and wound healing [7]. Patients with all four types of EB experience some degree of mouth ulceration. EBS is associated with milder oral cavity ulceration [1], whereas JEB, DEB and KEB bring additional problems of tooth enamel decay, tooth decay, overcrowding or misalignment of teeth, and oesophageal blistering [1]. The subtype KEB has further oral complications of gingivitis, tooth decay, loss of teeth and gingival enlargement (growth of the gum around the teeth) [8]. For the child with EB, general oral health care is complex, with a focus on preventative care, the management of oral hygiene and dental caries, and necessary tooth extractions [5, 9]. Dental sensitivity, pain and oral care in general are often overlooked for children with this condition. Several authors report that children are reluctant to carry out daily cleaning and do not adhere to the ongoing treatment and care recommended at dental visits, which increases the incidence of infection, cavities, inflammation of the gums and overall poor oral health [10, 11, 12]. For children with EB, regular in-chair dental treatment can be painful and traumatic and can cause further trauma and complications to the oral mucosa; many children refuse treatment because of fear, pain, and previous negative dental experiences [13].

Children with EB may undergo numerous invasive procedures, and their condition can be further compromised in the regular dental environment by non-compliance, pain, trauma, and further complications to the oral mucosa. Dental care for children with EB often has to be undertaken in the operating room under general anaesthetic, where specialised care and management can be provided in a safe and controlled environment. Little is understood about the current arrangement of dental services, or the accessibility and availability of healthcare services, for children with EB.

Current guidelines on dental care for EB patients focus on prevention and management through a shared care approach with the multidisciplinary team [8]. Referral to specialist dental services is often required for painful extractions or dental treatments that cannot be performed in a regular dental clinic. For many children, dental treatment is best undertaken in the perioperative setting with anaesthetic staff experienced in EB, because anaesthetic management can be hazardous, with issues such as difficulty establishing an airway during intubation and trauma to the airway [14, 15, 16]. Many children with EB have had successful surgical procedures under general anaesthetic, with new techniques for managing the airway intraoperatively in a controlled environment; these support long-term oral rehabilitation and limit further trauma to the mouth and skin during the perioperative period [17, 18, 19, 20].

Globally, there is extensive literature on challenging and complex dental EB cases and on the need for individualised dental care and management across the lifespan [20, 21, 22, 23, 24]. It is imperative that health services and schemes are available to assist patients with ongoing care requirements across the lifespan. In Australia, the National Disability Insurance Scheme (NDIS) supports children and adults with significant physical impairment from the severe types of EB, whilst those with milder forms are unsupported in meeting their specific care requirements [25]. There is therefore a need for greater attention to the dental needs and care requirements of the child with EB, in line with their developmental oral health needs. Dental guidelines for managing EB patients emphasise early access to dental services, with regular prevention and monitoring by a local dentist [8, 26]. In effect, the local dentist is the primary conduit for a shared care approach and for referral to specialised dentistry services should sedation or general anaesthesia be required, aiding the child’s ongoing management and improving oral health outcomes across the lifespan [21]. Access to regular and specialised dental services may not always be readily available, which may affect how children and families who require individualised preventative care access treatment to manage the condition and their long-term oral health. Improving dental care and services for children with EB is often overlooked and needs highlighting for the whole multidisciplinary health team. As treatment is often required early, it is important for all health professionals to be aware of EB and its potential impact on the child’s developmental phases, nutrition, healthy weight, growth, wound healing, speech, and oral health.

The purpose of this scoping review is to provide insight into the best evidence base for specialised dental care and management of children with EB during their pivotal developmental ages of 0–12 years, and to map evidence on the type, availability, and accessibility of specialised dental care services within various health care systems from the extant literature.

Scoping review questions

The following questions guided the scoping review:

What are the specialised dental needs of children with EB?

What is the availability and accessibility of specialised dental care services currently available for children impacted with EB?

Given the rarity of EB and the dearth of literature specifically exploring the dental care of children, a scoping review was judged the most suitable approach for addressing the research topic and questions from an international perspective. The review was conducted in accordance with the JBI methodology framework for scoping reviews and reported using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) [27, 28, 29, 30], to broadly explore and map current evidence from the extant literature.

Inclusion criteria

Participants.

In line with the review questions, we included children 0–12 years of age, both male and female, of any ethnicity, diagnosed with any type and form of EB (e.g., EB Simplex (EBS), Junctional EB (JEB), Dystrophic EB (DEB) and Kindler EB (KEB)).

Included were studies that examined EB dental treatments for paediatric patients requiring specialised dental care, treatment, or services in various health care settings (e.g., dental clinic, hospital setting, hospital clinic, operating room), inclusive of care provided by multidisciplinary health care professionals (e.g., dentists, dental nurses, dental surgeons and anaesthetists).

Studies from any geographical location or setting that reported on children 0–12 years of age with EB requiring specialised dental care, treatment or services were considered for inclusion. Studies that addressed aspects of ‘dental services’, ‘referral processes’ and ‘management of the paediatric EB patient’ were included.

Types of sources

This review considered all forms of primary studies: experimental and quasi-experimental study designs including randomized controlled trials, non-randomized controlled trials, before and after studies, interrupted time-series studies, qualitative studies, and text and opinion papers published in the English language.

Exclusion criteria

The following exclusion criteria were applied during the abstract, title and full-text review stages:

▪ Ineligible phenomena of interest or health condition

▪ Conference posters

▪ Ineligible age population e.g. studies focused on children more than 12 years.

▪ Studies published in another language without an English translation were excluded due to lack of time and cost of translation.

Search strategy

The search strategy aimed to locate both published and unpublished primary studies. An initial search of MEDLINE, Embase, Dentistry & Oral Sciences Source (DOSS), Scopus, and Web of Science was undertaken with a librarian during October 2022 to identify relevant text words and index terms for sourcing articles on the topic. The keyword search terms used for MEDLINE, grouped by concept, were:

▪ Exp epidermolysis bullosa OR epidermolysis bullosa.ti,ab OR EB.ti,ab OR bullous epidermolysis.ti,ab OR epidermoid bullosa.ti,ab

▪ Dental care.sh OR dental care for children.sh OR oral health.sh OR exp Surgery, Oral OR exp Oral Surgical Procedures OR exp Dentistry, Operative OR dentistry.sh OR exp "Oral and Maxillofacial Surgeons" OR exp tooth extraction OR exp dental clinics OR oral hygiene.sh OR dental*.ti,ab OR ((teeth OR tooth OR dental) adj2 (extraction* OR excision OR removal)).ti,ab OR oral health.ti,ab OR dental surg*.ti,ab OR dentist*.ti,ab OR teeth.ti,ab OR tooth.ti,ab OR oral maxillofacial.ti,ab OR ((hospital outpatient OR program*) adj3 (dental* OR oral OR dentist* OR teeth OR tooth OR extraction*)).ti,ab

▪ Exp child OR exp infant OR child*.ti,ab OR preschool*.ti,ab OR pediatric.ti,ab OR paediatric.ti,ab OR minor*.ti,ab OR infant*.ti,ab OR toddler*.ti,ab

▪ Exp Australia OR Australia*.ti,ab

These search terms and strings were then used to develop a full search strategy, including all identified keywords and index terms, which was adapted for each included database and information source. The databases searched were JBI Evidence Synthesis, Cochrane Database of Systematic Reviews, MEDLINE, Embase, Dentistry & Oral Sciences Source (DOSS), Scopus, and Web of Science. Sources of unpublished studies and gray literature, including Google Scholar and Open Grey, were also searched. Studies published in any language were included if also available in English. No date limits were applied other than including papers published up until September 2022.
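As an illustration only, the usual way of assembling concept-based search terms into a single query can be sketched in Python. The sketch assumes that terms within a concept are joined with OR and that concepts are joined with AND; the text above lists the terms but does not state how the concepts were combined. The helper names (`or_block`, `combine_blocks`) and the shortened term lists are hypothetical and are not the authors' full strategy.

```python
def or_block(terms):
    """Join the terms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def combine_blocks(blocks, operator="AND"):
    """Join several concept blocks with a Boolean operator (AND is an assumption)."""
    return f" {operator} ".join(or_block(block) for block in blocks)


# Shortened term lists taken from the MEDLINE terms above (illustrative subset only).
condition = ["exp epidermolysis bullosa", "epidermolysis bullosa.ti,ab", "EB.ti,ab"]
dental = ["dental care.sh", "oral health.sh", "dentist*.ti,ab"]
population = ["exp child", "exp infant", "child*.ti,ab"]

query = combine_blocks([condition, dental, population])
print(query)
```

Keeping each concept as its own list makes it straightforward to adapt the strategy for another database by swapping field tags or index terms within a single block.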

Study selection

Following the search, all identified citations were collated and uploaded into EndNote 20 (Clarivate Analytics, PA, USA) [ 31 ], and duplicates were removed prior to import into the JBI System for the Unified Management, Assessment and Review of Information (JBI SUMARI) (JBI, Adelaide, Australia) [ 32 ]. Following a pilot test, titles and abstracts were screened by three independent reviewers (ZS, MJ, YS) against the inclusion criteria, filtering out ineligible studies and those irrelevant to the review question. Studies put forward for full-text review were assessed in detail by two independent reviewers (ZS, YS) against the inclusion criteria; where consensus could not be reached, a third reviewer (MJ) was consulted. Excluded studies were recorded and reported, noting the reasons for exclusion.

Data extraction, analysis & presentation

Data were extracted from the papers by two reviewers (SN, ZS) using a data extraction tool developed by the reviewers, and the extracted information was checked for accuracy and completeness by ZS. Extracted data included specific details about the participants, concept, context, setting, study methods, and key findings relevant to the review question/s, and are presented in Table 1. The data collected from each included study were analysed by ZS and SN and are presented graphically and in tabular format, with a narrative summary of the tabulated results related to the review’s objective and questions, exploring the types of EB and the care and management of specialised dental services for children with EB.

The literature search identified 789 articles. After removing duplicates and articles that did not meet the inclusion criteria, 33 publications were considered for full-text review, and data extraction was carried out for these articles. All of the included publications were case reports or case reviews published from 1963 to 2022. The PRISMA-ScR flow diagram describes the study selection process (Fig. 1).

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) flow diagram [ 30 ]

Population characteristics

The total number of patients reported was 45, comprising 23 males and 22 females. Case reports came predominantly from the United States ( n = 10 ) [ 33 , 36 , 38 , 42 , 43 , 45 , 46 , 51 , 63 , 64 ], followed by Brazil with six case reports [ 44 , 50 , 52 , 55 , 58 , 60 ]. India [ 39 , 56 ], Iran [ 22 , 57 ], Turkey [ 54 , 59 ], and Taiwan [ 37 , 47 ] had two case reports each. European reports came from France [ 35 ], Germany [ 34 ], Italy [ 40 , 48 ], Russia [ 62 ], and the United Kingdom [ 49 ]. Australia had two publications on EB [ 41 , 53 ]. Patient ages ranged from newborn to 12 years (Fig. 2). All the studies reported on children 0–12 years of age, except for two studies that reported on both a child and an adult patient within their case reviews (highlighted in Table 1) [ 40 , 61 ].

Fig. 2 Patients per age

Oral manifestation of EB

According to the literature, there are four major types of epidermolysis bullosa: EB simplex, junctional EB, dystrophic EB, and Kindler syndrome [ 65 ]. In this review, EB simplex was described in seven case reports [ 37 , 41 , 42 , 47 , 58 , 59 , 62 ]. The Koebner subtype [ 37 ] and the herpetiformis (Dowling-Meara) subtype [ 47 ] were the reported subtypes of EB simplex. Dystrophic EB was the most commonly reported type of EB, with 23 case reports [ 33 , 34 , 36 , 38 , 22 , 39 , 40 , 44 , 46 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 60 , 61 , 63 , 64 ]. The dystrophic form has two main subtypes, dominant and recessive. The recessive type was more commonly reported ( n = 10 ), with two cases of the Hallopeau-Siemens subtype [ 34 , 40 ] and one of the Touraine subtype [ 45 ]. Only one report described mixed EB (Kindler syndrome) [ 35 ]. No oral manifestations of junctional EB were reported in our literature search.

The oral manifestations can significantly decrease quality of life. The most common intraoral features in all the reports were multiple bullae, erosions and/or vesicles on the oral mucosa, including sites such as the tongue, hard palate, gingiva, and buccal mucosa. Most patients experienced limited mouth opening or microstomia due to repeated blistering and healing, leading to scarring and contractures around the lips and mouth [ 34 , 36 , 39 , 44 , 46 , 50 , 53 , 54 , 61 ]. The absence of upper and lower frenum [ 39 ], and lingual papillae (ankyloglossia) [ 34 , 40 , 44 , 52 ] was also observed. A few patients had obliteration of the buccal and lingual vestibule [ 39 , 40 , 46 , 54 , 61 ]. The tongue showed a denuded appearance without papillae [ 44 , 57 , 60 ], and rugae were absent from the palate [ 54 , 61 ]. White lesions were observed on the tongue, gingiva, and buccal mucosa [ 37 , 39 , 47 , 52 , 54 ]. Pigmentation of the lips and angular cheilitis were also reported [ 35 ].

EB affected both the primary and permanent dentition. Maintaining proper oral hygiene is essential for overall health, and children with EB may struggle with it because of pain and limited mouth opening, leading to a higher risk of dental caries and periodontal disease. The dental findings included enamel hypoplasia [ 35 , 40 ] and enamel pitting [ 59 ], which progressed to carious teeth [ 33 , 34 , 36 , 37 , 38 , 40 , 43 , 44 , 45 , 46 , 48 , 50 , 53 , 54 ] and, in severe cases, led to dentoalveolar abscess formation [ 22 , 46 , 51 , 56 , 63 , 64 ]. The rapid progression of caries resulted in the deterioration of teeth, leaving only remnants of root fragments [ 37 ]. Eruption of permanent teeth was delayed. Features of Class II malocclusion were also observed, with signs of severe crowding, protrusion of incisors, and anterior open or deep bite [ 22 , 41 , 45 , 61 , 62 ]. The periodontal tissues were also affected, showing gingival inflammation or hyperplasia [ 59 ] and ulceration [ 60 ], causing gingivitis [ 34 , 45 , 48 ] and eventually progressing to periodontitis [ 35 ].

Healthcare treatment context & specialised treatment

Dental treatment for EB was predominantly provided in a hospital setting ( n = 18, 55% ) [ 33 , 22 , 35 , 36 , 37 , 38 , 40 , 42 , 43 , 45 , 46 , 47 , 49 , 50 , 54 , 56 , 63 , 64 ] or a dental hospital ( n = 14, 42% ) [ 9 , 34 , 39 , 44 , 48 , 51 , 52 , 53 , 55 , 57 , 58 , 59 , 60 , 62 ], with one report ( n = 1, 3% ) describing orthodontic treatment provided in a dental clinic setting [ 63 ]. Children requiring specialised treatment or interventions were managed within the general and dental hospitals: a general anaesthetic was required in ( n = 12 ) [ 34 , 36 , 37 , 38 , 43 , 45 , 46 , 49 , 50 , 56 , 63 , 64 ], a local anaesthetic in ( n = 5 ) [ 9 , 35 , 39 , 57 , 60 ], and both a local and a general anaesthetic in one case [ 22 ]. In addition, ( n = 2 ) patients had dental care provided with IV ketamine [ 33 , 51 ], another required topical local anaesthetic [ 52 ], and one report noted a combination of nitrous oxide, regional anaesthesia, and IV sedation with ketamine [ 53 ]. The remaining ( n = 11 ) were not reported to require anaesthesia for specialised treatment [ 40 , 41 , 42 , 44 , 47 , 48 , 54 , 55 , 58 , 59 , 62 ]. Children within this review therefore received a variety of anaesthetic approaches to ensure optimal care and outcomes.
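The counts reported in this section can be cross-checked with a short calculation. The sketch below assumes the denominator is the 33 publications taken through to data extraction; the text reports percentages without naming the denominator, so this is an inference. The dictionary keys are shorthand labels introduced here for illustration.

```python
TOTAL_REPORTS = 33  # assumed denominator: publications included for data extraction

# Counts as reported in the text; the labels are shorthand, not the authors' terms.
settings = {"hospital": 18, "dental hospital": 14, "dental clinic": 1}
anaesthesia = {
    "general": 12,
    "local": 5,
    "local and general": 1,
    "IV ketamine": 2,
    "topical local": 1,
    "combination": 1,
    "none reported": 11,
}


def percentages(counts, total):
    """Return each count as a whole-number percentage of the total."""
    return {label: round(100 * n / total) for label, n in counts.items()}


print(percentages(settings, TOTAL_REPORTS))
print(sum(settings.values()), sum(anaesthesia.values()))
```

Under this assumption both tallies sum to 33 and the rounded setting percentages agree with the reported 55%, 42% and 3%.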

Dental treatment outcomes

A range of dental treatments was reported, spanning preventative, diagnostic and restorative care (Fig. 3). Dental extraction under local or general anaesthesia was the most common treatment for severely decayed teeth [ 35 , 36 , 45 , 46 , 49 , 51 ], in some cases extending to full mouth extraction [ 33 , 50 ]. For mild to moderate dental caries, the teeth were restored with glass ionomer cement (GIC) [ 22 , 55 ], composite [ 38 , 64 ], or silver amalgam [ 34 , 38 , 53 , 59 , 63 ]; pulp therapy [ 37 , 45 , 64 ] or root canal treatment [ 44 , 53 ] was performed in severe cases. Five authors suggested using stainless steel crowns for restoring primary molars [ 38 , 45 , 46 , 63 , 64 ]. Galeotti et al. [ 40 ] reported using lasers to remove caries. As an adjunct to therapeutic and surgical treatment, oral hygiene therapy was performed [ 35 , 38 , 44 , 53 , 55 , 57 , 60 ], and fluoride gels [ 53 ], fluoride varnish [ 39 , 52 , 55 , 59 ] and fissure sealants [ 34 , 57 ] were used as preventative strategies. To rehabilitate missing dentition, authors suggested removable or fixed partial dentures for the permanent dentition [ 37 , 52 ], a Rochette bridge [ 53 ], and space maintainers for primary teeth [ 36 ]. Marini et al. [ 48 ] suggested home care methods such as topical application of sucralfate on the blisters, and Scheidt et al. [ 47 ] recommended using aloe vera gel. Three authors performed removable and fixed orthodontic treatment for malocclusion [ 35 , 41 , 62 ].

Fig. 3 Types of treatment

This scoping review focussed primarily on studies of children with EB, exploring their dental needs and the specialised treatments received. Given the rarity of this disease and the limited focus on the dental care needs of children, this scoping review maps the available evidence and highlights the complex dental and specialised care needs described across the forty-five cases of children with EB that informed it.

Individualised care & follow up

The need for individualised care was emphasised throughout the case reviews. A number of authors reported the need for dentists to provide continuous dental examinations from birth and throughout the lifespan to monitor, recognise and address dental issues as early as possible. Care in the early stages is therefore focussed predominantly on preventative measures commencing from birth [ 45 ]. Camm et al. [ 36 ] recommend triannual dental examinations, whilst other authors reported follow-up monthly [ 35 , 44 ] or every six months [ 22 ].

Dental compliance & health professional trust

This review has highlighted the complexity of dental care for children with EB and the need for routine care to manage dental symptoms before the eruption of the first tooth, with ongoing follow-up care through their oral health development milestones. As recognised within this review, these children may undergo numerous invasive procedures for various dental ailments that could not be managed in the regular dental environment because of complexity of care, non-compliance, pain, trauma, and further complications to their oral mucosa.

Dental compliance can be difficult for any child, and even more so for a child with EB, for whom daily preventative care as simple as brushing the teeth can cause painful intraoral blistering and is complicated by limited mouth opening [ 36 , 56 ]. Preventative care is also related to parental knowledge, understanding of how diet affects oral health, and compliance in monitoring dental hygiene at home. Eswara [ 39 ] highlighted this aspect, reporting parents who avoided brushing their child’s teeth until seven years of age to avoid pain and further intraoral blistering. Parents therefore play a pivotal role in the oral health of their child by encouraging regular oral hygiene, the use of soft toothbrushes, puree diets and supplements as required [ 52 ]. This is further supported by Torres et al. [ 60 ], who recommend diet counselling as a preventative measure to reduce potential oral health issues.

The need for timely access to dental services was also identified amongst the case reviews. The earlier study by Hochberg et al. [ 42 ] confirmed that many patients were not brought to the dentist until they required care to resolve a dental issue. This was the case for a child who, although diagnosed with EB from birth, was not seen by a dentist until 11 years of age, when he was flagged by the dermatologist as requiring urgent dental treatment [ 53 ]. Other children were seen from birth and followed through to specialised care such as orthodontic treatment [ 41 ] or innovative treatments such as sucralfate for pain and blisters [ 48 ].

For children with EB, developing trust in health professionals is important, particularly when undergoing painful procedures. For a child with EB requiring specialised care, meeting new dentists, oral surgeons, and other health professionals, as well as visiting new places such as an operating room or an outpatient clinic, can be a difficult and traumatic experience. To build trust and continuity of care, a few of the dentists within this review reported providing ongoing dental care for the child across the lifespan [ 35 , 54 ]. This may not always be possible when specialised care and referral elsewhere are required, depending on the needs of the child and compliance with treatment. This was the case when earlier authors reported limited cooperation by some children with dental therapy in the chair [ 33 , 58 ] and a preference for dental therapy involving procedures under anaesthesia [ 35 , 36 , 45 , 46 , 49 , 51 ].

Specialised care & treatment in the operating room

Several authors (dentists) discussed the need to monitor and minimise trauma when providing any form of treatment, because the fragility of the oral tissue leads to blistering [ 33 , 48 , 58 ]; less invasive procedures produced the best effects [ 40 ], and authors also focused on the overall safety and patient benefits of procedures in the operating setting [ 46 , 50 ]. Several of the case reviews reported that uncooperative children with EB required dental treatment in the operating room, where specialised care and management could be provided in a safe and controlled environment [ 45 , 54 , 63 ]. The risks of care under general anaesthesia were reported as primarily minor trauma to the airway and minor postoperative complications, as well as general trauma to the skin from inserting intravenous lines and using various tapes [ 36 ]. Dental extractions were the main treatment for EB reported in the operating setting, and this continues to be the main surgery type. This is consistent with the earlier report by Hubbert and Adams [ 43 ], which noted dental surgery and reconstructive surgery of the hands and fingers as prominent surgeries for EB within the operating room. Overall, access to specialised care and treatment via surgical intervention was effective in managing the child’s condition with minimal trauma and pain experienced.

Other minor forms of treatment were provided in hospital or dental clinics, with clinicians preferring outpatient settings to decrease the risk of patients developing secondary infections [ 42 ]. An important aspect of EB care is the need to monitor oral infections and blistering on a regular basis. Yoon and Ohkawa [ 64 ] recommend the use of topical antibiotics and oral antiseptics to assist in resolving secondary infections, whilst Hochberg et al. [ 42 ] reported the use of antibiotics before and after dental treatment.

Dental services, individualised care, continuity of care & referral processes

Little is currently understood about the arrangements or dental care pathways for patients with EB. As the reports show, each patient has had a unique journey through the health system, receiving care at varied ages. There is a dearth of information on the nature, accessibility, and availability of dental and healthcare services for children with EB, and on how initial referral is made within the multidisciplinary team. Across the case reviews examined, both internationally and within the Australian context, the process of referral is an unexplored phenomenon. The case review by Lindemeyer et al. [ 46 ] reported an international referral of two siblings, aged four and eight years, from Saudi Arabia for treatment in the United States, encompassing anaesthetic management during surgery.

Many authors emphasise that patients diagnosed with EB need a comprehensive dental care plan in conjunction with a conservative dental treatment plan, delivered through a multidisciplinary care team approach, to improve the quality of life for these children [ 44 , 57 , 58 ]. Surprisingly, none of the case reviews explored or outlined referral processes or care pathways for these children. Several authors do, however, emphasise the need to manage the complex conditions of each patient, accessibility to care in some reports, the decision-making processes in providing the best available care via intraoperative management, and safe and effective treatments across the lifespan [ 9 , 60 , 62 , 63 , 64 ]. Continuity of care requires further exploration, as some children had not seen a dentist for some time, which can restrict timely care to correct oral health issues.

Across the studies, accessibility to dental care was not reported as a dominant issue although there are differences and perhaps disparities in health services and accessibility to dental care services globally. Interestingly from an Australian context only two papers were identified exploring the dental management of EB patients [ 41 , 53 ]. A report outlining data from the EB national registry on the distribution of EB patients described a large number of people residing predominantly outside the major metropolitan areas with many living in rural and remote regions with limited access to health professionals and treatment [ 66 ].

Implications for practice & research

Many health professionals are unfamiliar with EB as a condition, as well as with the complexity of managing and treating patients. Several of the studies reviewed commented on the need for the multidisciplinary team to work together in providing comprehensive care to EB patients. This multidisciplinary team involves paediatricians, geneticists, dermatologists, gastroenterologists, paediatric dentists, oral surgeons, anaesthetists, mental health teams, dieticians, physiotherapists, and speech and language therapists across various settings [ 9 ]. Nurses play a pivotal role in providing care for EB patients within the hospital, perioperative and community settings; however, there is little literature on the specific aspects of oral and dental care that nurses provide for these patients. From a dentistry perspective, care can be complex and challenging. It is important to raise awareness of the dental needs of children with EB within the multidisciplinary team to ensure early referral, management, and specialised treatment. Dentists also need to have an awareness and understanding of the EB condition and its treatment, and to provide appropriate and timely referral to enhance patients’ oral outcomes and quality of life across the lifespan [ 6 , 44 , 57 , 58 ]. Whilst there are international practice guidelines [ 10 ], further research on the efficacy of services and accessibility to specialised dental services is needed to establish care pathways and accessible services for children from birth across their lifespan.

Limitations of the review

It is acknowledged that this review was limited by its focus on children and by the absence of research articles that met the review inclusion criteria. There were few research papers from an international context focused on EB and dentistry in children zero to 12 years of age. This review also excluded papers not published in English, which can be regarded as a limitation. Another significant limitation is the number of case reviews and reports included, which limits generalisability because these papers are predominantly case specific. On the other hand, a strength of this review was its ability to focus solely on children and to capture the data as reported in the individual case reviews from an international perspective. Overall, this review provides insight into the type of care provided to children, the context of this care, the treatment received and treatment outcomes, and the types of specialised dental care services accessed across different health care systems, and it highlights the importance of continuity of care and best practice towards optimal oral health.

EB is a condition which can affect the quality of life of children. The overall findings confirm that children with EB require ongoing dental monitoring and specialised care. As identified through the case reports, most children with this condition have ongoing needs and care requirements from the newborn period across their lifespan. The scoping review provides an insight into the need for further research. Greater attention is required on the dental needs of children, in particular referral and timely access to dental health professionals and services. The review raises awareness of EB and of the importance of health professionals and dentists working together to meet the specialised dental care needs of these children, to ensure they thrive and have a good quality of life.

Data availability

Data is provided within the manuscript.

Availability of data and materials

All data generated or analysed during this study are included in this published manuscript and its supplementary files.

DEBRA International. Austria. [cited 2022 October 2]. Available from: https://www.debra-international.org

Boeira VLSY, Souza ES, Rocha BO, et al. Inherited epidermolysis bullosa: Clinical and therapeutic aspects: Epidermólise bolhosa hereditária: Aspectos clínicos e terapêuticos. An Bras Dermatol. 2013;88:185–98.

Has C, Bauer J, Bodemer C, et al. Consensus reclassification of inherited epidermolysis bullosa and other disorders with skin fragility. Br J Dermatol. 2020;183:614–27.

Bodán RC. Reframing the Care of Children With Epidermolysis Bullosa Through the Lens of Medical Trauma. Journal of the Dermatology Nurses’ Association. 2020;12:16–23.

Korolenkova M, Poberezhnaya A, Starikova N, Udalova N, Dmitrieva N. Complex dental rehabilitation in children with dystrophic epidermolysis bullosa. Acta Derm Venereol. 2020;100:65–6.

Kearney S, Donohoe A, McAuliffe E. Living with epidermolysis bullosa: Daily challenges and health-care needs. Health Expect. 2020;23:368–76.

Kramer S. Chapter 1: General information on epidermolysis bullosa for the oral health care professional. Spec Care Dentist. 2020;40:9–12.

Krämer S, Lucas J, Gamboa F, et al. Clinical practice guidelines: Oral health care for children and adults living with epidermolysis bullosa. Spec Care Dentist. 2020;40:3–81.

Veliz S, Huber H, Yubero MJ, Fuentes I, Alsayer F, Kramer SM. Early teeth extraction in patients with generalized recessive dystrophic epidermolysis bullosa: A case series. Spec Care Dentist. 2020;40:561–5.

Kramer SM, Serrano MC, Zillmann G, et al. Oral Health Care for Patients with Epidermolysis Bullosa - Best Clinical Practice Guidelines. Int J Pediatr Dent. 2012;22:1–35.

Dag C, Bezgin T, Ozalp N. Dental management of patients with epidermolysis bullosa. Oral health and dental management. 2014;13:623–7.

Kosmidou A, Liversidge HM, Hector MP. Tooth formation in children with Epidermolysis Bullosa. J Dent Res. 2001;80:1165–1165.

Kramer S, Lucas J, Gamboa F, et al. CHAPTER 3: Oral health care and dental treatment for children and adults living with epidermolysis bullosa-Clinical practice guidelines. Spec Care Dentist. 2020;40:32–53.

Blazquez Gomez E, Garces Aleta A, Monclus Diaz E, Manen Berga F, Garcia-Aparicio L, Ontanilla LA. Anaesthetic management in a paediatric patient with a difficult airway due to epidermolysis bullosa dystrophica. Rev Esp Anestesiol Reanim. 2015;62:280–4.

Iohom G, Lyons B. Anaesthesia for children with epidermolysis bullosa: A review of 20 years’ experience. Eur J Anaesthesiol. 2001;18:745–54.

Griffin RP, Mayou BJ. The anaesthetic management of patients with dystrophic epidermolysis bullosa: A review of 44 patients over a 10 year period. Anaesthesia. 1993;48:810–5.

Korolenkova MV. Dental treatment in children with dystrophic form of epidermolysis bullosa. Stomatologiia. 2015;94:34–6.

Stevens P, Hustig A. 5-year review of airway management in children with Epidermolysis Bullosa at a tertiary paediatric centre. Trends in Anaesthesia and Critical Care. 2020;30: e156.

Lin Y-C, Golianu B. Anesthesia and pain management for pediatric patients with dystrophic epidermolysis bullosa. J Clin Anesth. 2006;18:268–71.

Wai C, Raghavendra T. Anaesthetic non-touch technique for butterfly children. Anaesthesia. 2014;69:120.

Colovic A, Jovicic O, Stevanovic R, Ivanovic M. Oral health status in children with inherited dystrophic epidermolysis bullosa. Vojnosanit Pregl. 2017;74:644–51.

Esfahanizade K, Mahdavi AR, Ansari G, Fallahinejad Ghajari M, Esfahanizadeh A. Epidermolysis bullosa, dental and anesthetic management: a case report. Journal of dentistry (Shiraz, Iran). 2014;15:147–52.

Agustin-Panadero R, Gomar-Vercher S, Penarrocha-Oltra D, Guzman-Letelier M, Penarrocha-Diago M. Fixed full-arch implant-supported prostheses in a patient with epidermolysis bullosa: a clinical case history report. Int J Prosthodont. 2015;28:33–6.

Baican A, Chiriac G, Torio-Padron N, Sitaru C. Childhood epidermolysis bullosa acquisita associated with severe dental alterations: A case presentation. J Dermatol. 2013;40:410–1.

DEBRA International. National Disability Insurance Scheme (NDIS) and Epidermolysis Bullosa (EB). https://www.debra.org.au/ndis/ .

Ogonowska A, Zadroga E. Dental needs and health behaviors in patients with epidermolysis bullosa - A survey. Dental and Medical Problems. 2016;53:103–10.

Peters MD, Marnie C, Tricco AC, et al. Updated methodological guidance for the conduct of scoping reviews. JBI evidence synthesis. 2020;18:2119–26.

Pollock D, Tricco AC, Peters MDJ, et al. Methodological quality, guidance, and tools in scoping reviews: a scoping review protocol. JBI Evidence Synthesis. 2022;20:1098–105.

Peters MD, Godfrey C, McInerney P, Baldini Soares C, Khalil H, Parker D. Scoping reviews. Joanna Briggs Institute reviewer’s manual. 2017;2015:1–24.

Tricco AC, Lillie E, Zarin W, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169:467–73.

The EndNote Team. EndNote. EndNote 20 edn. Philadelphia: Clarivate; 2013.

Munn Z, Aromataris E, Tufanaru C, et al. The development of software to support multiple systematic review types: the Joanna Briggs Institute System for the Unified Management, Assessment and Review of Information (JBI SUMARI). JBI Evidence Implementation. 2019;17:36–43.

Album MM, Gaisin A, Lee KW, Buck BE, Sharrar WG, Gill FM. Epidermolysis bullosa dystrophica polydysplastica. A case of anesthetic management in oral surgery. Oral Surg Oral Med Oral Pathol. 1977;43:859–72.

Azrak B, Kaevel K, Hofmann L, Gleissner C, Willershausen B. Dystrophic epidermolysis bullosa: oral findings and problems. Spec Care Dentist. 2006;26:111–5.

Blanchet I, Tardieu C, Casazza E. Oral Care in Kindler Syndrome: 7-Year Follow-up of 2 Brothers. J Clin Pediatr Dent. 2021;45:41–7.

Camm JH, Gray SE, Mayes TC. Combined medical-dental treatment of an epidermolysis bullosa patient. Spec Care Dentist. 1991;11:148–50.

Chuang LC, Hsu CL, Lin SY. A fixed denture for a child with epidermolysis bullosa simplex. Eur J Paediatr Dent. 2015;16:315–8.

Endruschat AJ, Keenen DA. Anesthetic and dental management of a child with epidermolysis bullosa dystrophica. Oral Surg Oral Med Oral Pathol. 1973;36:667–71.

Eswara U. Dystrophic epidermolysis bullosa in a child. Contemp Clin Dent. 2012;3:90–2.

Galeotti A, D’Antò V, Gentile T, et al. Er:YAG Laser Dental Treatment of Patients Affected by Epidermolysis Bullosa. Case Rep Dent. 2014;2014: 421783.

Goldschmied F. Orthodontic management of a patient with epidermolysis bullosa. Aust Orthod J. 1999;15:302–7.

Hochberg MS, Vazquez-Santiago IA, Sher M. Epidermolysis bullosa. A case report. Oral Surg Oral Med Oral Pathol. 1993;75:54–7.

Hubbert CH, Adams JG. Anesthetic management of patients with epidermolysis bullosa. South Med J. 1977;70:1375–7.

Kummer TR, Nagano HC, Tavares SS, Santos BZ, Miranda C. Oral manifestations and challenges in dental treatment of epidermolysis bullosa dystrophica. J Dent Child (Chic). 2013;80:97–100.

Lanier PA, Posnick WR, Donly KJ. Epidermolysis bullosa–dental management and anesthetic considerations: case report. Pediatr Dent. 1990;12:246–9.

Lindemeyer R, Wadenya R, Maxwell L. Dental and anaesthetic management of children with dystrophic epidermolysis bullosa. Int J Paediatr Dent. 2009;19:127–34.

Liu HH, Chen CJ, Miles DA. Epidermolysis bullosa simplex: review and report of case. ASDC J Dent Child. 1998;65:349–53.

Marini I, Vecchiet F. Sucralfate: a help during oral management in patients with epidermolysis bullosa. J Periodontol. 2001;72:691–5.

Marshall BE. A comment on epidermolysis bullosa and its anaesthetic management for dental operations. Br J Anaesth. 1963;35:724–7.

Mello BZ, Neto NL, Kobayashi TY, et al. General anesthesia for dental care management of a patient with epidermolysis bullosa: 24-month follow-up. Spec Care Dentist. 2016;36:237–40.

Morgan WC. Dental anesthetic management of epidermolysis bullosa: a new approach. Oral Surg Oral Med Oral Pathol. 1975;40:732–5.

Oliveira TM, Sakai VT, Candido LA, Silva SM, Machado MA. Clinical management for epidermolysis bullosa dystrophica. J Appl Oral Sci. 2008;16:81–5.

Olsen CB, Bourke LF. Recessive dystrophic epidermolysis bullosa. Two case reports with 20-year follow-up. Aust Dent J. 1997;42:1–7.

Pekiner FN, Yücelten D, Ozbayrak S, Sezen EC. Oral-clinical findings and management of epidermolysis bullosa. J Clin Pediatr Dent. 2005;30:59–65.

Silva LC, Cruz RA, Abou-Id LR, Brini LN, Moreira LS. Clinical evaluation of patients with epidermolysis bullosa: review of the literature and case reports. Spec Care Dentist. 2004;24:22–7.

Prabhu VR, Rekka P, Swathi S. Dental and anesthetic management of a child with epidermolysis bullosa. J Indian Soc Pedod Prev Dent. 2011;29:155–60.

Sanjari K, Bayani M, Zadeh HE. Conservative dental management of a patient with Epidermolysis bullosa: A case report. Pediatric Dental Journal. 2020;30:245–50.

Scheidt L, Sanabe ME, Diniz MB. Oral Manifestations and Dental Management of Epidermolysis Bullosa Simplex. Int J Clin Pediatr Dent. 2015;8:239–41.

Sipahier M. Epidermolysis bullosa: a case report. Quintessence Int. 1994;25:839–43.

Torres CP, Gomes-Silva JM, Mellara TS, Carvalho LP, Borsatto MC. Dental care management in a child with recessive dystrophic epidermolysis bullosa. Braz Dent J. 2011;22:511–6.

Véliz S, Huber H, Yubero MJ, Fuentes I, Alsayer F, Krämer SM. Early teeth extraction in patients with generalized recessive dystrophic epidermolysis bullosa: A case series. Spec Care Dentist. 2020;40:561–5.

Volovikov O, Velichko E, Razumova S, Said OB. The First Case Report about Noninvasive Impression Taking in Orthodontic Patient with Epidermolysis Bullosa. Journal of International Dental and Medical Research. 2021;14:1587–91.

Wright JT. Epidermolysis bullosa: dental and anesthetic management of two cases. Oral Surg Oral Med Oral Pathol. 1984;57:155–7.

Yoon RK, Ohkawa S. Management of a pediatric patient with epidermolysis bullosa receiving comprehensive dental treatment under general anesthesia. Pediatr Dent. 2012;34:251–3.

Fine J-D, Eady RA, Bauer EA, et al. The classification of inherited epidermolysis bullosa (EB): Report of the Third International Consensus Meeting on Diagnosis and Classification of EB. J Am Acad Dermatol. 2008;58:931–50.

Harris AG, Todes-Taylor NR, Petrović N, Murrell DF. The distribution of epidermolysis bullosa in Australia with a focus on rural and remote areas. Australas J Dermatol. 2017;58:122–5.

Acknowledgements

The authors would like to acknowledge Vikki Langdon, Faculty Librarian, Faculty of Health and Medical Sciences, The University of Adelaide for contribution to the initial database search strategy.

Conflicts of interest

The authors declare that there are no potential sources of conflict of interest towards this project.

Publication Funded by Adelaide Nursing School, Faculty of Health and Medical Sciences, The University of Adelaide.

Author information

Authors and affiliations.

Adelaide Nursing School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA, Australia

Z. Smith, M. Javanmard & Y. Salamon

Australian Research Centre for Population Oral Health, School of Dentistry, The University of Adelaide, Adelaide, SA, Australia

Contributions

ZS conceived and designed the scoping review, search strategy and project administration. ZS, MJ & YS contributed to article screening. SN and ZS contributed to data extraction and analysis and write up of results. ZS contributed to the write up of the manuscript and development of figures with SN, MJ & YS providing editorial review and approval of the final manuscript. All authors read and approved the final manuscript for submission.

Corresponding author

Correspondence to Z. Smith.

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article.

Smith, Z., Nath, S., Javanmard, M. et al. The dental needs of children with Epidermolysis Bullosa and service delivery: a scoping review. BMC Oral Health 24, 1131 (2024). https://doi.org/10.1186/s12903-024-04861-y

Received: 25 January 2024

Accepted: 04 September 2024

Published: 27 September 2024

DOI: https://doi.org/10.1186/s12903-024-04861-y

  • Epidermolysis bullosa
  • Specialist dental service
  • Health services

BMC Oral Health

ISSN: 1472-6831

Equity in Grant-Making: A Review of Barriers and Strategies for Funders Considering Improvement Opportunities

In 2023 the Chief Evaluation Office partnered with the Employment and Training Administration (ETA) to fund a study focused on exploring approaches to measure and increase equity in ETA’s discretionary grant-making programs. This study sought to explore how grant-makers – such as Federal agencies, State and local government agencies, and philanthropic organizations – define, assess, and increase equity in their grant-making process.

This study explores research and strategies related to equity in the discretionary grant-making process based on a review of publicly available literature and Federal agency Equity Action Plans as well as interviews with Federal and philanthropic grant-makers. The report describes how funders define equity in the context of awarding grants, common barriers and promising action steps to increase equity in each stage of the grant-making process (pre-award, collection of applications, funding of awards, and post-award), and measurement strategies to help funders track their progress.

This report can support a variety of grant-makers examining equity, whether at government agencies (including at Federal, State, and local levels) or foundations. Recognizing that grant-making organizations vary in size, policy area, and scope, the study team provides findings and suggestions that funders can tailor to meet their context and goals. The findings focus on domestic (U.S.-based) grant-making, though international or transnational grant-makers may also find useful insights.

Key takeaways include:

  • When selecting strategies to increase equity, grant-makers may invest time and resources to communicate the new approach to potential applicants and build trust, particularly with organizations and groups that provide services to underrepresented communities. For example, reviewed resources encourage funders to expand the networks they use to announce new funding opportunities and participate in community events. These trust-building activities may encourage new organizations to apply for grant programs and create space to provide feedback on challenging or inequitable aspects of the grant-making process. 
  • Study interviewees also emphasized the value of continued internal communications with funding staff to build organizational motivation to implement and refine equity initiatives. Communication efforts include describing goals and progress, holding training sessions to increase awareness of action steps, and sharing tools to streamline implementation and effect change.
  • By implementing strategies to increase equity in grant-making, funders take a critical step toward addressing systemic inequities in the type of organizations, individuals, and communities that receive grant funding.   

COMMENTS

  1. Types of Literature Review

    1. Narrative Literature Review. A narrative literature review, also known as a traditional literature review, involves analyzing and summarizing existing literature without adhering to a structured methodology. It typically provides a descriptive overview of key concepts, theories, and relevant findings of the research topic.

  2. Research Guides: Systematic Reviews: Types of Literature Reviews

    Rapid review. Assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research. Completeness of searching determined by time constraints. Time-limited formal quality assessment. Typically narrative and tabular.

  3. Literature Review: Types of literature reviews

    Narrative or traditional literature reviews. Critically Appraised Topic (CAT) Scoping reviews. Systematic literature reviews. Annotated bibliographies. These are not the only types of reviews of literature that can be conducted. Often the term "review" and "literature" can be confusing and used in the wrong context.

  4. Types of Literature Review

    The choice of a specific type depends on your research approach and design. The following types of literature review are the most popular in business studies: Narrative literature review, also referred to as traditional literature review, critiques literature and summarizes the body of a literature. Narrative review also draws conclusions about ...

  5. How to Write a Literature Review

    Examples of literature reviews. Step 1 - Search for relevant literature. Step 2 - Evaluate and select sources. Step 3 - Identify themes, debates, and gaps. Step 4 - Outline your literature review's structure. Step 5 - Write your literature review.

  6. Chapter 9 Methods for Literature Reviews

    9.3. Types of Review Articles and Brief Illustrations. EHealth researchers have at their disposal a number of approaches and methods for making sense out of existing literature, all with the purpose of casting current research findings into historical contexts or explaining contradictions that might exist among a set of primary research studies conducted on a particular topic.

  7. Literature Review: The What, Why and How-to Guide

    What kinds of literature reviews are written? Narrative review: The purpose of this type of review is to describe the current state of the research on a specific topic/research and to offer a critical analysis of the literature reviewed. Studies are grouped by research/theoretical categories, and themes and trends, strengths and weaknesses, and gaps are identified.

  8. What is a Literature Review?

    A literature review may itself be a scholarly publication and provide an analysis of what has been written on a particular topic without contributing original research. These types of literature reviews can serve to help keep people updated on a field as well as helping scholars choose a research topic to fill gaps in the knowledge on that topic.

  9. Types of reviews

    Types of reviews and examples. Definition: "A term used to describe a conventional overview of the literature, particularly when contrasted with a systematic review" (Booth et al., 2012, p. 265). Characteristics: Example: Mitchell, L. E., & Zajchowski, C. A. (2022). The history of air quality in Utah: A narrative review.

  10. Literature Review Types, Taxonomies

    Mapping Review (Systematic Map) - Map out and categorize existing literature from which to commission further reviews and/or primary research by identifying gaps in research literature. Meta-Analysis - Technique that statistically combines the results of quantitative studies to provide a more precise effect of the results.

  11. Literature Review

    Types of Literature Review are as follows: Narrative literature review: This type of review involves a comprehensive summary and critical analysis of the available literature on a particular topic or research question. It is often used as an introductory section of a research paper. Systematic literature review: This is a rigorous and ...

  12. Methodological Approaches to Literature Review

    This chapter discusses the methodological approaches to conducting a literature review and offers an overview of different types of reviews. There are various types of reviews, including narrative reviews, scoping reviews, and systematic reviews with reporting strategies such as meta-analysis and meta-synthesis.

  13. Research Guides: Literature Reviews: Choosing a Type of Review

    LITERATURE REVIEW. Often used as a generic term to describe any type of review. More precise definition: Published materials that provide an examination of published literature. Can cover wide range of subjects at various levels of comprehensiveness. Identifies gaps in research, explains importance of topic, hypothesizes future work, etc.

  14. Steps in Conducting a Literature Review

    A literature review is an integrated analysis, not just a summary, of scholarly writings and other relevant evidence related directly to your research question. That is, it represents a synthesis of the evidence that provides background information on your topic and shows an association between the evidence and your research question.

  15. How to Conduct a Literature Review: Types of Literature Reviews

    Integrative Review Considered a form of research that reviews, critiques, and synthesizes representative literature on a topic in an integrated way such that new frameworks and perspectives on the topic are generated. The body of literature includes all studies that address related or identical hypotheses.

  16. Types of Literature Reviews

    A literature review summarizes and synthesizes material on a research topic. It provides a summary of previous research and provides context for the material presented in your thesis. The literature review is your opportunity to show what you understand about your topic area, and distinguish previous research from the work you are doing.

  17. Ten Simple Rules for Writing a Literature Review

    When searching the literature for pertinent papers and reviews, the usual rules apply: be thorough, use different keywords and database sources (e.g., DBLP, Google Scholar, ISI Proceedings, JSTOR Search, Medline, Scopus, Web of Science), and look at who has cited past relevant papers and book chapters.

  18. Types of Literature Reviews

    Listed below are definitions of types of literature reviews: Argumentative Review. This form examines literature selectively in order to support or refute an argument, deeply embedded assumption, or philosophical problem already established in the literature. The purpose is to develop a body of literature that establishes a contrarian viewpoint.

  19. Types of Reviews and Their Differences

    There are many types of literature reviews. The purposes of a literature review will vary, and the sources used in one will depend on the discipline and the review's topic. Literature reviews may have differences that include: Purpose: The reason or objective of the review. One review may be to see how much has been published on a topic (a ...

  20. Systematic Reviews: Methods & Resources

    Gold-standard guideline on how to perform and write-up a systematic review and/or meta-analysis of the outcomes reported in multiple clinical trials of therapeutic interventions. AHRQ's Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Synthesis without meta-analysis (SWiM) in systematic reviews.

  21. Types of Reviews

    There are many types of reviews (narrative reviews, scoping reviews, systematic reviews, integrative reviews, umbrella reviews, rapid reviews and others) and it's not always straightforward to choose which type of review to conduct. These Review Navigator tools (see below) ask a series of questions to guide you through the various kinds of reviews and to help you determine the best choice ...

  22. 14 Types Of Literature Review

    4 Major Types Of Literature Review. The four major types include, Narrative Review, Systematic Review, Meta-Analysis, and Scoping Review. These are known as the major ones because they're like the "go-to" methods for researchers in academic and research circles. Think of them as the classic tools in the researcher's toolbox.

  23. Literature review as a research methodology: An overview and guidelines

    This is why the literature review as a research method is more relevant than ever. Traditional literature reviews often lack thoroughness and rigor and are conducted ad hoc, rather than following a specific methodology. Therefore, questions can be raised about the quality and trustworthiness of these types of reviews.

  24. LibGuides: SOC 200

    A review of the literature must be differentiated from a HISTORICAL ARTICLE on the same subject, but a review of historical literature is also within the scope of this publication type. * Lit reviews aren't always obviously labeled "literature review"; they may be embedded within sections such as the introduction or background.

  25. Decision threshold models in medical decision making: a scoping

    Decision thresholds play an important role in medical decision-making. Individual decision-making differences may be attributable to differences in subjective judgments or cognitive processes that are captured through the decision thresholds. This systematic scoping review sought to characterize the literature on non-expected utility decision thresholds in medical decision-making by identifying ...

  26. A framework for human evaluation of large language models in ...

    The literature review revealed a diverse range of medical specialties leveraging LLMs, with Radiology the leading specialty. Urology and General Surgery also emerged as prominent specialties ...

  27. Non-invasive approaches to hydration assessment: a literature review

    These methods use various types of sensors to capture a range of bio-signals, followed by machine learning-based classification or regression methods, providing real-time feedback on hydration status, which is crucial for effective management and prevention of urinary stones. ... In this literature review, we will explore these novel approaches ...

  28. The dental needs of children with Epidermolysis Bullosa and service

    Epidermolysis Bullosa (EB) is a genetic condition with fragility of the skin and oral mucosal lining requiring appropriate care and management by dental health professionals. The objective of this scoping review was to comprehensively examine the specialised dental needs of children with Epidermolysis Bullosa and map evidence towards the type, availability, and accessibility of specialised ...

  29. Equity in Grant-Making: A Review of Barriers and Strategies for Funders

    In 2023 the Chief Evaluation Office partnered with the Employment and Training Administration (ETA) to fund a study focused on exploring approaches to measure and increase equity in ETA's discretionary grant-making programs. This study sought to explore how grant-makers - such as Federal agencies, State and local government agencies, and philanthropic organizations - define, assess, and ...