What Is the Evidence Pyramid for Systematic Reviews? (Beginner Guide)

May 11, 2026 | 6 min read | By George Burchell
TL;DR

The evidence pyramid ranks study designs by their typical risk of bias. It is useful for planning and interpretation, but it is not a quality score for individual studies.

Key rule:

  • level of evidence = starting design strength
  • quality/risk of bias = how well a specific study was actually done

What the evidence pyramid is

The evidence pyramid is a hierarchy of study designs. For intervention questions, it usually places:

  • systematic reviews/meta-analyses and RCTs near the top
  • observational studies in middle layers
  • case reports and expert opinion near the bottom

It helps you anticipate how strong causal claims can be.

One sentence for your protocol

Use this wording in plain language: “Design hierarchy tells us where causal inference usually starts; risk-of-bias and certainty methods tell us where we end up after reading the actual studies.”


Why beginners find it useful

It helps with four early decisions:

  1. which designs to prioritize in inclusion criteria
  2. what to expect from the literature landscape
  3. how to frame confidence in conclusions
  4. how to structure evidence tables for synthesis

If you are new to full workflow design, pair this guide with “What Are the 7 Steps of a Systematic Review?”


Not all pyramids are the same diagram

You may see slightly different “layers” depending on the source (teaching diagrams vs formal levels-of-evidence frameworks). Common ideas still apply:

  • Teaching pyramids simplify messaging: RCTs and systematic reviews float upward; mechanistic and narrative evidence sink downward.
  • Oxford CEBM 2009 levels (and later updates) formalize levels of evidence for questions beyond treatment effects—diagnosis, prognosis, harm, and economic evidence each shift which designs sit where.

For a single intervention review, the practical question is narrower: which eligible designs can support the causal claim your review question implies? Everything else is commentary unless your scope explicitly widens.


When the pyramid shape changes (same idea, different ordering)

| Review question type | Why the default “RCT on top” picture can mislead |
| --- | --- |
| Treatment efficacy / harm | RCTs and high-quality overviews often lead, but harms and long-term outcomes still need careful appraisal. |
| Diagnostic accuracy | Cross-sectional or cohort designs comparing test to reference standard may dominate; “RCT at top” is the wrong mental model. |
| Prognosis | Longitudinal cohorts may be the best available evidence even when no trial exists. |
| Qualitative evidence | Synthesis of qualitative studies answers different questions; design “levels” from trials do not replace methodological rigor frameworks for qualitative work. |

Takeaway: the pyramid is a map of typical starting strength, not a universal ladder stamped on every clinical question.
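
One way to keep this question-type dependence explicit in a review workflow is a small lookup that refuses to guess. The sketch below is illustrative only: the design names come from the table above, while the dictionary keys and the function name are hypothetical labels, not part of any formal framework.

```python
from typing import Dict, List

# Illustrative lookup: designs that typically lead for each question type.
# Keys are hypothetical labels; extend or reorder them to match your protocol.
TYPICAL_LEADING_DESIGNS: Dict[str, List[str]] = {
    "treatment": ["systematic review", "RCT"],
    "diagnostic_accuracy": ["cross-sectional", "cohort"],
    "prognosis": ["longitudinal cohort"],
}


def leading_designs(question_type: str) -> List[str]:
    """Return the designs that typically lead for a question type.

    Unknown types raise ValueError rather than silently falling back
    to the "RCT on top" picture.
    """
    try:
        return TYPICAL_LEADING_DESIGNS[question_type]
    except KeyError:
        raise ValueError(
            f"No design ordering defined for {question_type!r}"
        ) from None


print(leading_designs("diagnostic_accuracy"))
```

Qualitative evidence is deliberately left out of the lookup, so asking for it raises an error; that mirrors the point that trial-based design levels do not transfer to qualitative synthesis.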


Where people misuse it

Mistake 1: assuming top-level design means high quality

A weakly run RCT can be less trustworthy than a strong cohort study.

Mistake 2: dismissing lower-level evidence completely

For rare conditions or early technologies, lower-level evidence may be the only available evidence.

Mistake 3: using hierarchy as the final verdict

Design hierarchy should be combined with risk-of-bias assessment and certainty reasoning.

Mistake 4: counting studies instead of information

Ten small, biased trials do not automatically outrank one well-conducted study that directly matches your PICO and outcomes.


Pyramid vs quality assessment

Think of it like this:

  • pyramid = map of where evidence starts
  • risk-of-bias tool = check of how evidence was produced
  • certainty approach (for example GRADE) = integrated confidence judgment

For tool selection, see “How to Choose the Right Quality Assessment Tool.”


Glossary (quick reference)

| Term | Plain-language meaning |
| --- | --- |
| Study design | Planned structure of the study (trial, cohort, case-control, etc.). |
| Risk of bias | Systematic ways the study could deviate from the truth (selection, performance, detection, attrition, reporting, confounding). |
| Precision | How tight estimates are; often driven by sample size and event counts. |
| Directness | How closely outcomes, populations, interventions, and comparators match what you care about in practice. |
| Consistency | Whether separate studies point in the same direction. |

How to use the pyramid in real reviews

During protocol design

Specify which designs are eligible and why.

During screening

Tag study design early to understand evidence mix.
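
As a rough illustration of tagging design at screening, the sketch below counts design tags to show the evidence mix and lists records whose design could not be determined from the abstract. The record IDs, the tag vocabulary, and the "unclear" label are all hypothetical.

```python
from collections import Counter

# Hypothetical screening records: (record id, design tag assigned at screening)
screened = [
    ("rec-001", "RCT"),
    ("rec-002", "cohort"),
    ("rec-003", "cohort"),
    ("rec-004", "case report"),
    ("rec-005", "unclear"),
]

# Evidence mix: how many records carry each design tag
design_mix = Counter(tag for _, tag in screened)

# Records whose design needs confirming at full-text review
needs_full_text_check = [rec for rec, tag in screened if tag == "unclear"]

print(design_mix.most_common())
print(needs_full_text_check)
```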

During extraction

Capture design and key quality fields in the same table.

During synthesis

Interpret effects in light of design strength and bias risk, not just p-values.

Linking hierarchy to evidence tables

In practice, teams that keep the design label separate from the appraisal result avoid the most common pyramid mistakes:

  • One column (or structured field) for design type (RCT, non-randomized comparative study, cohort, etc.).
  • Separate fields for risk-of-bias domain judgments and an overall appraisal.
  • Outcome-level notes where indirectness or imprecision matters for GRADE-style reasoning.

That structure mirrors how Evidence Table Builder is intended to be used: keep extraction structured enough that downstream quality and certainty judgments are traceable in the same row as the effect estimate or narrative finding.
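
A minimal sketch of such a row, assuming a simple in-memory structure, might look like the following. The class name, field names, and study IDs are hypothetical and are not the schema of Evidence Table Builder or any other tool; the point is only that design and appraisal live in separate fields of the same row.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EvidenceRow:
    """One row of an evidence table (hypothetical schema for illustration)."""
    study_id: str
    design: str                           # design label only, e.g. "RCT", "cohort"
    finding: Optional[str] = None         # effect estimate or narrative finding
    rob_domains: Dict[str, str] = field(default_factory=dict)  # domain -> judgment
    rob_overall: Optional[str] = None     # overall appraisal, recorded separately
    outcome_notes: str = ""               # indirectness / imprecision notes

    def is_appraised(self) -> bool:
        """True only when an overall judgment exists; design alone never counts."""
        return self.rob_overall is not None


rows = [
    EvidenceRow("trial_a", "RCT",
                rob_domains={"randomization": "high"}, rob_overall="high"),
    EvidenceRow("cohort_b", "cohort",
                rob_domains={"confounding": "low"}, rob_overall="low"),
    EvidenceRow("trial_c", "RCT"),  # design tagged, appraisal still pending
]

# An RCT label does not make a row "done": trial_c still needs appraisal.
pending = [row.study_id for row in rows if not row.is_appraised()]
print(pending)  # ['trial_c']
```

Note that `trial_a` is an RCT with a high overall risk of bias while `cohort_b` is a cohort with a low one, which is exactly the situation a single "level of evidence" column would hide.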


Quick practical checklist

  • Have we defined eligible design types in protocol?
  • Do we track design type as a structured field?
  • Do we assess risk of bias separately from design level?
  • Are conclusions aligned with certainty, not only study count?
  • Have we stated which “pyramid” logic applies to our question type (treatment vs diagnosis vs prognosis)?

If any answer is no, your hierarchy logic is probably incomplete.
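
For teams that keep protocol checks in a script, the checklist above can be expressed as a simple set of yes/no answers. This is only a sketch: the keys and the function name are hypothetical, and an unanswered item is treated the same as "no".

```python
from typing import Dict, List

# Checklist items from the list above, keyed by hypothetical short names
CHECKLIST: Dict[str, str] = {
    "designs_defined_in_protocol": "Eligible design types are defined in the protocol",
    "design_is_structured_field": "Design type is tracked as a structured field",
    "rob_separate_from_design": "Risk of bias is assessed separately from design level",
    "conclusions_follow_certainty": "Conclusions are aligned with certainty, not only study count",
    "question_type_logic_stated": "The pyramid logic for the question type is stated",
}


def unmet_items(answers: Dict[str, bool]) -> List[str]:
    """Return checklist items answered "no" or left unanswered."""
    return [text for key, text in CHECKLIST.items() if not answers.get(key, False)]


answers = {
    "designs_defined_in_protocol": True,
    "design_is_structured_field": True,
    "rob_separate_from_design": False,
    "conclusions_follow_certainty": True,
    # "question_type_logic_stated" is deliberately left unanswered
}

gaps = unmet_items(answers)
for item in gaps:
    print("Incomplete:", item)
```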


Frequently asked questions

Does “higher on the pyramid” mean “include it” and “lower” mean “exclude it”?
No. Eligibility is a protocol decision. Some reviews intentionally include lower-level evidence when trials are sparse, with explicit rules for interpretation.

Should we use the pyramid instead of RoB 2 or ROBINS-I?
No. The pyramid is not a substitute for domain-based risk-of-bias tools. At best it informs which tool family you are likely to need.

Do systematic reviews always sit “above” RCTs?
Not automatically. A systematic review is only as trustworthy as its included studies, methods, and risk-of-bias assessments across studies. A bad overview can mislead faster than a single good trial.

Where does GRADE fit?
GRADE (or similar) integrates risk of bias, inconsistency, indirectness, imprecision, and publication bias for quantitative estimates—exactly the dimensions the pyramid does not encode by itself.
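
To make the relationship concrete, here is a deliberately simplified sketch of GRADE-style reasoning: design sets the starting level (randomized evidence starts high, observational evidence starts low), and concerns in the five domains move it down. Treat it as an illustration of the idea, not as GRADE itself: real GRADE ratings are structured judgments rather than arithmetic, and the criteria for rating up observational evidence are omitted here. The function and variable names are hypothetical.

```python
from typing import Dict

LEVELS = ["very low", "low", "moderate", "high"]

# Starting point by body of evidence: this is the part the pyramid encodes.
START = {"randomized": "high", "observational": "low"}

# The five downgrading domains named in the answer above.
DOMAINS = (
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
)


def sketch_certainty(body_of_evidence: str, downgrades: Dict[str, int]) -> str:
    """Apply per-domain downgrades (0, 1, or 2 levels each) to the starting level."""
    unknown = set(downgrades) - set(DOMAINS)
    if unknown:
        raise ValueError(f"Unknown domains: {sorted(unknown)}")
    if any(steps not in (0, 1, 2) for steps in downgrades.values()):
        raise ValueError("Each domain can be downgraded by 0, 1, or 2 levels")
    index = LEVELS.index(START[body_of_evidence]) - sum(downgrades.values())
    return LEVELS[max(index, 0)]  # certainty cannot drop below "very low"


# Randomized evidence with serious bias and imprecision concerns
print(sketch_certainty("randomized", {"risk_of_bias": 1, "imprecision": 1}))  # low
# Observational evidence with no concerns recorded stays at its starting level
print(sketch_certainty("observational", {}))  # low
```

Both calls end at the same level even though they start at different points on the pyramid, which is the distinction between where evidence starts and where it ends up.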


Final thought

The evidence pyramid is a useful starting framework, not a decision machine. Use it to orient your review, then combine it with structured extraction and quality appraisal to make defensible conclusions.

Tags: evidence pyramid, evidence hierarchy, systematic reviews, beginner methods, evidence tables

About the Author

George Burchell

George Burchell is a specialist in systematic literature reviews and scientific evidence synthesis with significant expertise in integrating advanced AI technologies and automation tools into the research process. With over four years of consulting and practical experience, he has developed and led multiple projects focused on accelerating and refining the workflow for systematic reviews within medical and scientific research.

Systematic Reviews, Evidence Synthesis, AI Research Tools, Research Automation