PICO Prediction: How good is off-the-shelf AI?

The first EU Joint Clinical Assessment for an oncology drug is due in weeks. Time to ask: What would an off-the-shelf AI have predicted? We ran the experiment.

The setup

We gave both Claude and Gemini four inputs:

Product name: tovorafenib (Ojemda)
Trials: The FIREFLY-1 trial (NCT04775485, Kilburn et al., Nature Medicine 2023)
Label: FDA label information
The February 2026 CHMP positive opinion

No curated databases. No expert annotation. No proprietary data. Simple prompts (see section below).

The task: predict the consolidated PICO scope and the (confidential) results of the survey (requested PICOs) across all 27 EU member states.

Results

Both models converged on the same core structure: a fusion population vs chemotherapy PICO that every member state would request, and a V600E population vs dabrafenib + trametinib (a first-line therapy?).

Claude's predicted PICOs for tovorafenib. Note the V600 population split across two comparators and the off-label MEKi PICO driven by reference-centre practice in Germany, France and the Netherlands. — Claude’s predicted PICOs for tovorafenib. Note the V600 population split across two comparators and the off-label MEKi PICO driven by reference-centre practice in Germany, France and the Netherlands.

Where they diverged: Gemini identified an NF1 subpopulation as a separate PICO, requesting selumetinib as comparator in markets with active NF1 reimbursement infrastructure (is that realistic?). Claude split the V600 population across two comparators and added an off-label MEKi PICO driven by academic reference centre practice in Germany, France and the Netherlands.

Gemini's predicted PICOs for tovorafenib. Note the NF1 subpopulation as a separate PICO with selumetinib as comparator in markets with active NF1 reimbursement infrastructure. — Gemini’s predicted PICOs for tovorafenib. Note the NF1 subpopulation as a separate PICO with selumetinib as comparator in markets with active NF1 reimbursement infrastructure.

On evidence sources: both models identified that no direct head-to-head trial evidence exists. Every predicted PICO relies on indirect or historical comparison. The models identified plausible sources (TADPOLE, SIOPE-LGG 2004, Bouffet 2012 vinblastine series). Did they weight them correctly?

Is this how one would do it in practice? Clearly NOT!

A properly designed AI-supported PICO prediction system would look quite different:

A clearly defined, high-quality evidence base; structured, curated, and expert-annotated, not an opaque general-purpose training corpus plus uncontrolled agentic data pulls
A system that steps through the reasoning transparently and allows experts to intervene at each stage
A system that explicitly flags uncertainty and stops when the evidence base is insufficient to support a prediction
Expert-judgement in the driver seat, AI as the engine

What we ran here is none of those things. So it’s not surprising that the results are questionable. But let’s see how far an off-the-shelf approach gets us, and where and how it fails. We’ll know soon.

Prompts

The prompt consists of two steps: first, predict the results of the per-country PICO survey.

Step 1: Predict the pre-country PICO survey

You are a health technology assessment expert specialising in EU market access
and the Joint Clinical Assessment (JCA) process under Regulation (EU) 2021/2282.

TASK: Analyse the treatment landscape for the following product and indication
across EU member states, focusing on what comparator therapies are likely to be requested in a JCA PICO scoping process.

PRODUCT: Tovorafenib (Ojemda)
INDICATION:
Ojemda is indicated as monotherapy for the treatment of patients 6 months of age and older with paediatric low‑grade glioma (LGG) harbouring a BRAF fusion or rearrangement, or BRAF V600 mutation, who have progressed after one or more prior systemic therapies.

Analyse the following for each of these markets: Germany, France, Italy, Spain,
Netherlands, Sweden, Denmark, Poland, and a "smaller EU markets" cluster
(Baltics, CEE, Cyprus, Malta).
For each market, identify:
1. The current standard of care for this indication (first line, second line,
and subsequent lines as relevant)
2. Key clinical guidelines that define treatment sequencing
3. Any recently approved targeted therapies or immuno-oncology agents that
could serve as comparators
4. Whether the national HTA body has assessed any related product in this
indication, and if so, what comparator was used
5. Any distinctive national treatment traditions that differ from the
European consensus

After the table, summarise: which comparators will be universally requested,
which are country-specific, and where you have low confidence in your prediction.
Flag any areas where your training data may be outdated and where
verification against current guidelines would change the prediction.

Step 2: Consolidate PICOs

Based on the treatment landscape analysis above, now predict the consolidated
PICO scope that the HTACG would produce for a Joint Clinical Assessment.

CONTEXT ON THE JCA PICO PROCESS:
- All 27 EU member states can submit PICO requests via a survey
- The assessor and co-assessor consolidate these into the minimum number of
PICOs that covers all member states' needs
- Each unique combination of Population + Comparator typically generates a
separate PICO
- The intervention is fixed (the product under assessment)
- Outcomes are usually consolidated across PICOs unless a specific member
state requires a distinctive endpoint
- PICO scoping is driven by national policy questions and standards of care,
NOT by the available trial evidence

ASSESSOR: National Centre for Pharmacoeconomics, Ireland
CO-ASSESSOR: Institute for Quality and Efficiency in Health Care, Germany

Produce a table with the following columns:
| PICO # | Population | Intervention | Comparator(s) | Key outcomes | Requesting countries (predicted) | Likelihood (high/medium/low) | Rationale | Source |
Rules:
- Population: Define by disease stage, biomarker status, line of therapy,
and age group as relevant. Split into separate PICOs where member states
are likely to request distinct subpopulations.
- Intervention: Fixed — state the product, dose, and route per SmPC.
- Comparator: One comparator (or specified combination) per PICO. If you
predict a "blended/individualised treatment" comparator, flag this
explicitly as methodologically contentious.
- Key outcomes: List the primary and secondary outcomes likely to be
requested. Note where specific countries may request additional endpoints
(e.g., QoL for countries using cost-effectiveness analysis).
- Requesting countries: List the member states most likely to request this
specific PICO. Use country names, not codes.
- Likelihood: Your confidence that this PICO will appear in the consolidated
scope. High = near certain, Medium = plausible, Low = possible but
uncertain.
- Rationale: One sentence explaining why this PICO is predicted.
- Source: What evidence supports this prediction (guidelines, prior HTA
decisions, clinical practice patterns). Be specific.
After the table, provide:
1. TOTAL predicted PICOs: [number]
2. CONFIDENCE ASSESSMENT: Where is this prediction most likely to be wrong
and why?
3. KEY UNCERTAINTIES: What information would most improve this prediction?
Do not hedge excessively. Make concrete predictions and state your confidence
level. A wrong but specific prediction is more useful than a vague one.