Biomedical Evidence Synthesis Agent
What you are building
The core problem, expected build, and operating context for this challenge.
Develop a tool-calling agent for reproducible biomedical evidence synthesis. Using the OpenAI Agents SDK, you will orchestrate a multi-turn conversational agent that autonomously navigates the NCBI E-utilities API. The agent must translate user queries (e.g., 'What are the protein products of human genes associated with Type 2 Diabetes?') into a sequence of API calls: ESearch to find record IDs, ELink to map across databases (Gene to Protein or PubMed), and EFetch to retrieve the data. To ground the synthesis in specific biological entities, you will use Hugging Face Transformers to perform Named Entity Recognition (NER) on the retrieved abstracts. The final system should produce a structured JSON report that includes provenance (PMIDs, Gene IDs) and a confidence score based on how consistent the data is across the NCBI databases queried.
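The ESearch, ELink, EFetch sequence described above can be sketched as three URL builders. This is a minimal illustration, not the challenge's reference implementation: the helper names (eutils_url, esearch_url, elink_url, efetch_url) are hypothetical, the search term is only an example, and no network request is made. The base URL and the db, term, dbfrom, id, retmax, rettype, retmode, and api_key parameters are standard E-utilities parameters.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(utility, api_key=None, **params):
    """Build a request URL for one E-utility (hypothetical helper)."""
    if api_key is not None:
        params["api_key"] = api_key  # optional; raises the allowed request rate
    return f"{EUTILS_BASE}/{utility}.fcgi?{urlencode(params)}"


def esearch_url(db, term, retmax=20, api_key=None):
    # ESearch: free-text query -> list of record IDs in `db`.
    return eutils_url("esearch", api_key, db=db, term=term,
                      retmax=retmax, retmode="json")


def elink_url(dbfrom, db, ids, api_key=None):
    # ELink: map IDs in `dbfrom` to linked records in `db`.
    return eutils_url("elink", api_key, dbfrom=dbfrom, db=db,
                      id=",".join(ids), retmode="json")


def efetch_url(db, ids, rettype, retmode, api_key=None):
    # EFetch: download the records themselves in the requested format.
    return eutils_url("efetch", api_key, db=db, id=",".join(ids),
                      rettype=rettype, retmode=retmode)


if __name__ == "__main__":
    # Step 1: find human Gene records for the example query.
    print(esearch_url("gene", 'type 2 diabetes AND "Homo sapiens"[Organism]'))
    # Step 2: map a Gene ID returned by step 1 (3630 is used as an example) to Protein.
    print(elink_url("gene", "protein", ["3630"]))
    # Step 3: fetch records for the IDs returned by step 2 (placeholder shown).
    print(efetch_url("protein", ["PROTEIN_ID_FROM_STEP_2"], "fasta", "text"))
```

Passing several IDs in one comma-separated id parameter makes ELink return a merged link set; if the report needs to record which gene produced which protein, one request per gene (or repeated id parameters) keeps that mapping explicit.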
How submissions are scored
These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.
ID Integrity
Checks if the retrieved Gene ID correctly maps to the input name via NCBI.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
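A submission can run a similar check on itself before reporting a Gene ID. The sketch below assumes the agent has already fetched an ESummary response for the gene database in JSON form and that the record exposes "name" (official symbol) and "otheraliases" (comma-separated string) fields; the function name is hypothetical, the sample dictionary is hand-written rather than captured output, and the evaluator's actual check may differ.

```python
def gene_id_matches_name(summary, gene_id, query_name):
    """Return True if query_name is the symbol or an alias of gene_id.

    `summary` is assumed to be a parsed ESummary JSON response for db=gene.
    """
    record = summary.get("result", {}).get(str(gene_id))
    if not record:
        return False  # the ID is absent from the response
    wanted = query_name.strip().upper()
    symbol = record.get("name", "").strip().upper()
    aliases = {
        alias.strip().upper()
        for alias in record.get("otheraliases", "").split(",")
        if alias.strip()
    }
    return wanted == symbol or wanted in aliases


# Hand-written sample shaped like an ESummary response (illustrative only).
SAMPLE_SUMMARY = {
    "result": {
        "uids": ["3630"],
        "3630": {"uid": "3630", "name": "INS", "otheraliases": "IDDM, ILPR"},
    }
}

if __name__ == "__main__":
    print(gene_id_matches_name(SAMPLE_SUMMARY, "3630", "ins"))
    print(gene_id_matches_name(SAMPLE_SUMMARY, "3630", "TCF7L2"))
```

Because this dimension awards no partial credit, it may be worth rejecting any Gene ID that fails such a check rather than passing it downstream to ELink.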
Citation Recall
Fraction of relevant PMIDs retrieved compared to the gold standard. • target: 0.8 • range: 0-1
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
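Citation recall as described here is the share of gold-standard PMIDs that appear in the retrieved set. The following is a minimal sketch for checking a run locally; the function names are hypothetical, the PMIDs are placeholders, and treating the 0.8 target as an inclusive threshold is an assumption.

```python
def citation_recall(retrieved_pmids, gold_pmids):
    """Return |retrieved AND gold| / |gold| as a value between 0 and 1."""
    gold = {str(p).strip() for p in gold_pmids}
    if not gold:
        raise ValueError("gold standard must contain at least one PMID")
    # Sets remove duplicates, so a PMID retrieved twice counts once.
    retrieved = {str(p).strip() for p in retrieved_pmids}
    return len(retrieved & gold) / len(gold)


def meets_target(recall, target=0.8):
    # Assumption: the target is met when recall is at least 0.8.
    return recall >= target


if __name__ == "__main__":
    gold = ["1", "2", "3", "4", "5"]           # placeholder PMIDs
    retrieved = ["1", "2", "3", "4", "9", "9"]  # one miss, one irrelevant hit
    recall = citation_recall(retrieved, gold)
    print(recall, meets_target(recall))
```

Irrelevant PMIDs do not lower recall, so this check alone says nothing about precision.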
What you should walk away with
Master the OpenAI Agents SDK orchestration patterns, specifically defining function tools and returning their outputs from asynchronous API calls
Implement the E-utilities ESearch-ELink-EFetch pipeline to programmatically bridge genomics and literature data
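Utilize Hugging Face 'transformers' token-classification pipelines to extract and normalize Gene and Disease entities from PubMed abstracts (note that dslim/bert-base-NER is a general-domain model that tags only persons, organizations, locations, and miscellaneous entities, so a biomedical NER checkpoint is needed for Gene and Disease labels)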
Design a validation layer that compares E-utility metadata with NER outputs to detect inconsistencies in the evidence
Build a structured JSON schema for 'Evidence Objects' that includes timestamped API logs for maximum reproducibility
Optimize agent prompts and tool wrappers to handle NCBI's rate limits (3 requests per second without an API key, 10 with one) and API key requirements using robust retry logic
Orchestrate a 'Summary Agent' that uses the retrieved evidence to generate a final biological conclusion with citations
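The 'Evidence Object' and validation-layer outcomes above can be combined in one small data structure. This is a sketch under stated assumptions: the class names, field names, and the scoring rule (share of linked PMIDs whose NER output mentions the gene symbol) are all hypothetical choices for illustration, since the challenge does not prescribe a schema or a confidence formula.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ApiCall:
    utility: str    # e.g. "esearch", "elink", "efetch"
    url: str
    timestamp: str  # ISO 8601, UTC


@dataclass
class EvidenceObject:
    gene_id: str
    gene_symbol: str
    pmids: list          # PMIDs linked to the gene via ELink
    ner_entities: dict   # PMID -> entity strings extracted by NER
    api_log: list = field(default_factory=list)
    confidence: float = 0.0

    def log_call(self, utility, url):
        # Timestamp every request so a run can be replayed and audited.
        stamp = datetime.now(timezone.utc).isoformat()
        self.api_log.append(ApiCall(utility, url, stamp))

    def score(self):
        """Assumed rule: fraction of PMIDs whose NER entities include the symbol."""
        if not self.pmids:
            self.confidence = 0.0
            return self.confidence
        symbol = self.gene_symbol.upper()
        hits = sum(
            1
            for pmid in self.pmids
            if symbol in {e.upper() for e in self.ner_entities.get(pmid, [])}
        )
        self.confidence = hits / len(self.pmids)
        return self.confidence

    def to_json(self):
        # asdict() also converts the nested ApiCall entries to plain dicts.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    evidence = EvidenceObject(
        gene_id="3630",
        gene_symbol="INS",
        pmids=["PMID_A", "PMID_B"],  # placeholders, not real PMIDs
        ner_entities={"PMID_A": ["INS", "diabetes"], "PMID_B": ["TCF7L2"]},
    )
    evidence.log_call("elink", "https://example.invalid/elink")
    evidence.score()
    print(evidence.to_json())
```

A PMID that ELink associates with a gene but whose abstract never mentions that gene is exactly the kind of inconsistency the validation layer is meant to surface; a production scorer would likely also match aliases and normalized identifiers rather than exact symbols.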