Robotics & Biotech Research Navigator Agent
What you are building
The core problem, expected build, and operating context for this challenge.
Inspired by the advancements in robotics foundation models and the push for AI in traditional Chinese medicine, this challenge focuses on building a sophisticated multi-agent research system. Your task is to design and implement an autonomous research navigator that can ingest vast amounts of scientific literature (e.g., papers on robotics, biotechnology, or drug discovery), identify key trends, synthesize novel insights, and generate structured summaries or reports. Utilizing LangGraph, you will orchestrate a team of specialized agents, each with a distinct role—such as a 'Researcher' for information gathering, an 'Analyst' for data interpretation, and a 'Synthesizer' for report generation. The system should manage complex, stateful workflows, allowing agents to collaborate, iterate on findings, and dynamically adapt their research path based on intermediate results. This challenge emphasizes robust information retrieval, advanced reasoning, and structured output generation for scientific applications.
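The intended control flow - a Researcher gathers evidence, an Analyst interprets it, a Synthesizer reports, with the loop repeating until findings are sufficient - can be sketched framework-free before wiring it into LangGraph. Everything below (function names, the stopping heuristic, the state shape) is an illustrative assumption, not part of the challenge spec:

```python
# Framework-free sketch of the Researcher -> Analyst -> Synthesizer loop.
# The agent functions are stand-ins for LLM/tool-backed nodes.
from dataclasses import dataclass, field

@dataclass
class ResearchState:
    query: str
    findings: list = field(default_factory=list)   # raw snippets from the Researcher
    insights: list = field(default_factory=list)   # interpretations from the Analyst
    iterations: int = 0

def researcher(state: ResearchState) -> ResearchState:
    # In the real system: a search-tool or vector-store call.
    state.findings.append(f"finding for '{state.query}' (round {state.iterations})")
    return state

def analyst(state: ResearchState) -> ResearchState:
    # In the real system: an LLM call interpreting the findings so far.
    state.insights.append(f"insight from {len(state.findings)} finding(s)")
    return state

def needs_more_research(state: ResearchState, max_rounds: int = 3) -> bool:
    # Dynamic routing: loop back to the Researcher until there is
    # enough evidence (threshold here is arbitrary for illustration).
    return len(state.findings) < 2 and state.iterations < max_rounds

def synthesizer(state: ResearchState) -> dict:
    # Emits the structured report the challenge asks for.
    return {"query": state.query, "insights": state.insights}

def run(query: str) -> dict:
    state = ResearchState(query=query)
    while True:
        state = analyst(researcher(state))
        state.iterations += 1
        if not needs_more_research(state):
            return synthesizer(state)
```

In LangGraph the same shape becomes a `StateGraph`: each function a node, the `needs_more_research` check a conditional edge routing back to the researcher node or forward to the synthesizer.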
Shared data for this challenge
Review public datasets and any private uploads tied to your build.
How submissions are scored
These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.
Every dimension below is pass/fail: it contributes its full weight only when the submission satisfies the stated requirement, and partial credit is not awarded.

ReportCompleteness: All required sections in the output JSON are present and non-empty.
KeyInformationPresence: Specific critical information (e.g., 'key_challenges' or 'ai_techniques_used') is identified and listed.
FactualAccuracy: Percentage of extracted facts or statements that are verifiably true. Target: 0.9; range: 0-1.
InsightfulnessScore: Subjective score for the depth and novelty of the insights generated. Target: 4; range: 1-5.
ToolUsageEfficiency: Number of relevant tool calls per query relative to a baseline (a ratio). Target: 1; range: 0.5-2.
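ToolUsageEfficiency is the one ratio-valued dimension, so it is worth instrumenting during development. A minimal helper, assuming the ratio is simply agent calls over baseline calls (the rubric does not say how out-of-range runs are treated):

```python
def tool_usage_efficiency(agent_calls: int, baseline_calls: int) -> float:
    """Relevant tool calls per query relative to a baseline (target: 1.0)."""
    if baseline_calls <= 0:
        raise ValueError("baseline_calls must be positive")
    return agent_calls / baseline_calls

def within_scored_range(ratio: float) -> bool:
    # The rubric states a 0.5-2 range; treatment of values outside it
    # is not specified, so this only flags them.
    return 0.5 <= ratio <= 2.0
```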
What you should walk away with
Master LangGraph for defining stateful, cyclic agentic workflows, including nodes, edges, and state management for complex research tasks
Implement distinct agent roles (e.g., 'Researcher', 'Analyst', 'Synthesizer') within LangGraph, assigning specialized tools and prompts to each for optimal performance with Claude Opus 4.1
Integrate `SerpAPI` for real-time, comprehensive web search and `ChromaDB` for indexing and retrieving domain-specific documents (e.g., scientific papers) within agent workflows
Design sophisticated prompt engineering strategies for each agent role, focusing on information extraction, critical analysis, and structured synthesis outputs from Claude Opus 4.1
Develop mechanisms for inter-agent communication and dynamic routing within the LangGraph framework, allowing agents to pass structured information and trigger subsequent actions based on intermediate findings
Deploy agent components and their dependencies using `Docker` for consistent environments and scalable execution of research workflows
Utilize `Arize AI` for observability and evaluation of agent performance, tracking metrics like information recall, summarization quality, and reasoning accuracy across research iterations
Implement a mechanism for the 'Synthesizer' agent to generate structured research reports, potentially using Pydantic models for output validation, based on the collective findings of the agent team
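The Synthesizer's validated-output step in the last objective can be sketched with stdlib dataclasses as a stand-in for Pydantic. The field names echo the rubric's examples ('key_challenges', 'ai_techniques_used'), but the full schema is an assumption:

```python
from dataclasses import dataclass, field

@dataclass
class ResearchReport:
    # Field names mirror the rubric's examples; the exact schema
    # required by the evaluator is an assumption here.
    title: str
    summary: str
    key_challenges: list = field(default_factory=list)
    ai_techniques_used: list = field(default_factory=list)

    def __post_init__(self):
        # Mirrors the ReportCompleteness check: every section must be
        # present and non-empty, otherwise construction fails.
        for name, value in vars(self).items():
            if not value:
                raise ValueError(f"section '{name}' is empty")
```

With Pydantic, the same constraints would live in a `BaseModel` with validators, which additionally yields a JSON schema useful for prompting the model toward the required structure.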
[ok] Wrote CHALLENGE.md
[ok] Wrote .versalist.json
[ok] Wrote eval/examples.json
Requires VERSALIST_API_KEY. Works with any MCP-aware editor.
DocsAI Research & Mentorship