Category: Agent Building · Difficulty: Advanced · Status: Always open

Robotics & Biotech Research Navigator Agent


Challenge brief

What you are building

The core problem, expected build, and operating context for this challenge.

Inspired by the advancements in robotics foundation models and the push for AI in traditional Chinese medicine, this challenge focuses on building a sophisticated multi-agent research system. Your task is to design and implement an autonomous research navigator that can ingest vast amounts of scientific literature (e.g., papers on robotics, biotechnology, or drug discovery), identify key trends, synthesize novel insights, and generate structured summaries or reports. Utilizing LangGraph, you will orchestrate a team of specialized agents, each with a distinct role—such as a 'Researcher' for information gathering, an 'Analyst' for data interpretation, and a 'Synthesizer' for report generation. The system should manage complex, stateful workflows, allowing agents to collaborate, iterate on findings, and dynamically adapt their research path based on intermediate results. This challenge emphasizes robust information retrieval, advanced reasoning, and structured output generation for scientific applications.
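
The collaborate-iterate-adapt loop described above can be sketched in plain Python before committing to a framework; LangGraph expresses the same pattern as a StateGraph whose nodes update shared state and whose conditional edges decide when to loop. Everything below (function names, state fields, the three-findings stopping rule) is an illustrative assumption, not part of the challenge spec.

```python
# Minimal sketch of a stateful Researcher -> Analyst -> Synthesizer cycle.
# LangGraph models the same idea as a StateGraph with nodes and conditional
# edges; the names and stopping rule here are illustrative only.

def researcher(state):
    # Gather one "finding" per pass (stand-in for a web/vector search).
    state["findings"].append(f"finding-{len(state['findings']) + 1}")
    return state

def analyst(state):
    # Decide whether enough evidence has been collected to report.
    state["ready"] = len(state["findings"]) >= 3
    return state

def synthesizer(state):
    # Produce the final structured report from the collective findings.
    state["report"] = {"sections": state["findings"], "complete": True}
    return state

def run(state):
    # Cyclic workflow: loop research/analysis until the analyst signals done.
    while not state.get("ready"):
        state = analyst(researcher(state))
    return synthesizer(state)

result = run({"findings": []})
print(result["report"]["sections"])
```

In LangGraph proper, the `while` loop becomes a conditional edge from the analyst node back to the researcher node, and the shared dict becomes a typed state schema.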

Datasets

Shared data for this challenge

Review public datasets and any private uploads tied to your build.

Evaluation rubric

How submissions are scored

These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.

Max score: 5
Dimensions: 5 scoring checks (5 binary pass/fail, 0 ordinal scaled)
Dimension 1: ReportCompleteness

All required sections in the output JSON are present and non-empty.

Binary · Weight: 1

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.

Dimension 2: KeyInformationPresence

Specific critical information (e.g., 'key_challenges' or 'ai_techniques_used') is identified and listed.

Binary · Weight: 1

Dimension 3: FactualAccuracy

Fraction of extracted facts or statements that are verifiably true, scored pass/fail against the target. • target: 0.9 • range: 0-1

Binary · Weight: 1

Dimension 4: InsightfulnessScore

Subjective score for the depth and novelty of the insights generated (1-5). • target: 4 • range: 1-5

Binary · Weight: 1

Dimension 5: ToolUsageEfficiency

Number of relevant tool calls per query compared to a baseline (ratio). • target: 1 • range: 0.5-2

Binary · Weight: 1
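
Because every dimension is a weight-1 pass/fail check, the final score is simply the count of satisfied dimensions, out of 5. The sketch below mirrors the rubric's thresholds; the function name, field names, and the exact pass condition for ToolUsageEfficiency (taken here as "ratio within the stated 0.5-2 range") are assumptions, not a real evaluator API.

```python
# Pass/fail scoring sketch: each dimension contributes weight 1 when its
# check passes, so the maximum score is 5. Thresholds mirror the rubric;
# names and the efficiency pass condition are illustrative assumptions.

def score(submission):
    checks = {
        "ReportCompleteness": submission["all_sections_present"],
        "KeyInformationPresence": submission["key_info_listed"],
        "FactualAccuracy": submission["factual_accuracy"] >= 0.9,
        "InsightfulnessScore": submission["insightfulness"] >= 4,
        # Assumed: efficiency passes when the tool-call ratio vs. the
        # baseline falls inside the rubric's stated 0.5-2 range.
        "ToolUsageEfficiency": 0.5 <= submission["tool_call_ratio"] <= 2.0,
    }
    return sum(1 for passed in checks.values() if passed)
```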

Learning goals

What you should walk away with

Master LangGraph for defining stateful, cyclic agentic workflows, including nodes, edges, and state management for complex research tasks

Implement distinct agent roles (e.g., 'Researcher', 'Analyst', 'Synthesizer') within LangGraph, assigning specialized tools and prompts to each for optimal performance with Claude Opus 4.1

Integrate `SerpAPI` for real-time, comprehensive web search and `ChromaDB` for indexing and retrieving domain-specific documents (e.g., scientific papers) within agent workflows
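
Under the hood, the ChromaDB retrieval step is a nearest-neighbour lookup over embeddings (in real code, `collection.add` to index and `collection.query` to search). The stdlib-only toy below fakes the embedding with bag-of-words counts purely to show the shape of the retrieve step; none of these names come from the challenge.

```python
import math

# Toy stand-in for vector retrieval: ChromaDB's collection.query performs
# the same nearest-neighbour ranking over real embeddings. The "embedding"
# here is a fake bag-of-words vector; all names are illustrative.

def embed(text, vocab):
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, vocab, k=1):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query, vocab)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d, vocab)), reverse=True)
    return ranked[:k]

vocab = ["robot", "arm", "drug", "discovery", "protein"]
docs = ["robot arm control", "drug discovery via protein folding"]
print(retrieve("protein drug screening", docs, vocab))
```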

Design sophisticated prompt engineering strategies for each agent role, focusing on information extraction, critical analysis, and structured synthesis outputs from Claude Opus 4.1
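
One common way to keep role prompts distinct is a template per role with an explicit output contract. The wording below is invented for illustration; the actual prompts are yours to design.

```python
# Illustrative role prompts: each agent gets its own system prompt and a
# distinct output contract. The wording is an assumption, not challenge text.

ROLE_PROMPTS = {
    "researcher": (
        "You gather evidence. For the topic '{topic}', list sources and "
        "verbatim excerpts only; do not interpret them."
    ),
    "analyst": (
        "You interpret evidence. Given the researcher's excerpts on "
        "'{topic}', extract key_challenges and ai_techniques_used."
    ),
    "synthesizer": (
        "You write the report. Merge the analyst's findings on '{topic}' "
        "into the required JSON sections; leave no section empty."
    ),
}

def build_prompt(role, topic):
    return ROLE_PROMPTS[role].format(topic=topic)
```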

Develop mechanisms for inter-agent communication and dynamic routing within the LangGraph framework, allowing agents to pass structured information and trigger subsequent actions based on intermediate findings

Deploy agent components and their dependencies using `Docker` for consistent environments and scalable execution of research workflows

Utilize `Arize AI` for observability and evaluation of agent performance, tracking metrics like information recall, summarization quality, and reasoning accuracy across research iterations

Implement a mechanism for the 'Synthesizer' agent to generate structured research reports, potentially using Pydantic models for output validation, based on the collective findings of the agent team
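
A Pydantic `BaseModel` would enforce the report schema declaratively; the stdlib dataclass below sketches the same idea. The section names (`key_challenges`, `ai_techniques_used`, `summary`) are assumptions about the required output JSON, chosen to echo the rubric's examples.

```python
from dataclasses import dataclass, field

# Stdlib sketch of the Synthesizer's validated report schema. Pydantic's
# BaseModel adds this validation declaratively; the section names below
# are assumptions about the required output JSON, not challenge spec.

@dataclass
class ResearchReport:
    topic: str
    key_challenges: list = field(default_factory=list)
    ai_techniques_used: list = field(default_factory=list)
    summary: str = ""

    def __post_init__(self):
        # Mirror the ReportCompleteness check: every section non-empty.
        if not (self.topic and self.key_challenges
                and self.ai_techniques_used and self.summary):
            raise ValueError("all report sections must be non-empty")
```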

Start from your terminal
$npx -y @versalist/cli start robotics-biotech-research-navigator-agent

[ok] Wrote CHALLENGE.md

[ok] Wrote .versalist.json

[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Challenge at a glance

Host: Vera (AI Research & Mentorship)
Starts: Available now
Run mode: Evergreen challenge


