Challenge

Real-Time FDA Clinical Data Monitor using AutoGen and Gemini

Inspired by the FDA pilot program for real-time clinical data feeds, this challenge tasks you with building a multi-agent system to monitor and analyze pharmaceutical trial data. You will use AutoGen to orchestrate a team of agents that process high-frequency clinical updates, identify safety signals, and report anomalies. The system must utilize Gemini 3.1 Flash Lite for rapid data extraction and Sweep AI to automatically manage code patches for the data ingestion pipelines. To ensure model reliability, you will integrate deepchecks for continuous evaluation of the agent outputs. Finally, you will design a low-code dashboard in Bubble to visualize these real-time feeds and use Cartesia for voice-enabled AI alerts when critical safety thresholds are met.
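
As a rough orientation, the extract, monitor, and report roles described above can be sketched in plain Python before any agent framework is wired in. This is not AutoGen code: the functions below only stand in for the agents, and the record fields, the 1-to-5 severity scale, and the threshold of 4 are all assumptions made for illustration, since the brief does not define the feed schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClinicalUpdate:
    """Hypothetical record shape; the real feed schema is not given in the brief."""
    trial_id: str
    patient_id: str
    event: str     # free-text adverse event description
    severity: int  # assumed scale: 1 (mild) to 5 (life-threatening)


def extract(raw: Dict[str, str]) -> ClinicalUpdate:
    """Extractor role: stands in for the Gemini-backed extraction agent."""
    return ClinicalUpdate(
        trial_id=raw["trial_id"],
        patient_id=raw["patient_id"],
        event=raw["event"].strip().lower(),
        severity=int(raw["severity"]),
    )


def is_safety_signal(update: ClinicalUpdate, threshold: int = 4) -> bool:
    """Safety-monitor role: flags updates at or above an assumed severity threshold."""
    return update.severity >= threshold


def report(signals: List[ClinicalUpdate]) -> List[str]:
    """Reporter role: turns flagged updates into one-line summaries."""
    return [
        f"{u.trial_id}/{u.patient_id}: {u.event} (severity {u.severity})"
        for u in signals
    ]


def run_pipeline(batch: List[Dict[str, str]]) -> List[str]:
    """Runs one batch through extract -> monitor -> report."""
    updates = [extract(raw) for raw in batch]
    signals = [u for u in updates if is_safety_signal(u)]
    return report(signals)
```

In a real build each function would become an agent with its own role, and the hand-offs between them would be agent messages rather than direct function calls.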

Business Operations · Hosted by Vera
Status: Always open
Difficulty: Advanced
Points: 500
Evaluation rubric

How submissions are scored

These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.

Max score: 2
Dimensions: 2 scoring checks
Binary: 2 pass-or-fail dimensions
Ordinal: 0 scaled dimensions
Dimension 1: latency_test

Latency Test

The system must process a data batch in under 2 seconds.

Type: binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
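
One way to check the 2-second budget locally is to time a batch with a monotonic clock. The sketch below is an assumption about how such a check could look, not the evaluator's actual harness; `process_batch` is a hypothetical callable standing in for your pipeline entry point.

```python
import time
from typing import Any, Callable, Sequence, Tuple

LATENCY_BUDGET_SECONDS = 2.0  # from the rubric: a batch must finish in under 2 seconds


def timed_batch(
    process_batch: Callable[[Sequence[Any]], Any],  # hypothetical pipeline entry point
    batch: Sequence[Any],
) -> Tuple[Any, float]:
    """Runs process_batch once and returns (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = process_batch(batch)
    elapsed = time.perf_counter() - start
    return result, elapsed


def passes_latency_check(elapsed: float, budget: float = LATENCY_BUDGET_SECONDS) -> bool:
    """Strictly under the budget, matching the rubric's 'under 2 seconds' wording."""
    return elapsed < budget
```

Because the check is binary, it is worth timing the slowest realistic batch, including any model calls, rather than an average one.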

Dimension 2: signal_accuracy

Signal Accuracy

Fraction of clinical anomalies correctly identified (target: 0.95, range: 0-1).

Type: binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
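
The rubric does not spell out how "correctly identified" is computed, so the sketch below rests on two assumptions: the score is the share of gold-labelled anomalies that the system flagged, and the check passes when the score is at or above the 0.95 target. Under this reading false positives are not penalized; the real evaluator may differ. The ID-based sets are a hypothetical data shape.

```python
from typing import Set

TARGET_ACCURACY = 0.95  # target stated in the rubric


def signal_accuracy(predicted_ids: Set[str], gold_ids: Set[str]) -> float:
    """Share of gold anomalies that were flagged (an assumed reading of the metric)."""
    if not gold_ids:
        raise ValueError("gold_ids must contain at least one anomaly")
    return len(predicted_ids & gold_ids) / len(gold_ids)


def passes_accuracy_check(score: float, target: float = TARGET_ACCURACY) -> bool:
    """Assumes the binary check passes at or above the target."""
    return score >= target
```

For example, with 20 gold anomalies, missing a single one still meets the target under this reading, while missing two does not.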

Learning goals

What you should walk away with

  • Master AutoGen for building stateful multi-agent conversations with specific roles for data scientists and safety monitors

  • Orchestrate Gemini 3.1 Flash Lite for low-latency extraction of clinical entities from unstructured real-time feeds

  • Integrate Sweep AI to autonomously handle GitHub issues and pull requests for clinical data parsers

  • Implement Cartesia voice synthesis for low-latency critical alerts in the monitoring interface

  • Build a real-time data visualization bridge between Python-based AutoGen and Bubble via API Connector

  • Deploy deepchecks to perform validation of LLM outputs against ground-truth clinical safety protocols
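
To make the dashboard and voice-alert goals above more concrete, here is a minimal sketch of the JSON a Python service might expose for a dashboard to poll, with a flag and spoken text for critical events. Every field name, the severity threshold, and the wording of the voice text are assumptions; no Bubble or Cartesia API is called here, and the actual integration would follow those products' own documentation.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional

CRITICAL_SEVERITY = 4  # assumed threshold on an assumed 1-to-5 severity scale


def build_alert(
    trial_id: str,
    event: str,
    severity: int,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Builds a hypothetical alert payload for a dashboard and a voice channel."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    critical = severity >= CRITICAL_SEVERITY
    return {
        "trial_id": trial_id,
        "event": event,
        "severity": severity,
        "critical": critical,
        "timestamp": timestamp,
        # Text a voice-synthesis step could read aloud; None when not critical.
        "voice_text": (
            f"Critical safety signal in trial {trial_id}: {event}."
            if critical
            else None
        ),
    }


def to_json(alert: Dict[str, Any]) -> str:
    """Serializes the alert with stable key order for easier diffing and testing."""
    return json.dumps(alert, sort_keys=True)
```

Keeping the voice text inside the same payload means the dashboard and the alerting step read from one source of truth, which is a design choice of this sketch rather than a requirement of the challenge.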

Start from your terminal

$ npx -y @versalist/cli start real-time-fda-clinical-data-monitor-using-autogen-and-gemini

[ok] Wrote CHALLENGE.md

[ok] Wrote .versalist.json

[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Host and timing

Host: Vera (AI Research & Mentorship)
Starts: Available now
Run mode: Evergreen challenge
Tool Space Recipe

Draft
Action Space
Google Gemini (required): Google's multimodal AI model
AutoGen: Agent Systems · Multi-Agent Systems
deepchecks: ML testing and monitoring
Evaluation
Rubric: 2 dimensions
· Latency Test (weight: 1)
· Signal Accuracy (weight: 1)
Gold items: 1 (1 public)
