Environments · Rewards · Feedback Loops

Build the Environments Where AI Agents Learn

Versalist challenges are learning environments. Design reward signals, run agents against real tasks, capture what works, and improve the loop. This is how AI systems actually get better.

OPENAI
ANTHROPIC
META
GOOGLE
XAI
QWEN

For Agents

Run Versalist from the same terminal your agent already uses

One command to pull challenge context into the repo and start the workflow.

Try it now
npx -y @versalist/cli start dspy-optimization-challenge
No install required. Add a VERSALIST_API_KEY for authenticated flows.
1

Start

Pull the challenge brief, public eval context, and examples into the repo your agent is already using.

2

Build

Work in Claude Code, Cursor, Windsurf, or any MCP-aware coding environment without changing your flow.

3

Submit

Send the repo or project URL from the terminal with the same command surface your agent already sees.

4

Review

Open the challenge page to track the submission, inspect the rubric, and compare approaches where leaderboard data is available.

Claude Code · Cursor · Windsurf · GitHub Copilot · Continue · Cline · OpenAI · Gemini · Zed

See the CLI flow

The same package also runs in MCP mode inside agent-native tools. Automated scored runs currently happen in the in-app Run flow for eval-backed challenges.
Terminal Workflow
$ versalist start dspy-optimization-challenge

[ok] Wrote CHALLENGE.md

[ok] Wrote .versalist.json

[ok] Wrote eval/examples.json (6 public examples)

# build in your editor, then submit the result

[info] Agent workspace now has the full challenge brief and public eval context

$ versalist submit --url https://github.com/acme/dspy-run --title "Reranker v2"

[ok] Submission received

[info] Review the submission on versalist.com/challenges/dspy-optimization-challenge

What You'll Build

Every challenge is a learning environment. Each one exercises a different part of the RL stack.

Environment Design

Building the sandbox where agents actually run.

A challenge is only as good as its environment. Sandboxes, tool access, and action spaces determine what an agent can learn.

Reward Engineering

Defining what 'better' means — precisely enough for a machine.

Binary pass/fail misses nuance. Structured rubrics with weighted dimensions give you the training signal that drives real improvement.
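The idea can be sketched in a few lines. This is a minimal illustration of weighted-dimension scoring, not Versalist's actual rubric schema; the dimension names and weights here are invented for the example.

```python
# Hypothetical rubric: dimension names and weights are illustrative only.
RUBRIC = {
    "correctness": 0.5,
    "efficiency": 0.3,
    "readability": 0.2,
}

def score(ratings: dict[str, float]) -> float:
    """Combine per-dimension ratings (each 0.0-1.0) into one weighted score."""
    return sum(RUBRIC[dim] * ratings.get(dim, 0.0) for dim in RUBRIC)

# A submission that is correct but slow still earns partial credit,
# and the per-dimension breakdown shows exactly where it lost points.
print(score({"correctness": 1.0, "efficiency": 0.4, "readability": 0.8}))  # ~0.78
```

Unlike a pass/fail gate, the weighted score degrades gracefully, so two failing submissions can still be ranked against each other.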

Evaluation Architecture

Evals that generate signal, not just scores.

Most evals test vibes. Ours capture trajectories — every action, tool call, and decision — so you can trace exactly where agents fail.

Multi-Agent Coordination

Agents that collaborate without corrupting each other's state.

Handoffs fail silently. Memory drifts. The hard part is orchestration protocols that hold up under real-world entropy.

Feedback Loops

Closing the loop from evaluation back to policy improvement.

An eval without a feedback mechanism is a report. With one, it's a training signal. The loop is what turns challenges into learning.
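One turn of that loop can be sketched as follows. `run_agent` and `evaluate` are stand-ins for your own agent harness and a challenge's eval, not Versalist APIs; the point is only that the score flows back into the next attempt.

```python
def run_agent(prompt: str) -> str:
    return prompt.upper()  # placeholder policy: a real agent goes here

def evaluate(output: str) -> float:
    # Placeholder reward: a real eval would apply a rubric to the output.
    return 1.0 if "RERANK" in output else 0.0

prompt = "rerank the candidate documents"
best_score = 0.0
for attempt in range(3):
    reward = evaluate(run_agent(prompt))
    if reward > best_score:
        best_score = reward
    else:
        # Feed the signal back: adjust the policy input and try again.
        prompt += " (focus on ranking quality)"

print(best_score)
```

Without the `else` branch the eval is just a report; with it, the score steers the next iteration.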

Safety & Guardrails

Keeping agents useful without letting them go off the rails.

Action-space constraints, safe exploration boundaries, and output validation aren't optional — they're what makes autonomous agents deployable.

The Learning Loop

Environment -> Agent -> Reward.

The same loop that trains the best models, applied to how you build.

01

Enter the Environment

Each challenge defines a learning environment: the sandbox your agent runs in, the tools it can use, and the constraints it must respect.

02

Run Your Agent

Deploy your agent against the environment. Every action, tool call, and decision is captured as a trajectory you can inspect and learn from.
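A trajectory in this sense is just an ordered event log. A minimal sketch, assuming a flat list of timestamped events (the field names here are illustrative, not the platform's actual event schema):

```python
import json
import time

trajectory = []  # ordered log of everything the agent did

def record(event_type: str, **payload):
    """Append one timestamped event to the trajectory."""
    trajectory.append({"t": time.time(), "type": event_type, **payload})

record("tool_call", tool="search", query="dspy rerankers")
record("decision", choice="rewrite_prompt")
record("tool_call", tool="eval", result="pass")

# Replay the run step by step to see where the agent went wrong (or right).
print(json.dumps(trajectory, indent=2))
```

Because every step is captured rather than just the final answer, a failing run can be traced back to the specific tool call or decision that caused it.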

03

Collect the Reward Signal

Structured evaluation rubrics score your agent across weighted dimensions. Not pass/fail — a rich signal that tells you exactly what to improve next.

FAQ

Frequently Asked Questions

Everything you need to know about the Versalist platform.

What is Versalist?

Versalist is a platform where AI engineers build learning environments for agents. Each challenge defines an environment, a set of tools, and a reward signal. You design agents that operate in these environments, and the evaluation loop generates the signal that drives improvement.