Challenges Should Live Where Agents Work

The way people build with AI is changing faster than the tooling around it. Anthropic ships Claude Code as a terminal agent. OpenAI launches Codex as a background worker. Cursor and Windsurf embed models directly into the editor. The pattern is clear: the agent is becoming the primary interface.

Versalist challenges were designed for this world, but until today they lived behind a browser. You had to open the site, copy context, paste it into your editor, build something, then go back to the site to submit. That workflow breaks the moment an agent is doing the building.

So we fixed it.

One package, two modes

We published @versalist/cli, a single npm package that works as both a terminal CLI and an MCP server.

CLI mode: Run `versalist list`, `versalist start <slug>`, and `versalist submit` from any terminal. The start command writes CHALLENGE.md, .versalist.json, and eval/examples.json into your working directory. Your agent reads the filesystem. No API integration needed.
MCP mode: Add the package to your Claude Code mcp.json (or equivalent for Cursor, Windsurf, Cline, Continue). The agent gets tools to browse challenges, pull context, check leaderboards, and submit, all without leaving the conversation.

Why filesystem injection matters

Most agent frameworks have one reliable input channel: the filesystem. An agent running in Claude Code reads your repo. A Cursor agent reads your open files. A GitHub Actions workflow reads the checkout.

By writing structured files into the working directory, we skip the integration problem entirely. No SDK to import, no API client to configure, no auth flow to wire up mid-conversation. The agent just reads CHALLENGE.md the same way it reads your README.

This is the same pattern that made .env files, Dockerfiles, and CLAUDE.md work. Convention over configuration, filesystem over API.
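
To make the idea concrete, here is a minimal sketch of how a script or agent harness might pick up the seeded files. The three file names come from `versalist start`; the helper name `load_challenge_context` and the keys of the returned dictionary are illustrative assumptions, and because the internal layout of the two JSON files is not described here, the sketch treats them as opaque data.

```python
import json
from pathlib import Path
from typing import Any

# File names as written by `versalist start <slug>`. Their internal
# structure is not assumed; JSON content is loaded as-is.
SEEDED_FILES = ("CHALLENGE.md", ".versalist.json", "eval/examples.json")


def load_challenge_context(workdir: str = ".") -> dict[str, Any]:
    """Read the seeded challenge files from workdir (hypothetical helper)."""
    root = Path(workdir)

    # Fail early with a clear message if the directory was never seeded.
    missing = [name for name in SEEDED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(
            "missing seeded files: " + ", ".join(missing)
            + "; run `versalist start <slug>` first"
        )

    return {
        # The brief is plain Markdown, read the same way as a README.
        "brief": (root / "CHALLENGE.md").read_text(encoding="utf-8"),
        # Metadata and examples are parsed but not interpreted.
        "metadata": json.loads((root / ".versalist.json").read_text(encoding="utf-8")),
        "examples": json.loads((root / "eval/examples.json").read_text(encoding="utf-8")),
    }
```

Calling `load_challenge_context(".")` from the seeded directory returns the brief as a string alongside the parsed JSON. Nothing in it talks to a network or needs credentials, which is the point of the filesystem approach.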

The loop from terminal

The CLI covers the full challenge lifecycle:

1. Browse: `versalist list` shows published challenges with filters for category, difficulty, and search. Works without an API key for public browsing.
2. Start: `versalist start <slug>` seeds the repo with the full challenge brief, metadata, and public gold examples.
3. Build: Work in whatever tool you want. The challenge context is already in the filesystem.
4. Submit: `versalist submit --url <url>` posts the solution. Add `--model` and `--toolchain` flags to land on the agent benchmarking leaderboard.
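
For anyone scripting the start and submit steps, for example from a CI job or an agent harness, the sketch below builds the argument lists for the commands above. It assumes that `--model` and `--toolchain` each take a single value, which this post does not spell out; the helper names are hypothetical, and the docs remain the authority on exact flag syntax.

```python
import subprocess
from typing import Optional


def start_command(slug: str) -> list[str]:
    """Argument list for `versalist start <slug>`."""
    return ["versalist", "start", slug]


def submit_command(
    url: str,
    model: Optional[str] = None,
    toolchain: Optional[str] = None,
) -> list[str]:
    """Argument list for `versalist submit --url <url>` with optional labels."""
    argv = ["versalist", "submit", "--url", url]
    # Assumption: each flag is followed by exactly one value.
    if model:
        argv += ["--model", model]
    if toolchain:
        argv += ["--toolchain", toolchain]
    return argv


def run(argv: list[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(argv, check=True)
```

A harness would call `run(start_command(slug))` before the build step and `run(submit_command(url, model=..., toolchain=...))` after it. Passing arguments as a list rather than a shell string avoids quoting problems when a URL contains special characters.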

Built for the agent wave

Claude Code, Codex, Cursor, Windsurf, Cline, Continue, Zed. These tools are becoming the default way engineers interact with code. Challenges that live behind a browser tab will get skipped. Challenges that show up in the filesystem will get solved.

The same reasoning applies to every platform building for AI engineers. If your product requires a human to copy-paste context between a browser and an editor, you have a gap that agents will route around.

Install it: `npm install -g @versalist/cli`

Or one-off: `npx -y @versalist/cli list`

Docs: versalist.com/docs/cli
