Challenge

Build AI Dev Agent Teams

Agent Building · Hosted by Vera
Status: Always open
Difficulty: Advanced
Points: 500
Challenge brief

What you are building

The core problem, expected build, and operating context for this challenge.

Amidst a booming AI coding market, this challenge asks you to build a sophisticated AI agent team that automates web application development and deployment tasks. Using the LangChain framework, you will orchestrate a multi-agent system that simulates a collaborative development team: generating code, performing quality checks, and interacting with stakeholders via voice. The finished system should demonstrate how advanced generative AI can streamline the software development lifecycle, from initial concept to deployment readiness.

Datasets

Shared data for this challenge

Review public datasets and any private uploads tied to your build.

Evaluation rubric

How submissions are scored

These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.

Max score: 4
Dimensions: 4 scoring checks
Binary: 4 pass-or-fail dimensions
Ordinal: 0 scaled dimensions
Dimension 1: feature_files_present

Feature Files Present

All required files (HTML, CSS, JS, report) for the generated feature must exist and be accessible.

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.

Dimension 2: coval_trace_completeness

Coval Trace Completeness

Coval logs must show a complete execution trace, including agent transitions and tool calls.

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
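Coval's actual log schema isn't documented in this brief; as a rough sketch, a completeness check over a hypothetical trace format (the field names "agent", "event", and "tool" are illustrative assumptions, not Coval's real schema) might look like:

```python
# Hypothetical trace records for one end-to-end run.
trace = [
    {"agent": "project_manager", "event": "transition", "to": "developer"},
    {"agent": "developer", "event": "tool_call", "tool": "write_file"},
    {"agent": "developer", "event": "transition", "to": "qa"},
    {"agent": "qa", "event": "tool_call", "tool": "run_linter"},
]

def trace_is_complete(trace):
    """A trace counts as complete if it records at least one agent
    transition and at least one tool call."""
    has_transition = any(r["event"] == "transition" for r in trace)
    has_tool_call = any(r["event"] == "tool_call" for r in trace)
    return has_transition and has_tool_call

print(trace_is_complete(trace))  # True
```

The takeaway is that a trace missing either kind of event fails the check outright, since this dimension is binary.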

Dimension 3: code_correctness_score

Code Correctness Score

Automated linting and basic functionality tests on generated code, scored 0-100; the binary check passes when the score meets or exceeds the target. • target: 85 • range: 0-100

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.

Dimension 4: security_vulnerability_count

Security Vulnerability Count

Number of critical security vulnerabilities identified by StarCoder 2 (lower is better); the binary check passes only at the target of zero. • target: 0 • range: 0-5

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
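Taken together, the four pass-or-fail dimensions each contribute weight 1, summing to the max score of 4. A minimal sketch of that aggregation (the function and dictionary names are illustrative, not part of the official evaluator):

```python
def score_submission(checks):
    """Each dimension is a pass/fail check with weight 1; the total is
    the number of passed checks, out of a max score of 4."""
    weights = {
        "feature_files_present": 1,
        "coval_trace_completeness": 1,
        "code_correctness_score": 1,
        "security_vulnerability_count": 1,
    }
    return sum(weights[name] for name, passed in checks.items() if passed)

# The thresholds below restate the rubric: correctness passes at >= 85,
# security passes only at zero critical vulnerabilities.
checks = {
    "feature_files_present": True,
    "coval_trace_completeness": True,
    "code_correctness_score": 91 >= 85,
    "security_vulnerability_count": 0 == 0,
}
print(score_submission(checks))  # 4
```

Because no partial credit is awarded, a correctness score of 84 contributes exactly as much as a score of 0.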

Learning goals

What you should walk away with

  • Master LangChain's AgentExecutor and LangGraph for designing stateful, graph-based agent workflows that manage task decomposition and inter-agent communication.

  • Integrate Claude 4 Opus into LangChain agents to perform sophisticated code generation, architectural design, and problem-solving tasks for web applications.

  • Implement real-time, bidirectional voice communication for agent interfaces using LiveKit, enabling natural language interaction for task assignment and status updates.

  • Leverage StarCoder 2 within a LangChain tool-use agent for focused code quality analysis, vulnerability scanning, and automated refactoring suggestions.

  • Design an evaluation and observability pipeline using Coval to trace agent decisions, monitor performance metrics, and ensure adherence to development standards.

  • Build a 'Project Manager' agent that uses LangChain's planning capabilities to orchestrate 'Developer' and 'QA' agents for end-to-end web app feature delivery.

  • Develop custom tools for LangChain agents to interact with external systems, simulating version control and deployment platforms.
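LangGraph's real API is richer than this, but the orchestration pattern in the goals above can be sketched framework-free: shared state flows through named nodes, and each node returns the name of the next node. All node, state, and function names here are illustrative assumptions, not LangGraph code:

```python
# A minimal, framework-free sketch of a stateful agent graph in the
# spirit of LangGraph: each node reads and updates shared state, and
# its return value selects the next node.

def project_manager(state):
    state["plan"] = ["implement feature", "review code"]
    return "developer"

def developer(state):
    state["code"] = "<html>...</html>"  # stand-in for generated feature files
    return "qa"

def qa(state):
    state["qa_passed"] = "<html>" in state["code"]
    return "end"

NODES = {"project_manager": project_manager, "developer": developer, "qa": qa}

def run(state, entry="project_manager"):
    node = entry
    while node != "end":
        node = NODES[node](state)  # each node returns the next node's name
    return state

final = run({})
print(final["qa_passed"])  # True
```

In the real build, each node would wrap an LLM-backed agent (e.g. Claude 4 Opus for the Developer, StarCoder 2 behind the QA node's tools), and the conditional edges would route on model output rather than fixed return values.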

Start from your terminal
$ npx -y @versalist/cli start build-ai-dev-agent-teams

[ok] Wrote CHALLENGE.md

[ok] Wrote .versalist.json

[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Host and timing
Vera

AI Research & Mentorship

Starts: Available now
Evergreen challenge

Tool Space Recipe

Draft
Action Space
LangChain: framework for building LLM applications (required)
Policy Serving
Claude 4 Opus
Orchestration
LangChain: framework for building LLM applications (required)
Evaluation
Rubric: 4 dimensions
· Feature Files Present (weight 1)
· Coval Trace Completeness (weight 1)
· Code Correctness Score (weight 1)
· Security Vulnerability Count (weight 1)
Gold items: 2 (2 public)
