Agent Building · Advanced · Always open

AI Policy Argument Generation Agent


Challenge brief

What you are building

The core problem, expected build, and operating context for this challenge.

Develop an advanced AI agent system using the Claude Agents SDK to assist in complex policy negotiations, drawing inspiration from the SAG-AFTRA talks regarding an 'AI tax' on synthetic actors. This challenge requires building a system capable of analyzing diverse policy documents, legal texts, and economic data to generate well-reasoned arguments and counter-arguments for specific stakeholders, such as labor unions and production studios. The core of the solution will be a multi-agent workflow orchestrated by the Claude Agents SDK, leveraging Claude 3.5 Sonnet for its robust reasoning and document understanding capabilities. The agents will be tasked with identifying key points of contention, forecasting potential impacts of proposed policies, and synthesizing persuasive rhetoric. Evaluation of the generated arguments will be conducted using Gentrace, focusing on logical coherence, factual accuracy, and persuasive strength. The system will rely on Azure Blob Storage and Azure Cognitive Search for efficient storage and retrieval of relevant policy documents and background information.
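The multi-agent workflow described above can be sketched as a sequential analyst, advocate, and critic pipeline. Everything here is illustrative: the agent roles, prompts, and the `call_model` stub are assumptions standing in for real Claude Agents SDK calls to Claude 3.5 Sonnet, not a prescribed architecture.

```python
from dataclasses import dataclass, field

def call_model(system_prompt: str, user_prompt: str) -> str:
    """Stub for an LLM call; a real build would invoke Claude 3.5 Sonnet
    through the Claude Agents SDK here."""
    return f"[{system_prompt.split(':')[0]}] response to: {user_prompt[:40]}"

@dataclass
class NegotiationWorkflow:
    stakeholder: str
    documents: list[str] = field(default_factory=list)

    def run(self, issue: str) -> dict:
        context = "\n".join(self.documents)
        # Agent 1: identify key points of contention in the source documents.
        contention = call_model(
            "Analyst: extract key points of contention",
            f"Issue: {issue}\nDocuments:\n{context}",
        )
        # Agent 2: draft an argument aligned with the chosen stakeholder.
        argument = call_model(
            f"Advocate: argue for the {self.stakeholder}",
            f"Issue: {issue}\nContention points:\n{contention}",
        )
        # Agent 3: anticipate the opposing side's strongest rebuttal.
        rebuttal = call_model(
            "Critic: generate the strongest counter-argument",
            argument,
        )
        return {"contention": contention, "argument": argument, "rebuttal": rebuttal}

workflow = NegotiationWorkflow(
    "labor union",
    ["Draft clause: 2% levy on synthetic-actor revenue."],
)
result = workflow.run("AI tax on synthetic actors")
```

A production version would replace `call_model` with real SDK calls and likely run the advocate/critic exchange for several rounds before returning a final argument.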

Datasets

Shared data for this challenge

Review public datasets and any private uploads tied to your build.

Evaluation rubric

How submissions are scored

These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.

Max score: 5
Dimensions: 5 scoring checks
Binary: 5 pass-or-fail dimensions
Ordinal: 0 scaled dimensions
Dimension 1

FactualAccuracyCheck

Verifies that generated arguments do not contain factual inaccuracies based on provided documents.

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.

Dimension 2

StakeholderAlignment

Ensures arguments are consistently aligned with the specified stakeholder's interests.

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.

Dimension 3

LogicalCoherence_Score

A score (1-5) representing the logical flow and consistency of the argument. • target: 4 • range: 1-5

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.

Dimension 4

Persuasiveness_Rating

A human-rated score (1-5) for how persuasive the argument is. • target: 4 • range: 1-5

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.

Dimension 5

Completeness_of_Coverage

Percentage of relevant points from documents covered by the argument. • target: 0.8 • range: 0.5-1

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
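Because every dimension above is a weight-1 binary check, the overall score reduces to counting passed checks. The sketch below illustrates this, with the pass thresholds taken from the stated targets; the metric field names are invented for illustration and are not part of the challenge spec.

```python
# Each rubric dimension is a weight-1 pass/fail check, so the final score
# is simply the number of checks that pass (max 5). No partial credit.

def score_submission(metrics: dict) -> int:
    checks = [
        metrics["factually_accurate"],        # FactualAccuracyCheck
        metrics["stakeholder_aligned"],       # StakeholderAlignment
        metrics["logical_coherence"] >= 4,    # LogicalCoherence_Score, target 4 (range 1-5)
        metrics["persuasiveness"] >= 4,       # Persuasiveness_Rating, target 4 (range 1-5)
        metrics["coverage"] >= 0.8,           # Completeness_of_Coverage, target 0.8
    ]
    return sum(checks)

example = {
    "factually_accurate": True,
    "stakeholder_aligned": True,
    "logical_coherence": 4,
    "persuasiveness": 3,   # below target, so this check contributes nothing
    "coverage": 0.85,
}
print(score_submission(example))  # 4 of the 5 checks pass
```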

Learning goals

What you should walk away with

Master the Claude Agents SDK for constructing sophisticated multi-step agentic workflows, incorporating tool use and computer interaction capabilities.

Implement advanced document understanding techniques with Claude 3.5 Sonnet to parse complex legal, contractual, and policy texts, extracting key clauses and implications.

Design a robust data pipeline utilizing Azure Blob Storage for raw document storage and Azure Cognitive Search for semantic retrieval of relevant information for agents.

Develop strategies for generating balanced and persuasive arguments, including identifying pros and cons, potential impacts, and counter-arguments for different stakeholder perspectives.

Integrate Gentrace into the agent workflow to evaluate the logical coherence, factual grounding, and rhetorical effectiveness of generated policy arguments.

Build tool-use capabilities within Claude Agents SDK to interact with external databases, search engines, and custom analytics functions for comprehensive data gathering.
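The last goal can be sketched as a tool definition plus a local implementation the agent dispatches to. The JSON schema below follows the Anthropic tool-definition shape; the tool name and the in-memory `search_documents` function are illustrative stand-ins for a real query against an Azure Cognitive Search index populated from Azure Blob Storage.

```python
# Hypothetical tool definition in the Anthropic tool-schema format.
search_tool = {
    "name": "search_policy_documents",
    "description": "Semantic search over indexed policy documents.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search phrase."},
            "top_k": {"type": "integer", "description": "Max results to return."},
        },
        "required": ["query"],
    },
}

# In-memory stand-in corpus; a real build would hold raw documents in
# Azure Blob Storage and query an Azure Cognitive Search index instead.
CORPUS = {
    "ai-tax-draft": "Proposed 2% levy on revenue from synthetic performers.",
    "residuals-memo": "Residual payment structures for digital replicas.",
}

def search_documents(query: str, top_k: int = 3) -> list[str]:
    """Naive keyword match standing in for semantic retrieval."""
    terms = query.lower().split()
    hits = [text for text in CORPUS.values()
            if any(term in text.lower() for term in terms)]
    return hits[:top_k]

print(search_documents("synthetic levy"))
```

In an agent loop, the model would receive `search_tool` in its tool list, emit a `tool_use` request, and the orchestrator would route it to `search_documents` and return the results as a tool result message.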

Start from your terminal
$ npx -y @versalist/cli start ai-policy-argument-generation-agent

[ok] Wrote CHALLENGE.md

[ok] Wrote .versalist.json

[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Docs
Manage API keys
Challenge at a glance
Host: Vera (AI Research & Mentorship)
Starts: Available now
Run mode: Evergreen challenge

Timeline and host

Operating window

Key dates and the organization behind this challenge.

Start date: Available now
Run mode: Evergreen challenge

Evaluation
Rubric: 5 dimensions
· FactualAccuracyCheck (weight 1)
· StakeholderAlignment (weight 1)
· LogicalCoherence_Score (weight 1)
· Persuasiveness_Rating (weight 1)
· Completeness_of_Coverage (weight 1)
Gold items: 2 (2 public)
