AI Policy Argument Generation Agent
What you are building
The core problem, expected build, and operating context for this challenge.
Develop an advanced AI agent system with the Claude Agents SDK to assist in complex policy negotiations, inspired by the SAG-AFTRA talks over an 'AI tax' on synthetic actors. The challenge is to build a system that analyzes diverse policy documents, legal texts, and economic data to generate well-reasoned arguments and counter-arguments for specific stakeholders, such as labor unions and production studios. The core of the solution is a multi-agent workflow orchestrated by the Claude Agents SDK and powered by Claude 3.5 Sonnet for its strong reasoning and document-understanding capabilities. Agents identify key points of contention, forecast the potential impacts of proposed policies, and synthesize persuasive rhetoric. Generated arguments are evaluated with Gentrace for logical coherence, factual accuracy, and persuasive strength. The system uses Azure Blob Storage for document storage and Azure Cognitive Search for semantic retrieval of relevant policy documents and background information.
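One way to start is by modeling the negotiation domain before wiring in the SDK. The sketch below is a minimal, illustrative scaffold: `Stakeholder`, `ContentionPoint`, and `build_argument_prompt` are hypothetical names invented here, not part of the Claude Agents SDK, and the prompt it assembles would be handed to whatever agent loop you build.

```python
from dataclasses import dataclass

@dataclass
class Stakeholder:
    name: str
    interests: list  # core negotiating interests, in priority order

@dataclass
class ContentionPoint:
    topic: str
    union_position: str
    studio_position: str

def build_argument_prompt(stakeholder, points):
    """Assemble a prompt asking an agent to argue each contention
    point from the given stakeholder's perspective."""
    lines = [
        f"You are negotiating on behalf of {stakeholder.name}.",
        "Core interests: " + "; ".join(stakeholder.interests),
        "Argue the following points from this stakeholder's perspective:",
    ]
    for p in points:
        lines.append(
            f"- {p.topic}: union position: '{p.union_position}'; "
            f"studio position: '{p.studio_position}'"
        )
    return "\n".join(lines)

# Illustrative data only; positions are placeholders, not quotes.
union = Stakeholder(
    "SAG-AFTRA",
    ["compensation for synthetic likenesses", "consent requirements"],
)
points = [
    ContentionPoint(
        "AI tax",
        "levy a fee per synthetic performance",
        "a per-use levy is cost-prohibitive",
    )
]
prompt = build_argument_prompt(union, points)
```

A structure like this keeps stakeholder framing explicit and testable, so the same contention points can be re-rendered for the studio side by swapping the `Stakeholder` instance.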
Shared data for this challenge
Review public datasets and any private uploads tied to your build.
How submissions are scored
These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.
FactualAccuracyCheck
Verifies that generated arguments are factually consistent with the provided documents and contain no inaccuracies.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
StakeholderAlignment
Ensures arguments are consistently aligned with the specified stakeholder's interests.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
LogicalCoherence_Score
A score (1-5) representing the logical flow and consistency of the argument. • target: 4 • range: 1-5
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
Persuasiveness_Rating
A human-rated score (1-5) for how persuasive the argument is. • target: 4 • range: 1-5
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
Completeness_of_Coverage
Fraction of relevant points from the documents covered by the argument. • target: 0.8 • range: 0.5-1
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
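Because every dimension is all-or-nothing, the overall score is simply the summed weight of the dimensions whose requirements are met. The weights below are illustrative assumptions (the brief does not publish them); only the pass/fail aggregation logic is what the rubric describes.

```python
def score_submission(results, weights):
    """All-or-nothing scoring: a dimension contributes its full weight
    only if its requirement is satisfied; no partial credit."""
    total = sum(weights.values())
    earned = sum(w for dim, w in weights.items() if results.get(dim, False))
    return earned / total

# Hypothetical weights; the real evaluator's weights are not published.
weights = {
    "FactualAccuracyCheck": 0.25,
    "StakeholderAlignment": 0.20,
    "LogicalCoherence_Score": 0.20,
    "Persuasiveness_Rating": 0.20,
    "Completeness_of_Coverage": 0.15,
}
results = {
    "FactualAccuracyCheck": True,
    "StakeholderAlignment": True,
    "LogicalCoherence_Score": True,    # score 4 meets the target of 4
    "Persuasiveness_Rating": False,    # rated 3, below the target of 4
    "Completeness_of_Coverage": True,  # coverage 0.85 >= 0.8 target
}
score = score_submission(results, weights)  # ≈ 0.8
```

Note that under this scheme a narrowly missed target (e.g. a persuasiveness rating of 3 against a target of 4) costs the dimension's entire weight, so it pays to optimize the weakest dimension first.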
What you should walk away with
Master the Claude Agents SDK for constructing sophisticated multi-step agentic workflows, incorporating tool use and computer interaction capabilities.
Implement advanced document understanding techniques with Claude 3.5 Sonnet to parse complex legal, contractual, and policy texts, extracting key clauses and implications.
Design a robust data pipeline utilizing Azure Blob Storage for raw document storage and Azure Cognitive Search for semantic retrieval of relevant information for agents.
Develop strategies for generating balanced and persuasive arguments, including identifying pros and cons, potential impacts, and counter-arguments for different stakeholder perspectives.
Integrate Gentrace into the agent workflow to evaluate the logical coherence, factual grounding, and rhetorical effectiveness of generated policy arguments.
Build tool-use capabilities within Claude Agents SDK to interact with external databases, search engines, and custom analytics functions for comprehensive data gathering.
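The retrieval and tool-use outcomes above boil down to one recurring step: fetch relevant passages and format them as cited context for an agent. The sketch below uses a stand-in retriever function so it runs locally; in the real pipeline you would replace `fake_retriever` with a call to an Azure Cognitive Search client, and the document names and passages shown are placeholders, not real sources.

```python
def ground_argument_context(query, retriever, k=3):
    """Retrieve the top-k passages for a query and format them as
    numbered, source-attributed context for an agent prompt."""
    hits = retriever(query)[:k]
    return "\n\n".join(
        f"[doc {i + 1}: {h['source']}]\n{h['text']}"
        for i, h in enumerate(hits)
    )

def fake_retriever(query):
    # Stand-in for a search client; returns static placeholder passages.
    return [
        {"source": "union_agreement_excerpt.pdf",
         "text": "Digital replicas require informed performer consent."},
        {"source": "economic_brief.pdf",
         "text": "Synthetic performers may displace background work."},
    ]

context = ground_argument_context("AI tax on synthetic actors", fake_retriever)
```

Passing source-attributed context like this to the agent, rather than raw text, is also what makes the FactualAccuracyCheck dimension tractable: each claim in a generated argument can be traced back to a `[doc N: …]` tag.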
[ok] Wrote CHALLENGE.md
[ok] Wrote .versalist.json
[ok] Wrote eval/examples.json
Requires VERSALIST_API_KEY. Works with any MCP-aware editor.
DocsAI Research & Mentorship