Creative Integrity Orchestrator
In response to the WGA and AMPTP agreement regarding AI training protection and creative workflows, you will build an autonomous agent team that automates script review and IP protection. Using the OpenAI Agents SDK, you will orchestrate a multi-agent system in which one agent specializes in deep reasoning about copyright law and union agreements, while another manages creative content generation. The system must verify that any generated content adheres to the 2026 labor agreements and that the pipeline does not inadvertently ingest restricted training data. You will implement a governance layer using Fairlearn to measure demographic bias in the creative output and keep it within the threshold the evaluator checks. Yupp AI serves as the routing layer: it sends logic-heavy reasoning tasks to DeepSeek R1 and creative tasks to GPT-5.4-mini for rapid iteration. Finally, you will integrate Cognition (Devin) as an autonomous engineering agent that reviews the system's own tool-calling logic to prevent data leaks.
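The routing split described above can be sketched locally before any real service is wired in. This is a minimal stand-in, not the Yupp AI API (the brief does not describe that interface): the `ROUTES` table, the `Task` type, and the lowercase model identifiers are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical model identifiers based on the names in the challenge text.
# The real routing layer's API is not specified in the brief.
ROUTES = {
    "reasoning": "deepseek-r1",   # logic-heavy legal and contract analysis
    "creative": "gpt-5.4-mini",   # rapid creative iteration
}


@dataclass(frozen=True)
class Task:
    kind: str    # "reasoning" or "creative"
    prompt: str


def route(task: Task) -> str:
    """Return the model name that should handle this task."""
    try:
        return ROUTES[task.kind]
    except KeyError:
        raise ValueError(f"unknown task kind: {task.kind!r}") from None


def plan(tasks: list[Task]) -> list[tuple[str, str]]:
    """Pair each prompt with its routed model, preserving input order."""
    return [(route(t), t.prompt) for t in tasks]
```

Rejecting unknown task kinds outright, rather than falling back to a default model, keeps a mislabeled legal question from silently landing on the creative model.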
What you are building
The core problem, expected build, and operating context for this challenge.
Shared data for this challenge
Review public datasets and any private uploads tied to your build.
How submissions are scored
These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.
Fairness Threshold
Does the Fairlearn demographic parity difference stay below 0.1?
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
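For reference, demographic parity difference is the gap between the highest and lowest selection rate across groups. Fairlearn exposes a metric of the same name in `fairlearn.metrics`; the sketch below recomputes it with the standard library only, so the threshold check can be read end to end. How a binary label is derived from narrative content is not defined in the brief, so the `y_pred` labels and `groups` values here are assumptions.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Fraction of positive (1) predictions within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(y_pred, groups):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def demographic_parity_difference(y_pred, groups):
    """Largest selection rate minus smallest selection rate across groups."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


def passes_fairness(y_pred, groups, threshold=0.1):
    """True only when the difference is strictly below the threshold."""
    return demographic_parity_difference(y_pred, groups) < threshold


# Hypothetical labels: 1 means a generated character received a favorable role.
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap = demographic_parity_difference(y_pred, groups)  # 0.75 - 0.5 = 0.25
```

With these sample labels the gap is 0.25, so the check fails; a submission would need to bring it under 0.1 to earn this dimension's weight.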
Reasoning Accuracy
DeepSeek R1 explanation quality score (target: 90; range: 0-100)
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
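The all-or-nothing rule for both dimensions can be expressed as a small gate. This is a sketch under stated assumptions: the brief does not publish the dimension weights, so the `weights` values below are hypothetical, and treating the reasoning target of 90 as inclusive is also an assumption.

```python
def dimension_score(weight, passed):
    """A dimension earns its full weight or nothing; no partial credit."""
    return weight if passed else 0.0


def total_score(parity_difference, reasoning_quality, weights):
    """Combine the two gated dimensions into one score."""
    fairness_ok = parity_difference < 0.1   # "stay below 0.1" read as strict
    reasoning_ok = reasoning_quality >= 90  # assumption: target is inclusive
    return (
        dimension_score(weights["fairness"], fairness_ok)
        + dimension_score(weights["reasoning"], reasoning_ok)
    )


# Hypothetical equal weights, for illustration only.
weights = {"fairness": 0.5, "reasoning": 0.5}
```

Under this reading, a run with a parity difference of exactly 0.1 loses the fairness weight entirely, which is why it is worth checking the metric before submitting.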
What you should walk away with
Master the OpenAI Agents SDK for building role-based autonomous agents with shared memory and tool access
Orchestrate complex logic chains by delegating legal reasoning tasks to DeepSeek R1 via the Yupp AI routing layer
Implement Fairlearn assessment pipelines to evaluate and mitigate biases in AI-generated narrative content
Design autonomous code-correction loops with Cognition (Devin) to audit agentic tool usage and API calls
Build a multi-turn conversation manager that maintains script context across different agent specializations
Integrate Coplay AI as the primary interface for human-in-the-loop creative approval workflows
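One way to approach the multi-turn conversation manager listed above is a single shared transcript that every specialist reads from. The sketch below uses only the standard library and does not call the OpenAI Agents SDK; `Turn`, `ScriptConversation`, and the agent names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    agent: str  # which specialist produced or received the turn
    role: str   # "user" or "assistant"
    text: str


@dataclass
class ScriptConversation:
    script_title: str
    turns: list = field(default_factory=list)

    def add(self, agent, role, text):
        """Append one turn to the shared transcript."""
        self.turns.append(Turn(agent, role, text))

    def context_for(self, agent, max_turns=6):
        """Build a prompt context from the most recent shared turns.

        History is shared across agents, so the legal agent sees what the
        creative agent wrote and vice versa.
        """
        recent = self.turns[-max_turns:] if max_turns > 0 else []
        lines = [f"Script: {self.script_title}"]
        lines += [f"[{t.agent}/{t.role}] {t.text}" for t in recent]
        lines.append(f"Next speaker: {agent}")
        return "\n".join(lines)
```

A short usage example: after adding turns from a `creative` agent and a `legal` agent, `conversation.context_for("legal", max_turns=2)` returns the script title, the two latest turns, and the next speaker as one string that can be passed as input to whichever agent runs next.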