Challenge

Creative Integrity Orchestrator

Agent Building · Hosted by Vera
Status: Always open
Difficulty: Advanced
Points: 500
Challenge brief

What you are building

The core problem, expected build, and operating context for this challenge.

In response to the WGA and AMPTP agreement on AI training protection and creative workflows, you will build an autonomous agent team that automates script review and IP protection. Using the OpenAI Agents SDK, you will orchestrate a multi-agent system in which one agent specializes in deep reasoning about copyright law and union agreements while another manages creative content generation. The system must verify that generated content adheres to the 2026 labor agreements and does not inadvertently ingest restricted training data. You will implement a governance layer using Fairlearn to check the creative output for demographic bias; the rubric measures this as a demographic parity difference below 0.1. Yupp AI acts as the routing layer, sending logic-heavy reasoning tasks to DeepSeek R1 and creative tasks to GPT-5.4-mini for rapid iteration. Finally, you will integrate Cognition (Devin) as an autonomous engineering agent that performs code reviews on the system's own tool-calling logic to prevent data leaks.
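The routing rule in the brief (reasoning tasks to DeepSeek R1, creative tasks to GPT-5.4-mini) can be sketched in plain Python. This is only an illustration of the dispatch logic, not the Yupp AI or OpenAI Agents SDK API: the `Task` class, the `route` function, the task-kind labels, and the model identifier strings are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Model names come from the brief; the exact identifier strings are assumptions.
REASONING_MODEL = "deepseek-r1"
CREATIVE_MODEL = "gpt-5.4-mini"


@dataclass(frozen=True)
class Task:
    """A unit of work for the agent team (hypothetical structure)."""
    kind: str    # "reasoning" or "creative" (labels assumed for this sketch)
    prompt: str


def route(task: Task) -> str:
    """Return the model identifier a task should be sent to."""
    if task.kind == "reasoning":
        return REASONING_MODEL
    if task.kind == "creative":
        return CREATIVE_MODEL
    # Fail loudly instead of silently sending unknown work to a default model.
    raise ValueError(f"unknown task kind: {task.kind!r}")
```

Raising on an unknown task kind keeps misrouted work visible, which matters when one branch handles legal reasoning and the other generates content.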

Evaluation rubric

How submissions are scored

These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.

Max Score: 2
Dimensions: 2 scoring checks
Binary: 2 pass or fail dimensions
Ordinal: 0 scaled dimensions
Dimension 1: fairness_threshold

Fairness Threshold

Does the Fairlearn demographic parity difference stay below 0.1?

Type: binary
Weight: 1

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
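To make the threshold concrete, here is a minimal sketch of the quantity being checked. Demographic parity difference is the gap between the highest and lowest selection rate (share of positive predictions) across groups. Fairlearn provides this as `fairlearn.metrics.demographic_parity_difference`; the hand-rolled version below avoids the dependency so the arithmetic is visible. The function names are hypothetical, and the brief does not say what counts as a "positive" label for narrative content, so the binary labels here are an assumption.

```python
from collections import defaultdict


def selection_rate_gap(y_pred, sensitive_features):
    """Max minus min selection rate across groups (demographic parity difference)."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, sensitive_features):
        totals[group] += 1
        positives[group] += int(pred == 1)
    if not totals:
        raise ValueError("no predictions supplied")
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


def passes_fairness_threshold(y_pred, sensitive_features, threshold=0.1):
    """True when the gap stays strictly below the rubric threshold."""
    return selection_rate_gap(y_pred, sensitive_features) < threshold
```

For example, predictions `[1, 0, 1, 0]` over groups `["a", "a", "b", "b"]` give both groups a selection rate of 0.5, so the gap is 0.0 and the check passes.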

Dimension 2: reasoning_accuracy

Reasoning Accuracy

DeepSeek R1 explanation quality score (target: 90, range: 0-100)

Type: binary
Weight: 1

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
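The two binary dimensions combine into a total out of 2. The sketch below shows one plausible way the scoring could work; it is not the evaluator's actual code. The function name is hypothetical, and treating the reasoning target as inclusive (a score of exactly 90 passes) is an assumption, since the rubric only states "target: 90".

```python
FAIRNESS_LIMIT = 0.1      # rubric: demographic parity difference must stay below 0.1
REASONING_TARGET = 90     # rubric: target 90 on a 0-100 scale
WEIGHTS = {"fairness_threshold": 1, "reasoning_accuracy": 1}


def score_submission(dp_difference: float, reasoning_score: float) -> int:
    """Sum the weights of the binary dimensions that pass (no partial credit)."""
    passed = {
        "fairness_threshold": dp_difference < FAIRNESS_LIMIT,
        # Assumption: meeting the target exactly counts as a pass.
        "reasoning_accuracy": reasoning_score >= REASONING_TARGET,
    }
    return sum(WEIGHTS[name] for name, ok in passed.items() if ok)
```

Under these assumptions a submission with a parity difference of 0.05 and a reasoning score of 95 earns the maximum of 2, while a parity difference of exactly 0.1 fails the first dimension.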

Learning goals

What you should walk away with

  • Master the OpenAI Agents SDK for building role-based autonomous agents with shared memory and tool access

  • Orchestrate complex logic chains by delegating legal reasoning tasks to DeepSeek R1 via the Yupp AI routing layer

  • Implement Fairlearn assessment pipelines to evaluate and mitigate biases in AI-generated narrative content

  • Design autonomous code-correction loops with Cognition (Devin) to audit agentic tool usage and API calls

  • Build a multi-turn conversation manager that maintains script context across different agent specializations

  • Integrate Coplay AI as the primary interface for human-in-the-loop creative approval workflows
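One of the goals above is a multi-turn conversation manager that keeps script context available to each specialized agent. A minimal sketch of that idea follows, using only the standard library. The class names, field names, and prompt layout are hypothetical and are not part of the OpenAI Agents SDK, which has its own session and handoff mechanisms.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    agent: str   # which specialist produced the turn, e.g. "legal" or "creative"
    text: str


@dataclass
class ScriptConversation:
    """Holds the script plus the turn history shared by all agents."""
    script: str
    turns: list = field(default_factory=list)

    def add_turn(self, agent: str, text: str) -> None:
        self.turns.append(Turn(agent, text))

    def context_for(self, agent: str, last_n: int = 5) -> str:
        """Build the text handed to `agent`: script, recent history, then its cue."""
        recent = self.turns[-last_n:] if last_n > 0 else []
        lines = [f"SCRIPT:\n{self.script}", "HISTORY:"]
        lines += [f"[{t.agent}] {t.text}" for t in recent]
        lines.append(f"NEXT: {agent}")
        return "\n".join(lines)
```

Capping the history with `last_n` keeps prompts bounded while the full script is always included, so either agent can pick up the thread regardless of who spoke last.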

Start from your terminal
$ npx -y @versalist/cli start creative-integrity-orchestrator
[ok] Wrote CHALLENGE.md
[ok] Wrote .versalist.json
[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Host and timing
Host: Vera (AI Research & Mentorship)
Starts: Available now
Run mode: Evergreen challenge

Tool Space Recipe (Draft)

Action Space
  • OpenAI: AI model provider (required)
  • RAI: Agentic framework for robotics using ROS 2

Policy Serving
  • DeepSeek R1

Evaluation
  • Rubric: 2 dimensions
      · Fairness Threshold (1%)
      · Reasoning Accuracy (1%)
  • Gold items: 1 (1 public)
