A2A Safety Swarm for Proactive Content Moderation
This challenge focuses on building a robust, multi-agent AI safety system. Developers will design and implement a proactive content moderation swarm using cutting-edge agent frameworks to detect, classify, and prevent the generation of harmful or unsafe content. The system will leverage graph-based workflows and adaptive thinking budgets to ensure comprehensive and ethical oversight. Participants will integrate leading LLMs like Claude Opus 4.5 for nuanced ethical reasoning and GPT-5.2 for rapid adversarial content generation and detection. The core task involves creating a secure A2A protocol for agents to communicate and collaborate, backed by MCP for policy enforcement and tool integration, all deployed as a scalable agent system on Steamship.
What you are building
The core problem, expected build, and operating context for this challenge.
Shared data for this challenge
Review public datasets and any private uploads tied to your build.
What you should walk away with
Master Langroid for building robust, stateful agents with complex interaction patterns and fine-grained control for safety operations.
Implement A2A protocol for secure, asynchronous communication between specialized safety agents across a distributed moderation swarm.
Design MCP-enabled tool integration within agents to access real-time safety policy databases, external moderation APIs, and user trust & safety tools.
Build graph-based agent workflows using LangGraph patterns to orchestrate multi-stage content analysis, risk assessment, and automated intervention strategies.
Apply extended thinking with Claude Opus 4.5 for nuanced ethical reasoning and GPT-5.2 for rapid adversarial testing and content generation analysis, using adaptive reasoning budgets for critical safety assessments.
Utilize Steamship's agent systems for deploying, monitoring, and scaling a swarm of safety-focused agents, ensuring high availability and resilience.
Implement self-play fine-tuning strategies within the agent swarm to continuously improve safety protocols and detect emerging threats and bypass attempts.
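The graph-based workflow objective above can be sketched without committing to a framework. The following is a minimal, framework-free illustration of the LangGraph-style pattern (nodes sharing state, with conditional routing); all node names, the keyword classifier, and the 0.5 risk threshold are illustrative placeholders, not part of the challenge spec.

```python
from typing import Callable, Dict

State = dict  # shared state passed between workflow nodes


def classify(state: State) -> State:
    # Hypothetical keyword classifier standing in for an LLM moderation call.
    flagged = any(w in state["text"].lower() for w in ("attack", "exploit"))
    state["label"] = "unsafe" if flagged else "safe"
    return state


def assess_risk(state: State) -> State:
    # Toy risk score derived from the classification label.
    state["risk"] = 0.9 if state["label"] == "unsafe" else 0.1
    return state


def intervene(state: State) -> State:
    state["action"] = "block"
    return state


def allow(state: State) -> State:
    state["action"] = "allow"
    return state


class Graph:
    """Tiny stand-in for a LangGraph-style state graph."""

    def __init__(self):
        self.nodes: Dict[str, Callable[[State], State]] = {}
        self.edges: Dict[str, Callable[[State], str]] = {}

    def add_node(self, name: str, fn: Callable[[State], State]) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, router: Callable[[State], str]) -> None:
        # router maps the current state to the next node name, or "END"
        self.edges[src] = router

    def run(self, entry: str, state: State) -> State:
        node = entry
        while node != "END":
            state = self.nodes[node](state)
            node = self.edges[node](state)
        return state


g = Graph()
g.add_node("classify", classify)
g.add_node("assess_risk", assess_risk)
g.add_node("intervene", intervene)
g.add_node("allow", allow)
g.add_edge("classify", lambda s: "assess_risk")
g.add_edge("assess_risk", lambda s: "intervene" if s["risk"] > 0.5 else "allow")
g.add_edge("intervene", lambda s: "END")
g.add_edge("allow", lambda s: "END")

result = g.run("classify", {"text": "how to exploit this system"})
# result["action"] == "block"
```

In a real submission, each node would wrap an LLM or tool call, and the conditional edge out of risk assessment is where an adaptive thinking budget would be spent on borderline content.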
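Likewise, agent-to-agent communication in the swarm boils down to agents exchanging structured task envelopes. The sketch below is a simplified illustration of that idea, not the official A2A wire schema; the field names, the `SafetyTask` type, and the keyword-based verdict are all hypothetical.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass
class SafetyTask:
    """Illustrative envelope sent from one safety agent to another."""

    sender: str     # agent issuing the task (e.g. an orchestrator)
    recipient: str  # specialized safety agent handling it
    content: str    # content under review
    task_id: str = ""

    def to_wire(self) -> str:
        # Assign an id lazily so every task on the wire is traceable.
        if not self.task_id:
            self.task_id = str(uuid.uuid4())
        return json.dumps(asdict(self))


def handle(wire: str) -> dict:
    """Recipient side: parse the envelope and return a verdict message."""
    task = json.loads(wire)
    # Toy detector; a real agent would call a model or moderation API here.
    verdict = "escalate" if "phishing" in task["content"].lower() else "clear"
    return {"task_id": task["task_id"], "verdict": verdict, "agent": task["recipient"]}


msg = SafetyTask("orchestrator", "phishing-detector", "Click this phishing link").to_wire()
reply = handle(msg)
# reply["verdict"] == "escalate"
```

The point of the envelope is that the `task_id` survives the round trip, so the orchestrator can correlate asynchronous verdicts from many specialized agents.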
DocsAI Research & Mentorship
Operating window
Key dates and the organization behind this challenge.