AI Policy Audit Agent with OpenAI Agents
Build an autonomous agent with the OpenAI Agents SDK that helps audit frontier AI models for compliance with policies and ethical guidelines. The agent ingests large volumes of policy documents, ethical frameworks, and internal model documentation, then uses retrieval-augmented generation (RAG) to flag potential risks, non-compliance, and areas that need human review. Persistent memory via Mem0 lets the agent keep context across audit sessions and build on prior findings. Supabase provides vector storage for the documents, and OpenRouter provides model access with fallback options and cost monitoring.
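The retrieval step described above can be sketched in plain Python. This is a minimal illustration, not the challenge's reference design: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory `POLICY_CHUNKS` list stands in for a Supabase vector table, and all names here are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Hypothetical policy excerpts; a real build would load chunked documents
# from the vector store instead of a hard-coded list.
POLICY_CHUNKS = [
    "Models must not be deployed without a completed red-team evaluation.",
    "Training data containing personal information requires a documented legal basis.",
    "All safety incidents must be reported to the review board within 72 hours.",
]

if __name__ == "__main__":
    question = "Was a red-team evaluation completed before deployment?"
    for score, chunk in retrieve(question, POLICY_CHUNKS):
        print(f"{score:.2f}  {chunk}")
```

In a real build, the retrieved chunks would be passed to the model as context alongside the audit question, and the similarity search would run inside the database rather than in application code.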
What you should walk away with
Master the OpenAI Agents SDK for building sophisticated, multi-turn conversational agents with tool use and state management.
Implement persistent, long-term memory for your agent using Mem0, understanding its API for session recall and knowledge accretion.
Design and populate a vector database in Supabase (Vector) for efficient RAG, optimizing embedding strategies for policy documents.
Integrate `GPT-5-2` for advanced reasoning, natural language understanding, and complex policy analysis within your agent's workflow.
Utilize OpenRouter for routing AI model requests, enabling capabilities like fallback models, cost optimization, and unified API access for OpenAI models.
Develop custom tools for the OpenAI Agents SDK to interact with external systems, such as document parsers or internal policy databases.
Build an evaluation harness to assess the agent's accuracy in identifying policy non-compliance and ethical risks.
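The last goal above, an evaluation harness, can start very small. The sketch below assumes each evaluation case pairs the policy IDs a human auditor flagged with the IDs the agent flagged, and reports micro-averaged precision, recall, and F1. The `AuditCase` structure and the policy IDs are hypothetical; the actual format of the challenge's evaluation examples is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class AuditCase:
    """One evaluation example (hypothetical schema)."""
    case_id: str
    expected: frozenset[str]   # policy IDs a human auditor flagged
    predicted: frozenset[str]  # policy IDs the agent flagged

def score(cases: list[AuditCase]) -> dict[str, float]:
    """Micro-averaged precision, recall, and F1 over all cases."""
    tp = sum(len(c.expected & c.predicted) for c in cases)  # correctly flagged
    fp = sum(len(c.predicted - c.expected) for c in cases)  # flagged in error
    fn = sum(len(c.expected - c.predicted) for c in cases)  # missed violations
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

if __name__ == "__main__":
    cases = [
        AuditCase("doc-1", frozenset({"P1", "P2"}), frozenset({"P1", "P3"})),
        AuditCase("doc-2", frozenset({"P4"}), frozenset({"P4"})),
    ]
    print(score(cases))
```

For an audit setting, recall usually deserves the closest attention, since a missed violation tends to cost more than a false alarm that a human reviewer can dismiss.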