Multi-Agent System for Internal Security Anomaly Detection
What you are building
The core problem, expected build, and operating context for this challenge.
This challenge focuses on building a multi-agent system with AutoGen that detects potential data leaks and anomalous behavior. Participants will design and implement a team of AI agents that monitors internal communication logs and system access records and cross-references them with external news feeds and other public information. The system should identify patterns and anomalies that may indicate security incidents or insider threats. The core of the challenge is orchestrating agents with distinct roles, such as 'Log Monitor', 'News Analyst', 'Incident Investigator', and 'Reporting Agent'. These agents communicate and collaborate autonomously, using o4-mini for reasoning and dedicated tools to interact with simulated data sources. The goal is a proactive security monitoring system that picks up subtle indicators of risk and presents a consolidated, actionable report.
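The hand-off between the four roles can be sketched in plain Python before any agent framework is involved. This is a minimal illustration, not the challenge's reference design: it makes no AutoGen or o4-mini calls, each role is a simple function standing in for an agent, and the `SharedContext` fields, the 500 MB threshold, the confidence values, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """Findings passed from role to role; every field name is hypothetical."""
    access_events: list
    headlines: list
    flagged_events: list = field(default_factory=list)
    related_headlines: list = field(default_factory=list)
    report: dict = field(default_factory=dict)


def log_monitor(ctx):
    # Flag events whose download volume exceeds an assumed 500 MB threshold.
    ctx.flagged_events = [e for e in ctx.access_events if e["mb_downloaded"] > 500]
    return ctx


def news_analyst(ctx):
    # Keep headlines that mention a resource touched by a flagged event.
    resources = {e["resource"].lower() for e in ctx.flagged_events}
    ctx.related_headlines = [
        h for h in ctx.headlines if any(r in h.lower() for r in resources)
    ]
    return ctx


def incident_investigator(ctx):
    # Internal evidence is required; a matching headline raises confidence.
    detected = bool(ctx.flagged_events)
    confidence = 0.0
    if detected:
        confidence = 0.9 if ctx.related_headlines else 0.6
    ctx.report = {"anomaly_detected": detected, "confidence": confidence}
    return ctx


def reporting_agent(ctx):
    # Turn the findings into a short human-readable summary.
    if ctx.report["anomaly_detected"]:
        users = ", ".join(sorted({e["user"] for e in ctx.flagged_events}))
        summary = (
            f"Unusual download volume by {users}; "
            f"{len(ctx.related_headlines)} related headline(s)."
        )
    else:
        summary = "No anomalous access detected."
    ctx.report["report_summary"] = summary
    return ctx


def run_pipeline(ctx):
    # Fixed order here; a real agent team could negotiate turns instead.
    for role in (log_monitor, news_analyst, incident_investigator, reporting_agent):
        ctx = role(ctx)
    return ctx.report


# Simulated inputs, invented for illustration only.
example = SharedContext(
    access_events=[
        {"user": "alice", "resource": "payroll-db", "mb_downloaded": 12},
        {"user": "bob", "resource": "design-docs", "mb_downloaded": 2400},
    ],
    headlines=["Leaked design-docs from a simulated vendor appear online"],
)
print(run_pipeline(example))
```

In an actual submission each function would presumably become an agent with its own prompt and tools, and the fixed loop in `run_pipeline` would be replaced by whatever conversation pattern the chosen AutoGen version provides.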
Shared data for this challenge
Review public datasets and any private uploads tied to your build.
How submissions are scored
These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.
Correct Anomaly Identification
The 'anomaly_detected' flag must be true for positive cases and false for negative cases.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
Report Clarity
The 'report_summary' must clearly describe the anomaly and contributing factors.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
Confidence Score
The reported confidence in the anomaly detection. Target: 0.8. Range: 0 to 1.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
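Because no partial credit is awarded, it may help to check a report's shape locally before submitting. The sketch below is an assumption-laden illustration rather than the evaluator's logic: `anomaly_detected` and `report_summary` are the field names quoted above, while the `confidence` key and the `validate_report` helper are hypothetical.

```python
import json


def validate_report(raw):
    """Return a list of problems found in a JSON report (empty if none)."""
    report = json.loads(raw)
    problems = []

    # The flag must be a real boolean, not a string such as "yes".
    if not isinstance(report.get("anomaly_detected"), bool):
        problems.append("anomaly_detected must be a boolean")

    # The summary must contain some text.
    summary = report.get("report_summary")
    if not isinstance(summary, str) or not summary.strip():
        problems.append("report_summary must be a non-empty string")

    # Assumed key name; the stated range is 0 to 1.
    confidence = report.get("confidence")
    if (
        isinstance(confidence, bool)
        or not isinstance(confidence, (int, float))
        or not 0.0 <= confidence <= 1.0
    ):
        problems.append("confidence must be a number between 0 and 1")

    return problems


print(validate_report(
    '{"anomaly_detected": true, "report_summary": "Bulk export at 02:40.", "confidence": 0.8}'
))
```

A check like this only catches malformed output; whether the summary is judged clear, or the flag correct, is still decided by the evaluator.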
What you should walk away with
Master AutoGen for building complex, conversational multi-agent systems with shared context and human-in-the-loop capabilities.
Implement role-based agent collaboration patterns in AutoGen, defining specialized agents like Log Monitor, News Analyst, and Incident Investigator.
Integrate o4-mini models into AutoGen agents for advanced reasoning, natural language processing, and pattern recognition tasks.
Design and implement custom tools for AutoGen agents to interact with simulated internal access logs, email archives, and external news APIs.
Utilize FLAML within AutoGen workflows for automated hyperparameter tuning and efficient resource management for agent-based tasks.
Develop reporting mechanisms using All Hands AI for summarizing security incidents and communicating findings to human operators.
Apply CodeRabbit principles for ensuring code quality and best practices in the AutoGen agent codebase, emphasizing maintainability and security.
Explore Neurolink patterns for designing resilient and adaptive agent systems capable of handling dynamic security threat landscapes.
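As one illustration of the custom tools mentioned in the list above, a tool over simulated access logs could be an ordinary typed Python function with a docstring. Everything in this sketch is hypothetical, including the CSV layout, the sample rows, the function name, and the 07:00 to 19:00 working window. Registering the function with an AutoGen agent is omitted because the mechanism differs between AutoGen versions.

```python
import csv
import io
from datetime import datetime

# Simulated log invented for illustration; the challenge data may look different.
SIMULATED_ACCESS_LOG = """timestamp,user,resource
2024-03-04T10:15:00,alice,payroll-db
2024-03-05T02:40:00,bob,design-docs
2024-03-05T14:05:00,bob,wiki
"""


def find_off_hours_access(
    log_csv: str, start_hour: int = 7, end_hour: int = 19
) -> list[dict]:
    """Return log rows whose timestamp falls outside [start_hour, end_hour)."""
    rows = csv.DictReader(io.StringIO(log_csv))
    return [
        row
        for row in rows
        if not start_hour <= datetime.fromisoformat(row["timestamp"]).hour < end_hour
    ]


for row in find_off_hours_access(SIMULATED_ACCESS_LOG):
    print(row["user"], row["resource"], row["timestamp"])
```

Keeping the tool a pure function of its inputs makes it easy to unit test separately from the agents that call it.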
[ok] Wrote CHALLENGE.md
[ok] Wrote .versalist.json
[ok] Wrote eval/examples.json
Requires VERSALIST_API_KEY. Works with any MCP-aware editor.