Cyberthreat Orchestrator Agent
What you are building
The core problem, expected build, and operating context for this challenge.
This challenge requires building an autonomous cyber threat detection and remediation system using the LangChain framework, specifically leveraging LangGraph for complex, stateful multi-agent workflows. Developers will design a team of specialized agents that work together to identify threats from simulated log data, analyze their severity, formulate a remediation plan, and orchestrate protective actions. The system must be capable of dynamic decision-making and adapting its response based on the evolving threat landscape. The focus is on robust agent collaboration patterns, sophisticated tool integration, and continuous evaluation of the agent system's effectiveness.
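The detect, assess, plan, act flow described above can be sketched as a small state machine. This is a minimal illustration in plain Python rather than LangGraph itself, so it runs with only the standard library. The node names, keyword rules, and remediation steps are all hypothetical placeholders. In a LangGraph build, each function would typically become a node and each routing lambda a conditional edge.

```python
from typing import Callable, Dict

END = "end"  # sentinel marking the end of the workflow

def detect(state: dict) -> dict:
    # Hypothetical keyword rules standing in for a real detection agent.
    line = state["log_line"].lower()
    if "malware" in line:
        threat = "malware"
    elif "failed login" in line:
        threat = "brute_force"
    else:
        threat = "none"
    return {**state, "threat": threat}

def assess(state: dict) -> dict:
    # Assumed severity mapping; a real system would ask an analysis agent.
    severity = {"malware": "high", "brute_force": "medium"}.get(state["threat"], "low")
    return {**state, "severity": severity}

def plan(state: dict) -> dict:
    if state["severity"] == "high":
        steps = ["isolate host", "revoke credentials", "open incident ticket"]
    else:
        steps = ["monitor source"]
    return {**state, "plan": steps}

def act(state: dict) -> dict:
    # Mock orchestration: record what would have been executed.
    return {**state, "actions": [f"executed: {step}" for step in state["plan"]]}

NODES: Dict[str, Callable[[dict], dict]] = {
    "detect": detect, "assess": assess, "plan": plan, "act": act,
}

# Each router inspects the updated state and names the next node.
ROUTES: Dict[str, Callable[[dict], str]] = {
    "detect": lambda s: "assess" if s["threat"] != "none" else END,
    "assess": lambda s: "plan",
    "plan": lambda s: "act" if s["severity"] == "high" else END,
    "act": lambda s: END,
}

def run(log_line: str) -> dict:
    state = {"log_line": log_line}
    node = "detect"
    while node != END:
        state = NODES[node](state)
        node = ROUTES[node](state)
    return state
```

Calling run("malware signature matched on host-7") walks all four nodes, while a benign line stops right after detection. Keeping routing separate from node logic is one way to let the response path change with the threat context without touching the agents themselves.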
How submissions are scored
These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.
ThreatAccuracy
Agent correctly identifies and classifies threats based on provided logs.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
PlanActionability
Remediation plan contains at least 3 actionable steps for high-severity threats.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
PlanCoherenceScore
Expert-rated score for the coherence and completeness of the remediation plan (1-5). Target: 4. Range: 1-5.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
ResponseLatency_ms
Time taken for the agent to process a threat and propose a plan, in milliseconds. Target: 1000. Range: 0-5000.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
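The all-or-nothing rule stated for the four dimensions can be made concrete with a small self-check. This is only a sketch under several assumptions that the brief does not confirm: that meeting the coherence target means a rating of at least 4, that meeting the latency target means at most 1000 ms, that the actionability requirement applies only to high-severity threats, and that the weights (not given in the brief) are supplied by the caller. The RunResult type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class RunResult:
    threat_correct: bool   # did the agent classify the threat correctly?
    severity: str          # e.g. "low", "medium", "high"
    plan_steps: int        # number of actionable steps in the plan
    coherence: float       # expert rating on the 1-5 scale
    latency_ms: float      # time to process the threat and propose a plan

def dimension_passes(r: RunResult) -> Dict[str, bool]:
    """Pass/fail per dimension; thresholds are assumptions, see lead-in."""
    return {
        "ThreatAccuracy": r.threat_correct,
        # Assumed: the 3-step rule only binds for high-severity threats.
        "PlanActionability": r.severity != "high" or r.plan_steps >= 3,
        # Assumed: target 4 means "at least 4".
        "PlanCoherenceScore": r.coherence >= 4,
        # Assumed: target 1000 means "at most 1000 ms".
        "ResponseLatency_ms": 0 <= r.latency_ms <= 1000,
    }

def total_score(r: RunResult, weights: Dict[str, float]) -> float:
    """Sum the full weight of every passed dimension; no partial credit."""
    passes = dimension_passes(r)
    return sum(weights[name] for name, ok in passes.items() if ok)
```

With illustrative equal weights of 0.25 each, a run that meets every requirement scores 1.0, and one that misses only the latency target scores 0.75. A check like this is useful locally because a near miss, such as a coherence rating of 3.9, earns nothing for that dimension.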
What you should walk away with
Master LangGraph for building stateful, cyclic agent workflows, including defining nodes, edges, and conditional routing based on threat context.
Implement robust tool invocation patterns within LangChain agents, enabling interaction with mock security APIs for threat scanning and system lockdown.
Design prompts for Gemini 2.5 Pro to perform sophisticated threat intelligence analysis, identifying zero-day potential and recommending remediation strategies.
Build custom data parsers and aggregators to normalize threat data inputs from various simulated security tools.
Integrate Evidently AI to establish continuous evaluation metrics for agent response time, accuracy of threat identification, and efficacy of proposed remediation steps.
Orchestrate the deployment of specialized Qwen2-powered sub-agents via Together AI for high-throughput anomaly detection in network traffic logs.
Develop fault-tolerant agent execution strategies within LangGraph to handle partial failures during critical incident response scenarios.
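Two of the outcomes above, normalizing inputs from different simulated tools and tolerating partial failures, can be illustrated without any framework. The sketch below is a standard-library approximation, not a LangGraph or LangChain API. The field names (src_ip, source, msg, alert, and so on), the retry count, and the fallback value are hypothetical and would depend on the mock tools actually used.

```python
import time
from typing import Any, Callable, Dict, Optional

def normalize_event(raw: Dict[str, Any]) -> Dict[str, str]:
    """Map differently shaped tool outputs onto one assumed schema."""
    source_ip = raw.get("src_ip") or raw.get("source") or raw.get("ip") or "unknown"
    event = raw.get("event") or raw.get("msg") or raw.get("alert") or ""
    return {
        "source_ip": str(source_ip),
        "event": str(event).strip().lower(),
        "tool": str(raw.get("tool", "unknown")),
    }

def run_with_retry(
    step: Callable[[Dict[str, str]], Any],
    event: Dict[str, str],
    retries: int = 2,
    delay_s: float = 0.0,
    fallback: Optional[Any] = None,
) -> Dict[str, Any]:
    """Run one step, retrying on any exception, then fall back.

    The workflow keeps going either way: callers inspect "ok" to decide
    whether to continue normally or take a degraded path.
    """
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        try:
            return {"ok": True, "result": step(event), "attempts": attempt + 1}
        except Exception as exc:  # broad on purpose: any tool failure counts
            last_error = exc
            time.sleep(delay_s)
    return {
        "ok": False,
        "result": fallback,
        "attempts": retries + 1,
        "error": str(last_error),
    }
```

Returning a result dictionary instead of raising lets a downstream router treat a failed lockdown call as one more piece of state, for example by escalating to a human when "ok" is false.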