High-Performance Operational Planning Agent
Design and implement a high-performance, real-time operational planning agent using the Mastra AI TypeScript framework. This challenge focuses on creating a robust system capable of handling complex logistics scenarios, such as autonomous fleet dispatch or dynamic resource allocation, inspired by systems like Waymo's next-gen robotaxis. The agent will leverage the advanced reasoning capabilities of GPT-5 for strategic planning and decision-making, complemented by Claude Sonnet 4 for critical validation and safety checks, ensuring reliable operations under uncertainty. Both models will be served efficiently via the Cohere Platform, providing a scalable and secure inference environment. To achieve ultra-low-latency responses crucial for real-time operations, NVIDIA's TensorRT-LLM will be employed to optimize and accelerate the model inference. Mastra AI's built-in memory management and tool-use capabilities will be central to the agent's ability to maintain situational awareness and interact with various operational systems. This challenge highlights the integration of high-performance inference technologies with advanced agent frameworks for critical, real-world applications.
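As one way to structure the propose-then-validate flow described above, the sketch below wires a planner model to a validator model behind a minimal `LLM` interface. The interface, the JSON contracts, and the `planAndValidate` helper are illustrative assumptions for this sketch, not Mastra, OpenAI, or Anthropic APIs:

```typescript
// Hypothetical planner/validator pipeline. The LLM interface, the JSON
// contracts, and planAndValidate are assumptions, not real framework APIs.
interface LLM {
  complete(prompt: string): Promise<string>;
}

interface DispatchPlan {
  assignments: Array<{ vehicleId: string; taskId: string }>;
}

interface Validation {
  safe: boolean;
  issues: string[];
}

// The planner (e.g. GPT-5) drafts a plan; the validator (e.g. Claude
// Sonnet 4) checks it for safety violations before dispatch.
async function planAndValidate(
  planner: LLM,
  validator: LLM,
  situation: string,
): Promise<{ plan: DispatchPlan; validation: Validation }> {
  const raw = await planner.complete(
    `Propose a dispatch plan as JSON for: ${situation}`,
  );
  const plan: DispatchPlan = JSON.parse(raw);

  const verdict = await validator.complete(
    `Check this plan for safety violations; answer as JSON ` +
      `{"safe": boolean, "issues": string[]}: ${raw}`,
  );
  const validation: Validation = JSON.parse(verdict);

  if (!validation.safe) {
    throw new Error(`Plan rejected: ${validation.issues.join("; ")}`);
  }
  return { plan, validation };
}
```

In production the two `LLM` implementations would be thin clients over the serving endpoints; keeping them behind one interface makes the pipeline testable with stubs.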
What you are building
The core problem, expected build, and operating context for this challenge.
Shared data for this challenge
Review public datasets and any private uploads tied to your build.
How submissions are scored
These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.
Decision Latency (Average)
The average time from receiving a new request to emitting a dispatch plan must be under 500 ms.
Scoring is pass/fail for this and every dimension below: a dimension contributes its full weight only when its requirement is met, and partial credit is not awarded.
Plan Feasibility
Verify that generated dispatch plans are logically sound and executable given fleet constraints.
Safety Validation Accuracy
Confirm that Claude Sonnet 4 correctly identifies safety violations in at least 95% of test cases.
Plan Optimality Score
Numerical score (0–1) reflecting how close the generated plan is to the theoretically optimal solution. Target: 0.9; accepted range: 0.7–1.
TensorRT-LLM Speedup Factor
Ratio of inference speed with TensorRT-LLM to baseline, non-optimized inference. Target: 5×; accepted range: 2–10×.
Multi-Model Consistency
Agreement rate between GPT-5's proposed plan and Claude Sonnet 4's validation, after filtering out plans flagged for safety concerns. Target: 0.9; accepted range: 0.8–1.
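The arithmetic behind three of these dimensions can be made concrete. In the sketch below, `RunRecord` and the metric helpers are invented for illustration; only the thresholds (500 ms average latency, 5× speedup, 0.9 agreement) come from the dimensions above:

```typescript
// Illustrative scoring math for three dimensions. RunRecord and the
// helper functions are assumptions; the thresholds are from the rubric.
interface RunRecord {
  latencyMs: number;        // request -> dispatch plan, optimized path
  baselineMs: number;       // same request, non-optimized inference
  safetyFlagged: boolean;   // validator raised a safety concern
  validatorAgreed: boolean; // validator endorsed the proposed plan
}

function averageLatency(runs: RunRecord[]): number {
  return runs.reduce((sum, r) => sum + r.latencyMs, 0) / runs.length;
}

// Speedup = total baseline time / total optimized time.
function speedupFactor(runs: RunRecord[]): number {
  const optimized = runs.reduce((sum, r) => sum + r.latencyMs, 0);
  const baseline = runs.reduce((sum, r) => sum + r.baselineMs, 0);
  return baseline / optimized;
}

// Agreement rate, computed only over runs without safety flags.
function consistency(runs: RunRecord[]): number {
  const considered = runs.filter((r) => !r.safetyFlagged);
  const agreed = considered.filter((r) => r.validatorAgreed).length;
  return agreed / considered.length;
}

function passes(runs: RunRecord[]): boolean {
  return (
    averageLatency(runs) < 500 &&
    speedupFactor(runs) >= 5 &&
    consistency(runs) >= 0.9
  );
}
```

Because each dimension is all-or-nothing, a harness like this is worth running continuously during development rather than once at submission time.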
What you should walk away with
Master the Mastra AI TypeScript framework for defining agents, managing memory, and orchestrating complex workflows for real-time decision-making.
Implement multi-model integration, using GPT-5 for primary planning and Claude Sonnet 4 for secondary validation or safety checks within the Mastra AI agent.
Deploy and manage both GPT-5 and Claude Sonnet 4 through the Cohere Platform, leveraging its capabilities for secure API access, rate limiting, and model versioning.
Integrate NVIDIA TensorRT-LLM into the inference pipeline via Cohere Platform to achieve significant speedups and reduced latency for LLM responses, critical for real-time operational scenarios.
Design and implement custom tools for the Mastra AI agent to interact with simulated operational data, such as real-time sensor feeds, logistics databases, or vehicle control systems.
Build a robust real-time decision-making loop within the Mastra AI agent, capable of adaptive planning and rapid response to dynamic environmental changes.
Evaluate the performance of the operational planning agent under stress, measuring decision accuracy, response latency, and system resilience.
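To make the tool-use and decision-loop objectives concrete, here is a minimal sketch of one loop tick against a simulated fleet-status tool. The `Tool` shape, `fleetStatusTool`, and the greedy pairing that stands in for the LLM planner are all illustrative assumptions, not Mastra's actual tool API:

```typescript
// Hypothetical tool shape and one tick of a decision loop. The real
// agent would call an LLM planner here; a greedy pairing stands in.
interface Tool<I, O> {
  name: string;
  execute(input: I): Promise<O>;
}

interface FleetStatus {
  idleVehicles: string[];
  pendingTasks: string[];
}

// A simulated operational data source the agent can query.
const fleetStatusTool: Tool<void, FleetStatus> = {
  name: "fleet_status",
  async execute() {
    return { idleVehicles: ["v1", "v2"], pendingTasks: ["t1"] };
  },
};

// One tick: observe the fleet, then pair pending tasks with idle vehicles.
async function decisionTick(
  status: Tool<void, FleetStatus>,
): Promise<Array<{ vehicleId: string; taskId: string }>> {
  const { idleVehicles, pendingTasks } = await status.execute();
  return pendingTasks
    .slice(0, idleVehicles.length)
    .map((taskId, i) => ({ vehicleId: idleVehicles[i], taskId }));
}
```

Running `decisionTick` on a timer, and re-running it whenever a sensor feed pushes a change, gives the adaptive observe-plan-act loop the objectives describe.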
[ok] Wrote CHALLENGE.md
[ok] Wrote .versalist.json
[ok] Wrote eval/examples.json
Requires VERSALIST_API_KEY. Works with any MCP-aware editor.
DocsAI Research & Mentorship