Workflow Automation · Advanced · Always open

High-Performance Operational Planning Agent

Design and implement a high-performance, real-time operational planning agent using the Mastra AI TypeScript framework. This challenge focuses on creating a robust system capable of handling complex logistics scenarios, such as autonomous fleet dispatch or dynamic resource allocation, inspired by systems like Waymo's next-gen robotaxis. The agent will leverage the advanced reasoning capabilities of GPT-5 for strategic planning and decision-making, complemented by Claude Sonnet 4 for critical validation and safety checks, ensuring reliable operations under uncertainty. Both models will be served efficiently via the Cohere Platform, providing a scalable and secure inference environment. To achieve ultra-low-latency responses crucial for real-time operations, NVIDIA's TensorRT-LLM will be employed to optimize and accelerate the model inference. Mastra AI's built-in memory management and tool-use capabilities will be central to the agent's ability to maintain situational awareness and interact with various operational systems. This challenge highlights the integration of high-performance inference technologies with advanced agent frameworks for critical, real-world applications.
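The plan-then-validate split described above can be sketched as a small loop. This is a minimal, self-contained sketch: `proposePlan` and `validatePlan` stand in for GPT-5 and Claude Sonnet 4 (in a real build both would be asynchronous calls served through the Cohere Platform), and all type names and the greedy assignment rule are hypothetical placeholders.

```typescript
interface Vehicle { id: string; available: boolean; }
interface DispatchPlan {
  assignments: Array<{ vehicleId: string; requestId: string }>;
}

// Stub for the GPT-5 planner: greedily assign each request to the next free vehicle.
function proposePlan(vehicles: Vehicle[], requestIds: string[]): DispatchPlan {
  const free = vehicles.filter((v) => v.available);
  return {
    assignments: requestIds
      .slice(0, free.length)
      .map((requestId, i) => ({ vehicleId: free[i].id, requestId })),
  };
}

// Stub for the Claude Sonnet 4 safety check: reject plans that double-book a vehicle.
function validatePlan(plan: DispatchPlan): boolean {
  const used = plan.assignments.map((a) => a.vehicleId);
  return new Set(used).size === used.length;
}

// One planning cycle: propose, validate, and retry a bounded number of times.
function runPlanningCycle(
  vehicles: Vehicle[],
  requestIds: string[],
  maxAttempts = 3,
): DispatchPlan | null {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const plan = proposePlan(vehicles, requestIds);
    if (validatePlan(plan)) return plan;
  }
  return null; // escalate or fall back when no plan passes validation
}
```

The bounded retry is the important design choice: a real-time dispatcher cannot loop indefinitely on validator rejections, so the cycle must terminate with either a validated plan or an explicit failure.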


Datasets

Shared data for this challenge

Review public datasets and any private uploads tied to your build.

Evaluation rubric

How submissions are scored

These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.

Max score: 6
Dimensions: 6 scoring checks (6 binary pass-or-fail, 0 scaled ordinal)

All six dimensions are binary checks with weight 1: a dimension contributes its full weight only when the submission satisfies its requirement, and no partial credit is awarded.

Dimension 1: Decision Latency (Average) (decision_latency_average)
Average time from receiving a new request to outputting a dispatch plan must be under 500 ms.

Dimension 2: Plan Feasibility (plan_feasibility)
Generated dispatch plans must be logically sound and executable given fleet constraints.

Dimension 3: Safety Validation Accuracy (safety_validation_accuracy)
Claude Sonnet 4 must correctly identify safety violations in at least 95% of test cases.

Dimension 4: Plan Optimality Score (plan_optimality_score)
Numerical score (0-1) reflecting how close the generated plan is to the theoretically optimal solution. Target: 0.9; range: 0.7-1.

Dimension 5: TensorRT-LLM Speedup Factor (tensorrt_llm_speedup_factor)
Ratio of inference speed with TensorRT-LLM to baseline non-optimized inference. Target: 5; range: 2-10.

Dimension 6: Multi-Model Consistency (multi_model_consistency)
Agreement rate between GPT-5's proposed plan and Claude Sonnet 4's validation, after filtering for safety concerns. Target: 0.9; range: 0.8-1.
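The scoring rule above reduces to a weighted sum over pass/fail results. A minimal sketch in TypeScript; the dimension ids mirror the rubric, while the `DimensionResult` shape and the sample pass/fail values are purely illustrative:

```typescript
type DimensionResult = { id: string; weight: number; passed: boolean };

// A dimension contributes its full weight only when it passes; no partial credit.
function scoreSubmission(results: DimensionResult[]): number {
  return results.reduce((sum, r) => sum + (r.passed ? r.weight : 0), 0);
}

const results: DimensionResult[] = [
  { id: "decision_latency_average", weight: 1, passed: true },
  { id: "plan_feasibility", weight: 1, passed: true },
  { id: "safety_validation_accuracy", weight: 1, passed: false },
  { id: "plan_optimality_score", weight: 1, passed: true },
  { id: "tensorrt_llm_speedup_factor", weight: 1, passed: true },
  { id: "multi_model_consistency", weight: 1, passed: true },
];
// scoreSubmission(results) → 5 out of the max score of 6
```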

Learning goals

What you should walk away with

Master the Mastra AI TypeScript framework for defining agents, managing memory, and orchestrating complex workflows for real-time decision-making.
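As a starting point, an agent plus a custom tool might be wired roughly as follows. This is a sketch assuming Mastra's `Agent` and `createTool` APIs and the Vercel AI SDK model helpers; the `gpt-5` model id, tool payloads, and instructions are placeholders, and in this challenge the model endpoint would be fronted by the Cohere Platform with TensorRT-LLM acceleration.

```typescript
import { Agent } from "@mastra/core/agent";
import { createTool } from "@mastra/core/tools";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical tool exposing simulated fleet state to the planner.
const getFleetStatus = createTool({
  id: "get-fleet-status",
  description: "Return current vehicle availability for a region",
  inputSchema: z.object({ region: z.string() }),
  outputSchema: z.object({
    vehicles: z.array(z.object({ id: z.string(), available: z.boolean() })),
  }),
  execute: async ({ context }) => {
    // Placeholder: a real implementation would query the logistics database
    // or a simulated sensor feed for the requested region.
    return { vehicles: [{ id: "v1", available: true }] };
  },
});

// Planner agent; Mastra's memory configuration would be added here to
// maintain situational awareness across planning cycles.
export const plannerAgent = new Agent({
  name: "dispatch-planner",
  instructions:
    "Propose dispatch plans that respect vehicle availability and fleet constraints.",
  model: openai("gpt-5"),
  tools: { getFleetStatus },
});
```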

Implement multi-model integration, using GPT-5 for primary planning and Claude Sonnet 4 for secondary validation or safety checks within the Mastra AI agent.

Deploy and manage both GPT-5 and Claude Sonnet 4 through the Cohere Platform, leveraging its capabilities for secure API access, rate limiting, and model versioning.

Integrate NVIDIA TensorRT-LLM into the inference pipeline via Cohere Platform to achieve significant speedups and reduced latency for LLM responses, critical for real-time operational scenarios.

Design and implement custom tools for the Mastra AI agent to interact with simulated operational data, such as real-time sensor feeds, logistics databases, or vehicle control systems.

Build a robust real-time decision-making loop within the Mastra AI agent, capable of adaptive planning and rapid response to dynamic environmental changes.
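One way to keep that loop inside the 500 ms rubric budget is to race the model-backed planner against a cheap deterministic fallback. A self-contained sketch; the planner is passed in as a stub and the fallback rule is purely illustrative:

```typescript
const LATENCY_BUDGET_MS = 500;

// Illustrative fallback, e.g. a nearest-available-vehicle rule.
function heuristicPlan(requestId: string): string {
  return `fallback:${requestId}`;
}

// Race the (possibly slow) planner against the latency budget; whichever
// resolves first becomes the dispatched plan.
async function planWithBudget(
  requestId: string,
  planner: (id: string) => Promise<string>,
): Promise<{ plan: string; latencyMs: number }> {
  const start = Date.now();
  const timeout = new Promise<string>((resolve) =>
    setTimeout(() => resolve(heuristicPlan(requestId)), LATENCY_BUDGET_MS),
  );
  const plan = await Promise.race([planner(requestId), timeout]);
  return { plan, latencyMs: Date.now() - start };
}
```

The design choice here is graceful degradation: when inference is slow, the system still emits an executable (if suboptimal) plan rather than missing the deadline.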

Evaluate the performance of the operational planning agent under stress, measuring decision accuracy, response latency, and system resilience.
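Two of the rubric metrics can be computed directly from logged runs. A sketch, assuming latencies are logged in milliseconds and each validation outcome records whether the plan cleared the safety filter and whether the validator agreed (both record shapes are assumptions):

```typescript
// TensorRT-LLM speedup factor: mean baseline latency over mean optimized latency.
function speedupFactor(baselineMs: number[], optimizedMs: number[]): number {
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  return mean(baselineMs) / mean(optimizedMs);
}

// Multi-model consistency: fraction of safety-cleared plans the validator accepted.
function consistencyRate(
  outcomes: Array<{ safetyCleared: boolean; validatorAgreed: boolean }>,
): number {
  const cleared = outcomes.filter((o) => o.safetyCleared);
  if (cleared.length === 0) return 1; // vacuously consistent with no cleared plans
  return cleared.filter((o) => o.validatorAgreed).length / cleared.length;
}
```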

Start from your terminal
$npx -y @versalist/cli start high-performance-operational-planning-agent

[ok] Wrote CHALLENGE.md

[ok] Wrote .versalist.json

[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Challenge at a glance

Host: Vera (AI Research & Mentorship)
Starts: Available now
Run mode: Evergreen challenge

Tool Space Recipe: Draft

Evaluation rubric: 6 dimensions, weight 1 each
· Decision Latency (Average)
· Plan Feasibility
· Safety Validation Accuracy
· Plan Optimality Score
· TensorRT-LLM Speedup Factor
· Multi-Model Consistency
Gold items: 2 (2 public)
