AI-Assisted Flight Operations Agent
What you are building
The core problem, expected build, and operating context for this challenge.
In this challenge you build an AI-powered flight assistant with the Vercel AI SDK. The assistant provides real-time guidance, performs safety checks, and assists with complex operational procedures through a conversational, voice-enabled interface. You will integrate Claude Sonnet 4 for reasoning-heavy tasks and Llama 3 (served through Hugging Face Inference Endpoints) for quick responses to simpler requests. The system connects to simulated flight control systems through a low-code automation platform like Hyperbolic, and uses OpenTelemetry to trace AI interactions and system state so that safety-critical operations can be audited for compliance.
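One way to picture the two-model split described above is a small routing function that decides which model should handle a transcribed command. This is a minimal sketch under stated assumptions: the keyword list, the `safetyCritical` flag, and the `chooseModel` name are all hypothetical, and the model strings are plain labels, not provider model IDs.

```typescript
// Hypothetical routing sketch. Nothing here is prescribed by the challenge brief.
type ModelChoice = "claude-sonnet-4" | "llama-3";

interface CommandContext {
  text: string;            // transcribed voice command
  safetyCritical: boolean; // assumed to be set by an upstream check
}

// Assumed keyword list; a real system would need a vetted classifier.
const COMPLEX_KEYWORDS = ["procedure", "emergency", "checklist", "sequence"];

function chooseModel(ctx: CommandContext): ModelChoice {
  const text = ctx.text.toLowerCase();
  const looksComplex = COMPLEX_KEYWORDS.some((keyword) => text.includes(keyword));
  // Safety-critical or procedural commands go to the stronger reasoning model.
  if (ctx.safetyCritical || looksComplex) {
    return "claude-sonnet-4";
  }
  // Short status queries go to the faster model.
  return "llama-3";
}
```

Keeping the routing decision in one pure function makes it easy to unit test and to record as a trace attribute later.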
How submissions are scored
These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.
CorrectToolExecution
The assistant must execute the correct sequence of tools based on the voice command and flight state.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
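The brief does not say how the evaluator compares tool sequences. As a rough local check, you could compare the ordered tool names your agent called against an expected list and award the weight all-or-nothing. The `ToolCall` shape and both function names below are assumptions for illustration only.

```typescript
// Hypothetical local check; the real evaluator's comparison rules are not documented here.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// True only when the same tools were called in the same order.
function sequenceMatches(actual: ToolCall[], expected: string[]): boolean {
  if (actual.length !== expected.length) {
    return false;
  }
  return actual.every((call, index) => call.name === expected[index]);
}

// All-or-nothing scoring, mirroring the "no partial credit" rule.
function scoreToolExecution(actual: ToolCall[], expected: string[], weight: number): number {
  return sequenceMatches(actual, expected) ? weight : 0;
}
```

For example, calling 'initiate_autopilot_sequence' before 'check_fuel_level' when the expected order is the reverse would score 0 under this sketch.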
ContextualResponseAccuracy
The 'agent_response' must be contextually appropriate and informative based on the simulated actions.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
ResponseLatency
Time from voice command processing to agent response; lower is better. Target: 300; range: 0-1000.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
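To track this dimension during development, you could wrap each interaction in a small stopwatch. The sketch below assumes the target is measured in milliseconds and that a run meets the target when it is at or below 300; the brief states neither, and `startStopwatch` and `withinTarget` are hypothetical helper names.

```typescript
// Hypothetical latency helpers. The clock is injectable so the logic can be tested without waiting.
function startStopwatch(now: () => number = Date.now): () => number {
  const start = now();
  // Calling the returned function yields the time elapsed since start.
  return () => now() - start;
}

// Assumption: the target is an upper bound, in the same unit the stopwatch reports.
function withinTarget(elapsed: number, target: number = 300): boolean {
  return elapsed <= target;
}
```

In use, you would call `startStopwatch()` when the voice command is handed off for processing and call the returned function once the agent response is ready.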
ObservabilityCompleteness
Score for the completeness and correctness of the OpenTelemetry traces for the interaction; higher is better. Target: 90; range: 0-100.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
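The scoring rubric for trace completeness is not spelled out, so it can help to define your own checklist of spans and attributes and measure against it. The sketch below works on plain objects, not the OpenTelemetry API, and every span name, attribute name, and the scoring formula are assumptions.

```typescript
// Hypothetical completeness check over exported span data (plain objects, not OpenTelemetry types).
interface RecordedSpan {
  name: string;
  attributes: Record<string, string | number | boolean>;
}

interface SpanRequirement {
  name: string;
  requiredAttributes: string[];
}

// Percentage (0-100) of requirements met by at least one span carrying every required attribute.
function completenessScore(spans: RecordedSpan[], requirements: SpanRequirement[]): number {
  if (requirements.length === 0) {
    return 100;
  }
  const satisfied = requirements.filter((requirement) =>
    spans.some(
      (span) =>
        span.name === requirement.name &&
        requirement.requiredAttributes.every((attribute) => attribute in span.attributes)
    )
  ).length;
  return Math.round((satisfied / requirements.length) * 100);
}
```

A checklist might require one span per user command, one per model call, and one per tool execution, each tagged with the tool or model name.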
What you should walk away with
Master the Vercel AI SDK for building streaming, tool-using conversational interfaces in TypeScript/JavaScript.
Implement voice input and output using Fixie, or an alternative such as the browser Web Speech API, to create a natural, hands-free interaction experience.
Design intelligent tool calls within the AI SDK for interacting with a simulated flight control system (e.g., 'check_fuel_level', 'initiate_autopilot_sequence').
Integrate Claude Sonnet 4 for critical reasoning tasks and complex procedure interpretation, leveraging its strong safety and reliability features.
Utilize Llama 3 via Hugging Face Inference Endpoints for quicker, context-specific responses or simpler control commands.
Connect the AI SDK application to a simulated flight system via Hyperbolic (or a mock API gateway) for triggering external actions and fetching real-time data.
Implement OpenTelemetry tracing and logging within the AI SDK application to monitor user interactions, agent decisions, and tool executions for auditing and debugging flight-critical operations.
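Before wiring anything to the AI SDK or to Hyperbolic, it can help to prototype the simulated flight system as plain functions. The sketch below implements the two example tools named above against an in-memory state. The `FlightState` fields, the fuel threshold, and the `runTool` dispatcher are illustrative assumptions and are not part of the challenge specification or of any SDK.

```typescript
// Hypothetical in-memory flight simulator with a tiny tool registry.
interface FlightState {
  fuelKg: number;
  autopilotEngaged: boolean;
}

type ToolResult =
  | { ok: true; data: Record<string, unknown> }
  | { ok: false; error: string };

type ToolHandler = (state: FlightState, args: Record<string, unknown>) => ToolResult;

// Arbitrary threshold chosen for the sketch.
const MIN_FUEL_FOR_AUTOPILOT_KG = 500;

const tools: Record<string, ToolHandler> = {
  // Read-only tool: reports the current fuel level.
  check_fuel_level: (state) => ({ ok: true, data: { fuelKg: state.fuelKg } }),
  // State-changing tool: refuses to engage when the safety check fails.
  initiate_autopilot_sequence: (state) => {
    if (state.fuelKg < MIN_FUEL_FOR_AUTOPILOT_KG) {
      return { ok: false, error: "fuel below minimum for autopilot" };
    }
    state.autopilotEngaged = true;
    return { ok: true, data: { autopilotEngaged: true } };
  },
};

// Dispatches a tool call by name and rejects unknown tools instead of throwing.
function runTool(name: string, state: FlightState, args: Record<string, unknown> = {}): ToolResult {
  const handler = tools[name];
  if (!handler) {
    return { ok: false, error: `unknown tool: ${name}` };
  }
  return handler(state, args);
}
```

Returning a structured result instead of throwing gives the model something concrete to describe in its response when a safety check blocks an action, and the same handlers can later be registered as tools in the AI SDK application.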