Gemini-powered Voice Navigator Agent
What you are building
The core problem, expected build, and operating context for this challenge.
Develop a hands-free, multimodal conversational agent using Google's Agent Development Kit (ADK) that integrates with Google Maps for real-time navigational assistance. The agent should leverage Gemini's multimodal capabilities to understand voice commands, provide spoken directions, and offer context-aware information based on the user's location and activity (e.g., walking, cycling). This challenge focuses on building robust, real-time voice interfaces that seamlessly integrate generative AI with location-based services, prioritizing safety and natural interaction.
Shared data for this challenge
Review public datasets and any private uploads tied to your build.
How submissions are scored
These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.
CorrectToolInvocation
Verifies that the agent correctly invokes relevant Google Maps APIs for navigation and context.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
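As a rough illustration of what a correct invocation might look like, the sketch below builds a request URL for the Google Maps Directions API web service from the user's activity. The function name `build_directions_url`, the `ACTIVITY_TO_MODE` mapping, and the placeholder key are assumptions for illustration and are not part of the challenge; how the evaluator actually inspects tool calls is not specified here. The code only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public endpoint of the Directions API web service.
DIRECTIONS_ENDPOINT = "https://maps.googleapis.com/maps/api/directions/json"

# Hypothetical mapping from a detected activity to a Directions API travel mode.
ACTIVITY_TO_MODE = {"walking": "walking", "cycling": "bicycling"}


def build_directions_url(origin: str, destination: str, activity: str, api_key: str) -> str:
    """Return a Directions API request URL for the given activity (assumed helper)."""
    if activity not in ACTIVITY_TO_MODE:
        raise ValueError(f"unsupported activity: {activity!r}")
    params = {
        "origin": origin,
        "destination": destination,
        "mode": ACTIVITY_TO_MODE[activity],
        "key": api_key,
    }
    return f"{DIRECTIONS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_directions_url(
        "37.7749,-122.4194", "Golden Gate Park", "cycling", "YOUR_API_KEY"
    )
    # Show the travel mode that ended up in the query string.
    print(parse_qs(urlparse(url).query)["mode"])
```

A plain function like this could be exposed to the agent as a tool, with the HTTP call and response parsing added around it. Rejecting unknown activities up front keeps the agent from silently falling back to a travel mode the user did not ask for.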
ContextualRelevance
Checks if the agent's response is relevant to the user's current location and activity.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
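One way to keep responses tied to the user's location and activity is to filter candidate places by distance before handing them to the model. The sketch below uses the standard haversine formula; the `RADIUS_KM` values, the `nearby_places` helper, and the `lat`/`lng` dictionary keys are assumptions chosen for illustration, not values from the challenge.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical search radii per activity; tune these for your own build.
RADIUS_KM = {"walking": 0.5, "cycling": 2.0}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_places(user: tuple, places: list, activity: str) -> list:
    """Keep places within the activity's radius of the user, nearest first."""
    radius = RADIUS_KM[activity]
    scored = [
        (haversine_km(user[0], user[1], p["lat"], p["lng"]), p) for p in places
    ]
    scored.sort(key=lambda item: item[0])
    return [p for distance, p in scored if distance <= radius]


if __name__ == "__main__":
    places = [
        {"name": "Cafe", "lat": 0.0, "lng": 0.001},
        {"name": "Bike shop", "lat": 0.0, "lng": 0.01},
    ]
    print([p["name"] for p in nearby_places((0.0, 0.0), places, "walking")])
    print([p["name"] for p in nearby_places((0.0, 0.0), places, "cycling")])
```

A wider radius for cycling than for walking reflects the idea that a cyclist can reasonably reach places a pedestrian would not bother with; the exact numbers are a design choice.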
ResponseLatencyMs
Average time the agent takes to generate a response, in milliseconds. Target: 800 ms. Range: 0-2000 ms.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
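To track this dimension during development, it can help to time each response locally and compare the average against the stated target. The sketch below is a minimal example under two assumptions: that "meeting the target" means an average at or below 800 ms, and that the evaluator measures something similar to wall-clock time per call. The helper names `timed_call` and `summarize_latency` are hypothetical.

```python
import time
from statistics import mean

TARGET_MS = 800   # target stated for ResponseLatencyMs
MAX_MS = 2000     # upper end of the stated range


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def summarize_latency(samples_ms):
    """Average the samples and compare against the target and range."""
    if not samples_ms:
        raise ValueError("no latency samples recorded")
    average = mean(samples_ms)
    return {
        "average_ms": average,
        # Assumption: the target is met when the average is at or below it.
        "meets_target": average <= TARGET_MS,
        "within_range": 0 <= average <= MAX_MS,
    }


if __name__ == "__main__":
    samples = []
    for _ in range(3):
        # Stand-in for a real agent call.
        _, elapsed = timed_call(time.sleep, 0.01)
        samples.append(elapsed)
    print(summarize_latency(samples))
```

For a voice agent, the number that matters to the user is usually time to first audio, so it may be worth timing speech recognition, model response, and speech synthesis separately as well as end to end.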
ConversationalFluencyScore
A subjective score of how natural and helpful the conversation feels. Target: 4. Range: 1-5.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
What you should walk away with
Master Google ADK for orchestrating agent workflows, managing state, and integrating tools with Gemini.
Implement real-time voice input and output using Google Cloud Speech-to-Text and Text-to-Speech APIs.
Utilize Gemini 1.5 Pro's multimodal capabilities to process visual cues (simulated) and generate contextually rich responses.
Integrate with Google Maps Platform APIs to fetch real-time location, route, and point-of-interest data.
Design safety-critical conversational flows for cyclists and pedestrians, including hazard warnings and emergency assistance.
Deploy and manage the ADK agent on Google Cloud Vertex AI, ensuring scalability and low-latency inference.
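The outcomes above describe a loop of speech in, agent reasoning, and speech out, with safety messages taking priority. The sketch below shows one possible shape for a single conversational turn, with every external service passed in as a plain callable so it can be stubbed. All names here (`TurnResult`, `run_turn`, and the four callables) are hypothetical; in a real build `transcribe` and `speak` might wrap Speech-to-Text and Text-to-Speech, and `respond` might wrap the ADK agent. Speaking the hazard warning before the routine reply is an assumed design choice, not a stated requirement.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TurnResult:
    transcript: str       # what the user was understood to say
    spoken: List[str]     # everything spoken back, in order


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    speak: Callable[[str], None],
    check_hazard: Callable[[], Optional[str]],
) -> TurnResult:
    """Handle one voice turn, speaking any hazard warning first."""
    spoken: List[str] = []

    # Safety first: a pending hazard warning is spoken before anything else.
    hazard = check_hazard()
    if hazard is not None:
        speak(hazard)
        spoken.append(hazard)

    transcript = transcribe(audio)
    reply = respond(transcript)
    speak(reply)
    spoken.append(reply)
    return TurnResult(transcript=transcript, spoken=spoken)


if __name__ == "__main__":
    # Stubs standing in for the real speech and agent services.
    result = run_turn(
        audio=b"\x00\x01",
        transcribe=lambda _audio: "where do I turn next",
        respond=lambda text: "Turn left in 200 metres.",
        speak=print,
        check_hazard=lambda: "Caution: road works ahead.",
    )
    print(result.transcript)
```

Passing the services in as callables keeps the turn logic testable without cloud credentials and makes it straightforward to swap in streaming implementations later.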
[ok] Wrote CHALLENGE.md
[ok] Wrote .versalist.json
[ok] Wrote eval/examples.json
Requires VERSALIST_API_KEY. Works with any MCP-aware editor.