Agent Building
Advanced
Always open

Real-time Voice Assistant with Personalized Context

Challenge brief

What you are building

The core problem, expected build, and operating context for this challenge.

Build a real-time voice assistant that transcribes spoken queries, tracks conversational context, and gives personalized responses. The challenge involves integrating speech-to-text, managing conversational state, and drawing on a dynamic knowledge base. The solution should demonstrate OpenAI's agentic capabilities in complex, multi-turn interactions. Design an agent that not only answers questions but also infers user intent from the conversational flow and adapts its responses to historical interactions and profile data, while keeping latency low enough for a fluid conversation.

Evaluation rubric

How submissions are scored

These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.

Max Score: 6
Dimensions: 6 scoring checks
Binary: 6 pass or fail dimensions
Ordinal: 0 scaled dimensions

Dimension 1: diarization_accuracy

Diarization Accuracy

Checks whether at least 2 distinct speakers are correctly identified in multi-speaker audio.

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
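
A quick local sanity check for this dimension might look like the sketch below. The rubric does not specify the evaluator's input format, so the `Segment` shape and the function names are hypothetical; the check only counts distinct speaker labels and says nothing about whether each label is assigned to the right person.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized span of audio (hypothetical format, not the evaluator's)."""
    speaker: str   # assumed speaker label such as "S1"
    start: float   # seconds
    end: float     # seconds
    text: str


def distinct_speakers(segments):
    """Return the set of speaker labels that have non-empty text."""
    return {s.speaker for s in segments if s.text.strip()}


def has_multiple_speakers(segments, minimum=2):
    """True when at least `minimum` distinct speakers appear."""
    return len(distinct_speakers(segments)) >= minimum


segments = [
    Segment("S1", 0.0, 1.8, "Can we move the review to Friday?"),
    Segment("S2", 1.9, 3.2, "Friday works for me."),
    Segment("S1", 3.3, 4.0, "Great."),
]
print(has_multiple_speakers(segments))
```

Running this before submitting catches the common failure where every segment collapses onto a single speaker label.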

Dimension 2: personalized_context_use

Personalized Context Use

Verifies that the agent's response explicitly uses personalized context (e.g., specific meeting details or user preferences).

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.

Dimension 3: safety_compliance_giskard

Safety Compliance (Giskard)

Ensures the response contains no safety violations flagged by Giskard.

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.

Dimension 4: transcription_word_error_rate

Transcription Word Error Rate

Measures the accuracy of the speech-to-text transcription as a word error rate. Target: 0.05. Range: 0-0.2.

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
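
For local testing, word error rate can be estimated with a word-level edit distance, as in the minimal sketch below. The lowercasing and whitespace tokenization are assumptions; the rubric does not say how the evaluator normalizes text, so scores may differ from the official ones.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()   # assumed normalization
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
print(f"WER: {wer:.2f}")
```

Here one word is dropped and one is substituted against a five-word reference, so the example scores 2/5, well outside the 0-0.2 range listed above.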

Dimension 5: response_coherence_score

Response Coherence Score

Evaluates the logical flow of the agent's response and its relevance to the query and context, on a 0-1 scale. Target: 0.9. Range: 0.7-1.

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.

Dimension 6: response_latency

Response Latency

Measures the time from the end of audio input to the start of the response, in seconds. Target: 1.5. Range: 0-3.

binary
Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
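
One way to track this number during development is to time the gap between the end of audio input and the first streamed response chunk. The sketch below assumes a hypothetical zero-argument callable, `stream_response`, that returns an iterator of chunks; `fake_response` stands in for a real agent and is purely illustrative.

```python
import time


def measure_response_latency(stream_response, clock=time.perf_counter):
    """Return (latency_seconds, first_chunk, remaining_chunks).

    Call this at the moment audio input ends. `stream_response` is a
    hypothetical callable that returns an iterator of response chunks.
    """
    audio_end = clock()
    chunks = iter(stream_response())
    first = next(chunks)            # blocks until the response starts
    latency = clock() - audio_end
    return latency, first, chunks


def fake_response():
    """Stand-in for a streaming agent: waits, then yields two chunks."""
    time.sleep(0.05)
    yield "Sure,"
    yield " your 3 pm meeting is in room B."


latency, first, rest = measure_response_latency(fake_response)
print(f"latency: {latency:.3f}s, first chunk: {first!r}")
```

Returning the remaining iterator lets the caller keep streaming the rest of the response after the measurement is taken.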

Learning goals

What you should walk away with

Master the OpenAI Agents SDK for defining agent capabilities, tools, and conversational memory.

Implement robust audio processing pipelines that use OpenAI's Whisper API for accurate speech-to-text and add speaker diarization (Whisper itself does not label speakers, so diarization needs a separate step).

Design and build custom tools/functions for the OpenAI Agent to access external services and a personalized knowledge base.

Utilize Featuretools to generate dynamic user features from interaction history for personalized context management.

Configure and deploy a custom lightweight model (e.g., for intent classification or sentiment analysis) using TorchServe, accessible via agent tools.

Integrate Giskard for continuous evaluation of the agent's responses, ensuring accuracy, coherence, and adherence to safety policies.

Orchestrate a real-time interaction loop, handling audio input, agent processing, and synthesized speech output for a fluid user experience.
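
The interaction loop described in these goals can be prototyped with stubs before any real service is wired in. The sketch below is an assumption-laden outline, not the challenge's reference design: `UserProfile`, `transcribe`, `build_context`, `respond`, `synthesize`, and `handle_turn` are all hypothetical names, and each stub marks where speech-to-text, the agent, and speech synthesis would plug in.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical per-user state: static preferences plus turn history."""
    name: str
    preferences: dict
    history: list = field(default_factory=list)  # (query, reply) pairs


def transcribe(audio: bytes) -> str:
    """Stub: a real build would call a speech-to-text service here."""
    return audio.decode("utf-8")


def build_context(profile: UserProfile, max_turns: int = 3) -> str:
    """Summarize preferences and the most recent turns as a context string."""
    prefs = ", ".join(f"{k}={v}" for k, v in sorted(profile.preferences.items()))
    recent = profile.history[-max_turns:]
    turns = " | ".join(f"user: {q}" for q, _ in recent)
    return f"name={profile.name}; preferences: {prefs}; recent: {turns or 'none'}"


def respond(query: str, context: str) -> str:
    """Stub: a real build would pass the query and context to the agent."""
    return f"[{context}] You asked: {query}"


def synthesize(text: str) -> bytes:
    """Stub: a real build would call a text-to-speech service here."""
    return text.encode("utf-8")


def handle_turn(audio: bytes, profile: UserProfile) -> bytes:
    """One pass of the loop: audio in, personalized audio out."""
    query = transcribe(audio)
    context = build_context(profile)
    reply = respond(query, context)
    profile.history.append((query, reply))  # remember the turn for later context
    return synthesize(reply)


profile = UserProfile("Ada", {"units": "metric"})
first_out = handle_turn(b"what is on my calendar", profile)
second_out = handle_turn(b"move it to friday", profile)
print(second_out.decode("utf-8"))
```

Keeping each stage behind its own function makes it easy to swap a stub for a real call and to time each stage separately when chasing the latency target.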

Start from your terminal
$ npx -y @versalist/cli start real-time-voice-assistant-with-personalized-context
[ok] Wrote CHALLENGE.md
[ok] Wrote .versalist.json
[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Challenge at a glance

Host: Vera (AI Research & Mentorship)
Starts: Available now
Run mode: Evergreen challenge


Tool Space Recipe

Draft
Evaluation
Rubric: 6 dimensions
· Diarization Accuracy (1%)
· Personalized Context Use (1%)
· Safety Compliance (Giskard) (1%)
· Transcription Word Error Rate (1%)
· Response Coherence Score (1%)
· Response Latency (1%)
Gold items: 1 (1 public)

Frequently Asked Questions about Real-time Voice Assistant with Personalized Context