SkillProof Document Agent
Build an AI agent with the Mastra AI TypeScript framework that generates and verifies 'SkillProof' documents, an AI-native standard intended to replace traditional credentials for presenting professional skills and 'vibe coding'. The agent uses Qwen 2 to generate skill descriptions and personality assessments, and calls custom tools to verify AI proficiency levels externally. A conversational voice interface built on Sarvam AI lets users create and modify their SkillProof documents by speaking naturally, so the build exercises both natural language processing and structured output generation.
How submissions are scored
These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.
DocumentSchemaAdherence
Generated document strictly adheres to the defined JSON schema.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
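Because this dimension is all-or-nothing, it can help to check every document against the schema before returning it. The sketch below is only an illustration: the challenge's actual schema is not reproduced here, so the `SkillProofDocument` fields, the 0-100 score range, and the `validateSkillProof` helper are all assumptions made for the example.

```typescript
// Hypothetical SkillProof shape. The real schema is defined by the challenge
// and may differ; every field name here is an assumption.
interface SkillProofDocument {
  skill: string;
  description: string;
  vibeCodingAssessment: string;
  proficiencyScore: number; // assumed to be in the range 0-100
  verificationStatus: "verified" | "unverified";
}

const STRING_FIELDS = ["skill", "description", "vibeCodingAssessment"] as const;
const ALLOWED_KEYS = new Set<string>([
  ...STRING_FIELDS,
  "proficiencyScore",
  "verificationStatus",
]);

// Returns a list of problems; an empty list means the value passed all checks.
function validateSkillProof(value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["document must be a JSON object"];
  }
  const doc = value as Record<string, unknown>;
  const errors: string[] = [];

  for (const key of STRING_FIELDS) {
    const field = doc[key];
    if (typeof field !== "string" || field.trim() === "") {
      errors.push(`${key} must be a non-empty string`);
    }
  }

  const score = doc.proficiencyScore;
  if (typeof score !== "number" || !Number.isFinite(score) || score < 0 || score > 100) {
    errors.push("proficiencyScore must be a number between 0 and 100");
  }

  const status = doc.verificationStatus;
  if (status !== "verified" && status !== "unverified") {
    errors.push('verificationStatus must be "verified" or "unverified"');
  }

  // Strict adherence: reject properties the schema does not define.
  for (const key of Object.keys(doc)) {
    if (!ALLOWED_KEYS.has(key)) {
      errors.push(`unexpected property: ${key}`);
    }
  }
  return errors;
}

// Type guard built on the validator, for use at the agent's output boundary.
function isSkillProofDocument(value: unknown): value is SkillProofDocument {
  return validateSkillProof(value).length === 0;
}
```

Returning a list of errors rather than a boolean makes it possible to feed the problems back to the model and ask for a corrected document.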
CorrectVerificationStatus
Verification status is correct given the proficiency score and the threshold.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
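One way to keep the status consistent with the score is to derive it in a single place. The following is a minimal sketch of a mock `verify_ai_proficiency` function, written as plain TypeScript without any Mastra wiring. The word-count scoring rule, the default threshold of 70, and the inclusive comparison are placeholder assumptions, not values taken from the challenge.

```typescript
type VerificationStatus = "verified" | "unverified";

interface VerificationResult {
  status: VerificationStatus;
  score: number;
  threshold: number;
}

// Placeholder scoring rule (an assumption): 10 points per word, capped at 100.
// A real tool would call an external verifier instead.
function mockProficiencyScore(description: string): number {
  const words = description
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0);
  return Math.min(100, words.length * 10);
}

// Mock verification: the status is derived only from score and threshold,
// so the two can never disagree. Scores equal to the threshold count as verified.
function verifyAiProficiency(description: string, threshold = 70): VerificationResult {
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 100) {
    throw new RangeError("threshold must be between 0 and 100");
  }
  const score = mockProficiencyScore(description);
  const status: VerificationStatus = score >= threshold ? "verified" : "unverified";
  return { status, score, threshold };
}
```

Whether the boundary case should count as verified depends on how the evaluator defines the threshold, so that comparison is worth confirming against the challenge's examples.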
DescriptionQualityScore
LLM-generated description quality and relevance (1-5). Target: 4. Range: 1-5.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
VibeCodingRelevance
Relevance of the generated vibe coding assessment to the skill (1-5). Target: 4. Range: 1-5.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
What you should walk away with
Master Mastra AI's core concepts including agent definition, memory system, and tool integration for building complex workflows.
Implement structured output generation using Qwen 2 to adhere to a predefined 'SkillProof' document schema (e.g., JSON-LD or similar AI-native format).
Utilize Mastra AI's built-in memory capabilities (e.g., with Redis) to persistently store draft documents and user preferences, enabling multi-turn document refinement.
Develop a custom tool, 'verify_ai_proficiency', that takes skill descriptions and provides a mock verification status and score.
Integrate Sarvam AI to create a conversational voice interface, allowing users to verbally dictate document content and request verification checks.
Design prompts for OpenAI o3 to parse natural language input into structured data for the 'SkillProof' document and to generate a 'vibe coding' personality assessment.
Implement a document versioning and storage mechanism (e.g., saving to a local file system or mock cloud storage) as part of the agent's capabilities.
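The versioning item above can be prototyped with an in-memory store standing in for mock cloud storage. This is a sketch under stated assumptions: `VersionedDocumentStore`, its method names, and the sequential version numbering are hypothetical, and a real build might persist to the file system or to the agent's memory backend instead.

```typescript
interface DocumentVersion<T> {
  version: number; // 1-based, increases by one on every save
  savedAt: string; // ISO 8601 timestamp
  content: T;
}

// In-memory mock storage (an assumption for illustration only).
class VersionedDocumentStore<T> {
  private readonly versions = new Map<string, DocumentVersion<T>[]>();

  // Appends a new version; earlier versions are never overwritten.
  save(documentId: string, content: T): DocumentVersion<T> {
    const history = this.versions.get(documentId) ?? [];
    const entry: DocumentVersion<T> = {
      version: history.length + 1,
      savedAt: new Date().toISOString(),
      // Copy on save so later edits to the caller's object do not alter
      // stored versions. Assumes the content is JSON-serializable.
      content: JSON.parse(JSON.stringify(content)) as T,
    };
    history.push(entry);
    this.versions.set(documentId, history);
    return entry;
  }

  // Returns the most recent version, or undefined if none was saved.
  latest(documentId: string): DocumentVersion<T> | undefined {
    const history = this.versions.get(documentId);
    return history === undefined ? undefined : history[history.length - 1];
  }

  // Returns a specific version, or undefined if it does not exist.
  get(documentId: string, version: number): DocumentVersion<T> | undefined {
    return this.versions.get(documentId)?.find((entry) => entry.version === version);
  }

  // Lists the version numbers saved for a document, oldest first.
  listVersions(documentId: string): number[] {
    return (this.versions.get(documentId) ?? []).map((entry) => entry.version);
  }
}
```

Keeping every saved version supports multi-turn refinement: a user can ask the agent to revert to an earlier draft without the agent having to reconstruct it.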