Multimodal Content Generation Agent for AI Video Platform
What you are building
The core problem, expected build, and operating context for this challenge.
Design and build an advanced multimodal agent using Google ADK and Gemini 3 Pro that specializes in generating creative content ideas, scripts, and visual concepts for short-form AI video platforms, similar to Meta's 'Vibes'. The agent should analyze current trends (e.g., popular memes, news topics, user preferences stored in a vector database) and generate novel, engaging video concepts. It should be capable of orchestrating calls to external tools like Stable Diffusion XL for generating visual mood boards or Triton Inference Server for specialized video analysis models. The challenge emphasizes multimodal reasoning, creative generation, and robust workflow orchestration.
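The trend-analysis step described above amounts to nearest-neighbor retrieval over embedded trend records. A minimal sketch of that idea follows, assuming a tiny in-memory store in place of a real vector database and hand-written three-dimensional vectors in place of real embeddings. The names `TREND_STORE`, `cosine_similarity`, and `top_trends` are hypothetical and are not part of Google ADK or any vector database client.

```python
import math

# Hypothetical in-memory stand-in for a vector database of trend records.
# Real embeddings would come from an embedding model and have many more dimensions.
TREND_STORE = [
    {"topic": "cat memes", "vector": [1.0, 0.0, 0.0]},
    {"topic": "space launch news", "vector": [0.0, 1.0, 0.0]},
    {"topic": "cooking hacks", "vector": [0.0, 0.0, 1.0]},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_trends(query_vec, trend_store, k=2):
    """Return the k topics whose vectors are most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, item["vector"]), item["topic"])
        for item in trend_store
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [topic for _, topic in scored[:k]]


if __name__ == "__main__":
    # A user-preference vector that leans mostly toward the first axis.
    print(top_trends([0.9, 0.1, 0.0], TREND_STORE, k=2))
```

In a real build the store lookup would be delegated to the vector database, and the retrieved topics would be passed to the agent as context for concept generation.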
Shared data for this challenge
Review public datasets and any private uploads tied to your build.
How submissions are scored
These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.
JSON Format Validity
Ensure the generated output adheres to the specified JSON schema.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
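Because this dimension is all-or-nothing, it may help to check the output locally before submitting. The sketch below uses only the standard library. The challenge text does not reproduce the actual schema, so the field names `concept`, `script`, and `moodboard_prompt` and their types are assumptions chosen for illustration; replace them with the schema shipped with the challenge.

```python
import json

# Assumed required fields and types; the real schema is not given in this brief.
REQUIRED_FIELDS = {
    "concept": str,
    "script": list,
    "moodboard_prompt": str,
}


def validate_output(raw):
    """Parse raw JSON text and check the assumed fields. Returns (ok, errors)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return False, ["top-level value must be an object"]
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            errors.append(f"wrong type for field: {name}")
    return not errors, errors


if __name__ == "__main__":
    sample = json.dumps({
        "concept": "A cat narrates a rocket launch",
        "script": ["Hook", "Build-up", "Punchline"],
        "moodboard_prompt": "retro space poster, orange tabby, film grain",
    })
    print(validate_output(sample))
```

A fuller check would validate against the published JSON schema with a schema validator; this sketch only covers presence and top-level types.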
Topic Relevance
Verify that the generated concept directly relates to the trending topic.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
Creative Score
Subjective score based on the originality, engagement, and novelty of the concept. Target: 4; range: 1-5.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
Multimodal Coherence
Consistency between the text concept, the script, and the visual moodboard prompt. Target: 0.9; range: 0-1.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
Script Detail Level
Measure of how detailed and actionable the script outline is. Target: 0.8; range: 0-1.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
What you should walk away with
Master the Google ADK for defining agent components, tool use, and multimodal input/output processing with Gemini 3 Pro.
Implement techniques for integrating Gemini 3 Pro's multimodal reasoning to analyze visual trends and generate coherent video narratives.
Design a content trend analysis pipeline using the Weaviate vector database to store and retrieve contextual information such as popular memes, news, and user feedback.
Integrate Stable Diffusion XL as a tool callable by the agent to generate visual mood boards or keyframe concepts based on textual descriptions.
Orchestrate complex agent workflows using Prefect, ensuring reliable execution, retry mechanisms, and dependency management for multimodal tasks.
Deploy custom video analysis or processing models on Triton Inference Server, enabling the Google ADK agent to invoke them for specific tasks like scene detection or style transfer.
Develop strategies for continuous content trend ingestion and analysis, keeping the agent's knowledge base fresh and relevant.
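The retry mechanisms mentioned in the list above can be illustrated without any orchestration framework. The following is a minimal standard-library sketch of retrying a flaky tool call with exponential backoff. It is not Prefect's API (Prefect exposes retries through its own task configuration), and `run_with_retries` and `make_flaky_moodboard_tool` are hypothetical names. The flaky tool simulates an image-generation backend that fails a fixed number of times before succeeding.

```python
import time


def run_with_retries(step, max_attempts=3, base_delay=0.0, sleep=time.sleep):
    """Call step() until it succeeds or max_attempts is reached.

    The wait before retry n is base_delay * 2 ** (n - 1), i.e. exponential backoff.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"step failed after {max_attempts} attempts") from last_error


def make_flaky_moodboard_tool(failures_before_success):
    """Build a simulated tool that raises a fixed number of times, then succeeds."""
    calls = {"count": 0}

    def generate():
        calls["count"] += 1
        if calls["count"] <= failures_before_success:
            raise ConnectionError("image backend unavailable")
        return {"image_id": "moodboard-1", "attempts": calls["count"]}

    return generate


if __name__ == "__main__":
    tool = make_flaky_moodboard_tool(failures_before_success=2)
    print(run_with_retries(tool, max_attempts=3))
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the requested delays instead of actually waiting.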
[ok] Wrote CHALLENGE.md
[ok] Wrote .versalist.json
[ok] Wrote eval/examples.json
Requires VERSALIST_API_KEY. Works with any MCP-aware editor.