Agent Building
Advanced
Always open

Multimodal Content Generation Agent for AI Video Platform

Design and build an advanced multimodal agent with Google ADK and Gemini 3 Pro that generates creative content ideas, scripts, and visual concepts for short-form AI video platforms similar to Meta's 'Vibes'. The agent should analyze current trends (for example popular memes, news topics, and user preferences stored in a vector database) and turn them into novel, engaging video concepts. It should also orchestrate calls to external tools, such as Stable Diffusion XL for generating visual mood boards and Triton Inference Server for serving specialized video analysis models. The challenge emphasizes multimodal reasoning, creative generation, and robust workflow orchestration.


Evaluation rubric

How submissions are scored

These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.

Max score: 5
Dimensions: 5 scoring checks (5 binary pass-or-fail, 0 ordinal scaled)

Dimension 1: json_format_validity

JSON Format Validity

Ensure the generated output adheres to the specified JSON schema.

Type: binary
Weight: 1

Binary check: this dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
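
The brief does not reproduce the JSON schema itself, so the following is only a minimal sketch of a local pre-submission check. The field names (`trending_topic`, `concept`, `script_outline`, `moodboard_prompt`) are hypothetical, loosely inspired by the wording of the other rubric dimensions, and should be replaced with whatever the challenge's real schema specifies.

```python
import json

# Hypothetical required fields and their expected Python types.
# These are illustrative assumptions, not the challenge's actual schema.
REQUIRED_FIELDS = {
    "trending_topic": str,
    "concept": str,
    "script_outline": list,
    "moodboard_prompt": str,
}


def validate_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output looks valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"{field} must be {expected.__name__}")
    return problems
```

Because the dimension is pass or fail, running a check like this before submitting is cheap insurance: a single missing field would otherwise forfeit the whole point.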

Dimension 2: topic_relevance

Topic Relevance

Verify that the generated concept directly relates to the trending topic.

Type: binary (pass or fail, no partial credit)
Weight: 1

Dimension 3: creative_score

Creative Score

Subjective score based on the originality, engagement, and novelty of the concept.

Target: 4
Range: 1-5
Type: binary (pass or fail, no partial credit)
Weight: 1

Dimension 4: multimodal_coherence

Multimodal Coherence

Consistency between the text concept, the script, and the visual moodboard prompt.

Target: 0.9
Range: 0-1
Type: binary (pass or fail, no partial credit)
Weight: 1

Dimension 5: script_detail_level

Script Detail Level

Measure of how detailed and actionable the script outline is.

Target: 0.8
Range: 0-1
Type: binary (pass or fail, no partial credit)
Weight: 1
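
To make the arithmetic of the rubric concrete, here is a rough sketch of how five binary dimensions of weight 1 add up to the maximum score of 5. The page does not say how a dimension with a target becomes pass or fail, so the sketch assumes a dimension passes when its measured value is at least the target; that threshold rule, and the `Dimension` and `total_score` names, are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dimension:
    name: str
    weight: int = 1
    # None means the evaluator already reports a plain boolean for this check.
    target: Optional[float] = None


RUBRIC = [
    Dimension("json_format_validity"),
    Dimension("topic_relevance"),
    Dimension("creative_score", target=4),
    Dimension("multimodal_coherence", target=0.9),
    Dimension("script_detail_level", target=0.8),
]


def total_score(results: dict) -> int:
    """Sum the weights of passing dimensions; no partial credit is given."""
    total = 0
    for dim in RUBRIC:
        value = results.get(dim.name)
        if value is None:
            continue  # a missing result is treated as a fail
        if dim.target is None:
            passed = bool(value)
        else:
            # Assumption: meeting or exceeding the target counts as a pass.
            passed = value >= dim.target
        if passed:
            total += dim.weight
    return total
```

Under this assumption, a run that passes every check except a creative score of 3 would earn 4 of 5, since the missed dimension contributes nothing rather than a fraction of its weight.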

Learning goals

What you should walk away with

Master the Google ADK for defining agent components, tool use, and multimodal input/output processing with Gemini 3 Pro.

Implement techniques for integrating Gemini 3 Pro's multimodal reasoning to analyze visual trends and generate coherent video narratives.

Design a content trend analysis pipeline using Weaviate vector database to store and retrieve contextual information like popular memes, news, and user feedback.
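
The retrieval idea behind this goal can be sketched without a running database. The toy in-memory store below stands in for Weaviate and ranks stored trend items by cosine similarity to a query vector. It is not the Weaviate client API: the `TrendStore` class, its methods, and the hand-written vectors are hypothetical, and a real pipeline would obtain embeddings from a model and query the database instead.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TrendStore:
    """Toy in-memory stand-in for a vector database collection."""

    def __init__(self):
        self._items = []  # list of (vector, payload) pairs

    def add(self, vector, payload):
        self._items.append((list(vector), payload))

    def nearest(self, query, k=3):
        """Return the payloads of the k items most similar to the query vector."""
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query, item[0]),
            reverse=True,
        )
        return [payload for _, payload in ranked[:k]]


store = TrendStore()
store.add([1.0, 0.0], {"kind": "meme", "text": "dancing cat remix"})
store.add([0.0, 1.0], {"kind": "news", "text": "city marathon results"})
store.add([0.9, 0.1], {"kind": "feedback", "text": "users want more pet videos"})

# The two items closest to a "pet content" query vector.
context = store.nearest([1.0, 0.0], k=2)
```

The retrieved payloads would then be passed to the agent as context for concept generation, which is the part a real vector database performs at scale with approximate nearest-neighbour search.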

Integrate Stable Diffusion XL as a tool callable by the agent to generate visual mood boards or keyframe concepts based on textual descriptions.

Orchestrate complex agent workflows using Prefect, ensuring reliable execution, retry mechanisms, and dependency management for multimodal tasks.

Deploy custom video analysis or processing models on Triton Inference Server, enabling the Google ADK agent to invoke them for specific tasks like scene detection or style transfer.
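
Triton exposes an HTTP endpoint that follows the KServe v2 inference protocol, where a request is a POST to `/v2/models/<model>/infer` with a JSON body listing named input tensors. The sketch below only builds such a request with the standard library and does not send it. The model name `scene_detector`, the input name `frames`, and the tensor shape are hypothetical; a real deployment defines them in the model's configuration.

```python
import json
import urllib.request


def build_infer_request(base_url, model_name, input_name, shape, values, datatype="FP32"):
    """Build (but do not send) a KServe v2 style inference request for Triton."""
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(values):
        raise ValueError(f"shape {list(shape)} needs {expected} values, got {len(values)}")

    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(values),  # flattened in row-major order
            }
        ]
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v2/models/{model_name}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model and tensor names, for illustration only.
request = build_infer_request(
    "http://localhost:8000/", "scene_detector", "frames", [1, 4], [0.1, 0.2, 0.3, 0.4]
)
```

An agent tool could wrap a function like this, send the request with `urllib.request.urlopen(request)`, and return the decoded `outputs` field of the response to the model.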

Develop strategies for continuous content trend ingestion and analysis, keeping the agent's knowledge base fresh and relevant.
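
One possible freshness strategy, offered here only as an illustration, is to weight each trend by how recently it was observed and drop items whose weighted score falls below a cutoff. The sketch uses exponential decay with a half-life; the 24-hour half-life, the field names, and the `rank_trends` helper are all assumptions rather than anything the challenge prescribes.

```python
def decayed_score(mentions, age_hours, half_life_hours=24.0):
    """Halve a trend's mention count for every half-life that has elapsed."""
    return mentions * 0.5 ** (age_hours / half_life_hours)


def rank_trends(trends, min_score=0.0, half_life_hours=24.0):
    """Return topics ordered by decayed score, dropping those below min_score."""
    scored = [
        (decayed_score(t["mentions"], t["age_hours"], half_life_hours), t["topic"])
        for t in trends
    ]
    kept = [item for item in scored if item[0] >= min_score]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [topic for _, topic in kept]


# Hypothetical ingested trend records.
trends = [
    {"topic": "old meme", "mentions": 100, "age_hours": 72},
    {"topic": "new meme", "mentions": 40, "age_hours": 0},
]
fresh_first = rank_trends(trends)
only_fresh = rank_trends(trends, min_score=20)
```

Here the three-day-old meme decays from 100 mentions to 12.5, so the newer item with 40 mentions ranks first and the older one is dropped once the cutoff is set to 20.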

Start from your terminal

$ npx -y @versalist/cli start multimodal-content-generation-agent-for-ai-video-platform

[ok] Wrote CHALLENGE.md
[ok] Wrote .versalist.json
[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Challenge at a glance

Host: Vera (AI Research & Mentorship)
Starts: Available now
Run mode: Evergreen challenge

Tool Space Recipe (draft)

Evaluation
Rubric: 5 dimensions
· JSON Format Validity (weight 1)
· Topic Relevance (weight 1)
· Creative Score (weight 1)
· Multimodal Coherence (weight 1)
· Script Detail Level (weight 1)
Gold items: 1 (1 public)
