AI Development
Advanced
Always open

Multi-Modal Creative Agents for Text-to-Video Storyboarding with CrewAI

In this challenge you build a multi-modal creative agent team. You use CrewAI to orchestrate specialized agents, powered by Gemini 3 and OpenAI GPT 5.1 Pro, that generate, storyboard, and critique short-video concepts from a text prompt. The system must combine instant and deep reasoning, use RAG for creative inspiration, and adapt its thinking budget to the complexity of the creative brief. The final output is a detailed storyboard plan with visual descriptions, suggested camera angles, and a critical evaluation of the concept's potential.
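One way to pin down the expected output is to model the storyboard plan as plain data before wiring up any agents. The sketch below is only an illustration: the class names `Shot` and `Storyboard`, the `duration_seconds` field, and the sample prompt are assumptions and are not part of the challenge specification.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Shot:
    """One storyboard panel (field names are illustrative assumptions)."""
    number: int
    visual_description: str
    camera_angle: str
    duration_seconds: float  # assumption: the brief does not require timings


@dataclass
class Storyboard:
    """The storyboard plan: shots plus the critical evaluation."""
    prompt: str
    shots: List[Shot] = field(default_factory=list)
    critique: str = ""

    def total_duration(self) -> float:
        # Sum the per-shot durations to check the plan fits a short video.
        return sum(shot.duration_seconds for shot in self.shots)

    def to_json(self) -> str:
        # Serialize the whole plan so agents can pass it along as text.
        return json.dumps(asdict(self), indent=2)


# Hypothetical example data.
board = Storyboard(prompt="A lighthouse keeper finds a message in a bottle")
board.shots.append(
    Shot(1, "Waves break against a lighthouse at dusk", "wide establishing shot", 4.0)
)
board.shots.append(
    Shot(2, "Weathered hands lift a bottle from the surf", "close-up", 3.0)
)
board.critique = "Clear setup; the second shot needs a stronger emotional beat."
```

Keeping the plan as structured data rather than free text makes it easier for a critique agent to comment on individual shots and for you to validate the final output.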

Status
Always open
Difficulty
Advanced
Points
500
Challenge at a glance
Host and timing
Vera

AI Research & Mentorship

Starts Available now
Evergreen challenge
Learning goals

What you should walk away with

Master CrewAI for defining and orchestrating role-based agent teams with specific tools, goals, and backstories for creative tasks.

Implement hybrid reasoning patterns, utilizing Gemini 3's multi-modal capabilities for initial concept generation (instant reasoning) and deeper visual analysis (deep reasoning).

Leverage OpenAI GPT 5.1 for sophisticated textual critique and refinement of generated storyboards, acting as a 'Creative Director' agent.

Integrate advanced RAG techniques to provide agents with a rich context of film theory, visual styles, and creative brief examples, enhancing creative output quality.

Develop a 'Visualizer Agent' and a 'Critique Agent' capable of processing and generating multi-modal descriptions, ensuring alignment with the text-to-video paradigm.

Design mechanisms for adaptive thinking budgets, allowing agents to allocate more computational resources for complex creative challenges or critical evaluation phases.

Implement a feedback loop within the crew, in which critique agents review the storyboards produced by the concept agents and send them back for revision, iterating towards a higher quality output.
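The adaptive thinking budget and the critique feedback loop from the goals above can be sketched together in plain Python, independent of any agent framework. Everything here is an assumption made for illustration: the keyword-based complexity heuristic, the `Budget` fields, the 0.8 acceptance threshold, and the stub functions that stand in for real model-backed agents.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Budget:
    mode: str           # "instant" or "deep"
    max_revisions: int  # critique/refine rounds the crew may spend


def estimate_complexity(brief: str) -> int:
    """Toy heuristic (assumption): long briefs and hard constraints score higher."""
    words = len(brief.split())
    lowered = brief.lower()
    constraints = sum(lowered.count(k) for k in ("must", "without", "exactly"))
    return words // 20 + 2 * constraints


def choose_budget(brief: str) -> Budget:
    """Map the complexity score to a reasoning mode and a revision cap."""
    score = estimate_complexity(brief)
    if score < 2:
        return Budget("instant", 1)
    return Budget("deep", min(2 + score, 5))


def refine(
    brief: str,
    draft_fn: Callable[[str, str], str],
    critique_fn: Callable[[str], Tuple[float, str]],
    threshold: float = 0.8,
) -> Tuple[str, int]:
    """Draft, critique, and revise until accepted or the budget runs out.

    Returns the last draft and the number of revisions made. The last
    revision is not re-critiqued if the budget is exhausted.
    """
    budget = choose_budget(brief)
    draft = draft_fn(brief, "")
    revisions = 0
    for _ in range(budget.max_revisions):
        score, notes = critique_fn(draft)
        if score >= threshold:
            break
        draft = draft_fn(brief, notes)
        revisions += 1
    return draft, revisions


# Stub agents (hypothetical); a real build would call model-backed agents here.
def stub_draft(brief: str, notes: str) -> str:
    suffix = f" [revised: {notes}]" if notes else ""
    return f"storyboard for: {brief}{suffix}"


def stub_critique(draft: str) -> Tuple[float, str]:
    # Accept only drafts that have been revised at least once.
    if "[revised" in draft:
        return 0.9, ""
    return 0.5, "add camera angles"


simple_brief = "A cat chases a laser pointer"
complex_brief = (
    "The ad must run exactly 15 seconds and must show the product "
    "without any dialogue"
)
final_draft, rounds = refine(complex_brief, stub_draft, stub_critique)
```

In a CrewAI build, `draft_fn` and `critique_fn` would be replaced by calls into the concept and critique agents, and the chosen `Budget` could decide which model or reasoning setting each call uses.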
