AI Development
Advanced
Always open

Multimodal Asset Generation

In this challenge you build a generative AI system that produces creative marketing assets, including images with embedded text, from complex briefs and brand guidelines. Using the multimodal capabilities of Gemini 3 and Nano Banana Pro, you orchestrate a workflow that generates visually compelling images and renders accurate, contextually relevant text directly within them. The core of the challenge is combining prompt optimization in DSPy with knowledge retrieval in LlamaIndex: brand style guides and creative objectives are fetched through RAG and used to adapt the prompts sent to Gemini 3 and Nano Banana Pro, and the system self-corrects to improve text fidelity and image quality. The result simulates a creative agency assistant that turns abstract marketing concepts into concrete visual outputs.
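The self-correcting workflow described above can be sketched as a retrieve, generate, validate, retry loop. This is a minimal sketch under stated assumptions, not the challenge's reference implementation: `retrieve_guidelines`, `generate_image`, and `read_text` are hypothetical stand-ins for a LlamaIndex retriever, a Gemini 3 / Nano Banana Pro image call, and an OCR or vision-model text reader. They are stubbed here so the control flow runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    prompt: str
    rendered_text: str
    ok: bool


def build_prompt(brief: str, guidelines: List[str], required_text: str,
                 feedback: str = "") -> str:
    """Assemble an image prompt from the brief, retrieved guidelines, and feedback."""
    parts = [
        f"Brief: {brief}",
        "Brand guidelines: " + "; ".join(guidelines),
        f'Render exactly this text in the image: "{required_text}"',
    ]
    if feedback:
        parts.append(f"Correction from previous attempt: {feedback}")
    return "\n".join(parts)


def generate_asset(brief: str, required_text: str,
                   retrieve_guidelines: Callable[[str], List[str]],
                   generate_image: Callable[[str], object],
                   read_text: Callable[[object], str],
                   max_attempts: int = 3) -> List[Attempt]:
    """Retry generation until the text read back from the image matches the brief."""
    guidelines = retrieve_guidelines(brief)  # RAG step (hypothetical callable)
    attempts: List[Attempt] = []
    feedback = ""
    for _ in range(max_attempts):
        prompt = build_prompt(brief, guidelines, required_text, feedback)
        image = generate_image(prompt)       # image model call (hypothetical)
        rendered = read_text(image)          # OCR / vision read-back (hypothetical)
        ok = rendered.strip().lower() == required_text.strip().lower()
        attempts.append(Attempt(prompt, rendered, ok))
        if ok:
            break
        feedback = f'the image showed "{rendered}" but must show "{required_text}"'
    return attempts


# Offline stubs (hypothetical): the fake model misspells the text until corrected.
def fake_retrieve(brief: str) -> List[str]:
    return ["use navy as the primary color", "sans-serif headlines"]


def fake_generate(prompt: str) -> str:
    return "SUMMER SALE" if "Correction" in prompt else "SUMER SALE"


def fake_read_text(image: object) -> str:
    return str(image)


history = generate_asset("Summer campaign banner", "Summer Sale",
                         fake_retrieve, fake_generate, fake_read_text)
print([a.ok for a in history])
```

In a real build the exact-match check would likely be replaced by a softer similarity score, and the feedback string would be one of the inputs a DSPy module learns to use.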

Datasets

Shared data for this challenge

Review public datasets and any private uploads tied to your build.

Learning goals

What you should walk away with

Master multimodal prompt engineering for Gemini 3 and Nano Banana Pro to control image composition, style, and embedded text attributes.

Implement DSPy's `Signature` and `Predict` modules to design a declarative pipeline for generating images and optimizing text rendering quality.

Integrate LlamaIndex with vector databases to perform RAG on a corpus of brand guidelines, marketing assets, and style examples, feeding context into DSPy prompts.

Build a feedback loop using DSPy's `BootstrapFewShot` or custom metrics to iteratively refine prompts and improve generated image text accuracy and aesthetic quality.

Develop a mechanism for parsing and validating text content within generated images, ensuring consistency with input requirements and brand messaging.

Design a scalable architecture for deploying multimodal generative agents, considering API rate limits and computational resources.

Explore advanced techniques for zero-shot and few-shot multimodal generation using Gemini 3 and Nano Banana Pro within the DSPy framework.
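The goals on custom metrics and on validating text inside generated images can be illustrated with a small text-fidelity score. This is a sketch using only the Python standard library. The `(example, prediction, trace=None)` signature is assumed to follow the usual DSPy custom-metric convention, and the field names `required_text` and `rendered_text` are hypothetical. How the text is read back from the image (OCR or a vision model) is left out.

```python
import re
from difflib import SequenceMatcher
from types import SimpleNamespace


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace to tolerate OCR noise."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def text_fidelity(required: str, rendered: str) -> float:
    """Similarity in [0, 1] between the required copy and the text read from the image."""
    return SequenceMatcher(None, normalize(required), normalize(rendered)).ratio()


def text_fidelity_metric(example, prediction, trace=None, threshold: float = 0.95):
    """Metric sketch: float score for evaluation, pass/fail when a trace is supplied."""
    score = text_fidelity(example.required_text, prediction.rendered_text)
    # When an optimizer passes a trace, return a strict boolean so that only
    # high-fidelity generations are kept as demonstrations.
    return score >= threshold if trace is not None else score


# Offline demo with plain namespaces standing in for example/prediction objects.
example = SimpleNamespace(required_text="Summer Sale: 50% Off!")
good = SimpleNamespace(rendered_text="SUMMER SALE 50% OFF")
bad = SimpleNamespace(rendered_text="SUMER SAEL 5O% OF")

print(text_fidelity_metric(example, good))
print(text_fidelity_metric(example, bad))
```

Normalizing before comparison is a design choice: it ignores case and punctuation differences, which suits headline copy but would hide errors where punctuation matters (prices, legal text). A stricter variant would compare the raw strings.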

Start from your terminal
$ npx -y @versalist/cli start multimodal-asset-generation
[ok] Wrote CHALLENGE.md
[ok] Wrote .versalist.json
[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Challenge at a glance
Host and timing
Host: Vera (AI Research & Mentorship)
Starts: Available now
Run mode: Evergreen challenge