Operator-ready prompt for reuse, tuning, and Workspace runs.
This item is set up for developers who want to inspect the original language, fork it into Workspace, and adapt the evidence model without losing the source prompt structure.
Implementation handoffs, eval setup, and prompt tuning where you need the original structure intact.
Inspect first, copy once, then fork into Workspace when you want variants, notes, and model settings attached to the same run.
Swap domain facts, examples, and any hard-coded entities for your own context.
Tighten the evidence or verification requirement if this is headed toward production.
Decide which failure mode you want to evaluate first before you branch the prompt.
This prompt already carries implementation detail, tool context, and a final-output instruction. Keep that structure intact when you tune it, or your comparison runs get noisy fast.
Open this prompt inside Workspace when you want a live iteration loop.
Copy for quick reuse, or run it in Workspace to keep prompt variants, model settings, and revision history in one place.
Structured source with 1 active line to adapt.
Already linked to a challenge workflow.
Prompt content
Original prompt text with formatting preserved for inspection and clean copy.
Build the video preprocessing module (segmentation, audio extraction, frame sampling). Integrate Qwen3-VL to generate rich visual descriptions and object detections for each segment. Show how these features are prepared for LlamaIndex.
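For orientation, here is one minimal sketch of what the prompt asks for: fixed-length segmentation, audio extraction through the ffmpeg CLI, uniform frame sampling with OpenCV, and per-segment descriptions packed into LlamaIndex Documents. The describe_frames stub stands in for whatever Qwen3-VL serving interface you actually run, and the segment length, sampling rate, and metadata keys are illustrative choices, not part of the source prompt.

```python
import subprocess
from dataclasses import dataclass, field

import cv2  # pip install opencv-python
from llama_index.core import Document  # llama-index >= 0.10 import path

SEGMENT_SECONDS = 30.0    # illustrative window length
FRAMES_PER_SEGMENT = 4    # illustrative sampling rate

@dataclass
class Segment:
    start: float                                 # window start, seconds
    end: float                                   # window end, seconds
    frames: list = field(default_factory=list)   # sampled BGR frames
    caption: str = ""                            # Qwen3-VL description, filled later

def extract_audio(video_path: str, wav_path: str) -> None:
    """Strip the audio track to 16 kHz mono WAV for downstream transcription."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", wav_path],
        check=True,
    )

def segment_and_sample(video_path: str) -> list[Segment]:
    """Cut the video into fixed windows and sample frames evenly inside each."""
    cap = cv2.VideoCapture(video_path)
    duration = cap.get(cv2.CAP_PROP_FRAME_COUNT) / cap.get(cv2.CAP_PROP_FPS)
    segments, t = [], 0.0
    while t < duration:
        seg = Segment(start=t, end=min(t + SEGMENT_SECONDS, duration))
        for i in range(FRAMES_PER_SEGMENT):
            # Seek to evenly spaced timestamps inside the window.
            ts = seg.start + (i + 0.5) * (seg.end - seg.start) / FRAMES_PER_SEGMENT
            cap.set(cv2.CAP_PROP_POS_MSEC, ts * 1000)
            ok, frame = cap.read()
            if ok:
                seg.frames.append(frame)
        segments.append(seg)
        t += SEGMENT_SECONDS
    cap.release()
    return segments

def describe_frames(frames: list) -> str:
    """Placeholder: send sampled frames to your Qwen3-VL endpoint and return
    its visual description plus detected objects as plain text."""
    raise NotImplementedError

def to_documents(segments: list[Segment]) -> list[Document]:
    """One LlamaIndex Document per segment; timestamps go into metadata so
    retrieved answers can point back at the exact video window."""
    for seg in segments:
        seg.caption = describe_frames(seg.frames)
    return [
        Document(text=seg.caption, metadata={"start_sec": seg.start, "end_sec": seg.end})
        for seg in segments
    ]
```

Fixed-length windows are the simplest contract to evaluate against; shot-boundary segmentation can be swapped in later without changing the Document shape, which keeps comparison runs clean.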
Adaptation plan
Keep the source stable, then branch your edits in a predictable order so the next prompt run is easier to evaluate.
Hold the task contract and output shape stable so generated implementations remain comparable.
Update libraries, interfaces, and environment assumptions to match the stack you actually run.
Test failure handling, edge cases, and any code paths that depend on hidden context or secrets.
Copy once for a pristine source snapshot, then move the prompt into Workspace when you want variants, run history, and side-by-side tuning without losing the original.
Prompt diagnostics
Quick signals for how structured this prompt already is and where adaptation work is likely to happen first.
This prompt is mostly narrative and instruction-driven, so you can adapt examples and output constraints first without disturbing the structure.
Multimodal Video Intelligence with Qwen3-VL, GPT-5 & LlamaIndex
Inspired by advancements in long-context multimodal understanding, this challenge tasks you with building a cutting-edge video intelligence system. You will integrate the Qwen3-VL model for robust video and image analysis with GPT-5 for higher-level reasoning and synthesis. The system will leverage LlamaIndex for advanced RAG over multimodal data, allowing it to accurately answer complex 'needle-in-a-haystack' queries spanning long video durations. The core of the system will involve processing entire 30-minute video segments, extracting key visual and auditory information, generating multimodal embeddings, and indexing them using LlamaIndex. An OpenAI Swarm-like orchestration will manage specialized agents that collaborate using an A2A protocol to perform visual search, event detection, and generate comprehensive summaries. MCP could be used to facilitate access to external video processing tools or contextual databases.
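As a rough sketch of the retrieval half only, the snippet below indexes the per-segment Documents from the preprocessing step and answers a needle-in-a-haystack query. VectorStoreIndex and the query-engine call follow the current llama_index.core API, but the model wiring (embedding model, GPT-5 as the reasoning LLM) is an assumption to pin against whatever your environment exposes; the agent orchestration, A2A protocol, and MCP tooling layers sit above this and are not shown.

```python
from llama_index.core import VectorStoreIndex

# documents: list[Document] produced by to_documents() in the preprocessing sketch.
index = VectorStoreIndex.from_documents(documents)        # embed + index segments
query_engine = index.as_query_engine(similarity_top_k=5)  # retrieve 5 closest windows

response = query_engine.query(
    "When does the presenter first pick up the red prototype?"  # hypothetical query
)
print(response)
for hit in response.source_nodes:
    # Segment timestamps from metadata map each retrieved hit back to the video.
    print(hit.node.metadata["start_sec"], hit.node.metadata["end_sec"], hit.score)
```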
Use the challenge page to recover the original task boundaries before you tune the prompt. That keeps your variants grounded in the same evaluation target instead of drifting into a different problem.