AI Development
Advanced
Always open

Agentic Video Scene Skipper

In this challenge you build an agentic system that interprets complex natural language requests to navigate video content. You will use Gemini 3 Pro's multimodal understanding and Langroid's agent framework to process user queries, run semantic search over video metadata, and execute simulated playback commands. The system must identify specific scenes from descriptions, character names, or quotes, combining hybrid reasoning with MCP tool integration to control a simulated media player in real time.

The project combines LLMs with a specialized agent framework and advanced RAG techniques. You will design a graph-based workflow that parses queries, retrieves relevant video segments, and calls external tools. Success depends on careful prompt engineering, efficient data indexing, and robust error handling.
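As a rough illustration of the simulated playback commands described above, the sketch below models a media player in plain Python. Everything in it is an assumption made for illustration: the `SimulatedPlayer` class, the `dispatch` helper, and the command shape (`action`, `target_s`) are hypothetical and are not part of Langroid, MCP, or any challenge-provided API. In a real build, a function like `dispatch` would sit behind whatever tool interface the agent framework exposes.

```python
from dataclasses import dataclass


@dataclass
class SimulatedPlayer:
    """Hypothetical in-memory stand-in for a media player API."""
    duration_s: float
    position_s: float = 0.0
    playing: bool = False

    def play(self) -> dict:
        self.playing = True
        return self.status()

    def pause(self) -> dict:
        self.playing = False
        return self.status()

    def seek(self, target_s: float) -> dict:
        # Clamp so a bad retrieval result cannot move playback out of range.
        self.position_s = max(0.0, min(target_s, self.duration_s))
        return self.status()

    def status(self) -> dict:
        return {
            "position_s": self.position_s,
            "duration_s": self.duration_s,
            "playing": self.playing,
        }


def dispatch(player: SimulatedPlayer, command: dict) -> dict:
    """Route a structured command (as an agent might emit) to the player.

    Unknown or malformed commands return an "error" field plus the current
    status instead of raising, so the agent can recover and retry.
    """
    action = command.get("action")
    if action == "play":
        return player.play()
    if action == "pause":
        return player.pause()
    if action == "status":
        return player.status()
    if action == "seek":
        try:
            target_s = float(command["target_s"])
        except (KeyError, TypeError, ValueError):
            return {"error": "seek needs a numeric target_s", **player.status()}
        return player.seek(target_s)
    return {"error": f"unknown action: {action!r}", **player.status()}


if __name__ == "__main__":
    player = SimulatedPlayer(duration_s=120.0)
    print(dispatch(player, {"action": "seek", "target_s": 95.5}))
    print(dispatch(player, {"action": "rewind"}))
```

Returning errors as data rather than raising is one possible design choice here: it keeps a failed tool call visible to the agent so it can correct the command.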

Status: Always open
Difficulty: Advanced
Points: 500
Challenge at a glance
Host: Vera, AI Research & Mentorship
Starts: Available now
Run mode: Evergreen challenge

Learning goals

What you should walk away with

Use Langroid to build stateful, multi-step conversational agents that interact with external systems.

Implement advanced RAG pipelines with LlamaIndex using hybrid indexing (vector + keyword) for comprehensive video content metadata and transcript segment retrieval.

Design MCP-enabled tool integration for a simulated video player API, allowing Langroid agents to control playback, skip to identified scenes, and retrieve current video status.

Utilize Gemini 3 Pro's multimodal capabilities for understanding nuanced natural language queries about video content and accurately inferring user intent from context.

Build extended thinking patterns within the Langroid agent, enabling it to decompose complex scene descriptions into executable search queries and precise playback commands.

Deploy a lightweight vector database (e.g., ChromaDB, Milvus) for efficient similarity search and retrieval of video scene embeddings and associated metadata.

Develop a robust prompt engineering strategy for Claude Sonnet 4 to refine initial Gemini 3 Pro outputs, ensuring precise scene identification and minimizing false-positive scene skips.
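To make the hybrid indexing goal (vector + keyword) more concrete, here is a minimal sketch of score fusion over transcript segments using only the Python standard library. It does not use LlamaIndex, ChromaDB, or Milvus. The `SEGMENTS` data is invented sample data, the character-trigram vectors are a toy stand-in for real embeddings, and the `alpha` weight is an assumed tuning parameter. A real build would presumably swap in an embedding model and a vector store.

```python
import math
import re
from collections import Counter

# Invented sample data; a real build would load segments from the challenge dataset.
SEGMENTS = [
    {"start_s": 12.0, "speaker": "Mara", "text": "We need to leave the harbor before sunrise."},
    {"start_s": 95.5, "speaker": "Jonas", "text": "The engine is failing, hold on to something!"},
    {"start_s": 210.0, "speaker": "Mara", "text": "I never wanted the crown, only the truth."},
]


def tokens(text: str) -> list:
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query tokens that appear verbatim in the text."""
    query_tokens = set(tokens(query))
    if not query_tokens:
        return 0.0
    return len(query_tokens & set(tokens(text))) / len(query_tokens)


def trigram_vector(text: str) -> Counter:
    """Toy stand-in for an embedding: counts of character trigrams."""
    normalized = " ".join(tokens(text))
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query: str, segments: list, alpha: float = 0.5, top_k: int = 1) -> list:
    """Rank segments by alpha * vector score + (1 - alpha) * keyword score."""
    query_vector = trigram_vector(query)
    scored = []
    for segment in segments:
        # Index the speaker name with the text so character-name queries can match.
        searchable = f"{segment['speaker']} {segment['text']}"
        score = (alpha * cosine(query_vector, trigram_vector(searchable))
                 + (1 - alpha) * keyword_score(query, searchable))
        scored.append((score, segment))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    best_score, best_segment = hybrid_search(
        "the scene where Mara talks about the crown", SEGMENTS
    )[0]
    print(best_segment["start_s"], round(best_score, 3))
```

The `start_s` of the top-ranked segment is what an agent would pass to a seek command. Requiring a minimum score before skipping is one way to reduce false-positive scene skips.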

