AI Development
Advanced
Always open

Multimodal 3D Object Verification

This challenge builds on recent advances in multimodal AI and 3D vision models such as Meta's SAM 3D. Your task is to build a multimodal agent system using Semantic Kernel. The agent acts as a '3D Model Quality Assurance' specialist: it receives a natural-language request together with a simulated '3D scene description' (derived from SAM 3D output) and verifies whether the objects in the scene meet the specified criteria. Gemini 2.5 Pro sits at the core, orchestrating visual analysis tools (simulated SAM 3D APIs) and performing extended reasoning to identify discrepancies or compliance issues.
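
The brief leaves the format of the simulated scene description open. This is a minimal Python sketch of a simulated SAM 3D tool under an assumed schema. The function name `simulate_sam3d_scene` and the field names `objects`, `label`, `position_m`, `dimensions_m`, `rotation_deg`, and `color_rgb` are hypothetical. They are not SAM 3D's actual output format, which describes reconstructed 3D geometry rather than this flat JSON.

```python
import json

# Hypothetical schema for a simulated SAM 3D scene description.
# This JSON shape is an assumption for the challenge's simulated API;
# it is NOT the real output format of Meta's SAM 3D.
_SCENES = {
    "office-001": {
        "scene_id": "office-001",
        "units": "meters",
        "objects": [
            {
                "id": "desk-1",
                "label": "desk",
                "position_m": [0.0, 0.0, 0.0],
                "dimensions_m": [1.6, 0.8, 0.75],
                "rotation_deg": [0.0, 0.0, 0.0],
                "color_rgb": [120, 85, 60],
            },
            {
                "id": "chair-1",
                "label": "chair",
                "position_m": [0.0, -0.7, 0.0],
                "dimensions_m": [0.5, 0.5, 1.0],
                "rotation_deg": [0.0, 0.0, 12.0],
                "color_rgb": [20, 20, 20],
            },
        ],
    }
}


def simulate_sam3d_scene(scene_id: str) -> str:
    """Hypothetical stand-in for a SAM 3D API call.

    Returns the scene description as a JSON string, the form in which a
    tool result is usually handed back to a language model.
    """
    scene = _SCENES.get(scene_id)
    if scene is None:
        return json.dumps({"error": f"unknown scene '{scene_id}'"})
    return json.dumps(scene)


if __name__ == "__main__":
    print(simulate_sam3d_scene("office-001"))
```

Returning a JSON string rather than a Python dict keeps the tool result in a text form the model can read directly. In a Semantic Kernel build, a function like this would typically be registered as a plugin function the model can call.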

Points
500
Challenge at a glance
Host and timing
Vera

AI Research & Mentorship

Starts
Available now
Evergreen challenge

Learning goals

What you should walk away with

Master Semantic Kernel's planning and plugin architecture for orchestrating multimodal capabilities and external tools.

Implement plugins that simulate calls to Meta's SAM 3D API and return structured '3D scene descriptions' (e.g., a JSON representation of detected objects, their positions, and their attributes).

Design an extended thinking pipeline where Gemini 2.5 Pro iteratively refines its understanding and verification process, performing multiple steps of analysis against the provided 3D data.

Use Gemini 2.5 Pro's multimodal input capabilities to process the natural-language request and the simulated 3D scene description together in a single analysis.

Develop specific reasoning patterns that identify common issues in 3D models, such as incorrect scaling, misalignment, missing components, or color discrepancies, measured against the criteria given in the request.

Build a feedback loop within Semantic Kernel's planner, allowing the agent to self-correct and re-evaluate its findings if initial assessments are inconclusive.

Create a user interface (simple command-line or web-based) to submit verification requests and display the agent's detailed findings and recommendations.
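
The issue categories in these goals can be prototyped as deterministic checks before the reasoning is handed to Gemini 2.5 Pro. These are incorrect scaling, misalignment, missing components, and color discrepancies, plus a feedback loop for inconclusive results. The Python sketch below is hypothetical throughout:

- **Assumed names and values.** The criteria format, the tolerances, the helpers `verify_scene`, `verify_with_feedback`, and `fake_fetch_scene`, and the "detail level" parameter are all assumptions.
- **Simplified feedback loop.** Self-correction is reduced to re-querying a simulated tool at a higher detail level whenever a check cannot run because an attribute is absent.

```python
import math

# Hypothetical acceptance criteria: expected size (meters), maximum yaw
# misalignment (degrees) and reference color (RGB) per required component.
CRITERIA = {
    "required": {
        "desk": {"dimensions_m": [1.6, 0.8, 0.75], "max_yaw_deg": 5.0, "color_rgb": [120, 85, 60]},
        "chair": {"dimensions_m": [0.5, 0.5, 1.0], "max_yaw_deg": 5.0, "color_rgb": [20, 20, 20]},
        "lamp": {"dimensions_m": [0.2, 0.2, 0.5], "max_yaw_deg": 90.0, "color_rgb": [240, 240, 240]},
    },
    "scale_tolerance": 0.10,  # maximum relative size error per axis
    "color_tolerance": 30.0,  # maximum Euclidean distance in RGB space
}


def check_object(obj, spec, criteria):
    """Compare one detected object with its spec.

    Returns (issues, missing_fields); missing fields make the check inconclusive.
    """
    label = obj["label"]
    issues, missing = [], []
    dims = obj.get("dimensions_m")
    if dims is None:
        missing.append("dimensions_m")
    elif any(abs(g - w) / w > criteria["scale_tolerance"]
             for g, w in zip(dims, spec["dimensions_m"])):
        issues.append(f"{label}: incorrect scaling {dims}, expected {spec['dimensions_m']}")
    rot = obj.get("rotation_deg")
    if rot is None:
        missing.append("rotation_deg")
    elif abs(rot[2]) > spec["max_yaw_deg"]:  # assumes index 2 is yaw (vertical axis)
        issues.append(f"{label}: misaligned, yaw {rot[2]} deg exceeds {spec['max_yaw_deg']} deg")
    color = obj.get("color_rgb")
    if color is None:
        missing.append("color_rgb")
    elif math.dist(color, spec["color_rgb"]) > criteria["color_tolerance"]:
        issues.append(f"{label}: color {color} differs from expected {spec['color_rgb']}")
    return issues, missing


def verify_scene(scene, criteria):
    """Run all checks; returns (issues, labels whose checks were inconclusive)."""
    issues, inconclusive = [], []
    detected = {o["label"]: o for o in scene["objects"]}
    for label, spec in criteria["required"].items():
        if label not in detected:
            issues.append(f"{label}: missing component")
            continue
        obj_issues, missing = check_object(detected[label], spec, criteria)
        issues.extend(obj_issues)
        if missing:
            inconclusive.append(label)
    return issues, inconclusive


def verify_with_feedback(fetch_scene, criteria, max_rounds=3):
    """Feedback loop: re-query the simulated tool at a higher detail level
    while any check is inconclusive, for at most max_rounds attempts."""
    issues, inconclusive = [], []
    for detail in range(1, max_rounds + 1):
        issues, inconclusive = verify_scene(fetch_scene(detail), criteria)
        if not inconclusive:
            return {"status": "complete", "issues": issues, "rounds": detail}
    return {"status": "inconclusive", "issues": issues,
            "unresolved": inconclusive, "rounds": max_rounds}


def fake_fetch_scene(detail):
    """Hypothetical simulated SAM 3D call: chair color only appears at detail >= 2."""
    desk = {"label": "desk", "dimensions_m": [2.0, 0.8, 0.75],
            "rotation_deg": [0.0, 0.0, 0.0], "color_rgb": [200, 30, 30]}
    chair = {"label": "chair", "dimensions_m": [0.5, 0.5, 1.0], "rotation_deg": [0.0, 0.0, 12.0]}
    if detail >= 2:
        chair["color_rgb"] = [20, 20, 20]
    return {"objects": [desk, chair]}


if __name__ == "__main__":
    # Minimal command-line output of the agent's findings.
    report = verify_with_feedback(fake_fetch_scene, CRITERIA)
    print(report["status"], "after", report["rounds"], "round(s)")
    for issue in report["issues"]:
        print(" -", issue)
```

With the sample data, the first pass cannot judge the chair's color, so the loop re-queries once. It then reports four issues:

- the desk's scaling;
- the desk's color;
- the chair's misalignment;
- the missing lamp.

Splitting the work this way is a design suggestion, not something the challenge prescribes. Gemini 2.5 Pro would translate the natural-language criteria into checks like these. It would then reason over their results instead of recomputing the arithmetic itself.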
