Question 1

What is the Multimodal 3D Object Verification challenge on Versalist?

Accepted Answer

Building on recent advances in multimodal AI and 3D vision models such as Meta's SAM 3D, this challenge tasks you with building a multimodal agent system using Semantic Kernel. Your agent will act as a '3D Model Quality Assurance' specialist. It will receive a natural language request along with a simulated '3D scene description' (derived from SAM 3D output) and verify whether objects within the scene meet specified criteria. The Gemini 2.5 Pro model will be at the core, orchestrating visual analysis tools (simulated APIs for SAM 3D) and performing extended reasoning to identify discrepancies or compliance issues.
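To make the simulated '3D scene description' concrete, here is a minimal sketch of a stand-in tool that returns one as JSON. The schema (`label`, `position_m`, `dimensions_m`, `color`), the scene ID, and the function name `simulate_sam3d_scene` are all hypothetical; they are not SAM 3D's real output format. In a Semantic Kernel build, a function like this would be exposed as a plugin function the agent can call.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DetectedObject:
    # Hypothetical fields for illustration; not SAM 3D's actual output schema.
    label: str
    position_m: tuple[float, float, float]    # (x, y, z) in metres
    dimensions_m: tuple[float, float, float]  # (width, depth, height) in metres
    color: str


def simulate_sam3d_scene(scene_id: str) -> str:
    """Stand-in for a SAM 3D call: returns a canned scene description as JSON."""
    # Canned scenes keyed by ID; a real plugin would query the vision service here.
    scenes = {
        "kitchen-01": [
            DetectedObject("table", (0.0, 0.0, 0.0), (1.6, 0.9, 0.75), "oak"),
            DetectedObject("chair", (0.8, 0.5, 0.0), (0.45, 0.5, 0.9), "black"),
            # Deliberately planted scaling error: a 1.2 m tall mug.
            DetectedObject("mug", (0.2, 0.1, 0.75), (0.09, 0.09, 1.2), "white"),
        ],
    }
    objects = scenes.get(scene_id, [])  # unknown IDs yield an empty scene
    return json.dumps({"scene_id": scene_id, "objects": [asdict(o) for o in objects]})


scene = json.loads(simulate_sam3d_scene("kitchen-01"))
print([o["label"] for o in scene["objects"]])
```

Returning a JSON string rather than Python objects keeps the tool's output in the same text form the model would receive from a real API call.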

Question 2

What difficulty level is Multimodal 3D Object Verification?

Accepted Answer

Rated Advanced. Estimated time: 3-4 days. 500 points on completion.

Question 3

What will I learn from Multimodal 3D Object Verification?

Accepted Answer

- Master Semantic Kernel's planning and plugin architecture for orchestrating multimodal capabilities and external tools.
- Implement plugins that simulate calls to Meta's SAM 3D API and return structured '3D scene descriptions' (e.g., a JSON representation of detected objects, their positions, and attributes).
- Design an extended thinking pipeline in which Gemini 2.5 Pro iteratively refines its understanding and verification process, performing multiple steps of analysis against the provided 3D data.
- Use Gemini 2.5 Pro's multimodal input capabilities to process both the natural language request and the simulated 3D scene description for comprehensive analysis.
- Develop reasoning patterns that identify common issues in 3D models, such as incorrect scaling, misalignment, missing components, or color discrepancies, against the specified criteria.
- Build a feedback loop into Semantic Kernel's planner so the agent can self-correct and re-evaluate its findings when initial assessments are inconclusive.
- Create a user interface (a simple command-line or web-based one) to submit verification requests and display the agent's detailed findings and recommendations.
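The issue categories listed above (scaling, misalignment, missing components, color) can be prototyped as deterministic checks before the reasoning is handed to Gemini 2.5 Pro. The sketch below is a hypothetical plain-Python stand-in: the criteria format, the tolerances, and the names `check_object`, `verify_scene`, and `verify_with_retry` are assumptions. The retry loop only mimics the self-correcting feedback loop the challenge asks for; it does not use Semantic Kernel or any model API, and it treats "inconclusive" as "no objects detected".

```python
import json


def check_object(obj: dict, spec: dict) -> list[str]:
    """Compare one detected object against its expected spec; return issue strings."""
    issues = []
    label = obj["label"]
    # Scaling: each dimension (width, depth, height) must be within a relative tolerance.
    tol = spec.get("scale_tolerance", 0.2)
    for axis, actual, expected in zip("wdh", obj["dimensions_m"], spec["dimensions_m"]):
        if abs(actual - expected) > tol * expected:
            issues.append(f"{label}: scale {axis}={actual} m, expected ~{expected} m")
    # Misalignment: an object meant to rest on a surface must sit at that height (2 cm slack).
    if "rest_height_m" in spec and abs(obj["position_m"][2] - spec["rest_height_m"]) > 0.02:
        issues.append(f"{label}: z={obj['position_m'][2]} m, expected {spec['rest_height_m']} m")
    # Color discrepancy: exact string match against the expected color.
    if "color" in spec and obj["color"] != spec["color"]:
        issues.append(f"{label}: color {obj['color']!r}, expected {spec['color']!r}")
    return issues


def verify_scene(scene: dict, criteria: dict) -> list[str]:
    """Check every required component; missing ones are reported, present ones are inspected."""
    found = {o["label"]: o for o in scene["objects"]}
    issues = []
    for label, spec in criteria.items():
        if label not in found:
            issues.append(f"{label}: missing component")
        else:
            issues.extend(check_object(found[label], spec))
    return issues


def verify_with_retry(fetch, criteria: dict, max_attempts: int = 3) -> dict:
    """Re-request the scene while the result is inconclusive (no objects detected)."""
    for attempt in range(1, max_attempts + 1):
        scene = json.loads(fetch())
        if scene["objects"]:  # conclusive: evaluate what was detected
            issues = verify_scene(scene, criteria)
            return {"status": "fail" if issues else "pass", "attempts": attempt, "issues": issues}
    return {"status": "inconclusive", "attempts": max_attempts, "issues": []}


# Hypothetical example scene and criteria.
scene_json = json.dumps({"scene_id": "demo", "objects": [
    {"label": "table", "position_m": [0, 0, 0], "dimensions_m": [1.6, 0.9, 0.75], "color": "oak"},
    {"label": "mug", "position_m": [0.2, 0.1, 0.80], "dimensions_m": [0.09, 0.09, 1.2], "color": "white"},
]})
criteria = {
    "table": {"dimensions_m": [1.6, 0.9, 0.75]},
    "mug": {"dimensions_m": [0.09, 0.09, 0.1], "rest_height_m": 0.75, "color": "white"},
    "chair": {"dimensions_m": [0.45, 0.5, 0.9]},
}
report = verify_with_retry(lambda: scene_json, criteria)
print(report["status"], report["attempts"])
for issue in report["issues"]:
    print(" -", issue)
```

Keeping these checks deterministic gives the agent a reliable baseline; the model's extended reasoning can then focus on interpreting the natural language request and explaining the findings.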
