Multimodal 3D Object Verification
This challenge builds on recent advances in multimodal AI and 3D vision models such as Meta's SAM 3D. You will build a multimodal agent system with Semantic Kernel that acts as a '3D Model Quality Assurance' specialist. The agent receives a natural language request along with a simulated '3D scene description' (derived from SAM 3D output) and verifies whether the objects in the scene meet the specified criteria. Gemini 2.5 Pro sits at the core, orchestrating visual analysis tools (simulated APIs for SAM 3D) and performing extended reasoning to identify discrepancies or compliance issues.
AI Research & Mentorship
What you should walk away with
Master Semantic Kernel's planning and plugin architecture for orchestrating multimodal capabilities and external tools.
Implement plugins that simulate calls to Meta's SAM 3D API, receiving structured '3D scene descriptions' (e.g., JSON representation of detected objects, their positions, and attributes).
Design an extended thinking pipeline where Gemini 2.5 Pro iteratively refines its understanding and verification process, performing multiple steps of analysis against the provided 3D data.
Utilize Gemini 2.5 Pro's multimodal input capabilities to process both the natural language request and the simulated 3D scene description for comprehensive analysis.
Develop specific reasoning patterns to identify common issues in 3D models, such as incorrect scaling, misalignment, missing components, or color discrepancies, based on the criteria specified in the request.
Build a feedback loop within Semantic Kernel's planner, allowing the agent to self-correct and re-evaluate its findings if initial assessments are inconclusive.
Create a user interface (simple command-line or web-based) to submit verification requests and display the agent's detailed findings and recommendations.
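To make the simulated SAM 3D plugin and the issue categories above more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the JSON schema, the `SimulatedSam3DPlugin` class, the `Criterion` fields, and the units (metres, with z as the vertical axis) are hypothetical and do not reflect the actual SAM 3D output format. In a real build these methods would be registered as Semantic Kernel plugin functions; that wiring is omitted here to keep the sketch dependency-free.

```python
import json
from dataclasses import dataclass

# Hypothetical scene description. The schema is an assumption for this
# sketch, not the real SAM 3D output format.
SCENE_JSON = """
{
  "objects": [
    {"id": "chair_1", "label": "chair", "position": [0.0, 0.0, 0.0],
     "size": [0.5, 0.9, 0.5], "color": "red"},
    {"id": "table_1", "label": "table", "position": [1.0, 0.0, 0.2],
     "size": [3.0, 0.75, 0.8], "color": "brown"}
  ]
}
"""


class SimulatedSam3DPlugin:
    """Stands in for a SAM 3D call by returning a canned scene description."""

    def describe_scene(self) -> str:
        return SCENE_JSON


@dataclass
class Criterion:
    label: str               # object type that must be present
    max_size: tuple          # per-axis upper bound, assumed to be in metres
    color: str               # expected color name
    floor_z: float = 0.0     # expected height of the object's base
    tolerance: float = 0.05  # allowed deviation from floor_z


def verify(scene: dict, criteria: list) -> list:
    """Return one finding per violated criterion, in criteria order."""
    findings = []
    by_label = {obj["label"]: obj for obj in scene["objects"]}
    for c in criteria:
        obj = by_label.get(c.label)
        if obj is None:
            findings.append(f"{c.label}: missing component")
            continue
        if any(s > m for s, m in zip(obj["size"], c.max_size)):
            findings.append(f"{c.label}: incorrect scaling")
        if abs(obj["position"][2] - c.floor_z) > c.tolerance:
            findings.append(f"{c.label}: misalignment")
        if obj["color"] != c.color:
            findings.append(f"{c.label}: color discrepancy")
    return findings


criteria = [
    Criterion("chair", (0.6, 1.0, 0.6), "blue"),
    Criterion("table", (2.0, 1.0, 1.0), "brown"),
    Criterion("lamp", (0.4, 0.4, 1.8), "white"),
]
scene = json.loads(SimulatedSam3DPlugin().describe_scene())
findings = verify(scene, criteria)
for line in findings:
    print(line)
```

Deterministic checks like these could run as tools the model calls, leaving Gemini 2.5 Pro to interpret the request, choose which criteria apply, and explain the findings.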
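The feedback loop described above, where the agent re-evaluates when its first assessment is inconclusive, might be sketched as follows. This is only an illustration under stated assumptions: `stub_model_assess` is a hypothetical stand-in for a Gemini 2.5 Pro call, `stub_measure_tool` is a hypothetical simulated SAM 3D tool, and the 2 m width limit is an invented criterion. Neither Semantic Kernel nor the Gemini API is used, so the control flow can be read in isolation.

```python
from dataclasses import dataclass, field


@dataclass
class Assessment:
    verdict: str  # "pass", "fail", or "inconclusive"
    notes: list = field(default_factory=list)


def stub_model_assess(request: str, evidence: list) -> Assessment:
    """Hypothetical stand-in for a model call.

    Stays inconclusive until a measurement is among the evidence.
    """
    measurements = [e for e in evidence if e.startswith("measurement:")]
    if not measurements:
        return Assessment("inconclusive", ["need object dimensions"])
    width = float(measurements[0].split(":")[1])
    verdict = "pass" if width <= 2.0 else "fail"
    return Assessment(verdict, [f"table width {width} m"])


def stub_measure_tool() -> str:
    """Hypothetical simulated SAM 3D tool that reports a table width."""
    return "measurement:3.0"


def verify_with_feedback(request: str, max_rounds: int = 3):
    """Assess, and if inconclusive, gather more evidence and try again."""
    evidence = ["scene: one table detected"]
    history = []
    for round_no in range(1, max_rounds + 1):
        result = stub_model_assess(request, evidence)
        history.append((round_no, result.verdict))
        if result.verdict != "inconclusive":
            break
        # Self-correction step: call a tool for more data before re-evaluating.
        evidence.append(stub_measure_tool())
    return result, history


final, history = verify_with_feedback("Is the table at most 2 m wide?")
print(final.verdict, final.notes, history)
```

Capping the loop with `max_rounds` keeps the agent from re-evaluating indefinitely when no tool can resolve the ambiguity; in that case the last inconclusive assessment is returned as-is.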