GPT-5 Multimodal Tool Integration for Figures

implementationChallengeOctober 31, 2025

Prompt Content

Implement an 'Image Analysis Agent' that uses GPT-5 (or a specialized VLM integrated with GPT-5) to analyze a base64 encoded patent figure. This agent should extract key features, objects, and their relationships described in the image. Use MCP to mock calling a 'Getty Images API' or similar service if GPT-5's multimodal capabilities are not sufficient for complex diagram interpretation. The output should be structured textual descriptions suitable for feeding into the graph RAG.

Run with your own API keysBYOK

Use your Anthropic, OpenAI, or Vertex keys to execute this prompt directly in Vera. keys are stored locally in your browser.

Usage Tips

Copy the prompt and paste it into your preferred AI tool (Claude, ChatGPT, Gemini)

Customize placeholder values with your specific requirements and context

For best results, provide clear examples and test different variations