Implement GUI Interaction and VLM Integration

Implementation Challenge

Prompt Content

Develop the core Playwright scripts for navigating a target web application (e.g., a simple e-commerce site or a public form). Implement the integration with your chosen VLM to capture screenshots, process them, and generate textual descriptions of the GUI state. How will the VLM outputs be structured for LLM consumption?
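One possible answer to the structuring question is to have the VLM return JSON that is validated against a small schema before it reaches the LLM. The following is a minimal sketch, not a definitive implementation. The names `UIElement`, `GUIState`, `parse_vlm_output`, and `describe_page` are hypothetical, as are the JSON keys (`summary`, `elements`, `role`, `label`, `state`). The `vlm` argument stands in for whichever model client you choose; it is assumed to take PNG bytes and return a JSON string. Only the Playwright calls are real library API.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Callable, List


@dataclass
class UIElement:
    """One interactive element reported by the VLM (hypothetical schema)."""
    role: str                 # e.g. "button", "textbox", "link"
    label: str                # visible text or accessible name
    state: str = "enabled"    # e.g. "enabled", "disabled", "checked"


@dataclass
class GUIState:
    """Structured description of a page, ready to serialize for an LLM."""
    url: str
    summary: str
    elements: List[UIElement] = field(default_factory=list)


def capture_screenshot(url: str) -> bytes:
    """Open the page in headless Chromium and return a full-page PNG."""
    # Imported lazily so the schema code works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        png = page.screenshot(full_page=True)
        browser.close()
    return png


def parse_vlm_output(url: str, raw: str) -> GUIState:
    """Validate the VLM's JSON reply and convert it into a GUIState."""
    data = json.loads(raw)
    elements = [
        UIElement(
            role=item["role"],
            label=item["label"],
            state=item.get("state", "enabled"),
        )
        for item in data.get("elements", [])
    ]
    return GUIState(url=url, summary=data.get("summary", ""), elements=elements)


def describe_page(url: str, vlm: Callable[[bytes], str]) -> str:
    """Screenshot the page, query the VLM, and return JSON for the LLM."""
    png = capture_screenshot(url)
    state = parse_vlm_output(url, vlm(png))
    return json.dumps(asdict(state), indent=2)
```

Calling `describe_page("https://example.com", my_vlm)` would return a JSON string that can be placed directly into the LLM's context. Keeping the schema small and explicit makes malformed VLM replies fail early in `parse_vlm_output` instead of silently confusing the downstream model.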

Usage Tips

Copy the prompt and paste it into your preferred AI tool (for example Claude, ChatGPT, or Gemini).

Replace the open-ended parts of the prompt, such as the target web application and the chosen VLM, with your specific requirements and context.

For best results, provide clear examples and test different variations.