Challenge

Build an Advanced Visual Web Agent with OpenAI Agents SDK and GPT-5 Pro

Build a multi-agent system that interacts with web interfaces visually, the way a person uses a browser, without relying on direct HTML parsing. The OpenAI Agents SDK orchestrates the agent team so that agents can collaborate on complex tasks, and GPT-5 Pro handles high-level planning and decision-making. BrowserUse speeds up development of custom browser automation tools (e.g., built on Playwright or Selenium) that let agents perform precise visual interactions. Gentrace supplies the evaluation and observability pipelines used to monitor and refine agent performance, and Sarvam AI adds voice-activated commands for controlling the system. Target applications include subscriber analysis on a simulated Beehiiv-like platform and SEO optimization.
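
The observe, plan, act loop implied by this brief can be sketched without any real SDK. The example below is a minimal illustration under stated assumptions: `Click`, `TypeText`, `Done`, `FakeBrowser`, `scripted_planner`, and `run_episode` are hypothetical names, the planner replays a fixed script instead of calling GPT-5 Pro, and the browser is an in-memory stand-in rather than Playwright or Selenium. The point it shows is that the agent only ever sees a screenshot and only ever emits coordinate-level actions, so no HTML is parsed. The subscriber figure in the script is made up for the demo.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass(frozen=True)
class Click:
    x: int
    y: int


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class Done:
    answer: str


Action = Union[Click, TypeText, Done]


@dataclass
class FakeBrowser:
    """Hypothetical in-memory stand-in that records calls instead of driving a real browser."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> bytes:
        self.log.append("screenshot")
        return b"fake-png-bytes"

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


def scripted_planner(script: List[Action]) -> Callable[[bytes], Action]:
    """Replay a fixed script; a real planner would send the screenshot to a model."""
    steps = iter(script)

    def plan(screenshot: bytes) -> Action:
        return next(steps)

    return plan


def run_episode(browser: FakeBrowser, plan: Callable[[bytes], Action], max_steps: int = 10) -> str:
    """Loop: take a screenshot, ask the planner for one action, execute it."""
    for _ in range(max_steps):
        action = plan(browser.screenshot())
        if isinstance(action, Click):
            browser.click(action.x, action.y)
        elif isinstance(action, TypeText):
            browser.type_text(action.text)
        else:  # Done: the planner has produced its final answer
            return action.answer
    raise RuntimeError("step budget exhausted before the planner finished")


browser = FakeBrowser()
plan = scripted_planner([Click(120, 48), TypeText("subscribers"), Done("1,204 subscribers")])
result = run_episode(browser, plan)
```

Keeping the action vocabulary this small makes it straightforward to swap the fake browser for a real automation backend later, and the call log gives an observability hook that a tracing tool could consume.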

Agent Building · Hosted by Vera
Status: Always open
Difficulty: Advanced
Points: 500
Evaluation rubric

How submissions are scored

These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.

Max Score: 5
Dimensions: 5 scoring checks
Binary: 5 pass or fail dimensions
Ordinal: 0 scaled dimensions
Dimension 1: CorrectInformationExtraction

Verifies that the agent extracts correct numerical and textual information from the simulated web interface.

Type: binary
Weight: 1

Binary check: This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
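
A check of this kind can be sketched as a comparison between expected and extracted fields. The rubric does not state how the evaluator matches values, so everything here is an assumption: the function names `values_match` and `extraction_passes` are hypothetical, numbers are compared with a small relative tolerance, and text is compared after collapsing whitespace and ignoring case. The field names and values in the usage lines are invented for illustration.

```python
import math
from typing import Mapping, Union

Value = Union[int, float, str]


def values_match(expected: Value, actual: Value, rel_tol: float = 1e-6) -> bool:
    """Compare one field: numbers within a tolerance, text after normalization (assumed rules)."""
    if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
        return math.isclose(expected, actual, rel_tol=rel_tol)
    if isinstance(expected, str) and isinstance(actual, str):
        normalize = lambda s: " ".join(s.split()).casefold()
        return normalize(expected) == normalize(actual)
    return False  # mismatched types never match


def extraction_passes(expected: Mapping[str, Value], actual: Mapping[str, Value]) -> bool:
    """Binary outcome: every expected field must be present and match."""
    return all(
        key in actual and values_match(value, actual[key])
        for key, value in expected.items()
    )


expected = {"subscriber_count": 1204, "top_post": "Weekly Digest"}
good = extraction_passes(expected, {"subscriber_count": 1204.0, "top_post": "  weekly digest "})
bad = extraction_passes(expected, {"subscriber_count": 1205, "top_post": "Weekly Digest"})
```

Because the dimension is binary, a single wrong field fails the whole check, which is why the sketch uses `all` rather than a per-field score.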

Dimension 2: ToolExecutionSuccess

Checks that all required browser automation tools were invoked and completed their actions without errors.

Type: binary
Weight: 1

Dimension 3: VoiceCommandResponsiveness

Assesses whether the agent correctly interprets and responds to the specified voice commands.

Type: binary
Weight: 1

Dimension 4: TaskCompletionRate

Percentage of tasks successfully completed end-to-end. Target: 90; range: 0-100.

Type: binary
Weight: 1
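
The completion-rate arithmetic can be sketched as follows. The rubric gives a target of 90 on a 0-100 scale but does not say how the binary pass is decided, so treating "rate at or above the target" as a pass is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def completion_rate(outcomes: Sequence[bool]) -> float:
    """Percentage (0-100) of tasks that finished end-to-end."""
    if not outcomes:
        raise ValueError("at least one task outcome is required")
    return 100.0 * sum(outcomes) / len(outcomes)


def meets_completion_target(outcomes: Sequence[bool], target: float = 90.0) -> bool:
    """Assumed pass rule: the rate must reach the target."""
    return completion_rate(outcomes) >= target


nine_of_ten = [True] * 9 + [False]
eight_of_ten = [True] * 8 + [False] * 2
```

Under this assumed rule, a run of ten tasks can afford exactly one failure.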

Dimension 5: ExecutionTime

Average time taken to complete a task in seconds. Target: 60; range: 0-300.

Type: binary
Weight: 1
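
The timing check and the overall score can be sketched together. Two assumptions are made that the rubric does not confirm: a run passes ExecutionTime when its mean task duration is at or below the 60-second target, and the total score is the sum of the weights of the passed dimensions (which yields the stated maximum of 5 when all five pass). The function names and the sample durations are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def meets_time_target(durations_s: Sequence[float], target_s: float = 60.0) -> bool:
    """Assumed pass rule: average task duration must not exceed the target."""
    return mean(durations_s) <= target_s


def total_score(results: Dict[str, bool], weights: Dict[str, int]) -> int:
    """Binary dimensions: full weight when passed, nothing otherwise."""
    return sum(weights[name] for name, passed in results.items() if passed)


weights = {
    "CorrectInformationExtraction": 1,
    "ToolExecutionSuccess": 1,
    "VoiceCommandResponsiveness": 1,
    "TaskCompletionRate": 1,
    "ExecutionTime": 1,
}
results = {
    "CorrectInformationExtraction": True,
    "ToolExecutionSuccess": True,
    "VoiceCommandResponsiveness": False,  # one failed check in this made-up run
    "TaskCompletionRate": True,
    "ExecutionTime": meets_time_target([42.0, 75.5, 58.0]),  # mean is 58.5 s
}
score = total_score(results, weights)
```

A single slow task does not fail the dimension by itself under this rule, since only the average is compared with the target.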

Learning goals

What you should walk away with

  • Master the OpenAI Agents SDK for building and managing autonomous agents with function calling, tool use, and multi-turn conversational capabilities

  • Design and implement robust visual web interaction tools using modern browser automation frameworks (e.g., Playwright) to enable agents to navigate and interact with web UIs without direct HTML parsing

  • Integrate Portia AI for defining agent roles, managing their configurations, and overseeing their lifecycle within a multi-agent system

  • Leverage BrowserUse for accelerated development and refinement of custom tool code, enhancing agent capabilities for specific web automation tasks

  • Establish comprehensive evaluation and observability pipelines using Gentrace to monitor agent decision-making, tool execution, and overall task completion accuracy

  • Implement voice-activated command processing using Sarvam AI to provide a natural language interface for directing and monitoring the visual web agent system

  • Build extended reasoning pipelines with GPT-5 Pro to enable advanced planning, problem-solving, and adaptive strategy formulation for unforeseen web scenarios
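
For the voice-command goal above, the step that follows speech recognition can be sketched independently of any vendor. This example assumes a transcript string has already been produced by a speech-to-text service (the Sarvam AI API itself is not shown and no claim is made about its interface). The `Command` type, the intent names, and the keyword patterns are all hypothetical; a real system might instead let the planning model interpret the transcript.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    intent: str
    argument: Optional[str] = None


# Hypothetical intents, checked in order against the normalized transcript.
_PATTERNS = [
    ("open_page", re.compile(r"^(?:open|go to)\s+(?P<arg>.+)$")),
    ("count_subscribers", re.compile(r"^(?:how many|count)\s+subscribers\b.*$")),
    ("stop", re.compile(r"^(?:stop|cancel)$")),
]


def parse_command(transcript: str) -> Optional[Command]:
    """Map a transcript to a Command, or None when nothing matches."""
    # Lowercase, collapse whitespace, and drop trailing sentence punctuation.
    text = " ".join(transcript.lower().split()).rstrip(".?!")
    for intent, pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            return Command(intent, match.groupdict().get("arg"))
    return None


opened = parse_command("Open the subscribers page.")
counted = parse_command("How many subscribers do we have?")
unknown = parse_command("sing a song")
```

Returning `None` for unrecognized input gives the agent an explicit signal to ask the user to repeat or rephrase rather than guessing at an action.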

Start from your terminal
$ npx -y @versalist/cli start build-an-advanced-visual-web-agent-with-openai-agents-sdk-and-gpt-5-pro

[ok] Wrote CHALLENGE.md

[ok] Wrote .versalist.json

[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Host and timing

Host: Vera (AI Research & Mentorship)
Starts: Available now
Run mode: Evergreen challenge

Tool Space Recipe

Draft
Action Space
GPT-5 Pro: Models · Large Language Models
required
OpenAI: OpenAI AI model provider
Portia AI: Open source framework for predictable,
Evaluation
Rubric: 5 dimensions
· CorrectInformationExtraction (1%)
· ToolExecutionSuccess (1%)
· VoiceCommandResponsiveness (1%)
· TaskCompletionRate (1%)
· ExecutionTime (1%)
Gold items: 2 (2 public)
