Workflow Automation
Advanced
Always open

Secure Enterprise Financial Automation

Develop an autonomous agent system using the OpenAI Agents SDK to automate complex financial operations in an enterprise setting. The challenge is to build a multi-agent orchestration layer that interacts securely with financial data sources and enterprise APIs. The system must reliably execute tasks such as client vetting, transaction processing, or trade automation while maintaining strict compliance and auditability. Performance and reliability are evaluated using adaptive experimentation principles.
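Auditability can be made concrete with a tamper-evident trail of agent actions. A minimal sketch, assuming a hash-chained log is an acceptable way to meet the audit requirement: each entry stores the SHA-256 hash of the previous entry, so editing any past record breaks verification. The AuditLog class and the agent and action names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class AuditLog:
    """Append-only log in which each entry commits to the hash of the previous one."""
    entries: list = field(default_factory=list)

    def append(self, agent: str, action: str, payload: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        record = {"agent": agent, "action": action, "payload": payload, "prev": prev_hash}
        # Canonical JSON (sorted keys) so identical records always hash identically.
        digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.entries.append({**record, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash and check each link back to its predecessor."""
        prev_hash = "0" * 64
        for entry in self.entries:
            record = {k: entry[k] for k in ("agent", "action", "payload", "prev")}
            if entry["prev"] != prev_hash:
                return False
            recomputed = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
            if recomputed != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("VettingAgent", "kyc_check", {"client_id": "C-001", "result": "pass"})
log.append("TransactionAgent", "submit_payment", {"client_id": "C-001", "amount": 2500})
print(log.verify())  # True
log.entries[0]["payload"]["result"] = "fail"  # simulate tampering with a past record
print(log.verify())  # False
```

Because every hash depends on all earlier entries, an auditor only needs to trust the most recent hash to detect edits anywhere in the history.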


Datasets

Shared data for this challenge

Review public datasets and any private uploads tied to your build.

Evaluation rubric

How submissions are scored

These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.

Max score: 5
Dimensions: 5 scoring checks
Binary: 5 pass/fail dimensions
Ordinal: 0 scaled dimensions
Dimension 1: CorrectStatus

The output's 'status' matches the expected outcome.

Type: binary · Weight: 1

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.

Dimension 2: NoComplianceViolations

No compliance flags are raised for compliant inputs.

Type: binary · Weight: 1

Dimension 3: SuccessfulToolUse

The agent trace shows evidence of appropriate tool/API calls.

Type: binary · Weight: 1

Dimension 4: Accuracy

Percentage of tasks executed correctly. Target: 90 (range: 0-100).

Type: binary · Weight: 1

Dimension 5: Latency (ms)

Average time to complete a task, measured for the Groq-optimized components. Target: 500 ms (range: 0-5000 ms).

Type: binary · Weight: 1
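The rubric does not say how per-task outcomes roll up into each pass/fail check, so a local pre-check has to pick a rule. This sketch assumes CorrectStatus, NoComplianceViolations, and SuccessfulToolUse must hold for every task; Accuracy passes at or above the 90 target; and Latency passes when the average over all tasks (not only the Groq-optimized components) is at or below 500 ms. TaskResult, its fields, and score_run are hypothetical names, not part of the challenge's evaluator.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class TaskResult:
    # Hypothetical record for one evaluated task; field names are assumptions.
    expected_status: str
    status: str
    compliant_input: bool
    compliance_flags: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    correct: bool = False
    latency_ms: float = 0.0


def score_run(results, accuracy_target=90.0, latency_target_ms=500.0):
    """Return 1/0 per rubric dimension plus the total (max 5)."""
    if not results:
        raise ValueError("no task results to score")
    accuracy = 100.0 * sum(r.correct for r in results) / len(results)
    avg_latency = mean(r.latency_ms for r in results)
    checks = {
        "CorrectStatus": all(r.status == r.expected_status for r in results),
        # Only compliant inputs count: flags on non-compliant inputs are expected.
        "NoComplianceViolations": all(not r.compliance_flags for r in results if r.compliant_input),
        # Presence check only: at least one tool call recorded per task.
        # Judging whether each call was *appropriate* is not attempted here.
        "SuccessfulToolUse": all(len(r.tool_calls) > 0 for r in results),
        "Accuracy": accuracy >= accuracy_target,
        "Latency (ms)": avg_latency <= latency_target_ms,
    }
    scores = {name: int(ok) for name, ok in checks.items()}
    scores["total"] = sum(scores.values())
    return scores


run = [
    TaskResult("approved", "approved", True, [], ["kyc_lookup"], True, 320.0),
    TaskResult("rejected", "rejected", False, ["sanctions_hit"], ["sanctions_screen"], True, 410.0),
]
print(score_run(run))  # every check passes, total 5
```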

Learning goals

What you should walk away with

Master the OpenAI Agents SDK for building complex, stateful agentic workflows, including function calling and tool use.
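Function calling works by letting the model emit a tool name plus JSON-encoded arguments, which the runtime then executes and feeds back. In the OpenAI Agents SDK, the function_tool decorator registers a Python function as a tool and the SDK's runner handles dispatch; the standard-library sketch below only illustrates that mechanism. The tool-call shape ({"name": ..., "arguments": ...}), the TOOLS registry, and get_account_balance are hypothetical.

```python
import json
from typing import Callable

TOOLS: dict[str, Callable] = {}


def tool(fn: Callable) -> Callable:
    """Register a function under its own name so a tool call can be dispatched to it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_account_balance(account_id: str) -> dict:
    # Hypothetical stand-in for an enterprise ledger API.
    balances = {"ACC-1": 12500.0, "ACC-2": 310.75}
    return {"account_id": account_id, "balance": balances.get(account_id)}


def dispatch(tool_call: dict) -> str:
    """Execute a call shaped like {"name": ..., "arguments": "<json>"} and return JSON."""
    fn = TOOLS.get(tool_call["name"])
    if fn is None:
        # Unknown tools produce an error result the model can read, not a crash.
        return json.dumps({"error": f"unknown tool {tool_call['name']!r}"})
    args = json.loads(tool_call["arguments"])
    return json.dumps(fn(**args))


print(dispatch({"name": "get_account_balance", "arguments": '{"account_id": "ACC-1"}'}))
# {"account_id": "ACC-1", "balance": 12500.0}
```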

Implement secure authentication and authorization mechanisms for agent access to sensitive financial enterprise APIs.
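One way to scope an agent's access is to issue each agent a short-lived, signed token listing the operations it may perform, then check both signature and scope on every API call. The sketch below does this with HMAC-SHA256 from the standard library. The token format, SECRET, and scope names are hypothetical; a production system would normally rely on an established standard such as OAuth 2.0, vetted libraries, and a managed secret store.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical key for illustration; load real keys from a secrets manager, never source code.
SECRET = b"replace-with-a-managed-secret"


def issue_token(agent_id: str, scopes: list[str], ttl_s: int = 300) -> str:
    """Sign the agent's identity, allowed scopes, and expiry time."""
    claims = {"sub": agent_id, "scopes": scopes, "exp": int(time.time()) + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def authorize(token: str, required_scope: str) -> bool:
    """Accept only unexpired, untampered tokens that carry the required scope."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims["exp"] > time.time() and required_scope in claims["scopes"]


token = issue_token("TransactionAgent", ["payments:read"])
print(authorize(token, "payments:read"))   # True
print(authorize(token, "payments:write"))  # False
```

Keeping scopes narrow (read versus write, per API) means a compromised or misbehaving agent can only perform the operations it was explicitly granted.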

Design a robust evaluation harness using Ax (Adaptive Experimentation) to continuously test and optimize agent reliability and accuracy in various financial scenarios.
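Ax drives the experiment by proposing parameter settings (trials) and learning from the metrics each trial reports; the part you write yourself is the evaluation function. A minimal sketch, assuming each gold example is a dict with input and expected_status keys (the actual layout of eval/examples.json is not shown here) and that the agent under test is a plain callable. evaluate, toy_agent, and risk_threshold are hypothetical. Metrics are returned as (mean, standard error) pairs, the form Ax's Service API has accepted for trial data; confirm this against the Ax version you install.

```python
import math
import time
from statistics import mean, stdev
from typing import Callable


def evaluate(params: dict, agent: Callable[[dict, dict], str], examples: list) -> dict:
    """Run one trial: score the agent under params on every example.

    Returns {metric_name: (mean, standard_error)} for accuracy (%) and latency (ms).
    """
    hits, latencies = [], []
    for ex in examples:
        start = time.perf_counter()
        status = agent(params, ex["input"])
        latencies.append((time.perf_counter() - start) * 1000.0)
        hits.append(100.0 if status == ex["expected_status"] else 0.0)

    def summarize(xs: list) -> tuple:
        # Standard error of the mean; zero when there is only one observation.
        sem = stdev(xs) / math.sqrt(len(xs)) if len(xs) > 1 else 0.0
        return (mean(xs), sem)

    return {"accuracy": summarize(hits), "latency_ms": summarize(latencies)}


def toy_agent(params: dict, inp: dict) -> str:
    # Hypothetical policy: approve when the risk score is below a tunable threshold.
    return "approved" if inp["risk"] < params["risk_threshold"] else "rejected"


examples = [
    {"input": {"risk": 0.2}, "expected_status": "approved"},
    {"input": {"risk": 0.9}, "expected_status": "rejected"},
]
print(evaluate({"risk_threshold": 0.5}, toy_agent, examples)["accuracy"])  # (100.0, 0.0)
```

In an Ax loop, each trial's proposed parameters (here only risk_threshold) would be passed to evaluate, and the returned dict reported back as that trial's result.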

Integrate Groq Cloud to accelerate inference for specific, high-throughput agent tools or sub-agents requiring low-latency responses, potentially for real-time market data analysis.
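For latency-sensitive tools, one option is to give the fast path a hard time budget and fall back when it overruns or fails. A minimal sketch: quick_model, slow_model, and fallback_model are hypothetical stand-ins for, say, a Groq-hosted model and a slower default model; no Groq client calls are shown. The 500 ms default mirrors the rubric's latency target.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable

# Shared pool so a timed-out call does not block the caller while it finishes.
_POOL = ThreadPoolExecutor(max_workers=4)


def call_with_budget(fast: Callable[[str], str], fallback: Callable[[str], str],
                     prompt: str, budget_ms: float = 500.0) -> tuple[str, str]:
    """Try the low-latency path first; return (answer, path_used)."""
    future = _POOL.submit(fast, prompt)
    try:
        return future.result(timeout=budget_ms / 1000.0), "fast"
    except FutureTimeout:
        # The fast call overran its budget; its late result (if any) is discarded.
        future.cancel()
        return fallback(prompt), "fallback"
    except Exception:
        # The fast path raised; degrade to the fallback rather than failing the task.
        return fallback(prompt), "fallback"


def quick_model(prompt: str) -> str:
    return f"fast:{prompt}"


def slow_model(prompt: str) -> str:
    time.sleep(0.3)
    return f"slow:{prompt}"


def fallback_model(prompt: str) -> str:
    return f"fallback:{prompt}"


print(call_with_budget(quick_model, fallback_model, "EURUSD spread"))
print(call_with_budget(slow_model, fallback_model, "EURUSD spread", budget_ms=50))
```

Recording which path served each request also gives the evaluation harness a direct way to measure how often the low-latency path meets its budget.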

Build a persistent knowledge base for compliance documents and client histories using Weaviate, enabling advanced RAG capabilities for agents to make informed decisions.
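Weaviate would store embeddings of compliance documents and client histories and answer nearest-neighbour queries. The retrieve-then-prompt shape of RAG can be sketched without it: the in-memory KnowledgeBase below ranks documents by bag-of-words cosine similarity, a deliberately simple substitute for vector search. The class, its methods, and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Lowercased token counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KnowledgeBase:
    """In-memory stand-in for a vector store: add documents, retrieve the top-k matches."""

    def __init__(self) -> None:
        self.docs: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.docs.append((text, vectorize(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = vectorize(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(kb: KnowledgeBase, question: str) -> str:
    """Ground the agent's answer in retrieved context only."""
    context = "\n".join(f"- {doc}" for doc in kb.search(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical sample documents.
kb = KnowledgeBase()
kb.add("Transfers above the internal limit require a second approver.")
kb.add("Client C-001 passed KYC review in the last audit cycle.")
kb.add("Office hours are 9 to 5 on weekdays.")
print(kb.search("Does this transfer require a second approver?", k=1))
# ['Transfers above the internal limit require a second approver.']
```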

Orchestrate collaborative agent teams (e.g., a 'Vetting Agent' working with a 'Transaction Agent') using the OpenAI Agents SDK's support for structured communication and task hand-off.
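In the Agents SDK, an agent declares the agents it may hand off to through its handoffs parameter. The control flow that hand-off should enforce can be sketched in plain Python: the vetting step writes a structured verdict, and the transaction step refuses to act unless that verdict is an explicit approval. Handoff, BLOCKED_CLIENTS, and the 50,000 limit are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Handoff:
    """Structured message passed from one agent to the next."""
    client_id: str
    amount: float
    vetting_status: Optional[str] = None
    notes: list = field(default_factory=list)


# Hypothetical blocklist and per-transaction limit, for illustration only.
BLOCKED_CLIENTS = {"C-666"}
LIMIT = 50_000.0


def vetting_agent(msg: Handoff) -> Handoff:
    msg.vetting_status = "rejected" if msg.client_id in BLOCKED_CLIENTS else "approved"
    msg.notes.append(f"vetting: {msg.vetting_status}")
    return msg


def transaction_agent(msg: Handoff) -> str:
    # Refuse to act on anything the vetting step did not explicitly approve.
    if msg.vetting_status != "approved":
        return "blocked"
    if msg.amount > LIMIT:
        msg.notes.append("transaction: escalated for manual review")
        return "escalated"
    msg.notes.append("transaction: executed")
    return "executed"


def run_pipeline(client_id: str, amount: float) -> str:
    return transaction_agent(vetting_agent(Handoff(client_id, amount)))


print(run_pipeline("C-001", 2_500.0))   # executed
print(run_pipeline("C-666", 100.0))     # blocked
print(run_pipeline("C-001", 75_000.0))  # escalated
```

Passing a typed record rather than free text makes the hand-off checkable: a missing or unexpected vetting_status fails closed instead of being interpreted by the next agent.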

Develop error handling and recovery strategies for agents operating in a high-stakes financial environment.
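A common recovery pattern for calls to payment or trading APIs is bounded retries with exponential backoff and jitter, reusing one idempotency key across attempts so a retried request cannot execute twice. This sketch assumes the downstream API deduplicates on that key and that transient failures surface as the hypothetical TransientError; any other exception propagates immediately so a human or supervising agent can handle it.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Failures worth retrying (timeouts, rate limits); anything else is raised at once."""


def with_retries(operation, max_attempts: int = 4, base_delay_s: float = 0.1, max_delay_s: float = 2.0):
    """Call operation(idempotency_key) with exponential backoff and full jitter."""
    # One key for every attempt, so downstream systems see retries as the same request.
    key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(key)
        except TransientError:
            if attempt == max_attempts:
                raise
            # Sleep a random time up to the capped exponential delay for this attempt.
            time.sleep(random.uniform(0, min(max_delay_s, base_delay_s * 2 ** (attempt - 1))))


calls = []


def flaky_submit(key: str) -> dict:
    # Hypothetical gateway that times out twice before succeeding.
    calls.append(key)
    if len(calls) < 3:
        raise TransientError("gateway timeout")
    return {"status": "submitted", "idempotency_key": key}


result = with_retries(flaky_submit, base_delay_s=0.01)
print(result["status"], len(calls), len(set(calls)))  # submitted 3 1
```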

Start from your terminal
$ npx -y @versalist/cli start secure-enterprise-financial-automation
[ok] Wrote CHALLENGE.md
[ok] Wrote .versalist.json
[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Challenge at a glance
Host: Vera (AI Research & Mentorship)
Start: Available now
Run mode: Evergreen challenge
Evaluation
Rubric: 5 dimensions (CorrectStatus, NoComplianceViolations, SuccessfulToolUse, Accuracy, Latency (ms)), each with weight 1
Gold items: 2 (2 public)

Frequently Asked Questions about Secure Enterprise Financial Automation