Secure Enterprise Financial Automation
Develop an autonomous agent system using the OpenAI Agents SDK to automate complex financial operations within an enterprise setting. This challenge requires building a multi-agent orchestration layer that interacts securely with various financial data sources and enterprise APIs. The system must reliably execute tasks such as client vetting, transaction processing, or trade automation, while ensuring strict compliance and auditability. Performance and reliability will be evaluated using adaptive experimentation principles.
What you are building
The core problem, expected build, and operating context for this challenge.
Shared data for this challenge
Review public datasets and any private uploads tied to your build.
How submissions are scored
These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.
CorrectStatus
Output 'status' matches expected outcome.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
NoComplianceViolations
No compliance flags raised for compliant inputs.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
SuccessfulToolUse
Evidence of appropriate tool/API calls in agent trace.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
Accuracy
Percentage of tasks executed correctly. Target: 90. Range: 0-100.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
Latency (ms)
Average time taken to complete a task (for Groq-optimized parts). Target: 500. Range: 0-5000.
This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
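The all-or-nothing rule above can be checked locally before submitting. The following is a minimal sketch, not the evaluator's actual implementation. The `RunResult` fields, the pass thresholds (accuracy at or above the target, latency at or below it), and the weights are all assumptions, since the page does not state the weights or the exact comparison used.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Targets are taken from the rubric; how they are compared is an assumption.
ACCURACY_TARGET = 90.0     # percent, assumed to pass when accuracy >= target
LATENCY_TARGET_MS = 500.0  # milliseconds, assumed to pass when latency <= target


@dataclass
class RunResult:
    """Hypothetical summary of one evaluated run."""
    status: str
    expected_status: str
    input_is_compliant: bool
    compliance_flags: list[str] = field(default_factory=list)
    tool_calls: list[str] = field(default_factory=list)
    accuracy_pct: float = 0.0
    avg_latency_ms: float = 0.0


def check_dimensions(run: RunResult) -> dict[str, bool]:
    """Return a pass/fail verdict per rubric dimension (no partial credit)."""
    return {
        "CorrectStatus": run.status == run.expected_status,
        # Only flags raised on compliant inputs count as violations.
        "NoComplianceViolations": not (run.input_is_compliant and run.compliance_flags),
        "SuccessfulToolUse": len(run.tool_calls) > 0,
        "Accuracy": run.accuracy_pct >= ACCURACY_TARGET,
        "Latency (ms)": run.avg_latency_ms <= LATENCY_TARGET_MS,
    }


def total_score(passed: dict[str, bool], weights: dict[str, float]) -> float:
    """Sum the full weight of every passed dimension; failed ones add nothing."""
    return sum(weights[name] for name, ok in passed.items() if ok)
```

For example, with placeholder weights of 20 per dimension, a run that meets every requirement except the latency target would score 80 under this sketch.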
What you should walk away with
Master the OpenAI Agents SDK for building complex, stateful agentic workflows including function calling and tool use.
Implement secure authentication and authorization mechanisms for agent access to sensitive financial enterprise APIs.
Design a robust evaluation harness using Ax (Adaptive Experimentation) to continuously test and optimize agent reliability and accuracy in various financial scenarios.
Integrate Groq Cloud to accelerate inference for specific, high-throughput agent tools or sub-agents requiring low-latency responses, potentially for real-time market data analysis.
Build a persistent knowledge base for compliance documents and client histories using Weaviate, enabling advanced RAG capabilities for agents to make informed decisions.
Orchestrate collaborative agent teams (e.g., a 'Vetting Agent' working with a 'Transaction Agent') using the OpenAI Agents SDK's capabilities for structured communication and task hand-off.
Develop error handling and recovery strategies for agents operating in a high-stakes financial environment.
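As one illustration of the error handling and auditability goals above, here is a minimal, framework-independent sketch of a retry wrapper that records every attempt in an audit log. It uses only the standard library and does not call the OpenAI Agents SDK. `TransientError`, `ComplianceError`, `run_with_retry`, and the log entry fields are hypothetical names chosen for this example. It assumes that transient failures (such as timeouts) are safe to retry, while compliance failures must stop immediately.

```python
import time
from typing import Any, Callable, Dict, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


class ComplianceError(Exception):
    """Hypothetical error type for failures that must never be retried."""


def run_with_retry(
    step: Callable[[], T],
    *,
    name: str,
    audit_log: List[Dict[str, Any]],
    max_attempts: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `step`, retrying transient errors with exponential backoff.

    Every attempt appends one entry to `audit_log`, so the full history
    of the step can be reviewed afterwards.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            result = step()
        except ComplianceError as exc:
            # Compliance failures are recorded and re-raised without retrying.
            audit_log.append({"step": name, "attempt": attempt,
                              "outcome": "blocked", "detail": str(exc)})
            raise
        except TransientError as exc:
            final = attempt == max_attempts
            audit_log.append({"step": name, "attempt": attempt,
                              "outcome": "failed" if final else "retry",
                              "detail": str(exc)})
            if final:
                raise
            # Backoff doubles each time: base, 2x base, 4x base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
        else:
            audit_log.append({"step": name, "attempt": attempt,
                              "outcome": "ok", "detail": ""})
            return result
    raise AssertionError("unreachable")
```

A tool function exposed to an agent could be wrapped this way, for instance `run_with_retry(lambda: post_transaction(tx), name="post_transaction", audit_log=log)`, where `post_transaction` stands in for whatever enterprise API call the agent makes. Passing `sleep` as a parameter lets tests run without real delays.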