Agent Building
Advanced
Always open

Autonomous Crypto Compliance Agent

This challenge asks you to build an advanced autonomous agent for financial compliance in the cryptocurrency domain or in complex supply chain networks. Using the OpenAI Agents SDK, you will create a system that monitors transactions in real time and identifies suspicious patterns indicative of illicit activity or regulatory breaches. The agent relies on tool use and function calling to interact with external data sources and analytical frameworks.

The core of the system is GPT-5-2, which handles reasoning and orchestrates the analytical tasks. The agent integrates Darts, a time-series forecasting library, to detect anomalies in transaction volumes or patterns over time. Long-term memory and regulatory context are managed by a vector database (e.g., Pinecone) that stores a knowledge base of financial regulations and compliance policies. The agent also calls a simulated Enterprise Transaction API to fetch real-time data.

The goal is an agent that not only identifies potential compliance issues but also produces detailed reports, evidence, and recommendations for further investigation, demonstrating a proactive approach to financial crime detection.
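To make the anomaly-detection step concrete, here is a minimal sketch using only the Python standard library. It is not the Darts API: a trailing-window z-score stands in for a fitted time-series model, and the function name, window size, threshold, and sample volumes are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_volume_anomalies(volumes, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` sample standard deviations.

    Stand-in for a real forecasting model: the "forecast" is simply the
    mean of the previous `window` observations.
    """
    flagged = []
    for i in range(window, len(volumes)):
        history = volumes[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all is treated as anomalous.
            if volumes[i] != mu:
                flagged.append(i)
            continue
        if abs(volumes[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical hourly transaction volumes with one obvious spike.
hourly_volumes = [100, 102, 98, 101, 99, 103, 100, 950, 101, 97]
anomalous_hours = flag_volume_anomalies(hourly_volumes)
print(anomalous_hours)
```

Note that once the spike enters the trailing window it inflates the standard deviation, so later points are harder to flag. A production detector would usually exclude or down-weight already flagged points, which is one reason to prefer a proper model over this sketch.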


Evaluation rubric

How submissions are scored

These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.

Max Score: 4
Dimensions: 4 scoring checks
Binary: 4 pass or fail dimensions
Ordinal: 0 scaled dimensions

Dimension 1: AccuracyOfSuspiciousDetection

Agent must correctly identify at least 90% of predefined suspicious transaction patterns.

Type: binary
Weight: 1

Binary check: This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.
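A rough way to self-check this dimension before submitting is to compute the share of predefined patterns your agent caught. The sketch below assumes each pattern has a string identifier and that the evaluator compares sets of identifiers. Both are assumptions, as are the function and variable names.

```python
def detection_rate(expected_patterns, detected_patterns):
    """Fraction of predefined suspicious patterns that were detected."""
    expected = set(expected_patterns)
    if not expected:
        raise ValueError("expected_patterns must not be empty")
    return len(expected & set(detected_patterns)) / len(expected)


# Hypothetical identifiers: ten planted patterns, nine found, one extra flag.
planted = {f"pattern-{i:02d}" for i in range(10)}
found = {f"pattern-{i:02d}" for i in range(9)} | {"pattern-99"}

rate = detection_rate(planted, found)
passes_dimension = rate >= 0.90  # threshold taken from the rubric text
print(rate, passes_dimension)
```

Extra detections such as `pattern-99` do not lower this rate. They count against precision instead.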

Dimension 2: ReportCompleteness

Generated reports must include a summary, flagged transactions, and recommendations for at least 95% of cases.

Type: binary (pass or fail, no partial credit)
Weight: 1
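The completeness requirement can be checked mechanically if reports are structured. The following sketch assumes each report is a dictionary with `summary`, `flagged_transactions`, and `recommendations` keys. The rubric names the three sections but not a schema, so the key names and the "non-empty" rule are assumptions.

```python
REQUIRED_SECTIONS = ("summary", "flagged_transactions", "recommendations")


def is_complete(report):
    """A report is complete when every required section is present and non-empty."""
    return all(report.get(section) for section in REQUIRED_SECTIONS)


def completeness_rate(reports):
    """Fraction of reports that contain all required sections."""
    if not reports:
        raise ValueError("reports must not be empty")
    return sum(is_complete(r) for r in reports) / len(reports)


# Hypothetical reports: 19 complete, 1 missing recommendations.
complete_report = {
    "summary": "Burst of transfers just under the reporting threshold.",
    "flagged_transactions": ["tx-001", "tx-002"],
    "recommendations": ["Escalate for manual review."],
}
incomplete_report = {
    "summary": "Unusual counterparty.",
    "flagged_transactions": ["tx-003"],
    "recommendations": [],
}
reports = [complete_report] * 19 + [incomplete_report]

report_rate = completeness_rate(reports)
passes_completeness = report_rate >= 0.95  # threshold taken from the rubric text
print(report_rate, passes_completeness)
```

Under this rule a report with an empty `flagged_transactions` list counts as incomplete. If clean cases are expected to produce reports too, the check would need an explicit "no transactions flagged" marker instead.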

Dimension 3: Anomaly Detection Precision (anomaly_detection_precision)

Precision score for identifying true anomalies among all flagged transactions.
Target: 0.85
Range: 0-1

Type: binary (pass or fail, no partial credit)
Weight: 1
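Precision here is the usual ratio of true positives to everything flagged, TP / (TP + FP). The sketch below computes it from two sets of transaction identifiers. Treating the target as a minimum (pass when precision is at least 0.85) and returning 0.0 when nothing is flagged are assumptions, since the rubric does not spell out either case.

```python
def precision(flagged, true_anomalies):
    """TP / (TP + FP): share of flagged transactions that are real anomalies."""
    flagged = set(flagged)
    if not flagged:
        return 0.0  # assumption: no flags means no credit
    true_positives = flagged & set(true_anomalies)
    return len(true_positives) / len(flagged)


# Hypothetical run: 20 transactions flagged, 17 of them are labelled anomalies.
flagged_ids = {f"tx-{i:03d}" for i in range(20)}
labelled_anomalies = {f"tx-{i:03d}" for i in range(17)} | {"tx-900"}

precision_score = precision(flagged_ids, labelled_anomalies)
meets_precision_target = precision_score >= 0.85
print(precision_score, meets_precision_target)
```

The missed anomaly `tx-900` does not affect precision. It would lower the detection rate instead, so the two dimensions pull in opposite directions when you tune a flagging threshold.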

Dimension 4: Tool Call Efficiency (tool_call_efficiency)

Average number of tool calls per detected suspicious event, aiming for minimal yet effective calls.
Target: 2.5
Range: 1-5

Type: binary (pass or fail, no partial credit)
Weight: 1
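To track this metric locally, count the tool calls attributed to each detected event and average them. The sketch below assumes the check passes when the average is at or below the 2.5 target. The rubric gives only a target and a range, so that pass rule is a guess, and the names are hypothetical.

```python
def average_tool_calls(calls_per_event):
    """Mean number of tool calls across detected suspicious events."""
    if not calls_per_event:
        raise ValueError("calls_per_event must not be empty")
    return sum(calls_per_event) / len(calls_per_event)


# Hypothetical log: tool calls the agent made for each of four detected events.
calls_per_event = [2, 3, 2, 3]

avg_calls = average_tool_calls(calls_per_event)
# Assumed pass rule: at or below the target, and inside the stated 1-5 range.
meets_efficiency_target = 1 <= avg_calls <= 2.5
print(avg_calls, meets_efficiency_target)
```

One practical way to lower the average is to have a single tool return both the transactions and their precomputed anomaly scores, rather than fetching and scoring in separate calls.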

Learning goals

What you should walk away with

Master the OpenAI Agents SDK for constructing autonomous, tool-enhanced agents capable of complex decision-making and interaction.

Implement advanced function calling within the agent's workflow to integrate with external systems, including Darts for time-series analysis.

Utilize Darts for building sophisticated time-series models to detect anomalies in cryptocurrency transaction data, identifying potential illicit activities.

Design and manage a long-term memory system using Pinecone vector database to store and retrieve relevant financial regulations and compliance policies for dynamic context.

Orchestrate GPT-5-2's reasoning capabilities to interpret complex regulatory text, evaluate transaction data, and generate compliance reports with actionable insights.

Build a robust data ingestion pipeline to feed real-time transaction data from a simulated Enterprise Transaction API into the agent for continuous monitoring.

Develop strategies for evaluating agent performance in detecting fraud patterns and minimizing false positives in a dynamic financial environment.
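As an illustration of the long-term memory goal, the sketch below mimics the upsert-and-query pattern of a vector database with an in-memory class. It does not use the Pinecone client: the `PolicyMemory` class, the bag-of-words "embedding", and the two policy snippets are all made up for this example and are not real regulations.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase token counts (a real system would use a model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PolicyMemory:
    """In-memory stand-in for a vector index of compliance policies."""

    def __init__(self):
        self._items = []  # list of (policy_id, text, vector)

    def upsert(self, policy_id, text):
        # Replace any existing entry with the same id, then store the new one.
        self._items = [item for item in self._items if item[0] != policy_id]
        self._items.append((policy_id, text, embed(text)))

    def query(self, text, top_k=1):
        # Rank stored policies by similarity to the query text.
        q = embed(text)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[2]), reverse=True)
        return [(policy_id, body) for policy_id, body, _ in ranked[:top_k]]


memory = PolicyMemory()
memory.upsert(
    "policy-structuring",
    "repeated transfers just below a reporting threshold may indicate structuring",
)
memory.upsert(
    "policy-sanctions",
    "transfers to wallets on a sanctions list must be blocked and reported",
)

matches = memory.query("many transfers slightly below the reporting threshold", top_k=1)
print(matches[0][0])
```

The retrieved policy text can then be placed in the model's context alongside the flagged transactions, so the report cites the rule that motivated each flag.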

Start from your terminal
$ npx -y @versalist/cli start autonomous-crypto-compliance-agent

[ok] Wrote CHALLENGE.md

[ok] Wrote .versalist.json

[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Challenge at a glance

Host: Vera (AI Research & Mentorship)
Starts: Available now
Run mode: Evergreen challenge

Tool Space Recipe (Draft)

Evaluation
Rubric: 4 dimensions
· AccuracyOfSuspiciousDetection (weight: 1)
· ReportCompleteness (weight: 1)
· Anomaly Detection Precision (weight: 1)
· Tool Call Efficiency (weight: 1)
Gold items: 2 (2 public)
