Agent Building · Advanced · Always open

Multi-Agent System for Automated Audit Evidence Collection


Challenge brief

What you are building

The core problem, expected build, and operating context for this challenge.

Develop a sophisticated multi-agent system using Microsoft's AutoGen framework to automate the collection and initial analysis of financial audit evidence. This challenge focuses on creating a team of specialized AI agents that can autonomously navigate public financial documents, extract relevant data, reconcile inconsistencies, and present findings in a structured format. The system should mimic the workflow of junior auditors, but with AI-driven efficiency and consistency, leveraging advanced LLM capabilities for reasoning and information synthesis. The final output should be a summary report highlighting key extracted data points and any identified discrepancies, preparing the ground for human oversight. This project will involve designing conversational agent roles, defining their communication protocols within AutoGen, and integrating external tools for data access and long-term memory. It emphasizes practical application in a business context, showcasing how generative AI can streamline complex, data-intensive tasks in financial services.
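The summary report described above could be represented as a simple structured object before serialization; a minimal sketch (the field names here are illustrative assumptions, not part of the challenge spec):

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class Discrepancy:
    description: str
    source_a: str
    source_b: str
    severity: str  # e.g. "low", "medium", "high"

@dataclass
class AuditReport:
    company: str
    fiscal_year: int
    revenue: float
    discrepancies: list = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize for downstream schema validation and human review
        return json.dumps(asdict(self), indent=2)

report = AuditReport("ExampleCorp", 2023, 1_250_000.0)
report.discrepancies.append(
    Discrepancy("Revenue differs between 10-K and press release",
                "10-K filing", "Q4 press release", "medium"))
print(report.to_json())
```

Keeping the report as a dataclass until the last step makes the schema explicit in code and keeps the JSON output deterministic.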

Datasets

Shared data for this challenge

Review public datasets and any private uploads tied to your build.

Evaluation rubric

How submissions are scored

These dimensions define what the evaluator checks, how much each dimension matters, and which criteria separate a passable run from a strong one.

Max score: 4
Dimensions: 4 scoring checks
Binary: 4 pass-or-fail dimensions
Ordinal: 0 scaled dimensions
Dimension 1: SchemaValidation

Output JSON adheres to the specified schema.

Binary · Weight: 1
Binary check

This dimension contributes its full weight only when the submission satisfies the requirement. Partial credit is not awarded.

Dimension 2: RevenueExtractionAccuracy

Extracted revenue is within 5% of the actual value.

Binary · Weight: 1

Dimension 3: DiscrepancyDetectionRate

Share of actual discrepancies correctly identified; passing requires at least 0.8 on a 0–1 scale.

Binary · Weight: 1

Dimension 4: ResponseTime

Time to generate the full report, in seconds; passing requires meeting the 90-second target (accepted range: 30–300 seconds).

Binary · Weight: 1
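Taken together, the four binary checks could be scored along these lines. This is a sketch only: the thresholds come from the rubric above, but the helper names and the evaluator's exact mechanics are assumptions.

```python
import json

def check_schema(output_json: str, required_keys: set) -> bool:
    # SchemaValidation: output parses as JSON and contains the required keys
    try:
        data = json.loads(output_json)
    except json.JSONDecodeError:
        return False
    return required_keys <= data.keys()

def check_revenue(extracted: float, actual: float, tol: float = 0.05) -> bool:
    # RevenueExtractionAccuracy: within 5% of the actual value
    return abs(extracted - actual) <= tol * abs(actual)

def check_detection_rate(found: set, actual: set, target: float = 0.8) -> bool:
    # DiscrepancyDetectionRate: at least 80% of real discrepancies identified
    return bool(actual) and len(found & actual) / len(actual) >= target

def check_response_time(seconds: float, target: float = 90.0) -> bool:
    # ResponseTime: full report generated within the 90-second target
    return seconds <= target

score = sum([
    check_schema('{"revenue": 100.0, "discrepancies": []}',
                 {"revenue", "discrepancies"}),
    check_revenue(extracted=104.0, actual=100.0),
    check_detection_rate({"d1", "d2", "d3", "d4"},
                         {"d1", "d2", "d3", "d4", "d5"}),
    check_response_time(75.0),
])
print(score)  # max score is 4
```

Because every dimension is binary with weight 1, the total is simply the count of passed checks.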

Learning goals

What you should walk away with

Master AutoGen for defining conversational agent roles, skill sets, and inter-agent communication protocols to simulate an audit team.
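AutoGen's concrete API varies between versions, so here is a framework-agnostic sketch of the role and turn-taking pattern an AutoGen team encodes; the class and method names are illustrative only, and the LLM call is stubbed out:

```python
class Agent:
    def __init__(self, name: str, system_prompt: str):
        self.name = name
        self.system_prompt = system_prompt

    def reply(self, message: str) -> str:
        # A real agent would call an LLM with self.system_prompt as
        # context; here we echo a stub so the flow is visible.
        return f"[{self.name}] handled: {message}"

class RoundRobinChat:
    """Minimal stand-in for a group-chat manager: agents speak in turn."""
    def __init__(self, agents):
        self.agents = agents
        self.transcript = []

    def run(self, task: str, rounds: int = 1):
        message = task
        for _ in range(rounds):
            for agent in self.agents:
                message = agent.reply(message)
                self.transcript.append(message)
        return self.transcript

team = RoundRobinChat([
    Agent("Collector", "You gather financial filings."),
    Agent("Analyst", "You extract and reconcile figures."),
    Agent("Reviewer", "You flag inconsistencies for humans."),
])
log = team.run("Audit ExampleCorp FY2023 revenue")
print(len(log))  # 3 turns, one per agent
```

The point of the pattern is that each role sees the previous role's output, which is what AutoGen's conversational orchestration automates for you.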

Implement data acquisition agents using Bright Data's web scraping APIs to gather financial reports and public disclosures from designated sources.
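Whatever scraping API supplies the raw pages, the extraction step that follows can be sketched with the standard library; the HTML below is a made-up stand-in for a fetched filing page, not real Bright Data output:

```python
from html.parser import HTMLParser

class FigureExtractor(HTMLParser):
    """Collects text from <td> cells so tabular figures can be scanned."""
    def __init__(self):
        super().__init__()
        self._in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self.cells.append(data.strip())

# Stand-in for HTML returned by a scraping API call
html = "<table><tr><td>Total revenue</td><td>$1,250,000</td></tr></table>"
parser = FigureExtractor()
parser.feed(html)
print(parser.cells)  # ['Total revenue', '$1,250,000']
```

In practice you would feed the scraper's response body into the parser and pass the collected cells to the analysis agent.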

Design an analysis agent that leverages Mistral Large's advanced reasoning capabilities to extract, categorize, and reconcile financial data points.
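The model call itself depends on your client library, but the prompt-and-parse pattern around it can be sketched; the `reply` string below stands in for what the LLM would return, and the prompt wording is an assumption:

```python
import json
import re

EXTRACTION_PROMPT = (
    "Extract total revenue and net income from the filing excerpt below. "
    'Respond with JSON only: {"revenue": <number>, "net_income": <number>}\n\n'
)

def parse_model_reply(reply: str) -> dict:
    # Models sometimes wrap JSON in prose or code fences; pull out the
    # first JSON object rather than assuming a clean response.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        raise ValueError("no JSON object in model reply")
    return json.loads(match.group(0))

# Stand-in for what an LLM call would return for a filing excerpt
reply = ('Here are the figures:\n```json\n'
         '{"revenue": 1250000, "net_income": 87000}\n```')
figures = parse_model_reply(reply)
print(figures["revenue"])  # 1250000
```

Defensive parsing like this matters more than the prompt: downstream agents should never receive unvalidated model text.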

Integrate Pinecone as a long-term memory store for agents, allowing them to recall previously processed information and maintain context across tasks.
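Pinecone's client manages the real vector index; the recall pattern it serves can be shown with an in-memory stand-in, using toy 3-dimensional vectors in place of real embeddings:

```python
import math

class MemoryStore:
    """Toy vector store: upsert embeddings, query by cosine similarity."""
    def __init__(self):
        self._items = {}  # id -> (vector, metadata)

    def upsert(self, item_id: str, vector: list, metadata: dict):
        self._items[item_id] = (vector, metadata)

    def query(self, vector: list, top_k: int = 1):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = (math.sqrt(sum(x * x for x in a))
                    * math.sqrt(sum(y * y for y in b)))
            return dot / norm if norm else 0.0
        ranked = sorted(self._items.items(),
                        key=lambda kv: cosine(vector, kv[1][0]),
                        reverse=True)
        return [{"id": k, "metadata": v[1]} for k, v in ranked[:top_k]]

store = MemoryStore()
store.upsert("fy2022", [0.9, 0.1, 0.0], {"note": "FY2022 revenue reconciled"})
store.upsert("fy2023", [0.1, 0.9, 0.0], {"note": "FY2023 filing pending review"})
hits = store.query([0.85, 0.15, 0.0])
print(hits[0]["id"])  # fy2022
```

An agent would embed its current question, query the store, and prepend the returned metadata to its prompt, which is exactly the role Pinecone plays at scale.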

Build a 'Reviewer Agent' within AutoGen that validates extracted information and flags potential inconsistencies for human review, using structured output generation.
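One way the reviewer step could validate structured output and flag inconsistencies for human review; the field names and rules are assumptions for illustration:

```python
def review(report: dict) -> list:
    """Return human-readable flags; an empty list means the report passes."""
    flags = []
    for key in ("company", "revenue", "discrepancies"):
        if key not in report:
            flags.append(f"missing required field: {key}")
    revenue = report.get("revenue")
    if isinstance(revenue, (int, float)) and revenue < 0:
        flags.append("revenue is negative; likely an extraction error")
    # Cross-check: if sources disagree, a discrepancy entry must exist
    sources = report.get("source_values", {})
    if len(set(sources.values())) > 1 and not report.get("discrepancies"):
        flags.append("source values disagree but no discrepancy was recorded")
    return flags

report = {
    "company": "ExampleCorp",
    "revenue": 1_250_000,
    "discrepancies": [],
    "source_values": {"10-K": 1_250_000, "press_release": 1_240_000},
}
print(review(report))
```

Deterministic checks like these complement the LLM: the model extracts and reconciles, while plain code enforces the invariants that must hold before a human sees the report.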

Orchestrate complex, multi-step workflows within AutoGen, including dynamic task assignment and conditional execution based on intermediate results.
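Conditional, multi-step orchestration ultimately reduces to routing on intermediate results; a minimal sketch in which the step functions are placeholders for real agent calls:

```python
def collect(task):
    # Placeholder for the data-acquisition agent
    return {"raw": f"filings for {task}", "ok": True}

def analyze(collected):
    # Placeholder analysis that "found" one inconsistency
    return {"revenue": 1_250_000, "discrepancies": ["note mismatch"]}

def escalate(analysis):
    return f"escalated {len(analysis['discrepancies'])} item(s) to human review"

def finalize(analysis):
    return "report finalized with no findings"

def run_workflow(task: str) -> str:
    collected = collect(task)
    if not collected["ok"]:
        return "collection failed; aborting"
    analysis = analyze(collected)
    # Conditional branch: escalate only when discrepancies were found
    return escalate(analysis) if analysis["discrepancies"] else finalize(analysis)

print(run_workflow("ExampleCorp FY2023"))
```

In AutoGen the same branching is expressed through the group-chat manager choosing the next speaker, but the control flow is the same: inspect the intermediate result, then pick the next agent.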

Utilize Libretto for monitoring agent interactions and tracing decision-making paths, enabling robust debugging and performance evaluation of the multi-agent system.
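Libretto's own instrumentation aside, the tracing idea, recording each agent call's inputs, outputs, and timing for later inspection, can be sketched with a decorator; the traced function here is a placeholder:

```python
import functools
import time

TRACE = []  # in-memory trace log; a monitoring tool would persist this

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE.append({
            "step": fn.__name__,
            "args": args,
            "result": result,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper

@traced
def extract_revenue(text: str) -> float:
    # Placeholder for an LLM-backed extraction step
    return 1_250_000.0

extract_revenue("FY2023 filing excerpt")
print([t["step"] for t in TRACE])  # ['extract_revenue']
```

Wrapping every agent-facing function this way gives you a decision-path timeline to debug against, even before a dedicated observability tool is wired in.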

Start from your terminal
$ npx -y @versalist/cli start multi-agent-system-for-automated-audit-evidence-collection

[ok] Wrote CHALLENGE.md

[ok] Wrote .versalist.json

[ok] Wrote eval/examples.json

Requires VERSALIST_API_KEY. Works with any MCP-aware editor.

Challenge at a glance

Host: Vera (AI Research & Mentorship)
Starts: Available now
Run mode: Evergreen challenge


Tool Space Recipe (Draft)

Evaluation rubric: 4 dimensions, equally weighted (weight 1 each)
· SchemaValidation
· RevenueExtractionAccuracy
· DiscrepancyDetectionRate
· ResponseTime
Gold items: 1 (1 public)

Frequently Asked Questions about Multi-Agent System for Automated Audit Evidence Collection