Structured Evaluation with Testaify

testingChallenge

Prompt Content

Design a Testaify test suite for your 'GlobalTaxAdvisor' agent focusing on the 'LegalComplianceQuery' task. Define a few test cases, each with an input query, country, and the expected 'advice', 'is_compliant' status, and list of 'citations'. Describe how Testaify would run these tests and generate reports on the agent's performance, including specific assertion checks for correctness and completeness. Provide a conceptual Python structure for setting up these tests.

```python
# Conceptual Testaify usage (Testaify is an example tool for structured eval)
# from testaify import TestSuite, TestCase, assert_equals, assert_contains

# class GlobalTaxAdvisorTestSuite(TestSuite):
#     def setup(self):
#         self.advisor_agent = initialize_openai_assistant()

#     @TestCase(name="German Corporate Tax Query")
#     def test_german_corporate_tax(self):
#         input_data = {"query": "corporate tax in Germany", "country": "Germany", "context": "small business"}
#         agent_output = self.advisor_agent.run(input_data)
#         assert_equals(agent_output['is_compliant'], True)
#         assert_contains(agent_output['advice'], "15%")
#         assert_contains(agent_output['citations'], "German Corporate Tax Act")

#     @TestCase(name="UK Income Tax Brackets")
#     def test_uk_income_tax(self):
#         input_data = {"query": "income tax brackets UK", "country": "UK", "context": "individual"}
#         agent_output = self.advisor_agent.run(input_data)
#         assert_equals(agent_output['is_compliant'], True)
#         assert_contains(agent_output['advice'], "basic rate")

# if __name__ == '__main__'
#     TestSuite.main(GlobalTaxAdvisorTestSuite)
```

Try this prompt

Open the workspace to execute this prompt with free credits, or use your own API keys for unlimited usage.

Usage Tips

Copy the prompt and paste it into your preferred AI tool (Claude, ChatGPT, Gemini)

Customize placeholder values with your specific requirements and context

For best results, provide clear examples and test different variations