
Design the Evaluation Harness

Inspect the original prompt language first, then copy or adapt it once you know how it fits your workflow.

Linked challenge: AI Model Certification with Llama 3.3 and Patronus AI for Compliance

Format: Text-first

Prompt source

Original prompt text with formatting preserved for inspection.

1 line · 1 section · No variables · 0 checklist items
Outline the architecture for your automated evaluation harness. Specify how Llama 3.3 70B will be deployed to AI21 Studio, how data will be fed to Patronus AI for testing, and the key metrics you'll track. Detail how Butternut AI will integrate to automate the triggering and reporting of these evaluation runs.
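Before adapting the prompt, it can help to see the shape of the harness it asks for. The sketch below is a minimal, hypothetical skeleton of that generate-then-score loop: `generate` stands in for the model call (e.g. Llama 3.3 70B behind an AI21 Studio endpoint), `score` stands in for the evaluator call (e.g. a Patronus AI check), and `run_harness` aggregates per-metric means for the reporting step a scheduler (e.g. Butternut AI) would trigger. All function names and the metric are placeholders, not real client APIs.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """One evaluated prompt: input, model output, and metric scores."""
    prompt: str
    output: str
    scores: dict


def generate(prompt: str) -> str:
    # Placeholder for the model call (e.g. Llama 3.3 70B on an
    # AI21 Studio endpoint). Swap in the real client here.
    return f"stub completion for: {prompt}"


def score(prompt: str, output: str) -> dict:
    # Placeholder for the evaluator call (e.g. a Patronus AI test).
    # The single "non_empty" metric is illustrative only.
    return {"non_empty": float(bool(output.strip()))}


def run_harness(prompts: list[str]) -> dict:
    """Run each prompt through generate -> score, then aggregate metrics."""
    results = []
    for p in prompts:
        out = generate(p)
        results.append(EvalResult(p, out, score(p, out)))

    # Collect per-metric values across all runs, then report the mean.
    metrics: dict[str, list[float]] = {}
    for r in results:
        for name, value in r.scores.items():
            metrics.setdefault(name, []).append(value)
    summary = {name: sum(vals) / len(vals) for name, vals in metrics.items()}
    return {"runs": len(results), "metrics": summary}
```

A scheduled trigger would call `run_harness` on a fixed test set and ship the returned summary to whatever reporting channel your compliance workflow uses.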

Adaptation plan

Keep the source stable, then change the prompt in a predictable order so the next run is easier to evaluate.

Keep stable

Preserve the role framing, objective, and reporting structure so comparison runs stay coherent.

Tune next

Swap in your own domain constraints, anomaly thresholds, and examples before you branch variants.

Verify after

Check whether the prompt asks for the right evidence, confidence signal, and escalation path.
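The verify-after step above can be made mechanical with a simple lint over the adapted prompt text. This is a hypothetical sketch, not part of the original workflow: the keyword lists are illustrative stand-ins for whatever evidence, confidence, and escalation language your own variants should request.

```python
# Keyword lists are assumptions; tune them to your domain's vocabulary.
REQUIRED_ASKS = {
    "evidence": ["evidence", "citation", "source"],
    "confidence": ["confidence", "uncertainty"],
    "escalation": ["escalate", "escalation", "handoff"],
}


def verify_prompt(prompt_text: str) -> dict:
    """Report, per requirement, whether the prompt asks for it at all."""
    text = prompt_text.lower()
    return {
        need: any(keyword in text for keyword in keywords)
        for need, keywords in REQUIRED_ASKS.items()
    }
```

Running this on each variant before a comparison run flags prompts that silently dropped one of the three asks during editing.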