Safety Monitor Agent Implementation

implementationChallenge

Prompt Content

Implement the 'Safety Monitor' agent, leveraging GPT-5 for core reasoning and Claude Opus 4.1 for ethical verification. This agent must analyze the code generated by the adversarial agent, comparing it against the original task description and employing extended thinking to detect reward-hacking. Use DSPy to optimize its internal reasoning steps for higher detection accuracy and fewer false positives.

Try this prompt

Open the workspace to execute this prompt with free credits, or use your own API keys for unlimited usage.

Usage Tips

Copy the prompt and paste it into your preferred AI tool (Claude, ChatGPT, Gemini)

Customize placeholder values with your specific requirements and context

For best results, provide clear examples and test different variations