Adversarial Testing and Policy Refinement

testingChallenge

Prompt Content

Conduct iterative testing using a diverse dataset of real images, generated deepfakes, and carefully crafted adversarial prompts. Use the 'Red Team Agent' to systematically attempt to bypass your moderation system. Based on the 'Moderation Agent's' failure cases, refine your agent's prompts, update moderation policies (via RAG), and enhance the multimodal analysis logic. Document the evolution of your system's robustness.

Try this prompt

Open the workspace to execute this prompt with free credits, or use your own API keys for unlimited usage.

Usage Tips

Copy the prompt and paste it into your preferred AI tool (Claude, ChatGPT, Gemini)

Customize placeholder values with your specific requirements and context

For best results, provide clear examples and test different variations