testing

Adversarial Testing and Policy Refinement

Inspect the original prompt language first, then copy or adapt it once you know how it fits your workflow.

Linked challenge: Building Adversarial Deepfake Detection

Format: Text-first


Prompt source

Original prompt text with formatting preserved for inspection.

1 line

1 section

No variables

0 checklist items

Conduct iterative testing using a diverse dataset of real images, generated deepfakes, and carefully crafted adversarial prompts. Use the 'Red Team Agent' to systematically attempt to bypass your moderation system. Based on the 'Moderation Agent's' failure cases, refine your agent's prompts, update moderation policies (via RAG), and enhance the multimodal analysis logic. Document the evolution of your system's robustness.
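The iterative loop described in the prompt can be sketched as a small harness that replays a fixed case set against each version of the moderation logic and records which cases slipped through. This is a minimal sketch under stated assumptions: `Case`, `RoundReport`, `run_round`, and the two `moderator_v*` functions are hypothetical stand-ins, not part of AutoGen or any real moderation API, and a real system would call the 'Moderation Agent' where the stubs are.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Case:
    case_id: str
    kind: str          # "real", "deepfake", or "adversarial"
    should_flag: bool  # ground-truth label for this case


@dataclass
class RoundReport:
    round_no: int
    total: int
    failures: List[str] = field(default_factory=list)

    @property
    def failure_rate(self) -> float:
        return len(self.failures) / self.total if self.total else 0.0


def run_round(round_no: int, cases: List[Case],
              moderate: Callable[[Case], bool]) -> RoundReport:
    """Replay every case and record the ones the moderator got wrong."""
    report = RoundReport(round_no=round_no, total=len(cases))
    for case in cases:
        if moderate(case) != case.should_flag:
            report.failures.append(case.case_id)
    return report


# Hypothetical fixture: one case per category named in the prompt.
CASES = [
    Case("real-1", "real", should_flag=False),
    Case("fake-1", "deepfake", should_flag=True),
    Case("adv-1", "adversarial", should_flag=True),
]


def moderator_v1(case: Case) -> bool:
    # Stand-in for the first iteration: misses adversarial inputs.
    return case.kind == "deepfake"


def moderator_v2(case: Case) -> bool:
    # Stand-in for the refined iteration after reviewing failures.
    return case.kind in {"deepfake", "adversarial"}


# The list of reports is the documented "evolution of robustness".
history = [run_round(1, CASES, moderator_v1),
           run_round(2, CASES, moderator_v2)]
for report in history:
    print(report.round_no, report.failures, round(report.failure_rate, 2))
```

Keeping the case set fixed between rounds is what makes the failure rates comparable; new adversarial cases found by the 'Red Team Agent' would be appended rather than swapped in.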

Adaptation plan

Keep the source stable, then change the prompt in a predictable order so the next run is easier to evaluate.

Keep stable

Preserve the rubric, target behavior, and pass-fail criteria as the baseline for evaluation.

Tune next

Adjust fixtures, mocks, and thresholds to the system under test instead of weakening the assertions.
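One way to read this advice: the pass-fail criteria stay constant while the detection threshold is the thing that moves. The sketch below assumes the moderation system emits a confidence score per image; the criteria values, `rates`, `pick_threshold`, and the sample scores are all illustrative assumptions, not values from the challenge.

```python
from typing import List, Optional, Tuple

# Fixed pass-fail criteria (hypothetical values). These are never relaxed.
MIN_RECALL = 0.90
MAX_FALSE_POSITIVE_RATE = 0.10

# (score, is_deepfake) pairs; the scores are made-up fixture data.
Scored = Tuple[float, bool]


def rates(scores: List[Scored], threshold: float) -> Tuple[float, float]:
    """Return (recall, false_positive_rate) when flagging score >= threshold."""
    tp = sum(1 for s, y in scores if y and s >= threshold)
    fn = sum(1 for s, y in scores if y and s < threshold)
    fp = sum(1 for s, y in scores if not y and s >= threshold)
    tn = sum(1 for s, y in scores if not y and s < threshold)
    recall = tp / (tp + fn) if tp + fn else 1.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


def pick_threshold(scores: List[Scored],
                   candidates: List[float]) -> Optional[float]:
    """Return the highest candidate threshold that meets both criteria."""
    for t in sorted(candidates, reverse=True):
        recall, fpr = rates(scores, t)
        if recall >= MIN_RECALL and fpr <= MAX_FALSE_POSITIVE_RATE:
            return t
    return None  # No threshold passes: fix the system, not the criteria.


SCORES = [(0.95, True), (0.80, True), (0.65, True),
          (0.40, False), (0.20, False), (0.55, False)]

print(pick_threshold(SCORES, [0.5, 0.6, 0.7]))  # prints 0.6
```

Returning `None` instead of silently lowering `MIN_RECALL` keeps a failing system visible as a failure.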

Verify after

Make sure the prompt catches regressions instead of just mirroring the happy-path examples.
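A quick way to check this is to run the test suite against deliberately weak moderators and confirm that it fails them. The sketch below is an assumption-laden illustration: `SUITE`, `failing_cases`, and the two stub moderators are hypothetical names, and the case IDs stand in for real fixtures.

```python
from typing import Callable, Dict, List

# Hypothetical suite: case ID -> whether the case should be flagged.
SUITE: Dict[str, bool] = {
    "real-portrait": False,
    "faceswap-basic": True,
    "faceswap-with-benign-caption": True,  # adversarial variant
}


def failing_cases(moderate: Callable[[str], bool]) -> List[str]:
    """Return the IDs of cases where the moderator's verdict is wrong."""
    return [cid for cid, should_flag in SUITE.items()
            if moderate(cid) != should_flag]


def allow_everything(case_id: str) -> bool:
    # Known-bad stub: never flags anything.
    return False


def happy_path_only(case_id: str) -> bool:
    # Known-bad stub: handles the obvious case, misses the adversarial one.
    return case_id == "faceswap-basic"


# A useful suite must reject both stubs.
for stub in (allow_everything, happy_path_only):
    missed = failing_cases(stub)
    assert missed, f"suite cannot distinguish {stub.__name__} from a good system"
    print(stub.__name__, missed)
```

If a weakened moderator passes the whole suite, the suite only mirrors the happy path and needs more adversarial cases.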

Prompt diagnostics

Purpose: testing

This prompt is mostly narrative and instruction-driven, so adapt examples and output constraints before you rewrite the structure.

Linked challenge

Building Adversarial Deepfake Detection

Following reports of advanced AI models generating non-consensual deepfakes, this challenge focuses on developing an ethical AI solution for robust deepfake detection and content moderation. You will build a multi-agent system using AutoGen, where agents collaborate and debate to identify and flag AI-generated visual content that violates ethical guidelines. The system will integrate OpenAI 'Next-Gen Multimodal' capabilities for nuanced image and text analysis, enabling the agents to understand context and detect subtle manipulations. The A2A Protocol will facilitate secure and structured communication between 'Moderation' agents and 'Red Team' agents, with the latter attempting to bypass the moderation system using adversarial prompts, testing the limits of the system's robustness. Extended thinking and dynamic policy RAG will keep moderation comprehensive and adaptable.
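The "dynamic policy RAG" idea can be illustrated with a toy retriever that selects the policy clauses most relevant to a case description, so the moderation prompt can be rebuilt whenever the policy store changes. This is only a sketch: it uses plain word overlap instead of embeddings, and `POLICIES`, `tokens`, and `retrieve` are hypothetical names with invented policy text, not part of AutoGen, the A2A Protocol, or any OpenAI API.

```python
import re
from typing import Dict, List, Set

# Hypothetical policy store; updating this dict is the "dynamic" part.
POLICIES: Dict[str, str] = {
    "P1": "Flag AI-generated images depicting a real person without consent.",
    "P2": "Allow clearly labeled satire that does not sexualize or defame.",
    "P3": "Escalate images where captions contradict visual evidence of manipulation.",
}


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int = 2) -> List[str]:
    """Return up to k policy IDs ranked by word overlap with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(body)), pid) for pid, body in POLICIES.items()]
    # Highest overlap first; ties are broken by policy ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for score, pid in scored if score > 0][:k]


print(retrieve("image of a real person generated without consent"))
# prints ['P1', 'P3']
```

In a fuller system the overlap score would be replaced by embedding similarity, and the retrieved clauses would be inserted into the moderation agent's prompt before each review.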

Related prompts

Agent Roles and AutoGen Configuration (planning)

OpenAI Multimodal Integration for Analysis (implementation)

A2A Protocol for Red Teaming and Policy Sharing (implementation)