Testing and Evaluation Module

Inspect the original prompt language first, then copy or adapt it once you know how it fits your workflow.

Linked challenge: Agentic Code Generation & Refinement

Format: Code-aware

Prompt source

Original prompt text with formatting preserved for inspection.

1 line

1 section

No variables

0 checklist items

Develop the unit tests and the evaluation harness for the 'GenerateAndRefineFunction' task template. Ensure your harness can execute the agent, capture its output, and use Trulens-Eval to record metrics and traces. Define specific unit tests for the Fibonacci function (as per the `sample_input`) to verify the correctness of the agent's refined code.
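A minimal sketch of what such a harness could look like, assuming the agent returns its refined code as a Python source string that defines a function named `fibonacci`, with the convention `fibonacci(0) == 0`. The source does not show `sample_input`, so the function name, the expected values, and the `run_agent` stand-in are all hypothetical. The TruLens-Eval recording step is represented by a plain `record` callback rather than real library calls, since the exact instrumentation depends on the installed version.

```python
import unittest
from typing import Callable, Dict, Optional


def run_agent(prompt: str) -> str:
    """Hypothetical stand-in for the real agent call; returns refined source code."""
    return (
        "def fibonacci(n):\n"
        "    a, b = 0, 1\n"
        "    for _ in range(n):\n"
        "        a, b = b, a + b\n"
        "    return a\n"
    )


def load_function(source: str, name: str = "fibonacci") -> Callable[[int], int]:
    """Execute generated source in its own namespace and return the named function."""
    namespace: Dict[str, object] = {}
    exec(source, namespace)  # in real use, run untrusted code only inside a sandbox
    func = namespace.get(name)
    if not callable(func):
        raise LookupError(f"generated code does not define {name!r}")
    return func


def make_test_case(func: Callable[[int], int]) -> type:
    """Build a TestCase class bound to the function under test."""

    class FibonacciTests(unittest.TestCase):
        def test_base_cases(self):
            self.assertEqual(func(0), 0)
            self.assertEqual(func(1), 1)

        def test_known_values(self):
            expected = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
            self.assertEqual([func(i) for i in range(10)], expected)

    return FibonacciTests


def evaluate(prompt: str, record: Optional[Callable[[dict], None]] = None) -> dict:
    """Run the agent, test its output, and hand the results to a recorder."""
    source = run_agent(prompt)
    func = load_function(source)
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(make_test_case(func))
    result = unittest.TestResult()
    suite.run(result)
    metrics = {
        "tests_run": result.testsRun,
        "failures": len(result.failures) + len(result.errors),
        "passed": result.wasSuccessful(),
    }
    if record is not None:
        # Assumption: the recorder (for example a TruLens-Eval wrapper) accepts a dict.
        record({"prompt": prompt, "output": source, **metrics})
    return metrics
```

Passing `record=some_list.append` is enough to inspect what would be logged before wiring in a real tracing backend.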

Adaptation plan

Keep the source stable, then change the prompt in a predictable order so the next run is easier to evaluate.

Keep stable

Preserve the rubric, target behavior, and pass-fail criteria as the baseline for evaluation.

Tune next

Adjust fixtures, mocks, and thresholds to the system under test instead of weakening the assertions.

Verify after

Make sure the prompt catches regressions instead of just mirroring the happy-path examples.
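One way to check this is to run the same expectations against deliberately broken implementations and confirm that each one fails. The sketch below assumes the `fibonacci(0) == 0` convention; the `check` helper and the three variants are hypothetical examples, not part of the original prompt.

```python
from typing import Callable, Dict, List

# Expected values under the assumed convention fibonacci(0) == 0.
EXPECTED: Dict[int, int] = {0: 0, 1: 1, 2: 1, 3: 2, 10: 55, 30: 832040}


def check(func: Callable[[int], int]) -> List[int]:
    """Return the inputs for which func disagrees with EXPECTED or raises."""
    failing = []
    for n, want in EXPECTED.items():
        try:
            ok = func(n) == want
        except Exception:
            ok = False
        if not ok:
            failing.append(n)
    return failing


def correct(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def off_by_one(n: int) -> int:
    # Bug: shifted index, returns fibonacci(n + 1).
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return b


def happy_path_only(n: int) -> int:
    # Bug: hard-codes only the small answers a happy-path example might show.
    return {0: 0, 1: 1, 2: 1, 3: 2}.get(n, 0)


VARIANTS = {
    "correct": correct,
    "off_by_one": off_by_one,
    "happy_path_only": happy_path_only,
}

# Map each variant to the inputs it gets wrong; only "correct" should be empty.
report = {name: check(func) for name, func in VARIANTS.items()}
```

If a broken variant produces an empty list, the expected values are too close to the happy path and need larger or less obvious inputs.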

Prompt diagnostics

Purpose: testing

This prompt already mixes executable detail with instructions, so tune examples and interfaces before rewriting the scaffold.

Linked challenge

Agentic Code Generation & Refinement

This challenge tasks you with building a robust AI agent using the OpenAI Agents SDK. Your agent will specialize in generating, debugging, and refining code snippets based on natural language prompts. It will simulate interaction with an IDE environment, using external tools for code linting, static analysis, and version control operations. A key aspect is applying Model Context Protocol (MCP) principles for structured tool integration, so that the agent can dynamically select and use code-related services with clear input/output schemas. This project emphasizes advanced agentic design, tool orchestration, and the practical application of AI in developer workflows to enhance productivity and code quality.
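The idea of tools with clear input/output schemas that an agent selects by name can be sketched without any SDK. Everything below (`LintInput`, `LintResult`, `Tool`, `TOOLS`, `call_tool`, and the toy checks) is a hypothetical illustration in plain Python, not the OpenAI Agents SDK or an MCP implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LintInput:
    """Input schema for the hypothetical lint tool."""
    source: str
    max_line_length: int = 79


@dataclass
class LintResult:
    """Output schema for the hypothetical lint tool."""
    ok: bool
    messages: List[str] = field(default_factory=list)


def lint(request: LintInput) -> LintResult:
    """Toy linter: flags overlong lines and trailing whitespace."""
    messages = []
    for number, line in enumerate(request.source.splitlines(), start=1):
        if len(line) > request.max_line_length:
            messages.append(
                f"line {number}: longer than {request.max_line_length} characters"
            )
        if line != line.rstrip():
            messages.append(f"line {number}: trailing whitespace")
    return LintResult(ok=not messages, messages=messages)


@dataclass
class Tool:
    """Registry entry pairing a handler with its input schema."""
    name: str
    description: str
    input_type: type
    handler: Callable


TOOLS: Dict[str, Tool] = {
    "lint": Tool("lint", "Check style issues in source code", LintInput, lint),
}


def call_tool(name: str, arguments: dict):
    """Validate arguments against the tool's input schema, then dispatch."""
    tool = TOOLS[name]
    request = tool.input_type(**arguments)  # TypeError on unknown or missing fields
    return tool.handler(request)
```

For example, `call_tool("lint", {"source": "x = 1 \n"})` returns a `LintResult` with `ok` set to `False` and one message about trailing whitespace. Keeping the schema next to the handler lets the harness reject malformed tool calls before any code runs.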

Related prompts

Agent Design and Tool Definition (planning)

Initial Code Generation and Linting Integration (implementation)

Iterative Refinement and Trulens-Eval Integration (implementation)