Operator-ready prompt for reuse, tuning, and Workspace runs.
This item is set up for developers who want to inspect the original language, fork it into Workspace, and adapt the evidence model without losing the source prompt structure.
Best suited to implementation handoffs, eval setup, and prompt tuning where you need the original structure intact.
Inspect first, copy once, then fork into Workspace when you want variants, notes, and model settings attached to the same run.
Swap domain facts, examples, and any hard-coded entities for your own context.
Tighten the evidence or verification requirement if this is headed toward production.
Decide which failure mode you want to evaluate first before you branch the prompt.
This prompt already carries implementation detail, tool context, and a final-output instruction. Keep that structure intact when you tune it, or your comparison runs get noisy fast.
Open this prompt inside Workspace when you want a live iteration loop.
Copy for quick reuse, or run it in Workspace to keep prompt variants, model settings, and revision history in one place.
Structured source with 1 active line to adapt.
Already linked to a challenge workflow.
Prompt content
Original prompt text with formatting preserved for inspection and clean copy.
Create a custom tool for your OpenAI agent that can execute Python code in a sandboxed environment and return the results, including standard output and any errors. This tool should also be capable of running a set of predefined unit tests against the generated code. Show how to integrate this tool into your OpenAI Agents SDK agent using the `tool_resources` and `function_calling` features. Provide a Python snippet demonstrating the tool definition and how the agent would invoke it.
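A minimal sketch of what that could look like, hedged on two points: the sandbox here is just a subprocess with a timeout (swap in a container or a dedicated sandbox service before production), and the integration assumes the `agents` package from the OpenAI Agents SDK, whose `function_tool` decorator is the closest current equivalent of the prompt's `tool_resources`/`function_calling` framing. The tool and agent names (`run_python`, `code-runner`) and the fizzbuzz task are invented for illustration.

```python
import json
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path


def execute_in_sandbox(code: str, tests: str | None = None, timeout: int = 10) -> dict:
    """Run code (and optional unit tests) in a throwaway subprocess.

    A subprocess with a timeout is a weak sandbox; this is a sketch,
    not a hardened executor.
    """
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "solution.py"
        src.write_text(textwrap.dedent(code))
        try:
            run = subprocess.run(
                [sys.executable, str(src)],
                capture_output=True, text=True, timeout=timeout, cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return {"stdout": "", "stderr": f"timed out after {timeout}s", "returncode": -1}
        report = {"stdout": run.stdout, "stderr": run.stderr, "returncode": run.returncode}
        if tests:
            (Path(workdir) / "test_solution.py").write_text(textwrap.dedent(tests))
            try:
                test_run = subprocess.run(
                    [sys.executable, "-m", "unittest", "-v", "test_solution"],
                    capture_output=True, text=True, timeout=timeout, cwd=workdir,
                )
                report["tests_passed"] = test_run.returncode == 0
                report["test_output"] = test_run.stderr  # unittest reports to stderr
            except subprocess.TimeoutExpired:
                report["tests_passed"] = False
                report["test_output"] = f"tests timed out after {timeout}s"
        return report


# Agents SDK integration -- assumes the `agents` package (openai-agents on PyPI).
from agents import Agent, Runner, function_tool


@function_tool
def run_python(code: str, tests: str = "") -> str:
    """Execute Python in a sandbox; return stdout, stderr, and test results as JSON."""
    return json.dumps(execute_in_sandbox(code, tests or None))


coder = Agent(
    name="code-runner",
    instructions="Write Python, call run_python to execute and test it, then fix any failures.",
    tools=[run_python],
)

if __name__ == "__main__":
    result = Runner.run_sync(coder, "Implement fizzbuzz(n) and run it for n=15.")
    print(result.final_output)
```

Returning the report as a JSON string keeps the tool output model-readable while leaving the structured fields available if you later parse it for evals.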
Adaptation plan
Keep the source stable, then branch your edits in a predictable order so the next prompt run is easier to evaluate.
Hold the task contract and output shape stable so generated implementations remain comparable.
Update libraries, interfaces, and environment assumptions to match the stack you actually run.
Test failure handling, edge cases, and any code paths that depend on hidden context or secrets.
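If failure handling is the first mode you want to probe, a couple of throwaway checks against the sandbox helper make the expected behavior concrete. A minimal sketch, reusing the `execute_in_sandbox` helper from the snippet above; both test names are invented here.

```python
def test_error_surfacing() -> None:
    # Deliberately broken code: the error should surface, not be swallowed.
    report = execute_in_sandbox("raise RuntimeError('boom')")
    assert report["returncode"] != 0
    assert "RuntimeError" in report["stderr"]


def test_timeout_is_reported() -> None:
    # An infinite loop should hit the timeout path and say so.
    report = execute_in_sandbox("while True: pass", timeout=1)
    assert "timed out" in report["stderr"]
```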
Copy once for a pristine source snapshot, then move the prompt into Workspace when you want variants, run history, and side-by-side tuning without losing the original.
Prompt diagnostics
Quick signals for how structured this prompt already is and where adaptation work is likely to happen first.
This prompt is mostly narrative and instruction-driven, so you can adapt examples and output constraints first without disturbing the structure.
Self-Improving GPT-5.3-Codex Agent for Code Generation & Refinement
Build a self-improving agent with the OpenAI Agents SDK, leveraging GPT-5.3-Codex's advanced code generation and reasoning capabilities. Inspired by OpenAI's claim that the model was instrumental in creating itself, this challenge focuses on an agent that autonomously generates code solutions for a given problem, then critically evaluates, tests, and iteratively refines its own code to improve correctness, efficiency, and adherence to specified coding standards. The system should manage longer-running tasks, potentially spanning multiple stages of generation, testing, and debugging, with robust observability and evaluation.
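As a starting point for that loop, here is a hedged sketch of one generate-test-refine cycle. It reuses the `execute_in_sandbox` helper and the `agents` imports from the snippets above; `MAX_ROUNDS`, the agent names, and all instruction text are illustrative choices, not part of the challenge spec.

```python
from agents import Agent, Runner

MAX_ROUNDS = 3  # illustrative budget for refinement passes

generator = Agent(
    name="generator",
    instructions="Write a complete Python solution for the task. Output only code.",
)
critic = Agent(
    name="critic",
    instructions="Given code and its test output, diagnose the failures and propose a concrete fix.",
)


def solve(task: str, tests: str) -> str:
    # Reuses execute_in_sandbox from the "Prompt content" sketch above.
    code = Runner.run_sync(generator, task).final_output
    for _ in range(MAX_ROUNDS):
        report = execute_in_sandbox(code, tests)
        if report.get("tests_passed"):
            return code  # every predefined unit test passed; stop refining
        critique = Runner.run_sync(
            critic,
            f"Code:\n{code}\n\nTest output:\n{report.get('test_output', report['stderr'])}",
        ).final_output
        code = Runner.run_sync(
            generator,
            f"{task}\n\nA previous attempt failed. Reviewer notes:\n{critique}\nRewrite the full solution.",
        ).final_output
    return code  # best effort after the round budget is spent
```

Splitting generator and critic keeps the refinement signal explicit, which also makes each round easier to trace for the observability requirement.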
Use the challenge page to recover the original task boundaries before you tune the prompt. That keeps your variants grounded in the same evaluation target instead of drifting into a different problem.