Skill Bundles

Inspect the public source, then install a private draft once a bundle proves worth it.

Public skill bundles are the reusable execution layer behind My Skills. Review the published source first, then install a private copy for edits, experiments, and self-improvement.

Published bundles: 108
Total installs: 0
Average quality: 70/100

Browse bundles

108 published bundles ready to inspect and install

Skill bundle v1.0.0

RL Failure Postmortem

Diagnose why an RL training run failed and what to change

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

RL vs. Prompting Decision

Determine when prompt engineering, fine-tuning, or RL is the right approach

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

RL Cost Estimation

Estimate total cost (compute, data, engineering time) for an RL project

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

RL Paper Reading

Read and critically evaluate RL research papers, and extract practical implications

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

RL Experiment Design

Plan RL experiments: baselines, ablations, compute budgets, success criteria

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Document Processing Env

Environments for extraction, classification, and transformation of business documents

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Ticket Triage Env

Environments for support ticket routing, prioritization, and resolution

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

CRM Workflow Env

Environments mimicking CRM operations (Salesforce, HubSpot)

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Data Pipeline Env

Environments for building and debugging ETL/ELT pipelines

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

SQL Generation RL Env

Build environments where agents write SQL, execute it, and get scored on result correctness

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

UI Task Specification

Formally specify UI tasks with clear start states, goal states, and evaluation criteria

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Pixel vs. DOM Action Space

Trade-offs between pixel-level interaction and DOM-level interaction for UI agents

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Browser Env Construction

Build instrumented browser environments with action logging and state capture

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Repo Level Coding Env

Build environments where agents navigate and modify entire repositories, not just single files

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Test Generation As Reward

Use test pass rates as automatic reward signals for code generation

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Code Review Reward Design

Score code changes on correctness, style, security, and performance

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Code Completion RL Env

Build environments for training code completion models (à la Cursor's online RL)

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Experience Replay Management

Maintain and curate experience replay buffers for continual RL training

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Distribution Shift Detection

Detect when the production task distribution has drifted from the training distribution

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Catastrophic Forgetting Mitigation

Prevent RL training from destroying previously learned capabilities

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Online RL From Production

Set up learning loops where production experience feeds back into training

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Capability Regression Testing

Run broad capability evals before and after RL training to catch degradation

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Overfitting Detection For RL

Detect when RL training narrows capability (great on trained tasks, worse on everything else)

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Domain Transfer Measurement

Quantify how much RL training on coding transfers to (say) data analysis or writing

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Transfer Eval Design

Build evals that test whether RL training on task A improved performance on related task B

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Risk Tier Classification

Classify agent skills by risk level (read-only vs. write vs. financial vs. external-facing) and apply appropriate controls

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Audit Trail For RL Decisions

Log every decision an RL agent makes in production with sufficient context for post-hoc review

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Deployment Gating Pipeline

Build eval-gated deployment pipelines where RL-trained models must pass benchmarks before production

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Data Exfiltration Prevention

Monitor and prevent agents from leaking sensitive data through tool calls

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Skill Security Audit

Static and dynamic analysis of agent skill code for security vulnerabilities

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Deceptive Alignment Detection

Test whether agents behave differently when they believe they're being evaluated vs. not

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

RL Alignment Auditing

Verify that the policy optimizes for the intended objective, not a proxy

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Action Space Sandboxing

Restrict agent actions to prevent irreversible or harmful operations

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Safe Exploration Constraints

Define and enforce hard constraints on what agents can do during training rollouts

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Reward Hacking Red Teaming

Systematically find ways an agent could game the reward function

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Rollback And Versioning

Maintain and switch between agent versions when new RL training degrades performance

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Production Monitoring For RL Agents

Monitor deployed RL-trained agents for performance drift, reward hacking in the wild, and distribution shift

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Continual Learning Pipeline

Set up recurring RL training loops that retrain as the workflow or data distribution shifts

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

RL ROI Measurement

Quantify the business impact (time saved, error reduction, cost) of RL-trained agents

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

A/B Test RL Policy

Design and run A/B tests comparing RL-trained agent vs. baseline in production

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Environment Reset Engineering

Build reliable, fast environment reset mechanisms for episode boundaries

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Environment Fidelity Validation

Verify that the sandbox environment faithfully reproduces production behavior

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Client Data Onboarding

Ingest, clean, and transform client data into RL-ready formats

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Mock Production System

Build a faithful replica of a client's production system (APIs, DB, auth) for safe RL training

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Data Readiness Assessment

Evaluate whether the client has sufficient trajectory data, or whether collection needs to happen first

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Success Metric Extraction

Work with stakeholders to convert a vague "it should work better" into measurable, scorable outcomes

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Baseline Agent Benchmarking

Measure current agent performance on the target workflow before RL intervention

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

RL Feasibility Assessment

Determine whether a workflow is actually amenable to RL improvement (clear rewards, sufficient volume, safe to explore)

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Workflow Audit

Map an enterprise workflow end-to-end: inputs, decisions, tools, outputs, success criteria

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Model Versioning For RL

Track and switch between reference model, current policy, and reward model versions during training

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

vLLM For RL

Configure vLLM or similar engines for RL workloads (batched generation, multiple completions)

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

High Throughput Rollout Serving

Serve models at high throughput for RL rollout collection (not just user-facing latency)

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Compute Budgeting For RL

Estimate and optimize GPU hours needed for RL training runs

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Checkpoint Selection

Choose the best model checkpoint based on eval performance, not just training metrics

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Training Stability Debugging

Diagnose and fix common RL training failures: reward collapse, mode collapse, KL explosion

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

KL Divergence Management

Control how far the policy drifts from the reference model during training

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Reward Model Training

Train reward models from human preference data while handling label noise and distribution shift

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

RL Hyperparameter Tuning

Tune learning rates, KL penalties, reward scaling, and batch sizes for RL stability

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Distributed RL Training

Shard training across multiple GPUs/nodes with proper gradient synchronization

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Manage RL Rollouts

Orchestrate parallel agent rollouts across environments at scale

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Implement Constitutional AI

Self-critique and revision loops using model-generated feedback

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Implement RLHF Pipeline

End-to-end: collect preferences → train reward model → optimize policy

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Online vs. Offline RL Tradeoffs

When to use online rollouts vs. offline datasets, and how to blend

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Implement Rejection Sampling

Best-of-N sampling with a reward model; simplest "RL" that actually works

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Implement REINFORCE With Baseline

Classic REINFORCE with variance reduction, the foundation of policy gradient methods

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Implement GRPO

Build Group Relative Policy Optimization as used in DeepSeek-R1

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Implement DPO

Build Direct Preference Optimization and understand when it outperforms PPO

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Implement PPO

Build Proximal Policy Optimization from scratch and understand clipping and advantage estimation

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Eval From Production Failures

Convert real production failures into new eval cases automatically

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Multi Model Eval Harness

Run the same eval suite across Haiku/Sonnet/Opus (or GPT-4/Claude/Gemini) and compare

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Eval Versioning And Regression

Track eval suite changes over time and detect regressions when evals are updated

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Domain Specific Eval Design

Build evals for specialized verticals (legal, medical, finance, engineering)

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Eval Contamination Prevention

Ensure training data and eval data don't overlap

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Adversarial Eval Generation

Create evals specifically designed to find failure modes and edge cases

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Eval Saturation Detection

Identify when a model has maxed out an eval and needs harder/different benchmarks

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Eval Coverage Analysis

Measure whether your eval suite covers the actual distribution of production tasks

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Build Fuzzy Eval

Design evals for tasks with multiple valid solutions (writing, design, open-ended code)

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Build Deterministic Eval

Create evals with unambiguous, programmatically verifiable correct answers

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Outcome vs. Process Reward Tradeoff

When to reward final results vs. intermediate steps, and how to blend both

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Reward Calibration

Ensure reward functions produce consistent, well-scaled signals across different task types and difficulties

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Human Feedback Collection

Design interfaces and protocols for collecting human preference judgments at scale

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Reward Hacking Detection

Identify when agents exploit reward function loopholes to get high scores without doing the task correctly

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Reward Shaping

Add intermediate reward signals that guide learning without changing the optimal policy

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Process Reward Modeling

Score intermediate reasoning steps, not just final outcomes

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Composite Reward Design

Combine multiple reward signals (correctness, efficiency, style, safety) into a single scalar

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

LLM As Judge Reward

Use a language model to score agent outputs against specifications or rubrics

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Graded Rubric Reward

Translate qualitative rubrics into multi-dimensional scoring functions with partial credit

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Binary Outcome Reward

Design pass/fail reward signals (code compiles, test passes, form submitted correctly)

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Offline Dataset Curation

Build high-quality static datasets from historical trajectories for offline RL or behavior cloning

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Trajectory Format Standardization

Convert heterogeneous log formats into a unified trajectory schema (state, action, reward, metadata)

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Trajectory Anonymization

Strip PII, credentials, and sensitive business data from trajectories while preserving RL-relevant structure

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Trajectory Filtering

Score and filter trajectories by quality, removing corrupted or incomplete episodes

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Capture Agent Trajectories

Log agent rollouts with full state-action-reward-next_state tuples, tool calls, and timing

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Capture Human Trajectories

Instrument production tools to log human expert actions, states, and outcomes as RL-ready trajectories

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Multi Step Task Decomposition

Break complex enterprise workflows into subtask chains with intermediate checkpoints

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Synthetic Data Augmentation

Generate realistic variations of workflow data (user inputs, edge cases, adversarial inputs) without real PII

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

SOP To Task Parser

Convert natural language SOPs and runbooks into structured, machine-executable task specifications

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Task Difficulty Calibration

Score and bucket tasks by difficulty using baseline agent performance

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Edge Case Mining

Extract rare but high-impact failure modes from production logs to create targeted task sets

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Curriculum Design

Order tasks by difficulty and introduce new complexity dimensions progressively

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Generate Task Variations

Programmatically produce 10K–100K+ task instances from templates, SOPs, and historical logs

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Instrument Action Space

Define, constrain, and document the valid action space an agent can take within an environment

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Build Stateful Env

Handle environments with persistent state across episodes (databases, file systems, user sessions)

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Build Multi Tool Env

Compose environments spanning multiple tools (IDE + terminal + browser + DB) into a single coherent action space

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Build CLI Env

Create terminal/shell environments with filesystem state, command history, and outcome verification

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Build Codebase Env

Set up repo-level coding environments with test harnesses, linting, and compilation feedback loops

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Build API Harness

Wrap real or mock APIs into instrumented RL-ready surfaces with deterministic reset, state capture, and action logging

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Build UI Sandbox

Construct browser-based sandboxed environments where agents interact with realistic UI surfaces (forms, dashboards, multi-step wizards)

0 installs
70/100 quality
Compatibility not listed