Read the source. Install what you trust.
Each skill bundle packages a reusable agent behavior: a prompt, supporting files, and evaluation criteria. Browse the public catalog, review the full source, then install a private copy you can edit and experiment with.
Browse bundles
108 published bundles ready to inspect and install
Implement Constitutional AI
Self-critique and revision loops using model-generated feedback
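The critique-and-revise loop can be sketched in a few lines. This is a minimal illustration, not the bundle's implementation: `critique_and_revise`, the prompt strings, and the stub model are all hypothetical stand-ins for a real LLM call.

```python
def critique_and_revise(model, prompt, principles):
    """CAI-style loop sketch: generate an answer, then for each principle
    ask the model to critique the answer and revise it using that critique."""
    answer = model(prompt)
    for principle in principles:
        critique = model(f"Critique against '{principle}': {answer}")
        answer = model(f"Revise using this critique: {critique}\nOriginal: {answer}")
    return answer

# Stub model for illustration only: records each call, returns a version tag.
calls = []
def stub_model(p):
    calls.append(p)
    return f"v{len(calls)}"

final = critique_and_revise(stub_model, "Explain X", ["honesty", "harmlessness"])
```

With two principles the loop makes five model calls: one generation, then a critique/revise pair per principle.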
Implement RLHF Pipeline
End-to-end: collect preferences → train reward model → optimize policy
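The middle stage, training the reward model on collected preferences, typically minimizes a Bradley-Terry pairwise loss. A minimal sketch (the function name and scalar inputs are illustrative; real code scores full responses with a model):

```python
import math

def reward_model_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).
    Minimizing it pushes the reward model to score preferred responses higher."""
    return math.log(1.0 + math.exp(-(r_chosen - r_rejected)))

loss_agrees = reward_model_loss(2.0, 0.0)     # reward model already agrees: small loss
loss_disagrees = reward_model_loss(0.0, 2.0)  # reward model disagrees: large loss
```

The trained reward model then supplies the scalar signal that the final policy-optimization stage maximizes.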
Online vs. Offline RL Tradeoffs
When to use online rollouts vs. offline datasets, and how to blend
Implement Rejection Sampling
Best-of-N sampling with a reward model; simplest "RL" that actually works
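Best-of-N is short enough to sketch whole: draw N candidates, keep the one the reward model scores highest. The generator and "reward model" below are toy stand-ins (a fixed pool of integers, scored by closeness to 10), not the bundle's code:

```python
def best_of_n(prompt, generate, reward, n):
    """Best-of-N rejection sampling: sample n candidates for a prompt,
    return the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins (hypothetical): candidates come from a fixed pool,
# and the "reward model" prefers values close to 10.
pool = iter([3, 17, 10, 6])
generate = lambda prompt: next(pool)
reward = lambda y: -abs(y - 10)

best = best_of_n("example prompt", generate, reward, n=4)  # picks 10
```

No gradient updates are involved, which is why it is the simplest "RL" baseline: all the optimization happens at sampling time.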
Implement REINFORCE with Baseline
Classic REINFORCE with variance reduction, the foundation of policy gradient methods
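The variance-reduction idea fits in a small sketch: scale the log-probability gradient of the chosen action by (reward − baseline), where subtracting the baseline reduces variance without biasing the gradient. The two-armed bandit below is an illustrative toy, not the bundle's setup:

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_update(logits, action, reward, baseline, lr=0.1):
    """One REINFORCE step with a baseline on a softmax policy."""
    probs = softmax(logits)
    advantage = reward - baseline
    # d log pi(action) / d logit_k = 1{k == action} - pi(k)
    return [l + lr * advantage * ((1.0 if k == action else 0.0) - probs[k])
            for k, l in enumerate(logits)]

# Toy two-armed bandit (hypothetical): arm 1 pays 1.0, arm 0 pays 0.0.
random.seed(0)
logits, baseline = [0.0, 0.0], 0.0
for _ in range(300):
    p = softmax(logits)
    action = 0 if random.random() < p[0] else 1
    reward = float(action)                     # arm 1 is the good arm
    baseline = 0.9 * baseline + 0.1 * reward   # running-mean baseline
    logits = reinforce_update(logits, action, reward, baseline)
```

After training, the policy concentrates on the paying arm; swapping the running-mean baseline for a learned value function gives actor-critic methods.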
Implement GRPO
Build Group Relative Policy Optimization as used in DeepSeek-R1
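GRPO's core trick is computing advantages relative to a group of sampled completions instead of a learned value baseline. A minimal sketch of that step, using population standard deviation (one of several normalization choices):

```python
import math

def grpo_advantages(rewards):
    """Group-relative advantage: normalize each sampled completion's reward
    by its own group's mean and standard deviation, replacing the learned
    value baseline used in PPO-style methods."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    std = std or 1.0  # guard: a zero-variance group would divide by zero
    return [(r - mean) / std for r in rewards]

# One prompt, four sampled completions scored by a reward function.
advs = grpo_advantages([1.0, 0.0, 0.5, 0.5])
```

The advantages sum to zero within the group, so above-average samples are reinforced and below-average ones suppressed without training a critic.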
Implement DPO
Build Direct Preference Optimization, understand when it outperforms PPO
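The DPO loss itself is one line once the log-probabilities are in hand. A sketch with illustrative scalar sequence log-probs (real code sums token log-probs under the policy and a frozen reference model):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares the policy's log-prob gain on the chosen
    response with its gain on the rejected one, both relative to a
    frozen reference model."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return math.log(1.0 + math.exp(-beta * margin))  # == -log(sigmoid(beta*margin))

# Illustrative numbers: the policy already prefers the chosen response.
loss = dpo_loss(pi_chosen=-10.0, pi_rejected=-12.0,
                ref_chosen=-11.0, ref_rejected=-11.0, beta=0.1)
```

Because the reward model is implicit in this margin, DPO skips the separate reward-model and rollout stages that PPO needs, which is a large part of when it wins.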
Implement PPO
Build Proximal Policy Optimization from scratch, understand clipping and advantage estimation
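The clipping mechanism is the heart of PPO and fits in a few lines. A per-action sketch of the clipped surrogate (advantage estimation, e.g. GAE, is a separate component):

```python
import math

def ppo_clipped_objective(logp_new, logp_old, advantage, eps=0.2):
    """PPO clipped surrogate for one action:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
    removing the incentive to push the policy ratio outside [1-eps, 1+eps]."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)

# Positive advantage: the gain is capped once the ratio exceeds 1 + eps.
capped = ppo_clipped_objective(math.log(1.5), 0.0, advantage=1.0)    # 1.2, not 1.5
# Negative advantage: the min keeps the penalty uncapped, so large
# wrong-direction moves are still fully punished.
penalty = ppo_clipped_objective(math.log(1.5), 0.0, advantage=-1.0)  # -1.5
```

The asymmetry shown by the two calls is the pessimistic bound that keeps updates conservative.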
Eval From Production Failures
Convert real production failures into new eval cases automatically
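The conversion can be as simple as replaying each failing input and asserting the bad output does not recur. A sketch under assumed log and case schemas (`input`/`bad_output` keys and a `check` callable are illustrative, not the bundle's format):

```python
def failures_to_cases(failure_log):
    """Turn logged production failures into regression eval cases:
    each case replays the failing input and checks the observed bad
    output is gone. The default arg pins each lambda to its own entry."""
    return [
        {"input": f["input"],
         "check": (lambda out, bad=f["bad_output"]: bad not in out)}
        for f in failure_log
    ]

# Hypothetical failure record captured from production.
log = [{"input": "refund policy?", "bad_output": "I cannot help"}]
cases = failures_to_cases(log)
ok = cases[0]["check"]("Refunds are issued within 14 days.")
```

Each shipped incident then permanently guards against its own recurrence.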
Multi-Model Eval Harness
Run the same eval suite across Haiku/Sonnet/Opus (or GPT-4/Claude/Gemini) and compare
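The harness core is a cross-product of models and cases. A sketch with stub callables standing in for real API-backed models (`run_suite` and the stubs are illustrative names):

```python
def run_suite(models, cases):
    """Run the same eval cases against every model and report pass rates.
    `models` maps a name to a callable prompt -> answer; a real harness
    would wrap API clients for each model behind the same interface."""
    return {
        name: sum(1 for prompt, check in cases if check(model(prompt))) / len(cases)
        for name, model in models.items()
    }

# Hypothetical stub models standing in for real API-backed ones.
models = {"echo": lambda p: p, "lower": lambda p: p.lower()}
cases = [
    ("hello", lambda out: "hello" in out),
    ("WORLD", lambda out: out == "WORLD"),
]
scores = run_suite(models, cases)   # {"echo": 1.0, "lower": 0.5}
```

Keeping the cases model-agnostic is what makes the per-model pass rates directly comparable.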
Eval Versioning and Regression
Track eval suite changes over time, detect regressions when evals are updated
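Regression detection reduces to diffing two runs. A sketch assuming runs are stored as case-id-to-pass maps (the schema and case names are illustrative):

```python
def find_regressions(old_run, new_run):
    """Cases that passed in the previous suite version but fail now.
    A case absent from the new run counts as failing, so silently
    deleted cases also surface as regressions."""
    return sorted(case for case, passed in old_run.items()
                  if passed and not new_run.get(case, False))

regressions = find_regressions(
    {"refund-tone": True, "pii-redaction": True, "latency": False},
    {"refund-tone": True, "pii-redaction": False},
)
# -> ["pii-redaction"]; "latency" is ignored because it was already failing
```

Persisting each run keyed by a suite version makes this diff cheap to compute on every update.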
Domain-Specific Eval Design
Build evals for specialized verticals (legal, medical, finance, engineering)