Skill Bundles

Read the source. Install what you trust.

Each skill bundle packages a reusable agent behavior — a prompt, supporting files, and evaluation criteria. Browse the public catalog, review the full source, then install a private copy you can edit and experiment with.

Published bundles: 108
Total installs: 0
Average quality: 70/100

Browse bundles

108 published bundles ready to inspect and install

Skill bundle v1.0.0

Eval Contamination Prevention

Ensure training data and eval data don't overlap

0 installs
70/100 quality
Compatibility not listed
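As an illustration of the overlap check this bundle describes, here is a minimal sketch that flags eval items sharing any word n-gram with the training data. It is not taken from the bundle's source; the function names, whitespace tokenization, and default n-gram size are assumptions chosen for brevity.

```python
def ngrams(text, n=8):
    """Return the set of lowercase word n-grams in text (whitespace tokenization)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_contaminated(eval_items, train_docs, n=8):
    """Return the eval items that share at least one n-gram with any training doc.

    Hypothetical helper: a real pipeline would likely normalize punctuation
    and use a more scalable index than an in-memory set.
    """
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    return [item for item in eval_items if ngrams(item, n) & train_grams]


train_docs = ["the quick brown fox jumps over the lazy dog"]
eval_items = ["quick brown fox jumps", "a totally different sentence here"]
flagged = find_contaminated(eval_items, train_docs, n=3)
print(flagged)
```

A small n catches more paraphrase-level overlap but raises false positives on common phrases, so the threshold is a judgment call.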
Skill bundle v1.0.0

Adversarial Eval Generation

Create evals specifically designed to find failure modes and edge cases

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Eval Saturation Detection

Identify when a model has maxed out an eval and needs harder or different benchmarks

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Eval Coverage Analysis

Measure whether your eval suite covers the actual distribution of production tasks

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Build Fuzzy Eval

Design evals for tasks with multiple valid solutions (writing, design, open-ended code)

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Build Deterministic Eval

Create evals with unambiguous, programmatically verifiable correct answers

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Outcome vs. Process Reward Tradeoff

When to reward final results vs. intermediate steps, and how to blend both

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Reward Calibration

Ensure reward functions produce consistent, well-scaled signals across different task types and difficulties

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Human Feedback Collection

Design interfaces and protocols for collecting human preference judgments at scale

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Reward Hacking Detection

Identify when agents exploit reward function loopholes to get high scores without doing the task correctly

0 installs
70/100 quality
Compatibility not listed
Skill bundle v1.0.0

Reward Shaping

Add intermediate reward signals that guide learning without changing the optimal policy

0 installs
70/100 quality
Compatibility not listed
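One well-known way to add intermediate rewards without changing the optimal policy is potential-based shaping, where the bonus is gamma * phi(next_state) - phi(state). The sketch below is illustrative only and is not taken from the bundle's source; the 1-D chain environment, the GOAL constant, and the potential function are assumptions.

```python
GOAL = 5  # hypothetical goal position on a 1-D chain of integer states


def potential(state):
    """Assumed potential: negative distance to the goal (higher is closer)."""
    return -abs(GOAL - state)


def shaped_reward(reward, state, next_state, gamma=0.99):
    """Environment reward plus the potential-based shaping term."""
    return reward + gamma * potential(next_state) - potential(state)


# With gamma = 1.0, a step toward the goal earns +1 and a step away earns -1.
toward = shaped_reward(0.0, 2, 3, gamma=1.0)
away = shaped_reward(0.0, 2, 1, gamma=1.0)

# The shaping terms telescope along a trajectory: with gamma = 1.0 their sum
# depends only on the first and last states, not on the path taken.
trajectory = [0, 1, 2, 3]
bonus_sum = sum(
    shaped_reward(0.0, s, s_next, gamma=1.0)
    for s, s_next in zip(trajectory, trajectory[1:])
)
print(toward, away, bonus_sum)
```

Because the bonus telescopes, it shifts every trajectory's return from a given start state in a way that does not depend on the actions taken, which is why the ranking of policies is preserved.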
Skill bundle v1.0.0

Process Reward Modeling

Score intermediate reasoning steps, not just final outcomes

0 installs
70/100 quality
Compatibility not listed