Read the source. Install what you trust.
Each skill bundle packages a reusable agent behavior: a prompt, supporting files, and evaluation criteria. Browse the public catalog, review the full source, then install a private copy you can edit and experiment with.
Browse bundles
108 published bundles ready to inspect and install
Production Monitoring For RL Agents
Monitor deployed RL-trained agents for performance drift, reward hacking in the wild, and distribution shift
Continual Learning Pipeline
Set up recurring RL training loops that retrain as the workflow or data distribution shifts
RL ROI Measurement
Quantify the business impact (time saved, error reduction, cost) of RL-trained agents
A/B Test RL Policy
Design and run A/B tests comparing an RL-trained agent against a baseline in production
Environment Reset Engineering
Build reliable, fast environment reset mechanisms for episode boundaries
Environment Fidelity Validation
Verify that the sandbox environment faithfully reproduces production behavior
Client Data Onboarding
Ingest, clean, and transform client data into RL-ready formats
Mock Production System
Build a faithful replica of a client's production system (APIs, DB, auth) for safe RL training
Data Readiness Assessment
Evaluate whether the client has sufficient trajectory data, or whether collection needs to happen first
Success Metric Extraction
Work with stakeholders to convert vague goals such as "it should work better" into measurable, scorable outcomes
Baseline Agent Benchmarking
Measure current agent performance on the target workflow before RL intervention
RL Feasibility Assessment
Determine whether a workflow is actually amenable to RL improvement (clear rewards, sufficient volume, safe to explore)