Each skill now has 5-8 evals covering:
- Core framework usage with realistic prompts
- Casual trigger phrase variants
- Sub-type and section-specific coverage
- Boundary tests (skill deferral to related skills)
- Structured assertions for grading
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>