Each skill now has 5-8 evals covering:

- Core framework usage with realistic prompts
- Casual trigger phrase variants
- Sub-type and section-specific coverage
- Boundary tests (skill deferral to related skills)
- Structured assertions for grading

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

| Name |
|---|
| evals.json |