- marketing-psychology eval 4: BJ Fogg assertion did not match
expected_output, which lists Goal-Gradient Effect. Fixed.
- sales-enablement eval 2: "all 6 categories" assertion contradicted
expected_output, which categorizes only the 3 given objections. Fixed.
- ad-creative eval 5: TikTok character limit changed from a hard limit to
a recommendation (80 chars recommended, 100 max) per SKILL.md.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each skill now has 5-8 evals covering:
- Core framework usage with realistic prompts
- Casual trigger phrase variants
- Sub-type and section-specific coverage
- Boundary tests (skill deferral to related skills)
- Structured assertions for grading
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>