Commit graph

2 commits

Author SHA1 Message Date
Corey Haines
926c624d07 fix: address eval review - assertion mismatches and factual error
- marketing-psychology eval 4: BJ Fogg assertion did not match expected_output,
  which lists Goal-Gradient Effect. Fixed.
- sales-enablement eval 2: the "all 6 categories" assertion contradicted
  expected_output, which only categorizes the 3 given objections. Fixed.
- ad-creative eval 5: TikTok limit changed from a hard limit to a
  recommendation (80 chars recommended, 100 max) per SKILL.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 15:51:28 -08:00
Corey Haines
11e9ea811f feat: add evals for all 29 remaining skills (197 total evals across 32 skills)
Each skill now has 5-8 evals covering:
- Core framework usage with realistic prompts
- Casual trigger phrase variants
- Sub-type and section-specific coverage
- Boundary tests (skill deferral to related skills)
- Structured assertions for grading

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 13:37:01 -08:00