Commit graph

3 commits

Author SHA1 Message Date
Corey Haines
926c624d07 fix: address eval review - assertion mismatches and factual error
- marketing-psychology eval 4: the BJ Fogg assertion did not match
  expected_output, which lists the Goal-Gradient Effect. Fixed.
- sales-enablement eval 2: the "all 6 categories" assertion contradicted
  expected_output, which categorizes only the 3 given objections. Fixed.
- ad-creative eval 5: TikTok limit corrected from a hard limit to a
  recommendation (80 chars recommended, 100 max) per SKILL.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 15:51:28 -08:00
Corey Haines
7e7e7a09d8 fix: align eval assertions with SKILL.md content per Codex review
Fixes 5 issues identified by independent Codex review:
- product-marketing-context: match auto-draft workflow, section flexibility
- marketing-psychology: replace phantom models with actual SKILL.md models
- ad-creative: correct RSA pinning guidance to match skill
- free-tool-strategy: boundary test now defers to related skill (page-cro)
- paywall-upgrade-cro: boundary test references only related skills

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 14:07:38 -08:00
Corey Haines
11e9ea811f feat: add evals for all 29 remaining skills (197 total evals across 32 skills)
Each skill now has 5-8 evals covering:
- Core framework usage with realistic prompts
- Casual trigger phrase variants
- Sub-type and section-specific coverage
- Boundary tests (skill deferral to related skills)
- Structured assertions for grading

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 13:37:01 -08:00