fix: address eval review - assertion mismatches and factual error

- marketing-psychology eval 4: BJ Fogg assertion did not match expected_output,
  which lists Goal-Gradient Effect. Assertion now checks Goal-Gradient Effect.
- sales-enablement eval 2: the "all 6 categories" assertion contradicted
  expected_output, which only categorizes the 3 given objections. Assertion
  now checks the skill's framework (competitor, budget, need/timing).
- ad-creative eval 5: TikTok limit was stated as a hard 80-char limit;
  corrected to 80 chars recommended, 100 max, per SKILL.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Corey Haines 2026-03-04 15:51:28 -08:00
parent 7e7e7a09d8
commit 926c624d07
3 changed files with 3 additions and 3 deletions


@@ -63,7 +63,7 @@
 {
 "id": 5,
 "prompt": "I need to generate a big batch of ad variations for a multi-platform campaign launching next week. We're a meal delivery service targeting busy professionals. Need ads for Google, Meta, and TikTok.",
-"expected_output": "Should activate the batch generation workflow. Should generate creative for all three platforms respecting each platform's character limits: Google RSA (30/90), Meta (125/40/30), TikTok (80 chars). Should identify 3-5 angles that work across platforms (convenience, health, time savings, variety, cost vs eating out). Should generate variations per angle per platform. Should note platform-specific creative considerations (TikTok needs video concepts, not just text). Should organize output clearly by platform.",
+"expected_output": "Should activate the batch generation workflow. Should generate creative for all three platforms respecting each platform's character limits: Google RSA (30/90), Meta (125/40/30), TikTok (80 chars recommended, 100 max). Should identify 3-5 angles that work across platforms (convenience, health, time savings, variety, cost vs eating out). Should generate variations per angle per platform. Should note platform-specific creative considerations (TikTok needs video concepts, not just text). Should organize output clearly by platform.",
 "assertions": [
 "Activates batch generation workflow",
 "Generates for all three platforms",


@@ -49,7 +49,7 @@
 "prompt": "I'm designing an onboarding flow and want to use behavioral psychology to increase activation. What models should I apply?",
 "expected_output": "Should apply design and behavioral models from the skill's taxonomy: Goal-Gradient Effect (motivation increases near goal), Hick's Law (reduce choices), IKEA Effect (let users build something), Endowment Effect (let them experience ownership), Zeigarnik Effect (incomplete tasks drive completion), Commitment & Consistency (small asks first). Should explain how each applies to onboarding specifically. Should provide actionable recommendations for each model.",
 "assertions": [
-"Applies BJ Fogg Behavior Model",
+"Applies Goal-Gradient Effect",
 "Applies Hick's Law",
 "Applies IKEA Effect or Endowment Effect",
 "Applies Zeigarnik Effect or commitment principles",


@@ -26,7 +26,7 @@
 "Provides structured response for each (acknowledge, reframe, evidence, bridge)",
 "Provides 2-3 response variations per objection",
 "Organizes for quick reference during calls",
-"Addresses all 6 objection categories from the skill"
+"Categorizes objections using the skill's framework (competitor, budget, need/timing)"
 ],
 "files": []
 },