fix: address eval review - assertion mismatches and factual error

- marketing-psychology eval 4: the BJ Fogg Behavior Model assertion did
  not match expected_output, which lists Goal-Gradient Effect instead.
  Replaced it with a Goal-Gradient Effect assertion.
- sales-enablement eval 2: the "all 6 categories" assertion contradicted
  expected_output, which only categorizes the 3 given objections.
  Replaced it with an assertion that the objections are categorized
  using the skill's framework.
- ad-creative eval 5: the TikTok 80-char figure was stated as a hard
  limit; corrected per SKILL.md to 80 chars recommended, 100 max.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Corey Haines 2026-03-04 15:51:28 -08:00
parent 7e7e7a09d8
commit 926c624d07
3 changed files with 3 additions and 3 deletions
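The ad-creative fix hinges on the difference between a recommended length and a hard maximum. Below is a minimal Python sketch of a length check that keeps the two apart, assuming the limits listed in the eval's expected_output. The platform and field keys (`google_rsa`, `primary_text`, `ad_text`, etc.), the `Limit` class and the `check_length` helper are hypothetical names, not part of the skill or of any ad platform's API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    recommended: int  # soft target: exceeding it is only a warning
    maximum: int      # hard cap: exceeding it is an error


# Limits as listed in the eval's expected_output. The (platform, field) keys
# are hypothetical labels. Where only one number is given, it is used as both
# the recommended and the maximum length.
LIMITS = {
    ("google_rsa", "headline"): Limit(30, 30),
    ("google_rsa", "description"): Limit(90, 90),
    ("meta", "primary_text"): Limit(125, 125),
    ("meta", "headline"): Limit(40, 40),
    ("meta", "description"): Limit(30, 30),
    ("tiktok", "ad_text"): Limit(80, 100),
}


def check_length(platform: str, field: str, text: str) -> str:
    """Return "ok", "warn" (over recommended) or "error" (over maximum).

    Raises KeyError for an unknown (platform, field) pair. len() counts
    Unicode code points; each platform's own counting rules may differ.
    """
    limit = LIMITS[(platform, field)]
    n = len(text)
    if n > limit.maximum:
        return "error"
    if n > limit.recommended:
        return "warn"
    return "ok"


if __name__ == "__main__":
    print(check_length("tiktok", "ad_text", "x" * 80))   # → ok
    print(check_length("tiktok", "ad_text", "x" * 90))   # → warn
    print(check_length("tiktok", "ad_text", "x" * 101))  # → error
```

Under this sketch an 85-character TikTok line comes back as a warning rather than a failure, which matches the corrected expected_output (80 chars recommended, 100 max).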

@@ -63,7 +63,7 @@
{
"id": 5,
"prompt": "I need to generate a big batch of ad variations for a multi-platform campaign launching next week. We're a meal delivery service targeting busy professionals. Need ads for Google, Meta, and TikTok.",
-"expected_output": "Should activate the batch generation workflow. Should generate creative for all three platforms respecting each platform's character limits: Google RSA (30/90), Meta (125/40/30), TikTok (80 chars). Should identify 3-5 angles that work across platforms (convenience, health, time savings, variety, cost vs eating out). Should generate variations per angle per platform. Should note platform-specific creative considerations (TikTok needs video concepts, not just text). Should organize output clearly by platform.",
+"expected_output": "Should activate the batch generation workflow. Should generate creative for all three platforms respecting each platform's character limits: Google RSA (30/90), Meta (125/40/30), TikTok (80 chars recommended, 100 max). Should identify 3-5 angles that work across platforms (convenience, health, time savings, variety, cost vs eating out). Should generate variations per angle per platform. Should note platform-specific creative considerations (TikTok needs video concepts, not just text). Should organize output clearly by platform.",
"assertions": [
"Activates batch generation workflow",
"Generates for all three platforms",

@@ -49,7 +49,7 @@
"prompt": "I'm designing an onboarding flow and want to use behavioral psychology to increase activation. What models should I apply?",
"expected_output": "Should apply design and behavioral models from the skill's taxonomy: Goal-Gradient Effect (motivation increases near goal), Hick's Law (reduce choices), IKEA Effect (let users build something), Endowment Effect (let them experience ownership), Zeigarnik Effect (incomplete tasks drive completion), Commitment & Consistency (small asks first). Should explain how each applies to onboarding specifically. Should provide actionable recommendations for each model.",
"assertions": [
-"Applies BJ Fogg Behavior Model",
+"Applies Goal-Gradient Effect",
"Applies Hick's Law",
"Applies IKEA Effect or Endowment Effect",
"Applies Zeigarnik Effect or commitment principles",

@@ -26,7 +26,7 @@
"Provides structured response for each (acknowledge, reframe, evidence, bridge)",
"Provides 2-3 response variations per objection",
"Organizes for quick reference during calls",
-"Addresses all 6 objection categories from the skill"
+"Categorizes objections using the skill's framework (competitor, budget, need/timing)"
],
"files": []
},
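The marketing-psychology fix is the kind of mismatch a simple consistency check can catch: an assertion that names a model the expected_output never mentions. Below is a minimal Python sketch, assuming a hand-maintained list of model names; `KNOWN_MODELS` and `unsupported_assertions` are hypothetical, not part of the eval tooling, and the eval 4 strings are abridged from the diff:

```python
# Hypothetical, hand-maintained list of model names to look for; it is not
# the skill's full taxonomy.
KNOWN_MODELS = [
    "BJ Fogg Behavior Model",
    "Goal-Gradient Effect",
    "Hick's Law",
    "IKEA Effect",
    "Endowment Effect",
    "Zeigarnik Effect",
]


def unsupported_assertions(expected_output: str, assertions: list[str]) -> list[str]:
    """Return assertions that name a known model absent from expected_output."""
    flagged = []
    for assertion in assertions:
        for model in KNOWN_MODELS:
            # Exact, case-sensitive substring match on the model name.
            if model in assertion and model not in expected_output:
                flagged.append(assertion)
                break  # one missing model is enough to flag this assertion
    return flagged


# Eval 4 strings, abridged from the diff.
EXPECTED_OUTPUT = (
    "Should apply design and behavioral models from the skill's taxonomy: "
    "Goal-Gradient Effect (motivation increases near goal), Hick's Law "
    "(reduce choices), IKEA Effect (let users build something), Endowment "
    "Effect (let them experience ownership), Zeigarnik Effect (incomplete "
    "tasks drive completion), Commitment & Consistency (small asks first)."
)
OLD_ASSERTIONS = [
    "Applies BJ Fogg Behavior Model",
    "Applies Hick's Law",
    "Applies IKEA Effect or Endowment Effect",
    "Applies Zeigarnik Effect or commitment principles",
]
NEW_ASSERTIONS = ["Applies Goal-Gradient Effect"] + OLD_ASSERTIONS[1:]

if __name__ == "__main__":
    print(unsupported_assertions(EXPECTED_OUTPUT, OLD_ASSERTIONS))
    # → ['Applies BJ Fogg Behavior Model']
    print(unsupported_assertions(EXPECTED_OUTPUT, NEW_ASSERTIONS))
    # → []
```

Plain substring matching only catches named-model mismatches; a count mismatch like the sales-enablement "all 6 categories" assertion would need a separate check.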