fix: address eval review - assertion mismatches and factual error

- marketing-psychology eval 4: BJ Fogg assertion did not match expected_output which lists Goal-Gradient Effect. Fixed.
- sales-enablement eval 2: all 6 categories assertion contradicted expected_output which only categorizes the 3 given objections. Fixed.
- ad-creative eval 5: TikTok hard limit corrected to recommended (80 chars recommended, 100 max) per SKILL.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent 7e7e7a09d8
commit 926c624d07
3 changed files with 3 additions and 3 deletions

@@ -63,7 +63,7 @@
 {
 "id": 5,
 "prompt": "I need to generate a big batch of ad variations for a multi-platform campaign launching next week. We're a meal delivery service targeting busy professionals. Need ads for Google, Meta, and TikTok.",
-"expected_output": "Should activate the batch generation workflow. Should generate creative for all three platforms respecting each platform's character limits: Google RSA (30/90), Meta (125/40/30), TikTok (≤80 chars). Should identify 3-5 angles that work across platforms (convenience, health, time savings, variety, cost vs eating out). Should generate variations per angle per platform. Should note platform-specific creative considerations (TikTok needs video concepts, not just text). Should organize output clearly by platform.",
+"expected_output": "Should activate the batch generation workflow. Should generate creative for all three platforms respecting each platform's character limits: Google RSA (30/90), Meta (125/40/30), TikTok (80 chars recommended, 100 max). Should identify 3-5 angles that work across platforms (convenience, health, time savings, variety, cost vs eating out). Should generate variations per angle per platform. Should note platform-specific creative considerations (TikTok needs video concepts, not just text). Should organize output clearly by platform.",
 "assertions": [
 "Activates batch generation workflow",
 "Generates for all three platforms",
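
The change above separates a recommended length from a hard maximum for TikTok ad text. A minimal sketch of how a checker might treat the two thresholds differently, assuming the 80/100 figures quoted in expected_output; the function name, constant names, and return labels are hypothetical and not part of the skill, and `len()` counts Unicode code points, which may differ from how the platform counts characters:

```python
# Thresholds taken from the expected_output text in the hunk above.
# All names here are illustrative, not part of the skill or its evals.
TIKTOK_RECOMMENDED = 80
TIKTOK_MAX = 100


def classify_tiktok_text(text: str) -> str:
    """Classify ad text against the recommended and maximum lengths."""
    n = len(text)  # code points; the platform's own counting may differ
    if n > TIKTOK_MAX:
        return "over_max"          # would be rejected outright
    if n > TIKTOK_RECOMMENDED:
        return "over_recommended"  # allowed, but may be truncated in display
    return "ok"
```

Under this reading, text of 81 to 100 characters is a warning rather than a failure, which is what the old "≤80 chars" wording did not allow.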

@@ -49,7 +49,7 @@
 "prompt": "I'm designing an onboarding flow and want to use behavioral psychology to increase activation. What models should I apply?",
 "expected_output": "Should apply design and behavioral models from the skill's taxonomy: Goal-Gradient Effect (motivation increases near goal), Hick's Law (reduce choices), IKEA Effect (let users build something), Endowment Effect (let them experience ownership), Zeigarnik Effect (incomplete tasks drive completion), Commitment & Consistency (small asks first). Should explain how each applies to onboarding specifically. Should provide actionable recommendations for each model.",
 "assertions": [
-"Applies BJ Fogg Behavior Model",
+"Applies Goal-Gradient Effect",
 "Applies Hick's Law",
 "Applies IKEA Effect or Endowment Effect",
 "Applies Zeigarnik Effect or commitment principles",
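
The mismatch fixed in this hunk (an assertion naming a model that expected_output never mentions) is the kind of thing a crude lint could catch. A rough sketch, assuming assertions follow the "Applies X or Y" pattern seen here; the function name and the substring heuristic are assumptions for illustration, not an existing tool in the repository:

```python
import re
from typing import List


def unsupported_assertions(expected_output: str, assertions: List[str]) -> List[str]:
    """Return assertions whose named models never appear in expected_output.

    Heuristic only: strips a leading "Applies", splits "A or B" alternatives,
    and does a case-insensitive substring check for each alternative.
    """
    haystack = expected_output.lower()
    flagged = []
    for assertion in assertions:
        body = re.sub(r"^applies\s+", "", assertion.strip(), flags=re.IGNORECASE)
        alternatives = [part.strip().lower() for part in re.split(r"\s+or\s+", body)]
        # An assertion is considered supported if any alternative is mentioned.
        if not any(alt in haystack for alt in alternatives):
            flagged.append(assertion)
    return flagged
```

Run against the pre-fix assertions for this eval, a check like this would flag only the BJ Fogg line, since every other assertion names at least one model listed in expected_output.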

@@ -26,7 +26,7 @@
 "Provides structured response for each (acknowledge, reframe, evidence, bridge)",
 "Provides 2-3 response variations per objection",
 "Organizes for quick reference during calls",
-"Addresses all 6 objection categories from the skill"
+"Categorizes objections using the skill's framework (competitor, budget, need/timing)"
 ],
 "files": []
 },