---
name: ab-test-setup
description: When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "should I test this," "which version is better," "test two versions," "statistical significance," or "how long should I run this test." Use this whenever someone is comparing two approaches and wants to measure which performs better. For tracking implementation, see analytics-tracking. For page-level conversion optimization, see page-cro.
---
# A/B Test Setup
You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.
## Initial Assessment

Check for HVAC marketing context first:
If .agents/hvac-marketing-context.md exists (or .claude/hvac-marketing-context.md in older setups), read it before asking questions. Use that context and only ask for information not already covered or specific to this task.
Before designing a test, understand:
- Test Context - What are you trying to improve? What change are you considering?
- Current State - Baseline conversion rate? Current traffic volume?
- Constraints - Technical complexity? Timeline? Tools available?
## Core Principles

### 1. Start with a Hypothesis
- Not just "let's see what happens"
- Specific prediction of outcome
- Based on reasoning or data
### 2. Test One Thing
- Single variable per test
- Otherwise you don't know what worked
### 3. Statistical Rigor
- Pre-determine sample size
- Don't peek and stop early
- Commit to the methodology
### 4. Measure What Matters
- Primary metric tied to business value
- Secondary metrics for context
- Guardrail metrics to prevent harm
## Hypothesis Framework

### Structure

> Because [observation/data], we believe [change] will cause [expected outcome] for [audience]. We'll know this is true when [metrics].
### Example

**Weak:** "Changing the button color might increase clicks."

**Strong:** "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and more visible will increase quote request submissions by 15%+ for new visitors. We'll measure quote request conversion rate from page view to form submission."
## Test Types
| Type | Description | Traffic Needed |
|---|---|---|
| A/B | Two versions, single change | Moderate |
| A/B/n | Multiple variants | Higher |
| MVT | Multiple changes in combinations | Very high |
| Split URL | Different URLs for variants | Moderate |
## Sample Size

### Quick Reference
| Baseline | 10% Lift | 20% Lift | 50% Lift |
|---|---|---|---|
| 1% | 150k/variant | 39k/variant | 6k/variant |
| 3% | 47k/variant | 12k/variant | 2k/variant |
| 5% | 27k/variant | 7k/variant | 1.2k/variant |
| 10% | 12k/variant | 3k/variant | 550/variant |
Figures are rough, order-of-magnitude estimates (typically assuming ~95% significance and 80% power); lifts are relative. Use a sample size calculator (e.g., Evan Miller's) to get exact numbers for your baseline and minimum detectable effect.

For an HVAC example: if your current quote request rate is 3% and you want to detect a 20% relative lift (3% → 3.6%), you need roughly 12,000 visitors per variant, or 24,000 total.
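The quick-reference table can be approximated with a standard two-proportion power calculation. A minimal Python sketch — the `sample_size_per_variant` helper and its defaults are illustrative, not part of this skill's tooling, and different calculators will give somewhat different numbers:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(power)            # desired power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# 3% baseline quote request rate, 20% relative lift (3% -> 3.6%)
print(sample_size_per_variant(0.03, 0.20))  # roughly 14,000 with this formula
```

Bigger lifts need far less traffic, which is why the table's 50%-lift column is so much smaller than the 10%-lift column.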
## Metrics Selection

### Primary Metric
- Single metric that matters most
- Directly tied to hypothesis
- What you'll use to call the test
### Secondary Metrics
- Support primary metric interpretation
- Explain why/how the change worked
### Guardrail Metrics
- Things that shouldn't get worse
- Stop test if significantly negative
### Example: Quote Request CTA Test
- Primary: Quote request submission rate
- Secondary: Click-through rate on CTA, form abandonment rate
- Guardrail: Page bounce rate (shouldn't go up), phone calls (shouldn't decrease)
## Designing Variants

### What to Vary
| Category | Examples |
|---|---|
| Headlines/Copy | Message angle, value prop, urgency, tone |
| CTA | Button copy, size, placement, color |
| Visual Design | Layout, hierarchy, images |
| Form | Fields required, button placement, form length |
| Timing | When CTA appears (immediately vs. after scroll) |
### Best Practices
- Single, meaningful change
- Bold enough to make a difference
- True to the hypothesis
## HVAC-Specific Test Ideas

### Example 1: CTA Copy Test
- Control: "Get Free Quote"
- Variant: "Schedule Service Today"
- Hypothesis: Specific action language increases urgency and form completion
### Example 2: Urgency Test
- Control: Standard headline
- Variant: "Emergency AC Service Available Now"
- Hypothesis: Urgency language increases quote requests for emergency services
### Example 3: Form Length Test
- Control: 5-field quote form (Name, Phone, Service type, Issue, Address)
- Variant: 3-field form (Name, Phone, Service type)
- Hypothesis: Fewer required fields increase form completion rate
## Traffic Allocation
| Approach | Split | When to Use |
|---|---|---|
| Standard | 50/50 | Default for A/B |
| Conservative | 90/10, 80/20 | Limit risk of bad variant |
| Ramping | Start small, increase | Technical risk mitigation |
Considerations:
- Consistency: Users see same variant on return
- Balanced exposure across time of day/week
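The consistency requirement is usually met with deterministic hash-based bucketing: hash the visitor ID and experiment name so the same visitor always lands in the same variant, and uneven splits (90/10, 80/20) fall out of the weights. A sketch with hypothetical names (`assign_variant`, the experiment keys):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, weights=None) -> str:
    """Deterministically bucket a user: same user + experiment -> same variant."""
    weights = weights or {"control": 0.5, "variant": 0.5}
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 10_000  # uniform in [0, 1)
    cumulative = 0.0
    for name, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return name
    return name  # guard against float rounding at the top of the range

# Conservative 90/10 split; repeated calls always return the same answer
print(assign_variant("visitor-123", "cta-copy", {"control": 0.9, "variant": 0.1}))
```

Because assignment depends only on the hash, it needs no stored state and works identically client- or server-side.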
## Implementation

### Client-Side
- JavaScript modifies page after load
- Quick to implement, can cause flicker
- Tools: PostHog, Optimizely, VWO
### Server-Side
- Variant determined before render
- No flicker, requires dev work
- Tools: PostHog, LaunchDarkly, Split
## Running the Test

### Pre-Launch Checklist
- Hypothesis documented
- Primary metric defined
- Sample size calculated
- Variants implemented correctly
- Tracking verified
- QA completed on all variants
### During the Test

**Do:**
- Monitor for technical issues
- Check segment quality
- Document external factors
**Don't:**
- Peek at results and stop early
- Make changes to variants
- Add traffic from new sources
### The Peeking Problem
Looking at results before reaching sample size and stopping early leads to false positives and wrong decisions. Pre-commit to sample size and trust the process.
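A quick A/A simulation makes the peeking problem concrete: with no real difference between variants, checking for significance at every interim look and stopping at the first p < 0.05 produces far more false "winners" than a single fixed-horizon analysis. This is an illustrative sketch; the trial counts, batch sizes, and 5% conversion rate are arbitrary:

```python
import random
from math import sqrt
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test p-value."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled == 0 or pooled == 1:
        return 1.0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(42)
TRIALS, CHECKS, BATCH, RATE = 400, 10, 200, 0.05  # A/A test: both arms convert at 5%
peek_hits = final_hits = 0
for _ in range(TRIALS):
    conv_a = conv_b = n = 0
    peeked = False
    for _ in range(CHECKS):
        conv_a += sum(random.random() < RATE for _ in range(BATCH))
        conv_b += sum(random.random() < RATE for _ in range(BATCH))
        n += BATCH
        if p_value(conv_a, n, conv_b, n) < 0.05:
            peeked = True  # an early peek would have declared a false "winner"
    peek_hits += peeked
    final_hits += p_value(conv_a, n, conv_b, n) < 0.05

print("stop-at-first-peek false positive rate:", peek_hits / TRIALS)  # well above 5%
print("fixed-horizon false positive rate:", final_hits / TRIALS)      # near 5%
```

The gap between the two rates is exactly the cost of peeking: same data, same test, but the stopping rule changes the error rate.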
## Analyzing Results

### Statistical Significance
- 95% confidence = p-value < 0.05
- Means less than a 5% chance of seeing a difference this large if the variants truly performed the same
- Not a guarantee—just a threshold
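For a finished test, significance can be checked with a standard two-proportion z-test. A minimal sketch — the `significance` helper is illustrative, and a dedicated stats tool is fine too:

```python
from math import sqrt
from statistics import NormalDist

def significance(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test: p-value for 'no real difference'."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Control 485/15,000 vs. variant 570/15,000
z, p = significance(485, 15_000, 570, 15_000)
print(round(p, 4))  # < 0.05, so significant at 95% confidence
```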
### Analysis Checklist
- Reach sample size? If not, result is preliminary
- Statistically significant? Check confidence intervals
- Effect size meaningful? Compare to MDE, project impact
- Secondary metrics consistent? Support the primary?
- Guardrail concerns? Anything get worse?
- Segment differences? Mobile vs. desktop? New vs. returning?
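The "check confidence intervals" step can be done with a simple Wald interval on the absolute difference in conversion rates. A sketch — `diff_confidence_interval` is a hypothetical helper, and the counts are sample figures:

```python
from math import sqrt
from statistics import NormalDist

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald interval for the absolute difference in conversion rates (B - A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

low, high = diff_confidence_interval(485, 15_000, 570, 15_000)
# An interval that excludes zero means the lift is distinguishable from no effect
print(f"{low:.4f} to {high:.4f}")
```

Reporting the interval (not just the p-value) also shows how large the lift could plausibly be, which matters when projecting business impact.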
### Interpreting Results
| Result | Conclusion |
|---|---|
| Significant winner | Implement variant |
| Significant loser | Keep control, learn why |
| No significant difference | Need more traffic or bolder test |
| Mixed signals | Dig deeper, maybe segment |
## Documentation
Document every test with:
- Hypothesis
- Variants (with screenshots)
- Results (sample, metrics, significance, confidence intervals)
- Decision and learnings
- Next steps
Example:

    ## Test: CTA Copy for Quote Requests

    **Hypothesis:** "Schedule Service Today" (action-specific) will increase quote form
    submissions more than "Get Free Quote" (generic).

    **Duration:** Jan 15-29, 2024
    **Sample Size:** 15,000 per variant

    **Results:**
    - Control: 3.2% conversion rate (485/15,000)
    - Variant: 3.8% conversion rate (570/15,000)
    - Lift: +17.5%
    - Confidence: 99.6% (p = 0.004, one-sided)

    **Decision:** Implement variant. Action-specific CTA performed significantly better.
    **Learning:** Specificity drives urgency. Test "Call Now" vs "Schedule Today" next.
## Common Mistakes

### Test Design
- Testing too small a change (undetectable)
- Testing too many things (can't isolate)
- No clear hypothesis
### Execution
- Stopping early
- Changing things mid-test
- Not checking implementation
### Analysis
- Ignoring confidence intervals
- Cherry-picking segments
- Over-interpreting inconclusive results
## Task-Specific Questions
- What's your current conversion rate?
- How much traffic does this page get?
- What change are you considering and why?
- What's the smallest improvement worth detecting?
- What tools do you have for testing?
- Have you tested this area before?
## Related Skills
- page-cro: For generating test ideas based on CRO principles
- analytics-tracking: For setting up test measurement
- hvac-copywriting: For creating variant copy