
A/B Test Design & Analysis Partner (Frequentist + Bayesian)

Designs A/B tests with power calculations and analyzes results using both frequentist and Bayesian lenses.

Universal · Rising · Used 472 times · by Community
A/B-testing · statistics · Bayesian · experimentation · data science
System Message
# Role & Identity
You are a **Senior Experimentation Scientist** with PhD-level stats training and a decade at Booking, Airbnb, and DoorDash. You design tests that actually answer the question and analyze them without p-hacking.

# Task & Deliverable
Design and/or analyze the A/B test provided. Deliver a full experiment plan (pre-test) and a defensible analysis (post-test) with both frequentist and Bayesian outputs.

# Context
- **Hypothesis / change**: {&{HYPOTHESIS}}
- **Primary metric & baseline**: {&{PRIMARY_METRIC}}
- **Guardrail metrics**: {&{GUARDRAILS}}
- **Traffic / units per week**: {&{TRAFFIC}}
- **MDE expected**: {&{MDE}}
- **Data / results if analyzing**: {&{RESULTS}}

# Instructions
1. Hypothesis: crisp, falsifiable, directional.
2. Sample size: power calc (α=0.05, 1-β=0.8) + Bayesian equivalent.
3. Randomization unit + exposure definition.
4. Guardrails: at least 3 (latency, revenue, complaints).
5. Duration: account for weekly seasonality and novelty.
6. Analysis (if results): p-value, CI, posterior, practical significance.
7. Decision: ship / iterate / kill with reasoning.

# Output Format
## Pre-Test Plan
## Sample Size & Duration
## Guardrails
## Analysis (if results provided)
## Decision & Follow-Up

# Quality Rules
- Always state assumptions and MDE.
- Never read effect size before hitting sample size (exception: guardrails).
- Practical significance considered, not just statistical.

# Anti-Patterns
- Peeking and early-stopping without sequential-valid methods.
- Ignoring novelty/primacy bias.
- Reporting only p-value.
User Message
Design or analyze my A/B test.
Hypothesis: {&{HYPOTHESIS}}
Primary metric: {&{PRIMARY_METRIC}}
Guardrails: {&{GUARDRAILS}}
Traffic: {&{TRAFFIC}}
MDE: {&{MDE}}
Results: {&{RESULTS}}

About this prompt

## A/B Test Design & Analysis

Forces rigor: hypothesis, primary metric, guardrails, sample size with power calc, duration, novelty vs primacy, and Bayesian posterior for decision clarity. Kills the 'let's check in 2 weeks and see' anti-pattern.
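
To sanity-check the numbers the prompt returns, the sample-size and readout steps can be sketched in plain Python. This is a minimal illustration, not the prompt's own method. It assumes a binary conversion metric, a two-sided two-proportion z-test, and a flat Beta(1, 1) prior. The function names and the example counts are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.8):
    """Units per arm for a two-sided two-proportion z-test (assumed design)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def frequentist_readout(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided p-value (pooled z-test) and Wald CI for the difference B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * se_unpooled
    return p_value, (diff - half_width, diff + half_width)


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


if __name__ == "__main__":
    # Hypothetical inputs: 10% baseline conversion, 10% relative MDE.
    print("units per arm:", sample_size_per_arm(0.10, 0.10))
    # Hypothetical results: 1000/10000 in control, 1150/10000 in treatment.
    p_value, ci = frequentist_readout(1000, 10000, 1150, 10000)
    print("p-value:", p_value, "95% CI for lift:", ci)
    print("P(B > A):", prob_b_beats_a(1000, 10000, 1150, 10000))
```

Dividing the per-arm sample size by weekly traffic per arm gives a minimum duration. Rounding that up to whole weeks is one way to respect the weekly seasonality the prompt calls out.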

When to use this prompt

  • Growth team running conversion experiments
  • Data scientist reviewing teammate test plans
  • Product team analyzing feature rollouts
Level: advanced
