Meta-prompting: use the model to write better prompts
Meta-prompting is the practice of using the model itself to draft, critique, or refine prompts. Learn three patterns (generation, critique, and refinement) and when each is worth the round-trip.
You've been staring at a prompt for an hour. Output keeps drifting in the same way. You add a rule. Output gets weirder. You remove the rule. Different problem. You consider quitting.
Try this instead: open a fresh chat, paste your current prompt and three failing outputs, and ask the model: "Diagnose what's wrong with this prompt and propose a fixed version that handles these failure cases without breaking the working ones."
What comes back is often better than what you'd produce on your fifth iteration. Not always — domain knowledge is irreplaceable — but for prompt mechanics (structure, constraints, examples, formatting), an LLM given a clear goal will routinely produce something tighter than a tired human first draft.
Meta-prompting is the practice of using the model itself to draft, critique, or iteratively improve prompts. Three patterns earn their keep in production workflows. This guide covers when each works, when each falls short, and how to use them without producing model-flavored prompt slop.
The whole idea in one line: describe your goal or paste your failing prompt, ask the model to draft or diagnose, and treat whatever comes back as a draft to review and test, not a finished answer.
The mental model: the model as your prompt-engineering assistant
Modern LLMs have seen millions of instruction/response pairs during training. They have implicit knowledge of what makes a prompt work — structure, clear constraints, worked examples, output format specifications. They're not better than a senior prompt engineer at meta-thinking about prompts; they're competitive, often faster, and useful as a sounding board.
The trick is treating the meta-prompt's output as a draft, not the answer. The model produces something that looks polished; you review with taste, add domain context, and test on real inputs before shipping. The speed-up comes from skipping the blank-page-paralysis phase, not from surrendering judgment.
Pattern 1: prompt generation from a goal
The simplest meta-prompt: describe what you need, ask for a finished prompt. The model handles structure, constraints, and formatting; you handle review.
I need a prompt that does the following:
{{describe_the_task}}
Constraints I care about:
- Output format: {{format}}
- Tone: {{tone}}
- Length target: {{length}}
- Edge cases the prompt must handle: {{edge_cases}}
Please produce:
1. A complete prompt that meets these requirements.
2. Three example inputs of different shapes.
3. The expected output for each example.
4. A 3-bullet checklist I should verify against.
Format the final prompt as a code block I can copy directly.

The output is rarely your final prompt, but it's an excellent first draft to react to. It's faster than starting from a blank page, and the example inputs and outputs make it easier to spot weaknesses.
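If you use this pattern repeatedly, it's worth keeping the meta-prompt as a template in code. A minimal sketch, assuming a hypothetical `call_model(prompt: str) -> str` wrapper around whatever LLM client you use; everything else is plain string formatting:

```python
# Sketch: fill the generation meta-prompt from a task description and send it
# to the model. `call_model` is a hypothetical stand-in for your LLM client.

GENERATION_META_PROMPT = """\
I need a prompt that does the following:
{task}

Constraints I care about:
- Output format: {output_format}
- Tone: {tone}
- Length target: {length}
- Edge cases the prompt must handle: {edge_cases}

Please produce:
1. A complete prompt that meets these requirements.
2. Three example inputs of different shapes.
3. The expected output for each example.
4. A 3-bullet checklist I should verify against.

Format the final prompt as a code block I can copy directly.
"""

def generate_prompt_draft(task, output_format, tone, length, edge_cases, call_model):
    """Return the model's draft prompt. Treat it as a draft: review, then test."""
    meta_prompt = GENERATION_META_PROMPT.format(
        task=task,
        output_format=output_format,
        tone=tone,
        length=length,
        edge_cases=edge_cases,
    )
    return call_model(meta_prompt)
```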
Pattern 2: prompt critique
Have a prompt that mostly works but needs hardening? Hand it to a separate model session and ask for a ruthless review.
Review the prompt below as if you were a senior prompt engineer.
Find weaknesses, ambiguities, edge cases it doesn't handle, and
unstated assumptions.
Be specific. Don't say "could be clearer" — quote the line and
explain the issue.
For each problem:
- The exact phrase or section
- Why it's a problem
- A concrete fix
Prompt to review:
"""
{{prompt_under_review}}
"""
Output as a numbered list of issues, severity-ranked.

The critique pattern is most useful before promoting a prompt to production. Pair it with A/B testing to verify the "improvements" actually improve outputs.
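If you script the critique step, the detail that matters is that the reviewer gets a fresh, single-turn session: the only message it sees is the review request, never the conversation that produced the prompt. A minimal sketch, again assuming a hypothetical `call_model(prompt: str) -> str` single-turn wrapper:

```python
# Sketch: fresh-session critique. The critic sees exactly one message, so it is
# not anchored on (or invested in) the conversation where the prompt was written.
# `call_model` is a hypothetical single-turn wrapper around your LLM client.

CRITIQUE_META_PROMPT = (
    "Review the prompt below as if you were a senior prompt engineer. "
    "Find weaknesses, ambiguities, edge cases it doesn't handle, and unstated assumptions. "
    "For each problem give the exact phrase, why it's a problem, and a concrete fix. "
    "Output as a numbered list of issues, severity-ranked.\n\n"
    'Prompt to review:\n"""\n{prompt_under_review}\n"""'
)

def critique_prompt(prompt_under_review: str, call_model) -> str:
    """Return a severity-ranked issue list for the prompt under review."""
    return call_model(CRITIQUE_META_PROMPT.format(prompt_under_review=prompt_under_review))
```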
Pattern 3: refine from real failures
The strongest pattern of the three: feed the model a prompt plus 3-5 examples where it produced bad output, and ask for a refined prompt that handles those cases.
The prompt below is producing bad outputs on certain inputs.
Diagnose the failure, then propose a revised prompt that handles
the failing cases without breaking the working cases.
Original prompt:
"""
{{original_prompt}}
"""
Failing cases (input → bad output → what good would have looked like):
1. Input: {{input_1}}
Got: {{bad_output_1}}
Wanted: {{good_output_1}}
2. Input: {{input_2}}
Got: {{bad_output_2}}
Wanted: {{good_output_2}}
[…more cases…]
Output:
- Diagnosis: what's wrong with the original prompt
- Revised prompt (full version, copy-paste ready)
- Why the revision should fix the failures

This is the most effective pattern because it forces the model to learn from real, specific failures instead of imagined ones. Pair it with a real evaluation set so you can verify the revision generalizes.
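In a workflow where failures are logged, this meta-prompt is easy to assemble mechanically. A sketch, assuming failures are collected as (input, bad output, wanted output) triples and `call_model` is a hypothetical wrapper around your LLM client:

```python
# Sketch: build the refinement meta-prompt from logged failures and ask the
# model for a revised prompt. All helper names here are illustrative.

def build_refinement_prompt(original_prompt: str, failures: list[tuple[str, str, str]]) -> str:
    """failures: list of (input, bad_output, wanted_output) triples."""
    cases = "\n".join(
        f"{i}. Input: {inp}\n   Got: {got}\n   Wanted: {wanted}"
        for i, (inp, got, wanted) in enumerate(failures, start=1)
    )
    return (
        "The prompt below is producing bad outputs on certain inputs.\n"
        "Diagnose the failure, then propose a revised prompt that handles\n"
        "the failing cases without breaking the working cases.\n\n"
        f'Original prompt:\n"""\n{original_prompt}\n"""\n\n'
        "Failing cases (input -> bad output -> what good would have looked like):\n"
        f"{cases}\n\n"
        "Output:\n"
        "- Diagnosis: what's wrong with the original prompt\n"
        "- Revised prompt (full version, copy-paste ready)\n"
        "- Why the revision should fix the failures"
    )

def refine_prompt(original_prompt: str, failures: list[tuple[str, str, str]], call_model) -> str:
    """Return the model's diagnosis and revised prompt; verify it against an eval set."""
    return call_model(build_refinement_prompt(original_prompt, failures))
```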
When meta-prompting earns its keep
Meta-prompting use cases
| If your situation is… | Reach for… | Why |
|---|---|---|
| Starting a prompt from scratch in an unfamiliar domain | Pattern 1 (generation) | Beats blank-page paralysis; the draft is reviewable |
| Hardening a working prompt before production | Pattern 2 (critique) | Fresh-session review catches issues you stopped seeing |
| Working prompt with known failure cases | Pattern 3 (refine) | Most powerful — concrete failures produce concrete fixes |
| Onboarding a teammate to prompt engineering | Patterns 1+2 together | Generate → critique demonstrates the bar |
| Creating 50 prompts for 50 product categories | Pattern 1 with templated inputs | Bulk creation with consistent structure |
| Domain-specific brand voice or compliance constraints | Skip — write yourself | Model can't generate constraints only you know |
When meta-prompting falls short
- Domain expertise the model lacks. The model can't generate the right constraints for your specific brand voice, regulatory requirements, or institutional norms. Those have to come from you.
- Sycophantic critiques. Models can lean toward saying "this looks good, just a few small tweaks." Mitigate by framing — "find every weakness, no matter how small" — and by ignoring critiques that don't reference specific lines. See biases.
- Replacing real evaluation. A meta-prompt that says "the revised prompt is better" doesn't make it true. Always test against a real eval set.
Tip: use a different model session for critique. A model reviewing a prompt in the same conversation that produced it tends to defend its own choices; a fresh session (or a different model) reads the prompt cold.
Going further: production meta-prompting
Closed-loop with eval
The full production pipeline:
1. Run the current prompt against the eval set and collect failures.
2. Feed the failures into Pattern 3 to get a revised prompt.
3. Run the revised prompt against the eval set. Promote if better; repeat if not.
This is effectively gradient descent over prompts, with the model as the gradient estimator. It works best when the failures are concrete and the eval is fast.
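A sketch of the loop, assuming you already have an `eval_prompt` function that scores a prompt against a fixed eval set and returns its score plus the failing cases, and a `refine_prompt` helper like the Pattern 3 sketch above that returns just the revised prompt text (in practice you'd parse it out of the model's reply); all names are illustrative:

```python
# Sketch: closed-loop prompt refinement. Promote a revision only when the eval
# score improves; stop when there are no failures left or the revision regresses.
# eval_prompt, refine_prompt, and call_model are illustrative stand-ins.

def closed_loop(prompt: str, eval_prompt, refine_prompt, call_model, max_rounds: int = 3) -> str:
    best_prompt = prompt
    best_score, failures = eval_prompt(best_prompt)

    for _ in range(max_rounds):
        if not failures:
            break  # nothing left to learn from
        candidate = refine_prompt(best_prompt, failures, call_model)
        score, candidate_failures = eval_prompt(candidate)
        if score > best_score:
            # Measured improvement: promote and keep iterating.
            best_prompt, best_score, failures = candidate, score, candidate_failures
        else:
            # The "improvement" didn't hold up on the eval set: stop here.
            break

    return best_prompt
```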
Cross-model critique
Use one model to critique prompts intended for another. Claude often gives a sharper critique of a GPT-4o-bound prompt than GPT-4o does itself. Cross-pollination catches blind spots specific to each model's training.
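A sketch of the hand-off, assuming the official `anthropic` and `openai` Python SDKs are installed and API keys are set in the environment; the model names are illustrative and change over time:

```python
# Sketch: Claude critiques a prompt that will ultimately run on GPT-4o.
# Assumes the `anthropic` and `openai` SDKs plus API keys in the environment;
# model names are illustrative and may need updating.
import anthropic
from openai import OpenAI

def claude_critique(gpt4o_bound_prompt: str) -> str:
    """Ask Claude for a severity-ranked review of a prompt intended for GPT-4o."""
    review_request = (
        "Review this prompt as a senior prompt engineer. It will be run on GPT-4o.\n"
        "List weaknesses, ambiguities, and unhandled edge cases as a numbered,\n"
        f'severity-ranked list.\n\nPrompt:\n"""\n{gpt4o_bound_prompt}\n"""'
    )
    reply = anthropic.Anthropic().messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2048,
        messages=[{"role": "user", "content": review_request}],
    )
    return reply.content[0].text

def run_on_gpt4o(prompt: str, user_input: str) -> str:
    """The production call the prompt is being hardened for."""
    resp = OpenAI().chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": user_input},
        ],
    )
    return resp.choices[0].message.content
```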
Bulk prompt generation
Need 50 prompts for 50 product categories? Don't write them by hand. Define the template (the variable parts, the constraints) and use a meta-prompt to generate all 50 with consistent structure. Hand-review for quality; the time savings are massive when this fits.
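A sketch of the bulk step, with a deliberately simplified template (the review-summarization task and its constraints are illustrative, not from this guide) and the same hypothetical `call_model` wrapper:

```python
# Sketch: generate one prompt per product category from a shared meta-prompt
# template, then hand-review the results. Template contents are illustrative.

BULK_META_PROMPT = """\
Write a complete prompt for summarizing customer reviews of {category} products.
Constraints:
- Output format: 3 bullet points plus a one-line verdict
- Tone: neutral, consumer-friendly
- Must handle reviews from 10 to 2,000 words
Return only the finished prompt.
"""

def generate_category_prompts(categories: list[str], call_model) -> dict[str, str]:
    """Return {category: generated_prompt}, all built from the same template."""
    return {c: call_model(BULK_META_PROMPT.format(category=c)) for c in categories}
```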
The research connection: APE
Automatic Prompt Engineer (APE — Zhou et al., 2022) formalized this approach in research: use an LLM to generate candidate prompts, evaluate them against a benchmark, and keep the best. Modern production meta-prompting workflows are an applied version of the same idea. See papers for the source.
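A heavily simplified sketch of the APE idea; the paper generates candidates from input-output demonstrations and also resamples variants of high scorers, and `call_model` and `score_on_benchmark` are hypothetical stand-ins for your LLM client and eval harness:

```python
# Sketch of the core APE loop: generate candidate prompts with an LLM, score
# each against a benchmark, keep the best. Simplified relative to the paper.

def ape_search(task_description: str, call_model, score_on_benchmark, n_candidates: int = 10) -> str:
    candidates = [
        call_model(
            "Write an instruction that would make a language model perform "
            f"this task well:\n{task_description}"
        )
        for _ in range(n_candidates)
    ]
    return max(candidates, key=score_on_benchmark)
```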
Common mistakes
- Generic meta-prompts. "Make this prompt better" produces generic improvements. Specify what "better" means: more concise? More robust? Better at edge cases?
- Skipping the failure examples in pattern 3. Without real failure cases, the model can only guess what's wrong. Concrete failures are where the value is.
- Treating the meta-prompt output as final. The output is a draft. Review it, A/B test it, and version it before promoting to production.
- Iterating without an eval set. Each meta-prompt round "feels" like progress. Without measurement, you might be polishing in circles.
- Letting the model write prompts that only the model would write. Meta-prompted prompts can drift toward LLM-flavored phrasing — overly structured, hedged, generic. Add taste in the review.
Quick reference
The 60-second summary
Three patterns: generation (from a goal), critique (of an existing prompt), refinement (from real failures). Pattern 3 is the strongest.
The mindset: meta-prompt output is a draft, not the answer. Review with taste. Test on real inputs.
What it solves: blank-page paralysis, blind spots in your own prompts, bulk prompt creation.
What it doesn't solve: domain expertise, brand voice, real evaluation. Pair with eval.
What to read next
Meta-prompting is most powerful when paired with rigorous testing: A/B testing prompts tells you whether the "improved" prompt is actually improved. To turn meta-prompted variants into managed assets, see version control for prompts. For the academic background (the APE paper), see papers.