Reflexion: agents that learn from their own failures
Reflexion is the pattern where an agent tries, fails, reflects in plain English on what went wrong, and retries with that reflection in context. The result: agents that improve across attempts without weight updates.
A coding agent tries to write a function. The test fails. Most agents at this point either give up or repeat the exact same broken approach in a loop. A Reflexion-style agent does something different: it pauses, writes a paragraph in plain English about what went wrong, and then retries with that paragraph pinned to the top of its context.
The next attempt is often correct — not because the model got smarter, but because the reflection captured what the trace alone doesn't: the lesson. Reflexion is the pattern that turns one-shot agents into iterative learners. No fine-tuning, no weight updates, just a written reflection that survives across attempts.
The mental model: writing the lesson down#
Reinforcement learning works by adjusting weights based on reward signals. Reflexion does the same job in a different medium: the "reward signal" is converted into English prose, and the "weight update" is just adding that prose to the prompt for the next attempt.
Why this works: the model can attend to the reflection when generating its next attempt, so the lesson is operative. It's the difference between a student who repeats a mistake silently and one who writes "I forgot to handle the empty array case" in their notebook before retrying.
The Reflexion loop#
- Act. Agent attempts the task using whatever toolkit it has (often a ReAct loop).
- Evaluate. An external signal — test result, a verifier prompt, a human grader — judges whether the attempt succeeded.
- Reflect. If the attempt failed, a separate prompt asks the model to analyze what went wrong and what would work next time. The output is a short written reflection.
- Retry. The reflection is prepended to the next attempt's prompt. Loop back to step 1.
- Stop. When the attempt succeeds, hits a max-attempt budget, or reflections stop changing.
The reflection prompt#
The quality of Reflexion depends entirely on the reflection prompt. The pattern that works:
The agent attempted the task below and failed. Review what
happened and write a short reflection that, if added to the
next attempt's context, would help avoid the failure.
Task:
"""
{{task_description}}
"""
Trace of the failed attempt:
"""
{{trace}}
"""
Failure signal:
"""
{{evaluator_feedback}}
"""
Output a 2-3 sentence reflection that:
- Identifies the specific cause of failure (not generic advice)
- Suggests a concrete different approach for the next attempt
- Is actionable when read at the start of a new attempt
Reflection:
Two non-obvious things this prompt does: it forces specificity (a cause plus a concrete alternative), and it frames the output as something the next attempt will read. Without that framing, reflections tend toward self-blame or vague resolutions.
When Reflexion earns its keep#
Reflexion vs. alternatives
| If your situation is… | Reach for… | Why |
|---|---|---|
| Task with a clear pass/fail signal (test results, schema validation) | Reflexion | External evaluator gives the loop a clean stopping condition |
| Coding agents in long loops | Reflexion | Without it, agents repeat the same mistake; with it, they progress |
| Tasks with subjective evaluation (writing, design) | Skip — or use human-as-evaluator | Without an objective signal, the loop has no convergence criterion |
| Single-shot tasks with no retry opportunity | No retries needed → no Reflexion | The whole pattern assumes multiple attempts |
| Latency-sensitive interactive use | Skip | Each retry adds latency; users won't wait |
| Model already gets it right >90% of the time | Skip | The retries cost more than they help on the rare failures |
Reflexion can amplify bad reasoning: if the evaluator is noisy or the reflection misdiagnoses the failure, the wrong lesson gets pinned into every subsequent attempt's context. A reliable pass/fail signal is what keeps the loop honest.
Going further: production patterns#
Persistent reflection memory#
Don't just use the reflection on the very next attempt — store it. The next time a similar task appears, retrieve relevant past reflections and inject them. The agent effectively builds an "institutional memory" of what doesn't work, without retraining.
Same retrieval mechanics as RAG: embed reflections, store in a vector DB, pull top-K relevant ones for the current task.
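A sketch of that store, under one loud assumption: `embed` here is a toy bag-of-words vectorizer so the example runs without external services. In production you would swap in a real embedding model and a vector DB, keeping the same `add`/`relevant` interface.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ReflectionStore:
    """Persistent memory of past reflections, queried by task similarity."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, reflection: str) -> None:
        self.items.append((embed(reflection), reflection))

    def relevant(self, task: str, k: int = 3) -> list[str]:
        # Pull the top-k reflections most similar to the current task.
        q = embed(task)
        ranked = sorted(self.items, key=lambda it: cosine(it[0], q), reverse=True)
        return [text for _, text in ranked[:k]]
```

Usage: call `store.relevant(task)` before each new task and inject the results into the attempt prompt alongside any reflections from the current session.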
Separate critic model#
Use a different model (or at minimum a fresh session) for the reflection step. The agent that produced the failed attempt has the same blind spots that caused the failure; asking it to self-diagnose is asking too much. A different model often catches what the original missed.
Bounded retries with escalation#
Cap retries at 3-5. Beyond that, returns diminish and you're just burning tokens. When the retry budget is exhausted, escalate: return the partial output with the reflections logged so a human can intervene.
Eval the reflections themselves#
Build a small dataset of (failed attempt, ideal reflection) pairs. Score your reflection prompt's output against this. Reflections that name vague causes ("the model didn't understand") score lower than ones that name specific causes ("forgot to check for empty array case"). Tune the reflection prompt until specific reflections dominate.
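A cheap first-pass scorer for that tuning loop: penalize generic blame, reward concrete causes. The phrase lists are illustrative assumptions, not a fixed rubric; a graded LLM judge against your (failed attempt, ideal reflection) pairs is the sturdier version.

```python
# Illustrative phrase lists -- extend from your own failure data.
VAGUE = ["didn't understand", "should try harder", "made a mistake", "was wrong"]
SPECIFIC = ["case", "instead", "because", "forgot to", "off-by-one", "edge"]

def specificity_score(reflection: str) -> int:
    text = reflection.lower()
    score = 0
    score -= sum(2 for p in VAGUE if p in text)     # generic self-blame
    score += sum(1 for p in SPECIFIC if p in text)  # concrete causes and fixes
    return score
```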
Common mistakes#
- Vague reflection prompts. "What went wrong?" produces self-flagellating prose with no actionable content. Force specificity in the prompt.
- No retry budget. Reflexion can run forever on tasks the model genuinely can't solve. Cap iterations.
- Same model for action and reflection. The model that failed often can't diagnose its own failure. Use a fresh session at minimum, ideally a different model.
- Subjective tasks without an evaluator. Without a pass/fail signal, the loop has no convergence — you're just cycling.
- Skipping observability. Reflexion produces complex traces. Without logging every reflection, debugging the loop is guesswork.
Quick reference#
The 60-second summary
What it is: verbal reinforcement learning. Try → fail → reflect in writing → retry with reflection in context.
When it shines: tasks with objective pass/fail signals (tests, schema validation), coding agents, multi-attempt workflows.
The non-negotiables: bounded retries, fresh model session for reflection, specific (not vague) reflection prompts.
The compound effect: persisted reflections become a memory of what doesn't work — the agent accumulates lessons across sessions.
What to read next#
Reflexion sits on top of ReAct — the action loop that produces the attempts. For a different way to lift accuracy on hard tasks, self-consistency. For the broader agent context, Introduction to agents and agent memory (where persistent reflections live). The original Shinn et al. (2023) paper is in our papers list.