You get:
- fixing problems that aren’t the real problem
- breaking what already works while trying to fix failures
- no systematic error analysis before refining
- fixes that address symptoms, not root causes
- repeating the same types of failures across prompt versions
But failures have patterns:
- input characteristics: what kind of inputs cause failures? (length, topic, structure)
- output error types: what’s wrong? (hallucination, omission, format, reasoning)
- failure frequency: how often does each failure type occur?
- root cause: prompt ambiguity, missing examples, unclear constraints
- fix strategy: add example, clarify instruction, add constraint, add fallback
Without failure analysis, you refine blind.
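The pattern dimensions above map onto a small log structure. Here is a minimal sketch in Python, assuming each failure has already been hand-labeled with an error type and an input-length bucket. The `Failure` record, its field names, and the sample rows are hypothetical choices for illustration, not part of any library.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Failure:
    """One logged failure. Field names are illustrative assumptions."""
    input_text: str
    expected: str
    actual: str
    error_type: str     # e.g. "hallucination", "omission", "format", "reasoning"
    length_bucket: str  # e.g. "short", "medium", "long"


def failure_frequency(failures):
    """Return (error_type, count, percent_of_failures) rows, most common first."""
    counts = Counter(f.error_type for f in failures)
    total = len(failures)
    return [(etype, n, round(100 * n / total, 1)) for etype, n in counts.most_common()]


def characteristics_by_type(failures):
    """Count input-length buckets per error type to surface correlations."""
    table = {}
    for f in failures:
        table.setdefault(f.error_type, Counter())[f.length_bucket] += 1
    return table


# Hypothetical sample log, for illustration only.
log = [
    Failure("long report A", "summary", "invented detail", "hallucination", "long"),
    Failure("long report B", "summary", "invented detail", "hallucination", "long"),
    Failure("short note", "JSON", "plain text", "format", "short"),
]

for row in failure_frequency(log):
    print(row)
print(characteristics_by_type(log))
```

Counting before fixing is the point of the sketch: a failure type that dominates the tally and clusters in one input bucket is a better candidate for a root-cause fix than a one-off.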
This prompt analyzes failures and suggests targeted fixes.
Assume the role of a prompt debugger who analyzes failure patterns. Your task is to analyze where a prompt fails and recommend specific fixes.

Generate:

1. FAILURE LOG (input)
- List of inputs where prompt failed
- Expected output vs. actual output

2. FAILURE PATTERN ANALYSIS
| Failure Type | Count | % of failures | Input characteristics | Root cause hypothesis |
|--------------|-------|---------------|----------------------|----------------------|
| [e.g., Hallucination] | X | X% | [e.g., "long inputs >500 words"] | [e.g., "no instruction to say 'I don't know'"] |

3. INPUT CHARACTERISTICS CORRELATION
- Length: failures happen at [short/medium/long] lengths
- Topic: failures cluster around [specific topics]
- Structure: failures happen with [specific formats]
- Ambiguity: failures happen when input is [clear/ambiguous]

4. ROOT CAUSE ANALYSIS (per failure type)
- Most likely root cause
- Evidence for this cause

5. TARGETED FIX RECOMMENDATIONS (per failure type)
| Failure Type | Fix Type | Specific Change | Expected improvement |
|--------------|----------|-----------------|---------------------|
| [type] | [add example / clarify / add constraint] | [exact wording to add/change] | [X% reduction] |

6. FIXED PROMPT (proposed)
- The prompt with recommended changes

7. VALIDATION PLAN
- How to test that fixes work without breaking other cases

INPUTS:
Original prompt: [PASTE THE PROMPT]
Failure examples (at least 5-10): [PASTE INPUTS + INCORRECT OUTPUTS]
Success examples (for comparison): [PASTE INPUTS + CORRECT OUTPUTS, OPTIONAL]
Task type: [CLASSIFICATION / GENERATION / EXTRACTION / OTHER]

RULES:
- Collect at least 10 failure examples before analyzing (small samples mislead)
- Look for patterns across failures; one-off errors may not need fixing
- Fix root causes, not symptoms (addressing one failure may fix many)
- Test fixes on success examples to ensure you didn't break them
- Save failure logs to build a regression test suite
- Collect failures systematically (log all prompt outputs, flag bad ones).
- Don’t fix after one failure. Wait until you see a pattern (at least 5-10 similar failures).
- Fix root causes: if the prompt fails on long inputs, add an instruction for long inputs instead of just tweaking the wording.
- Test your fix on the failure examples AND on success examples (ensure you didn’t regress).
- Save failure examples as regression tests for future prompt versions.
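The last two tips amount to a small regression suite. Here is a minimal sketch, assuming the prompt under test is wrapped in a function that maps an input string to an output string. `run_regression`, `Case`, and `stub_model` are hypothetical names, and `stub_model` is a placeholder for whatever model call is actually used.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple


class Case(NamedTuple):
    input_text: str
    expected: str
    was_failure: bool  # True if the old prompt got this case wrong


def run_regression(
    run_prompt: Callable[[str], str], cases: Iterable[Case]
) -> Dict[str, List[Case]]:
    """Run every saved case and sort it into one of four buckets."""
    report: Dict[str, List[Case]] = {
        "fixed": [], "still_failing": [], "regressed": [], "still_passing": [],
    }
    for case in cases:
        passed = run_prompt(case.input_text).strip() == case.expected
        if case.was_failure:
            key = "fixed" if passed else "still_failing"
        else:
            key = "still_passing" if passed else "regressed"
        report[key].append(case)
    return report


def stub_model(text: str) -> str:
    """Hypothetical stand-in for a real model call; always answers NORMAL."""
    return "NORMAL"


saved_cases = [
    Case("example input the old prompt got wrong", "NORMAL", was_failure=True),
    Case("example input the old prompt got right", "LOW", was_failure=False),
]
report = run_regression(stub_model, saved_cases)
for bucket, items in report.items():
    print(bucket, len(items))
```

Keeping former failures and former successes in the same suite is a deliberate choice here: the `regressed` bucket is what shows whether a fix broke something that used to work.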
Original prompt:
“Classify customer emails as URGENT, NORMAL, or LOW. Respond with only one word.”
Failure examples:
- Input: ‘My account is locked and I need to pay a bill due tomorrow.’ Output: NORMAL (should be URGENT).
- Input: ‘Quick question about your product.’ Output: LOW (should be NORMAL).
- Input: ‘URGENT: Please help!!! My payment failed and I’ll be charged a late fee.’ Output: NORMAL (should be URGENT).
Success examples:
- Input: ‘I have a general question about features.’ Output: LOW.
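For repeated use, the example inputs above can be kept as data and rendered into the template's INPUTS section. The sketch below assumes the field labels from the prompt. The function name `render_inputs` and the tuple layouts are illustrative choices, and the task type CLASSIFICATION is inferred from the original prompt's wording rather than stated in the example.

```python
def render_inputs(original_prompt, failures, successes, task_type):
    """Render the INPUTS section of the debugging prompt.

    failures:  list of (input, actual_output, expected_output)
    successes: list of (input, output)
    """
    lines = ["INPUTS:", f"Original prompt: {original_prompt}", "Failure examples:"]
    for text, actual, expected in failures:
        lines.append(f"- Input: '{text}' Output: {actual} (should be {expected})")
    lines.append("Success examples (for comparison):")
    for text, output in successes:
        lines.append(f"- Input: '{text}' Output: {output}")
    lines.append(f"Task type: {task_type}")
    return "\n".join(lines)


failures = [
    ("My account is locked and I need to pay a bill due tomorrow.", "NORMAL", "URGENT"),
    ("Quick question about your product.", "LOW", "NORMAL"),
    ("URGENT: Please help!!! My payment failed and I'll be charged a late fee.", "NORMAL", "URGENT"),
]
successes = [("I have a general question about features.", "LOW")]

block = render_inputs(
    "Classify customer emails as URGENT, NORMAL, or LOW. Respond with only one word.",
    failures,
    successes,
    "CLASSIFICATION",
)
print(block)
```

The three failures shown are fewer than the template's own rules ask for, so a real run would append more rows to `failures` before sending the prompt.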
This framework improves outcomes by forcing:
- failure pattern analysis (what kind of errors?)
- input correlation (when do failures happen?)
- root cause identification (why is the prompt failing?)
- targeted fix recommendations (fix the cause, not the symptom)
- validation planning (ensure fixes don’t break working cases)
Great failure-driven refinement doesn’t guess at fixes — it analyzes errors and fixes root causes.
