Identify data points that don’t fit the pattern and explain why they matter (or should be ignored).
Why This Prompt Exists
Outliers can be the most important signal in your data — or just a typo. But most people ignore them.
When you ignore them, you get:
- missed fraud, because anomalies were dismissed as noise
- strategy decisions based on averages that outliers distort
- “bad data” deleted without anyone understanding why it’s different
- surprise at outcomes that outliers predicted but no one noticed
- averages that don’t represent any real customer or situation
But outliers tell stories:
- data error: typo, measurement failure, system glitch
- legitimate extreme: the 1% customers, the best/worst performers
- new pattern: early signal of a trend the rest haven’t caught
- process failure: something broke, and this is the symptom
- segment difference: you’re averaging two populations that should be separate
Without outlier analysis, you’re working with averages of everything and nothing.
This prompt identifies and interprets anomalies in your data.
The Prompt
Assume the role of a data quality analyst who hunts anomalies. Your task is to identify outliers and explain what they mean.

Generate:

1. OUTLIER IDENTIFICATION
- Which data points are unusual? (value, how many standard deviations from mean)
- Variable/dimension where anomaly appears

2. CONTEXTUAL INVESTIGATION
- Is this likely a data error? (evidence: format wrong, impossible value, known system issue)
- Is this a legitimate extreme? (evidence: possible in real world, aligns with business rules)
- Is this a segment difference? (evidence: clusters with other variables)

3. IMPACT ANALYSIS
- How much does this outlier affect the mean/median?
- How much does it affect conclusions or decisions?

4. RECOMMENDED ACTION
- Investigate (data quality check needed)
- Keep but report separately (legitimate extreme, treat as special case)
- Remove for main analysis but document (error or irrelevant extreme)
- Segment (outlier suggests two populations)

5. WHAT THIS OUTLIER TEACHES US
- One insight from this anomaly

INPUTS:
Data (list or table of values): [PASTE NUMERICAL DATA]
Variable name and units: [E.G., "Transaction amount ($USD)"]
Expected range (if known): [E.G., "$10-$500 typical, $1000+ unusual"]
Business context: [E.G., "E-commerce checkout amounts"]

RULES:
- Never delete outliers automatically; always investigate first
- Flag when the same data point is an outlier on multiple dimensions
- Distinguish between univariate outliers (extreme on one variable) and multivariate outliers (unusual combination)
- Document any removal decisions; someone will ask later
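Step 1 of the prompt asks how many standard deviations each point sits from the mean. If you want to sanity-check the model's answer yourself, a minimal Python sketch might look like the following. It is not part of the prompt: the function names are hypothetical, the data is the Example Input from this page, and the modified z-score (median and MAD, with the commonly cited 0.6745 constant and 3.5 cutoff) is an assumption added here because a plain z-score is capped at (n-1)/sqrt(n) in small samples.

```python
from statistics import mean, median, stdev

# Example Input from this page (transaction amounts in USD).
amounts = [45, 52, 38, 49, 47, 51, 43, 48, 44, 39, 12500, 46, 50]

def z_scores(values):
    """Classic z-score: distance from the mean in sample standard deviations."""
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]

def modified_z_scores(values):
    """Robust variant: distance from the median, scaled by the
    median absolute deviation (MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero: more than half the values are identical")
    return [0.6745 * (v - med) / mad for v in values]

def flag_outliers(values, threshold=3.5):
    """Return (value, modified z-score) pairs beyond the threshold.
    abs() means very high and very low values are both checked."""
    scores = modified_z_scores(values)
    return [(v, round(s, 1)) for v, s in zip(values, scores) if abs(s) > threshold]

if __name__ == "__main__":
    print("largest plain z-score:", round(max(z_scores(amounts)), 2))
    print("flagged by modified z-score:", flag_outliers(amounts))
```

With only 13 values, the $12,500 transaction scores about 3.3 on the plain z-score, barely past a typical cutoff of 3, because the outlier itself inflates the mean and standard deviation. The median-based score is not distorted in the same way, which is why it is used for the flagging step in this sketch.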
How To Use It
- Run this before calculating any summary statistics — outliers will distort your means.
- For time-series data, look for anomalies by date; they often signal incidents.
- Check outliers in both directions (very high and very low) — both can be informative.
- If you find a cluster of outliers, that’s not an anomaly — that’s a segment you missed.
- Document outlier decisions in your analysis — reproducibility matters.
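One way to make the last point concrete is to record each decision next to the data rather than editing the data in place. The sketch below is only an illustration: `Action`, `OutlierDecision`, and `split_for_analysis` are hypothetical names, the four actions mirror the prompt's RECOMMENDED ACTION list, and the logged reason is placeholder text.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    INVESTIGATE = "investigate"
    KEEP_SEPARATE = "keep but report separately"
    REMOVE_DOCUMENTED = "remove for main analysis but document"
    SEGMENT = "segment"

@dataclass(frozen=True)
class OutlierDecision:
    index: int      # position of the data point in the input list
    action: Action
    reason: str     # free-text justification for whoever asks later

def split_for_analysis(values, decisions):
    """Return (main, set_aside).

    Points still under investigation stay in the main data; every point
    taken out of it is listed in set_aside with its action and reason,
    so nothing is dropped silently.
    """
    set_aside = {d.index: d for d in decisions
                 if d.action is not Action.INVESTIGATE}
    main = [v for i, v in enumerate(values) if i not in set_aside]
    held = [(values[i], d.action.value, d.reason)
            for i, d in sorted(set_aside.items())]
    return main, held

if __name__ == "__main__":
    amounts = [45, 52, 38, 49, 47, 51, 43, 48, 44, 39, 12500, 46, 50]
    log = [OutlierDecision(index=10, action=Action.KEEP_SEPARATE,
                           reason="placeholder: far above expected range")]
    main, held = split_for_analysis(amounts, log)
    print(len(main), "values in main analysis")
    print("set aside:", held)
```

Indexing by position rather than by value is a deliberate choice here: two transactions can share the same amount, and only one of them may be the anomaly.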
Example Input
Data:
Transaction amounts: $45, $52, $38, $49, $47, $51, $43, $48, $44, $39, $12,500, $46, $50
Variable name and units:
Transaction amount ($USD)
Expected range (if known):
$20-100 typical
Business context:
E-commerce checkout amounts for a clothing retailer
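To see what the IMPACT ANALYSIS step should report for this input, you can compute the summary statistics with and without the suspect value. This is a rough check using Python's standard library, not output from the prompt; the variable names are illustrative.

```python
from statistics import mean, median

# Transaction amounts from the Example Input above (USD).
amounts = [45, 52, 38, 49, 47, 51, 43, 48, 44, 39, 12500, 46, 50]
suspect = 12500

# Same data with the suspect transaction set aside (not deleted).
without = [a for a in amounts if a != suspect]

impact = {
    "mean_with": mean(amounts),
    "mean_without": mean(without),
    "median_with": median(amounts),
    "median_without": median(without),
}

if __name__ == "__main__":
    for name, value in impact.items():
        print(f"{name}: {value}")
```

The mean moves from $46 to $1,004 when the single $12,500 transaction is included, while the median only moves from $46.50 to $47. An average of $1,004 describes none of the 13 transactions, which is the distortion the prompt is designed to surface.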
Why It Works
Most data analysis treats outliers as nuisances to be removed without examination.
This framework improves outcomes by forcing:
- identification (finds what others miss)
- contextual investigation (error vs. signal)
- impact analysis (how much does it matter?)
- action recommendation (remove, keep, or segment)
- insight extraction (what this anomaly teaches)
Great outlier detection doesn’t just find what’s different — it explains why it matters.
