System Health Dashboard Builder

AI Automation / Workflow Systems

Defines KPIs and visualizations for overall workflow system health (throughput, latency, error rates, SLAs) to give executives visibility into automation health.

Difficulty: Intermediate

Model: GPT-4 / Claude / Gemini

Use Case: Monitoring, Observability, SLA Tracking

Updated: May 2026

Why This Prompt Exists

Workflow systems have many moving parts. Without a dashboard, you can’t see the forest for the trees — and executives can’t see anything at all.

You get:

no visibility into overall system health (is automation working?)
executives surprised by automation failures (no dashboard)
metrics that don’t matter (vanity metrics, not action metrics)
no SLA tracking (can’t tell if you’re meeting promises)
reactive management (find out about failures from user complaints)

A well-designed dashboard tracks:

health score: overall system health (0-100)
throughput: workflows completed per hour/day
latency: p95 and p99 completion times
error rates: percentage of failed executions
SLA attainment: percentage meeting performance targets
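Here is a minimal sketch of how the throughput, latency, and error-rate figures above might be computed from a window of execution records. The `Execution` record, its fields, and the function names are hypothetical. The sketch assumes every finished run (success or failure) counts toward throughput. It uses the nearest-rank method for percentiles, which is one of several common percentile definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class Execution:
    # Hypothetical record: one row per finished workflow run.
    workflow_id: str
    started_at: float  # epoch seconds
    duration_s: float  # start-to-finish time in seconds
    succeeded: bool


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def window_metrics(runs: list[Execution], window_s: float) -> dict[str, float]:
    """Throughput, error rate, and tail latency for one time window."""
    if not runs:
        raise ValueError("no executions in window")
    durations = [r.duration_s for r in runs]
    failed = sum(1 for r in runs if not r.succeeded)
    return {
        # Assumption: failed runs still count as "completed" for throughput.
        "throughput_per_hour": len(runs) * 3600 / window_s,
        "error_rate": failed / len(runs),  # failed / total
        "p95_latency_s": percentile(durations, 95),
        "p99_latency_s": percentile(durations, 99),
    }
```

Consider a one-hour window with 100 runs lasting 1 to 100 seconds, two of which failed. That gives a throughput of 100 per hour, an error rate of 2%, a p95 of 95 s, and a p99 of 99 s.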

Without dashboards, you’re flying blind.

This prompt designs comprehensive system health dashboards.

The Prompt

Assume the role of an observability engineer who designs system health dashboards.

Your task is to define metrics and visualizations for workflow system health.

Generate:

1. DASHBOARD SECTIONS

| Section | Audience | Metrics | Refresh |
|---------|----------|---------|---------|
| Executive Summary | Leadership | System health score, SLA attainment | Daily |
| Operations | On-call engineers | Error rates, queue depth, latency | Real-time |
| Capacity Planning | Ops leads | Throughput trends, resource usage | Weekly |

2. SYSTEM HEALTH SCORE

Formula: a weighted composite of the key metrics (state each component and its weight)

Score ranges:
- 90-100: Healthy
- 70-89: Degraded
- 0-69: Critical

3. KEY METRICS TABLE

| Metric | Formula | Target | Alert Threshold |
|--------|---------|--------|-----------------|
| System health | health score (section 2) | ≥90 | <80 |
| Error rate | failed / total | <1% | >5% |
| p95 latency | 95th percentile of duration | <5s | >10s |
| Throughput | workflows / hour | 10k/hr | <5k/hr |
| Queue depth | pending workflows | <100 | >500 |

4. VISUALIZATION RECOMMENDATIONS

- Health score: Gauge chart (red/yellow/green)
- Error rate: Time-series line chart
- Latency: Heatmap by workflow
- Throughput: Area chart with forecast
- Queue depth: Bar chart with alert line

5. ALERT RULES

| Metric | Condition | Severity | Action |
|--------|-----------|----------|--------|
| Health score | <80 for 5 min | Warning | Slack |
| Health score | <60 for 2 min | Critical | PagerDuty |
| Error rate | >5% for 2 min | Critical | PagerDuty |
| Queue depth | >500 for 10 min | Warning | Investigate |

6. SLA TRACKING

| Workflow | SLA (p95) | Current Attainment | Target |
|----------|-----------|-------------------|--------|
| WF-001 | 5s | 99.5% | 99.9% |
| WF-002 | 30s | 98.2% | 99.0% |

7. EXECUTIVE REPORTING

- Weekly summary: health score trends, top issues
- Monthly review: SLA attainment, capacity trends
- Quarterly business review: ROI of automation

INPUTS:

Workflow inventory (from WS-01):
[PASTE WORKFLOW LIST]

SLAs from business requirements:
[E.G., "Lead routing: 5s p95, 99.9% uptime"]

Expected throughput (from business volume):
[E.G., "50,000 leads per day"]

Dashboard tool preferences:
[E.G., "Datadog, Grafana, Tableau"]

RULES:
- Health score should be understandable at a glance (red/yellow/green)
- Include leading indicators (queue depth) not just lagging (error rate)
- Different audiences need different views (executive vs. operations)
- Alert on symptoms (high error rate), not causes (exceptions)
- Review metrics quarterly — what matters changes over time
- Dashboard without action is decoration (include alerting)
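The prompt asks for a single 0-100 health score but leaves the formula open. One possible approach is sketched below under stated assumptions. It scores each key metric linearly between its target (100) and its alert threshold (0), then takes a weighted average. Finally, it maps the result onto the Healthy / Degraded / Critical ranges from section 2. The `Component` type, the linear scoring rule, and the example weights are hypothetical choices, not part of the prompt.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    value: float      # current observed value
    target: float     # value at or beyond which the component scores 100
    threshold: float  # alert threshold at or beyond which it scores 0
    weight: float     # hypothetical relative importance


def component_score(c: Component) -> float:
    """Linear 0-100 score between the alert threshold (0) and the target (100).

    Handles both directions: target < threshold means lower is better
    (error rate, latency, queue depth); target > threshold means higher
    is better (throughput).
    """
    span = c.threshold - c.target
    if span == 0:
        raise ValueError(f"{c.name}: target and threshold must differ")
    progress_to_threshold = (c.value - c.target) / span
    return 100 * (1 - min(max(progress_to_threshold, 0.0), 1.0))


def health_score(components: list[Component]) -> float:
    """Weighted average of the component scores."""
    total_weight = sum(c.weight for c in components)
    return sum(c.weight * component_score(c) for c in components) / total_weight


def health_band(score: float) -> str:
    # Section 2 ranges: 90-100 Healthy, 70-89 Degraded, 0-69 Critical.
    if score >= 90:
        return "Healthy"
    if score >= 70:
        return "Degraded"
    return "Critical"


# Targets and thresholds from the key metrics table; weights 4/3/2/1 are illustrative.
components = [
    Component("error_rate", 0.005, target=0.01, threshold=0.05, weight=4),
    Component("p95_latency_s", 7.5, target=5, threshold=10, weight=3),
    Component("throughput_per_hour", 12_000, target=10_000, threshold=5_000, weight=2),
    Component("queue_depth", 80, target=100, threshold=500, weight=1),
]
score = health_score(components)  # 85.0: latency is halfway to its alert threshold
band = health_band(score)         # "Degraded"
```

Linear interpolation keeps the score explainable at a glance, which fits the prompt's first rule. The trade-off is that one badly failing component can be averaged away by healthy ones. A possible refinement is to cap the overall score at the band of the worst component.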

How To Use It

Health score should be understandable at a glance — red/yellow/green.
Include leading indicators (queue depth) not just lagging (error rate).
Different audiences need different views — executive summary vs. operational dashboard.
Alert on symptoms (high error rate), not causes (specific exceptions).
Review metrics quarterly — what matters changes over time.
A dashboard without action is decoration — include alerting.
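The alert rules table pairs each condition with a duration, for example "health score below 80 for 5 minutes". Below is a minimal sketch of that sustained-breach logic, similar in spirit to the `for` clause in Prometheus alerting rules. It assumes metric samples arrive in time order, and treats gaps between samples as if the condition kept holding. The `AlertRule` and `SustainedAlert` names are hypothetical. In practice, the chosen dashboard tool would evaluate rules like these; the sketch only illustrates the semantics.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AlertRule:
    metric: str
    compare: Callable[[float, float], bool]  # e.g. operator.lt for "<"
    threshold: float
    for_seconds: float  # how long the breach must persist before firing
    severity: str
    action: str


class SustainedAlert:
    """Fires only after a rule's condition has held continuously for for_seconds."""

    def __init__(self, rule: AlertRule) -> None:
        self.rule = rule
        self.breach_started: Optional[float] = None

    def observe(self, ts: float, value: float) -> bool:
        """Feed one time-ordered sample; return True while the alert is firing."""
        if not self.rule.compare(value, self.rule.threshold):
            self.breach_started = None  # condition cleared: reset the timer
            return False
        if self.breach_started is None:
            self.breach_started = ts  # first breaching sample starts the clock
        return ts - self.breach_started >= self.rule.for_seconds


# The four rules from the ALERT RULES table.
RULES = [
    AlertRule("health_score", operator.lt, 80, 5 * 60, "Warning", "Slack"),
    AlertRule("health_score", operator.lt, 60, 2 * 60, "Critical", "PagerDuty"),
    AlertRule("error_rate", operator.gt, 0.05, 2 * 60, "Critical", "PagerDuty"),
    AlertRule("queue_depth", operator.gt, 500, 10 * 60, "Warning", "Investigate"),
]

# Health score sampled once a minute, holding at 75 (below 80):
warning = SustainedAlert(RULES[0])
firing = [warning.observe(t, 75.0) for t in range(0, 301, 60)]
# Only the last sample fires: the breach has then lasted the full 5 minutes.
```

Requiring the condition to persist filters out brief spikes, which helps keep alerts tied to real symptoms rather than noise.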

Example Input

Workflow inventory:
“WF-001 Lead Capture, WF-002 Lead Scoring, WF-003 Lead Routing, WF-004 Reporting”

SLAs from business requirements:
“Lead Capture to Routing: 30s total p95. Reporting: 5 minutes p95.”

Expected throughput:
“50,000 leads per day (peak 10,000/hour)”

Dashboard tool preferences:
“Datadog”
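In this example input, the 30-second p95 SLA covers the whole path from Lead Capture (WF-001) through Routing (WF-003). Percentiles do not add, so summing each workflow's own p95 does not in general give the p95 of the end-to-end path. One way to track the SLA is to measure each lead's end-to-end latency directly. The sketch below assumes a hypothetical `LeadTrace` record with capture-start and routing-finish timestamps. It reports the observed p95 and an attainment figure, defined here as the share of leads routed within the SLA. That definition is one possible reading of the "Current Attainment" column in section 6.

```python
import math
from dataclasses import dataclass


@dataclass
class LeadTrace:
    # Hypothetical per-lead trace joining WF-001 (capture) to WF-003 (routing).
    lead_id: str
    captured_at: float  # epoch seconds when WF-001 started
    routed_at: float    # epoch seconds when WF-003 finished


def sla_report(traces: list[LeadTrace], sla_seconds: float, pct: float = 95.0) -> dict:
    """End-to-end percentile check plus share of leads within the SLA latency."""
    latencies = sorted(t.routed_at - t.captured_at for t in traces)
    if not latencies:
        raise ValueError("no traces in window")
    rank = math.ceil(pct * len(latencies) / 100)  # nearest-rank percentile
    observed = latencies[max(rank, 1) - 1]
    within = sum(1 for x in latencies if x <= sla_seconds)
    return {
        "observed_percentile_s": observed,
        "sla_met": observed <= sla_seconds,
        "attainment": within / len(latencies),
    }
```

For instance, suppose 97 of 100 leads take 10 s end to end and 3 take 45 s. The p95 is then 10 s, so the SLA is met, while attainment is 97%. The example's volume is also worth checking against the throughput row. 50,000 leads per day averages about 2,083 per hour, well below both the 10,000/hour peak and the template's <5k/hr alert threshold. A flat throughput floor would therefore likely fire during ordinary off-peak hours unless it is tuned to time of day.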

Why It Works

Most teams have metrics but no dashboard — or a dashboard with metrics no one understands. Neither helps.

This framework improves outcomes by forcing:

dashboard sectioning (who needs to see what?)
health score definition (single number for system status)
key metrics specification (what to track, what targets)
visualization design (how to see the data)
SLA tracking (are we meeting promises?)

Failure modes this prevents:

Executive surprise: “I didn’t know automation was failing”
Alert fatigue: too many metrics, no clear health signal
Dashboard decay: stale metrics that no one reviews
No action: the dashboard shows problems, but nothing alerts anyone

This improves on: Scattered metrics and no dashboard. A unified health dashboard provides visibility for everyone.

Related to: WS-01 (Documenter) for workflow inventory; WS-04 (Optimizer) for performance improvement targets.
