How to monitor Server-Sent Events streams without killing connections
Your status page shows green while live notifications stopped ten minutes ago. That is an SSE false green: a naive HTTP probe got a 200 on the initial handshake, or a half-open stream looks alive while no events flow. Monitor Server-Sent Events health honestly with a companion GET sidecar, not by hammering the long-lived /events URL.
Quick answer
Do not point StillOnline at your SSE /events stream — long-lived connections and buffering break naive GET monitors. Add GET /health/sse that returns 200 when the last event was sent within N seconds, or 503 when the stream pipeline is stale. StillOnline checks that sidecar URL with HTTP GET and status codes only. Free: one URL, five-minute interval.
For general health URL design, see health endpoint design for SaaS. For WebSocket-specific upgrade issues, see WebSocket uptime health checks.
1. Map why naive HTTP probes break SSE consumers
Three failure modes: probing /events directly (opens/closes streams), probing a static page while /events is dead (false green), and half-open connections where TCP stays up but the server stopped writing events.
| Probe mistake | What goes wrong | Better approach |
|---|---|---|
| GET /events every 5 min | New connection each probe; may skew metrics | Separate /health/sse sidecar |
| GET / only | Marketing page up; stream pipeline stale | Freshness heartbeat JSON |
| 200 on first byte only | Handshake OK; no events for hours | 503 when last_event_age > threshold |
Do: treat SSE like a subsystem with its own health signal mapped to HTTP 503. Do not: register the long-lived stream URL in StillOnline Free.
Triage: alert → curl GET /health/sse → if 503, SSE worker logs → if 200 but users report silence, suspect CDN buffering.
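The triage order above can be written down as a small decision function, which may be handy in a runbook or an alert-enrichment script. This is only a sketch: `nextTriageStep` and its `usersReportSilence` flag are hypothetical names, not part of StillOnline or any library.

```javascript
// Hypothetical helper: map the result of `curl GET /health/sse` to the next triage step.
// `status` is the sidecar's HTTP status code; `usersReportSilence` is a manual input.
function nextTriageStep(status, usersReportSilence = false) {
  if (status === 503) return "check SSE worker logs";
  if (status === 200 && usersReportSilence) return "suspect CDN buffering";
  if (status === 200) return "sidecar healthy; no action";
  return "unexpected status; check app and proxy";
}
```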
2. Compare companion /health/sse vs synthetic stream readers
A companion sidecar exposes GET /health/sse with JSON like {"status":"ok","last_event_seconds_ago":12}. A synthetic stream reader opens a real SSE connection from a worker. StillOnline fits the sidecar: scheduled HTTP GET, status code only on Free.
| Approach | StillOnline | Synthetic SSE reader |
|---|---|---|
| Protocol | HTTP GET on /health/sse | Long-lived text/event-stream client |
| Proves | App believes events are fresh | End-to-end event delivery |
| Cost/complexity | Low — one route | Worker + scheduler |
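    const express = require("express");
    const app = express();

    // Timestamp (ms) of the most recent SSE write; the SSE handler updates it.
    let lastEventAt = Date.now();
    app.get("/health/sse", (req, res) => {
      const ageSec = (Date.now() - lastEventAt) / 1000;
      // Stale pipeline: answer 503 so a status-code-only monitor sees a failure.
      if (ageSec > 120) return res.status(503).json({ status: "stale", ageSec });
      res.json({ status: "ok", ageSec });
    });
    // In the SSE handler: set lastEventAt = Date.now() on each write.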
Do: update lastEventAt whenever your SSE handler sends any event. Do not: return 200 when lastEventAt is stale; map staleness to 503 so StillOnline Free sees a failure.
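One way to make the timestamp update hard to forget is to route every SSE write through a single helper. A minimal sketch, assuming a Node `http.ServerResponse` (any object with a `write` method works); `createEventWriter`, `sendEvent`, and the injectable `now` clock are illustrative names, not an existing API.

```javascript
// Wrap SSE writes so the freshness timestamp cannot drift from actual sends.
// `now` is injectable so the helper can be exercised with a fake clock.
function createEventWriter(now = Date.now) {
  let lastEventAt = now();
  return {
    // Write one SSE frame ("data: ...\n\n") and record when it was sent.
    sendEvent(res, data) {
      res.write(`data: ${JSON.stringify(data)}\n\n`);
      lastEventAt = now();
    },
    // Seconds since the last write; the /health/sse route can read this.
    ageSec() {
      return (now() - lastEventAt) / 1000;
    },
  };
}
```

The health route would then compare `writer.ageSec()` against its threshold instead of reading a bare variable.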
3. Fix proxy buffering and CDN pitfalls
Reverse proxies often buffer responses until complete — that breaks SSE delivery. nginx needs proxy_buffering off and often proxy_cache off on the SSE location. Send X-Accel-Buffering: no from the app.
    location /events {
        proxy_pass http://app_upstream;
        proxy_http_version 1.1;
        proxy_set_header Connection '';
        proxy_buffering off;
        proxy_cache off;
        chunked_transfer_encoding off;
    }
Do: send periodic comment heartbeats (: keepalive) from the server if your product allows idle periods. Do not: rely on the default 60s proxy_read_timeout without keeping your server heartbeat interval below it.
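An SSE heartbeat is a comment line: it starts with a colon and EventSource clients ignore it. Below is a minimal sketch of a heartbeat timer plus the response headers mentioned above, assuming a Node `http.ServerResponse`-like object. `SSE_HEADERS`, `startHeartbeat`, and the 25-second default are illustrative choices, not values from any library.

```javascript
// Headers for an SSE response; X-Accel-Buffering asks nginx not to buffer it.
const SSE_HEADERS = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
  "X-Accel-Buffering": "no",
};

// Send ": keepalive" comments on a timer. Keep intervalMs well below the
// proxy read timeout (nginx proxy_read_timeout defaults to 60s).
function startHeartbeat(res, intervalMs = 25000) {
  const timer = setInterval(() => res.write(": keepalive\n\n"), intervalMs);
  const stop = () => clearInterval(timer);
  // Stop automatically when the client disconnects, if the object emits "close".
  if (typeof res.on === "function") res.on("close", stop);
  return stop;
}
```

In an SSE handler this would typically follow `res.writeHead(200, SSE_HEADERS)`, with the returned `stop` function kept for manual cleanup.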
4. Run detect, alert, and status page update workflow
When /health/sse flips to 503, run a fixed sequence so customers get honest comms fast.
- Confirm sidecar: curl GET /health/sse from outside.
- Check StillOnline: wait for debounced DOWN (two failed probes on Free).
- Triage layer: app process vs proxy vs upstream publisher.
- Post status page update within 15 minutes: "Live updates delayed."
- Resolve: re-test /health/sse until it returns 200 and spot-check a browser EventSource.
Do: label the status component "Notifications" or "Live feed." Do not: mark resolved while last_event age is still above threshold.
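The "do not mark resolved" rule can be enforced with a small guard before closing an incident. A sketch under assumptions: the sidecar body carries the `status` and `ageSec` fields from the example route earlier in this guide, and `canResolve` with its 120-second default is a hypothetical helper, not a StillOnline feature.

```javascript
// Decide whether an incident may be marked resolved, given the sidecar's HTTP
// status and parsed JSON body. thresholdSec should match the sidecar's own limit.
function canResolve(httpStatus, body, thresholdSec = 120) {
  if (httpStatus !== 200) return false;
  if (!body || typeof body.ageSec !== "number") return false;
  return body.status === "ok" && body.ageSec <= thresholdSec;
}
```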
5. Wire StillOnline operator runbook for SSE
Register https://api.yourproduct.com/health/sse — not /events. StillOnline sends GET, expects 200 when healthy, treats 503 as DOWN on Free.
- Sign in at stillonline.tech/app.
- Add check with full /health/sse URL, GET, expect 200.
- Exclude from auth: monitors send no cookies or JWT.
- Enable alerts: Free has one channel; Pro ($9/mo) adds more (see pricing).
- Document stall vs outage in internal runbook.
Do: rehearse stall in staging by pausing the event publisher while keeping HTTP up. Do not: set freshness threshold shorter than your quiet-period heartbeats.
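The timing rules in this guide interact: the heartbeat interval has to sit below both the freshness threshold and the proxy read timeout. A small startup check can catch a bad combination early. This sketch assumes heartbeats also refresh the freshness timestamp, and it treats equal values as a problem, which is a slightly stricter reading than the rule above. `checkTimings` and its parameter names are hypothetical.

```javascript
// Return a list of problems with the timing settings (empty list = consistent).
function checkTimings({ heartbeatSec, freshnessThresholdSec, proxyReadTimeoutSec }) {
  const problems = [];
  if (freshnessThresholdSec <= heartbeatSec) {
    problems.push("freshness threshold must be longer than the heartbeat interval");
  }
  if (proxyReadTimeoutSec <= heartbeatSec) {
    problems.push("proxy read timeout must be longer than the heartbeat interval");
  }
  return problems;
}
```

For example, a 25-second heartbeat with a 120-second threshold and the default 60-second proxy read timeout passes with no problems reported.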
What's next
You stopped probing /events directly, added /health/sse freshness, fixed proxy buffering, and registered StillOnline on the sidecar. Add an optional synthetic SSE reader when live updates are revenue-critical.
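If you do build a synthetic reader, its core is a parser for the text/event-stream format. Below is a minimal sketch of that piece only, not a full worker: it handles `data:` lines and `:` comment lines with LF or CRLF line endings, and ignores other fields such as `id:` and `event:`. `createSseParser` is an illustrative name.

```javascript
// Minimal text/event-stream parser: feed it decoded text chunks, get events out.
// Handles "data:" lines and ":" comments (heartbeats); LF and CRLF endings only.
function createSseParser(onEvent) {
  let buffer = "";
  let dataLines = [];
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split(/\r?\n/);
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (line === "") {
        // A blank line ends an event; dispatch only if it carried data.
        if (dataLines.length > 0) onEvent(dataLines.join("\n"));
        dataLines = [];
      } else if (line.startsWith(":")) {
        continue; // comment line, for example ": keepalive"
      } else if (line.startsWith("data:")) {
        // Drop the field name and one optional leading space.
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
  };
}
```

A worker could pass each chunk of a Node `http.get` response (after `res.setEncoding("utf8")`) to `feed`, record the time of each `onEvent` call, and alert when that time grows older than its own threshold.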
Open the StillOnline dashboard, paste /health/sse, and enable the alert channel you read during incidents.
Related guides
- Health endpoint design for SaaS
- WebSocket uptime health check monitoring
- Monitoring background workers and queues
- Health check URL quickstart
FAQ
How do I tell a stream stall from a hard outage with StillOnline?
Stall: sidecar returns 503 with high ageSec while the process still accepts HTTP. Hard outage: connection refused or 502/503 at the edge. Start with curl /health/sse from outside.
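The distinction can be sketched as a classifier over one probe result. It assumes the sidecar answers 503 with a JSON body containing `"status": "stale"` (as in the example route earlier in this guide), while an edge-generated 502/503 has no such body. `classifyProbe` and the shape of `result` are hypothetical.

```javascript
// Classify one probe of /health/sse. `result` is either { error } for a failed
// connection (refused, timeout) or { status, body } with body as raw text.
function classifyProbe(result) {
  if (result.error) return "hard outage";
  let json = null;
  try {
    json = JSON.parse(result.body);
  } catch {
    // Not JSON: likely an error page generated at the edge, not by the app.
  }
  if (result.status === 503 && json && json.status === "stale") return "stream stall";
  if (result.status === 502 || result.status === 503) return "hard outage";
  if (result.status === 200) return "healthy";
  return "unknown";
}
```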
Does StillOnline open SSE streams?
No. StillOnline sends HTTP GET to the URL you configure. Point it at /health/sse, not text/event-stream /events.
What CDN and proxy settings break SSE monitoring?
Response buffering, aggressive idle timeouts, and caching on the stream path. Disable buffering and caching on that path, and extend read timeouts above the heartbeat interval.
Do I need a synthetic stream reader beyond StillOnline?
Not for MVP. The freshness sidecar plus StillOnline covers most indie SaaS. Add a worker that opens EventSource when notifications are business-critical.
Should I monitor /events or /health/sse in StillOnline?
/health/sse on Free (one URL). Never the long-lived stream URL.
How does SSE monitoring relate to WebSocket monitoring in StillOnline?
Same sidecar pattern: map subsystem health to GET 200/503. WebSockets need upgrade-specific checks — see our WebSocket uptime article.