WebSocket uptime health check monitoring

Your REST API is green in every dashboard, yet chat and live updates are dead. That is a WebSocket false green: HTTP checks prove the web server answers, not that the realtime upgrade and long-lived session work. Fix it with a GET sidecar that maps WS failures to HTTP 503, then point StillOnline at that URL.

Quick answer

HTTP 200 on / or /api/health does not prove WebSocket upgrade (HTTP 101) or live ping/pong works. StillOnline sends HTTP GET only — no native WSS probe. Add GET /health that returns 503 when your WS process or heartbeat stats fail. Free: one URL, five-minute interval, status code only — map WS failure to 503, not 200 with bad JSON.

A WebSocket is a long-lived two-way channel: a phone call after a short handshake, not a single HTTP postcard. AWS ALB health checks do not support WebSockets either; they only send HTTP GET requests to a dedicated path. If you only monitor REST /health or your marketing homepage, you miss nginx stripping Upgrade headers, a crashed WS worker, or an expired WSS certificate.

See health endpoint design for liveness vs readiness and API-only SaaS uptime checks for how StillOnline runs external GET probes end to end.

1. Why HTTP 200 on the same host does not prove WebSocket works

Three layers: L1 transport (port, TLS), L2 handshake (HTTP 101 on /ws), L3 session (ping/pong or app heartbeats). StillOnline via a sidecar covers L1–L2 signals; L3 needs in-app metrics mapped to 503 or a native WSS tool.

Symptom | Likely layer | First check
WS DOWN + HTTP UP | L2 or L3 | nginx Upgrade, WS process, connection limits
Both DOWN | L1 | Host, TLS cert, firewall
WS UP + HTTP DOWN | Misroute | Wrong probe URL or REST-only auth

Do: audit what your monitor actually hits — landing page, REST /api/health, or a WS-aware sidecar. Do not: assume green REST uptime covers chat, notifications, or collaborative editing.

Typical WS-only breaks after deploy: reverse proxy misconfig, per-process connection limits, CDN stripping Upgrade headers, and expired WSS certificates that block the upgrade entirely.

Triage workflow: alert → curl GET /health → if 503, WS logs → if 200 but users suffer, suspect L3 or proxy false green.
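The triage workflow above can be written down as a small decision function, which is handy for a runbook or a chat-ops bot. This is a minimal sketch: the function name, the returned strings, and the use of null for "no HTTP response at all" are illustrative assumptions, not part of StillOnline.

```javascript
// Hypothetical helper: maps the result of `curl GET /health` plus user reports
// to the next triage step described in the workflow.
function nextTriageStep(healthStatus, usersAffected) {
  if (healthStatus === null) {
    // No HTTP response at all points at L1: host, TLS, or firewall.
    return "check host, TLS cert, and firewall";
  }
  if (healthStatus === 503) {
    // The sidecar itself reports a WS failure.
    return "inspect WS logs";
  }
  if (healthStatus === 200 && usersAffected) {
    // Sidecar is green but realtime is broken for users.
    return "suspect L3 or proxy false green";
  }
  if (healthStatus === 200) {
    return "no action";
  }
  // Any other status (redirect, 401, 404) suggests the probe URL is wrong.
  return "verify probe URL and auth";
}
```

For example, nextTriageStep(200, true) points you at the session layer rather than at the WS process logs.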

2. Sidecar HTTP health vs synthetic WebSocket handshake

A GET sidecar aggregates WS subsystem state and returns 200 or 503. A synthetic WS check opens WSS from outside and optionally pings. StillOnline does only the first: scheduled HTTP GET with status-code evaluation.

Approach | StillOnline | Native WS tools
Protocol | HTTP GET only | WSS handshake + ping
Free signal | Status code only | Varies by vendor
Best for | One canonical health URL | Chat-first L3 proof

Do: check that the WS process is alive and that the last handshake happened within N minutes. Do not: return 200 with {"ws_ok": false} on Free. StillOnline Free does not parse JSON fields.

Sidecar sketch (Node ws):

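// Assumes an Express `app`, a ws server `wss`, and two values your WS code
// maintains: wsProcessCrashed (boolean) and lastHandshakeMs (epoch ms).
const MAX_HANDSHAKE_AGE_MS = 5 * 60 * 1000; // the "N minutes" window
app.get("/health", (req, res) => {
  const ok = !wsProcessCrashed && Date.now() - lastHandshakeMs < MAX_HANDSHAKE_AGE_MS;
  // Map any WS failure to 503 so a status-code-only monitor can see it.
  if (!ok) return res.status(503).json({ status: "down" });
  res.json({ status: "ok", connections: wss.clients.size });
});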
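The sketch above reads wsProcessCrashed and lastHandshakeMs without showing where they come from. One possible way to maintain them is a small in-process tracker like the one below. Every name here (createWsHealthTracker, recordHandshake, recordCrash, evaluate, the reason strings) is a hypothetical helper, not part of the ws library or StillOnline.

```javascript
// Hypothetical in-process tracker for the two signals the sidecar reads.
function createWsHealthTracker(maxHandshakeAgeMs = 300000) {
  let lastHandshakeMs = null; // null until the first successful upgrade
  let crashed = false;

  return {
    // Call this whenever a client completes the WebSocket handshake.
    // Assumption: a fresh handshake also means the worker has recovered.
    recordHandshake(nowMs = Date.now()) {
      lastHandshakeMs = nowMs;
      crashed = false;
    },
    // Call this from whatever detects a dead WS worker.
    recordCrash() {
      crashed = true;
    },
    // Returns the HTTP status for /health plus a reason worth logging.
    evaluate(nowMs = Date.now()) {
      if (crashed) return { code: 503, reason: "ws_process_crashed" };
      if (lastHandshakeMs === null) return { code: 503, reason: "no_handshake_yet" };
      if (nowMs - lastHandshakeMs >= maxHandshakeAgeMs) {
        return { code: 503, reason: "handshake_stale" };
      }
      return { code: 200, reason: "ok" };
    },
  };
}
```

With the ws package, recordHandshake() would typically be called inside the server's "connection" event handler. Note that a quiet service with no new connections inside the window will report 503 under this rule, so pick the window to match your real traffic or add an internal self-connect.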

This is the same sidecar pattern as in GraphQL health monitoring and FastAPI /health monitoring; this guide adds WS subsystem signals.

Optional layer-3: native WSS monitors (Exit1, UptimeChecker) open WSS from their infrastructure. StillOnline remains the honest HTTP layer for status pages and owner alerts on indie budgets.

3. Timeouts, TLS, and reverse-proxy pitfalls

nginx default proxy_read_timeout is 60s — idle WS closes without ping/pong. Set heartbeat to about 75% of the shortest proxy timeout (60s proxy → 45s heartbeat). Server ping every 30s with 10s pong timeout is a common Node pattern.
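The 75% rule is simple arithmetic, shown here as a sketch. The function name and the idea of passing every proxy timeout on the path as an array are assumptions made for illustration.

```javascript
// Hypothetical helper: heartbeat interval as a fraction of the shortest
// idle timeout among the proxies between client and WS backend.
function heartbeatIntervalMs(proxyTimeoutsMs, fraction = 0.75) {
  if (proxyTimeoutsMs.length === 0) {
    throw new Error("need at least one proxy timeout");
  }
  const shortest = Math.min(...proxyTimeoutsMs);
  return Math.floor(shortest * fraction);
}

// nginx default of 60s behind a load balancer with a 120s idle timeout:
console.log(heartbeatIntervalMs([60000, 120000])); // 45000
```

The shortest timeout wins because any single hop that sees an idle connection for too long will close it, regardless of how generous the other hops are.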

False green: proxies may reply to ping (opcode 0x9) without forwarding to your backend. Sidecar must read in-process stats, not proxy ping alone.
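One way to get in-process stats is for the backend itself to record the pings it sends and the pongs it actually receives, then let the sidecar read the count of overdue connections. The tracker below is a sketch under that assumption; createPongTracker and its methods are hypothetical names, and the 10s default mirrors the pong timeout mentioned earlier.

```javascript
// Hypothetical tracker: counts connections whose ping has gone unanswered
// for longer than the pong timeout, as seen by the backend process itself.
function createPongTracker(pongTimeoutMs = 10000) {
  const pending = new Map(); // connection id -> time the unanswered ping was sent

  return {
    // Record a ping only if no earlier ping is still waiting for its pong.
    pingSent(id, nowMs) {
      if (!pending.has(id)) pending.set(id, nowMs);
    },
    pongReceived(id) {
      pending.delete(id);
    },
    connectionClosed(id) {
      pending.delete(id);
    },
    // Number of connections that are past the pong timeout right now.
    staleCount(nowMs) {
      let stale = 0;
      for (const sentAt of pending.values()) {
        if (nowMs - sentAt > pongTimeoutMs) stale += 1;
      }
      return stale;
    },
  };
}
```

A sidecar could then return 503 when staleCount() exceeds a share of open connections that you consider abnormal. The threshold is a judgment call for your traffic, not something this sketch decides.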

  1. Pass Upgrade and Connection headers through nginx /ws.
  2. Set proxy_read_timeout above heartbeat interval.
  3. Track WSS cert expiry on the REST hostname.
  4. Log 503 reasons for on-call.
  5. curl /health from outside before blaming clients.
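For the certificate item in the checklist, the core of an expiry check is date arithmetic. The sketch below assumes you already have the certificate's not-after date as a string; daysUntilExpiry is a hypothetical helper. In Node, the valid_to field returned by a TLS socket's getPeerCertificate() is such a string, but confirm that Date.parse accepts its format in your runtime before relying on it.

```javascript
// Hypothetical helper: whole days left before a certificate expires.
// A negative result means the certificate has already expired.
function daysUntilExpiry(validTo, nowMs = Date.now()) {
  const expiresMs = Date.parse(validTo);
  if (Number.isNaN(expiresMs)) {
    throw new Error(`unparseable date: ${validTo}`);
  }
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return Math.floor((expiresMs - nowMs) / MS_PER_DAY);
}

// Example with a fixed "now" so the result is reproducible:
const now = Date.parse("2026-03-01T12:00:00Z");
console.log(daysUntilExpiry("2026-03-15T12:00:00Z", now)); // 14
```

Feeding a low remaining-day count into the sidecar's 503 logic is one option, though many teams prefer a separate warning so that a soon-to-expire certificate does not page as an outage.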

Do: document cert renewal in the runbook. Do not: expect browser JS to send a protocol-level ping. The browser WebSocket API does not expose ping frames, so client heartbeats have to be application messages such as {"type":"ping"}.

For redirects and bot walls, see uptime probe redirects and antibot.

4. When to expose a dedicated WebSocket probe URL

Pick one canonical HTTPS URL. Free = one project, one URL — usually GET /health with WS checks inside. Same public origin customers use; response under ~2 seconds.

curl -sS -o /dev/null -w "%{http_code}\n" https://api.example.com/health

Do: set Cache-Control: no-cache and skip JWT on probes. Do not: require auth — monitors send no Authorization header.
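The probe-friendly response described above can be sketched without any framework. Both function names are hypothetical, and the assumption is that your router calls needsAuth before running JWT checks.

```javascript
// Hypothetical helper: the status, headers, and body a probe should see.
function buildHealthResponse(realtimeOk) {
  return {
    status: realtimeOk ? 200 : 503,
    headers: {
      "Content-Type": "application/json",
      // Stops intermediaries from serving a stale 200 during an outage.
      "Cache-Control": "no-cache",
    },
    body: JSON.stringify({ status: realtimeOk ? "ok" : "down" }),
  };
}

// Hypothetical auth gate: every path except the probe URL requires a token,
// because the monitor sends no Authorization header.
function needsAuth(path) {
  return path !== "/health";
}
```

Keeping the unauthenticated body to a bare status string avoids leaking connection counts or internal reasons to anyone who finds the URL.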

Add native WSS monitoring when chat is core revenue. Run it in addition to the sidecar StillOnline checks, not instead.

5. Wire StillOnline HTTP checks and your operator runbook

StillOnline runs GET probes, updates a status page, and alerts after two consecutive failures. No WebSocket upgrade — register the sidecar HTTPS URL.
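To reason about alert timing, it helps to model the two-consecutive-failures rule as a tiny state machine. This is a sketch of the behavior described above, not StillOnline's actual implementation; the function name, the "UP"/"DOWN" strings, and the choice that a single 200 returns the state to UP are assumptions.

```javascript
// Hypothetical model: report DOWN only after `threshold` failed probes in a row.
function createDebouncer(threshold = 2) {
  let consecutiveFailures = 0;
  let state = "UP";

  // Feed in the status code of each probe; returns the state after that probe.
  return function record(statusCode) {
    if (statusCode === 200) {
      consecutiveFailures = 0;
      state = "UP"; // assumption: one success is enough to recover
    } else {
      consecutiveFailures += 1;
      if (consecutiveFailures >= threshold) state = "DOWN";
    }
    return state;
  };
}

const record = createDebouncer();
console.log(record(503)); // UP   (first failure is held back)
console.log(record(503)); // DOWN (second consecutive failure)
console.log(record(200)); // UP
```

Under this model and a five-minute cadence, an outage that starts right after a successful probe is reported roughly ten minutes later, which is worth stating in the runbook.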

  1. Sign in at stillonline.tech/app.
  2. Add check: GET https://api.yourapp.com/health, expect 200.
  3. Label component “Realtime” or “WebSocket” on the status page.
  4. Enable one alert channel on Free (Telegram or email).
  5. Wait 2–3 cycles (~10–15 min) before judging noise.

Do: use the URL that means “realtime is broken.” Do not: paste wss:// into StillOnline.

When an alert fires: confirm sidecar status with curl from a home network, check WS worker process and nginx error log, verify connection limits and cert expiry. REST may stay fully up during WS-only outages.

StillOnline debounces two consecutive failed probes before DOWN. On Free the cadence is about five minutes. Pro ($9/mo) and Ultimate ($29/mo) add more URLs and channels; see pricing. Basics: health check quickstart.

What's next

Deploy the WS-aware sidecar, align proxy timeouts with heartbeat interval, register StillOnline on /health, and label the status page component “Realtime.” Rehearse WS DOWN + HTTP UP in staging by stopping only the WS worker while REST stays alive.

Open the StillOnline dashboard, paste your sidecar URL, and enable the alert channel you actually read during incidents.

FAQ

Does StillOnline probe WebSockets natively?

No. StillOnline sends HTTP GET and checks the status code. Use a GET sidecar like /health returning 503 on WS failure. wss:// URLs will not get a handshake probe.

How should indie SaaS work around no native WS probes in StillOnline?

Build GET /health with WS logic (process alive, handshake freshness) and map failure to 503. Free reads status codes only. Add a native WSS monitor for optional layer-3 proof on top.

Why does StillOnline show green while chat is dead?

Your check likely hits REST or landing, not a WS-aware sidecar. Or the sidecar returns 200 when only transport works. Ensure 503 when handshake or heartbeat checks fail.

WS DOWN plus HTTP UP — where do I start?

curl /health from outside. If 503, inspect WS logs and nginx Upgrade config. If 200 but users suffer, suspect proxy ping false greens — strengthen sidecar metrics or add native WSS.

Should I monitor /health or /ws/health in StillOnline?

One canonical URL on Free — usually /health with WS checks folded in. Split paths only with multiple checks on Pro or Ultimate.

When do I need a native WebSocket monitor beyond StillOnline?

When realtime is revenue-critical and you need external WSS handshake proof. Keep StillOnline on the GET sidecar for status pages; add native WSS as a second layer.