False positive uptime alerts: practical tuning for indie SaaS

False positives train you to ignore on-call alerts: a blip during deploy, a cold start, or a 200 from a login page while the API is fine. The fix is not “buy Datadog”; it is health URL design and an understanding of how StillOnline marks checks DOWN.

This guide complements the uptime probes and antibot guide. It covers interval choices, debounce behavior, and when to split checks on Pro.

Quick answer

StillOnline marks a check DOWN after two consecutive failed probes. On Free, the five-minute interval means up to roughly 10 minutes between the start of an outage and the alert. Reduce noise by pointing the check at a stable GET /api/health (not a homepage behind a WAF), returning 200 in under two seconds, and exempting the health path from aggressive bot rules. Free cannot change the interval (300 s only); Pro allows 60–300 s per check and up to 10 URL checks per project. Check redirect chains with curl -L before blaming the monitor.

Knobs you actually have

| Knob | Free | Pro / Ultimate |
| --- | --- | --- |
| Probe interval | 300 s (5 min), fixed | 60–300 s per check |
| Fail threshold → DOWN | 2 consecutive fails | Same |
| Owner alert repeat while DOWN | Email throttled 15 min; Telegram per transition | Same |
| Number of URL checks | 1 per project | 10 / 25 |

The debounce is intentional: requiring two consecutive failures keeps a single dropped packet from paging you.
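To make the two-failure rule concrete, here is a minimal sketch of a debounced check and the detection delay it implies. The names CheckState, record_probe, and detection_window_s are hypothetical, and the sketch assumes a single successful probe resets the counter and that probes fire exactly once per interval. It illustrates the idea and is not StillOnline's actual implementation.

```python
from dataclasses import dataclass

FAIL_THRESHOLD = 2  # consecutive failures before DOWN, per the table above


@dataclass
class CheckState:
    consecutive_failures: int = 0
    status: str = "UP"


def record_probe(state: CheckState, ok: bool) -> CheckState:
    """Apply one probe result and return the updated state."""
    if ok:
        # Assumption: one successful probe is enough to reset the counter.
        return CheckState(consecutive_failures=0, status="UP")
    failures = state.consecutive_failures + 1
    status = "DOWN" if failures >= FAIL_THRESHOLD else state.status
    return CheckState(consecutive_failures=failures, status=status)


def detection_window_s(interval_s: int, threshold: int = FAIL_THRESHOLD) -> tuple[int, int]:
    """Best and worst case seconds from outage start to DOWN, ignoring probe timeouts."""
    best = (threshold - 1) * interval_s  # outage begins just before a probe fires
    worst = threshold * interval_s       # outage begins just after a probe fires
    return best, worst
```

Under these assumptions, detection_window_s(300) gives a window of 300 to 600 seconds on Free, which is where the "roughly 10 minutes" worst case comes from.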

Tuning workflow

1 — Verify the URL like a probe

curl -sS -o /dev/null -w "%{http_code} time:%{time_total}s final:%{url_effective}\n" -L --max-redirs 5 "https://api.yourproduct.com/health"

Expect a 200 status, a total time under 2 s, and a final: URL that stays the same between runs.
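If you run that curl command from a script, the three expectations can be checked mechanically. The sketch below parses the exact output format of the -w string above; the helper names parse_curl_line and problems are hypothetical, and the 2-second budget is the guideline from this article rather than a documented probe timeout.

```python
import re
from typing import NamedTuple


class ProbeResult(NamedTuple):
    status: int
    seconds: float
    final_url: str


# Matches lines such as: 200 time:0.184213s final:https://api.yourproduct.com/health
LINE_RE = re.compile(r"^(\d{3}) time:([\d.]+)s final:(\S+)$")


def parse_curl_line(line: str) -> ProbeResult:
    """Parse one line printed by the curl -w format used in step 1."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised curl output: {line!r}")
    return ProbeResult(int(match.group(1)), float(match.group(2)), match.group(3))


def problems(result: ProbeResult, expected_url: str, max_seconds: float = 2.0) -> list[str]:
    """List every way the result misses the expectations; empty means it looks fine."""
    found = []
    if result.status != 200:
        found.append(f"status {result.status}, expected 200")
    if result.seconds >= max_seconds:
        found.append(f"took {result.seconds:.2f}s, expected under {max_seconds}s")
    if result.final_url != expected_url:
        found.append(f"ended at {result.final_url}, expected {expected_url}")
    return found
```

Running this a few times in a row, including right after a deploy and after an idle period, shows whether the URL is stable enough to monitor.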

2 — Separate liveness from heavy /ready

Cold starts (serverless, Edge Functions) may exceed the probe timeout once. Serve a lightweight /health that does not touch the database on the cold path, and keep heavier dependency checks on /ready. See the health design guide.
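One way to keep the two endpoints apart is sketched below using only the Python standard library. The db_ping function is a hypothetical placeholder for a real dependency check, and the JSON bodies and port are assumptions chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def db_ping() -> bool:
    """Hypothetical dependency check; replace with a real query such as SELECT 1."""
    return True


def route(path: str, ping: Callable[[], bool] = db_ping) -> tuple[int, dict]:
    """Return (status, payload). /health never touches dependencies; /ready does."""
    path = path.split("?", 1)[0]
    if path == "/health":
        return 200, {"status": "ok"}  # liveness only: the process can answer
    if path == "/ready":
        try:
            ok = ping()
        except Exception:
            ok = False
        return (200, {"status": "ready"}) if ok else (503, {"status": "not ready"})
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, payload = route(self.path)
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the demo server; call this manually to try it with curl."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Pointing the uptime check at /health keeps a slow database connection on a cold start from turning the monitor red, while /ready remains available for load balancers or deeper checks.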

3 — Antibot and redirects

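A homepage that returns 200 with challenge HTML is a false green, and so is a redirect to a login page: the probe sees success while the product itself is untested. The antibot probes guide covers this in full. PROBE_LIMITED (yellow) means antibot rules blocked the probe; it does not open an incident.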
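Both kinds of false green can be caught in your own pre-flight check by insisting on more than a 200. The sketch below assumes the health endpoint returns JSON with a "status": "ok" field at a /health path; those conventions and the function name false_green_reason are illustrative assumptions, not part of StillOnline.

```python
import json
from typing import Optional
from urllib.parse import urlsplit


def false_green_reason(
    status: int,
    final_url: str,
    content_type: str,
    body: str,
    expected_path: str = "/health",
) -> Optional[str]:
    """Return why a 200 response should not count as healthy, or None if it looks fine."""
    if status != 200:
        return None  # not green at all, so not a false green
    if urlsplit(final_url).path != expected_path:
        return f"redirected to {final_url}"  # e.g. bounced to a login page
    if "json" not in content_type.lower():
        return f"unexpected content type {content_type!r}"  # e.g. challenge HTML
    try:
        payload = json.loads(body)
    except ValueError:
        return "body is not valid JSON"
    if not isinstance(payload, dict) or payload.get("status") != "ok":
        return "JSON body lacks status=ok"
    return None
```

Feed it the status, final URL, Content-Type header, and body from a request made with redirects enabled; any non-None result means the URL is a poor monitoring target as it stands.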

4 — Split checks on Pro

  • Check A: marketing site (optional)
  • Check B: api.yourproduct.com/health (authoritative)

Free must combine signals in one URL or accept tradeoffs.

5 — Deploy windows

A brief 503 during a deploy may trigger DOWN. Point the check at a liveness-only /health if you want it to stay green during rolling deploys, or pause alerts manually. v1 has no snooze button, so plan deploys accordingly.

When not to tune the monitor

| Situation | Fix infrastructure, not threshold |
| --- | --- |
| Real 500s after deploy | Roll back |
| DB pool exhausted | Fix pool or return 503 on /ready (see DB pool health) |
| Cert expires tomorrow | SSL monitoring |

FAQ

Can I set StillOnline to alert on one failure only?

No. Production uses two consecutive failures before DOWN to reduce noise.

Why did I get DOWN for 30 seconds during deploy?

Two failed probes in a row crossed the threshold. Use a lighter /health, or deploy when you can accept brief red. The check only turns green again on a later successful probe, so the interval sets how quickly recovery becomes visible.

Does shortening interval to 60 s on Pro increase false positives?

It can. Faster detection means more sensitivity to blips. Start at 300 s until the health URL is stable.
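The trade-off can be put in numbers. The sketch below uses an idealised model: probes fire exactly once per interval and fail only while the outage lasts, with timeouts and retries ignored. The function names are hypothetical, and real probe scheduling may differ.

```python
def max_consecutive_failures(outage_s: float, interval_s: float) -> int:
    """Most probes that can land inside one outage window of the given length."""
    return int(outage_s // interval_s) + 1


def blip_can_trigger_down(outage_s: float, interval_s: float, threshold: int = 2) -> bool:
    """True if an outage of this length could produce enough consecutive failures for DOWN."""
    return max_consecutive_failures(outage_s, interval_s) >= threshold
```

In this model a 90-second deploy blip can reach two consecutive failures at a 60 s interval but at most one at 300 s, which is why a shorter interval surfaces blips that the longer one absorbs.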

StillOnline shows green but users cannot log in?

The probe is likely hitting the wrong URL. See auth flow limits and the antibot guide.