
GraphQL API health check monitoring

Your GraphQL API can return HTTP 200 while the errors array reports failure, and uptime stays green because the monitor only looks at the status code. GraphQL health check monitoring needs a split: a cheap GET sidecar for external probes, and optional GraphQL canaries in CI. In this guide you add GET /health on the same host as /graphql, skip introspection in production, and register one HTTPS URL in StillOnline.

Quick answer

GraphQL often returns HTTP 200 with failures reported in the JSON errors field. StillOnline sends GET requests with no POST body, and on Free it checks only the HTTP status code, with no JSON parsing. Point it at GET /health or GET /ready, not POST /graphql. Use { __typename } in CI; never use __schema introspection in production monitors. Free includes one project, one URL, and a five-minute interval (see pricing).

GraphQL lets clients send one query and get JSON back. The server may answer 200 OK when the query failed because transport worked but the resolver did not. An external uptime monitor hits a URL from the public internet on a schedule with no login and, on StillOnline Free, no custom headers.

For generic patterns, read our health check URL quickstart and API-only SaaS uptime checks. This guide covers GraphQL-specific traps: false greens on 200-with-errors, POST-only endpoints, introspection probes, and when GET /health is enough for StillOnline.

1. Compare GET /health vs GraphQL gateway failure modes

Liveness means the process accepts HTTP. Readiness means it can serve real traffic (database reachable). GraphQL gateways add a third layer: the engine may run while every query returns resolver errors inside JSON — invisible to a monitor that only glances at the status code on the wrong URL.

Monitor layer | What it proves | Tool fit
GET /health | Process alive | StillOnline Free (status code only)
GET /ready with DB ping | Dependencies up; 503 when not | StillOnline, when DOWN must mean the database is down
POST /graphql + errors assert | Resolvers execute a real query | CI cron, not StillOnline
GET /graphql?query={__typename} | GraphQL HTTP path without POST | CI only, unless failures return non-200

Apollo Server v4 and later removed the built-in health check, so teams add a GraphQL-level check or a separate HTTP route. GraphQL Yoga ships GET /health by default; Hive Gateway uses /healthcheck and /readiness. The indie StillOnline pattern: serve a public GET /health or /ready on the same host as GraphQL, and share your /s/... status slug in API docs.

Probe workflow: StillOnline GET every 5 min → load balancer → GET /ready → 200 or 503 → status page + owner alert after two failed probes.
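The "two failed probes" rule is a debounce: one blip (a deploy restart, a dropped packet) should not page anyone. The sketch below illustrates that logic only; it is not StillOnline's implementation, and createProbeTracker and its threshold parameter are hypothetical names chosen to mirror the workflow above.

```javascript
// Hypothetical sketch of "alert after N consecutive failed probes".
// Not StillOnline's implementation; it only illustrates the debounce logic.
function createProbeTracker(failureThreshold = 2) {
  let consecutiveFailures = 0;
  let state = 'UP';

  return {
    // Record one probe's HTTP status. Returns the current state and
    // which alert (if any) this probe should trigger.
    record(httpStatus) {
      if (httpStatus === 200) { // mirrors an "expect 200" check
        const recovered = state === 'DOWN';
        consecutiveFailures = 0;
        state = 'UP';
        return { state, alert: recovered ? 'recovered' : null };
      }
      consecutiveFailures += 1;
      if (state === 'UP' && consecutiveFailures >= failureThreshold) {
        state = 'DOWN';
        return { state, alert: 'down' };
      }
      return { state, alert: null };
    },
  };
}

const tracker = createProbeTracker();
tracker.record(200); // { state: 'UP', alert: null }
tracker.record(503); // { state: 'UP', alert: null }: one blip, no page
tracker.record(503); // { state: 'DOWN', alert: 'down' }: second failure pages the owner
```

Only the transition to DOWN (and back to UP) raises an alert, so a long outage produces one page rather than one per probe.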

Do: use StillOnline for external liveness or readiness. Do not: assume HTTP 200 on /graphql means customers are fine — the errors array can be full while the status code stays green.
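To see the false green concretely, the sketch below contrasts a status-code-only verdict (what a Free-plan GET monitor sees) with a body-aware verdict that reads the GraphQL errors array. statusOnlyVerdict, bodyAwareVerdict, and the DEGRADED label are illustrative names, and the payload is a made-up example of partial success.

```javascript
// Verdict from the HTTP status alone, as a status-code-only monitor sees it.
function statusOnlyVerdict(httpStatus) {
  return httpStatus === 200 ? 'UP' : 'DOWN';
}

// Verdict that also reads the GraphQL response body.
// A non-empty `errors` array means at least one resolver failed,
// even when `data` is partially populated (partial success).
function bodyAwareVerdict(httpStatus, body) {
  if (httpStatus !== 200) return 'DOWN';
  const errors = Array.isArray(body?.errors) ? body.errors : [];
  if (errors.length === 0) return 'UP';
  const hasData = body.data != null && Object.values(body.data).some((v) => v != null);
  return hasData ? 'DEGRADED' : 'DOWN';
}

// Example payload: transport succeeded, one resolver failed.
const partial = {
  data: { viewer: { id: '1' }, orders: null },
  errors: [{ message: 'orders service timeout', path: ['orders'] }],
};

console.log(statusOnlyVerdict(200));         // UP (false green)
console.log(bodyAwareVerdict(200, partial)); // DEGRADED
```

Only the second function catches the failure, and it needs the response body, which a status-code-only check never reads.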

2. Choose lightweight probe queries and avoid introspection traps

The smallest safe query is { __typename }: it returns {"data":{"__typename":"Query"}} without touching databases. An introspection query such as { __schema { queryType { name } } } exposes your schema's structure, so production deployments often disable introspection, and a monitor that relies on it reports a false DOWN.

Some teams add a healthz field whose resolver pings the database and returns ok: true. Assert on that field from a POST canary in CI, not through StillOnline.

Apollo documents GET /graphql?query=%7B__typename%7D as a trivial health query. Its CSRF prevention may require the header apollo-require-preflight: true on that GET. StillOnline Free sends no custom headers, so prefer a separate /health route that returns a plain 200.
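If you do want to exercise Apollo's GET form from staging or CI, the request can be built as below. This is a sketch assuming Node 18+ (global fetch, URL, and AbortSignal.timeout); https://api.example.com, the two-second timeout, and the function names are placeholders, and it assumes the root query type keeps its default name, Query.

```javascript
// Build Apollo's trivial GET health query: /graphql?query={__typename}
function buildTypenameProbe(baseUrl) {
  const url = new URL('/graphql', baseUrl);
  // URLSearchParams percent-encodes the braces: query=%7B__typename%7D
  url.searchParams.set('query', '{__typename}');
  return {
    url: url.toString(),
    headers: {
      // Satisfies Apollo's CSRF prevention for GET requests.
      // StillOnline Free cannot send custom headers like this one.
      'apollo-require-preflight': 'true',
    },
  };
}

// Run the probe; healthy means HTTP 200 and data.__typename === 'Query'.
async function runTypenameProbe(baseUrl) {
  const { url, headers } = buildTypenameProbe(baseUrl);
  const res = await fetch(url, { headers, signal: AbortSignal.timeout(2000) });
  const body = await res.json().catch(() => null);
  return res.status === 200 && body?.data?.__typename === 'Query';
}
```

The required header is exactly what keeps this form out of a Free-plan check, which is why the plain /health route remains the external target.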

Yoga's useReadinessCheck plugin exposes GET /ready and returns 503 when your check throws, which is ideal when StillOnline should page you on database failure.
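If you are not on Yoga, the same 200/503 contract is small enough to hand-roll. A minimal sketch using the request/response interface of Node's built-in http module (Express's res also provides writeHead and end); createReadyHandler and the check function you pass in are hypothetical names, and this mirrors the behavior described above rather than reproducing Yoga's plugin code.

```javascript
// Readiness handler: 200 when the dependency check resolves,
// 503 when it throws or rejects. `check` is your own async function,
// for example a hypothetical pingDatabase().
function createReadyHandler(check) {
  return async function readyHandler(_req, res) {
    const headers = { 'Content-Type': 'text/plain', 'Cache-Control': 'no-store' };
    try {
      await check();
      res.writeHead(200, headers);
      res.end('ready');
    } catch {
      // Any failure in the check maps to 503 so external monitors see DOWN.
      res.writeHead(503, headers);
      res.end('not ready');
    }
  };
}
```

Usage, with pingDatabase standing in for your own client call: route GET /ready to createReadyHandler(() => pingDatabase()). Cache-Control: no-store keeps a proxy or CDN from serving a stale 200.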

Do: use { __typename } in staging or CI. Do not: point production monitors at introspection.

For REST handlers beside GraphQL, see FastAPI health check endpoint monitoring.

3. Fix auth, rate limits, and false greens on POST-only APIs

Many servers accept POST on /graphql and return 405 for GET. StillOnline probes use GET with no body — monitoring /graphql directly causes false alerts.

Partial success is worse: HTTP 200 with both data and errors means something worked and something broke. CronAlert and Velprove document this as the core GraphQL monitoring trap. StillOnline Free cannot parse $.errors — return 503 on /ready when dependencies fail, or run a POST canary in GitHub Actions that fails when errors is non-empty.

Sidecar health (Apollo pattern):

```javascript
const express = require('express');
const app = express();
app.get('/health', (_req, res) => res.status(200).send('Okay!'));
// GraphQL stays on POST /graphql; StillOnline hits GET /health instead
```

Yoga readiness: the useReadinessCheck plugin's default GET /ready returns 503 when your check throws. Keep probe paths unauthenticated, because monitors send no Authorization header. Exempt /health from rate limiting or give it a generous limit; a probe every five minutes should never be throttled alongside user traffic.

For status codes, timeouts, and Cache-Control headers, see health endpoint design for SaaS. Aim for sub-second /health, and time-box the dependency checks in /ready so the full response, including latency to distant probe locations, stays under the practical two-second ceiling.
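One way to time-box a dependency check is to race it against a timer, so a hung database turns into a fast 503 instead of a probe timeout. A minimal sketch; withTimeout is a hypothetical helper name, and any budget you pass (say, one second for a database ping) is an assumption to tune against the two-second target.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds,
// so a hung dependency fails fast instead of stalling the /ready response.
function withTimeout(promise, ms, label = 'dependency check') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever side wins, so it never fires late.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Inside a /ready handler you would write something like await withTimeout(pingDatabase(), 1000, 'database'), where pingDatabase is hypothetical; the rejection then follows the same path as any other failed check and yields 503. Note that the race only stops waiting: the underlying ping keeps running until your client library gives up.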

Do: exclude /health and /ready from JWT middleware. Do not: require API keys on the URL StillOnline hits — false 401 alerts at 3 a.m. are common.
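A minimal sketch of that exclusion for Express or Connect style middleware (req, res, next). It assumes the middleware is mounted at the app root, so req.path is the full path; skipAuthForProbes, PUBLIC_PROBE_PATHS, and the verifyJwt you would wrap are hypothetical names.

```javascript
// Paths that external monitors hit without credentials.
const PUBLIC_PROBE_PATHS = new Set(['/health', '/ready']);

// Wrap an existing auth middleware so probe paths skip it entirely
// and can never answer 401 to an uptime monitor.
function skipAuthForProbes(authMiddleware) {
  return function probeAwareAuth(req, res, next) {
    // Express sets req.path; plain Node only has req.url (with query string).
    const path = (req.path ?? req.url ?? '').split('?')[0];
    if (PUBLIC_PROBE_PATHS.has(path)) return next();
    return authMiddleware(req, res, next);
  };
}
```

Usage: app.use(skipAuthForProbes(verifyJwt)). Keeping the allowlist to exact paths means nothing else, including /graphql, slips past authentication.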

4. Configure StillOnline HTTP checks safely

StillOnline runs GET probes, updates a status page, and alerts the owner. No GraphQL POST bodies; no JSON validation on Free.

  1. Pick one URL. Free = one check — usually production GET /ready or GET /health.
  2. curl from outside. curl -sS -o /dev/null -w "%{http_code}\n" https://api.example.com/health — expect 200, under 2s.
  3. Register at stillonline.tech/app — full HTTPS URL, GET, expect 200.
  4. Wait two or three cycles (10 to 15 minutes at the five-minute Free interval).
  5. Enable alerts: one channel on Free; Pro ($9/mo) and Ultimate ($29/mo) add more (see pricing).
  6. Share /s/your-slug in API docs.
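If you want step 2 scripted (for example in a pre-deploy job), this sketch performs the same check as the curl command: a plain GET with no body and no custom headers, expecting 200 within two seconds. It assumes Node 18+; judgeProbe and probeOnce are hypothetical names and the URL is a placeholder. One difference: fetch follows redirects by default, while the curl command above does not.

```javascript
// Pass/fail rule for one probe: HTTP 200 within the time budget.
function judgeProbe(status, elapsedMs, budgetMs = 2000) {
  if (status !== 200) return { ok: false, reason: `expected 200, got ${status}` };
  if (elapsedMs > budgetMs) return { ok: false, reason: `slow: ${elapsedMs} ms > ${budgetMs} ms` };
  return { ok: true, reason: `200 in ${elapsedMs} ms` };
}

// GET the URL once with no body and no custom headers, like a Free-plan check.
async function probeOnce(url, budgetMs = 2000) {
  const started = Date.now();
  try {
    const res = await fetch(url, { method: 'GET', signal: AbortSignal.timeout(budgetMs) });
    return judgeProbe(res.status, Date.now() - started, budgetMs);
  } catch (err) {
    // Timeouts, DNS failures, and TLS errors all land here.
    return { ok: false, reason: `request failed: ${err.message}` };
  }
}

// Usage (placeholder URL):
// probeOnce('https://api.example.com/health').then((r) => {
//   console.log(r.reason);
//   if (!r.ok) process.exitCode = 1;
// });
```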

Do: monitor the public edge URL customers use. Do not: register POST /graphql, because StillOnline will not send your query body.

Honest limits: external GET only; no custom headers on Free; HTTP status code only (not the errors array); two consecutive failed probes before DOWN. For GraphQL errors inside a 200 response, add a CI job that POSTs { __typename } and fails when errors is present — complementary to /health uptime, not a replacement.
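A minimal sketch of that CI canary, assuming Node 18+ (global fetch); the file name, target URL, and five-second timeout are placeholders. It POSTs { __typename } and exits with code 1 when the status is not 200, the errors array is non-empty, or data.__typename is missing. Run it from a scheduled GitHub Actions workflow or any cron runner; the non-zero exit code is what turns the job red.

```javascript
// CI canary: POST { __typename } and fail on any GraphQL error.

// Returns a list of problems; an empty list means the canary passed.
function canaryProblems(httpStatus, body) {
  const problems = [];
  if (httpStatus !== 200) problems.push(`HTTP ${httpStatus}`);
  if (Array.isArray(body?.errors) && body.errors.length > 0) {
    problems.push(...body.errors.map((e) => `GraphQL error: ${e?.message ?? 'unknown'}`));
  }
  if (typeof body?.data?.__typename !== 'string') problems.push('missing data.__typename');
  return problems;
}

async function runCanary(url) {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: '{ __typename }' }),
    signal: AbortSignal.timeout(5000),
  });
  const body = await res.json().catch(() => null); // non-JSON body counts as missing data
  return canaryProblems(res.status, body);
}

// Entry point: `node graphql-canary.js https://api.example.com/graphql`
// Exit code 1 fails the CI job.
const targetUrl = process.argv[2];
if (targetUrl) {
  runCanary(targetUrl)
    .then((problems) => {
      if (problems.length > 0) {
        console.error(problems.join('\n'));
        process.exitCode = 1;
      } else {
        console.log('GraphQL canary OK');
      }
    })
    .catch((err) => {
      console.error(`canary request failed: ${err.message}`);
      process.exitCode = 1;
    });
}
```

Swapping { __typename } for a query that touches your healthz field turns the same script into a dependency check, still without involving StillOnline.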

5. Run a runbook when schema deploy breaks probes

Schema deploys can rename your healthz field, tighten auth on /graphql, or disable GET query strings. StillOnline keeps polling /health — unchanged HTTP routes survive deploys better than probes tied to schema shape.

  1. Before deploy: curl /health and /ready from outside your network.
  2. Rename healthz: update CI canary first; keep /health unchanged for StillOnline.
  3. After deploy: watch one probe cycle on the status page.
  4. Federation: gateway /ready should return 503 when critical subgraphs are unreachable.
  5. Document the wired URL in your runbook.
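For item 4, a federated gateway's /ready can aggregate subgraph probes and return 503 only when a critical one is down. A sketch under stated assumptions: gatewayReadiness and httpProbe are hypothetical names, the { name, url, critical } shape is something you define, and the one-second per-subgraph timeout assumes Node 18+ and internal health URLs.

```javascript
// Gateway readiness across subgraphs: 503 when any critical subgraph
// is unreachable, 200 otherwise. `subgraphs` is a list of
// { name, url, critical } objects you define; `probe(url)` resolves to true/false.
async function gatewayReadiness(subgraphs, probe) {
  const results = await Promise.all(
    subgraphs.map(async (sg) => {
      try {
        return { name: sg.name, critical: sg.critical, up: await probe(sg.url) };
      } catch {
        // A thrown probe (refused connection, timeout) counts as down.
        return { name: sg.name, critical: sg.critical, up: false };
      }
    }),
  );
  const criticalDown = results.filter((r) => r.critical && !r.up).map((r) => r.name);
  return { status: criticalDown.length > 0 ? 503 : 200, criticalDown, results };
}

// Example probe: a subgraph counts as up if its own health URL returns 200.
async function httpProbe(url) {
  const res = await fetch(url, { signal: AbortSignal.timeout(1000) });
  return res.status === 200;
}
```

Marking only some subgraphs critical keeps an optional service (say, recommendations) from paging you while core queries still work.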

Do: decouple external uptime (GET /health) from schema canaries (CI). Do not: delete /health because Playground looks fine from your laptop.

What's next

You have a GraphQL-friendly split: StillOnline on GET /ready for owner alerts and a public status page, plus optional CI for errors array assertions. Add Telegram or Slack if email is too slow, and link the status page from your footer.

Open the StillOnline dashboard, paste your readiness URL, and enable the channel you read when production blinks.


FAQ

Can uptime tools use POST for GraphQL?

Many competitors support POST with a JSON body and assert on the errors field. StillOnline does not — probes are GET without a request body. Use GET /health on the same host, or a GET query string only if your framework maps GraphQL failure to a non-200 status code.

Should StillOnline monitor /graphql directly?

Only if GET works without auth. POST-only /graphql returns 405 on GET. Prefer /health per our API-only monitoring guide.

Does StillOnline detect GraphQL errors in JSON?

No on Free — it evaluates the HTTP status code only. Design /ready to return 503 when dependencies fail so external monitors see DOWN without parsing JSON. Use CI or a GraphQL-native monitor to assert the errors field is empty on a synthetic query.

Should I monitor /health or /ready?

/ready when DB failure should page you; /health for fewer alerts during deploys.

Is { __typename } safe in production?

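Yes: it does not touch your databases, but run it from CI, since it needs a POST body or, on GET, Apollo's preflight header. Avoid __schema introspection, which is often disabled in production. StillOnline needs a non-200 HTTP status on failure; it does not parse JSON.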

GraphQL plus StillOnline Free — staging or production?

One URL on Free — pick production. Pro adds more checks for separate environments.