Troubleshooting
Common issues and their solutions when using cf-monitor.
Monitor worker not receiving tail events
Symptoms: No errors appearing in GET /errors, no fingerprints in KV.
Causes and fixes:
- Missing tail_consumers: check that your worker’s wrangler config includes "tail_consumers": [{ "service": "cf-monitor" }]. Run npx cf-monitor wire to verify.
- Propagation delay: after deploying a new worker or changing tail_consumers, Cloudflare takes 30-60 seconds to activate the tail binding. Wait a minute and test again.
- Monitor worker not deployed: run npx cf-monitor status to check whether the monitor worker is healthy.
- Worker name mismatch: tail_consumers references the monitor worker by name. Ensure it matches the name field in the monitor worker’s wrangler config (default: cf-monitor).
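The first and last checks above can be sketched as a small helper that inspects a parsed wrangler config. This is an illustration only: the WranglerConfig shape and the hasTailConsumer name are assumptions, and npx cf-monitor wire remains the supported way to verify.

```typescript
// Hypothetical, minimal shape of the parts of a wrangler config we care about.
interface WranglerConfig {
  name?: string;
  tail_consumers?: { service: string }[];
}

// Returns true when the config routes tail events to the named monitor worker.
// The default name "cf-monitor" is the default stated in the text above.
function hasTailConsumer(config: WranglerConfig, monitorName: string = "cf-monitor"): boolean {
  return (config.tail_consumers ?? []).some((c) => c.service === monitorName);
}
```

A config whose tail_consumers entry names a different service (for example "monitor") fails this check, which corresponds to the worker name mismatch case.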
No metrics in Analytics Engine
Symptoms: npx cf-monitor status works but AE SQL queries return no data.
Causes and fixes:
- AE write propagation: Analytics Engine writes take 30-90 seconds to become queryable. This is a platform limitation, not a bug.
- Missing CF_MONITOR_AE binding: check that your worker’s wrangler config includes the analytics_engine_datasets binding. The binding name must be CF_MONITOR_AE.
- No traffic: AE data is only written when your worker handles requests. Hit your worker and wait 60 seconds.
- Zero metrics: if all binding operations return zero (e.g. no D1 calls), the SDK skips the AE write to save cost. This is by design.
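The zero-metrics rule can be pictured as a guard in front of the write. The sketch below is not the SDK's code; the MetricCounts type, the metric names, and the hasNonZeroMetric function are assumptions made for illustration.

```typescript
// Hypothetical per-request counters; the metric names used by callers are illustrative.
type MetricCounts = Record<string, number>;

// True when at least one counter is non-zero, i.e. a write would carry information.
function hasNonZeroMetric(counts: MetricCounts): boolean {
  return Object.keys(counts).some((key) => (counts[key] ?? 0) > 0);
}
```

Under this model, a request that touches no bindings produces no AE row, so an empty query result after such requests is expected.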
Circuit breaker won’t reset
Symptoms: Worker returns 503 even after waiting for TTL to expire.
Causes and fixes:
- KV edge propagation: KV TTL expiration can take up to 60 seconds to propagate across Cloudflare’s edge. Wait a full minute after expected expiry.
- Manual reset: force a reset via the admin endpoint:

    curl -X POST https://cf-monitor.YOUR_SUBDOMAIN.workers.dev/admin/cb/reset \
      -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"featureId": "your-feature-id"}'

- You used wrangler kv key delete to reset: this breaks the fast-propagation pattern. cf-monitor resets a CB by writing 'GO' with a 60-second TTL (not by deleting the key), which forces KV edge cache invalidation. If you delete the key instead, edges that cached STOP keep serving 503s for up to ~60 seconds. Always use POST /admin/cb/reset.
- Monthly budget also tripped: daily budgets reset via TTL, but if the monthly budget is also exceeded, the CB will be re-tripped on the next hourly check. Increase the monthly budget or wait for the month to roll over.
- Account-level CB: check whether the account CB is active:

    curl https://cf-monitor.YOUR_SUBDOMAIN.workers.dev/status

  Clear it with:

    curl -X POST .../admin/cb/account \
      -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"status":"clear"}'
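The interaction between daily and monthly budgets can be sketched as a small decision function. This is a simplified model, not cf-monitor's code: the BudgetSnapshot shape, the evaluateBreaker name, and the >= comparison are assumptions. Only the 'GO' and 'STOP' values come from the text above.

```typescript
type BreakerState = "GO" | "STOP";

// Hypothetical snapshot of one feature's usage at the time of an hourly check.
interface BudgetSnapshot {
  dailyUsage: number;
  dailyLimit: number;
  monthlyUsage: number;
  monthlyLimit: number;
}

// Either exceeded budget trips the breaker, so a fresh daily counter
// does not help while the monthly counter is still over its limit.
function evaluateBreaker(s: BudgetSnapshot): BreakerState {
  const dailyExceeded = s.dailyUsage >= s.dailyLimit;
  const monthlyExceeded = s.monthlyUsage >= s.monthlyLimit;
  return dailyExceeded || monthlyExceeded ? "STOP" : "GO";
}
```

In this model, a manual reset followed by an hourly check with the monthly budget still exceeded yields STOP again, which matches the re-trip behaviour described above.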
CLI init fails
Symptoms: npx cf-monitor init errors out.
Causes and fixes:
- Missing API token: set CLOUDFLARE_API_TOKEN in your environment or pass --api-token.
- Wrong account ID: the account ID is a 32-character hex string. Find it in the Cloudflare dashboard under Account Home > Account ID (right sidebar).
- Insufficient permissions: the API token needs Workers KV Storage (Edit), Account Analytics (Read), and Workers Scripts (Edit).
- Network issues: the CLI makes API calls to api.cloudflare.com. Ensure you’re not behind a proxy that blocks these.
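A quick sanity check for the account ID format described above (32 hexadecimal characters) might look like the following. The isValidAccountId name is hypothetical; the check validates only the shape of the string, not whether the account exists.

```typescript
// Matches exactly 32 hexadecimal characters, case-insensitively.
const ACCOUNT_ID_RE = /^[0-9a-f]{32}$/i;

// Returns true when the string has the shape of a Cloudflare account ID.
function isValidAccountId(id: string): boolean {
  return ACCOUNT_ID_RE.test(id);
}
```

Pasting a zone ID or an account name by mistake is usually caught by this kind of check only if the length or character set differs, so treat a pass as necessary rather than sufficient.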
Worker name shows as ‘worker’
Symptoms: feature IDs all start with worker: instead of your actual worker name.
Causes and fixes:
- WORKER_NAME not set: run npx cf-monitor wire --apply to automatically inject WORKER_NAME from your wrangler config’s name field.
- Manual fix: add to your wrangler config:

    { "vars": { "WORKER_NAME": "my-worker-name" } }

- SDK override: set workerName in the monitor config:

    monitor({ workerName: 'my-worker', fetch: handler });

Detection chain: config.workerName > env.WORKER_NAME > env.name > 'worker'
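The detection chain reads as a simple fallback. The sketch below mirrors the order given above; the MonitorConfig and WorkerEnv types and the resolveWorkerName function are illustrative assumptions, as is the choice to treat empty strings as unset.

```typescript
// Hypothetical shapes holding only the fields named in the detection chain.
interface MonitorConfig {
  workerName?: string;
}
interface WorkerEnv {
  WORKER_NAME?: string;
  name?: string;
}

// config.workerName > env.WORKER_NAME > env.name > 'worker'
// Using || (not ??) means an empty string also falls through (an assumption).
function resolveWorkerName(config: MonitorConfig, env: WorkerEnv): string {
  return config.workerName || env.WORKER_NAME || env.name || "worker";
}
```

If every source is missing, the result is the literal 'worker', which is why feature IDs then start with worker:.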
GitHub issues not being created
Symptoms: Errors are captured (visible in GET /errors) but no GitHub issues are created in the repo.
Causes and fixes:
- Missing GITHUB_TOKEN secret: set it via npx cf-monitor secret set GITHUB_TOKEN. This must be a GitHub PAT with repo or issues:write scope.
- Missing GITHUB_REPO var or config: check that either:
  - GITHUB_REPO is set in .cf-monitor/wrangler.jsonc vars, OR
  - CF_MONITOR_CONFIG is set (automatically embedded since v0.3.6 when --github-repo is passed to init or cf-monitor.yaml has github.repo configured)
- cf-monitor.yaml not re-embedded: if you added github.repo to cf-monitor.yaml after the initial deploy, run npx cf-monitor deploy to re-embed the config.
- Rate limited: cf-monitor limits issue creation to 10 issues per script per hour. Check GET /errors for rate limit entries.
- Deduplication: if the same error fingerprint already has a GitHub issue, cf-monitor won’t create a duplicate. Check KV key err:fp:{fingerprint}.
Verify: Run npx cf-monitor status — the response shows whether GitHub is configured.
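The rate limit and deduplication rules explain most cases where an error is captured but no issue appears. A minimal model of that gate is sketched below. It is not cf-monitor's implementation: the IssueState shape, the decideIssue name, and the sliding one-hour window are assumptions (the real window may be fixed rather than sliding).

```typescript
const MAX_ISSUES_PER_HOUR = 10; // limit stated in the text above
const ISSUE_WINDOW_MS = 60 * 60 * 1000;

// Hypothetical in-memory stand-in for the state kept per script.
interface IssueState {
  knownFingerprints: string[]; // fingerprints that already have an issue
  recentIssueTimes: number[]; // creation times (ms since epoch) of recent issues
}

type IssueDecision = "create" | "duplicate" | "rate-limited";

// Deduplication is checked first, then the hourly limit.
function decideIssue(state: IssueState, fingerprint: string, nowMs: number): IssueDecision {
  if (state.knownFingerprints.indexOf(fingerprint) !== -1) return "duplicate";
  const inWindow = state.recentIssueTimes.filter((t) => nowMs - t < ISSUE_WINDOW_MS).length;
  return inWindow >= MAX_ISSUES_PER_HOUR ? "rate-limited" : "create";
}
```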
Budget enforcement not working
Symptoms: usage accumulates in KV (budget:usage:daily:* keys) but no circuit breakers trip and no Slack warnings appear.
Causes and fixes:
- No budget config keys: check KV for budget:config:* keys. If there are none, the hourly budget-check cron will auto-seed defaults from PAID_PLAN_DAILY_BUDGETS on the next run. Trigger it manually:

    curl -X POST https://cf-monitor.YOUR_SUBDOMAIN.workers.dev/admin/cron/budget-check \
      -H "Authorization: Bearer YOUR_ADMIN_TOKEN"

- Config-sync not run: if you set custom budgets in cf-monitor.yaml, push them to KV:

    npx cf-monitor config sync

- Seed flag active: auto-seeding is prevented for 24 hours after the last seed (to avoid hourly KV writes). If you need to re-seed immediately, delete the flag:

    wrangler kv key delete "budget:config:__seeded__" --namespace-id YOUR_KV_NAMESPACE_ID

- __account__ fallback: even without per-feature configs, the __account__ config applies to all features. If this is missing too, auto-seeding failed. Check wrangler tail cf-monitor for errors.
Budget warnings not appearing in Slack
Symptoms: budgets are being exceeded (CB trips visible) but no Slack messages.
Causes and fixes:
- SLACK_WEBHOOK_URL not set: run npx cf-monitor secret set SLACK_WEBHOOK_URL and paste your Slack incoming webhook URL.
- Deduplication: budget warnings are deduplicated for 1 hour (daily) or 24 hours (monthly). If you just resolved the issue and it triggered again, the alert may be suppressed.
- Test the payload: verify Slack payload formatting:

    curl -X POST https://cf-monitor.YOUR_SUBDOMAIN.workers.dev/admin/test/slack-dry-run \
      -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"type":"budget-warning","featureId":"test","metric":"kv_reads","current":900,"limit":1000}'
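To reason about whether a missing Slack message is just deduplication, it can help to model the suppression windows directly. The sketch below assumes a last-sent timestamp per alert; the BudgetPeriod type, the isAlertSuppressed name, and the way the timestamp is stored are illustrative. Only the 1-hour and 24-hour windows come from the text above.

```typescript
type BudgetPeriod = "daily" | "monthly";

// Deduplication windows stated above: 1 hour for daily, 24 hours for monthly.
const DEDUP_WINDOW_MS: Record<BudgetPeriod, number> = {
  daily: 1 * 60 * 60 * 1000,
  monthly: 24 * 60 * 60 * 1000,
};

// lastSentMs is null when no alert has been sent yet for this budget.
function isAlertSuppressed(period: BudgetPeriod, lastSentMs: number | null, nowMs: number): boolean {
  if (lastSentMs === null) return false;
  return nowMs - lastSentMs < DEDUP_WINDOW_MS[period];
}
```

For example, a daily warning that fired 30 minutes ago suppresses a repeat now, while the same gap would also suppress a monthly warning for a further 23.5 hours.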
GitHub issues not being created
Symptoms: errors are captured (fingerprints in KV) but no GitHub issues appear.
Causes and fixes:
- GITHUB_REPO or GITHUB_TOKEN not set: both are required. Run:

    npx cf-monitor secret set GITHUB_TOKEN

  And ensure github.repo is set in cf-monitor.yaml.
- Token permissions: the token needs repo scope (classic PAT) or issues: write permission (fine-grained PAT).
- Rate limit: max 10 issues per script per hour. If you’ve triggered many errors quickly, wait for the rate window to pass.
- Test the format: use the dry-run endpoint to see what would be created:

    curl -X POST .../admin/test/github-dry-run \
      -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"scriptName":"my-worker","outcome":"exception","errorMessage":"test error"}'
Feature IDs are wrong or unexpected
Symptoms: budget keys and AE data use unexpected feature IDs.
Causes and fixes:
- Path normalisation: cf-monitor strips numeric segments (/users/123 becomes users) and UUIDs, and limits paths to 2 segments. This is intentional to prevent feature ID explosion.
- Explicit control: use the features map for routes that need specific IDs:

    monitor({
      features: {
        'POST /api/scan': 'scanner:social',
        'GET /api/users/:id': 'api:users',
      },
      fetch: handler,
    });

- Single bucket: for simple workers, use featureId to put everything in one budget:

    monitor({ featureId: 'my-worker:all', fetch: handler });
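The path normalisation rules can be approximated with a few lines of string handling. This is a sketch of the described behaviour, not the SDK's code: the normalisePath name, the "/" joiner, and the order of operations (strip first, then truncate) are assumptions.

```typescript
const UUID_SEGMENT_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC_SEGMENT_RE = /^\d+$/;

// Drops numeric and UUID segments, then keeps at most maxSegments segments.
function normalisePath(path: string, maxSegments: number = 2): string {
  return path
    .split("/")
    .filter((seg) => seg.length > 0)
    .filter((seg) => !NUMERIC_SEGMENT_RE.test(seg) && !UUID_SEGMENT_RE.test(seg))
    .slice(0, maxSegments)
    .join("/");
}
```

With these rules, /users/123 and /users/456 collapse into the same bucket, which is the point: one feature ID per route shape rather than one per resource.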
Usage data shows “No usage data collected yet”
Symptoms: npx cf-monitor usage or GET /usage returns no data.
Causes and fixes:
- First cron hasn’t run: account usage is collected hourly on the 0 * * * * schedule. Wait for the next hour, or trigger manually:

    curl -X POST https://cf-monitor.YOUR_SUBDOMAIN.workers.dev/admin/cron/collect-account-usage \
      -H "Authorization: Bearer YOUR_ADMIN_TOKEN"

- Missing CLOUDFLARE_API_TOKEN: the same API token used for worker discovery is used for GraphQL queries. Ensure it’s set as a secret on the cf-monitor worker.
- No services in use: if your account has zero D1, KV, R2, etc. activity in the last 24 hours, the usage snapshot will show empty services. This is correct behaviour.
- GraphQL API unavailable: the CF GraphQL Analytics API occasionally returns errors. Check the monitor worker’s logs for [cf-monitor:usage] messages.
Plan shows as “paid” when account is actually free
Symptoms: GET /status or npx cf-monitor status shows plan: "paid" on a Workers Free account.
Causes and fixes:
- Token lacks billing permission: plan detection requires the Account Settings: Read permission (#billing:read) on your API token. Without it, cf-monitor defaults to “paid”, which applies the higher Paid budget limits. That is safe for Paid accounts but leaves Free accounts under-protected, since Paid limits are ~10× Free limits. Add the permission to your token for accurate detection.
- Cached result: the detected plan is cached in KV for 24 hours. If you recently upgraded or downgraded your plan, wait for the cache to expire or delete the config:plan KV key manually:

    wrangler kv key delete "config:plan" --namespace-id YOUR_KV_NAMESPACE_ID
Billing period hasn’t updated after plan change
Symptoms: GET /plan shows stale billingPeriod dates (e.g. days that have already passed, or the wrong day-of-month after changing your billing cycle). Monthly budget KV keys use the old period.
Cause: config:billing_period is cached in KV for 32 days. Plan upgrades, downgrades, or billing-day changes don’t force an immediate refresh — the cache silently ages out up to a month later.
Fix: delete the cache key to force re-detection on the next budget-check cron:
    wrangler kv key delete "config:billing_period" --namespace-id YOUR_KV_NAMESPACE_ID

    # Optional: trigger budget-check immediately rather than wait for the next hour
    curl -X POST https://cf-monitor.YOUR_SUBDOMAIN.workers.dev/admin/cron/budget-check \
      -H "Authorization: Bearer YOUR_ADMIN_TOKEN"
If you had monthly budgets accumulating under the old period key, those counters remain valid — the transition logic reads both old and new period keys during the v0.2.x → v0.3.x transition window, so no usage data is lost.
Budget configs keep disappearing from KV
Symptoms: budget:config:* keys exist in KV, then disappear ~24 hours later, then reappear. No cf-monitor.yaml budgets: block configured.
Cause: auto-seeded budget configs have a 25-hour TTL. Auto-seeding runs once on the first hourly budget-check cron when KV has no budget:config:* keys. The seed flag (budget:config:__seeded__, 24-hour TTL) prevents re-seeding every hour, but once both the configs and the seed flag expire, the next cron re-seeds.
This is working as intended — limits don’t change between seedings as long as your CF plan is stable. But it’s cosmetically noisy and makes custom budget values impossible via KV editing (they’d be overwritten on next re-seed).
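The expire-and-reseed cycle follows from the two TTLs. The simulation below steps through hourly crons under simplifying assumptions: time is counted in whole hours, a key that expires exactly at a cron hour counts as gone, and the seedingHours name is hypothetical. The 25-hour and 24-hour TTLs are the ones stated above.

```typescript
const CONFIG_TTL_HOURS = 25; // TTL of auto-seeded budget configs
const SEED_FLAG_TTL_HOURS = 24; // TTL of the seed flag

// Returns the cron hours (0, 1, 2, ...) at which auto-seeding would run.
function seedingHours(totalHours: number): number[] {
  const seeded: number[] = [];
  let configExpiresAt = 0; // hour at which the seeded configs expire
  let flagExpiresAt = 0; // hour at which the seed flag expires
  for (let hour = 0; hour < totalHours; hour++) {
    const configsPresent = hour < configExpiresAt;
    const flagPresent = hour < flagExpiresAt;
    // Seeding needs both: no config keys in KV and no active seed flag.
    if (!configsPresent && !flagPresent) {
      seeded.push(hour);
      configExpiresAt = hour + CONFIG_TTL_HOURS;
      flagExpiresAt = hour + SEED_FLAG_TTL_HOURS;
    }
  }
  return seeded;
}
```

Over three days this model seeds at hours 0, 25, and 50, which is the roughly daily disappear-and-reappear pattern described in the symptoms. Real timings will drift with cron scheduling and KV expiry granularity.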
Fix: define your own budgets in cf-monitor.yaml:
    budgets:
      daily:
        d1_writes: 50000
        kv_writes: 10000
      monthly:
        d1_writes: 1000000
Then push them to KV (this writes without a TTL, so they’re permanent):
    npx cf-monitor config sync
Your explicit budgets take precedence over auto-seeded defaults on subsequent crons.
Debug endpoints
These endpoints are always available on the monitor worker for troubleshooting:
| Endpoint | What it tells you |
|---|---|
| GET /_health | Is the monitor worker running? |
| GET /status | Account health, plan, billing period, CB states, GitHub/Slack config |
| GET /errors | Recent error fingerprints and their GitHub issue URLs |
| GET /budgets | Active circuit breakers, billing period |
| GET /workers | Which workers have been discovered on the account |
| GET /plan | Detected plan type, billing period, days remaining, plan allowances |
| GET /usage | Account-wide per-service usage from CF GraphQL (approximate) |
| GET /self-health | Self-monitoring: stale crons, error counts, handler breakdown |
Admin endpoints returning 401
Symptoms: All POST /admin/* requests return {"error":"Unauthorized"}.
Causes and fixes:
- ADMIN_TOKEN not set: set the secret on the cf-monitor worker:

    openssl rand -hex 32   # Generate a token
    npx cf-monitor secret set ADMIN_TOKEN

- Missing Authorization header: admin requests require:

    curl -X POST .../admin/cron/budget-check \
      -H "Authorization: Bearer YOUR_ADMIN_TOKEN"

- Wrong token: ensure the token in the header matches what was set via secret set. Tokens are case-sensitive.
- Missing “Bearer ” prefix: the header must be Authorization: Bearer <token>, not Authorization: <token>.
See Security — Admin endpoint authentication for details.
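All four causes map onto one header check. The following sketch shows that check in isolation. It is illustrative rather than cf-monitor's code: the isAuthorized name is hypothetical, and a production check would typically use a constant-time comparison instead of plain equality.

```typescript
const BEARER_PREFIX = "Bearer ";

// header: value of the Authorization header, or null/undefined when absent.
// expectedToken: the configured ADMIN_TOKEN ("" models an unset secret).
function isAuthorized(header: string | null | undefined, expectedToken: string): boolean {
  if (!header || expectedToken.length === 0) return false;
  if (header.slice(0, BEARER_PREFIX.length) !== BEARER_PREFIX) return false;
  // Exact, case-sensitive match on the token part.
  return header.slice(BEARER_PREFIX.length) === expectedToken;
}
```

Each false branch corresponds to one item in the list above: unset secret, missing header, missing prefix, or mismatched token.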
Self-monitoring shows stale crons
Symptoms: GET /self-health returns 503 with staleCrons listing one or more handlers.
Causes and fixes:
- Cron recently deployed: after first deploy, it may take up to the cron interval (15 min or 1 hour) for all cron handlers to run once. Wait for the next scheduled execution.
- Worker not running: check npx cf-monitor status and wrangler tail cf-monitor for errors.
- KV propagation: self-monitoring timestamps are stored in KV with a 48-hour TTL. Edge cache inconsistency may briefly show stale data.
- Actual failure: if a specific cron handler consistently appears stale, check wrangler tail cf-monitor for errors during that handler’s schedule. Common causes: API token expired, GitHub rate limit, Slack webhook revoked.
- Race condition (pre-v0.3.7): versions before v0.3.7 stored all cron timestamps in a single KV blob. When two crons ran concurrently (e.g. daily-rollup + worker-discovery at midnight), the last writer clobbered the other’s timestamp, causing a false stale alert. Upgrade to v0.3.7+, which uses per-handler KV keys.
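The pre-v0.3.7 race is a classic lost update on a read-modify-write value. The sketch below reproduces it with an in-memory stand-in for KV. The function names, the key prefix cron:last:, and the timestamps are hypothetical; only the handler names and the single-blob versus per-handler-key contrast come from the text above.

```typescript
// Read-modify-write on one shared JSON blob (the pre-v0.3.7 pattern).
function writeToSharedBlob(snapshot: string | undefined, handler: string, ts: number): string {
  const parsed: Record<string, number> = snapshot ? JSON.parse(snapshot) : {};
  parsed[handler] = ts;
  return JSON.stringify(parsed);
}

// Two handlers read the blob before either writes, so the second write
// is based on a stale snapshot and drops the first handler's timestamp.
function simulateSharedBlob(): Record<string, number> {
  let blob: string | undefined = undefined;
  const readA: string | undefined = blob;
  const readB: string | undefined = blob;
  blob = writeToSharedBlob(readA, "daily-rollup", 1000);
  blob = writeToSharedBlob(readB, "worker-discovery", 1001);
  return JSON.parse(blob);
}

// One key per handler: concurrent writes touch different keys, so none is lost.
function simulatePerHandlerKeys(): Record<string, number> {
  const kv: Record<string, string> = {};
  kv["cron:last:daily-rollup"] = "1000";
  kv["cron:last:worker-discovery"] = "1001";
  const result: Record<string, number> = {};
  for (const key of Object.keys(kv)) {
    result[key.slice("cron:last:".length)] = Number(kv[key]);
  }
  return result;
}
```

In the shared-blob run only worker-discovery survives, so daily-rollup looks stale even though it ran. With per-handler keys both timestamps are present.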