Post-mortem: Cloudflare Bot Management blocked MCP webhooks

2026-05-09


The symptom

I just finished a webhook endpoint in hugo-mcp-proxy that will receive a notification from GitHub on every push to the arleo.eu repo. Clean implementation: HMAC-SHA256 signature check, rate limiting, and a systemd IPAddressAllow restricted to GitHub ranges.
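The post does not show the endpoint's code, so the following is only a minimal sketch of what an HMAC-SHA256 check for a GitHub webhook can look like. It assumes GitHub's `X-Hub-Signature-256` header format (`sha256=` followed by the hex digest of the raw body); the function names `sign_payload` and `verify_signature` are hypothetical and not taken from hugo-mcp-proxy.

```python
import hashlib
import hmac
from typing import Optional


def sign_payload(secret: bytes, body: bytes) -> str:
    """Build a signature string in the 'sha256=<hex digest>' form."""
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(secret: bytes, body: bytes, header_value: Optional[str]) -> bool:
    """Check a received signature header against the raw request body."""
    if not header_value:
        return False
    expected = sign_payload(secret, body)
    # compare_digest runs in constant time, so the comparison does not
    # leak how many leading characters of the digest were correct.
    return hmac.compare_digest(expected.encode(), header_value.encode())
```

The check has to run on the raw bytes of the body, before any JSON parsing, because re-serializing the payload can change whitespace and invalidate the digest.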

Functional test from an external client:

        
$ curl -X POST https://mcp-hugo.arleo.eu/webhook/test \
    -H "Content-Type: application/json" \
    -d '{"test": true}'

Response: 403 Forbidden.

Strange. The service is running, my source IP is whitelisted, the HMAC is correct. Why 403?

Server-side investigation

NUC nginx logs:

$ sudo tail -100 /var/log/nginx/mcp-hugo.access.log | grep webhook

Empty. No request reaches nginx.

mcp-oauth-proxy logs:

$ sudo journalctl -u mcp-oauth-proxy -n 100 | grep webhook

Empty too. The request doesn’t even reach the service.

Either it’s blocked by the firewall before nginx (CrowdSec or ufw), or upstream by Cloudflare.
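That triage can be sketched as a small heuristic. It assumes two facts about Cloudflare-proxied responses: they carry a `CF-RAY` header and `Server: cloudflare`, and the 520 to 526 status codes are Cloudflare's own errors for an origin it could not reach. The function name `locate_block` and its labels are hypothetical, and the result is a hint rather than proof.

```python
from typing import Mapping


def locate_block(status: int, headers: Mapping[str, str], seen_in_origin_log: bool) -> str:
    """Guess where a failed request was stopped, from the client-side response
    plus one server-side fact: did nginx or the service log the request?"""
    lowered = {k.lower(): v for k, v in headers.items()}
    via_cloudflare = "cf-ray" in lowered or lowered.get("server", "").lower() == "cloudflare"

    if seen_in_origin_log:
        # The request reached the origin, so the application produced the status.
        return "origin"
    if via_cloudflare and 520 <= status <= 526:
        # Cloudflare could not talk to the origin: firewall, service down, TLS issue.
        return "origin-unreachable"
    if via_cloudflare and status in (403, 429):
        # Answered by the edge, never forwarded: look at Security Events.
        return "cloudflare-edge"
    return "unknown"
```

With the situation described here (403, empty nginx and service logs, response served through Cloudflare), the sketch returns `"cloudflare-edge"`.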

The truth at Cloudflare

I open the Cloudflare dashboard → Security → Events. Filter on mcp-hugo.arleo.eu:

2026-05-08 14:23:11  Action: Block
                     Source IP: 82.65.X.X
                     Country: FR
                     User-Agent: python-httpx/0.28.1
                     Rule: Cloudflare Bot Management
                     Score: 6 (likely automated)

Cloudflare Bot Management flagged the POST because I had (probably) run the test from a Python script using httpx rather than from plain curl. The python-httpx/0.28.1 User-Agent is one of those typically treated as automated.

The webhook is legitimately automated (that’s the point), but Cloudflare Bot Management does not distinguish good bots from bad ones by default.

Why this default behavior

Cloudflare Bot Management protects against abusive bots: aggressive SEO crawlers, price scrapers, credential brute-force, etc. It treats any non-browser client as suspect by default, unless signals say otherwise (reputation, Bot ID, JS challenge passed).

For a webhook endpoint (which MUST receive automated traffic), this behavior is exactly the opposite of what you want.

The fix: scoped bypass rule

Cloudflare dashboard → Security → WAF → Custom Rules → Create rule:

Name: Allow webhook traffic to mcp-hugo
Expression: 
  (http.host eq "mcp-hugo.arleo.eu") 
  and (http.request.uri.path matches "^/webhook/")
  and (ip.src in {160.79.104.0/21 140.82.112.0/20})
Action: Skip
Skip products: Bot Management

Three combined conditions:

Explicit host — no global bypass, just the MCP subdomain
Path match — only /webhook/* routes, not other MCP tools
Source IP — official claude.ai (160.79.104.0/21) and GitHub webhooks (140.82.112.0/20) ranges

The triple filter means the bypass only applies to a client that is on one of the official network ranges, hits the right path, and targets the right host. A random attacker can’t benefit from it.
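To reason about which requests the three conditions let through, the rule can be mirrored locally. This is only a sketch of the same logic in Python, not how Cloudflare evaluates expressions; `rule_matches` is a hypothetical helper, and the host, path pattern, and ranges are the ones quoted in the rule above.

```python
import ipaddress
import re

ALLOWED_HOST = "mcp-hugo.arleo.eu"
PATH_PATTERN = re.compile(r"^/webhook/")
ALLOWED_NETWORKS = [
    ipaddress.ip_network("160.79.104.0/21"),  # claude.ai range quoted in the post
    ipaddress.ip_network("140.82.112.0/20"),  # GitHub webhook range quoted in the post
]


def rule_matches(host: str, path: str, src_ip: str) -> bool:
    """Return True only when host, path and source IP all satisfy the rule."""
    addr = ipaddress.ip_address(src_ip)
    return (
        host == ALLOWED_HOST
        and PATH_PATTERN.search(path) is not None
        and any(addr in net for net in ALLOWED_NETWORKS)
    )


# A GitHub address on the right host and path matches the skip rule.
print(rule_matches("mcp-hugo.arleo.eu", "/webhook/test", "140.82.115.10"))
# Same host and path from an address outside both ranges does not.
print(rule_matches("mcp-hugo.arleo.eu", "/webhook/test", "82.65.1.1"))
```

An address outside both ranges does not match the rule, so requests from it are still scored by Bot Management.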

Validation test

        
$ curl -X POST https://mcp-hugo.arleo.eu/webhook/test \
    -H "Content-Type: application/json" \
    -d '{"test": true}'

{"status": "ok", "received": true}

200ms response.

How I found the claude.ai ranges

Cloudflare doesn’t explicitly document claude.ai (Anthropic) outbound ranges. To identify them:

I captured source IPs in Cloudflare logs when a normal MCP call came from Claude.ai
All were in 160.79.104.0/21
Verified on RIPE/ARIN: this range belongs to Anthropic
I asked Anthropic via support — official confirmation they use this range for fetchers (web_fetch, MCP, etc.)

For GitHub webhooks: GitHub publishes its ranges through its meta API endpoint (https://api.github.com/meta, under the hooks key). The main range is 140.82.112.0/20.
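Since the published ranges can change, the IP set in the rule can be regenerated from the meta endpoint rather than typed by hand. The sketch below assumes the response exposes webhook source ranges as a list of CIDR strings under the `hooks` key; the helper names are hypothetical, and the network call is isolated in `fetch_meta` so the rest can be run offline.

```python
import ipaddress
import json
import urllib.request
from typing import List

META_URL = "https://api.github.com/meta"


def fetch_meta(url: str = META_URL) -> dict:
    """Download and parse the meta document (network access required)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def hook_ranges(meta: dict, ipv4_only: bool = True) -> List[str]:
    """Extract the webhook CIDR ranges, optionally dropping IPv6 entries."""
    ranges = []
    for cidr in meta.get("hooks", []):
        net = ipaddress.ip_network(cidr, strict=False)
        if ipv4_only and net.version != 4:
            continue
        ranges.append(str(net))
    return ranges


def ip_set_expression(ranges: List[str]) -> str:
    """Format the ranges as an inline IP set for a Cloudflare rule expression."""
    return "ip.src in {" + " ".join(ranges) + "}"
```

Calling `ip_set_expression(hook_ranges(fetch_meta()))` produces a string that can be pasted into the rule in place of a hand-maintained list.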

Lessons learned

1. Bot Management = default enemy of webhooks

Cloudflare Bot Management is useful for 95% of routes (browser-facing). For the remaining 5% (webhooks, public APIs, MCP), you must explicitly create a scoped bypass.

Rule: any endpoint designed to receive automated traffic must be audited in Cloudflare Events within 24h of deployment.

2. Triple condition for bypasses

A Cloudflare bypass scoped on (host, path, source IP) is resilient. If a single criterion sufficed, it’d be an attack vector.

I’ve seen Cloudflare rules in prod that just do Action: Skip if the path matches /webhook/. That is a mistake: anyone can then hit /webhook/anything from any IP with any UA and bypass bot protection entirely.

3. Cloudflare logs don’t lie, but they’re not real-time

The Cloudflare Events dashboard has ~30s latency. If you curl then immediately open the dashboard, you might see nothing and incorrectly conclude the request wasn’t blocked by Cloudflare. Wait a minute, refresh.

4. UA `python-httpx` is suspect in 2026

The same goes for python-requests, aiohttp, etc. Many WAFs flag these UAs as “automated” by default. If you build a custom client, you have two options:

Descriptive custom UA: User-Agent: arleo-monitor/1.0 (+https://arleo.eu/security.txt) — readable, traceable
IP range whitelist server-side — more robust if attacker can spoof UA

I ended up adopting both: descriptive UA and Cloudflare IP whitelist.
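As an illustration of the descriptive UA option, here is a minimal sketch that builds the test request with an explicit User-Agent. It uses the standard library's `urllib.request` instead of httpx only to stay dependency-free; `build_webhook_request` is a hypothetical helper, and the UA string is the one given above.

```python
import json
import urllib.request

USER_AGENT = "arleo-monitor/1.0 (+https://arleo.eu/security.txt)"


def build_webhook_request(url: str, payload: dict) -> urllib.request.Request:
    """Prepare a JSON POST that identifies itself with a descriptive User-Agent."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Overrides the default Python-urllib UA that WAFs tend to flag.
            "User-Agent": USER_AGENT,
        },
    )
```

The request is sent with `urllib.request.urlopen(req, timeout=10)`. A readable UA helps whoever reads the Security Events log, but it is trivially spoofable, which is why it is paired with the IP range condition.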

Conclusion

403 Forbidden on a legitimate webhook, cause: Bot Management. Fix: triple-condition Custom rule (host + path + source IP range). Total: ~1h debugging, of which 40min looking in the wrong places (nginx, mcp-oauth-proxy, systemd) before thinking of Cloudflare.

Final lesson: when an HTTP call returns a status code that a WAF typically produces (403, 429), always start by looking at the WAF events. The application server probably never even saw the request.