CrowdSec Log Pipeline with Vector: Filtering Noise and Capturing Real Bans

2026-04-15

⚡ In short

The initial Vector pipeline was flooding BetterStack with ~500 events/24h, of which 434 were CAPI pulls with no local monitoring value. This work reconfigures the Vector filter to keep only high-value bans (cscli) and fixes a major blind spot: actual nginx-lua bouncer bans were not appearing anywhere in BetterStack.

🧠 Why

This homelab’s security stack relies on three components working together:

nginx with the CrowdSec lua bouncer (lua-resty-crowdsec) for real-time request blocking
CrowdSec for threat detection and ban decision management
Vector centralizing logs to BetterStack for monitoring

After setting up the initial pipeline, two problems quickly became apparent. First, the signal was drowned in noise: out of 500 events/24h, 434 came from the hourly community CAPI pull and 66 from third-party lists — neither represents a threat detected on this infrastructure. Second, actual lua bouncer bans (real-time blocks in nginx) were not appearing anywhere in BetterStack, creating a blind spot on real security activity.

🔧 What was done

Problem 1: CAPI and third-party list noise

Over a 24-hour period, the distribution of CrowdSec events in BetterStack was:

Origin	Count	Nature
`CAPI`	434	Community pull every hour
`lists`	66	Third-party lists (firehol_greensnow, otx-webscanners…)
`cscli`	0	Local manual bans — never seen

CAPI and lists events arrive in hourly bursts (at :09 past each hour), matching the community list sync cycle. Neither origin reflects a threat detected locally, so the solution is to modify the Vector filter to keep only origin == "cscli":

# In vector.yaml
crowdsec_decisions_filter:
  type: "filter"
  inputs:
    - "crowdsec_decisions_flatten"
  condition: |
    exists(.cs) && .cs.origin == "cscli"
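
One way to check this filter without waiting for the next hourly pull is a Vector unit test. The sketch below is not part of the original configuration: it assumes a Vector version that accepts `vrl` test inputs, and the event shape is hypothetical, reduced to the single field the filter reads.

```yaml
# Sketch only: appended to vector.yaml and run with `vector test vector.yaml`
tests:
  - name: "decisions filter drops CAPI pulls"
    inputs:
      - insert_at: "crowdsec_decisions_filter"
        type: "vrl"
        source: |
          # Hypothetical event: only .cs.origin matters to the filter
          . = {"cs": {"origin": "CAPI"}}
    no_outputs_from:
      - "crowdsec_decisions_filter"

  - name: "decisions filter keeps cscli bans"
    inputs:
      - insert_at: "crowdsec_decisions_filter"
        type: "vrl"
        source: |
          . = {"cs": {"origin": "cscli"}}
    outputs:
      - extract_from: "crowdsec_decisions_filter"
        conditions:
          - type: "vrl"
            source: |
              assert_eq!(.cs.origin, "cscli")
```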

Problem 2: Effective lua bouncer bans invisible in BetterStack

The nginx lua bouncer blocks IPs in real time, but these actual blocks were not appearing anywhere in BetterStack. Yet they are logged by nginx in /var/log/nginx/error.log:

2026/04/15 03:36:37 [alert] 67913#67913: *3949 [lua] crowdsec.lua:783: Allow(): \
  [Crowdsec] denied '43.130.106.18' with 'ban' (by bouncer), \
  client: 43.130.106.18, server: www.arleo.eu, \
  request: "GET / HTTP/2.0", host: "www.arleo.eu"

The issue came from the existing nginx filter in Vector, which silently dropped any message containing the substring crowdsec. Every bouncer line matches because of its crowdsec.lua prefix, so the bans were discarded along with the bouncer's housekeeping messages. Since error.log is already included in the nginx source, there is no need to create a new source: the solution is to insert a transform before the filter to tag and reroute these events.

New pipeline architecture

     ┌────────────────────────┐
     │    nginx error.log      │
     └──────────┬──────────────┘
                │
                ▼
     ┌─────────────────────────────────┐
     │   better_stack_nginx_parser      │
     │   (parses all nginx logs)        │
     └──────────┬──────────────────────┘
                │
                ▼
     ┌─────────────────────────────────┐
     │  crowdsec_nginx_ban_extractor    │
     │  (detects [Crowdsec] denied)     │
     │  tags cs_nginx_ban = true/false  │
     └────┬─────────────────┬───────────┘
          │                 │
   cs_nginx_ban==true  cs_nginx_ban==false
          │                 │
          ▼                 ▼
  crowdsec_nginx_   better_stack_
  ban_filter        nginx_filter
          │                 │
          ▼                 ▼
  CrowdSec          nginx
  BetterStack sink  BetterStack sink

The extractor transform

crowdsec_nginx_ban_extractor:
  type: "remap"
  inputs:
    - "better_stack_nginx_parser_XXXXX"
  source: |
    msg = string(.message) ?? ""
    if contains(msg, "[Crowdsec] denied") && contains(msg, "with 'ban'") {
      m = parse_regex(msg, r'\[Crowdsec\] denied \'(?P<banned_ip>[^\']+)\' with \'ban\'') ?? {}
      ip   = string(m.banned_ip) ?? "?"
      req  = string(.nginx.request) ?? "-"
      host = string(.nginx.host) ?? string(.nginx.server) ?? "-"
      .cs_nginx_ban = true
      .cs_banned_ip = ip
      .cs_origin    = "nginx-bouncer"
      .platform     = "CrowdSec"
      .message      = "Ban " + ip + " | " + req + " | " + host
      del(.file)
      del(.level)
      del(.nginx.cid)
      del(.nginx.pid)
      del(.nginx.tid)
    } else {
      .cs_nginx_ban = false
    }

The .message field is built to be immediately readable in the BetterStack tail:

Ban 43.130.106.18 | GET / HTTP/2.0 | www.arleo.eu
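
The extractor can be exercised the same way with a Vector unit test built from the log line quoted earlier. This is a sketch under assumptions: it presumes a Vector version that accepts `vrl` test inputs, and the input event is a hand-written stand-in for the parser's output that fills in only the fields the extractor reads.

```yaml
# Sketch only: appended to vector.yaml and run with `vector test vector.yaml`
tests:
  - name: "extractor tags lua bouncer bans"
    inputs:
      - insert_at: "crowdsec_nginx_ban_extractor"
        type: "vrl"
        source: |
          # Assumed parser output, reduced to .message and the .nginx fields used
          . = {
            "message": "[lua] crowdsec.lua:783: Allow(): [Crowdsec] denied '43.130.106.18' with 'ban' (by bouncer)",
            "nginx": {"request": "GET / HTTP/2.0", "host": "www.arleo.eu"}
          }
    outputs:
      - extract_from: "crowdsec_nginx_ban_extractor"
        conditions:
          - type: "vrl"
            source: |
              assert_eq!(.cs_nginx_ban, true)
              assert_eq!(.cs_banned_ip, "43.130.106.18")
              assert_eq!(.message, "Ban 43.130.106.18 | GET / HTTP/2.0 | www.arleo.eu")
```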

Modifying the existing nginx filter

Repoint the filter's input at the new transform and add an exclusion condition for already-rerouted bans:

better_stack_nginx_filter_XXXXX:
  type: "filter"
  inputs:
    - "crowdsec_nginx_ban_extractor"   # ← now points to the new transform
  condition: |
    !contains(string(.message) ?? "", "crowdsec") &&
    !contains(string(.message) ?? "", "Initialisation done") &&
    !contains(string(.message) ?? "", "APPSEC is enabled") &&
    !((.nginx.status == 499) && contains(string(.nginx.path) ?? "", "empty.php")) &&
    !contains(string(.message) ?? "", "lua tcp socket read timed out") &&
    !(.cs_nginx_ban == true)           # ← exclude rerouted bans
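
The diagram also names a crowdsec_nginx_ban_filter on the other branch, whose definition is not shown here. A minimal sketch, assuming it simply mirrors the exclusion above and keeps only the events the extractor tagged, could look like this:

```yaml
# Hypothetical definition: the name comes from the diagram, the body is assumed
crowdsec_nginx_ban_filter:
  type: "filter"
  inputs:
    - "crowdsec_nginx_ban_extractor"
  condition: |
    .cs_nginx_ban == true
```

Keeping the two filters as exact complements on `.cs_nginx_ban` means every parsed nginx event is considered by exactly one branch.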

Unified CrowdSec sink

Both flows (cscli bans and lua bans) converge into the same sink:

crowdsec_betterstack_sink:
  type: "http"
  inputs:
    - "crowdsec_decisions_filter"   # cscli bans
    - "crowdsec_nginx_ban_filter"   # nginx lua bans

Bonus: A Cloudflare Token Blocked by Its Own Server

Alongside the Vector work, the crowdsec-cf-sync.py script had been silently failing for several days with an HTTP 401 authentication error. The cause: the Cloudflare token had an IP restriction (not_in) that explicitly included the server’s own WAN IP, so every API request sent from the NUC was rejected by Cloudflare.

Fix: remove the server’s IP from the token’s not_in list via the Cloudflare API. The script immediately resumed normal synchronization (13 active bans re-synchronized).

🏁 Conclusion

Event volume in BetterStack went from ~500 events/24h, mostly irrelevant, down to only the events that deserve attention: cscli bans (manual decisions) and nginx-bouncer bans (real-time effective blocks with IP, request and vhost). The signal-to-noise ratio improves accordingly, and the pipeline now provides an accurate view of the server’s real security activity.

To go further:

💡 Add an nginx-bouncer ban counter in a BetterStack dashboard to visualize blocking spikes in real time
💡 Extend the cscli filter to also include bans from custom local scenarios (origin == "crowdsec" with IP scope)
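
The second idea could be sketched as follows. This is an untested assumption rather than the deployed configuration: the `.cs.scope` field name is hypothetical and depends on what crowdsec_decisions_flatten actually emits, and the comparison is lowercased because the exact casing of the scope value is not confirmed here.

```yaml
# Sketch only: .cs.scope is an assumed field name
crowdsec_decisions_filter:
  type: "filter"
  inputs:
    - "crowdsec_decisions_flatten"
  condition: |
    exists(.cs) &&
    (.cs.origin == "cscli" ||
     (.cs.origin == "crowdsec" && downcase(string(.cs.scope) ?? "") == "ip"))
```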