Normalizing nginx and CrowdSec Logs in BetterStack with Vector

⚡ In short

Two problems coexisted in BetterStack: entries from mcp-oauth.access.log arrived as unreadable raw JSON, and CrowdSec logs showed visually duplicated text. This work normalizes all logs so they display as structured, clickable tags, with correct timestamps and no extraneous fields.

🧠 Why

BetterStack displays logs as highlighted clickable tags in the Live Tail when JSON fields are properly structured. Before this work, observability was degraded on two fronts:

  • mcp-oauth.access.log logs arrived as unreadable raw JSON (custom format incompatible with Vector’s nginx parser), so the fields nginx.client, nginx.path and nginx.status were not extracted
  • CrowdSec and CF WAF logs arrived as plain text with duplicates (Ban ban | ... | Ban ban)

The goal was to normalize all logs in BetterStack to display like standard nginx logs:

host:NUC8i3BEH  platform:Nginx  nginx.status:200  nginx.method:GET  nginx.path:/ping  nginx.client:91.98.38.26

🔧 What was done

Problem 1: mcp-oauth.access.log (non-standard JSON format)

Diagnosis

The mcp-oauth.access.log file used a custom JSON format produced by nginx, incompatible with Vector’s regex parser. Two secondary issues were also identified:

  • The dt field used Vector’s ingestion timestamp rather than the actual nginx request time (~5 second drift)
  • The level: null field was systematically present, polluting the BetterStack severity histogram

Fix in Vector (/etc/vector/vector.yaml)

Added a JSON detection branch at the beginning of the existing remap transform, before the regex attempts:

if contains(string(.file) ?? "", "mcp-oauth") {
  parsed, err = parse_json(.message)
  if err == null && is_object(parsed) {
    .nginx = {}
    .nginx.client           = parsed.real_ip
    .nginx.method           = parsed.method
    .nginx.path             = parsed.uri
    .nginx.status           = to_int(parsed.status) ?? null
    .nginx.size             = to_int(parsed.bytes_sent) ?? null
    .nginx.agent            = parsed.http_user_agent
    .nginx.limit_req_status = parsed.limit_req_status
    .nginx.request          = join!([string(parsed.method) ?? "", " ", string(parsed.uri) ?? ""])
    .platform               = "Nginx"

    # Use actual nginx request time, not Vector ingestion timestamp
    parsed_time, err = parse_timestamp(parsed.time, "%+")
    if err == null {
      .dt = format_timestamp!(parsed_time, "%+")
    } else {
      .dt = del(.timestamp)
    }
    del(.timestamp)
    del(.message)
    del(.source_type)
    return .
  }
}
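To reason about this branch outside Vector, the same mapping can be sketched in Python. This is only an approximation of the VRL above, not a replacement for it: the function name normalize_mcp_oauth and the helpers _to_int and _text are hypothetical, and the event is assumed to be a plain dict carrying the file, message, timestamp and source_type keys that Vector's file source adds.

```python
import json
from datetime import datetime


def _to_int(value):
    """Return value as an int, or None when it cannot be converted."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def _text(value):
    """Return value when it is a string, otherwise an empty string."""
    return value if isinstance(value, str) else ""


def normalize_mcp_oauth(event):
    """Approximate the mcp-oauth branch: build nginx{} from the JSON message."""
    if "mcp-oauth" not in _text(event.get("file")):
        return event  # not an mcp-oauth log, leave it to the regex branch
    try:
        parsed = json.loads(event.get("message"))
    except (TypeError, ValueError):
        return event  # message is missing or not valid JSON
    if not isinstance(parsed, dict):
        return event

    dropped = ("message", "timestamp", "source_type")
    out = {k: v for k, v in event.items() if k not in dropped}
    out["nginx"] = {
        "client": parsed.get("real_ip"),
        "method": parsed.get("method"),
        "path": parsed.get("uri"),
        "status": _to_int(parsed.get("status")),
        "size": _to_int(parsed.get("bytes_sent")),
        "agent": parsed.get("http_user_agent"),
        "limit_req_status": parsed.get("limit_req_status"),
        "request": _text(parsed.get("method")) + " " + _text(parsed.get("uri")),
    }
    out["platform"] = "Nginx"

    # Prefer the nginx request time; fall back to the ingestion timestamp.
    try:
        out["dt"] = datetime.fromisoformat(parsed["time"]).isoformat()
    except (KeyError, TypeError, ValueError):
        out["dt"] = event.get("timestamp")
    return out
```

Feeding it an event whose message carries "time": "2024-05-01T12:00:00+00:00" and whose timestamp is five seconds later makes the drift fix visible: dt takes the nginx value. Note that on Python versions before 3.11, datetime.fromisoformat rejects a trailing Z, so this sketch assumes an explicit offset such as +00:00.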

Fix for level: null in the main transform:

.level = del(.nginx.severity)
if is_null(.level) { del(.level) }

Result

host:NUC8i3BEH  platform:Nginx  nginx.status:200  nginx.method:POST
nginx.path:/oauth-mcp/mcp  nginx.client:160.79.106.35
nginx.agent:Claude-User  nginx.limit_req_status:PASSED

Problem 2: CrowdSec and CF WAF logs (plain text with duplicates)

Diagnosis

The crowdsec-poller.py and crowdsec-cf-sync.py scripts produced a redundant message field with visual duplicates in BetterStack:

Ban ban | 47.82.11.22 | http:scan | dur:3h59m35s | origin:CAPI
Alert banned | 77.74.177.114 | cloudflare-waf/4-hits |  | cloudflare-waf/4-hits
CF WAF ban: 77.74.177.114 | 4 hits | 4h

Solution: nested cs{} object

The final solution is a nested cs{} JSON object in the payloads. BetterStack natively navigates nested objects with dot notation in the Logs message format, exactly as it handles the nginx{} object.

Final structure, crowdsec-poller.py:

import socket

record = {
    "dt": dt_str,
    "host": socket.gethostname(),
    "platform": "CrowdSec",
    "cs": {
        "event_type": "decision",
        "ip": decision_ip,
        "type": "ban",
        "scenario": "http:scan",
        "duration": "4h",
        "origin": "CAPI",
        "scope": "Ip",
        "simulated": False
    }
    # no "message" field
}
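Since the Vector file source reads /var/log/crowdsec/decisions.log line by line and the flatten transform calls parse_json on each line, every record has to land in the file as a single JSON line. A minimal sketch of that step follows; the helper names build_decision_record and append_json_line are hypothetical, not taken from the actual script, and the dt format simply mirrors the one used by the test injection commands.

```python
import json
import socket
from datetime import datetime, timezone


def build_decision_record(ip, scenario, duration, origin, now=None):
    """Build a CrowdSec decision record with a nested cs{} object."""
    now = now or datetime.now(timezone.utc)
    return {
        "dt": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "host": socket.gethostname(),
        "platform": "CrowdSec",
        "cs": {
            "event_type": "decision",
            "ip": ip,
            "type": "ban",
            "scenario": scenario,
            "duration": duration,
            "origin": origin,
            "scope": "Ip",
            "simulated": False,
        },
        # deliberately no "message" field
    }


def append_json_line(path, record):
    """Append the record as one compact JSON line (one event per line)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, separators=(",", ":")) + "\n")
```

Keeping the whole record on one line matters here: a pretty-printed, multi-line JSON document would be split into several events by the file source, and each fragment would fail parse_json and be aborted.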

Final structure, crowdsec-cf-sync.py → send_to_betterstack():

import socket

payload = {
    "dt": timestamp,
    "host": socket.gethostname(),
    "platform": "CFWaf",
    "cs": {
        "ip": ip,
        "hits": hit_count,
        "action": cf_action,
        "duration": duration,
        "recidive": recidive_count,
        "uris": uris,
        "source": "cloudflare_waf"
    }
    # no "message" field
}

Fix in Vector

The crowdsec_decisions_flatten transform parses the JSON, removes extraneous fields and keeps the cs{} object intact:

parsed, err = parse_json(.message)
if err != null { abort }
. = merge(., parsed) ?? .
del(.source_type)
del(.file)
del(.message)
# DO NOT delete .host: it is provided by the Python scripts

A crowdsec_decisions_filter filter transform blocks logs without an IP and overly verbose alerts:

exists(.cs) && !is_null(.cs.ip) && .cs.event_type != "alert"
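The intent of this condition (keep events that have a cs object and an IP, drop alerts) can be checked offline with a small Python predicate. This is a sketch that approximates the rule, not the VRL itself: keep_event is a hypothetical name, and it is slightly stricter than exists(.cs) because it also requires cs to be an object.

```python
def keep_event(event):
    """Return True when the event should be forwarded to BetterStack."""
    cs = event.get("cs")
    if not isinstance(cs, dict):
        return False  # no cs{} object at all
    if cs.get("ip") is None:
        return False  # decision without an IP
    return cs.get("event_type") != "alert"  # alerts are too verbose


samples = [
    {"cs": {"event_type": "decision", "ip": "1.2.3.4"}},  # kept
    {"cs": {"ip": "1.2.3.4", "hits": 5}},                 # kept (CF WAF, no event_type)
    {"cs": {"event_type": "alert", "ip": "1.2.3.4"}},     # dropped
    {"cs": {"event_type": "decision"}},                   # dropped (no IP)
    {"platform": "CrowdSec"},                             # dropped (no cs)
]
kept = [keep_event(s) for s in samples]
```

CF WAF payloads carry no event_type, so they pass the last test as long as they have an IP.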

The sink points to this filter:

crowdsec_betterstack_sink:
  inputs:
    - "crowdsec_decisions_filter"

BetterStack configuration: Logs message format

In Sources → crowdsec-decisions → Advanced settings → Logs message format:

{host} {platform} {cs.type} {cs.ip} {cs.scenario} {cs.duration} {cs.origin} {cs.action} {cs.hits} {cs.uris} {cs.source}

Note: BetterStack supports dot notation {cs.ip} to navigate nested objects. For fields whose name literally contains a dot, use the {["field.name"]} syntax.
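To illustrate what dot notation means for a nested record, here is a small Python sketch that resolves {cs.ip}-style placeholders against a dict. It is not BetterStack's implementation: lookup and render are hypothetical helpers, the {["field.name"]} form is not handled, and rendering missing fields as empty text is an assumption made for the example.

```python
import re


def lookup(record, dotted):
    """Follow a dotted path such as "cs.ip" through nested dicts."""
    current = record
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def render(template, record):
    """Replace each {path} placeholder; missing fields become empty text."""
    def repl(match):
        value = lookup(record, match.group(1))
        return "" if value is None else str(value)

    rendered = re.sub(r"\{([A-Za-z0-9_.]+)\}", repl, template)
    return " ".join(rendered.split())  # collapse gaps left by missing fields


record = {
    "host": "NUC8i3BEH",
    "platform": "CrowdSec",
    "cs": {"type": "ban", "ip": "47.82.11.22"},
}
line = render("{host} {platform} {cs.type} {cs.ip} {cs.hits}", record)
```

This is why a single format string can serve both payload shapes: CrowdSec decisions fill cs.type, cs.scenario and cs.origin, while CF WAF bans fill cs.action, cs.hits and cs.uris.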

Final result in Live Tail

CrowdSec CAPI decision:

host:NUC8i3BEH  platform:CrowdSec  cs.type:ban  cs.ip:47.82.11.22
cs.scenario:http:scan  cs.duration:3h  cs.origin:CAPI

CF WAF ban:

host:NUC8i3BEH  platform:CFWaf  cs.ip:4.197.75.18  cs.action:block
cs.duration:24h  cs.hits:50  cs.uris:["/wp-admin.php","/shell.php"]

Final Vector pipeline architecture

/var/log/nginx/*.log
        ↓
better_stack_nginx_parser (remap)
  ├── mcp-oauth branch → nginx{} object + actual nginx timestamp
  └── standard branch  → classic nginx regexes
        ↓
better_stack_nginx_filter (filter)
        ↓
BetterStack nginx-status (s2315113)

/var/log/crowdsec/decisions.log
        ↓
crowdsec_decisions_source (file)
        ↓
crowdsec_decisions_flatten (remap): parse JSON, del(.message)
        ↓
crowdsec_decisions_filter (filter): exists(.cs) && ip present && not alert
        ↓
BetterStack crowdsec-decisions (s2328224)

Modified files

File                                  Changes
/etc/vector/vector.yaml               mcp-oauth branch, level null fix, CrowdSec transform, CrowdSec filter
/usr/local/bin/crowdsec-poller.py     Nested cs{} object, added host, removed message
/usr/local/bin/crowdsec-cf-sync.py    Nested cs{} object, added host, removed message, platform: CFWaf

Useful commands

# Validate Vector config without restarting
sudo vector validate /etc/vector/vector.yaml

# Restart Vector
sudo systemctl restart vector

# Vector logs in real time (without host_metrics noise)
sudo journalctl -fu vector | grep -v "host_metrics"

# Inject a test CrowdSec log
echo '{"dt":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","host":"NUC8i3BEH","platform":"CrowdSec","cs":{"event_type":"decision","ip":"1.2.3.4","type":"ban","scenario":"http:scan","duration":"4h","origin":"CAPI"}}' | sudo tee -a /var/log/crowdsec/decisions.log

# Inject a test CF WAF log
echo '{"dt":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","host":"NUC8i3BEH","platform":"CFWaf","cs":{"ip":"1.2.3.4","hits":5,"action":"block","duration":"4h","source":"cloudflare_waf","uris":["/shell.php"]}}' | sudo tee -a /var/log/crowdsec/decisions.log

🏁 Conclusion

This normalization transforms unreadable logs into structured, actionable data in BetterStack. Clickable tags in the Live Tail enable instant filtering by IP, scenario or action, which was impossible with raw JSON blobs or duplicated text messages. Fixing the nginx timestamp eliminates the ~5 second drift that corrupted chronological ordering.

To go further:

  • 💡 Add BetterStack alerts on recurring cs.ip patterns to detect coordinated attack campaigns
  • 💡 Create a BetterStack dashboard with breakdown by cs.origin to visualize the CAPI/cscli/nginx-bouncer ratio