Normalizing nginx and CrowdSec Logs in BetterStack with Vector

In short
Two problems coexisted in BetterStack: mcp-oauth.access.log logs arrived as unreadable raw JSON, and CrowdSec logs produced visual duplicates. This work normalizes all logs so they display as structured clickable tags, with correct timestamps and without extraneous fields.
Why
BetterStack displays logs as highlighted clickable tags in the Live Tail when JSON fields are properly structured. Before this work, observability was degraded on two fronts:
- mcp-oauth.access.log logs arrived as unreadable raw JSON (custom format incompatible with Vector’s nginx parser), so the fields nginx.client, nginx.path and nginx.status were not extracted
- CrowdSec and CF WAF logs arrived as plain text with duplicates (Ban ban | ... | Ban ban)
The goal was to normalize all logs in BetterStack to display like standard nginx logs:
host:NUC8i3BEH platform:Nginx nginx.status:200 nginx.method:GET nginx.path:/ping nginx.client:91.98.38.26
What was done
Problem 1: mcp-oauth.access.log (non-standard JSON format)
Diagnosis
The mcp-oauth.access.log file used a custom JSON format produced by nginx, incompatible with Vector’s regex parser. Two secondary issues were also identified:
- The dt field used Vector’s ingestion timestamp rather than the actual nginx request time (~5 second drift)
- The level: null field was systematically present, polluting the BetterStack severity histogram
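The drift described in the first point can be checked offline by comparing the time nginx recorded with the time the event was ingested. This is a minimal sketch, assuming the log line carries an ISO 8601 `time` field (the field name is taken from the Vector remap in this post); `request_time_drift` is a hypothetical helper and the sample values are invented for illustration.

```python
import json
from datetime import datetime


def request_time_drift(raw_line: str, ingested_at: str) -> float:
    """Return the seconds between the nginx request time and the ingestion time."""
    entry = json.loads(raw_line)
    request_time = datetime.fromisoformat(entry["time"])
    ingest_time = datetime.fromisoformat(ingested_at)
    return (ingest_time - request_time).total_seconds()


# Hypothetical sample line: field names mirror the remap, values are made up.
sample = json.dumps({"time": "2025-01-01T12:00:00+00:00", "status": "200", "uri": "/ping"})
drift = request_time_drift(sample, "2025-01-01T12:00:05+00:00")
print(drift)
```

A non-zero result on real lines would confirm that dt should be taken from the nginx time rather than from the ingestion timestamp.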
Fix in Vector (/etc/vector/vector.yaml)
Added a JSON detection branch at the beginning of the existing remap transform, before the regex attempts:
if contains(string(.file) ?? "", "mcp-oauth") {
  parsed, err = parse_json(.message)
  if err == null && is_object(parsed) {
    .nginx = {}
    .nginx.client = parsed.real_ip
    .nginx.method = parsed.method
    .nginx.path = parsed.uri
    .nginx.status = to_int(parsed.status) ?? null
    .nginx.size = to_int(parsed.bytes_sent) ?? null
    .nginx.agent = parsed.http_user_agent
    .nginx.limit_req_status = parsed.limit_req_status
    .nginx.request = join!([string(parsed.method) ?? "", " ", string(parsed.uri) ?? ""])
    .platform = "Nginx"
    # Use actual nginx request time, not Vector ingestion timestamp
    parsed_time, err = parse_timestamp(parsed.time, "%+")
    if err == null {
      .dt = format_timestamp!(parsed_time, "%+")
    } else {
      .dt = del(.timestamp)
    }
    del(.timestamp)
    del(.message)
    del(.source_type)
    return .
  }
}
Fix for level: null in the main transform:
.level = del(.nginx.severity)
if is_null(.level) { del(.level) }
Result
host:NUC8i3BEH platform:Nginx nginx.status:200 nginx.method:POST
nginx.path:/oauth-mcp/mcp nginx.client:160.79.106.35
nginx.agent:Claude-User nginx.limit_req_status:PASSED
Problem 2: CrowdSec and CF WAF logs (plain text with duplicates)
Diagnosis
The crowdsec-poller.py and crowdsec-cf-sync.py scripts produced a redundant message field with visual duplicates in BetterStack:
Ban ban | 47.82.11.22 | http:scan | dur:3h59m35s | origin:CAPI
Alert banned | 77.74.177.114 | cloudflare-waf/4-hits | | cloudflare-waf/4-hits
CF WAF ban: 77.74.177.114 | 4 hits | 4h
Solution: Nested cs{} object
The final solution is a nested cs{} JSON object in the payloads. BetterStack natively navigates nested objects with dot notation in the Logs message format โ exactly like it handles the nginx{} object.
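To make the dot notation concrete, the sketch below flattens a nested record into the dotted keys that show up as tags. It only illustrates the naming scheme and is not BetterStack's implementation; `flatten` is a hypothetical helper and the sample values are illustrative.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. {"cs": {"ip": x}} -> {"cs.ip": x}."""
    flat = {}
    for key, value in record.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


record = {
    "host": "NUC8i3BEH",
    "platform": "CrowdSec",
    "cs": {"type": "ban", "ip": "47.82.11.22", "origin": "CAPI"},
}
tags = flatten(record)
print(" ".join(f"{key}:{value}" for key, value in tags.items()))
```

Grouping every CrowdSec field under a single cs key keeps the top level limited to dt, host and platform, the same shape as the nginx events.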
Final structure (crowdsec-poller.py):
import socket

# dt_str and decision_ip are set earlier in the script (not shown here)
record = {
    "dt": dt_str,
    "host": socket.gethostname(),
    "platform": "CrowdSec",
    "cs": {
        "event_type": "decision",
        "ip": decision_ip,
        "type": "ban",
        "scenario": "http:scan",
        "duration": "4h",
        "origin": "CAPI",
        "scope": "Ip",
        "simulated": False,
    },
    # no "message" field
}
Final structure (crowdsec-cf-sync.py, send_to_betterstack()):
import socket

# timestamp, ip, hit_count, cf_action, duration, recidive_count and uris
# are computed earlier in the function (not shown here)
payload = {
    "dt": timestamp,
    "host": socket.gethostname(),
    "platform": "CFWaf",
    "cs": {
        "ip": ip,
        "hits": hit_count,
        "action": cf_action,
        "duration": duration,
        "recidive": recidive_count,
        "uris": uris,
        "source": "cloudflare_waf",
    },
    # no "message" field
}
Fix in Vector
The crowdsec_decisions_flatten transform parses JSON, removes extraneous fields and keeps the cs{} object intact:
parsed, err = parse_json(.message)
if err != null { abort }
. = merge(., parsed) ?? .
del(.source_type)
del(.file)
del(.message)
# DO NOT delete .host: it is provided by the Python scripts
A crowdsec_decisions_filter filter transform blocks logs without an IP and overly verbose alerts:
exists(.cs) && !is_null(.cs.ip) && .cs.event_type != "alert"
The sink points to this filter:
crowdsec_betterstack_sink:
  inputs:
    - "crowdsec_decisions_filter"
BetterStack Configuration: Logs message format
In Sources → crowdsec-decisions → Advanced settings → Logs message format:
{host} {platform} {cs.type} {cs.ip} {cs.scenario} {cs.duration} {cs.origin} {cs.action} {cs.hits} {cs.uris} {cs.source}
Note: BetterStack supports dot notation ({cs.ip}) to navigate nested objects. For fields whose name literally contains a dot, use the {["field.name"]} syntax.
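The format string can be previewed locally before saving it in BetterStack. The sketch below is a rough approximation under stated assumptions: `lookup` and `render` are hypothetical helpers, only plain dot paths such as {cs.ip} are handled (not the bracket syntax), and rendering a missing field as nothing is a guess at the behaviour, not something confirmed here.

```python
import re


def lookup(record: dict, path: str):
    """Follow a dotted path through nested dicts; return None if any part is missing."""
    current = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def render(template: str, record: dict) -> str:
    """Replace each {dotted.path} placeholder with its value, dropping missing fields."""
    def substitute(match: re.Match) -> str:
        value = lookup(record, match.group(1))
        return "" if value is None else str(value)

    rendered = re.sub(r"\{([\w.]+)\}", substitute, template)
    return " ".join(rendered.split())  # collapse the gaps left by missing fields


template = "{host} {platform} {cs.type} {cs.ip} {cs.scenario} {cs.action} {cs.hits}"
record = {
    "host": "NUC8i3BEH",
    "platform": "CFWaf",
    "cs": {"ip": "1.2.3.4", "hits": 5, "action": "block"},
}
preview = render(template, record)
print(preview)
```

Because CrowdSec decisions and CF WAF bans share one format string, each event only fills the placeholders it actually carries.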
Final result in Live Tail
CrowdSec CAPI decision:
host:NUC8i3BEH platform:CrowdSec cs.type:ban cs.ip:47.82.11.22
cs.scenario:http:scan cs.duration:3h cs.origin:CAPI
CF WAF ban:
host:NUC8i3BEH platform:CFWaf cs.ip:4.197.75.18 cs.action:block
cs.duration:24h cs.hits:50 cs.uris:["/wp-admin.php","/shell.php"]
Final Vector pipeline architecture
/var/log/nginx/*.log
    ↓
better_stack_nginx_parser (remap)
    ├── mcp-oauth branch → nginx{} object + actual nginx timestamp
    └── standard branch → classic nginx regexes
    ↓
better_stack_nginx_filter (filter)
    ↓
BetterStack nginx-status (s2315113)

/var/log/crowdsec/decisions.log
    ↓
crowdsec_decisions_source (file)
    ↓
crowdsec_decisions_flatten (remap): parse JSON, del(.message)
    ↓
crowdsec_decisions_filter (filter): exists(.cs) && ip present && not alert
    ↓
BetterStack crowdsec-decisions (s2328224)
Modified files
| File | Changes |
|---|---|
| /etc/vector/vector.yaml | mcp-oauth branch, level null fix, CrowdSec transform, CrowdSec filter |
| /usr/local/bin/crowdsec-poller.py | Nested cs{} object, added host, removed message |
| /usr/local/bin/crowdsec-cf-sync.py | Nested cs{} object, added host, removed message, platform: CFWaf |
Useful commands
# Validate Vector config without restarting
sudo vector validate /etc/vector/vector.yaml
# Restart Vector
sudo systemctl restart vector
# Vector logs in real time (without host_metrics noise)
sudo journalctl -fu vector | grep -v "host_metrics"
# Inject a test CrowdSec log
echo '{"dt":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","host":"NUC8i3BEH","platform":"CrowdSec","cs":{"event_type":"decision","ip":"1.2.3.4","type":"ban","scenario":"http:scan","duration":"4h","origin":"CAPI"}}' | sudo tee -a /var/log/crowdsec/decisions.log
# Inject a test CF WAF log
echo '{"dt":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","host":"NUC8i3BEH","platform":"CFWaf","cs":{"ip":"1.2.3.4","hits":5,"action":"block","duration":"4h","source":"cloudflare_waf","uris":["/shell.php"]}}' | sudo tee -a /var/log/crowdsec/decisions.log
Conclusion
This normalization transforms unreadable logs into structured, actionable data in BetterStack. Clickable tags in the Live Tail enable instant filtering by IP, scenario or action, which was impossible with raw JSON blobs or duplicated text messages. Fixing the nginx timestamp eliminates the ~5 second drift that corrupted chronological ordering.
To go further:
- Add BetterStack alerts on recurring cs.ip patterns to detect coordinated attack campaigns
- Create a BetterStack dashboard with a breakdown by cs.origin to visualize the CAPI/cscli/nginx-bouncer ratio
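Both ideas can be prototyped against the local log file before building them in BetterStack. This is a minimal sketch, assuming each line of the decisions log is one JSON object with the cs{} structure shown earlier; `summarize` is a hypothetical helper, the threshold of two hits is arbitrary, and the sample lines are invented.

```python
import json
from collections import Counter


def summarize(lines):
    """Count decisions per cs.origin and per cs.ip, skipping malformed lines."""
    by_origin, by_ip = Counter(), Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        cs = record.get("cs") or {}
        if cs.get("origin"):
            by_origin[cs["origin"]] += 1
        if cs.get("ip"):
            by_ip[cs["ip"]] += 1
    return by_origin, by_ip


# Invented sample lines, shaped like the payloads in this post.
sample_lines = [
    '{"platform":"CrowdSec","cs":{"ip":"1.2.3.4","origin":"CAPI"}}',
    '{"platform":"CrowdSec","cs":{"ip":"1.2.3.4","origin":"cscli"}}',
    '{"platform":"CrowdSec","cs":{"ip":"5.6.7.8","origin":"CAPI"}}',
    "not json",
]
by_origin, by_ip = summarize(sample_lines)
recurring = [ip for ip, count in by_ip.items() if count >= 2]
print(dict(by_origin), recurring)
```

Against the real file, the same function could be fed an open handle on /var/log/crowdsec/decisions.log directly, since it only iterates over lines.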