Hugo SEO: Googlebot 404s, noindex aliases and sitemap normalization

Context
Google Search Console was reporting four categories of issues on arleo.eu:
- Googlebot 404s: `/fr/tag/cloudflare`, `/en/tag/nginx`, `/fr/tag/javascript`… URLs with a `/fr/` prefix or a singular `/tag/` that nginx never served
- 16 “Excluded by noindex tag” pages: all of them redirect pages generated by `aliases:` in Hugo frontmatter
- Robots tag: `noodp` hardcoded in the LoveIt theme
- FR/EN sitemap: 104 vs 105 URLs, due to a duplicate FR tag and two missing tags
Act 1: Hugo aliases → nginx 301 redirects
Why Hugo generates noindex pages
Hugo renders each `aliases:` frontmatter entry as a static HTML file containing:
```html
<meta http-equiv="refresh" content="0; url=https://www.arleo.eu/posts/X/">
<meta name="robots" content="noindex">
```
The `noindex` is intentional from Hugo’s perspective: it prevents Google from indexing the redirect page instead of the destination. But GSC reports these pages as “Excluded by noindex tag”, which pollutes the report and signals to Google that URLs on your site are intentionally hidden.
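Before removing aliases, it helps to inventory which files declare them. A minimal sketch, assuming `---`-delimited YAML frontmatter with aliases written as a block list (inline `aliases: [...]` arrays and TOML frontmatter are not handled); `list_aliases` is a hypothetical name:

```shell
# Hypothetical helper: print "file<TAB>alias" for each alias declared in the
# YAML frontmatter of Markdown files under a content directory.
# Assumes '---'-delimited frontmatter with aliases written as a block list:
#   aliases:
#     - /old/path/
list_aliases() {
  find "${1:-content}" -type f -name '*.md' -exec awk '
    FNR == 1 { fm = 0; in_al = 0 }              # reset state for each file
    FNR == 1 && $0 == "---" { fm = 1; next }    # opening delimiter
    fm && $0 == "---" { fm = 0; next }          # closing delimiter: body follows
    fm && /^aliases:/ { in_al = 1; next }       # start of the aliases list
    fm && in_al && /^[[:space:]]*- / {
      sub(/^[[:space:]]*- /, "")                # drop the list marker
      gsub(/"/, "")                             # drop double quotes
      print FILENAME "\t" $0
      next
    }
    { in_al = 0 }                               # any other line ends the list
  ' {} +
}
```

Piping the result through `cut -f1 | sort -u | wc -l` gives the number of Markdown files that still declare aliases.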
Fix: permanent nginx redirects + remove aliases
59 `rewrite ... permanent` rules in the NUC vhost:
```nginx
# === SEO Redirects ===
# 1. Old Grav slugs and Hugo aliases
rewrite ^/crowdsec-cloudflare-waf-autoban/?$ /posts/crowdsec-cloudflare-waf-autoban/ permanent;
rewrite ^/en/crowdsec-cloudflare-waf-autoban/?$ /en/posts/crowdsec-cloudflare-waf-autoban/ permanent;
rewrite ^/fr/crowdsec-cloudflare-waf-autoban/?$ /posts/crowdsec-cloudflare-waf-autoban/ permanent;
# ... (56 more specific rules)
# 2. /fr/ prefix: FR is the default language, served without a subdirectory
rewrite ^/fr/tag/([^/]+)/?$ /tags/$1/ permanent;
rewrite ^/fr/(.+)$ /$1 permanent;
# 3. Singular /tag/ → plural /tags/ (Hugo taxonomy)
rewrite ^/tag/([^/]+)/?$ /tags/$1/ permanent;
rewrite ^/en/tag/([^/]+)/?$ /en/tags/$1/ permanent;
# 4. FR tags in EN context
rewrite ^/en/tags/reseau/?$ /en/tags/network/ permanent;
rewrite ^/en/tags/securite/?$ /en/tags/security/ permanent;
```
Removed the `aliases:` blocks from all 35 affected Markdown files. After the rebuild, Hugo generates 68 aliases instead of 208; only Hugo’s automatic cross-language aliases remain.
Result: redirects are pure nginx 301s, invisible to GSC as noindex pages.
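Rules of this shape can also be generated rather than typed. A minimal sketch of a hypothetical `alias_to_rewrite` helper, assuming absolute alias paths: it drops a trailing slash, escapes regex metacharacters so the alias matches literally, and appends the optional `/?` used in the rules above:

```shell
# Hypothetical helper: turn one alias path and its target into an nginx
# "rewrite ... permanent" rule shaped like the vhost rules above.
# Assumes the alias is an absolute path (starts with "/").
alias_to_rewrite() {
  local alias="${1%/}" target="$2" escaped
  # Escape PCRE metacharacters so the alias is matched literally
  escaped=$(printf '%s' "$alias" | sed 's/[.[\*^$()+?{|]/\\&/g')
  printf 'rewrite ^%s/?$ %s permanent;\n' "$escaped" "$target"
}
```

For example, `alias_to_rewrite /crowdsec-cloudflare-waf-autoban/ /posts/crowdsec-cloudflare-waf-autoban/` prints exactly the first rule of the vhost excerpt.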
Act 2: noodp robots tag → index, follow
The LoveIt theme hardcodes this in `baseof.html`:
```html
<meta name="robots" content="noodp" />
```
`noodp` was meant to stop Google from using the Open Directory Project (DMOZ) title in snippets. DMOZ closed in 2017, so the directive has had no effect for years, but some crawlers still interpret it as an indexing restriction.
Site-level override in `layouts/baseof.html`:
```html
<meta name="robots" content="index, follow" />
```
This file takes priority over the theme template without modifying the theme itself.
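To confirm the override took effect on every page after a rebuild, the generated HTML can be scanned for a leftover `noodp` tag. A sketch, assuming the site is built into `public/`; `check_no_noodp` is a hypothetical name:

```shell
# Hypothetical check: list built HTML pages that still carry content="noodp".
# Prints the offending files and returns 1 if any are found, 0 otherwise.
check_no_noodp() {
  local hits
  # "|| true": grep exits 1 when nothing matches, which is the success case here
  hits=$(grep -rlF --include='*.html' 'content="noodp"' "${1:-public}" || true)
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
  return 0
}
```

Returning a non-zero status on a hit makes the check usable as a gate in a deploy script.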
Act 3: FR tag normalization and sitemap parity
Diagnosis
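The diagnosis below starts from `sitemap_fr.txt` and `sitemap_en.txt`, plain lists of URLs. How they were produced isn't shown; one way, assuming the per-language sitemaps were saved locally as XML, is to extract their `<loc>` values. The `sitemap_locs` name and the `.xml` file names are hypothetical:

```shell
# Hypothetical helper: print the <loc> URLs of a sitemap file, one per line.
# Assumes each <loc>...</loc> element sits on a single line.
sitemap_locs() {
  grep -o '<loc>[^<]*</loc>' "$1" | sed -e 's|^<loc>||' -e 's|</loc>$||'
}
```

Usage: `sitemap_locs sitemap_fr.xml > sitemap_fr.txt`, and likewise for EN.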
```shell
sed 's|https://www\.arleo\.eu||' sitemap_fr.txt | sort > fr_paths.txt
sed 's|https://www\.arleo\.eu/en||' sitemap_en.txt | sort > en_paths.txt
comm -13 fr_paths.txt en_paths.txt # In EN, not in FR
# → /tags/network/ /tags/security/ /tags/performance/ /tags/purge/ /categories/security/
comm -23 fr_paths.txt en_paths.txt # In FR, not in EN
# → /tags/reseau/ /tags/securite/ /tags/sécurité/ /categories/sécurité/
```
FR had:
- `/tags/securite/` and `/tags/sécurité/`: a duplicate, unaccented vs accented
- `/tags/reseau/`: unaccented, with no `réseau` counterpart

EN additionally had `/tags/performance/` and `/tags/purge/`, with no FR equivalents.
Fixes
```shell
# 5 FR posts: securite → sécurité
# (GNU sed syntax; BSD/macOS sed needs `sed -i ''`)
for post in audit-securite-modsecurity-crowdsec csp-nonce hardening-systemd-mcp \
  roadmap-sprint-securite-mcp sprint-securite-mcp-livre; do
  sed -i 's/^- securite$/- sécurité/' "content/posts/$post/index.fr.md"
done
# 2 FR posts: reseau → réseau
for post in crowdsec-cloudflare-waf-autoban post-mortem-522-wan-failover; do
  sed -i 's/^- reseau$/- réseau/' "content/posts/$post/index.fr.md"
done
```
Added the `purge` and `performance` tags to `hugo-mcp-plugin-cloudflare/index.fr.md`.
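The two loops above differ only in the tag pair and rely on GNU `sed -i`. A sketch of a hypothetical `rename_tag` helper that takes the pair as arguments and edits through a temporary file instead, so it behaves the same with GNU and BSD `sed`. Like the loops, it assumes tags are listed one per line as `- name` at column 0 and contain no `sed` metacharacters:

```shell
# Hypothetical helper: rewrite a frontmatter tag line "- OLD" to "- NEW" in
# each file given after the two tag arguments.
rename_tag() {
  local old="$1" new="$2" f tmp
  shift 2
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # Write to a temp file, then copy back over the original: portable across
    # GNU and BSD sed, and keeps the original file's permissions
    sed "s/^- ${old}\$/- ${new}/" "$f" > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}
```

Usage: `rename_tag reseau réseau content/posts/crowdsec-cloudflare-waf-autoban/index.fr.md content/posts/post-mortem-522-wan-failover/index.fr.md`.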
Cleaned up stale directories from `public/`, since Hugo does not delete old files during incremental rebuilds:
```shell
rm -rf public/tags/reseau public/tags/securite
```
Result: FR 105 = EN 105.
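The parity result can be rechecked after each rebuild by counting `<loc>` entries in both sitemaps. A minimal sketch with a hypothetical `sitemap_parity` function, assuming the FR and EN sitemap XML files are available locally:

```shell
# Hypothetical parity check: compare the number of <loc> entries in the FR and
# EN sitemaps. Prints "FR n = EN m" (or "!=") and returns 1 on a mismatch.
sitemap_parity() {
  local fr en
  # $(( )) strips the padding some wc implementations add
  fr=$(( $(grep -o '<loc>' "$1" | wc -l) ))
  en=$(( $(grep -o '<loc>' "$2" | wc -l) ))
  if [ "$fr" -eq "$en" ]; then
    echo "FR $fr = EN $en"
  else
    echo "FR $fr != EN $en"
    return 1
  fi
}
```

Usage, with the build paths being an assumption: `sitemap_parity public/fr/sitemap.xml public/en/sitemap.xml`. Counting `<loc>` with `grep -o` rather than `grep -c` still works when several entries share a line.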
Act 4: /fr/sitemap.xml redirect loop
The general rule `rewrite ^/fr/(.+)$ /$1 permanent` was intercepting `/fr/sitemap.xml`, an actual file generated by Hugo, and redirecting it to `/sitemap.xml` (the sitemap index). The index, in turn, lists `/fr/sitemap.xml`: an infinite loop for crawlers.
Detection:
```shell
curl -sk https://www.arleo.eu/fr/sitemap.xml -o /dev/null -w '%{http_code} → %{redirect_url}\n'
# 301 → https://www.arleo.eu/sitemap.xml
```
Fix: a `break` exception before the general rule:
```nginx
rewrite ^/fr/(sitemap\.xml)$ /fr/$1 break;
rewrite ^/fr/tag/([^/]+)/?$ /tags/$1/ permanent;
rewrite ^/fr/(.+)$ /$1 permanent;
```
`break` stops processing the remaining `rewrite` directives. The request then falls through to the `location ~* sitemap\.xml$` block, which proxies to the Hugo VM.
nginx gotcha: `rewrite` directives at `server` level run before nginx selects a `location` block. A `break` that leaves the URI unchanged therefore lets the request continue to the location it would have matched anyway.
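The ordering argument can be checked offline. A minimal bash sketch with a hypothetical `simulate_rewrites` function that walks rules in order, the way server-level `rewrite` directives are evaluated. It models only the `permanent` and `break` flags and a single capture group, and uses bash's ERE matching in place of nginx's PCRE (equivalent for these simple patterns):

```shell
#!/usr/bin/env bash
# Hypothetical simulator of the server-level rewrite chain.
# Each rule is "regex|replacement|flag" (regexes must not contain "|");
# "$1" in the replacement is replaced by the first capture group.
simulate_rewrites() {
  local uri=$1 rule re repl flag target ph='$1'
  shift
  for rule in "$@"; do
    IFS='|' read -r re repl flag <<< "$rule"
    if [[ $uri =~ $re ]]; then
      target=${repl//"$ph"/"${BASH_REMATCH[1]}"}
      case $flag in
        permanent) echo "301 $target"; return 0 ;;  # redirect sent to the client
        break)     echo "pass $target"; return 0 ;; # stop rewriting, go to location matching
      esac
    fi
  done
  echo "pass $uri"  # no rule matched: location matching sees the original URI
}

# /fr/ rules from Act 1, then the same chain with the break exception first
before=(
  '^/fr/tag/([^/]+)/?$|/tags/$1/|permanent'
  '^/fr/(.+)$|/$1|permanent'
)
after=(
  '^/fr/(sitemap\.xml)$|/fr/$1|break'
  "${before[@]}"
)

simulate_rewrites /fr/sitemap.xml "${before[@]}"  # 301 /sitemap.xml
simulate_rewrites /fr/sitemap.xml "${after[@]}"   # pass /fr/sitemap.xml
```

Ordinary `/fr/` URLs are unaffected by the exception: `/fr/posts/x/` still yields `301 /posts/x/` with either rule set.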
Summary
| Issue | Cause | Fix |
|---|---|---|
| 404 on `/fr/tag/*` | Singular `/tag/` + unhandled `/fr/` prefix | `rewrite` for `^/fr/tag/` and `^/tag/` |
| 16 GSC noindex pages | Hugo `aliases:` generates noindex HTML | nginx 301 + remove aliases |
| `robots: noodp` | Hardcoded in LoveIt `baseof.html` | Override in `layouts/baseof.html` |
| FR sitemap 104 vs EN 105 | Duplicate + missing tags | Batch `sed` + add tags |
| `/fr/sitemap.xml` loop | `^/fr/(.+)` too greedy | `break` exception before the rule |