Hugo SEO: Googlebot 404s, noindex aliases and sitemap normalization

Context

Google Search Console was reporting four categories of issues on arleo.eu:

  • Googlebot 404s: /fr/tag/cloudflare, /en/tag/nginx, /fr/tag/javascript… URLs with a /fr/ prefix or a singular /tag/ segment, which nginx had never served
  • 16 “Excluded by noindex tag” pages: all redirect pages generated by aliases: in Hugo frontmatter
  • Robots tag: noodp hardcoded in the LoveIt theme
  • FR/EN sitemap: 104 vs 105 URLs — a duplicate FR tag and two missing tags

Act 1: Hugo aliases → nginx 301 redirects

Why Hugo generates noindex pages

Hugo generates aliases: frontmatter entries as static HTML files:

<meta http-equiv="refresh" content="0; url=https://www.arleo.eu/posts/X/">
<meta name="robots" content="noindex">

The noindex is intentional from Hugo’s perspective: it prevents Google from indexing the redirect page instead of the destination. But GSC surfaces it as “Excluded by noindex tag” — polluting the report and signaling to Google that URLs on your site are intentionally hidden.
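To see which generated pages are alias stubs of this kind, the build output can be scanned for files that carry both a meta refresh and a robots noindex. The following is a minimal sketch, not part of the original workflow: the function names are hypothetical, and it assumes the build output lives in a directory such as public/ with one index.html per page.

```python
from html.parser import HTMLParser
from pathlib import Path


class _MetaCollector(HTMLParser):
    """Records the refresh and robots <meta> tags found in a document."""

    def __init__(self):
        super().__init__()
        self.refresh = None  # content of <meta http-equiv="refresh">
        self.robots = None   # content of <meta name="robots">

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = {k.lower(): (v or "") for k, v in attrs}
        if a.get("http-equiv", "").lower() == "refresh":
            self.refresh = a.get("content", "")
        elif a.get("name", "").lower() == "robots":
            self.robots = a.get("content", "")


def is_alias_stub(html: str) -> bool:
    """True if the page has both a meta refresh and a robots noindex."""
    parser = _MetaCollector()
    parser.feed(html)
    return (
        parser.refresh is not None
        and parser.robots is not None
        and "noindex" in parser.robots.lower()
    )


def find_alias_stubs(public_dir: str) -> list[str]:
    """List index.html files under public_dir that look like alias stubs."""
    root = Path(public_dir)
    return sorted(
        f.relative_to(root).as_posix()
        for f in root.rglob("index.html")
        if is_alias_stub(f.read_text(encoding="utf-8", errors="replace"))
    )
```

Calling find_alias_stubs("public") after a build should list the pages that GSC would flag, which gives a baseline to compare against once the aliases are removed.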

Fix: permanent nginx redirects + remove aliases

59 rewrite ... permanent rules in the NUC vhost:

# === SEO Redirects ===
# 1. Old Grav slugs and Hugo aliases
rewrite ^/crowdsec-cloudflare-waf-autoban/?$ /posts/crowdsec-cloudflare-waf-autoban/ permanent;
rewrite ^/en/crowdsec-cloudflare-waf-autoban/?$ /en/posts/crowdsec-cloudflare-waf-autoban/ permanent;
rewrite ^/fr/crowdsec-cloudflare-waf-autoban/?$ /posts/crowdsec-cloudflare-waf-autoban/ permanent;
# ... (56 more specific rules)

# 2. /fr/ prefix — default language, no subdirectory
rewrite ^/fr/tag/([^/]+)/?$  /tags/$1/  permanent;
rewrite ^/fr/(.+)$           /$1        permanent;

# 3. Singular /tag/ → plural /tags/ (Hugo taxonomy)
rewrite ^/tag/([^/]+)/?$     /tags/$1/  permanent;
rewrite ^/en/tag/([^/]+)/?$  /en/tags/$1/  permanent;

# 4. FR tags in EN context
rewrite ^/en/tags/reseau/?$   /en/tags/network/   permanent;
rewrite ^/en/tags/securite/?$ /en/tags/security/  permanent;

Removed aliases: blocks from all 35 affected Markdown files. Hugo rebuilt with 68 aliases (only Hugo’s automatic cross-language aliases remain) instead of 208.

Result: the redirects are now plain nginx 301s, so GSC no longer reports them as noindex pages.


Act 2: noodp robots tag → index, follow

The LoveIt theme injects into baseof.html:

<meta name="robots" content="noodp" />

noodp was meant to stop search engines from using Open Directory Project (DMOZ) titles and descriptions in snippets. DMOZ closed in 2017, so the directive has had no effect for years, but some crawlers may still interpret it as an indexing restriction.

Site-level override in layouts/baseof.html:

<meta name="robots" content="index, follow" />

Hugo looks up templates in the project's layouts/ directory before the theme's, so this file takes priority over the theme template without modifying the theme itself.


Act 3: FR tag normalization and sitemap parity

Diagnosis

sed 's|https://www.arleo.eu||' sitemap_fr.txt | sort > fr_paths.txt
sed 's|https://www.arleo.eu/en||' sitemap_en.txt | sort > en_paths.txt

comm -13 fr_paths.txt en_paths.txt   # In EN, not in FR
# → /tags/network/  /tags/security/  /tags/performance/  /tags/purge/  /categories/security/

comm -23 fr_paths.txt en_paths.txt   # In FR, not in EN
# → /tags/reseau/  /tags/securite/  /tags/sécurité/  /categories/sécurité/

FR had:

  • /tags/securite/ AND /tags/sécurité/ — duplicate unaccented vs accented
  • /tags/reseau/ — unaccented, no réseau counterpart

EN had in addition: /tags/performance/ and /tags/purge/ without FR equivalents.
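The same comparison can be sketched in Python, with one addition: grouping paths by their unaccented form, which surfaces duplicates such as securite/sécurité directly. This is an illustrative sketch only. The helper names are hypothetical, and it assumes each sitemap has already been flattened to a list of URLs, as in sitemap_fr.txt and sitemap_en.txt.

```python
import unicodedata
from collections import defaultdict


def strip_accents(text: str) -> str:
    """Drop combining marks so 'sécurité' and 'securite' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def to_paths(urls, prefix):
    """Strip a site/language prefix, mirroring the sed commands."""
    return {u[len(prefix):] for u in urls if u.startswith(prefix)}


def accent_collisions(paths):
    """Group paths that differ only by accents; keep groups with more than one member."""
    groups = defaultdict(set)
    for p in paths:
        groups[strip_accents(p)].add(p)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


def parity_report(fr_urls, en_urls, site="https://www.arleo.eu"):
    """Return the paths only in EN, the paths only in FR, and FR accent duplicates."""
    fr = to_paths(fr_urls, site)
    en = to_paths(en_urls, site + "/en")
    return sorted(en - fr), sorted(fr - en), accent_collisions(fr)
```

Note that translated slugs (reseau vs network) still show up on both sides of the diff; only the accent grouping distinguishes a real duplicate from a translation.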

Fixes

# 5 FR posts: securite → sécurité
for post in audit-securite-modsecurity-crowdsec csp-nonce hardening-systemd-mcp \
            roadmap-sprint-securite-mcp sprint-securite-mcp-livre; do
  sed -i 's/^- securite$/- sécurité/' "content/posts/$post/index.fr.md"
done

# 2 FR posts: reseau → réseau
for post in crowdsec-cloudflare-waf-autoban post-mortem-522-wan-failover; do
  sed -i 's/^- reseau$/- réseau/' "content/posts/$post/index.fr.md"
done

Added purge and performance to hugo-mcp-plugin-cloudflare/index.fr.md.

Cleaned up stale directories from public/, since Hugo does not delete previously generated files on rebuild by default:

rm -rf public/tags/reseau public/tags/securite

Result: FR 105 = EN 105.


Act 4: /fr/sitemap.xml redirect loop

The general rule rewrite ^/fr/(.+)$ /$1 permanent; was intercepting /fr/sitemap.xml, an actual file generated by Hugo, and redirecting it to /sitemap.xml (the sitemap index). The index in turn lists /fr/sitemap.xml, so crawlers were sent in a circle and never reached the FR URL list.

Detection:

curl -sk https://www.arleo.eu/fr/sitemap.xml -o /dev/null -w '%{http_code} → %{redirect_url}\n'
# 301 → https://www.arleo.eu/sitemap.xml

Fix: break exception before the general rule:

rewrite ^/fr/(sitemap\.xml)$  /fr/$1  break;
rewrite ^/fr/tag/([^/]+)/?$   /tags/$1/ permanent;
rewrite ^/fr/(.+)$            /$1       permanent;

break stops rewrite processing. The request then falls through to the location ~* sitemap\.xml$ block which proxies to the Hugo VM.

nginx gotcha: rewrites in server context execute before location blocks. A break without URI change therefore lets the request continue to the correct location.
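Rule order is easy to get wrong, so it can help to dry-run sample URIs against the rule list before reloading nginx. The sketch below is a simplified model and not nginx itself: it only handles the permanent and break flags, uses Python's re instead of PCRE, and the resolve function and RULES list are hypothetical names that mirror the rules quoted above.

```python
import re

# (pattern, replacement, flag) in vhost order.
RULES = [
    (r"^/fr/(sitemap\.xml)$", r"/fr/\1", "break"),
    (r"^/fr/tag/([^/]+)/?$", r"/tags/\1/", "permanent"),
    (r"^/fr/(.+)$", r"/\1", "permanent"),
    (r"^/tag/([^/]+)/?$", r"/tags/\1/", "permanent"),
    (r"^/en/tag/([^/]+)/?$", r"/en/tags/\1/", "permanent"),
]


def resolve(uri, rules=RULES):
    """Apply the first matching rule.

    Returns ("redirect", target) for a permanent rule, or ("pass", uri)
    when a break rule matches or no rule matches at all.
    """
    for pattern, replacement, flag in rules:
        match = re.match(pattern, uri)
        if not match:
            continue
        target = match.expand(replacement)
        if flag == "permanent":
            return "redirect", target
        if flag == "break":
            return "pass", target
    return "pass", uri
```

With the exception in place, resolve("/fr/sitemap.xml") passes through unchanged, while resolve("/fr/sitemap.xml", RULES[1:]) reproduces the original redirect to /sitemap.xml.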


Summary

Issue                    | Cause                                | Fix
-------------------------+--------------------------------------+------------------------------
404 /fr/tag/*            | Singular + unhandled prefix          | rewrite ^/fr/tag/ + ^/tag/
16 GSC noindex pages     | Hugo aliases: generates noindex HTML | nginx 301 + remove aliases
robots: noodp            | LoveIt baseof.html hardcoded         | Override layouts/baseof.html
FR sitemap 104 vs EN 105 | Duplicate + missing tags             | sed batch + add tags
/fr/sitemap.xml loop     | ^/fr/(.+) too greedy                 | break exception before rule