Hugo SEO: noindex on taxonomies to fix Bing's "too many thin pages" warning

Context
Bing Webmaster Tools was raising a “There are too many pages with insufficient content” recommendation (severity: moderate) — 15 pages flagged, all of the same type:
- https://www.arleo.eu/en/tags/svg/
- https://www.arleo.eu/en/tags/sonarr/
- https://www.arleo.eu/en/categories/incidents/
- etc.
These are Hugo taxonomy pages — tag and category listing pages. They contain no article content of their own, just a list of links. From Bing’s perspective, that’s thin content.
Chosen strategy
Three options were available:
- Block via robots.txt → discouraged: prevents crawling, hides internal links
- Enrich taxonomy page content → too costly for a technical blog
- noindex,follow → the right answer: tell robots not to index the page, but still follow links to articles
The third option is the correct call. The follow directive preserves internal linking: robots continue discovering articles through these pages.
Implementation
Hugo concept: .Kind
Hugo classifies every page by its kind:
| Kind | Example URL |
|---|---|
| home | / |
| page | /posts/my-article/ |
| section | /posts/ |
| taxonomy | /tags/, /categories/ |
| term | /tags/hugo/, /categories/homelab/ |
The pages to handle are .Kind == "taxonomy" and .Kind == "term".
Change: layouts/baseof.html
The site uses a custom baseof.html that already overrides the LoveIt theme. It’s the natural place to centralise robots logic.
Before:
<meta name="robots" content="index, follow" />
After:
{{- if .Params.robots -}}
<meta name="robots" content="{{ .Params.robots }}" />
{{- else if or (eq .Kind "taxonomy") (eq .Kind "term") -}}
<meta name="robots" content="noindex,follow" />
{{- else -}}
<meta name="robots" content="index, follow" />
{{- end -}}
A three-level cascade:
- Front matter robots: → absolute priority for future one-off exceptions
- Taxonomies → automatic noindex,follow
- Everything else → index, follow
One file changed, no duplication, no taxonomy layout overrides needed.
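The precedence of the cascade can be checked in isolation with a small shell model of the same decision order. This is only a sketch for reasoning about the rules: the function name robots_for and its arguments are hypothetical and are not part of Hugo or of the site's code.

```shell
#!/bin/sh
# Illustrative model of the template's decision order (not Hugo code).
# $1 = page kind (home, page, section, taxonomy, term)
# $2 = optional front matter "robots" value (empty when unset)
robots_for() {
    kind=$1
    override=${2:-}
    if [ -n "$override" ]; then
        # Level 1: an explicit front matter value always wins
        printf '%s\n' "$override"
    elif [ "$kind" = "taxonomy" ] || [ "$kind" = "term" ]; then
        # Level 2: taxonomy and term listings are kept out of the index
        printf '%s\n' "noindex,follow"
    else
        # Level 3: default for every other kind
        printf '%s\n' "index, follow"
    fi
}

robots_for term                   # noindex,follow
robots_for page                   # index, follow
robots_for term "index, follow"   # index, follow (front matter override)
```

The last call shows why the front matter check comes first: a single term page can be put back in the index without touching the template.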
Removing taxonomies from the sitemap
Having noindex pages in a sitemap is inconsistent — you’re telling robots not to index a page while explicitly pointing them to it. Best practice is to remove them.
The LoveIt sitemap includes all pages. A minimal override in layouts/sitemap.xml:
{{- $excludedKinds := slice "taxonomy" "term" -}}
{{- range (where .Data.Pages "Section" "!=" "gallery") -}}
{{- if not (in $excludedKinds .Kind) -}}
<url>
<loc>{{- .Permalink -}}</loc>
...
</url>
{{- end -}}
{{- end -}}
Result: the EN sitemap drops from 117 to 40 URLs (77 taxonomy entries removed).
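To reproduce that kind of count on a local build, a small shell helper can split sitemap entries into taxonomy and non-taxonomy URLs. This is a sketch under assumptions: the function name sitemap_counts is hypothetical, and the path in the usage comment assumes Hugo's default public/ output directory.

```shell
#!/bin/sh
# Count sitemap entries, split into taxonomy URLs and everything else.
# $1 = path to a generated sitemap file
sitemap_counts() {
    # grep -o emits one line per <loc>, even when the XML is on a single line
    total=$(grep -o '<loc>[^<]*</loc>' "$1" | wc -l)
    # Taxonomy and term pages live under /tags/ or /categories/
    taxo=$(grep -o '<loc>[^<]*</loc>' "$1" | grep -c -E '/(tags|categories)/')
    # $(( )) also strips the padding some wc implementations add
    echo "total=$((total)) taxonomy=$((taxo)) other=$((total - taxo))"
}

# Usage (assumed path):
# sitemap_counts public/en/sitemap.xml
```

Running it before and after adding the override makes the effect visible: the taxonomy figure should fall to 0 while the other figure stays the same.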
Verification
After rebuild:
# Taxonomy pages → noindex,follow
curl -s https://www.arleo.eu/tags/ | grep -o 'name=robots[^>]*>'
# → name=robots content="noindex,follow">
curl -s https://www.arleo.eu/en/tags/sonarr/ | grep -o 'name=robots[^>]*>'
# → name=robots content="noindex,follow">
# Articles → index, follow
curl -s https://www.arleo.eu/en/posts/debug-seo-404-broken-links/ | grep -o 'name=robots[^>]*>'
# → name=robots content="index, follow">
# Clean sitemap
curl -s https://www.arleo.eu/en/sitemap.xml | grep -o '<loc>[^<]*</loc>' | grep 'tags\|categories' | wc -l
# → 0
Modified files
| File | Role |
|---|---|
| layouts/baseof.html | Conditional robots logic |
| layouts/sitemap.xml | Sitemap override (excludes taxonomy/term) |
robots.txt was not touched. Taxonomy URLs remain accessible and crawlable — only their indexing is disabled.
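The curl checks from the Verification section can also be run against the generated files before deploying. The sketch below assumes the build lands in public/ with one index.html per URL, and that the content value is double-quoted as in the output shown earlier. The helper names robots_of and expect_robots are hypothetical.

```shell
#!/bin/sh
# Print the robots meta value found in a built HTML file.
# $1 = path to a generated page
robots_of() {
    # The name attribute may be quoted or unquoted (minified output);
    # the content value is assumed to be double-quoted.
    grep -o -i '<meta[^>]*name="\{0,1\}robots"\{0,1\}[^>]*>' "$1" |
        sed -n 's/.*content="\([^"]*\)".*/\1/p' | head -n 1
}

# Report whether a file carries the expected robots value.
# $1 = file, $2 = expected content value; returns 1 on mismatch
expect_robots() {
    actual=$(robots_of "$1")
    if [ "$actual" = "$2" ]; then
        echo "ok   $1"
    else
        echo "FAIL $1: expected '$2', got '$actual'"
        return 1
    fi
}

# Usage (assumed paths):
# expect_robots public/en/tags/sonarr/index.html "noindex,follow"
# expect_robots public/en/posts/debug-seo-404-broken-links/index.html "index, follow"
```

Because expect_robots returns a non-zero status on mismatch, it can gate a deploy script so a template regression is caught before it reaches production.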
On the Bing side
The fix is immediate on the technical side, but Bing will take a few days to re-crawl the pages and update the recommendation. To speed things up, go to Bing Webmaster Tools → Recommendations and either validate the fix manually or submit a sitemap recrawl request.