Hugo SEO: noindex on taxonomies to fix Bing's "too many thin pages" warning

Context

Bing Webmaster Tools was raising a “There are too many pages with insufficient content” recommendation (severity: moderate) — 15 pages flagged, all of the same type:

  • https://www.arleo.eu/en/tags/svg/
  • https://www.arleo.eu/en/tags/sonarr/
  • https://www.arleo.eu/en/categories/incidents/
  • etc.

These are Hugo taxonomy pages — tag and category listing pages. They contain no article content of their own, just a list of links. From Bing’s perspective, that’s thin content.


Chosen strategy

Three options were available:

  1. Block via robots.txt → discouraged: robots can no longer crawl the page, so they never follow its links to articles (nor see any noindex directive on it)
  2. Enrich taxonomy page content → too costly for a technical blog
  3. noindex,follow → the right answer: tell robots not to index the page, but still follow links to articles

Option 3 is the right call: the follow part preserves internal linking, so robots keep discovering articles through these pages.

flowchart TD
    A[Hugo page request] --> B{.Params.robots set?}
    B -- yes --> C[Use front matter value]
    B -- no --> D{.Kind = taxonomy or term?}
    D -- yes --> E[noindex,follow]
    D -- no --> F[index, follow]
    C --> G[meta name=robots]
    E --> G
    F --> G

Implementation

Hugo concept: .Kind

Hugo classifies every page by its kind:

Kind       Example URL
home       /
page       /posts/my-article/
section    /posts/
taxonomy   /tags/, /categories/
term       /tags/hugo/, /categories/homelab/

The pages to handle are .Kind == "taxonomy" and .Kind == "term".

Change: layouts/baseof.html

The site uses a custom baseof.html that already overrides the LoveIt theme. It’s the natural place to centralise robots logic.

Before:

<meta name="robots" content="index, follow" />

After:

{{- /* robots cascade: front matter value, then taxonomy/term default, then site default */ -}}
{{- if .Params.robots -}}
<meta name="robots" content="{{ .Params.robots }}" />
{{- else if or (eq .Kind "taxonomy") (eq .Kind "term") -}}
<meta name="robots" content="noindex,follow" />
{{- else -}}
<meta name="robots" content="index, follow" />
{{- end -}}

A three-level cascade:

  1. Front matter robots: value → absolute priority, reserved for future one-off exceptions
  2. Taxonomy and term pages → automatic noindex,follow
  3. Everything else → index, follow

One file changed, no duplication, no taxonomy layout overrides needed.
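Since Hugo templates use Go's template syntax, the cascade can be checked outside Hugo. The sketch below runs the same if / else if / else block through Go's standard text/template package. The page struct is a hypothetical stand-in for Hugo's page object that models only Kind and Params; Hugo's own template functions and HTML escaping are not reproduced.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// page is a hypothetical stand-in for Hugo's page object:
// only the two fields read by the robots cascade are modelled.
type page struct {
	Kind   string
	Params map[string]any
}

// robotsTmpl holds the same cascade as layouts/baseof.html.
const robotsTmpl = `{{- if .Params.robots -}}
<meta name="robots" content="{{ .Params.robots }}" />
{{- else if or (eq .Kind "taxonomy") (eq .Kind "term") -}}
<meta name="robots" content="noindex,follow" />
{{- else -}}
<meta name="robots" content="index, follow" />
{{- end -}}`

var tmpl = template.Must(template.New("robots").Parse(robotsTmpl))

// renderRobots executes the cascade for one page and returns the meta tag.
// A missing "robots" key evaluates as false, so the cascade falls through.
func renderRobots(p page) (string, error) {
	var sb strings.Builder
	if err := tmpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	pages := []page{
		{Kind: "taxonomy", Params: map[string]any{}},                          // e.g. /tags/
		{Kind: "term", Params: map[string]any{}},                              // e.g. /tags/hugo/
		{Kind: "page", Params: map[string]any{}},                              // a regular article
		{Kind: "page", Params: map[string]any{"robots": "noindex, nofollow"}}, // front matter override
	}
	for _, p := range pages {
		out, err := renderRobots(p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-8s %s\n", p.Kind, out)
	}
}
```

The taxonomy and term pages render noindex,follow, the plain article renders index, follow, and the last page shows the front matter value taking priority over everything else.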

Removing taxonomies from the sitemap

Having noindex pages in a sitemap is inconsistent — you’re telling robots not to index a page while explicitly pointing them to it. Best practice is to remove them.

The LoveIt sitemap includes all pages. A minimal override in layouts/sitemap.xml:

{{- $excludedKinds := slice "taxonomy" "term" -}}
{{- range (where .Data.Pages "Section" "!=" "gallery") -}}
    {{- if not (in $excludedKinds .Kind) -}}
    <url>
        <loc>{{- .Permalink -}}</loc>
        ...
    </url>
    {{- end -}}
{{- end -}}

Result: the EN sitemap drops from 117 to 40 URLs (77 taxonomy entries removed).


Verification

After rebuild:

# Taxonomy pages → noindex,follow
curl -s https://www.arleo.eu/tags/ | grep -o 'name=robots[^>]*>'
# → name=robots content="noindex,follow">

curl -s https://www.arleo.eu/en/tags/sonarr/ | grep -o 'name=robots[^>]*>'
# → name=robots content="noindex,follow">

# Articles → index, follow
curl -s https://www.arleo.eu/en/posts/debug-seo-404-broken-links/ | grep -o 'name=robots[^>]*>'
# → name=robots content="index, follow">

# Clean sitemap
curl -s https://www.arleo.eu/en/sitemap.xml | grep -o '<loc>[^<]*</loc>' | grep -E 'tags|categories' | wc -l
# → 0
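The same checks can be run offline against the build output, for example in CI before deploying. A minimal sketch, assuming the sitemap and a page's HTML have already been read into strings (loading them from Hugo's public/ directory is left out): it parses the sitemap with encoding/xml and extracts the robots value with a regular expression that accepts both quoted and minified, unquoted attributes. taxonomyLocs and robotsContent are hypothetical helper names, and the regex assumes name comes before content, as in the template above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"regexp"
	"strings"
)

// urlSet models the <urlset><url><loc> structure of a sitemap.
// Untagged namespaces match, so the sitemaps.org xmlns is accepted.
type urlSet struct {
	URLs []struct {
		Loc string `xml:"loc"`
	} `xml:"url"`
}

// taxonomyLocs returns sitemap URLs that still point at tag or category pages.
func taxonomyLocs(sitemap string) ([]string, error) {
	var set urlSet
	if err := xml.Unmarshal([]byte(sitemap), &set); err != nil {
		return nil, err
	}
	var hits []string
	for _, u := range set.URLs {
		if strings.Contains(u.Loc, "/tags/") || strings.Contains(u.Loc, "/categories/") {
			hits = append(hits, u.Loc)
		}
	}
	return hits, nil
}

// robotsRe matches name=robots with or without quotes (minifiers may drop them)
// and captures the content value, quoted (group 1) or unquoted (group 2).
var robotsRe = regexp.MustCompile(`name="?robots"?\s+content=(?:"([^"]*)"|([^\s">]+))`)

// robotsContent returns the robots directive of an HTML page, or "" if none.
func robotsContent(html string) string {
	m := robotsRe.FindStringSubmatch(html)
	if m == nil {
		return ""
	}
	if m[1] != "" {
		return m[1]
	}
	return m[2]
}

func main() {
	sitemap := `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/en/posts/my-article/</loc></url>
</urlset>`
	hits, err := taxonomyLocs(sitemap)
	if err != nil {
		panic(err)
	}
	fmt.Println("taxonomy URLs in sitemap:", len(hits))
	fmt.Println(robotsContent(`<meta name=robots content="noindex,follow">`))
}
```

With the sample data, main reports 0 taxonomy URLs and prints noindex,follow for the minified tag; a non-empty hits slice is the signal to fail the build.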

Modified files

File                  Role
layouts/baseof.html   Conditional robots logic
layouts/sitemap.xml   Sitemap override (excludes taxonomy and term pages)

robots.txt was not touched. Taxonomy URLs remain accessible and crawlable — only their indexing is disabled.


On the Bing side

The fix is immediate on the technical side, but Bing will take a few days to re-crawl the pages and update the recommendation. To speed things up, go to Bing Webmaster Tools → Recommendations and validate the fix manually, or resubmit the sitemap to request a recrawl.