⚠️ Conflicting signals — noindex wins, sitemap entry is noise

This page has <meta name="robots" content="noindex"> in its HTML. It is also listed in technical-seo-examples-sitemap.xml. These signals contradict each other. Google resolves this by honouring the noindex directive — the sitemap entry encourages crawling but doesn't override the noindex.

What this demonstrates

This page carries two contradictory signals: a <meta name="robots" content="noindex"> tag in the HTML, and a <loc> entry for this URL in technical-seo-examples-sitemap.xml. The sitemap entry suggests that Google should crawl and index this URL. The noindex directive tells Google not to index it. Both cannot be satisfied.

Google resolves the conflict by honouring the noindex directive, so the page is not indexed. The sitemap entry cannot override noindex. It only encourages crawling, and Google has to crawl the page anyway before it can read the noindex tag.

Why it matters

This conflict is extremely common on large sites. It happens most often after migrations, or when noindex is added to pages that remain in the sitemap. The CMS component that generates the sitemap and the one that manages noindex tags are often separate, so the two are not always in sync.

The outcome: the page stays out of the index (noindex wins), but the sitemap entry wastes crawl budget. Google reads the sitemap, treats the entry as a hint that the URL matters, fetches the page, then finds the noindex and excludes it. That crawl slot could have been used on an indexable page.

At scale — thousands of noindexed pages still in the sitemap — this creates meaningful crawl budget drag, particularly on large or frequently updated sites. Google's crawl resources are finite. Sitemap entries should only include URLs you actually want indexed.

The code

The conflicting signals on this page — noindex in HTML, URL in sitemap.

<!-- In the HTML <head>: noindex directive -->
<meta name="robots" content="noindex">

<!-- In technical-seo-examples-sitemap.xml: URL is listed
     (intentional: this conflict IS the demo) -->
<url>
  <loc>https://sallymills.com/indexing/noindex-in-sitemap/</loc>
</url>

<!-- Correct: remove noindexed pages from the sitemap.
     Sitemaps should only contain URLs you want indexed. -->

What Google does

  1. Google discovers this URL in the sitemap and queues it for crawling.
  2. Googlebot crawls this page and reads the HTML response.
  3. In the <head>, it finds <meta name="robots" content="noindex">.
  4. The noindex directive takes precedence over the sitemap entry. The page is excluded from the index.
  5. The sitemap entry did not help — it only caused a crawl that was ultimately wasted.
  6. In Google Search Console, this page appears under "Excluded by 'noindex' tag" in the Indexing report. It is not reported as indexed, even though it is in the sitemap.

How to detect it

  • view-source: Press Ctrl+U (Windows) / Cmd+U (Mac) and search for robots in the HTML. You'll find the noindex tag. Then open technical-seo-examples-sitemap.xml and search for this URL. You'll find it listed there. Both signals are visible; the conflict is clear.
  • curl: Open Command Prompt (Windows) or Terminal (Mac) and run these two commands:
      curl -L https://sallymills.com/indexing/noindex-in-sitemap/ | grep 'noindex'
      curl -L https://sallymills.com/technical-seo-examples-sitemap.xml | grep 'noindex-in-sitemap'
    The output of the first includes the meta robots tag (along with any other line of the page that mentions noindex); the second returns the sitemap entry. Together they confirm the conflict. (Windows: replace | grep with | findstr, and put the search term in double quotes rather than single quotes.)
  • Google Search Console: This URL appears in the Indexing report under "Excluded by 'noindex' tag". It is not indexed, despite being in the sitemap. GSC may also surface this as a discrepancy in the Sitemaps section if it has crawled this URL and found the conflict.
  • Screaming Frog: Crawl the sitemap, then check each URL in the Directives tab, where the Meta Robots column shows "noindex". Any sitemap URL showing noindex in that column has this conflict. Filter by "noindex" in the Meta Robots column after crawling the sitemap to find all instances at scale.
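
The same check can be scripted. The following is a minimal Python sketch, not a definitive tool: it reads the <loc> entries from a sitemap and flags every URL whose HTML contains a robots meta tag with noindex. The function names (has_noindex, sitemap_urls, find_conflicts) and the fetch_html callable are hypothetical, introduced only for this illustration. The sketch assumes a plain urlset sitemap (not a sitemap index) and checks only the meta tag, not an X-Robots-Tag HTTP header.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace used by standard sitemaps (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            content = attr.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def has_noindex(html):
    """Return True if the HTML carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


def sitemap_urls(sitemap_xml):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag in ("loc", SITEMAP_NS + "loc") and el.text
    ]


def find_conflicts(sitemap_xml, fetch_html):
    """List sitemap URLs whose page is noindexed.

    fetch_html is any callable that takes a URL and returns its HTML
    as a string (an assumption of this sketch, so it can be swapped
    for a real HTTP client or a dictionary of saved pages).
    """
    return [url for url in sitemap_urls(sitemap_xml) if has_noindex(fetch_html(url))]
```

To run it against a live site, fetch_html could be a thin wrapper around urllib.request.urlopen that decodes the response body. Passing a lookup into a dictionary of saved pages works just as well for testing.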

How to fix it

Remove noindexed pages from the sitemap. Sitemaps are a crawl priority signal — they should only contain URLs you want Google to index. After any migration or change that adds noindex tags to pages, run an audit to identify any noindexed URLs still listed in the sitemap and remove them from the sitemap file.

The most reliable way to catch this at scale: crawl your sitemap with Screaming Frog, then filter for any URLs where the Meta Robots column shows "noindex". Those URLs are your conflict list. Either remove the noindex (if the pages should be indexed) or remove the sitemap entry (if the pages should stay noindexed).
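
If the sitemap is a static file rather than something your CMS regenerates, the removal step can be scripted too. This is a minimal Python sketch under stated assumptions: drop_urls is a hypothetical helper name, the input is a standard urlset sitemap, and the list of URLs to remove is the conflict list you have already produced (for example, from the Screaming Frog filter above). Entries from other XML namespaces, such as image extensions, may be serialised with generated prefixes.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (sitemaps.org protocol).
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Serialise the sitemap namespace as the default one (no "ns0:" prefixes).
ET.register_namespace("", NS)


def drop_urls(sitemap_xml, urls_to_drop):
    """Return the sitemap XML with the given URLs' <url> entries removed."""
    root = ET.fromstring(sitemap_xml)
    drop = set(urls_to_drop)
    # Copy the list first, because entries are removed while looping.
    for url_el in list(root.findall(f"{{{NS}}}url")):
        loc = url_el.find(f"{{{NS}}}loc")
        if loc is not None and loc.text and loc.text.strip() in drop:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")
```

The function returns a string rather than writing a file, so the result can be reviewed or diffed against the old sitemap before it is published.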