Google uses a layered system of directives to decide whether to index a page. These signals operate at different levels — HTTP headers, HTML meta tags, sitemaps, and canonical tags — and when they conflict, there's a clear priority order. Getting these wrong means pages that should be indexed aren't, or pages you're trying to suppress stay in the index.
How indexing directives work
The hierarchy of indexing controls, from access level to page level:
- robots.txt: Controls crawl access. Can prevent Googlebot from visiting a URL, but does not prevent indexing. A blocked page can still appear in search results if Google discovers it from links or sitemaps elsewhere. robots.txt blocking is not noindex.
- X-Robots-Tag (HTTP header): A page-level noindex directive delivered in the server response headers. Useful for non-HTML resources (PDFs, images) that don't have a <head>. Invisible in view-source; you need to inspect response headers to see it.
- meta robots tag: A page-level noindex directive in the HTML <head>. The most common method. Visible in view-source. Must be present in raw HTML to be reliable: injecting it via JavaScript creates a Wave 1 vs Wave 2 timing risk.
- Canonical tags: Preference signals for URL consolidation, not indexing directives. They don't prevent indexation. If a page has both noindex and a canonical, noindex wins and the canonical is ignored.
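To make the robots.txt point concrete, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the domain, and the `can_crawl` helper are illustrative assumptions, not part of any real site. Note what the check answers: only whether a crawler may fetch the URL. It says nothing about whether the URL can end up in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path and domain are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt allows the user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Crawl access only: a blocked URL is not thereby noindexed.
blocked = not can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/private/report.html")
allowed = can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/public/page.html")
```

A blocked URL here is simply one Googlebot will not fetch. Because the page is never fetched, any noindex on it is never read either.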
Key principle: noindex always beats canonical. If a page carries both a noindex directive and a canonical tag, noindex takes priority. The canonical consolidation does not happen. This trips up a lot of teams during migration work — they add noindex to staging pages that have canonicals configured, then wonder why the canonical isn't working.
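The priority order can be sketched as a small decision function. This is a simplified model of the rules as described in this article, not Google's actual implementation. The `PageSignals` class, its field names, and the outcome labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSignals:
    """Signals observed for one URL (field names are illustrative)."""
    blocked_by_robots_txt: bool = False
    header_noindex: bool = False   # X-Robots-Tag: noindex in the response
    meta_noindex: bool = False     # <meta name="robots" content="noindex"> in raw HTML
    canonical_url: Optional[str] = None


def expected_outcome(signals: PageSignals) -> str:
    """Apply the priority order described above and return a short label."""
    if signals.blocked_by_robots_txt:
        # Crawling is blocked, so a noindex or canonical on the page is never read.
        return "uncrawled-may-still-be-indexed"
    if signals.header_noindex or signals.meta_noindex:
        # noindex beats canonical: no consolidation happens.
        return "noindex-canonical-ignored"
    if signals.canonical_url is not None:
        return "indexable-canonical-hint"
    return "indexable"


# The migration scenario: a staging page with both noindex and a canonical.
staging = PageSignals(meta_noindex=True, canonical_url="https://example.com/live-page")
outcome = expected_outcome(staging)
```

In the staging example, `outcome` is the noindex label: the canonical is present but has no effect.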
Standard noindex — HTML meta tag:
<head>
<meta name="robots" content="noindex">
<!-- rest of head -->
</head>
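To audit whether a page carries this tag in its raw HTML, one option is a small parser built on Python's standard `html.parser` module. The sketch below is an assumption about how such a check might look. The class and function names are made up for illustration. It treats `none` as implying noindex, since `none` is shorthand for `noindex, nofollow`.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def has_meta_noindex(raw_html: str) -> bool:
    """Return True if the raw HTML contains a meta robots noindex directive."""
    parser = MetaRobotsParser()
    parser.feed(raw_html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives
```

Feed it the unrendered server response, not the DOM after JavaScript has run. Otherwise it cannot tell a static tag from an injected one.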
Live examples
Each page below is a real, crawlable demo. The directives are live — Googlebot encounters exactly what's described.
meta robots noindex
The standard HTML noindex directive in the <head>. Visible in view-source. Googlebot reads it on Wave 1. The most reliable method.
X-Robots-Tag (HTTP Header)
noindex via HTTP response header. No meta tag in the HTML. Invisible in view-source — only visible in response headers.
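Because the header never appears in the page source, checking for it means reading the response headers. The following sketch assumes the headers have already been collected as a list of `(name, value)` pairs by whatever HTTP client is in use. The function name and the handling of crawler-scoped values such as `googlebot: noindex` are illustrative simplifications.

```python
# Directives that carry their own "name: value" form, so a colon after them
# does not indicate a user-agent prefix.
VALUE_DIRECTIVES = {"unavailable_after", "max-snippet", "max-image-preview", "max-video-preview"}


def header_noindex(headers, user_agent="googlebot"):
    """Return True if any X-Robots-Tag header applies noindex to user_agent."""
    for name, value in headers:
        if name.lower() != "x-robots-tag":
            continue
        value = value.strip().lower()
        prefix, sep, rest = value.partition(":")
        prefix = prefix.strip()
        # A single leading token followed by ":" scopes the rule to one crawler.
        scoped = (
            bool(sep)
            and "," not in prefix
            and " " not in prefix
            and prefix not in VALUE_DIRECTIVES
        )
        if scoped:
            if prefix != user_agent:
                continue  # rule targets a different crawler
            value = rest
        tokens = {token.strip() for token in value.split(",")}
        if "noindex" in tokens or "none" in tokens:
            return True
    return False
```

For example, `header_noindex([("X-Robots-Tag", "noindex")])` returns `True`, while a header scoped to a different crawler does not count against Googlebot.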
🔴 Issue: unreliable directive
noindex via JavaScript
The noindex meta tag is added by JavaScript. Not in raw HTML. Wave 1 sees no directive — the page may be indexed before Wave 2 processes it.
⚠️ Conflicting signals
noindex Page in Sitemap
noindex in HTML but the URL is also listed in the sitemap. Contradictory signals. noindex wins — but the sitemap entry wastes crawl budget.
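A simple audit for this conflict is to cross-reference the sitemap against the set of URLs known to carry noindex. The sketch below uses Python's standard `xml.etree.ElementTree`. The sitemap content and URLs are made-up examples, and the set of noindex URLs is assumed to come from a separate crawl.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap; URLs are illustrative only.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/thank-you</loc></url>
</urlset>"""


def sitemap_urls(sitemap_xml: str) -> list:
    """Return every <loc> value listed in a urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def noindex_urls_in_sitemap(sitemap_xml: str, noindex_urls: set) -> list:
    """Return sitemap URLs that are also marked noindex (contradictory signals)."""
    return [url for url in sitemap_urls(sitemap_xml) if url in noindex_urls]


conflicts = noindex_urls_in_sitemap(SITEMAP, {"https://example.com/thank-you"})
```

Any URL in `conflicts` should either lose its noindex or be dropped from the sitemap, depending on which signal reflects the real intent.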
⚠️ Conflicting signals
noindex + Canonical Conflict
Both noindex and a canonical tag are present. noindex wins — the canonical consolidation doesn't happen. A common misunderstanding during migrations.