Google uses a layered system of directives to decide whether to index a page. These signals operate at different levels — HTTP headers, HTML meta tags, sitemaps, and canonical tags — and when they conflict, there's a clear priority order. Getting these wrong means pages that should be indexed aren't, or pages you're trying to suppress stay in the index.

How indexing directives work

The hierarchy of indexing controls, from access level to page level:

  1. robots.txt — Controls crawl access. Can prevent Googlebot from visiting a URL, but does not prevent indexing. A blocked page can still appear in search results if Google discovers it from links or sitemaps elsewhere. robots.txt blocking is not noindex.
  2. X-Robots-Tag (HTTP header) — A page-level noindex directive delivered in the server response headers. Useful for non-HTML resources (PDFs, images) that don't have a <head>. Invisible in view-source — you need to inspect response headers to see it.
  3. meta robots tag — A page-level noindex directive in the HTML <head>. The most common method. Visible in view-source. Must be present in the raw HTML to be reliable: if it is injected via JavaScript, Google only sees it after rendering (Wave 2), which can lag behind the initial HTML crawl (Wave 1), so the page may be indexed in the meantime.
  4. Canonical tags — Preference signals for URL consolidation, not indexing directives. They don't prevent indexation. If a page has both noindex and a canonical, noindex wins — the canonical is ignored.
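
To make the interaction between these four layers concrete, here is a minimal Python sketch that inspects a page's robots.txt rules, response headers, and raw HTML, then reports which signals a crawler could actually act on. The function name audit_directives, the helper class _MetaRobotsParser, and the result keys are hypothetical, not part of any Google tooling. It is simplified on purpose: it ignores user-agent-scoped header values such as "googlebot: noindex", and the canonical_considered flag simply encodes the rule stated above (noindex wins over canonical).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _MetaRobotsParser(HTMLParser):
    """Collects <meta name="robots"> tokens and the <link rel="canonical"> href."""

    def __init__(self):
        super().__init__()
        self.robots_tokens = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                if token.strip():
                    self.robots_tokens.add(token.strip().lower())
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href") or None


def audit_directives(url, robots_txt, headers, html, user_agent="Googlebot"):
    """Report which indexing signals are present and which a crawler can see."""
    # Layer 1: robots.txt controls crawl access only.
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    crawlable = rules.can_fetch(user_agent, url)

    # Layer 2: X-Robots-Tag response header (header names are case-insensitive).
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_tokens = {t.strip().lower() for t in header_value.split(",") if t.strip()}

    # Layers 3 and 4: meta robots and canonical, read from the raw HTML only.
    parser = _MetaRobotsParser()
    parser.feed(html)

    all_tokens = header_tokens | parser.robots_tokens
    # "none" is shorthand for "noindex, nofollow".
    noindex_present = "noindex" in all_tokens or "none" in all_tokens
    # A crawler that is blocked by robots.txt never fetches the page,
    # so it never sees a noindex delivered by header or meta tag.
    noindex_visible = crawlable and noindex_present

    return {
        "crawlable": crawlable,
        "noindex_present": noindex_present,
        "noindex_visible": noindex_visible,
        "canonical": parser.canonical,
        # Assumption taken from the text above: noindex overrides canonical.
        "canonical_considered": parser.canonical is not None and not noindex_visible,
    }
```

A result with noindex_present set to True but noindex_visible set to False is the case worth flagging: the page carries a noindex, but robots.txt stops the crawler from ever reading it.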

Key principle: noindex always beats canonical. If a page carries both a noindex directive and a canonical tag, noindex takes priority and the canonical consolidation does not happen. This trips up a lot of teams during migration work: they add noindex to staging pages that have canonicals configured, then wonder why the canonical isn't working.

Standard noindex — HTML meta tag:

<head>
  <meta name="robots" content="noindex">
  <!-- rest of head -->
</head>
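
For resources without a <head>, the same directive travels as an X-Robots-Tag response header. How that header gets set depends entirely on your server or CDN. As one illustration, the sketch below uses Python's built-in http.server. The extension list NOINDEX_EXTENSIONS, the helper robots_headers_for, the handler class, and the port are all hypothetical choices for the example, not a recommended production setup.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical policy: keep these non-HTML file types out of the index.
NOINDEX_EXTENSIONS = (".pdf", ".png", ".jpg")


def robots_headers_for(path):
    """Return the extra response headers for a request path."""
    # Drop any query string before checking the file extension.
    clean_path = urlsplit(path).path.lower()
    if clean_path.endswith(NOINDEX_EXTENSIONS):
        return {"X-Robots-Tag": "noindex"}
    return {}


class NoindexHeaderHandler(BaseHTTPRequestHandler):
    """Toy handler that serves a placeholder body with the chosen headers."""

    def do_GET(self):
        body = b"placeholder body"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        for name, value in robots_headers_for(self.path).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local experimentation.
    HTTPServer(("127.0.0.1", 8000), NoindexHeaderHandler).serve_forever()
```

With the server running, a request such as curl -I http://127.0.0.1:8000/report.pdf should show the X-Robots-Tag line among the response headers, which is also how you would confirm the header on a real site, since it never appears in view-source.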

Live examples

Each page below is a real, crawlable demo. The directives are live — Googlebot encounters exactly what's described.

Google resources