This page's canonical points to /canonicals/robots-blocked-destination/ — a URL that is disallowed in robots.txt. The destination page exists but Google cannot crawl it. Check https://sallymills.com/robots.txt to see the live disallow rule.
What this demonstrates
The canonical tag on this page points to a URL that has been blocked from crawling by a Disallow rule in robots.txt. The destination page exists and returns a 200 — but Googlebot is instructed not to access it, so it can't read the page to verify it as a valid canonical destination.
Why it matters
This is one of the subtler canonical failures. The canonical looks correct — it's an absolute URL, the destination exists, there's no 404. The problem is invisible unless you cross-reference the canonical destination against robots.txt.
If Google can't access the canonical destination, it can't confirm it as the preferred version, so consolidation may not happen. This trap appears most often when a site blocks staging or parameter-based variants in robots.txt and someone then sets those blocked URLs as canonical destinations, or when a URL already in use as a canonical destination is later disallowed.
The code
The canonical tag on this page — and the robots.txt rule that blocks its destination.
<!-- Canonical tag on this page -->
<link rel="canonical" href="https://sallymills.com/canonicals/robots-blocked-destination/">
# robots.txt — the destination is disallowed
Disallow: /canonicals/robots-blocked-destination/
What Google does
- Googlebot crawls this page and reads the canonical pointing to /canonicals/robots-blocked-destination/.
- Googlebot checks robots.txt before attempting to crawl the destination.
- The Disallow rule prevents Googlebot from accessing the destination.
- Google cannot verify the destination as a valid preferred URL.
- The canonical hint is effectively ignored. This page is likely treated as self-canonical.
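The robots.txt check in the steps above can be approximated locally. This is a minimal sketch using Python's standard urllib.robotparser, not a reproduction of Googlebot's own matcher, and its handling of some rules (for example wildcard patterns) may differ from Google's. The "User-agent: *" group header and the helper name can_crawl are assumptions; this page shows only the Disallow line.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: the page shows only the Disallow line, so the
# "User-agent: *" group header here is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /canonicals/robots-blocked-destination/
"""


def can_crawl(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


canonical = "https://sallymills.com/canonicals/robots-blocked-destination/"
if not can_crawl(ROBOTS_TXT, canonical):
    print("Canonical destination is blocked by robots.txt")
```

To run the same check against the live file, pass the text fetched from https://sallymills.com/robots.txt in place of ROBOTS_TXT.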
How to detect it
- view-source: Ctrl+U (Windows) / Cmd+U (Mac) → search for canonical → copy the destination URL. Then open https://sallymills.com/robots.txt and search for that path.
- curl: Open Command Prompt (Windows) or Terminal (Mac) and run:
  curl https://sallymills.com/robots.txt | grep robots-blocked-destination
  This returns the Disallow rule. Then run:
  curl -I https://sallymills.com/canonicals/robots-blocked-destination/
  The destination returns a 200 (it exists, it's just blocked to crawlers). The -I flag fetches headers only. (Windows: replace | grep robots-blocked-destination with | findstr robots-blocked-destination.)
- Google Search Console: The destination URL may appear in the Page indexing report (formerly Coverage) under "Blocked by robots.txt". This page may appear under "Crawled - currently not indexed" since its canonical is inaccessible.
- Screaming Frog: Canonicals tab → copy the canonical destination URL → crawl it separately with robots.txt checking enabled → it will show as "Blocked by robots.txt". Or run a robots.txt check on the destination directly from the Robots tab.
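The first step of the manual checks, finding the canonical URL in the page source, can also be scripted. The sketch below uses Python's standard html.parser and assumes the page HTML has already been saved or fetched. The names CanonicalFinder and find_canonicals are illustrative, and PAGE_HTML is a trimmed stand-in for this page's markup.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens, so split before matching.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html: str) -> list[str]:
    """Return all canonical URLs declared in the given HTML text."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Trimmed stand-in for this page's markup (an assumption, not the full source).
PAGE_HTML = (
    "<html><head>"
    '<link rel="canonical" '
    'href="https://sallymills.com/canonicals/robots-blocked-destination/">'
    "</head><body></body></html>"
)

print(find_canonicals(PAGE_HTML))
```

The function returns a list rather than a single value so that a page declaring more than one canonical is visible in the output.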
How to fix it
Ensure canonical destinations are always crawlable. Never disallow a URL in robots.txt that you're using as a canonical destination — and never set a robots.txt-blocked URL as a canonical target. Audit your canonical destinations against your robots.txt rules, especially after migrations or when robots.txt changes are made.
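One way to run such an audit is to check every page's canonical destination against the current robots.txt rules and list the ones that are disallowed. The following is a rough sketch under stated assumptions: the example.com URLs, the /staging/ rule, and the function name audit_canonicals are all hypothetical, and urllib.robotparser may not interpret every rule the way Googlebot does.

```python
from urllib.robotparser import RobotFileParser


def audit_canonicals(robots_txt, canonical_map, user_agent="Googlebot"):
    """Return (page, canonical) pairs whose canonical destination is disallowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (page, canonical)
        for page, canonical in canonical_map.items()
        if not parser.can_fetch(user_agent, canonical)
    ]


# Hypothetical robots.txt for a single host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /canonicals/robots-blocked-destination/
Disallow: /staging/
"""

# Hypothetical crawl export: page URL -> canonical destination.
CANONICAL_MAP = {
    "https://example.com/page-a/": "https://example.com/canonicals/robots-blocked-destination/",
    "https://example.com/page-b/": "https://example.com/page-b/",
    "https://example.com/page-c/": "https://example.com/staging/page-c/",
}

for page, canonical in audit_canonicals(ROBOTS_TXT, CANONICAL_MAP):
    print(f"{page} -> {canonical} is blocked by robots.txt")
```

Because robots.txt applies per host, a real audit would load a separate robots.txt for each host that appears among the canonical destinations.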