How to Use ImportXML to Scrape for SEO [FREE Google Sheet]

a cute robot in rainbow colours using importxml

Often as SEO’s we need to review content and data points on different URLs. Screaming Frog is an amazing tool for this but when we are working with a list of less than 1,000 and only need a couple of data points I find it way easier to use Google Sheets in built Scraping.

Gimme! Gimme! Gimme! the Google sheet now – Take a copy or check out the formulas now!

What’s XPath?*

XPath (XML Path Language) is a query language used for selecting nodes from an XML (or HTML) document. It allows you to navigate through the elements and attributes of an XML document, enabling you to pinpoint specific parts of the document’s structure.

XPath expressions can be used in various programming languages and tools for tasks such as data extraction, validation, transformation, and traversal of XML documents. It provides a syntax for selecting nodes based on their properties, relationships, and positions within the document hierarchy.

Basically, XPath lets you target a specific HTML element such a h1 or a meta title etc.

You can get the XPath in dev tools by copying the element you want to capture.

ImportXML for SEO Basics

The examples below will all work on the following cute page from the Australian Museum as an example in cell A2:

https://australian.museum/learn/animals/mammals/greater-bilby/

They should work on your website too assuming you have the element on the page. If you have more than one of these elements on a page see my troubleshooting at the bottom.

H1

<h1>Greater Bilby</h1>

XPath = //h1

Xpath to HTML Break down:

//h1 looks for <h1>

Google Sheet Formula	Output
=IMPORTXML(A2,”//h1″)	Greater Bilby

Meta Title / Page Title

<title>Greater Bilby - The Australian Museum</title>

XPath = //title

Xpath to HTML Breakdown:

//title looks for <title>

Google Sheet Formula	Output
=IMPORTXML(A2,”//title”)	Greater Bilby – The Australian Museum

Meta Description

<meta name="description" content="The Greater Bilby, sometimes depicted as Australia's Easter Bunny, belongs to a group of ground-dwelling marsupials known as bandicoots.">

XPath = //meta[@name=’description’]/@content

Xpath to HTML Break down:

//meta looks for <meta
[@name=’description’] looks for name=”description”
/@content looks for content=”The Greater Bilby…..

Formula:

Google Sheet Formula	Output
=IMPORTXML(A2,”//meta[@name=’description’]/@content”)	The Greater Bilby, sometimes depicted as Australia’s Easter Bunny, belongs to a group of ground-dwelling marsupials known as bandicoots.

Canonical

<link rel="canonical" href="https://australian.museum/learn/animals/mammals/greater-bilby/">

XPath = //link[@rel=’canonical’]/@href

Xpath to HTML Break down:

//link looks for <link
[@rel=’canonical’] looks for rel=”canonical”
/@href looks for href=”https://australi…

Formula:

Google Sheet Formula	Output
=IMPORTXML(A2,”//link[@rel=’canonical’]/@href”)	https://australian.museum/learn/animals/mammals/greater-bilby/

H2/H3/H3 etc.

<h2>H2 1</h2>
<p>Some text</p>
<h2>H2 3</h2>
<p>Some text</p>
<h2>H2 3</h2>
<p>Some text</p>

XPath = //h2

Xpath to HTML Break down:

//h2 looks for all <h2>

Formula:

When there are multiple values you’ll need to use transpose to make sure it doesn’t overwrite content in the cell below.

Google Sheet Formula	Output Col 1	Output Col 2	Output Col 3
=transpose(IMPORTXML(A2,”//h2))	H2 1	H2 2	H2 3

Custom ImportXml

Sometimes we want to grab say authors or dates of articles. So we need to use a bit more complex XML. The two below are quite complicated but all you really need to know is you can copy XPath and use it.

Date & Author Example

https://australian.museum/learn/animals/mammals/greater-bilby/

<main id="main">
..
<article>
<div class="page-header__metadata">
		<ul class="metadata">
			<li class="metadata__item metadata__item--author">
				<dl class="metadata__pair">
					<dt class="metadata__label">Author(s)</dt>
					<dd class="metadata__value">Sue Burrell</dd>
				</dl>
			</li>
			<li class="metadata__item">
				<dl class="metadata__pair">
					<dt class="metadata__label">Updated</dt>
					<dd class="metadata__value">21/04/22</dd>
				</dl>
			</li>
			<li class="metadata__item">
				<dl class="metadata__pair">
					<dt class="metadata__label">Read time</dt>
					<dd class="metadata__value">2 minutes</dd>
				</dl>
			</li>
		</ul>
	</div>
..
</article>
</main>

Date

Date XPath = //*[@id=’main’]/article/header/div/div/div/div/ul/li[2]/dl/dd”

Formula:

Google Sheet Formula	Output
=importxml(A2,”//*[@id=’main’]/article/header/div/div/div/div/ul/li[2]/dl/dd”)	28/04/22

Author

Author XPath = //*[@id=’main’]/article/header/div/div/div/div/ul/li[contains(@class, ‘metadata__item metadata__item–author’)]/dl/dd

Formula:

Google Sheet Formula	Output
=importxml(A2,”//*[@id=’main’]/article/header/div/div/div/div/ul/li[contains(@class, ‘metadata__item metadata__item–author’)]/dl/dd”)	Sue Burrell

Troubleshooting

I have more than one value for the element and it is overwriting the one below.

You can use the =transpose(importXML(A2, “XPATH”) to cater for this. Transpose will instead add the values in the next column instead of row.

My value isn’t working and I copied straight from dev tools?

Watch out for the different quotation marks.

Xpath in Chrome Dev Tools uses ” ” around attributes

Google Sheets uses ‘ ‘ around attributes

If you have any questions feel free to reach out through email or twitter or comment below!

Posted

February 28, 2024

SEO Automation

Sal

Tags:

Using ImportXML in Sheets for SEO & Scraping

On this page

What’s XPath?*

ImportXML for SEO Basics

H1

Xpath to HTML Break down:

Meta Title / Page Title

Xpath to HTML Breakdown:

Meta Description

Xpath to HTML Break down:

Formula:

Canonical

Xpath to HTML Break down:

Formula:

H2/H3/H3 etc.

Xpath to HTML Break down:

Formula:

Custom ImportXml

Date & Author Example

Date

Formula:

Author

Formula:

Troubleshooting

I have more than one value for the element and it is overwriting the one below.

My value isn’t working and I copied straight from dev tools?

Comments

Leave a Reply Cancel reply