FuzzyMatch for Automated Redirect Mapping [FREE Colab Script]


Howdy folks, below you’ll find a script to help you match old 404 URLs with new URLs! This means that rather than playing matchy with URLs you can click play and let Fuzzy Match do the work for you.

Gimme! Gimme! Gimme! the script now – Take a copy or use the script now!

This script will output a CSV with your old URLs matched to your new ones, plus a similarity score between 0 and 1 for each match.

Thank you to these geniuses

As with most of my scripts, I have frankensteined other people’s work. So a massive shout-out to these legends!

Understanding Fuzzy Matching

A massive thank you to Lazarina Stoy, your article on Fuzzy matching for SEO is amazing! And your blog is amazing – you are a genius.

If you want to learn about fuzzy matching, please read her article and honestly everything else on her site.

Twitter: lazarinastoy

Fuzzy Matching in Python

I started using fuzzy matching thanks to Antoine Eripret. This script is very much his, with some minor changes so that less technical SEOs can use it easily.

Thank you Antoine!

Also, check out the second part of his article where he uses Google Search to decide where a URL should be sent to. Very cool.

Twitter: antoineripret

Example Excel Workbook

For this script to work you need to upload an xlsx file with a specific layout.

  1. Create an xlsx workbook with 2 worksheets.
  2. Name the 1st ‘old’ and put your old 404 URLs in the first column, under the header ‘URL’.
  3. Name the 2nd ‘new’ and put your new 200 URLs in the first column, under the header ‘URL’.

Download an example!
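
If you would rather build a test workbook in code, here is a minimal sketch of the layout described above. The file name `redirects-example.xlsx` and the example.com URLs are made up for illustration, and it assumes pandas and openpyxl are installed (they are in Colab).

```python
import pandas as pd

# Hypothetical sample URLs; swap in your own lists.
old_urls = pd.DataFrame({"URL": [
    "https://example.com/blog/fuzzy-matching-guide",
    "https://example.com/shop/red-shoes",
]})
new_urls = pd.DataFrame({"URL": [
    "https://example.com/articles/fuzzy-matching-guide/",
    "https://example.com/products/red-shoes/",
]})

# One workbook, two worksheets named 'old' and 'new',
# each with a single column under the header 'URL'.
with pd.ExcelWriter("redirects-example.xlsx") as writer:
    old_urls.to_excel(writer, sheet_name="old", index=False)
    new_urls.to_excel(writer, sheet_name="new", index=False)
```

The resulting file can then be uploaded in Step 3 like any hand-made workbook.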

Step 1: Run ‘pip install PolyFuzz’

Click the play button next to the line of code below. This installs the library PolyFuzz – Don’t be worried about the movie hacker code below.

pip install PolyFuzz

Step 2: Load in the libraries

PolyFuzz

PolyFuzz is a Python library for fuzzy string matching: it scores how similar two strings are and pairs each old URL with its closest new URL. This is the important part of the script doing the hard work for us.
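
To get a feel for what fuzzy matching does, here is a tiny standalone sketch. It is not part of the notebook and it does not use PolyFuzz: it swaps in `difflib.SequenceMatcher` from Python's standard library, so the scores will differ from the ones the script produces. The URLs and the `best_match` helper are made up for illustration.

```python
from difflib import SequenceMatcher


def best_match(old_url, new_urls):
    """Return (new_url, score) for the candidate most similar to old_url."""
    scored = [
        (candidate, SequenceMatcher(None, old_url, candidate).ratio())
        for candidate in new_urls
    ]
    # Keep the candidate with the highest similarity ratio (0 to 1).
    return max(scored, key=lambda pair: pair[1])


# Hypothetical example paths.
old_urls = ["/blog/fuzzy-matching-guide", "/shop/red-shoes"]
new_urls = ["/articles/fuzzy-matching-guide/", "/products/red-shoes/", "/contact/"]

for url in old_urls:
    match, score = best_match(url, new_urls)
    print(f"{url} -> {match} ({score:.2f})")
```

The idea is the same as in the script: every old URL is compared with every new URL, and the closest one wins.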

Pandas

If you’ve used Python before, you’ve probably heard of pandas, which lets us analyse and manipulate data. It’s an amazing tool: here it reads the two worksheets and looks after our data frames.

Openpyxl

Openpyxl allows Python to open, manipulate and read Excel files.

import pandas as pd
from openpyxl import load_workbook
from polyfuzz import PolyFuzz
from google.colab import files

Step 3: Upload your Excel file

  1. Create an xlsx workbook with 2 worksheets.
  2. Name the 1st ‘old’ and put your old 404 URLs in the first column, under the header ‘URL’.
  3. Name the 2nd ‘new’ and put your new 200 URLs in the first column, under the header ‘URL’.

# upload the xlsx file
upload = files.upload()
input_file = list(upload.keys())[0]  # get the name of the uploaded file

Step 4: Run FuzzyMatch!

This is the fancy part of the script that does all the matching for you and outputs results. If the results look good and you have no errors you can move to the final step and export the matched URLs!

#load urls lists
old = pd.read_excel(input_file, sheet_name='old')
new = pd.read_excel(input_file, sheet_name='new')

#convert to Python list (required by Polyfuzz)
old = old['URL'].tolist()
new = new['URL'].tolist()

#launch fuzzy matching
model = PolyFuzz("TF-IDF")
model.match(old, new)

#load results
result = model.get_matches()

#prints the results below so you can visually see what's been done
print(result)

# Create a DataFrame from the results
df = pd.DataFrame(result, columns=['From', 'To', 'Similarity'])

# Save the DataFrame to a CSV file
df.to_csv('/content/redirect-map.csv', index=False)
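
If Step 4 throws errors, blank cells or stray spaces in the worksheets are a likely suspect, because pandas reads empty cells as missing values rather than text. Below is a rough sketch of a tidy-up you could apply before matching. The `clean_urls` helper is my own invention, not something from PolyFuzz or the original script.

```python
import pandas as pd


def clean_urls(column):
    """Return a de-duplicated list of non-empty URL strings from a pandas Series."""
    # Drop missing cells, force everything to text and trim whitespace.
    cleaned = column.dropna().astype(str).str.strip()
    # Remove cells that were only whitespace.
    cleaned = cleaned[cleaned != ""]
    # Keep the first occurrence of each URL, in the original order.
    return cleaned.drop_duplicates().tolist()


# Hypothetical messy column, like one read from a worksheet.
sample = pd.Series(["/old-page", " /old-page ", None, "", "/another-old-page"])
print(clean_urls(sample))
```

In the notebook this would stand in for the two `.tolist()` lines, for example `old = clean_urls(old['URL'])`.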

Step 5: Download your redirect map! Woohoo!

files.download("/content/redirect-map.csv")

Then you’re done! Your URLs will be matched, so instead of mapping them by hand you only need to QA the results.
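
For the QA pass, it can help to look at the weakest matches first. Here is a small sketch that splits a redirect map into rows to review and rows that look safe. The sample data and the 0.8 cut-off are assumptions for illustration; pick a threshold that suits your own site.

```python
import pandas as pd

# Hypothetical sample with the same columns as redirect-map.csv.
matches = pd.DataFrame({
    "From": ["/old-a", "/old-b", "/old-c"],
    "To": ["/new-a", "/new-b", None],
    "Similarity": [0.95, 0.62, 0.0],
})

REVIEW_THRESHOLD = 0.8  # assumed cut-off, not a PolyFuzz setting

# Flag rows with no suggested target or a low similarity score.
needs_review = matches[
    matches["To"].isna() | (matches["Similarity"] < REVIEW_THRESHOLD)
]
looks_safe = matches.drop(needs_review.index)

# Show the weakest matches first.
print(needs_review.sort_values("Similarity"))
```

With your own file, `matches = pd.read_csv("redirect-map.csv")` would replace the sample data.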

If you have any questions, feel free to reach out through email or Twitter, or comment below!

