Below you'll find a script to help you match old 404 URLs with new URLs. Rather than manually matching URLs one by one, you can click play and let FuzzyMatch do the work for you.

Get the script — take a copy or use it now in Google Colab

The script outputs a CSV with your old URLs matched to your new ones, along with a similarity score between 0 and 1 (the closer to 1, the closer the match).

Thank you to these geniuses

As with most of my scripts, I've frankensteined other people's work — so a massive shoutout to these legends.

Understanding Fuzzy Matching

A huge thank you to Lazarina Stoy — her article on Fuzzy matching for SEO is excellent, and her blog is generally worth bookmarking.

Twitter: @lazarinastoy

Fuzzy Matching in Python

I originally got into FuzzyMatch thanks to Antoine Eripret and his script — the code is very much his, with minor changes to make it more accessible for less technical SEOs. Also check out the second part of his article where he uses Google Search to decide where a URL should redirect to.

Twitter: @antoineripret

Example Excel workbook

For the script to work, you need to upload an .xlsx file with a specific structure:

  1. Create an .xlsx workbook with 2 worksheets
  2. Name the first sheet old, put the header URL in cell A1, and paste your old website URLs below it in the first column
  3. Name the second sheet new, put the header URL in cell A1, and paste your new website URLs below it in the first column

Download an example workbook
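
If you'd rather build the workbook in code than by hand, here's a rough sketch of the structure described above. The sample URLs, the build_workbook helper and the redirects.xlsx file name are all made up for illustration; the only things taken from the script are the sheet names (old and new) and the URL column header it looks up.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample URLs, for illustration only.
old_urls = [
    "https://example.com/blog/seo-tips-2019",
    "https://example.com/shop/red-running-shoes",
]
new_urls = [
    "https://example.com/articles/seo-tips",
    "https://example.com/products/running-shoes-red",
]


def build_workbook(path, old_urls, new_urls):
    """Write a two-sheet workbook: 'old' and 'new', each with a 'URL' header."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        pd.DataFrame({"URL": old_urls}).to_excel(writer, sheet_name="old", index=False)
        pd.DataFrame({"URL": new_urls}).to_excel(writer, sheet_name="new", index=False)


# Write to a temporary folder so nothing on disk is overwritten.
workbook_path = os.path.join(tempfile.mkdtemp(), "redirects.xlsx")
build_workbook(workbook_path, old_urls, new_urls)
print(f"Workbook written to {workbook_path}")
```

The resulting file should be in the shape the script expects, so you can upload it in Step 3 as a quick test run.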

Step 1: Run pip install PolyFuzz

Click the play button next to the line of code below. This installs the PolyFuzz library.

pip install PolyFuzz

Step 2: Load in the libraries

PolyFuzz is a Python library for fuzzy string matching. It does the heavy lifting here, pairing each old URL with the most similar new URL.

Pandas allows us to analyse and manipulate data. It handles our data frames throughout the script.

Openpyxl allows Python to open, read, and manipulate Excel files. Pandas relies on it to read the .xlsx workbook.

import pandas as pd
from openpyxl import load_workbook
from polyfuzz import PolyFuzz
from google.colab import files

Step 3: Upload your Excel file

  1. Create an .xlsx workbook with 2 worksheets
  2. First sheet called old: header URL in cell A1, then your old 404 URLs below it in column A
  3. Second sheet called new: header URL in cell A1, then your new 200 URLs below it in column A

# Upload the xlsx file
upload = files.upload()
input_file = list(upload.keys())[0]  # get the name of the uploaded file
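
Before running the matching, it can save a confusing error to check that the uploaded workbook really has both sheets. This is only a sketch: missing_sheets is a hypothetical helper (not part of the original script), and the demo builds a deliberately incomplete workbook in a temporary folder. In Colab you would pass input_file instead.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def missing_sheets(path, required=("old", "new")):
    """Return the required sheet names that are absent from the workbook."""
    workbook = load_workbook(path, read_only=True)
    try:
        return [name for name in required if name not in workbook.sheetnames]
    finally:
        workbook.close()


# Demo: a workbook that only has the 'old' sheet (hypothetical data).
demo_path = os.path.join(tempfile.mkdtemp(), "incomplete.xlsx")
demo = Workbook()
sheet = demo.active
sheet.title = "old"
sheet.append(["URL"])
sheet.append(["https://example.com/blog/seo-tips-2019"])
demo.save(demo_path)

missing = missing_sheets(demo_path)
if missing:
    print(f"Missing sheet(s): {', '.join(missing)}")
```

An empty list means both sheets are present and you can carry on to the next step.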

Step 4: Run FuzzyMatch

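This is the part of the script that does all the matching: it reads both sheets, matches each old URL to its closest new URL using PolyFuzz's TF-IDF model, and prints the results. If the results look good and there are no errors, move to the final step.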

# Load URL lists
old = pd.read_excel(input_file, sheet_name='old')
new = pd.read_excel(input_file, sheet_name='new')

# Convert to Python lists (required by PolyFuzz), skipping any blank cells
old = old['URL'].dropna().astype(str).tolist()
new = new['URL'].dropna().astype(str).tolist()

# Launch fuzzy matching
model = PolyFuzz("TF-IDF")
model.match(old, new)

# Load results
result = model.get_matches()

# Print results so you can visually check what's been done
print(result)

# get_matches() already returns a DataFrame; keep just the three redirect-map columns
df = pd.DataFrame(result, columns=['From', 'To', 'Similarity'])

# Save the DataFrame to a CSV file
df.to_csv('/content/redirect-map.csv', index=False)
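
To give a feel for where the similarity score comes from, here's a simplified, standard-library sketch of the general idea behind TF-IDF matching: break each URL into overlapping three-character chunks, weight the rarer chunks more heavily, then compare the weighted chunks with cosine similarity. This is not PolyFuzz's actual implementation. The function names, the sample paths, the chunk size of 3 and the smoothed IDF formula are all my assumptions, so expect the real scores to differ.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a string into overlapping character n-grams."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(strings, n=3):
    """Return one {ngram: weight} dict per string, using a smoothed IDF."""
    counts = [Counter(char_ngrams(s, n)) for s in strings]
    doc_freq = Counter(gram for c in counts for gram in c)
    total = len(strings)
    return [
        {g: tf * (math.log((1 + total) / (1 + doc_freq[g])) + 1) for g, tf in c.items()}
        for c in counts
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(old_urls, new_urls, n=3):
    """Pair each old URL with its highest-scoring new URL."""
    vectors = tfidf_vectors(old_urls + new_urls, n)
    old_vecs = vectors[:len(old_urls)]
    new_vecs = vectors[len(old_urls):]
    results = []
    for old_url, old_vec in zip(old_urls, old_vecs):
        scores = [cosine(old_vec, new_vec) for new_vec in new_vecs]
        best = max(range(len(new_urls)), key=scores.__getitem__)
        results.append((old_url, new_urls[best], round(scores[best], 3)))
    return results


# Hypothetical URL paths, for illustration only.
old_urls = ["/blog/seo-tips-2019", "/shop/red-running-shoes"]
new_urls = ["/products/running-shoes-red", "/articles/seo-tips"]

matches = best_matches(old_urls, new_urls)
for old_url, new_url, score in matches:
    print(f"{old_url} -> {new_url} ({score})")
```

Because the comparison works on small chunks of characters rather than whole words, URLs that share slugs in a different order (like the running shoes example) can still pair up.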

Step 5: Download your redirect map

files.download("/content/redirect-map.csv")

Done. Your URLs will be matched and you can QA the results rather than doing the matching by hand.
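
For the QA pass, one option is to split the redirect map into matches you can probably trust and matches that need a human look. The sketch below uses made-up rows in the same From, To, Similarity shape as the CSV, and the 0.7 cut-off is an arbitrary assumption rather than a recommended value, so tune it against your own results.

```python
import csv
import io

# Hypothetical rows in the same shape as redirect-map.csv.
SAMPLE_CSV = """From,To,Similarity
/blog/seo-tips-2019,/articles/seo-tips,0.82
/shop/red-running-shoes,/products/running-shoes-red,0.74
/about-us-old,/contact,0.31
"""


def split_by_confidence(rows, threshold=0.7):
    """Split rows into (confident, needs_review) by similarity score."""
    confident, needs_review = [], []
    for row in rows:
        try:
            score = float(row["Similarity"])
        except (TypeError, ValueError):
            score = 0.0  # treat a missing or unreadable score as no match
        if score >= threshold:
            confident.append(row)
        else:
            needs_review.append(row)
    return confident, needs_review


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
confident, needs_review = split_by_confidence(rows)

for row in needs_review:
    print(f"Check manually: {row['From']} -> {row['To']} ({row['Similarity']})")
```

To run it on the real file, swap the in-memory sample for open("redirect-map.csv", newline="") once you've downloaded the CSV.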