Blogs | Markivis

How to Clean Your Data Before HubSpot Migration | Markivis

Written by Markivis | May 4, 2026 5:33:33 AM

Key Takeaways

  • Spend 2-4 weeks cleaning data before migration—it's worth the time investment

  • Focus on duplicates, missing fields, and formatting consistency

  • Use automated tools to clean data faster than manual work

  • Email addresses are critical; validate every one before migration

  • A clean dataset now prevents months of cleanup work later

How to Clean Your Data Before HubSpot Migration: Step-by-Step Guide

Here's the truth: you're probably not going to clean your data. And that's okay. Most companies don't until they have to. But if you migrate dirty data into HubSpot, you'll spend months cleaning it there, which is much harder and more expensive. This guide makes data cleaning manageable and shows you why it's worth your time.

Why Cleaning Data Matters

Before we jump into how, let's talk about why this matters.

What happens if you migrate dirty data:

  • Duplicate records — You end up with multiple records for the same person, making your database unreliable

  • Bad reports — If your data is wrong, your reports are wrong, and your decisions are based on false information

  • Wasted marketing spend — You email the same person multiple times because you don't know they're duplicates

  • Frustrated sales team — Salespeople don't trust a CRM with bad data and stop using it

  • Manual cleanup required — Months of work later to clean up the database

  • Lost opportunity — Wrong data means you miss opportunities (wrong contact info, bad lead scoring, etc.)

The cost of dirty data migrating:

If you migrate 100,000 contacts with 10% duplicates (10,000 duplicates), that's 10,000 records that need to be merged in HubSpot. At 5 minutes to merge each one, that's 50,000 minutes of work (833 hours, or 21 work weeks). That's one person's full-time job for 5 months.

Clean your data first. It's a no-brainer.

Step 1: Understand Your Current Data Quality

Before you clean anything, understand what you're dealing with.

Do a data audit:

  1. Export a sample of 1,000 records from your current CRM
  2. Manually check 100 random records
  3. Ask these questions:

- Are names formatted consistently?

- Do all records have email addresses?

- Are phone numbers in the same format?

- Are there obvious duplicates?

- Are there test records or spam?

- What percentage of records have incomplete information?

This gives you a baseline understanding of data quality.

Common data quality issues:

  • Duplicates: John Smith, john.smith@email.com AND John Smith, jsmith@email.com

  • Missing emails: 5% of records have no email address

  • Formatting inconsistencies: Phone numbers as 555-555-5555 and (555) 555-5555 and 5555555555

  • Bad data: "Test@test.com", "noemail@noemail.com", spam email addresses

  • Incomplete records: Names without email, companies without contacts

  • Inconsistent capitalization: JOHN SMITH, john smith, John Smith

  • Typos: Yahooo.com instead of Yahoo.com

  • Merged fields: "John Smith | john@email.com" in one field instead of separated

Step 2: Define Your Data Quality Standards

Decide what "clean" looks like for your business.

Create a checklist:

  • Email addresses — Every contact should have an email (or at least a phone number)

  • Name formatting — Names should be in consistent format (John Smith, not JOHN SMITH or john smith)

  • Phone numbers — Phone numbers should be in consistent format: (XXX) XXX-XXXX

  • Duplicate handling — Identify and merge duplicates

  • Test records — Remove any fake or test data

  • Spam addresses — Remove known spam or bounced email addresses

  • Field consistency — Custom fields should have consistent values (not some saying "Y" and others saying "yes")

  • Date formatting — Dates should be in consistent format: YYYY-MM-DD

  • Required fields — Define which fields are required vs. optional

Write this down. You'll reference it throughout your cleaning process.

Step 3: Remove Duplicates

Duplicates are the biggest data quality problem. Handle them first.

How to find duplicates:

Option 1: Use your current CRM's duplicate detection

Most CRMs have built-in duplicate detection:

  • Salesforce: Go to Settings → Data Quality → Duplicate Detection

  • Pipedrive: Go to Organization → Check for Duplicates

  • ActiveCampaign: Use "Merge Contacts" feature

  • Mailchimp: Use built-in duplicate detection

Option 2: Use a data cleaning tool

Tools like:

  • Datorama

  • Trifacta

  • DataWrangler

  • Excel formulas (if you export to spreadsheet)

These can identify duplicates by:

  • Exact email match

  • Similar name with same email

  • Similar name with same phone

  • Similar company name

How to merge duplicates:

  1. Identify duplicate records

  2. Decide which one to keep (usually keep the one with more complete information)

  3. Merge all information into the primary record

  4. Delete the duplicate

  5. Document the merge (in case you need to reference it later)

Pro tip: Start with email-based duplicates (same email = definitely duplicate). Then move to name-based duplicates, which are riskier.

How many duplicates is normal?

Most databases have 3-10% duplicates. If you have 10,000 contacts, expect 300-1,000 duplicates. That's normal. Plan on 1-2 weeks to handle them all.

Step 4: Remove Bad Records

Some records should just be deleted.

Delete these types of records:

  • Test records — Anything you created for testing (test@test.com, contact names like "Test User")
  • Spam email addresses — Obvious spam like "admin@admin.com", "noreply@example.com"
  • Invalid/bounced emails — If your CRM marks emails as bounced or invalid, delete them
  • No contact info — Records with no email AND no phone number are basically useless
  • Clearly wrong data — If something is obviously wrong (company name is "ASDF" or phone is "111-111-1111"), delete it
  • Inactive archives — Records you marked as inactive over 2 years ago

How many records should you expect to delete?

Typically 5-15% of your database. For 10,000 contacts, expect to delete 500-1,500. That's okay. You're improving quality.

Step 5: Standardize Phone Numbers

Phone numbers are usually a mess. Let's fix them.

Current state: You probably have formats like:

  • 5555555555
  • 555-555-5555
  • (555) 555-5555
  • +1 (555) 555-5555
  • 555.555.5555

Choose one standard format

Pick one: (XXX) XXX-XXXX is most common for US numbers.

If you have a spreadsheet:

Use a formula to standardize. Example in Excel:

=IF(LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""),"(",""),")",""),"+",""))=10,

"("&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""),"(",""),")",""),"+",""),1,3)&") "&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""),"(",""),")",""),"+",""),4,3)&"-"&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""),"(",""),")",""),"+",""),7,4), A2)

This formula takes phone numbers in any format and converts them to (XXX) XXX-XXXX.

If you use data cleaning tools:

Most tools have phone number standardization built in. Let them handle it.

Step 6: Standardize Name Formatting

Names should follow a consistent pattern.

Choose a standard: Title Case (John Smith) is most common.

Current state: You probably have:

  • JOHN SMITH

  • john smith

  • John Smith

  • john.smith

  • Smith, John

Fix with Excel formula:

=PROPER(A2)

This converts any text to Title Case (first letter capitalized, rest lowercase).

Use Find & Replace to clean up:

  • Find: " " (two spaces) → Replace: " " (one space)

  • Find: "^" → Replace: nothing (remove special characters)

  • Find: "Mr. " "Mrs. " "Ms. " at the beginning → Replace: nothing (remove titles)

Step 7: Validate Email Addresses

Email addresses are critical. Validate them.

Check for obviously bad emails:

  • Missing @ symbol

  • Multiple @ symbols

  • Typos in common domains (gmail.com vs. gmai.com)

  • No domain extension (.com, .org, etc.)

Use email validation tools:

  • ZeroBounce

  • NeverBounce

  • BriteVerify

  • EmailChecker

These tools check if an email address actually exists. They typically charge per verification ($0.01-$0.05 per email).

Process:

  1. Upload your email list

  2. Tool validates each email

  3. Get a list of valid, invalid, and questionable emails

  4. Remove invalid emails

  5. Manually review questionable ones

What to expect:

  • 80-95% valid emails

  • 5-20% invalid emails (deliverability issues, typos, etc.)

  • It's worth running this because bad emails waste time and money

Step 8: Standardize Custom Field Values

If you have custom fields with dropdown or checkbox values, standardize them.

Example: Maybe some records have "Prospect" and others have "prospect" or "PROSPECT"

Fix by:

  1. List all unique values in each custom field

  2. Choose the standard values you'll use

  3. Use Find & Replace to fix inconsistencies

  4. Go through manually for tricky ones

Common custom field standardization:

  • Industry: Make sure all industries use consistent capitalization and naming

  • Status: Prospect, Lead, Customer (choose which and use consistently)

  • Size: Small, Mid-Market, Enterprise (or S, M, L – pick one format)

  • Source: Website, Referral, Trade Show (be consistent)

Step 9: Fill Missing Critical Information

Some fields are too important to leave blank.

Critical fields that should be filled:

  • Email — Every contact should have an email if possible

  • Company name — If it's a B2B database, every contact should have a company

  • Name — If you only have first or last name, try to complete it

How to fill missing information:

  1. Reach out to the contact — If email is missing, call and ask

  2. Research online — LinkedIn, company website, email finder tools

  3. Use historical data — If they're an existing customer, look it up

  4. Leave it blank — If you truly don't know, leave it blank rather than guessing

Don't make up data. Better to have incomplete information than false information.

Step 10: Flag Inactive Records

Decide what to do with very old records.

Define "inactive": Hasn't been contacted in 12+ months

Options:

  1. Delete them — Simplest

  2. Archive them — Keep in a separate "Archive" list or view in HubSpot

  3. Keep them but flag them — Use a field to mark them as "inactive" so you don't market to them

We recommend: Archive them. Keep historical data just in case, but don't import into your active database.

Step 11: Verify Your Cleaning with Spot Checks

After cleaning, verify you did it right.

Spot check:

  1. Open 50 random records
  2. Check:

    - Phone numbers are in consistent format

    - Names are properly capitalized

    - No obvious duplicates

    - Email addresses look valid

    - No test/spam records

  3. Fix any issues you find

Quality check success metric:

  • 95%+ of records should be clean after a spot check

  • If it's lower, go back and do more cleanup

Step 12: Create a Cleaned Data Export

Once everything is clean, export your final dataset.

Steps:

  1. In your current CRM, select all records you want to migrate

  2. Export to CSV

  3. Open in Excel or Google Sheets

  4. Verify it looks correct

  5. Save as a properly formatted CSV file

Make sure:

  • First row is column headers (Email, First Name, Last Name, etc.)

  • No special characters or formatting in the data

  • All fields match the format you've standardized

Tools That Make Data Cleaning Easier

If you have a large database or limited time, use specialized tools:

For duplicates:

  • LinkedIn Cleaner

  • Leadiro

  • RocketReach

For standardization:

  • Trifacta

  • Datorama

  • OpenRefine (free)

For email validation:

  • ZeroBounce
  • NeverBounce
  • Hunter.io

For overall data quality:

  • Many CRMs have built-in tools
  • Spreadsheet formulas and Find & Replace (if data is in CSV)

Timeline for Data Cleaning

Small database (under 50,000 records):

  • 2-4 weeks of dedicated work
  • Or 4-8 weeks of part-time work

Medium database (50,000-500,000):

  • 4-8 weeks of dedicated work
  • Tools can speed this up significantly

Large database (500,000+):

  • 8+ weeks even with tools
  • Consider hiring a specialist or using a full-service cleaning company