Spend 2-4 weeks cleaning data before migration—it's worth the time investment
Focus on duplicates, missing fields, and formatting consistency
Use automated tools to clean data faster than manual work
Email addresses are critical; validate every one before migration
A clean dataset now prevents months of cleanup work later
Here's the truth: you're probably not going to clean your data. And that's okay. Most companies don't until they have to. But if you migrate dirty data into HubSpot, you'll spend months cleaning it there, which is much harder and more expensive. This guide makes data cleaning manageable and shows you why it's worth your time.
Before we jump into how, let's talk about why this matters.
What happens if you migrate dirty data:
Duplicate records — You end up with multiple records for the same person, making your database unreliable
Bad reports — If your data is wrong, your reports are wrong, and your decisions are based on false information
Wasted marketing spend — You email the same person multiple times because you don't know they're duplicates
Frustrated sales team — Salespeople don't trust a CRM with bad data and stop using it
Manual cleanup required — Months of work later to clean up the database
Lost opportunity — Wrong data means you miss opportunities (wrong contact info, bad lead scoring, etc.)
The cost of dirty data migrating:
If you migrate 100,000 contacts with 10% duplicates (10,000 duplicates), that's 10,000 records that need to be merged in HubSpot. At 5 minutes to merge each one, that's 50,000 minutes of work (833 hours, or 21 work weeks). That's one person's full-time job for 5 months.
Clean your data first. It's a no-brainer.
Before you clean anything, understand what you're dealing with.
Do a data audit:
- Are names formatted consistently?
- Do all records have email addresses?
- Are phone numbers in the same format?
- Are there obvious duplicates?
- Are there test records or spam?
- What percentage of records have incomplete information?
This gives you a baseline understanding of data quality.
Common data quality issues:
Duplicates: John Smith, john.smith@email.com AND John Smith, jsmith@email.com
Missing emails: 5% of records have no email address
Formatting inconsistencies: Phone numbers as 555-555-5555 and (555) 555-5555 and 5555555555
Bad data: "Test@test.com", "noemail@noemail.com", spam email addresses
Incomplete records: Names without email, companies without contacts
Inconsistent capitalization: JOHN SMITH, john smith, John Smith
Typos: Yahooo.com instead of Yahoo.com
Merged fields: "John Smith | john@email.com" in one field instead of separated
Decide what "clean" looks like for your business.
Create a checklist:
Email addresses — Every contact should have an email (or at least a phone number)
Name formatting — Names should be in consistent format (John Smith, not JOHN SMITH or john smith)
Phone numbers — Phone numbers should be in consistent format: (XXX) XXX-XXXX
Duplicate handling — Identify and merge duplicates
Test records — Remove any fake or test data
Spam addresses — Remove known spam or bounced email addresses
Field consistency — Custom fields should have consistent values (not some saying "Y" and others saying "yes")
Date formatting — Dates should be in consistent format: YYYY-MM-DD
Required fields — Define which fields are required vs. optional
Write this down. You'll reference it throughout your cleaning process.
Duplicates are the biggest data quality problem. Handle them first.
How to find duplicates:
Option 1: Use your current CRM's duplicate detection
Most CRMs have built-in duplicate detection:
Salesforce: Go to Settings → Data Quality → Duplicate Detection
Pipedrive: Go to Organization → Check for Duplicates
ActiveCampaign: Use "Merge Contacts" feature
Mailchimp: Use built-in duplicate detection
Option 2: Use a data cleaning tool
Tools like:
Datorama
Trifacta
DataWrangler
Excel formulas (if you export to spreadsheet)
These can identify duplicates by:
Exact email match
Similar name with same email
Similar name with same phone
Similar company name
How to merge duplicates:
Identify duplicate records
Decide which one to keep (usually keep the one with more complete information)
Merge all information into the primary record
Delete the duplicate
Document the merge (in case you need to reference it later)
Pro tip: Start with email-based duplicates (same email = definitely duplicate). Then move to name-based duplicates, which are riskier.
How many duplicates is normal?
Most databases have 3-10% duplicates. If you have 10,000 contacts, expect 300-1,000 duplicates. That's normal. Plan on 1-2 weeks to handle them all.
Some records should just be deleted.
Delete these types of records:
How many records should you expect to delete?
Typically 5-15% of your database. For 10,000 contacts, expect to delete 500-1,500. That's okay. You're improving quality.
Phone numbers are usually a mess. Let's fix them.
Current state: You probably have formats like:
Choose one standard format
Pick one: (XXX) XXX-XXXX is most common for US numbers.
If you have a spreadsheet:
Use a formula to standardize. Example in Excel:
=IF(LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""),"(",""),")",""),"+",""))=10,
"("&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""),"(",""),")",""),"+",""),1,3)&") "&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""),"(",""),")",""),"+",""),4,3)&"-"&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""),"(",""),")",""),"+",""),7,4), A2)
This formula takes phone numbers in any format and converts them to (XXX) XXX-XXXX.
If you use data cleaning tools:
Most tools have phone number standardization built in. Let them handle it.
Names should follow a consistent pattern.
Choose a standard: Title Case (John Smith) is most common.
Current state: You probably have:
JOHN SMITH
john smith
John Smith
john.smith
Smith, John
Fix with Excel formula:
=PROPER(A2)
This converts any text to Title Case (first letter capitalized, rest lowercase).
Use Find & Replace to clean up:
Find: " " (two spaces) → Replace: " " (one space)
Find: "^" → Replace: nothing (remove special characters)
Find: "Mr. " "Mrs. " "Ms. " at the beginning → Replace: nothing (remove titles)
Email addresses are critical. Validate them.
Check for obviously bad emails:
Missing @ symbol
Multiple @ symbols
Typos in common domains (gmail.com vs. gmai.com)
No domain extension (.com, .org, etc.)
Use email validation tools:
ZeroBounce
NeverBounce
BriteVerify
EmailChecker
These tools check if an email address actually exists. They typically charge per verification ($0.01-$0.05 per email).
Process:
Upload your email list
Tool validates each email
Get a list of valid, invalid, and questionable emails
Remove invalid emails
Manually review questionable ones
What to expect:
80-95% valid emails
5-20% invalid emails (deliverability issues, typos, etc.)
It's worth running this because bad emails waste time and money
If you have custom fields with dropdown or checkbox values, standardize them.
Example: Maybe some records have "Prospect" and others have "prospect" or "PROSPECT"
Fix by:
List all unique values in each custom field
Choose the standard values you'll use
Use Find & Replace to fix inconsistencies
Go through manually for tricky ones
Common custom field standardization:
Industry: Make sure all industries use consistent capitalization and naming
Status: Prospect, Lead, Customer (choose which and use consistently)
Size: Small, Mid-Market, Enterprise (or S, M, L – pick one format)
Source: Website, Referral, Trade Show (be consistent)
Some fields are too important to leave blank.
Critical fields that should be filled:
Email — Every contact should have an email if possible
Company name — If it's a B2B database, every contact should have a company
Name — If you only have first or last name, try to complete it
How to fill missing information:
Reach out to the contact — If email is missing, call and ask
Research online — LinkedIn, company website, email finder tools
Use historical data — If they're an existing customer, look it up
Leave it blank — If you truly don't know, leave it blank rather than guessing
Don't make up data. Better to have incomplete information than false information.
Decide what to do with very old records.
Define "inactive": Hasn't been contacted in 12+ months
Options:
Delete them — Simplest
Archive them — Keep in a separate "Archive" list or view in HubSpot
Keep them but flag them — Use a field to mark them as "inactive" so you don't market to them
We recommend: Archive them. Keep historical data just in case, but don't import into your active database.
After cleaning, verify you did it right.
Spot check:
- Phone numbers are in consistent format
- Names are properly capitalized
- No obvious duplicates
- Email addresses look valid
- No test/spam records
Quality check success metric:
95%+ of records should be clean after a spot check
If it's lower, go back and do more cleanup
Once everything is clean, export your final dataset.
Steps:
In your current CRM, select all records you want to migrate
Export to CSV
Open in Excel or Google Sheets
Verify it looks correct
Save as a properly formatted CSV file
Make sure:
First row is column headers (Email, First Name, Last Name, etc.)
No special characters or formatting in the data
All fields match the format you've standardized
If you have a large database or limited time, use specialized tools:
For duplicates:
LinkedIn Cleaner
Leadiro
RocketReach
For standardization:
Trifacta
Datorama
OpenRefine (free)
For email validation:
For overall data quality:
Small database (under 50,000 records):
Medium database (50,000-500,000):
Large database (500,000+):