Key Takeaways
-
Spend 2-4 weeks cleaning data before migration—it's worth the time investment
-
Focus on duplicates, missing fields, and formatting consistency
-
Use automated tools to clean data faster than manual work
-
Email addresses are critical; validate every one before migration
-
A clean dataset now prevents months of cleanup work later
How to Clean Your Data Before HubSpot Migration: Step-by-Step Guide
Here's the truth: you're probably not going to clean your data. And that's okay. Most companies don't until they have to. But if you migrate dirty data into HubSpot, you'll spend months cleaning it there, which is much harder and more expensive. This guide makes data cleaning manageable and shows you why it's worth your time.
Why Cleaning Data Matters
Before we jump into how, let's talk about why this matters.
What happens if you migrate dirty data:
-
Duplicate records — You end up with multiple records for the same person, making your database unreliable
-
Bad reports — If your data is wrong, your reports are wrong, and your decisions are based on false information
-
Wasted marketing spend — You email the same person multiple times because you don't know they're duplicates
-
Frustrated sales team — Salespeople don't trust a CRM with bad data and stop using it
-
Manual cleanup required — Months of work later to clean up the database
-
Lost opportunity — Wrong data means you miss opportunities (wrong contact info, bad lead scoring, etc.)
The cost of dirty data migrating:
If you migrate 100,000 contacts with 10% duplicates (10,000 duplicates), that's 10,000 records that need to be merged in HubSpot. At 5 minutes to merge each one, that's 50,000 minutes of work (833 hours, or 21 work weeks). That's one person's full-time job for 5 months.
Clean your data first. It's a no-brainer.
Step 1: Understand Your Current Data Quality
Before you clean anything, understand what you're dealing with.
Do a data audit:
- Export a sample of 1,000 records from your current CRM
- Manually check 100 random records
- Ask these questions:
- Are names formatted consistently?
- Do all records have email addresses?
- Are phone numbers in the same format?
- Are there obvious duplicates?
- Are there test records or spam?
- What percentage of records have incomplete information?
This gives you a baseline understanding of data quality.
Common data quality issues:
-
Duplicates: John Smith, john.smith@email.com AND John Smith, jsmith@email.com
-
Missing emails: 5% of records have no email address
-
Formatting inconsistencies: Phone numbers as 555-555-5555 and (555) 555-5555 and 5555555555
-
Bad data: "Test@test.com", "noemail@noemail.com", spam email addresses
-
Incomplete records: Names without email, companies without contacts
-
Inconsistent capitalization: JOHN SMITH, john smith, John Smith
-
Typos: Yahooo.com instead of Yahoo.com
-
Merged fields: "John Smith | john@email.com" in one field instead of separated
Step 2: Define Your Data Quality Standards
Decide what "clean" looks like for your business.
Create a checklist:
-
Email addresses — Every contact should have an email (or at least a phone number)
-
Name formatting — Names should be in consistent format (John Smith, not JOHN SMITH or john smith)
-
Phone numbers — Phone numbers should be in consistent format: (XXX) XXX-XXXX
-
Duplicate handling — Identify and merge duplicates
-
Test records — Remove any fake or test data
-
Spam addresses — Remove known spam or bounced email addresses
-
Field consistency — Custom fields should have consistent values (not some saying "Y" and others saying "yes")
-
Date formatting — Dates should be in consistent format: YYYY-MM-DD
-
Required fields — Define which fields are required vs. optional
Write this down. You'll reference it throughout your cleaning process.
Step 3: Remove Duplicates
Duplicates are the biggest data quality problem. Handle them first.
How to find duplicates:
Option 1: Use your current CRM's duplicate detection
Most CRMs have built-in duplicate detection:
-
Salesforce: Go to Settings → Data Quality → Duplicate Detection
-
Pipedrive: Go to Organization → Check for Duplicates
-
ActiveCampaign: Use "Merge Contacts" feature
-
Mailchimp: Use built-in duplicate detection
Option 2: Use a data cleaning tool
Tools like:
-
Datorama
-
Trifacta
-
DataWrangler
-
Excel formulas (if you export to spreadsheet)
These can identify duplicates by:
-
Exact email match
-
Similar name with same email
-
Similar name with same phone
-
Similar company name
How to merge duplicates:
-
Identify duplicate records
-
Decide which one to keep (usually keep the one with more complete information)
-
Merge all information into the primary record
-
Delete the duplicate
-
Document the merge (in case you need to reference it later)
Pro tip: Start with email-based duplicates (same email = definitely duplicate). Then move to name-based duplicates, which are riskier.
How many duplicates is normal?
Most databases have 3-10% duplicates. If you have 10,000 contacts, expect 300-1,000 duplicates. That's normal. Plan on 1-2 weeks to handle them all.
Step 4: Remove Bad Records
Some records should just be deleted.
Delete these types of records:
- Test records — Anything you created for testing (test@test.com, contact names like "Test User")
- Spam email addresses — Obvious spam like "admin@admin.com", "noreply@example.com"
- Invalid/bounced emails — If your CRM marks emails as bounced or invalid, delete them
- No contact info — Records with no email AND no phone number are basically useless
- Clearly wrong data — If something is obviously wrong (company name is "ASDF" or phone is "111-111-1111"), delete it
- Inactive archives — Records you marked as inactive over 2 years ago
How many records should you expect to delete?
Typically 5-15% of your database. For 10,000 contacts, expect to delete 500-1,500. That's okay. You're improving quality.
Step 5: Standardize Phone Numbers
Phone numbers are usually a mess. Let's fix them.
Current state: You probably have formats like:
- 5555555555
- 555-555-5555
- (555) 555-5555
- +1 (555) 555-5555
- 555.555.5555
Choose one standard format
Pick one: (XXX) XXX-XXXX is most common for US numbers.
If you have a spreadsheet:
Use a formula to standardize. Example in Excel:
=IF(LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""),"(",""),")",""),"+",""))=10,
"("&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""),"(",""),")",""),"+",""),1,3)&") "&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""),"(",""),")",""),"+",""),4,3)&"-"&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""),"(",""),")",""),"+",""),7,4), A2)
This formula takes phone numbers in any format and converts them to (XXX) XXX-XXXX.
If you use data cleaning tools:
Most tools have phone number standardization built in. Let them handle it.
Step 6: Standardize Name Formatting
Names should follow a consistent pattern.
Choose a standard: Title Case (John Smith) is most common.
Current state: You probably have:
-
JOHN SMITH
-
john smith
-
John Smith
-
john.smith
-
Smith, John
Fix with Excel formula:
=PROPER(A2)
This converts any text to Title Case (first letter capitalized, rest lowercase).
Use Find & Replace to clean up:
-
Find: " " (two spaces) → Replace: " " (one space)
-
Find: "^" → Replace: nothing (remove special characters)
-
Find: "Mr. " "Mrs. " "Ms. " at the beginning → Replace: nothing (remove titles)
Step 7: Validate Email Addresses
Email addresses are critical. Validate them.
Check for obviously bad emails:
-
Missing @ symbol
-
Multiple @ symbols
-
Typos in common domains (gmail.com vs. gmai.com)
-
No domain extension (.com, .org, etc.)
Use email validation tools:
-
ZeroBounce
-
NeverBounce
-
BriteVerify
-
EmailChecker
These tools check if an email address actually exists. They typically charge per verification ($0.01-$0.05 per email).
Process:
-
Upload your email list
-
Tool validates each email
-
Get a list of valid, invalid, and questionable emails
-
Remove invalid emails
-
Manually review questionable ones
What to expect:
-
80-95% valid emails
-
5-20% invalid emails (deliverability issues, typos, etc.)
-
It's worth running this because bad emails waste time and money
Step 8: Standardize Custom Field Values
If you have custom fields with dropdown or checkbox values, standardize them.
Example: Maybe some records have "Prospect" and others have "prospect" or "PROSPECT"
Fix by:
-
List all unique values in each custom field
-
Choose the standard values you'll use
-
Use Find & Replace to fix inconsistencies
-
Go through manually for tricky ones
Common custom field standardization:
-
Industry: Make sure all industries use consistent capitalization and naming
-
Status: Prospect, Lead, Customer (choose which and use consistently)
-
Size: Small, Mid-Market, Enterprise (or S, M, L – pick one format)
-
Source: Website, Referral, Trade Show (be consistent)
Step 9: Fill Missing Critical Information
Some fields are too important to leave blank.
Critical fields that should be filled:
-
Email — Every contact should have an email if possible
-
Company name — If it's a B2B database, every contact should have a company
-
Name — If you only have first or last name, try to complete it
How to fill missing information:
-
Reach out to the contact — If email is missing, call and ask
-
Research online — LinkedIn, company website, email finder tools
-
Use historical data — If they're an existing customer, look it up
-
Leave it blank — If you truly don't know, leave it blank rather than guessing
Don't make up data. Better to have incomplete information than false information.
Step 10: Flag Inactive Records
Decide what to do with very old records.
Define "inactive": Hasn't been contacted in 12+ months
Options:
-
Delete them — Simplest
-
Archive them — Keep in a separate "Archive" list or view in HubSpot
-
Keep them but flag them — Use a field to mark them as "inactive" so you don't market to them
We recommend: Archive them. Keep historical data just in case, but don't import into your active database.
Step 11: Verify Your Cleaning with Spot Checks
After cleaning, verify you did it right.
Spot check:
- Open 50 random records
- Check:
- Phone numbers are in consistent format
- Names are properly capitalized
- No obvious duplicates
- Email addresses look valid
- No test/spam records
- Fix any issues you find
Quality check success metric:
-
95%+ of records should be clean after a spot check
-
If it's lower, go back and do more cleanup
Step 12: Create a Cleaned Data Export
Once everything is clean, export your final dataset.
Steps:
-
In your current CRM, select all records you want to migrate
-
Export to CSV
-
Open in Excel or Google Sheets
-
Verify it looks correct
-
Save as a properly formatted CSV file
Make sure:
-
First row is column headers (Email, First Name, Last Name, etc.)
-
No special characters or formatting in the data
-
All fields match the format you've standardized
Tools That Make Data Cleaning Easier
If you have a large database or limited time, use specialized tools:
For duplicates:
-
LinkedIn Cleaner
-
Leadiro
-
RocketReach
For standardization:
-
Trifacta
-
Datorama
-
OpenRefine (free)
For email validation:
- ZeroBounce
- NeverBounce
- Hunter.io
For overall data quality:
- Many CRMs have built-in tools
- Spreadsheet formulas and Find & Replace (if data is in CSV)
Timeline for Data Cleaning
Small database (under 50,000 records):
- 2-4 weeks of dedicated work
- Or 4-8 weeks of part-time work
Medium database (50,000-500,000):
- 4-8 weeks of dedicated work
- Tools can speed this up significantly
Large database (500,000+):
- 8+ weeks even with tools
- Consider hiring a specialist or using a full-service cleaning company
FAQ
You need to get rid of obvious bad data (duplicates, test records, spam). Fine-tuning formatting is nice but less critical. Focus on: 1) No duplicates 2) No test/spam records 3) Email validation.
Technically yes, but it's much harder. You'll be doing this in HubSpot where tools are less powerful, and your team won't have a clean database to work with. Much better to clean first.
Yes. Even spending 2-3 weeks cleaning now saves you months of pain later. Focus on the highest-impact cleanups: duplicates, missing emails, and obvious bad data.
Freelancers: $1,000-$10,000 depending on size and complexity
Agencies: $5,000-$50,000
In-house: 1-2 weeks of someone's time
Yes, make a copy of your data and clean the copy. Don't clean while people are still using the system—they'll add data faster than you can clean.
Yes. Ask people to minimize new data entry during the cleaning period. Any data added during cleaning might get missed.
Trying to be too perfect. Don't spend 3 weeks getting phone numbers to format. Get the 80/20 point: obvious bad data out, duplicates merged, emails validated. Done.
Ready to Migrate with Clean Data?
Data cleaning is the unsexy part of CRM implementation, but it's maybe the most important. A few weeks of cleanup work now prevents months of problems later.
If you're about to migrate and wondering if your data is clean enough, let's talk. We can assess your data quality and tell you whether you need more cleanup or if you're ready to migrate.
Let's get your data ready. Schedule a free 20-minute consultation where we assess your situation and help you decide on the right approach. We'll give you a realistic estimate of time, cost, and risk for both options so you can make an informed decision.