Email Normalizer Guide: Deduplication, Gmail Dots, and Plus Aliases

7 min read

Your sign-up form rejects duplicates, but the same user registers three times: john@gmail.com, j.o.h.n@gmail.com, and john+promo@gmail.com. All three land in the same inbox. Email normalization is the process of reducing an address to a canonical form so that these variants deduplicate correctly. The Email Normalizer tool does this in the browser — this guide explains what is happening under the hood and how to build it yourself.

Gmail Dot-Insensitivity

Gmail ignores dots in the local part (everything before the @). According to Google's own documentation, johnsmith@gmail.com, john.smith@gmail.com, and j.o.h.n.s.m.i.t.h@gmail.com all deliver to exactly the same mailbox. There is no limit on how many dots you can insert, or where you place them — the mail server strips them all before routing.

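Among major providers, this behavior is specific to consumer Gmail addresses (gmail.com and its legacy alias, googlemail.com). Google's documentation notes that for accounts on a work or school domain, dots do change the address. Other providers treat dots as significant characters in the local part: john.smith@outlook.com and johnsmith@outlook.com are two completely different accounts at Microsoft.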
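
A minimal sketch of the dot rule, assuming the two consumer Gmail domains are the only ones where dots are ignored. The names `DOT_INSENSITIVE_DOMAINS` and `stripGmailDots` are illustrative, not part of any library:

```javascript
// Domains where dots in the local part are ignored.
// Assumption: consumer Gmail only; extend with care.
const DOT_INSENSITIVE_DOMAINS = new Set(['gmail.com', 'googlemail.com']);

// Remove dots from the local part, but only for dot-insensitive domains.
function stripGmailDots(local, domain) {
  return DOT_INSENSITIVE_DOMAINS.has(domain.toLowerCase())
    ? local.replace(/\./g, '')
    : local;
}

console.log(stripGmailDots('j.o.h.n.s.m.i.t.h', 'gmail.com')); // johnsmith
console.log(stripGmailDots('john.smith', 'outlook.com'));      // john.smith
```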

Plus-Sign Subaddressing

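Subaddressing is a widely adopted provider convention rather than part of core SMTP: RFC 5321 leaves the interpretation of the local part to the receiving host, and RFC 5233 describes subaddressing for Sieve filtering. A + character in the local part is followed by a tag that the receiving server can use for filtering. The tag does not change which mailbox receives the message: john+newsletter@gmail.com and john+spam@gmail.com both reach john@gmail.com.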

Support for plus aliases varies by provider. Gmail, Outlook, Yahoo, and Fastmail all support it. Some smaller providers do not, and a few older systems incorrectly reject the + as an invalid character. When normalizing for deduplication, stripping everything from + onward in the local part is correct for most major providers.
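
As a rough illustration, plus-tag handling comes down to a single split on the first + in the local part. The helper names below are hypothetical, and the sketch assumes the caller has already separated the local part from the domain:

```javascript
// Drop the subaddress tag: everything from the first '+' onward.
function stripPlusTag(local) {
  const plusIndex = local.indexOf('+');
  return plusIndex === -1 ? local : local.slice(0, plusIndex);
}

// Return the tag itself (without the '+'), or null if there is none.
// Handy if you want to keep the tag for analytics before discarding it.
function extractPlusTag(local) {
  const plusIndex = local.indexOf('+');
  return plusIndex === -1 ? null : local.slice(plusIndex + 1);
}

console.log(stripPlusTag('john+newsletter'));   // john
console.log(extractPlusTag('john+newsletter')); // newsletter
```

Note that a local part beginning with + strips down to an empty string, so the caller still needs to reject empty results.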

Case Sensitivity: What the RFC Says vs. Reality

RFC 5321, the standard that governs SMTP, states that the local part of an email address is case-sensitive. Technically, John@example.com and john@example.com could be different mailboxes. The domain part, however, is explicitly case-insensitive per DNS rules.

In practice, virtually every major mail provider treats the local part as case-insensitive. Gmail, Outlook, Yahoo, and Fastmail all normalize to lowercase internally. Because of this near-universal behavior, lowercasing the entire address before normalization is a safe and widely accepted practice for deduplication purposes — even though it is technically not mandated by the RFC.
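
One way to keep the RFC distinction visible in code is to always lowercase the domain and make local-part folding an explicit option. This is only a sketch; `lowercaseAddress` and its `foldLocalPart` option are made-up names:

```javascript
// Lowercase the domain unconditionally (domains are case-insensitive).
// Lowercase the local part only when foldLocalPart is true, which is the
// pragmatic default for deduplication.
function lowercaseAddress(address, { foldLocalPart = true } = {}) {
  const at = address.lastIndexOf('@');
  if (at === -1) return address; // not an address; leave untouched
  const local = address.slice(0, at);
  const domain = address.slice(at + 1).toLowerCase();
  return (foldLocalPart ? local.toLowerCase() : local) + '@' + domain;
}

console.log(lowercaseAddress('John@Example.COM'));                           // john@example.com
console.log(lowercaseAddress('John@Example.COM', { foldLocalPart: false })); // John@example.com
```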

Provider-Specific Normalization Rules

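| Provider | Strip dots | Strip plus tag | Domain alias |
|---|---|---|---|
| Gmail | Yes (local part only) | Yes | googlemail.com → gmail.com |
| Google Workspace | No (dots are significant on custom domains) | Yes | No alias mapping |
| Outlook / Hotmail / Live | No | Yes | No alias mapping |
| Yahoo Mail | No | Yes (uses - as separator too) | No alias mapping |
| Fastmail | No | Yes | No alias mapping |
| iCloud | No | No | me.com, mac.com → icloud.com |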

Yahoo Mail additionally supports a hyphen-based subaddress separator: john-newsletters@yahoo.com routes to john@yahoo.com. This is less commonly normalized because the hyphen is also a valid character inside local parts at other providers.
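
Rules like these are easier to audit when they live in data rather than in branching code. The sketch below is one possible encoding: the specific domain lists are assumptions chosen to match the table rows, and the fallback for unknown domains (keep dots, strip plus tags) is a design choice, not documented provider behavior:

```javascript
// Alias domain -> canonical domain.
const DOMAIN_ALIASES = new Map([
  ['googlemail.com', 'gmail.com'],
  ['me.com', 'icloud.com'],
  ['mac.com', 'icloud.com'],
]);

// Canonical domain -> normalization rules (illustrative, not exhaustive).
const PROVIDER_RULES = new Map([
  ['gmail.com',    { stripDots: true,  stripPlus: true }],
  ['outlook.com',  { stripDots: false, stripPlus: true }],
  ['hotmail.com',  { stripDots: false, stripPlus: true }],
  ['live.com',     { stripDots: false, stripPlus: true }],
  ['yahoo.com',    { stripDots: false, stripPlus: true }],
  ['fastmail.com', { stripDots: false, stripPlus: true }],
  ['icloud.com',   { stripDots: false, stripPlus: false }],
]);

// Assumed fallback for domains not listed above.
const DEFAULT_RULES = { stripDots: false, stripPlus: true };

// Resolve aliases, then look up the rules for the canonical domain.
function rulesFor(domain) {
  const lower = domain.toLowerCase();
  const canonical = DOMAIN_ALIASES.get(lower) ?? lower;
  const rules = PROVIDER_RULES.get(canonical) ?? DEFAULT_RULES;
  return { domain: canonical, ...rules };
}

console.log(rulesFor('googlemail.com'));
// { domain: 'gmail.com', stripDots: true, stripPlus: true }
```

Using a Map instead of a plain object avoids accidental matches on inherited property names such as constructor.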

The Canonicalization Algorithm

A practical normalization function for deduplication applies these steps in order:

  1. Lowercase the entire address.
  2. Split on @ to separate local part and domain.
  3. Map known domain aliases (e.g., googlemail.com to gmail.com).
  4. Strip the plus tag: remove everything from the first + in the local part onward.
  5. For gmail.com only: remove all dots from the local part. Custom Google Workspace domains treat dots as significant, so leave them untouched.
  6. Reassemble as local@domain.

const GMAIL_DOMAINS = new Set(['gmail.com', 'googlemail.com']);

function normalizeEmail(raw) {
  if (!raw || typeof raw !== 'string') return null;

  const trimmed = raw.trim().toLowerCase();
  const atIndex = trimmed.lastIndexOf('@');
  if (atIndex === -1) return null;

  let local = trimmed.slice(0, atIndex);
  let domain = trimmed.slice(atIndex + 1);

  // Normalize domain aliases
  if (domain === 'googlemail.com') domain = 'gmail.com';

  // Strip plus-tag subaddress
  const plusIndex = local.indexOf('+');
  if (plusIndex !== -1) local = local.slice(0, plusIndex);

  // Strip dots for Gmail only
  if (GMAIL_DOMAINS.has(domain)) {
    local = local.replace(/\./g, '');
  }

  // Basic sanity check
  if (!local || !domain.includes('.')) return null;

  return local + '@' + domain;
}

Using lastIndexOf for the @ split handles the rare (but valid) case of a quoted local part containing an @ sign. Stripping dots or plus tags inside a quoted string would change its meaning, so such addresses should not be normalized at all: treat them as edge cases and log them separately.
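
A simple way to set those edge cases aside is to detect quoted local parts before normalizing. The check below is a heuristic sketch (it looks only for surrounding double quotes and does not validate the quoted content), and both function names are hypothetical:

```javascript
// Heuristic: a quoted local part starts and ends with a double quote.
function hasQuotedLocalPart(address) {
  const at = address.lastIndexOf('@');
  if (at <= 0) return false; // no '@', or nothing before it
  const local = address.slice(0, at);
  return local.length >= 2 && local.startsWith('"') && local.endsWith('"');
}

// Split a list into addresses that are safe to normalize and ones to log.
function partitionQuoted(addresses) {
  const plain = [];
  const quoted = [];
  for (const address of addresses) {
    (hasQuotedLocalPart(address) ? quoted : plain).push(address);
  }
  return { plain, quoted };
}

console.log(partitionQuoted(['john@example.com', '"john@home"@example.com']));
// { plain: [ 'john@example.com' ], quoted: [ '"john@home"@example.com' ] }
```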

Why You Should Store Both the Raw and Normalized Address

A common mistake is to store only the normalized form. There are two good reasons not to do this:

  • You send email to the address the user typed. Stripping dots or plus tags before storing means you are sending to a transformed address the user never explicitly gave you. Most providers will still deliver it, but it looks unprofessional and can break allowlisting rules on the recipient's end.
  • Normalization rules can change. Provider behavior is not documented in a stable RFC — it is inferred from observed behavior. If you have stored only the normalized form, you cannot re-derive the original address if the rules change or if you made an error.

The recommended pattern is to store the raw address in an email column and the normalized form in a separate email_norm column. Apply uniqueness constraints and deduplication checks only against email_norm, and use email for all outbound communication.

// Database schema pattern (SQL)
// CREATE TABLE users (
//   id         BIGSERIAL PRIMARY KEY,
//   email      TEXT NOT NULL,                -- raw, as the user typed
//   email_norm TEXT NOT NULL UNIQUE,         -- normalized, for dedup
//   created_at TIMESTAMPTZ DEFAULT NOW()
// );

async function createUser(db, rawEmail) {
  const normalized = normalizeEmail(rawEmail);
  if (!normalized) throw new Error('Invalid email address');

  // Check for duplicates using normalized form
  const existing = await db.query(
    'SELECT id FROM users WHERE email_norm = $1',
    [normalized]
  );
  if (existing.rows.length > 0) {
    throw new Error('Account already exists');
  }

  return db.query(
    'INSERT INTO users (email, email_norm) VALUES ($1, $2) RETURNING id',
    [rawEmail.trim(), normalized]
  );
}
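
A check followed by an insert can race when two sign-ups with equivalent addresses arrive at the same moment; the UNIQUE constraint on email_norm is what actually prevents the second row. The variant below is a sketch that relies on the constraint and translates the resulting error. It assumes a node-postgres-style client whose errors carry a `code` property, uses PostgreSQL's SQLSTATE 23505 (unique_violation), and takes the normalizer as a parameter so the snippet stands alone; `createUserAtomic` is a hypothetical name:

```javascript
const UNIQUE_VIOLATION = '23505'; // PostgreSQL SQLSTATE for unique_violation

async function createUserAtomic(db, rawEmail, normalize) {
  const normalized = normalize(rawEmail);
  if (!normalized) throw new Error('Invalid email address');

  try {
    // Let the UNIQUE constraint on email_norm reject duplicates atomically.
    const result = await db.query(
      'INSERT INTO users (email, email_norm) VALUES ($1, $2) RETURNING id',
      [rawEmail.trim(), normalized]
    );
    return result.rows[0].id;
  } catch (err) {
    // Assumed error shape: { code: '23505' } on a duplicate key.
    if (err && err.code === UNIQUE_VIOLATION) {
      throw new Error('Account already exists');
    }
    throw err; // anything else is a genuine failure
  }
}
```

The up-front SELECT can still be kept for a friendlier error path; the point is that it should not be the only guard.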

Abuse Prevention vs. Privacy Concerns

The primary use case for email normalization is abuse prevention: blocking a single person from creating multiple accounts by cycling through dot variants or plus tags. This is legitimate and common in sign-up flows, trial systems, and coupon redemption.

However, normalization carries a real privacy risk: it can wrongly link two distinct people. Consider a family sharing one Gmail inbox through plus tags, or a provider where j.smith and jsmith are separate mailboxes belonging to two different students: if dot-stripping is applied there by mistake, both collapse into one identity. Storing a normalized email as a stable user identity amplifies this risk.

Best practices to mitigate the privacy impact:

  • Use normalization only at the point of entry (sign-up, password reset) — not as a permanent cross-system identifier.
  • When a duplicate is detected, ask the user to sign in to the existing account rather than silently merging or blocking without explanation.
  • Do not share or expose the normalized form outside your system — it is an internal dedup key, not a canonical identity.
  • Be conservative with non-Gmail providers. Only apply dot-stripping for domains where it is documented behavior.

Batch Normalization for Existing Lists

If you are normalizing an existing list of addresses — say, a CRM export or a mailing list — the process is straightforward: run each address through the normalization function and group by the output. For large files, the Text Extractor tool can help you pull raw email addresses out of mixed-content exports before you run them through normalization. See also the Text Extractor guide for patterns on extracting emails from log files and CSVs.

// Batch normalize and deduplicate a list of email addresses
function deduplicateEmails(rawList) {
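  const seen = new Map(); // normalized -> first raw address seen
  let duplicateCount = 0;
  let invalidCount = 0;

  for (const raw of rawList) {
    const norm = normalizeEmail(raw);
    if (!norm) {
      invalidCount++;             // skip invalid addresses, but count them
      continue;
    }
    if (seen.has(norm)) {
      duplicateCount++;           // same mailbox as an earlier entry
    } else {
      seen.set(norm, raw.trim()); // keep the first occurrence
    }
  }

  return {
    unique: Array.from(seen.values()), // raw addresses, deduplicated
    duplicateCount,                    // valid addresses dropped as duplicates
    invalidCount,                      // addresses that failed normalization
  };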
}

// Example
const list = [
  'John@Gmail.com',
  'j.o.h.n@gmail.com',
  'john+promo@gmail.com',
  'john@googlemail.com',
  'alice@example.com',
];
const result = deduplicateEmails(list);
console.log(result.unique);        // ['John@Gmail.com', 'alice@example.com']
console.log(result.duplicateCount); // 3
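
When cleaning a CRM export it is often more useful to see which raw addresses collapsed together than to keep only the first one. The following sketch groups by the normalized key; it takes the normalizer as a parameter, and the lowercase-only normalizer in the usage lines is just a stand-in for a full implementation:

```javascript
// Group raw addresses by their normalized form.
function groupByNormalized(rawList, normalize) {
  const groups = new Map(); // normalized -> array of raw addresses
  for (const raw of rawList) {
    const key = normalize(raw);
    if (!key) continue; // skip addresses the normalizer rejects
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(raw);
  }
  return groups;
}

// Keep only the groups that contain more than one raw address.
function duplicateClusters(groups) {
  return [...groups].filter(([, raws]) => raws.length > 1);
}

// Usage with a stand-in normalizer (lowercase and trim only).
const standIn = (s) => s.trim().toLowerCase();
const clusters = duplicateClusters(
  groupByNormalized(['A@x.com', 'a@x.com', 'b@x.com'], standIn)
);
console.log(clusters); // [ [ 'a@x.com', [ 'A@x.com', 'a@x.com' ] ] ]
```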

Validation Is Not the Same as Normalization

Normalization and validation are separate concerns. Normalization transforms a valid address into a canonical form. Validation checks whether an address is syntactically well-formed and plausibly deliverable, for example by verifying that the domain has MX records. Validate first, then normalize: running normalization on a malformed address can produce silently incorrect output.

For syntax validation, a pragmatic JavaScript regex covers the vast majority of real-world addresses. For deliverability validation (checking MX records), you need a DNS lookup, which is a server-side operation. Never rely on a client-side check alone for abuse prevention: the client controls the code running in their browser and can skip or tamper with any check performed there.
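
Both checks might look roughly like the sketch below. The regex is deliberately loose (it rejects quoted local parts that contain an @), the function names are hypothetical, and the MX check assumes a Node.js server using the built-in `node:dns/promises` module. Treating a domain without MX records as undeliverable is a simplification, since RFC 5321 allows delivery to fall back to the domain's address records:

```javascript
// Pragmatic syntax check: exactly one '@', no whitespace, a dot in the domain.
const SIMPLE_EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function isPlausibleEmail(address) {
  return typeof address === 'string' && SIMPLE_EMAIL_RE.test(address.trim());
}

// Server-side only. resolveMx is injectable so the function can be tested
// without network access; by default it uses Node's DNS resolver.
async function domainAcceptsMail(domain, resolveMx) {
  if (!resolveMx) {
    ({ resolveMx } = await import('node:dns/promises'));
  }
  try {
    const records = await resolveMx(domain);
    return records.length > 0;
  } catch {
    // Unknown domain, no MX data, or a lookup failure: all reported as false.
    return false;
  }
}

console.log(isPlausibleEmail('john@example.com'));  // true
console.log(isPlausibleEmail('john@@example.com')); // false
```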


Email normalization is a small function with outsized impact on data quality and abuse resistance. The Email Normalizer tool applies all of the rules described here — provider-specific dot stripping, plus-tag removal, and domain alias mapping — directly in the browser, with no data leaving your machine.