Identifiers & Hashing
Identifiers are the foundation of BPP. They let the platform resolve user identities across multiple data sources and unify events and attributes into a single customer profile — the Bytek ID.
This page is the practical reference for which identifiers to provide, how BPP normalizes and hashes PII, and how identifiers are mapped. For the conceptual walkthrough of how matching and stitching work, see User Reconciliation.
How identifiers map to the Bytek ID
When you map your dataset in the Data Source Manager, each column that represents an identifier is declared with:
- Identifier type: a canonical name (e.g. `email`, `cookie_id`, `crm_contact_id`).
- PII flag: whether the column holds raw personal data.
- Requires hashing: whether BPP should hash it on ingestion.
BPP matches identifiers by their type and value, not by column name. If the same identifier type appears in multiple tables, all of them are linked to the same Bytek ID — even when the underlying columns are named differently.
Example: a column named `hashed_email` in `users_main` and a column named `hem` in `events_web` can both be declared as identifier type `email`:
| table_name | field_name | identifier_type | is_pii |
|---|---|---|---|
| users_main | hashed_email | email | no |
| events_web | hem | email | no |
| events_web | fp_cookie_id | cookie_id | no |
If a single event row contains both hem and fp_cookie_id, both are linked to the same Bytek ID.
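The linking behaviour described above can be pictured as a union-find over (identifier type, value) pairs. The following is a minimal sketch under that assumption, not BPP's actual resolution logic; `IdentityGraph`, `MAPPING`, `typed`, and the sample values are hypothetical names introduced for illustration.

```python
class IdentityGraph:
    """Toy union-find keyed by (identifier_type, value), never by column name."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        self.parent.setdefault(node, node)
        # Path halving: point each visited node at its grandparent.
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def link_row(self, identifiers):
        """Link every non-empty identifier found on one row to one profile."""
        nodes = [(id_type, value) for id_type, value in identifiers if value]
        for node in nodes:
            self.find(node)  # register rows that carry a single identifier
        for other in nodes[1:]:
            self.parent[self.find(other)] = self.find(nodes[0])


# Column-to-type declarations, mirroring the mapping table above.
MAPPING = {
    ("users_main", "hashed_email"): "email",
    ("events_web", "hem"): "email",
    ("events_web", "fp_cookie_id"): "cookie_id",
}


def typed(table, row):
    """Translate a row's columns into (identifier_type, value) pairs."""
    return [
        (MAPPING[(table, column)], value)
        for column, value in row.items()
        if (table, column) in MAPPING
    ]


graph = IdentityGraph()
graph.link_row(typed("users_main", {"hashed_email": "ab12"}))
graph.link_row(typed("events_web", {"hem": "ab12", "fp_cookie_id": "ck-1"}))

# Differently named columns with the same type and value share one profile.
same_profile = graph.find(("email", "ab12")) == graph.find(("cookie_id", "ck-1"))
```

Because the graph is keyed on type and value, renaming a source column changes nothing as long as its declared identifier type stays the same.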
:::warning User-level identifiers only
Map only user-level identifiers (email, phone, cookie, CRM contact ID). Entity-level IDs — subscription_id, account_id, order_id, crm_account_id — must not be declared as user identifiers.
:::
Automatic PII normalization and hashing
For identifiers flagged as PII that requires hashing, BPP normalizes and hashes the value during ingestion, so you can provide raw PII safely. BPP never stores plain-text PII.
If a value arrives already hashed at source, BPP detects this and skips re-hashing.
General process
- Trim leading and trailing whitespace.
- Lowercase all text.
- Normalize provider-specific quirks (e.g. Gmail rules).
- Hash with SHA-256 (hex, lowercase).
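As a rough illustration of the general process, the sketch below trims, lowercases, and hashes a value with SHA-256. The pre-hashed check (64 hexadecimal characters) is an assumed heuristic; this page does not describe how BPP actually detects hashed input. Provider-specific normalization is left out here because it depends on the identifier type. The function names are hypothetical.

```python
import hashlib
import re

# Assumption: a value that is exactly 64 hex characters is treated as SHA-256.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def looks_prehashed(value: str) -> bool:
    """Heuristic check for a value that is already a SHA-256 hex digest."""
    return SHA256_HEX.fullmatch(value.strip().lower()) is not None


def normalize_and_hash(value: str) -> str:
    """Trim, lowercase, then hash with SHA-256 (lowercase hex)."""
    cleaned = value.strip().lower()
    if looks_prehashed(cleaned):
        return cleaned  # skip re-hashing values hashed at source
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
```

With this shape, hashing the output a second time returns it unchanged, which is the property that lets raw and pre-hashed sources converge on the same value.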
Per-type rules
Email
- Trim, lowercase.
- For `@gmail.com` / `@googlemail.com`: remove dots (`.`) from the local part and drop `+tag` suffixes (e.g. `John.Doe+promo@gmail.com` → `johndoe@gmail.com`).
- SHA-256 after normalization.
- You may provide raw emails or pre-hashed (SHA-256) emails; BPP normalizes and hashes raw values automatically.
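The email rules above could be applied along these lines. This is a sketch, not BPP's code: `normalize_email` and `hash_email` are hypothetical names, and the domain is left as provided because this page does not say whether `googlemail.com` is rewritten to `gmail.com`.

```python
import hashlib

GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def normalize_email(raw: str) -> str:
    """Trim and lowercase; apply the Gmail dot and +tag rules where relevant."""
    email = raw.strip().lower()
    local, sep, domain = email.rpartition("@")
    if not sep:
        return email  # no "@" found; value is left as-is
    if domain in GMAIL_DOMAINS:
        # Drop everything from the first "+" onward, then remove dots.
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def hash_email(raw: str) -> str:
    """SHA-256 (lowercase hex) of the normalized email."""
    return hashlib.sha256(normalize_email(raw).encode("utf-8")).hexdigest()
```

Applying the same function on your side before hashing means `John.Doe+promo@gmail.com` and `johndoe@gmail.com` produce the same digest.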
Phone number
- Remove spaces, parentheses, and dashes.
- Convert to E.164 format (e.g. `+14155552671`).
- SHA-256 after normalization.
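A simplified sketch of the phone rules follows. It assumes that numbers without a leading `+` belong to a default country, passed in as `default_country_code` (a hypothetical parameter); real E.164 conversion needs country-aware parsing, so treat this as an illustration only.

```python
import hashlib
import re


def to_e164(raw: str, default_country_code: str = "1") -> str:
    """Strip formatting characters and prefix a country code if none is given."""
    text = raw.strip()
    has_plus = text.startswith("+")
    digits = re.sub(r"\D", "", text)  # removes spaces, parentheses, dashes
    if has_plus:
        return "+" + digits
    # Assumption: a number without "+" is a national number of the default country.
    return "+" + default_country_code + digits


def hash_phone(raw: str, default_country_code: str = "1") -> str:
    """SHA-256 (lowercase hex) of the E.164-formatted number."""
    e164 = to_e164(raw, default_country_code)
    return hashlib.sha256(e164.encode("utf-8")).hexdigest()
```

This version does not handle national trunk prefixes (such as a leading `0`) or international dialling prefixes, which a production pipeline would have to resolve per country.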
Name + surname
- Lowercase, trim, remove diacritics.
- Concatenate fields (e.g. `john+doe`) and apply SHA-256.
- Typically used only for offline match or identity enrichment.
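The name rules could look like the sketch below. It assumes the `+` in the `john+doe` example is the literal separator between fields, which this page does not state explicitly; `strip_diacritics`, `name_key`, and `hash_name` are hypothetical names.

```python
import hashlib
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters and drop combining marks (e.g. é becomes e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_key(first: str, last: str, sep: str = "+") -> str:
    """Trim, lowercase, remove diacritics, then join the two fields."""
    parts = [strip_diacritics(part.strip().lower()) for part in (first, last)]
    return sep.join(parts)  # separator is an assumption taken from the example


def hash_name(first: str, last: str) -> str:
    """SHA-256 (lowercase hex) of the concatenated name key."""
    return hashlib.sha256(name_key(first, last).encode("utf-8")).hexdigest()
```

Letters that have no decomposed form (for example `ø` or `ł`) pass through unchanged with this approach, so a fuller implementation would need an explicit transliteration table.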
Postal address
- Lowercase, remove punctuation, standardize abbreviations (`St` → `Street`).
- Concatenate into a single string and apply SHA-256.
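For postal addresses, a sketch under stated assumptions: the abbreviation table below is a hypothetical three-entry sample (only `St` → `Street` comes from this page), and fields are joined with a single space because the separator is not specified here.

```python
import hashlib
import re

# Hypothetical sample; only "st" -> "street" is taken from the rules above.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize_address(*fields: str) -> str:
    """Lowercase, strip punctuation, expand abbreviations, join into one string."""
    tokens = []
    for field in fields:
        cleaned = re.sub(r"[^\w\s]", "", field.lower())
        tokens.extend(ABBREVIATIONS.get(token, token) for token in cleaned.split())
    return " ".join(tokens)  # assumption: single-space separator


def hash_address(*fields: str) -> str:
    """SHA-256 (lowercase hex) of the normalized address string."""
    return hashlib.sha256(normalize_address(*fields).encode("utf-8")).hexdigest()
```

Token-level expansion is naive: it would also turn the "St" in "St Louis" into "street", so a real abbreviation step needs more context than this.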
Device identifiers (non-PII) — GA Client ID, GA4 user pseudo-ID, first-party cookie, mobile advertising ID (IDFA, GAID)
- Already anonymized; no hashing required. Stored as-is for behavioural analysis and cross-device linking.
Common identifier types
| Identifier type | Example field names | Description | PII | Hashed |
|---|---|---|---|---|
| hashed_email | hem, hashed_email, email_hash | SHA-256 hashed user email | No | Yes |
| email | email, user_email | Raw user email (BPP will hash) | Yes | No |
| hashed_phone | hphone, phone_hash | SHA-256 hashed phone number | No | Yes |
| phone | phone, mobile_number | Raw phone number (BPP will hash) | Yes | No |
| cookie_id | fp_cookie_id, ga_client_id | First-party cookie / GA identifier | No | No |
| device_id | idfa, gaid, device_id | Mobile device / app identifier | No | No |
| crm_contact_id | crm_contact_id, hubspot_vid | CRM contact-level identifier | No | No |
| domain_id | domain_id | Domain-level ID for web identity | No | No |
The Bytek ID
The Bytek ID is the system-generated, anonymized user key created during identity resolution.
- Each unique person receives one Bytek ID.
- All their sub-identifiers (email, phone, cookie, CRM contact ID, …) link to it.
- When new identifiers appear, the identity graph is updated and merged on the next daily run.
- BPP writes the resolved key back to your warehouse as a `bpp_user_id` column on the enriched copy of each table.
This unified key enables consistent joins, aggregations, and model training across time, channels, and systems. See User Reconciliation for merge rules and coverage metrics.
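To illustrate what a shared key makes possible, the sketch below joins two small in-memory tables on `bpp_user_id`. The rows and the `plan`, `event`, and `event_count` fields are invented for the example; only the `bpp_user_id` column name comes from this page.

```python
from collections import Counter

# Hypothetical enriched tables, each carrying the resolved bpp_user_id.
users = [
    {"bpp_user_id": "u1", "plan": "pro"},
    {"bpp_user_id": "u2", "plan": "free"},
]
events = [
    {"bpp_user_id": "u1", "event": "page_view"},
    {"bpp_user_id": "u1", "event": "purchase"},
    {"bpp_user_id": "u2", "event": "page_view"},
]

# Aggregate events per resolved user, then attach the count to each user row.
events_per_user = Counter(event["bpp_user_id"] for event in events)
report = [
    {**user, "event_count": events_per_user[user["bpp_user_id"]]}
    for user in users
]
```

The same pattern applies in the warehouse itself: any two enriched tables can be joined on `bpp_user_id` regardless of which source identifiers they originally carried.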
Best practices
- Include at least one stable user identifier in every user and event table.
- Use consistent identifier types across datasets — the same concept must share the same identifier type everywhere.
- Ensure user identifiers form the primary key of the user table: at least one always populated, no duplicates.
- Provide multiple identifiers where possible to maximize match rates.
- Never tag entity-level IDs (`subscription_id`, `order_id`, `account_id`) as user identifiers.
- Hash PII consistently (correct casing, whitespace, Gmail normalization, E.164 phones) so hashes match Google and Meta, or let BPP hash it by flagging the column as PII.
Summary
- BPP performs identity resolution to unify users across sources.
- Each user receives a unique Bytek ID, surfaced as a `bpp_user_id` column.
- Normalization and hashing keep matching privacy-safe and deterministic.
- Identifiers are mapped in the Data Source Manager UI — by type, PII flag, and hashing — not by column name.