Data Preparation
BPP connects to your Google BigQuery dataset and reads your first-party data in place — it never duplicates it. The quality of everything downstream (identity resolution, AI model accuracy, audience precision, ad-platform match rates) depends on how well that dataset is structured.
This section is the reference for how to structure the tables BPP reads: what a User table and an Event table must contain, how to map identifiers, the mistakes to avoid, and concrete blueprints for common business types.
:::tip Who this is for
You only need this section if you are preparing or reviewing your BigQuery dataset before (or during) onboarding. If your data is already structured and connected, head to the Data Source Manager to map it into the platform.
:::
The two table types
BPP organises your data into two logical kinds of table:
| Table type | Grain | Purpose |
|---|---|---|
| User table | One row per person | Persistent profiles: identifiers and stable attributes (demographics, firmographics, pre-aggregated KPIs). |
| Event table | One row per event | Historical actions or transactions over time, each with a timestamp and at least one user identifier. |
Every table — user or event — must carry at least one user identifier so BPP can resolve it to a unified profile. See Identifiers & Hashing for the full reference.
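As a rough illustration of the two grains and the identifier rule, the sketch below checks small in-memory samples of a User table and an Event table. It is not part of BPP: the column names (`customer_id`, `email_sha256`, `event_timestamp`) and the helper functions are hypothetical, chosen only for this example.

```python
from collections import Counter

# Hypothetical identifier columns, for illustration only.
# The identifiers BPP actually accepts are listed in Identifiers & Hashing.
IDENTIFIER_COLUMNS = ("customer_id", "email_sha256")


def missing_identifier(rows):
    """Flag rows where every identifier column is empty or absent."""
    return [
        f"row {i}: no user identifier"
        for i, row in enumerate(rows)
        if not any(row.get(col) for col in IDENTIFIER_COLUMNS)
    ]


def check_user_table(rows, key="customer_id"):
    """User table: one row per person, at least one identifier per row."""
    counts = Counter(row.get(key) for row in rows if row.get(key))
    problems = [
        f"{key}={value} appears {n} times (expected one row per person)"
        for value, n in counts.items()
        if n > 1
    ]
    return problems + missing_identifier(rows)


def check_event_table(rows, timestamp="event_timestamp"):
    """Event table: every row needs a timestamp and at least one identifier."""
    problems = [
        f"row {i}: missing {timestamp}"
        for i, row in enumerate(rows)
        if not row.get(timestamp)
    ]
    return problems + missing_identifier(rows)


# Made-up sample rows with deliberate mistakes.
users = [
    {"customer_id": "c1", "email_sha256": "ab12", "country": "IT"},
    {"customer_id": "c1", "email_sha256": None, "country": "IT"},   # duplicate person
    {"customer_id": None, "email_sha256": None, "country": "FR"},   # no identifier
]
events = [
    {"customer_id": "c1", "event_timestamp": "2024-05-01T10:00:00Z", "event_name": "purchase"},
    {"customer_id": None, "event_timestamp": None, "event_name": "page_view"},
]

user_problems = check_user_table(users)
event_problems = check_event_table(events)
```

Running the same kind of check as a query against the real BigQuery tables before onboarding should surface the same classes of problem: duplicated profiles, events without a timestamp, and rows that cannot be tied to any person.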
How BPP uses your data
1. Reads your User and Event tables directly from BigQuery.
2. Resolves identities across all sources, unifying each person under a single Bytek ID (written back as a `bpp_user_id` column). See User Reconciliation and Event Reconciliation.
3. Enriches the data with AI models (propensity, pcLTV, RFM, interests).
4. Activates insights by syncing audiences and signals to your ad platforms.
A well-structured dataset is what makes steps 2–4 accurate.
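To see why shared identifiers matter for the identity-resolution step, here is a toy sketch that groups rows whenever they share any identifier value. It uses a generic union-find approach and is an assumption made for illustration, not BPP's actual reconciliation logic. The column names and the `profile-N` labels (standing in for a `bpp_user_id`) are hypothetical.

```python
def resolve_identities(rows, identifier_columns):
    """Group rows that share any identifier value; return one label per row."""
    parent = list(range(len(rows)))

    def find(i):
        # Follow parent links to the group root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller row index as root so labels are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # (column, value) -> index of the first row carrying it
    for i, row in enumerate(rows):
        for col in identifier_columns:
            value = row.get(col)
            if not value:
                continue  # empty identifiers cannot link rows
            key = (col, value)
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    return [f"profile-{find(i)}" for i in range(len(rows))]


# Made-up rows drawn from different tables.
rows = [
    {"customer_id": "c1", "email_sha256": "ab12"},  # user table row
    {"email_sha256": "ab12", "device_id": "d9"},    # event row
    {"device_id": "d9"},                            # event row
    {"customer_id": "c2"},                          # unrelated user
]
labels = resolve_identities(rows, ("customer_id", "email_sha256", "device_id"))
```

In this sample the first three rows end up under one profile because they are chained together by a shared email hash and a shared device ID, while the fourth row stays separate. A row with no identifier at all could never be linked, which is why every table must carry at least one.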
In this section
- User Tables — structure, mandatory requirements, and a worked example.
- Event Tables — structure, mandatory requirements, best practices, and a worked example.
- Identifiers & Hashing — identifier types, PII normalization and hashing rules, and how mapping works.
- Common Mistakes — the most frequent data-structuring pitfalls and how to avoid them.
- Industry Examples — end-to-end table blueprints for D2C e-commerce, omnichannel retail, B2B SaaS, B2B lead-gen, automotive, and publishers.
- AI Model Data Requirements — the input tables each AI model needs to train.
How mapping happens
You don't hand-write any schema files. Once your dataset is structured, you classify tables, label fields, and declare identifiers directly in the UI through the Data Source Manager onboarding flow. This section tells you what the data underneath should look like so that mapping step is straightforward.