Data Preparation

BPP connects to your Google BigQuery dataset and reads your first-party data in place — it never duplicates it. The quality of everything downstream (identity resolution, AI model accuracy, audience precision, ad-platform match rates) depends on how well that dataset is structured.

This section is the reference for how to structure the tables BPP reads: what a User table and an Event table must contain, how to map identifiers, the mistakes to avoid, and concrete blueprints for common business types.

:::tip Who this is for
You only need this section if you are preparing or reviewing your BigQuery dataset before (or during) onboarding. If your data is already structured and connected, head to the Data Source Manager to map it into the platform.
:::

The two table types

BPP organises your data into two logical kinds of table:

Table type	Grain	Purpose
User table	One row per person	Persistent profiles: identifiers and stable attributes (demographics, firmographics, pre-aggregated KPIs).
Event table	One row per event	Historical actions or transactions over time, each with a timestamp and at least one user identifier.

Every table — user or event — must carry at least one user identifier so BPP can resolve it to a unified profile. See Identifiers & Hashing for the full reference.
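The requirements above (one row per person in a User table, a timestamp on every event, and at least one identifier on every row) can be checked before onboarding. The following is a minimal sketch over in-memory rows, not a BPP tool: the column names `user_id`, `email_sha256`, and `event_timestamp` and both helper functions are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical column names, used only for this illustration.
IDENTIFIER_COLUMNS = ("user_id", "email_sha256")
TIMESTAMP_COLUMN = "event_timestamp"


def has_identifier(row):
    """True if at least one identifier column holds a non-empty value."""
    return any(row.get(col) not in (None, "") for col in IDENTIFIER_COLUMNS)


def check_user_table(rows, key="user_id"):
    """List problems in a user table (expected grain: one row per person)."""
    problems = []
    counts = Counter(row.get(key) for row in rows if row.get(key) not in (None, ""))
    for value, n in counts.items():
        if n > 1:
            problems.append(f"duplicate {key}={value!r} ({n} rows)")
    for i, row in enumerate(rows):
        if not has_identifier(row):
            problems.append(f"user row {i} has no identifier")
    return problems


def check_event_table(rows):
    """List problems in an event table (each row needs a timestamp and an identifier)."""
    problems = []
    for i, row in enumerate(rows):
        if row.get(TIMESTAMP_COLUMN) in (None, ""):
            problems.append(f"event row {i} has no timestamp")
        if not has_identifier(row):
            problems.append(f"event row {i} has no identifier")
    return problems


users = [
    {"user_id": "u1", "email_sha256": "ab12", "country": "IT"},
    {"user_id": "u1", "email_sha256": "ab12", "country": "IT"},  # breaks the grain
    {"user_id": None, "email_sha256": "", "country": "FR"},      # cannot be resolved
]
events = [
    {"user_id": "u1", "event_timestamp": "2024-05-01T10:00:00Z", "event_name": "purchase"},
    {"user_id": None, "email_sha256": None, "event_timestamp": None, "event_name": "page_view"},
]

user_problems = check_user_table(users)
event_problems = check_event_table(events)
```

In practice the same checks would more likely run as queries against the BigQuery tables themselves; the in-memory version only shows what each rule means.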

How BPP uses your data

1. Reads your User and Event tables directly from BigQuery.
2. Resolves identities across all sources, unifying each person under a single Bytek ID (written back as a bpp_user_id column). See User Reconciliation and Event Reconciliation.
3. Enriches the data with AI models (propensity, pcLTV, RFM, interests).
4. Activates insights by syncing audiences and signals to your ad platforms.

A well-structured dataset is what makes steps 2–4 accurate.
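To see why identifiers matter so much for identity resolution, it can help to picture resolution as grouping records that share any identifier value. The sketch below does this with a generic union-find structure. It is an illustration of the general idea only, not BPP's actual reconciliation logic, and the field names (`email_sha256`, `crm_id`, `cookie_id`) and the `resolve_identities` function are hypothetical.

```python
def resolve_identities(records):
    """Group records that share any identifier value.

    Returns one group label per record: the index of the earliest
    record in the same group.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps lookups short
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smallest index becomes the root

    first_seen = {}  # (identifier type, value) -> index of first record carrying it
    for i, record in enumerate(records):
        for key, value in record.items():
            if value is None:
                continue  # a missing identifier links nothing
            if (key, value) in first_seen:
                union(i, first_seen[(key, value)])
            else:
                first_seen[(key, value)] = i
    return [find(i) for i in range(len(records))]


records = [
    {"email_sha256": "ab12", "crm_id": None},      # CRM export
    {"email_sha256": "ab12", "cookie_id": "c-9"},  # logged-in web event
    {"cookie_id": "c-9"},                          # anonymous page view
    {"email_sha256": "ff00"},                      # unrelated person
]
groups = resolve_identities(records)  # [0, 0, 0, 3]
```

The third record carries no email, yet it joins the first group through the shared cookie. A row with no identifier at all could never be linked, which is why every table must carry at least one.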

In this section

User Tables — structure, mandatory requirements, and a worked example.
Event Tables — structure, mandatory requirements, best practices, and a worked example.
Identifiers & Hashing — identifier types, PII normalization and hashing rules, and how mapping works.
Common Mistakes — the most frequent data-structuring pitfalls and how to avoid them.
Industry Examples — end-to-end table blueprints for D2C e-commerce, omnichannel retail, B2B SaaS, B2B lead-gen, automotive, and publishers.
AI Model Data Requirements — the input tables each AI model needs to train.

How mapping happens

You don't hand-write any schema files. Once your dataset is structured, you classify tables, label fields, and declare identifiers directly in the UI through the Data Source Manager onboarding flow. This section tells you what the data underneath should look like so that mapping step is straightforward.
