
D2C E-commerce Example

In a Direct-to-Consumer (D2C) e-commerce setup, BPP leverages transactional data (orders, returns), behavioural data (pageviews, add-to-cart events), and user attributes (demographics, devices, channels) to train predictive models such as:

  • Action Prediction (likelihood of purchase or repeat purchase)
  • pcLTV (Predicted Customer Lifetime Value)
  • RFM Segmentation
  • Interest Analysis (product/category affinity)

IDs & identity resolution

Every table must contain at least one user identifier. BPP resolves identities across sources to unify them under a Bytek ID.

  • First-party cookie ID (from GA4 or web tracker) → stable browsing ID.
  • Hashed email (SHA-256, lowercase, trimmed, Gmail dots removed) → key for cross-channel match.
  • CRM / e-commerce user ID → platform-native stable identifier.
  • Phone number (hashed SHA-256) when available.

See Identifiers & Hashing for the full normalization rules. If you provide cleartext PII, BPP can lowercase, trim, normalize, and hash it on ingestion when the column is flagged as PII.
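As an illustration of the email rule listed above (SHA-256, lowercase, trimmed, Gmail dots removed), here is a minimal Python sketch. The helper names and the set of domains treated as Gmail are assumptions made for this example; the Identifiers & Hashing page remains the reference for the exact rules BPP applies.

```python
import hashlib

# Assumption: these are the domains for which dots in the local part are removed.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def normalize_email(raw: str) -> str:
    """Trim, lowercase, and drop dots from the local part of Gmail addresses."""
    email = raw.strip().lower()
    local, sep, domain = email.rpartition("@")
    if sep and domain in GMAIL_DOMAINS:
        local = local.replace(".", "")
    return f"{local}{sep}{domain}"


def hash_email(raw: str) -> str:
    """Return the SHA-256 hex digest of the normalized email (hypothetical helper)."""
    return hashlib.sha256(normalize_email(raw).encode("utf-8")).hexdigest()


# Two spellings of the same Gmail address produce the same hash.
same = hash_email("  Jane.Doe@Gmail.com ") == hash_email("janedoe@gmail.com")
```

Normalizing before hashing matters because SHA-256 is case- and whitespace-sensitive: any difference in the input yields an unrelated digest and breaks the cross-channel match.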


Core tables

1. User table — Customers

CREATE TABLE my_dataset.customers (
customer_id STRING, -- platform user ID
hem STRING, -- hashed email
phone_hashed STRING, -- hashed phone
first_seen DATETIME, -- first interaction
last_seen DATETIME, -- last interaction
signup_date DATETIME, -- when the user registered
country STRING, -- country of the customer
city STRING, -- city (optional)
gender STRING, -- demographic attribute
age INT64, -- demographic attribute
acquisition_channel STRING, -- e.g. paid_search, organic, referral
device_class STRING, -- desktop, mobile, tablet
browser STRING, -- browser name
total_orders INT64, -- pre-aggregated feature
avg_order_value FLOAT64, -- pre-aggregated feature
last_order_date DATETIME -- pre-aggregated feature
);
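The pre-aggregated columns in the customers table (total_orders, avg_order_value, last_order_date) can be derived from completed orders. The following Python sketch shows one possible derivation; the function name and the in-memory list of dicts are assumptions for illustration, and in practice this would more likely be a scheduled query in the warehouse.

```python
from datetime import datetime
from statistics import mean


def aggregate_customer_features(orders):
    """Group order rows by customer_id and compute the pre-aggregated features.

    `orders` is an iterable of dicts with the keys customer_id,
    event_timestamp (datetime), and monetary (float).
    """
    grouped = {}
    for order in orders:
        grouped.setdefault(order["customer_id"], []).append(order)

    features = {}
    for customer_id, rows in grouped.items():
        features[customer_id] = {
            "total_orders": len(rows),
            "avg_order_value": mean(r["monetary"] for r in rows),
            "last_order_date": max(r["event_timestamp"] for r in rows),
        }
    return features


# Hypothetical sample rows, not real data.
sample_orders = [
    {"customer_id": "c1", "event_timestamp": datetime(2024, 1, 5), "monetary": 100.0},
    {"customer_id": "c1", "event_timestamp": datetime(2024, 3, 9), "monetary": 50.0},
    {"customer_id": "c2", "event_timestamp": datetime(2024, 2, 1), "monetary": 20.0},
]
features = aggregate_customer_features(sample_orders)
```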

2. Event table — Orders (immutable)

Order rows must be immutable: once written, a row is never updated in place. Use either orders_completed (one row per completed order) or order_status_history (one row per status change).

CREATE TABLE my_dataset.orders_completed (
order_id STRING, -- strongly recommended for reconciliation
customer_id STRING, -- join to user
event_timestamp DATETIME, -- order completion time (UTC)
monetary FLOAT64, -- total order value
currency STRING, -- ISO 4217 code
basket JSON, -- products purchased
payment_method STRING, -- credit card, PayPal, etc.
shipping_country STRING, -- country of delivery
transaction_status STRING -- e.g. completed, refunded
);

Order ID is not required by BPP models, but it is strongly recommended for ad-platform integrations (Google/Meta value bidding).
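If you choose the order_status_history variant, every status change is appended as a new row and the current state is derived at read time rather than stored by overwriting. The Python sketch below illustrates that idea; the function name and the row layout (order_id, event_timestamp, transaction_status) are assumptions for this example, not a BPP API.

```python
from datetime import datetime


def latest_status(history):
    """Return {order_id: most recent transaction_status} from append-only rows."""
    latest = {}
    for row in history:
        current = latest.get(row["order_id"])
        if current is None or row["event_timestamp"] > current["event_timestamp"]:
            latest[row["order_id"]] = row
    return {order_id: row["transaction_status"] for order_id, row in latest.items()}


# Hypothetical history: order o1 was completed and later refunded.
history = [
    {"order_id": "o1", "event_timestamp": datetime(2024, 1, 1, 10), "transaction_status": "completed"},
    {"order_id": "o2", "event_timestamp": datetime(2024, 1, 2, 9), "transaction_status": "completed"},
    {"order_id": "o1", "event_timestamp": datetime(2024, 1, 8, 15), "transaction_status": "refunded"},
]
statuses = latest_status(history)
```

Because earlier rows are kept, the full sequence of states stays available for each order, which is what an in-place update would destroy.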

Basket JSON structure

[
{
"product_id": "SKU_12345",
"product_name": "Running Shoes X",
"brand": "Nike",
"product_category": "Shoes||Running",
"product_quantity": 1,
"product_price": 89.99
},
{
"product_id": "SKU_67890",
"product_name": "Training T-Shirt",
"brand": "Adidas",
"product_category": "Apparel||Tops",
"product_quantity": 2,
"product_price": 25.00
}
]
  • Categories can be hierarchical using ||.
  • At least product_id, product_quantity, and product_price are mandatory.
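Before loading, a basket can be checked against the two rules above. This is a minimal Python sketch with hypothetical helper names: it validates the mandatory keys, sums the line items, and splits a hierarchical category on ||. Whether the line-item total should equal the order's monetary value depends on shipping, tax, and discounts, so the sketch does not assume it does.

```python
import json

REQUIRED_KEYS = ("product_id", "product_quantity", "product_price")


def parse_basket(basket_json: str):
    """Parse the basket JSON array and verify the mandatory keys on each item."""
    items = json.loads(basket_json)
    for index, item in enumerate(items):
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"basket item {index} is missing {missing}")
    return items


def basket_total(items) -> float:
    """Sum quantity * price over all items, rounded to two decimals."""
    return round(sum(i["product_quantity"] * i["product_price"] for i in items), 2)


def category_path(item):
    """Split a hierarchical category such as 'Shoes||Running' into its levels."""
    category = item.get("product_category")
    return category.split("||") if category else []


basket = parse_basket(
    '[{"product_id": "SKU_12345", "product_category": "Shoes||Running",'
    ' "product_quantity": 1, "product_price": 89.99},'
    ' {"product_id": "SKU_67890", "product_quantity": 2, "product_price": 25.00}]'
)
total = basket_total(basket)  # 139.99
```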

3. Event table — Web behavioural events

CREATE TABLE my_dataset.web_events (
event_id STRING,
event_timestamp DATETIME,
customer_id STRING,
event_type STRING, -- e.g. pageview, add_to_cart, checkout_start
page_url STRING, -- absolute URL, incl. https://
referrer STRING, -- referral URL if available
device_class STRING, -- desktop, mobile, tablet
browser STRING, -- browser name
country STRING,
gclid STRING, -- if from Google Ads click
gbraid STRING, -- Google Ads click ID for iOS 14+ traffic (app conversions)
wbraid STRING -- Google Ads click ID for iOS 14+ traffic (web conversions)
);

Do

  • Use absolute page URLs for topic/interest classification.
  • Capture click IDs (gclid, wbraid, gbraid) for ad matching.
  • Track device class & browser for behaviour-based features.

Don't

  • Overwrite events — 1 row = 1 event with its own timestamp.
  • Truncate URLs — you'll lose query params critical for attribution.
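To show why the full URL is worth keeping, the Python sketch below pulls the click IDs out of the query string of an absolute page URL using the standard library. The function name and the example URL are hypothetical; the parameter names are the gclid, gbraid, and wbraid columns of the web_events table.

```python
from urllib.parse import parse_qs, urlsplit

CLICK_ID_PARAMS = ("gclid", "gbraid", "wbraid")


def extract_click_ids(page_url: str) -> dict:
    """Return the click IDs found in the URL's query string (None when absent)."""
    query = parse_qs(urlsplit(page_url).query)
    return {
        param: (query[param][0] if param in query else None)
        for param in CLICK_ID_PARAMS
    }


full = extract_click_ids("https://shop.example.com/shoes?gclid=abc123&utm_source=google")
truncated = extract_click_ids("https://shop.example.com/shoes")
```

With the truncated URL every click ID comes back as None, so the event can no longer be matched to the ad click that produced it.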

Do & don't summary

✅ Do | ❌ Don't
Use immutable facts (orders completed or status history) | Overwrite order rows with new statuses
Store order_id for reconciliation & ads | Drop order_id (debugging & bidding will fail)
Include absolute URLs in behavioural data | Keep only page titles (topic classification won't work)
Capture gclid/gbraid/wbraid click IDs | Drop attribution parameters
Hash PII with SHA-256 (Google/Meta rules) | Hash incorrectly (wrong casing, whitespace, Gmail dots)

Why this matters

  • Immutable order history keeps pcLTV and RFM calculations accurate.
  • Granular behavioural events power Action Prediction and Interest Analysis.
  • Consistent identifiers (hashed email, cookies, CRM IDs) enable ad-platform matching.
  • Pre-aggregated user features (such as total_orders and avg_order_value) spare the models from recomputing them, which can make training faster and more accurate.