B2B Lead Generation Example
This blueprint shows how to model User and Event tables for B2B lead-gen scenarios — lead scoring, Action Prediction, Interest, RFM on deals, audiences, and signals.
1. Identity & IDs
BPP ingests multiple identifiers and maps them to a Bytek ID via identity resolution.
Recommended identifiers to capture
- First-party cookie/device ID from website (e.g. GA4
user_pseudo_id) →fp_cookie_id - Hashed email (Google-style normalization + SHA-256) →
hem - Hashed phone (E.164 + SHA-256) →
hphone - CRM Contact ID (stable, non-PII) →
crm_contact_id - Click IDs on web events (to improve ad match):
gclid,gbraid,wbraid,fbclid
BPP accepts PII clear or hashed. If cleartext is provided (and the column is flagged as PII), BPP automatically trims → lowercases → normalizes → SHA-256 hashes before identity resolution and activation. See Identifiers & Hashing.
2. Required tables (overview)
| Table | Grain | Purpose |
|---|---|---|
crm_contacts | 1 row = 1 contact | User table. Include relevant account fields flattened into the contact (no separate accounts table). |
prospecting_apollo | 1 row = 1 prospect | Optional User table for external enrichment (Apollo or similar). |
web_events | 1 row = 1 web event | Pageviews/forms with absolute URL + attribution & device fields. |
deal_stage_history | 1 row = 1 stage change | Pipeline status changes (append-only). |
email_engagement | 1 row = 1 email event | Opens, clicks, bounces, unsubscribes. |
meetings, calls | 1 row = 1 meeting/call | Sales activities to build behavioural features. |
Partition event tables by timestamp; cluster by user identifiers for performance.
3. User tables
3.1 crm_contacts (User)
One row per contact. Put needed account/company info inside the contact row to keep a pure user grain.
CREATE TABLE my_dataset.crm_contacts (
-- Identifiers
crm_contact_id STRING, -- stable CRM key (required)
hem STRING, -- SHA-256(email) hex lowercase
hphone STRING, -- SHA-256(E.164 phone) hex lowercase
primary_email STRING, -- only if policy allows (else NULL)
fp_cookie_id_primary STRING, -- main first-party cookie/device id
fp_cookie_ids_all ARRAY<STRING>, -- optional: other cookies seen
-- Contact profile
first_name STRING,
last_name STRING,
job_role STRING, -- normalized: "CEO","Marketing Director",...
seniority STRING, -- "C-Level","Director","Manager",...
department STRING, -- "Marketing","Sales","IT",...
phone_country_code STRING, -- e.g. "+39"
language STRING,
-- Company/account fields (flattened into contact)
company_name STRING,
industry STRING, -- normalized taxonomy
company_size_bucket STRING, -- "1-10","11-50","51-200","201-1000","1000+"
annual_revenue_eur FLOAT64,
hq_country STRING,
hq_region STRING,
-- Funnel & ownership
lifecycle_stage STRING, -- "Lead","MQL","SQL","Opportunity","Customer"
lead_source STRING, -- original source (utm_source or CRM field)
lead_medium STRING, -- channel/utm_medium
lead_campaign STRING, -- utm_campaign / CRM campaign
owner_user_id STRING, -- CRM owner / sales rep
-- Consent & compliance
marketing_consent BOOL,
privacy_consent_ts DATETIME,
-- Tech & geo traits (aggregated to user)
dominant_device_class STRING, -- desktop/mobile/tablet
dominant_browser STRING,
primary_country STRING,
primary_region STRING,
-- Aggregated behaviours (precomputed)
sessions_last_30d INT64,
pageviews_last_30d INT64,
form_submits_last_90d INT64,
meetings_last_90d INT64,
calls_last_90d INT64,
days_to_first_meeting INT64, -- first web session -> first meeting
last_activity_ts DATETIME,
-- Housekeeping
created_at DATETIME,
updated_at DATETIME
);
3.2 prospecting_apollo (User, optional)
CREATE TABLE my_dataset.prospecting_apollo (
apollo_contact_id STRING,
hem STRING, -- hashed email to join
hphone STRING, -- hashed phone to join
company_name STRING,
job_role STRING,
seniority STRING,
department STRING,
linkedin_url STRING,
company_size_bucket STRING,
industry STRING,
country STRING,
enrich_score FLOAT64, -- vendor confidence score
updated_at DATETIME
);
4. Event tables
Event rules: one timestamp (UTC), one user identifier (
crm_contact_idorhemorfp_cookie_id).
4.1 web_events (pageviews & forms)
Store absolute URLs to support topic classification (IAB/Custom Interest) and keep attribution.
CREATE TABLE my_dataset.web_events (
event_id STRING,
event_timestamp DATETIME, -- UTC
fp_cookie_id STRING, -- GA4 user_pseudo_id or custom cookie
hem STRING, -- hashed email if identified on session
crm_contact_id STRING, -- if logged-in or matched
event_type STRING, -- 'pageview','form_submit','signup',...
-- Web context
page_url STRING, -- absolute URL (incl. protocol + path + query)
page_title STRING,
referrer STRING,
utm_source STRING,
utm_medium STRING,
utm_campaign STRING,
gclid STRING,
gbraid STRING,
wbraid STRING,
fbclid STRING,
-- Device & geo
device_class STRING, -- desktop/mobile/tablet
browser STRING,
country STRING,
region STRING,
city STRING,
-- Session quality (optional)
session_id STRING,
time_on_page_sec INT64,
scroll_depth_pct INT64,
created_at DATETIME
);
4.2 deal_stage_history (pipeline history)
Append-only log of every stage change — the canonical source for AP targets, RFM on deals, and conversion adjustments.
CREATE TABLE my_dataset.deal_stage_history (
row_id STRING,
event_timestamp DATETIME, -- stage change time (UTC)
crm_contact_id STRING,
deal_id STRING,
old_stage STRING,
new_stage STRING, -- 'Open','Qualification','Negotiation','Closed Won','Closed Lost','Churn','Renew'
pipeline STRING,
deal_amount_eur FLOAT64, -- snapshot at change
currency STRING,
owner_user_id STRING,
source_system STRING, -- HubSpot, SFDC, etc.
created_at DATETIME
);
4.3 email_engagement (marketing automation)
CREATE TABLE my_dataset.email_engagement (
event_id STRING,
event_timestamp DATETIME,
hem STRING,
crm_contact_id STRING,
email_campaign STRING,
event_name STRING, -- 'open','click','bounce','unsubscribe'
link_url STRING, -- for 'click'
ip_country STRING,
user_agent STRING,
created_at DATETIME
);
4.4 meetings & calls (sales activities)
CREATE TABLE my_dataset.meetings (
meeting_id STRING,
event_timestamp DATETIME,
crm_contact_id STRING,
organizer_id STRING,
subject STRING,
duration_min INT64,
outcome STRING, -- 'completed','no_show','rescheduled'
created_at DATETIME
);
CREATE TABLE my_dataset.calls (
call_id STRING,
event_timestamp DATETIME,
crm_contact_id STRING,
direction STRING, -- 'inbound','outbound'
duration_sec INT64,
outcome STRING, -- 'connected','voicemail','missed'
disposition STRING, -- 'interested','not_interested','follow_up'
created_at DATETIME
);
5. Do / Don't (with rationale)
Do
- Use history/event tables for deals; don't overwrite
stage. - Capture absolute
page_url+utm_*+gclid/gbraid/wbraidto enable attribution and Interest. - Store hashed PII (
hem,hphone) and keep consent flags. - Pre-aggregate event features into the user table for modelling.
- Keep UTC across all timestamps; partition by timestamp, cluster by identifiers.
Don't
- Keep only a single
dealssnapshot with mutable status (breaks targets & timing). - Drop URL query parameters (you lose attribution & topic cues).
- Mix user-grain fields into event tables or vice versa.
- Store free-text outcomes only — normalize stages and outcomes.
6. Example: aggregating behavioural features to user grain
-- Pageviews and sessions in the last 30 days per contact
WITH web AS (
SELECT
crm_contact_id,
COUNTIF(event_type = 'pageview') AS pageviews_last_30d,
COUNT(DISTINCT session_id) AS sessions_last_30d,
APPROX_COUNT_DISTINCT(country) AS distinct_countries_last_30d,
APPROX_COUNT_DISTINCT(CONCAT(device_class, '|', browser)) AS distinct_device_browser_last_30d
FROM my_dataset.web_events
WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY 1
)
SELECT
c.*,
w.pageviews_last_30d,
w.sessions_last_30d,
w.distinct_countries_last_30d,
w.distinct_device_browser_last_30d
FROM my_dataset.crm_contacts c
LEFT JOIN web w
ON w.crm_contact_id = c.crm_contact_id;
Many of these aggregations can also be computed inside BPP with the Fields Builder instead of in your own ETL.
7. Quality & governance checklist
- IDs present: every event row has ≥1 of (
crm_contact_id,hem,fp_cookie_id). - Timezone: timestamps are UTC; dataset partitioned by event timestamp.
- Hashing: emails/phones normalized + SHA-256; stored hex lowercase.
- Consent:
marketing_consentandprivacy_consent_tstracked on the user table. - Stage taxonomy: controlled vocabulary for
new_stage. - URL integrity:
page_urlis absolute (incl. protocol) with query params retained. - Data freshness: event tables updated at least daily.
8. Why this design fits BPP
- Zero-copy: BPP reads your User/Event tables in place.
- Model readiness: AP targets come from
deal_stage_history; Interest usespage_url; RFM & pcLTV leverage immutable facts. - Activation-ready: hashed identifiers, click IDs, and deal IDs enable high match rates and reliable downstream signaling.