Skip to main content

B2B Lead Generation Example

This blueprint shows how to model User and Event tables for B2B lead-gen scenarios — lead scoring, Action Prediction, Interest, RFM on deals, audiences, and signals.


1. Identity & IDs

BPP ingests multiple identifiers and maps them to a Bytek ID via identity resolution.

Recommended identifiers to capture

  • First-party cookie/device ID from website (e.g. GA4 user_pseudo_id) → fp_cookie_id
  • Hashed email (Google-style normalization + SHA-256) → hem
  • Hashed phone (E.164 + SHA-256) → hphone
  • CRM Contact ID (stable, non-PII) → crm_contact_id
  • Click IDs on web events (to improve ad match): gclid, gbraid, wbraid, fbclid

BPP accepts PII clear or hashed. If cleartext is provided (and the column is flagged as PII), BPP automatically trims → lowercases → normalizes → SHA-256 hashes before identity resolution and activation. See Identifiers & Hashing.


2. Required tables (overview)

TableGrainPurpose
crm_contacts1 row = 1 contactUser table. Include relevant account fields flattened into the contact (no separate accounts table).
prospecting_apollo1 row = 1 prospectOptional User table for external enrichment (Apollo or similar).
web_events1 row = 1 web eventPageviews/forms with absolute URL + attribution & device fields.
deal_stage_history1 row = 1 stage changePipeline status changes (append-only).
email_engagement1 row = 1 email eventOpens, clicks, bounces, unsubscribes.
meetings, calls1 row = 1 meeting/callSales activities to build behavioural features.

Partition event tables by timestamp; cluster by user identifiers for performance.


3. User tables

3.1 crm_contacts (User)

One row per contact. Put needed account/company info inside the contact row to keep a pure user grain.

CREATE TABLE my_dataset.crm_contacts (
-- Identifiers
crm_contact_id STRING, -- stable CRM key (required)
hem STRING, -- SHA-256(email) hex lowercase
hphone STRING, -- SHA-256(E.164 phone) hex lowercase
primary_email STRING, -- only if policy allows (else NULL)
fp_cookie_id_primary STRING, -- main first-party cookie/device id
fp_cookie_ids_all ARRAY<STRING>, -- optional: other cookies seen

-- Contact profile
first_name STRING,
last_name STRING,
job_role STRING, -- normalized: "CEO","Marketing Director",...
seniority STRING, -- "C-Level","Director","Manager",...
department STRING, -- "Marketing","Sales","IT",...
phone_country_code STRING, -- e.g. "+39"
language STRING,

-- Company/account fields (flattened into contact)
company_name STRING,
industry STRING, -- normalized taxonomy
company_size_bucket STRING, -- "1-10","11-50","51-200","201-1000","1000+"
annual_revenue_eur FLOAT64,
hq_country STRING,
hq_region STRING,

-- Funnel & ownership
lifecycle_stage STRING, -- "Lead","MQL","SQL","Opportunity","Customer"
lead_source STRING, -- original source (utm_source or CRM field)
lead_medium STRING, -- channel/utm_medium
lead_campaign STRING, -- utm_campaign / CRM campaign
owner_user_id STRING, -- CRM owner / sales rep

-- Consent & compliance
marketing_consent BOOL,
privacy_consent_ts DATETIME,

-- Tech & geo traits (aggregated to user)
dominant_device_class STRING, -- desktop/mobile/tablet
dominant_browser STRING,
primary_country STRING,
primary_region STRING,

-- Aggregated behaviours (precomputed)
sessions_last_30d INT64,
pageviews_last_30d INT64,
form_submits_last_90d INT64,
meetings_last_90d INT64,
calls_last_90d INT64,
days_to_first_meeting INT64, -- first web session -> first meeting
last_activity_ts DATETIME,

-- Housekeeping
created_at DATETIME,
updated_at DATETIME
);

3.2 prospecting_apollo (User, optional)

CREATE TABLE my_dataset.prospecting_apollo (
apollo_contact_id STRING,
hem STRING, -- hashed email to join
hphone STRING, -- hashed phone to join
company_name STRING,
job_role STRING,
seniority STRING,
department STRING,
linkedin_url STRING,
company_size_bucket STRING,
industry STRING,
country STRING,
enrich_score FLOAT64, -- vendor confidence score
updated_at DATETIME
);

4. Event tables

Event rules: one timestamp (UTC), one user identifier (crm_contact_id or hem or fp_cookie_id).

4.1 web_events (pageviews & forms)

Store absolute URLs to support topic classification (IAB/Custom Interest) and keep attribution.

CREATE TABLE my_dataset.web_events (
event_id STRING,
event_timestamp DATETIME, -- UTC
fp_cookie_id STRING, -- GA4 user_pseudo_id or custom cookie
hem STRING, -- hashed email if identified on session
crm_contact_id STRING, -- if logged-in or matched
event_type STRING, -- 'pageview','form_submit','signup',...

-- Web context
page_url STRING, -- absolute URL (incl. protocol + path + query)
page_title STRING,
referrer STRING,
utm_source STRING,
utm_medium STRING,
utm_campaign STRING,
gclid STRING,
gbraid STRING,
wbraid STRING,
fbclid STRING,

-- Device & geo
device_class STRING, -- desktop/mobile/tablet
browser STRING,
country STRING,
region STRING,
city STRING,

-- Session quality (optional)
session_id STRING,
time_on_page_sec INT64,
scroll_depth_pct INT64,

created_at DATETIME
);

4.2 deal_stage_history (pipeline history)

Append-only log of every stage change — the canonical source for AP targets, RFM on deals, and conversion adjustments.

CREATE TABLE my_dataset.deal_stage_history (
row_id STRING,
event_timestamp DATETIME, -- stage change time (UTC)
crm_contact_id STRING,
deal_id STRING,
old_stage STRING,
new_stage STRING, -- 'Open','Qualification','Negotiation','Closed Won','Closed Lost','Churn','Renew'
pipeline STRING,
deal_amount_eur FLOAT64, -- snapshot at change
currency STRING,
owner_user_id STRING,
source_system STRING, -- HubSpot, SFDC, etc.
created_at DATETIME
);

4.3 email_engagement (marketing automation)

CREATE TABLE my_dataset.email_engagement (
event_id STRING,
event_timestamp DATETIME,
hem STRING,
crm_contact_id STRING,
email_campaign STRING,
event_name STRING, -- 'open','click','bounce','unsubscribe'
link_url STRING, -- for 'click'
ip_country STRING,
user_agent STRING,
created_at DATETIME
);

4.4 meetings & calls (sales activities)

CREATE TABLE my_dataset.meetings (
meeting_id STRING,
event_timestamp DATETIME,
crm_contact_id STRING,
organizer_id STRING,
subject STRING,
duration_min INT64,
outcome STRING, -- 'completed','no_show','rescheduled'
created_at DATETIME
);

CREATE TABLE my_dataset.calls (
call_id STRING,
event_timestamp DATETIME,
crm_contact_id STRING,
direction STRING, -- 'inbound','outbound'
duration_sec INT64,
outcome STRING, -- 'connected','voicemail','missed'
disposition STRING, -- 'interested','not_interested','follow_up'
created_at DATETIME
);

5. Do / Don't (with rationale)

Do

  • Use history/event tables for deals; don't overwrite stage.
  • Capture absolute page_url + utm_* + gclid/gbraid/wbraid to enable attribution and Interest.
  • Store hashed PII (hem, hphone) and keep consent flags.
  • Pre-aggregate event features into the user table for modelling.
  • Keep UTC across all timestamps; partition by timestamp, cluster by identifiers.

Don't

  • Keep only a single deals snapshot with mutable status (breaks targets & timing).
  • Drop URL query parameters (you lose attribution & topic cues).
  • Mix user-grain fields into event tables or vice versa.
  • Store free-text outcomes only — normalize stages and outcomes.

6. Example: aggregating behavioural features to user grain

-- Pageviews and sessions in the last 30 days per contact
WITH web AS (
SELECT
crm_contact_id,
COUNTIF(event_type = 'pageview') AS pageviews_last_30d,
COUNT(DISTINCT session_id) AS sessions_last_30d,
APPROX_COUNT_DISTINCT(country) AS distinct_countries_last_30d,
APPROX_COUNT_DISTINCT(CONCAT(device_class, '|', browser)) AS distinct_device_browser_last_30d
FROM my_dataset.web_events
WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY 1
)
SELECT
c.*,
w.pageviews_last_30d,
w.sessions_last_30d,
w.distinct_countries_last_30d,
w.distinct_device_browser_last_30d
FROM my_dataset.crm_contacts c
LEFT JOIN web w
ON w.crm_contact_id = c.crm_contact_id;

Many of these aggregations can also be computed inside BPP with the Fields Builder instead of in your own ETL.


7. Quality & governance checklist

  • IDs present: every event row has ≥1 of (crm_contact_id, hem, fp_cookie_id).
  • Timezone: timestamps are UTC; dataset partitioned by event timestamp.
  • Hashing: emails/phones normalized + SHA-256; stored hex lowercase.
  • Consent: marketing_consent and privacy_consent_ts tracked on the user table.
  • Stage taxonomy: controlled vocabulary for new_stage.
  • URL integrity: page_url is absolute (incl. protocol) with query params retained.
  • Data freshness: event tables updated at least daily.

8. Why this design fits BPP

  • Zero-copy: BPP reads your User/Event tables in place.
  • Model readiness: AP targets come from deal_stage_history; Interest uses page_url; RFM & pcLTV leverage immutable facts.
  • Activation-ready: hashed identifiers, click IDs, and deal IDs enable high match rates and reliable downstream signaling.