Publisher Example
In a Publisher / Media Company scenario, BPP integrates newsletter data, paywall subscriptions, and UGC ad postings to power:
- Action Prediction: likelihood a free reader will subscribe, a subscriber will churn, or a user will post content in the UGC section.
- pcLTV: predicted revenue from subscriptions + ads + UGC fees.
- RFM Segmentation: active vs dormant readers/subscribers.
- Interest Analysis: content-category affinity from reading and browsing.
- Signals & Audiences: activation of subscribers and readers across ad platforms.
IDs & identity resolution
Publishers typically collect IDs across multiple channels. BPP unifies them under the Bytek ID.
Recommended identifiers
- Newsletter email (hashed) — key identifier.
- Hashed phone (optional, for UGC).
- Paywall subscription ID — platform-native ID.
- First-party cookie / GA4 ID — online browsing.
- UGC account ID — for the classifieds section.
See Identifiers & Hashing for normalization rules.
Core tables
1. User table — Newsletter Subscribers
CREATE TABLE my_dataset.newsletter_users (
newsletter_id STRING, -- internal ID
hem STRING, -- hashed email
signup_date DATETIME,
subscription_status STRING, -- active, unsubscribed, bounced
source STRING, -- signup channel
topics JSON, -- declared interests e.g. ["Politics", "Sports"]
last_open DATETIME,
total_opens INT64,
total_clicks INT64,
avg_open_rate FLOAT64,
consent_marketing BOOL,
consent_timestamp DATETIME,
created_at DATETIME,
updated_at DATETIME
);
2. Event table — Paywall Subscriptions
CREATE TABLE my_dataset.paywall_subscriptions (
subscription_id STRING,
event_timestamp DATETIME,
hem STRING,
subscription_status STRING, -- active, cancelled, renewed, expired
plan_type STRING, -- monthly, annual, trial
plan_price FLOAT64,
currency STRING,
payment_method STRING,
cancellation_reason STRING,
renewal_date DATETIME,
created_at DATETIME
);
3. Event table — Content Consumption
CREATE TABLE my_dataset.content_events (
event_id STRING,
event_timestamp DATETIME,
hem STRING,
fp_cookie_id STRING,
event_type STRING, -- pageview, video_play, article_like, comment
page_url STRING, -- absolute URL for Interest analysis
content_category STRING, -- e.g. Politics, Sports, Lifestyle
device_class STRING,
browser STRING,
country STRING,
referrer STRING,
gclid STRING,
created_at DATETIME
);
4. Event table — UGC Ads / Classifieds
CREATE TABLE my_dataset.ugc_ads (
ugc_ad_id STRING,
event_timestamp DATETIME,
ugc_user_id STRING,
hem STRING,
ad_category STRING, -- Cars, Real Estate, Jobs
ad_title STRING,
ad_value FLOAT64,
currency STRING,
ad_status STRING, -- posted, updated, removed
engagement_count INT64, -- views, clicks
created_at DATETIME
);
Basket JSON (optional, for paywall upsells)
If the paywall includes upsells (eBooks, courses), use the basket format:
[
{
"product_id": "EBOOK_2025_01",
"product_name": "AI in Journalism",
"product_category": "Ebooks||Technology",
"product_price": 9.99,
"product_quantity": 1
}
]
Do & don't summary
| ✅ Do | ❌ Don't |
|---|---|
| Track subscription history as events | Overwrite subscription_status in one row |
| Store absolute URLs for content events | Drop attribution params (utm, gclid) |
| Keep hashed email consistent across newsletter/paywall/UGC | Hash without normalization |
| Model UGC postings as events | Just store "number of ads" per user |
| Add content_category for interest analysis | Rely only on titles |
| Store renewals/cancellations explicitly | Only track a current subscription flag |
How BPP uses these tables
- Action Prediction — Positive = subscription renewal, ad posting; Negative = cancellation, unsubscribe. Features: open/click rates, content affinity, UGC engagement.
- pcLTV — Transactions = paywall payments + UGC fees. Value = subscription + upsell spend.
- RFM — Recency = last payment / newsletter open. Frequency = content views / ads posted. Monetary = total revenue (paywall + UGC).
- Interest — From content categories (
Politics,Sports, …) or via topic classification onpage_url.
Why this design works
- Multiple business lines unified (newsletter, paywall, UGC).
- Event-based modelling captures real user journeys.
- Hashed email ensures cross-channel activation.
- Content categories + URLs power Interest Analysis and segmentation.