Skip to main content

Publisher Example

In a Publisher / Media Company scenario, BPP integrates newsletter data, paywall subscriptions, and UGC ad postings to power:

  • Action Prediction: likelihood a free reader will subscribe, a subscriber will churn, or a user will post content in the UGC section.
  • pcLTV: predicted revenue from subscriptions + ads + UGC fees.
  • RFM Segmentation: active vs dormant readers/subscribers.
  • Interest Analysis: content-category affinity from reading and browsing.
  • Signals & Audiences: activation of subscribers and readers across ad platforms.

IDs & identity resolution

Publishers typically collect IDs across multiple channels. BPP unifies them under the Bytek ID.

  • Newsletter email (hashed) — key identifier.
  • Hashed phone (optional, for UGC).
  • Paywall subscription ID — platform-native ID.
  • First-party cookie / GA4 ID — online browsing.
  • UGC account ID — for the classifieds section.

See Identifiers & Hashing for normalization rules.


Core tables

1. User table — Newsletter Subscribers

CREATE TABLE my_dataset.newsletter_users (
newsletter_id STRING, -- internal ID
hem STRING, -- hashed email
signup_date DATETIME,
subscription_status STRING, -- active, unsubscribed, bounced
source STRING, -- signup channel
topics JSON, -- declared interests e.g. ["Politics", "Sports"]
last_open DATETIME,
total_opens INT64,
total_clicks INT64,
avg_open_rate FLOAT64,
consent_marketing BOOL,
consent_timestamp DATETIME,
created_at DATETIME,
updated_at DATETIME
);

2. Event table — Paywall Subscriptions

CREATE TABLE my_dataset.paywall_subscriptions (
subscription_id STRING,
event_timestamp DATETIME,
hem STRING,
subscription_status STRING, -- active, cancelled, renewed, expired
plan_type STRING, -- monthly, annual, trial
plan_price FLOAT64,
currency STRING,
payment_method STRING,
cancellation_reason STRING,
renewal_date DATETIME,
created_at DATETIME
);

3. Event table — Content Consumption

CREATE TABLE my_dataset.content_events (
event_id STRING,
event_timestamp DATETIME,
hem STRING,
fp_cookie_id STRING,
event_type STRING, -- pageview, video_play, article_like, comment
page_url STRING, -- absolute URL for Interest analysis
content_category STRING, -- e.g. Politics, Sports, Lifestyle
device_class STRING,
browser STRING,
country STRING,
referrer STRING,
gclid STRING,
created_at DATETIME
);

4. Event table — UGC Ads / Classifieds

CREATE TABLE my_dataset.ugc_ads (
ugc_ad_id STRING,
event_timestamp DATETIME,
ugc_user_id STRING,
hem STRING,
ad_category STRING, -- Cars, Real Estate, Jobs
ad_title STRING,
ad_value FLOAT64,
currency STRING,
ad_status STRING, -- posted, updated, removed
engagement_count INT64, -- views, clicks
created_at DATETIME
);

Basket JSON (optional, for paywall upsells)

If the paywall includes upsells (eBooks, courses), use the basket format:

[
{
"product_id": "EBOOK_2025_01",
"product_name": "AI in Journalism",
"product_category": "Ebooks||Technology",
"product_price": 9.99,
"product_quantity": 1
}
]

Do & don't summary

✅ Do❌ Don't
Track subscription history as eventsOverwrite subscription_status in one row
Store absolute URLs for content eventsDrop attribution params (utm, gclid)
Keep hashed email consistent across newsletter/paywall/UGCHash without normalization
Model UGC postings as eventsJust store "number of ads" per user
Add content_category for interest analysisRely only on titles
Store renewals/cancellations explicitlyOnly track a current subscription flag

How BPP uses these tables

  • Action Prediction — Positive = subscription renewal, ad posting; Negative = cancellation, unsubscribe. Features: open/click rates, content affinity, UGC engagement.
  • pcLTV — Transactions = paywall payments + UGC fees. Value = subscription + upsell spend.
  • RFM — Recency = last payment / newsletter open. Frequency = content views / ads posted. Monetary = total revenue (paywall + UGC).
  • Interest — From content categories (Politics, Sports, …) or via topic classification on page_url.

Why this design works

  • Multiple business lines unified (newsletter, paywall, UGC).
  • Event-based modelling captures real user journeys.
  • Hashed email ensures cross-channel activation.
  • Content categories + URLs power Interest Analysis and segmentation.