Interest — Reference
The Interest model identifies the most relevant interests for each customer by analysing labelled events in your data warehouse. It is a descriptive model and supports three interest types:
- Product Interest — the interest label is already present in the event row (e.g. a product category). No classification step needed.
- IAB Interest — requires a Topic Classification step that scrapes and classifies page content into standard IAB categories.
- Custom Interest — also uses Topic Classification, but against a custom taxonomy defined for your business.
Input data: see AI Model Data Requirements → Interest for the event-table fields each interest type needs.
How it works
- Input data: reads from an event table. Each row must include at least:
  - a user identifier (`user_id_column`, typically `bpp_user_id`),
  - an event timestamp (`date_column`),
  - the interest field (`interest_column`): either the interest label (Product Interest) or the page URL (IAB / Custom Interest).
- Interest column
  - Product Interest: the interest is read directly from `interest_column`, e.g. `"Shoes||Sportswear||Running"` (hierarchical labels joined with `||`).
  - IAB / Custom Interest: `interest_column` holds the page URL. The Topic Classification pipeline scrapes and classifies the page, writing the label before analysis.
- Aggregation: interests are grouped and counted per user over a configurable lookback window (`lookback_days`).
- Statistical scoring: for each interest, the population mean and standard deviation of its frequency are computed. Each user–interest pair gets a z-score: `z = (count - mean) / std`
- Threshold filtering: only interests with a z-score above `threshold` are kept. A higher threshold keeps only interests that are significantly more frequent than average for that user.
- Top-N selection: the top interests per user are retained.
- Output: results are written to a dedicated BigQuery table, one row per user with their selected interests as a JSON list.
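The aggregation, scoring, threshold, and top-N steps can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the platform's implementation: the function name `select_interests` and the `top_n` parameter are hypothetical, the mean and standard deviation are assumed to be taken over users who have at least one event for the interest, the population standard deviation is used, and interests with zero variance are skipped.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta
from statistics import mean, pstdev


def select_interests(events, today, lookback_days=7, threshold=0.7, top_n=3):
    """events: iterable of (user_id, event_date, interest_label) tuples."""
    cutoff = today - timedelta(days=lookback_days)

    # Aggregation: count events per (user, interest) inside the lookback window.
    counts = Counter(
        (user, interest)
        for user, day, interest in events
        if cutoff <= day <= today
    )

    # Statistical scoring: mean and std of each interest's per-user counts
    # (assumption: only users with at least one event for the interest).
    per_interest = defaultdict(list)
    for (_, interest), n in counts.items():
        per_interest[interest].append(n)
    stats = {i: (mean(v), pstdev(v)) for i, v in per_interest.items()}

    # Threshold filtering: keep pairs whose z-score is above the threshold.
    scored = defaultdict(list)
    for (user, interest), n in counts.items():
        mu, sigma = stats[interest]
        if sigma == 0:  # assumption: no variation across users, so no z-score
            continue
        z = (n - mu) / sigma
        if z > threshold:
            scored[user].append((z, interest))

    # Top-N selection: highest z-score first, label as a tie-breaker.
    return {
        user: [i for _, i in sorted(pairs, key=lambda p: (-p[0], p[1]))[:top_n]]
        for user, pairs in scored.items()
    }


today = date(2024, 1, 31)
recent = date(2024, 1, 30)
events = (
    [(1, recent, "shoes")] * 5 + [(1, recent, "bags")]
    + [(2, recent, "shoes"), (2, recent, "bags"), (2, date(2023, 12, 1), "hats")]
    + [(3, recent, "shoes")] + [(3, recent, "bags")] * 4
)
result = select_interests(events, today)
print(result)  # {1: ['shoes'], 3: ['bags']}
```

User 2 has no interest above the threshold and therefore does not appear in the result; the `hats` event falls outside the 7-day window and is ignored.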
JSON configuration reference
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `data_src.region` | STRING | ✅ | - | Cloud region of the dataset (auto-populated by the platform) |
| `data_src.project_id` | STRING | ✅ | - | GCP project ID (auto-populated) |
| `data_src.dataset_id` | STRING | ✅ | - | BigQuery dataset (auto-populated) |
| `data_src.source_table_id` | STRING | ✅ | - | Input event table ID (a reconciled `*_bpp` table) |
| `data_src.date_column` | STRING | ❌ | `"event_date"` | Column with the event timestamp |
| `data_src.user_id_column` | STRING | ✅ | - | User identifier column (typically `bpp_user_id`) |
| `interest_column` | STRING | ✅ | - | Field containing the interest label (Product) or page URL (IAB/Custom) |
| `lookback_days` | INT | ❌ | 7 | Number of days of event history considered |
| `threshold` | FLOAT | ❌ | 0.7 | z-score threshold for selecting interests |
Example configuration
{
  "data_src": {
    "region": "europe-west8",
    "dataset_id": "bpp_tables",
    "project_id": "example-project",
    "date_column": "event_date",
    "user_id_column": "bpp_user_id",
    "source_table_id": "event_pageview_bpp"
  },
  "threshold": -0.5,
  "lookback_days": 60,
  "interest_column": "page_title"
}
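Before submitting a configuration, it can help to check the required fields and see which defaults would apply. The sketch below does this against the reference table above; `load_interest_config` is a hypothetical helper written for illustration, not part of the platform, and the `product_category` column name is invented for the example.

```python
import json

# Required keys and defaults as listed in the configuration reference table.
REQUIRED_DATA_SRC = ("region", "project_id", "dataset_id", "source_table_id", "user_id_column")
TOP_LEVEL_DEFAULTS = {"lookback_days": 7, "threshold": 0.7}


def load_interest_config(raw_json):
    """Hypothetical helper: parse a config, check required fields, apply defaults."""
    config = json.loads(raw_json)
    data_src = dict(config.get("data_src", {}))

    missing = [f"data_src.{key}" for key in REQUIRED_DATA_SRC if key not in data_src]
    if "interest_column" not in config:
        missing.append("interest_column")
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))

    # Optional fields fall back to the documented defaults.
    data_src.setdefault("date_column", "event_date")
    return {**TOP_LEVEL_DEFAULTS, **config, "data_src": data_src}


minimal = json.dumps({
    "data_src": {
        "region": "europe-west8",
        "project_id": "example-project",
        "dataset_id": "bpp_tables",
        "source_table_id": "event_pageview_bpp",
        "user_id_column": "bpp_user_id",
    },
    "interest_column": "product_category",  # illustrative column name
})
config = load_interest_config(minimal)
print(config["lookback_days"], config["threshold"], config["data_src"]["date_column"])
# 7 0.7 event_date
```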
Output structure
| Field | Type | Description |
|---|---|---|
| `bpp_user_id` | INT64 | Unified Bytek ID of the user |
| `taxonomy_name` | STRING | Interest classification type: `product`, `iab`, or `custom` |
| `value` | JSON | List of the user's selected interests |
| `created_at` | DATETIME | Processing timestamp |
These fields are available in the audience builder once the model completes — see Predictions → Interest.
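When consuming the output table outside the audience builder, the `value` field has to be parsed before use. The sketch below assumes that `value` arrives as a JSON array of label strings and that Product Interest labels keep the `||` hierarchy separator; both are assumptions about the serialised form, and the sample row and helper names are invented for illustration.

```python
import json


def split_levels(value_json):
    """Parse the `value` JSON list and split each label into hierarchy levels."""
    labels = json.loads(value_json)  # assumption: a JSON array of strings
    return [label.split("||") for label in labels]


def has_top_level(value_json, category):
    """True if any selected interest starts with the given top-level category."""
    return any(levels[0] == category for levels in split_levels(value_json))


# Illustrative row shaped like the output table; the values are made up.
row = {
    "bpp_user_id": 42,
    "taxonomy_name": "product",
    "value": '["Shoes||Sportswear||Running", "Bags"]',
    "created_at": "2024-01-31T08:00:00",
}
levels = split_levels(row["value"])
print(levels)  # [['Shoes', 'Sportswear', 'Running'], ['Bags']]
print(has_top_level(row["value"], "Shoes"))  # True
```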
Use cases
- Segmentation — build audiences of users with strong affinity for certain products, categories, or topics.
- Recommendations — personalise product or content recommendations from dominant interests.
- Enrichment — feed interest profiles into your CRM, CDP, or ad platforms for better targeting.
Best practices
- Product Interest: ensure the event table has a clean, hierarchical `interest_column` (use `||` for multi-level categories).
- IAB / Custom Interest: confirm Topic Classification has run before Interest analysis, and that `interest_column` holds absolute, crawlable URLs (including `https://`).
- Threshold: lower values are more inclusive (broader interest sets); higher values are stricter (only the strongest interests).
- Lookback: use longer windows (60–90 days) for long consideration cycles, shorter (7–14 days) for fast-moving products.
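For IAB / Custom Interest, a quick pre-check can flag URLs that are not absolute before Topic Classification runs. The sketch below uses Python's standard `urllib.parse`; the helper name `is_absolute_url` is hypothetical, and the check only covers the scheme and host, not whether the page is actually crawlable.

```python
from urllib.parse import urlparse


def is_absolute_url(url):
    """True if the URL has an http(s) scheme and a host, e.g. https://example.com/page."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


samples = ["https://example.com/shoes", "example.com/shoes", "/shoes/running"]
flags = [is_absolute_url(u) for u in samples]
print(flags)  # [True, False, False]
```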