Interest — Reference

The Interest model identifies the most relevant interests for each customer by analysing labelled events in your data warehouse. It is a descriptive model and supports three interest types:

Product Interest — the interest label is already present in the event row (e.g. a product category). No classification step needed.
IAB Interest — requires a Topic Classification step that scrapes and classifies page content into standard IAB categories.
Custom Interest — also uses Topic Classification, but against a custom taxonomy defined for your business.

Input data: see AI Model Data Requirements → Interest for the event-table fields each interest type needs.

How it works

Input data — reads from an event table. Each row must include at least:
- a user identifier (user_id_column, typically bpp_user_id),
- an event timestamp (date_column),
- the interest field (interest_column) — either the interest label (Product Interest) or the page URL (IAB / Custom Interest).
Interest column — how this field is read depends on the interest type:
- Product Interest — the interest is read directly from interest_column, e.g. "Shoes||Sportswear||Running" (hierarchical labels joined with ||).
- IAB / Custom Interest — interest_column holds the page URL. The Topic Classification pipeline scrapes and classifies the page, writing the label before analysis.
Aggregation — interests are grouped and counted per user over a configurable lookback window (lookback_days).
Statistical scoring — for each interest, the population mean and standard deviation of its frequency are computed. Each user–interest pair gets a z-score:
```
z = (count - mean) / std
```
Threshold filtering — only interests with a z-score above threshold are kept. A higher threshold keeps only interests that the user engages with significantly more often than the population average for that interest.
Top-N selection — the top interests per user are retained.
Output — results are written to a dedicated BigQuery table, one row per user with their selected interests as a JSON list.
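The aggregation, scoring, threshold, and top-N steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the platform's implementation: the function name `select_interests` and the `top_n` parameter are hypothetical, and the sketch assumes a population standard deviation, a mean taken over every user seen in the window (users with no events for an interest count as 0), a window that excludes the boundary day, and ranking by z-score. None of these details are specified above.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta
from statistics import mean, pstdev


def select_interests(events, today, lookback_days=7, threshold=0.7, top_n=3):
    """events: iterable of (user_id, event_date, interest_label) tuples."""
    window_start = today - timedelta(days=lookback_days)

    # Aggregation: count events per (user, interest) inside the lookback window.
    counts = defaultdict(Counter)
    for user_id, event_date, interest in events:
        if window_start < event_date <= today:
            counts[user_id][interest] += 1

    users = list(counts)
    interests = {label for per_user in counts.values() for label in per_user}

    # Statistical scoring: per-interest mean and standard deviation across users
    # (assumption: a user with no events for an interest contributes a count of 0).
    scored = defaultdict(list)
    for interest in interests:
        freq = [counts[u][interest] for u in users]
        mu, sigma = mean(freq), pstdev(freq)
        if sigma == 0:
            continue  # identical counts for every user: z is undefined
        for u in users:
            count = counts[u][interest]
            if count == 0:
                continue  # never assign an interest the user did not show
            z = (count - mu) / sigma
            # Threshold filtering: keep only z-scores above the threshold.
            if z > threshold:
                scored[u].append((z, interest))

    # Top-N selection: highest z-scores first (assumed ranking criterion).
    return {u: [label for _, label in sorted(scored[u], reverse=True)[:top_n]]
            for u in users}
```

Note that an interest every user shows equally often has zero variance and is dropped in this sketch, since it does not distinguish any user from the population.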

JSON configuration reference

Field	Type	Required	Default	Description
`data_src.region`	STRING	✅	—	Cloud region of the dataset (auto-populated by the platform)
`data_src.project_id`	STRING	✅	—	GCP project ID (auto-populated)
`data_src.dataset_id`	STRING	✅	—	BigQuery dataset (auto-populated)
`data_src.source_table_id`	STRING	✅	—	Input event table ID (a reconciled `*_bpp` table)
`data_src.date_column`	STRING	❌	`"event_date"`	Column with the event timestamp
`data_src.user_id_column`	STRING	✅	—	User identifier column (typically `bpp_user_id`)
`interest_column`	STRING	✅	—	Field containing the interest label (Product) or page URL (IAB/Custom)
`lookback_days`	INT	❌	`7`	Number of days of event history considered
`threshold`	FLOAT	❌	`0.7`	z-score threshold for selecting interests
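
To illustrate how the required fields and defaults in the table combine, the sketch below fills in the optional fields and rejects a configuration that omits a required one. The helper name `resolve_config` is hypothetical; the platform performs its own validation, which may differ.

```python
import copy

# Required and optional fields as listed in the table above.
REQUIRED_DATA_SRC = ("region", "project_id", "dataset_id",
                     "source_table_id", "user_id_column")
DEFAULTS = {"lookback_days": 7, "threshold": 0.7}


def resolve_config(config):
    """Return a copy of config with the documented defaults applied (hypothetical helper)."""
    resolved = copy.deepcopy(config)
    data_src = resolved.setdefault("data_src", {})

    missing = [f"data_src.{key}" for key in REQUIRED_DATA_SRC if key not in data_src]
    if "interest_column" not in resolved:
        missing.append("interest_column")
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))

    # Optional fields fall back to their documented defaults.
    data_src.setdefault("date_column", "event_date")
    for key, value in DEFAULTS.items():
        resolved.setdefault(key, value)
    return resolved
```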

Example configuration

{
  "data_src": {
    "region": "europe-west8",
    "dataset_id": "bpp_tables",
    "project_id": "example-project",
    "date_column": "event_date",
    "user_id_column": "bpp_user_id",
    "source_table_id": "event_pageview_bpp"
  },
  "threshold": -0.5,
  "lookback_days": 60,
  "interest_column": "page_title"
}

Output structure

Field	Type	Description
`bpp_user_id`	INT64	Unified Bytek ID of the user
`taxonomy_name`	STRING	Interest classification type: `product`, `iab`, or `custom`
`value`	JSON	List of the user's selected interests
`created_at`	DATETIME	Processing timestamp

These fields are available in the audience builder once the model completes — see Predictions → Interest.
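
A downstream consumer might read the `value` field along these lines. The row is invented for illustration, and the sketch assumes that `value` decodes to a flat list of label strings; the actual shape of each list element is not specified here, so treat the helper names and the decoding logic as assumptions.

```python
import json

# Hypothetical output row: field names follow the table above, values are invented.
row = {
    "bpp_user_id": 12345,
    "taxonomy_name": "product",
    "value": '["Shoes||Sportswear||Running", "Shoes||Casual"]',
    "created_at": "2024-01-31T02:00:00",
}


def interests_of(row):
    """Decode the JSON list in `value` (assumed to be a list of strings)."""
    value = row["value"]
    return json.loads(value) if isinstance(value, str) else list(value)


def top_level_categories(row):
    """First level of each ||-joined label, in order, without duplicates."""
    return list(dict.fromkeys(label.split("||")[0] for label in interests_of(row)))
```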

Use cases

Segmentation — build audiences of users with strong affinity for certain products, categories, or topics.
Recommendations — personalise product or content recommendations from dominant interests.
Enrichment — feed interest profiles into your CRM, CDP, or ad platforms for better targeting.

Best practices

Product Interest — ensure the event table has a clean, hierarchical interest_column (use || for multi-level categories).
IAB / Custom Interest — confirm Topic Classification has run before Interest analysis, and that interest_column holds absolute, crawlable URLs (including https://).
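Threshold — lower values are more inclusive (broader interest sets); higher values are stricter (only the strongest interests). Because the threshold is a z-score, a negative value also admits interests the user shows somewhat less often than the population average.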
Lookback — use longer windows (60–90 days) for long consideration cycles, shorter (7–14 days) for fast-moving products.
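
The first two checks can be approximated before a run. The sketch below is one possible pre-flight check with hypothetical helper names. It only inspects the string form of each value: it accepts both `http://` and `https://` URLs and does not verify that a page is actually reachable or crawlable.

```python
from urllib.parse import urlsplit


def is_absolute_url(value):
    """True if value has an http(s) scheme and a host, e.g. https://example.com/page."""
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_clean_label(value):
    """True if every level of a ||-separated label is non-empty and has no stray whitespace."""
    levels = value.split("||")
    return all(level and level == level.strip() for level in levels)
```

For example, `is_absolute_url("example.com/shoes")` is false because the scheme is missing, and `is_clean_label("Shoes||")` is false because the last level is empty.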
