Overview and Schema - BigQuery

Heap SQL lets you directly access your Heap data using SQL. You can run ad-hoc analyses, connect to BI tools such as Tableau, or join the raw Heap data with your own internal data sources.

Best of all, we automatically keep the SQL data up-to-date and optimize its performance for you. Define an event within the Heap interface, and in just a few hours, you'll be able to query it retroactively in a clean SQL format.

This doc gives an overview of the Heap SQL BigQuery integration.

Setup

In order to start accessing Heap SQL data through BigQuery, you'll require an existing Google Cloud project. After some initial setup of your project, all that needs to be done is to add our Heap service account as a BigQuery user, and share your Project ID with us. All of these steps are detailed in our Connection Requirements document.

Schema

All data in your Heap account is stored under the same GCP project that was specified during setup.

Data in separate environments are stored in different datasets under that project. By default, datasets are named using the project name, then an underscore, followed by the environment name. For instance, each table in your main production environment will be part of the main_production dataset, while the tables in the development environment of a project named "My iOS App" will be in a dataset called my_ios_app_development.

For each environment, Heap will create the following views:

  1. One users view.
  2. One sessions view.
  3. One pageviews view.
  4. One all_events view
  5. One view for each event you've defined in the Heap interface and have syncing to Heap SQL.

Each of the above views (except for all_events) is backed by a "raw" table with name <view_name>_raw. This means that every environment will have both a users view and users_raw table, for example. The views perform deduplication, as the underlying raw tables may have duplicated data introduced during the sync process. Additionally, the users view filters out users that are the from user in an identify call. For that reason, we recommend querying only against the deduplicated views.

These objects have a dynamic schema that auto-update to reflect any new events or properties you've tracked.

In addition to the objects above, the following metadata tables will be created:

  1. One _sync_info table.
  2. One _dropped_tables table.
  3. One user_migrations table which lists all instances of merging and migrating users.

Please note that user migrations are not applied retroactively in BigQuery, so data must be joined against the user_migrations table to reflect any migrations.

Users View

The users view returns a row for each user in the environment. The column schema is:

Column Name Type Description
user_id INTEGER Unique ID of user, randomly generated by Heap.
identity STRING User's username or other unique token, passed via heap.identify API. Must be unique.
handle STRING User's username or other unique token, passed in via heap.addUserProperties API.
email STRING User's email address, passed in via heap.addUserProperties API.
joindate TIMESTAMP UTC timestamp when the user was first seen.
last_modified TIMESTAMP UTC timestamp when the user's data was last modified.
user properties… STRING There will be one column for every unique user property you've sent via the heap.addUserProperties API (name, gender, account status, etc).

Sessions View

For web, a session ends after 30 minutes of inactivity from the user. For iOS, a session ends after your app has entered the background. The sessions view has the following schema:

Column Name Type Library Description
event_id INTEGER All Unique ID of associated session event, randomly generated by Heap used internally by Heap for data syncing.
user_id INTEGER All Unique ID of associated user, randomly generated by Heap.
session_id INTEGER All Unique ID of associated session, randomly generated by Heap.
time TIMESTAMP All UTC timestamp when session started.
library STRING All Version of Heap library which initiated the session. Can be one of "web" or "iOS".
platform STRING Web iOS User's operating system.
device_type STRING Web iOS Device type, which can be one of "Mobile", "Tablet", or "Desktop".
country STRING Web iOS Country in which user session occurred, based on IP.
region STRING Web iOS Region in which user session occurred, based on IP.
city STRING Web iOS City in which user session occurred, based on IP.
IP STRING Web iOS The IP address for the session, which is used for determining geolocation.
referrer STRING Web URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as direct.
landing_page STRING Web URL of the first pageview of the session.
browser STRING Web User's browser.
search_keyword STRING Web Search term that brought the user to your site.
utm_source STRING Web GA-based utm_source tag associated with the session's initial pageview.
utm_campaign STRING Web GA-based utm_campaign tag associated with the session's initial pageview.
utm_medium STRING Web GA-based utm_medium tag associated with the session's initial pageview.
utm_term STRING Web GA-based utm_term tag associated with the session's initial pageview.
utm_content STRING Web GA-based utm_content tag associated with the session's initial pageview.
device STRING iOS User's device model.
carrier STRING iOS User's mobile carrier.
app_name STRING iOS Current name of iOS app, as determined by CFBundleName.
app_version STRING iOS Current version of iOS app, as determined by CFBundleShortVersionString.

Pageviews View

Heap will create a view that returns every page view by default.

Column Name Type Library Description
event_id INTEGER All Unique ID of associated pageview event.
user_id INTEGER All Unique ID of associated user, randomly generated by Heap.
session_id INTEGER All Unique ID of associated session, randomly generated by Heap.
session_time TIMESTAMP All Timestamp when session started.
time TIMESTAMP All UTC timestamp when the pageview occurred.
library STRING All Version of Heap library which initiated the session. Can be one of "web" or "iOS".
platform STRING Web iOS User's operating system.
device_type STRING Web iOS Device type, which can be one of "Mobile", "Tablet", or "Desktop".
country STRING Web iOS Country in which user session occurred, based on IP.
region STRING Web iOS Region in which user session occurred, based on IP.
city STRING Web iOS City in which user session occurred, based on IP.
IP STRING Web iOS The IP address for the session, which is used for determining geolocation.
referrer STRING Web URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as direct.
landing_page STRING Web URL of the first pageview of the session.
browser STRING Web User's browser.
search_keyword STRING Web Search term that brought the user to your site.
utm_source STRING Web GA-based utm_source tag associated with the session's initial pageview.
utm_campaign STRING Web GA-based utm_campaign tag associated with the session's initial pageview.
utm_medium STRING Web GA-based utm_medium tag associated with the session's initial pageview.
utm_term STRING Web GA-based utm_term tag associated with the session's initial pageview.
utm_content STRING Web GA-based utm_content tag associated with the session's initial pageview.
path STRING Web The path of the pageview.
query STRING Web The query parameters associated with the pageview.
hash STRING Web The hash parameters associated with the pageview
title STRING Web Title of the current page.
device STRING iOS User's device model.
carrier STRING iOS User's mobile phone carrier.
app_name STRING iOS Current name of iOS app, as determined by CFBundleName.
app_version STRING iOS Current version of iOS app, as determined by CFBundleShortVersionString.
view_controller STRING iOS Name of the current view controller.
screen_a11y_id STRING iOS accessibilityIdentifier for the current view controller.
screen_a11y_label STRING iOS accessibilityLabel for the current view controller.

Event Views

Heap will create one view (and corresponding 'raw' table) for every defined event you've chosen to sync.

For each event view, the columns are determined by the version of the Heap library that sent the event, along with any custom properties you may have added via Snapshots or the custom API. For instance, if an event was sent by the Heap iOS library, its BigQuery view will contain iOS-specific columns like app_version or carrier, but not web-specific columns like landing_page or search_keyword. This helps keep your schema as clean as possible.

The name of these views will be the event_name itself and reside under the projectname_environmentname schema. To produce the event_name portion, we strip the event name of any non-alphanumeric characters and snake-case the result. For example, an event named "Sign Up - Click Link (Any)" becomes sign_up_click_link_any. (Note that this means it's possible for events to collide into the same table. Make sure your event names are sufficiently distinct!)

Each of these event views will be fully retroactive, meaning it will contain a row for every occurrence of the event since the day you installed Heap. The column schema is:

Column Name Type Libraries Description
event_id INTEGER All Unique ID of associated event, randomly generated by Heap.
user_id INTEGER All Unique ID of associated user, randomly generated by Heap.
session_id INTEGER All Unique ID of associated session, randomly generated by Heap.
time TIMESTAMP All UTC timestamp when event happened.
session_time TIMESTAMP All Timestamp when session started.
type STRING All For web auto-tracked events, can be any of view page, click, submit, change, with push state events registered as view page events. For iOS auto-tracked events, can be touch, edit field, or a gesture recognizer you've defined. For custom events, this will be the custom event name.
library STRING All Version of Heap library on which event occurred. Can be one of "web", "iOS", or "server".
platform STRING Web iOS User's operating system.
device_type STRING Web iOS Device type, which can be one of "Mobile", "Tablet", or "Desktop".
country STRING Web iOS Country in which user session occurred, based on IP.
region STRING Web iOS Region in which user session occurred, based on IP.
city STRING Web iOS City in which user session occurred, based on IP.
IP STRING Web iOS The IP address for the session, which is used for determining geolocation.
referrer STRING Web URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as direct.
landing_page STRING Web URL of the first pageview of the session.
browser STRING Web User's browser.
search_keyword STRING Web Search term that brought the user to your site.
utm_source STRING Web GA-based utm_source tag associated with the session's initial pageview.
utm_campaign STRING Web GA-based utm_campaign tag associated with the session's initial pageview.
utm_medium STRING Web GA-based utm_medium tag associated with the session's initial pageview.
utm_term STRING Web GA-based utm_term tag associated with the session's initial pageview.
utm_content STRING Web GA-based utm_content tag associated with the session's initial pageview.
domain STRING Web Domain including subdomain, e.g. blog.heapanalytics.com.
path STRING Web Portion of the current URL following your domain, e.g. /docs for heapanalytics.com/docs.
hash STRING Web Portion of the current URL following the hash sign, e.g. #install for heapanalytics.com/docs#install.
query STRING Web Query params of the page's current URL, e.g. ?utm_id=1234 for heapanalytics.com?utm_id=1234.
title STRING Web Title of the current page.
href STRING Web href property of link (used for clicks on anchor tags).
device STRING iOS User's device model.
carrier STRING iOS User's mobile phone carrier.
app_name STRING iOS Current name of iOS app, as determined by CFBundleName.
app_version STRING iOS Current version of iOS app, as determined by CFBundleShortVersionString.
action_method STRING iOS Name of the action method triggered by this event, e.g. loginButtonWasPressed.
view_controller STRING iOS Name of the current view controller.
screen_a11y_id STRING iOS accessibilityIdentifier for the current view controller.
screen_a11y_label STRING iOS accessibilityLabel for the current view controller.
target_view_class STRING iOS Underlying class name of an iOS action's target, e.g. UITableCellView.
target_view_name STRING iOS Instance variable name of an iOS action's target, e.g. loginButtonView.
target_a11y_id STRING iOS accessibilityIdentifier of an iOS action's target.
target_a11y_label STRING iOS accessibilityLabel of an iOS action's target.
target_text STRING Web iOS Button text of the event target.
Event properties… STRING All There will be one column for every unique event property you've attached, either via the heap.track API, the addEventProperties API, or Snapshots. The column type is automatically inferred from the underlying property values.

All events view

Heap will create a view, all_events, which contains every event/pageview/session that has been synced to your BigQuery dataset. This view is defined on top of the other event tables in your dataset, and includes an extra column event_view_name that denotes the source table of each row.

In most cases, the schema for the all_events view will consist of the union of all columns from all event tables, with nulls filled in where applicable. However, if you are syncing many tables to your project, this can run into BigQuery query length limits, in which case we'll fall back to one of the following reduced column sets:

  1. All builtin Heap columns - i.e., event_view_name and columns contained in the pageviews/sessions tables
  2. Basic columns - event_view_name, user_id, event_id, session_id, time

If your all_events view is created with a reduced column set, you can join it with the underlying event views to access any property columns that are not available in the view.

Sync Info Table

Heap creates a table called _sync_info that contains metadata around the sync process and last updated times.

Column Name Type Description
event_table_name STRING the event name
sync_started TIMESTAMP a timestamp for when that table began syncing
sync_ended TIMESTAMP a timestamp for when that table completed syncing
synced_to_time TIMESTAMP a timestamp that reflects the most recent occurrence of the event in Heap SQL
inserted_row_count INTEGER the number of rows inserted during the most recent sync

Dropped Tables Table

Heap creates a table called _dropped_tables that contains a record of all data tables and views that were dropped as part of an update.

Column Name Type Description
event_table_name STRING name of table or view dropped
dropped_at TIMESTAMP time of drop

User Migrations Table

When a user is identified, a migration of raw heap data occurs to aggregate the data under the new identity. That migration is recorded in this table. Since migrations are not applied retroactively in BigQuery, queries should join against this user_migrations table for the most accurate results.

Column Name Type Description
from_user_id INTEGER the migrating user's ID
to_user_id INTEGER the destination user's ID

Data Syncing

As you define new events and modify existing events, we seamlessly update your cluster schema. This ensures Heap SQL is always a clean, accurate representation of your raw data.

In particular, the following changes can happen on each sync:

  • When an event is toggled to sync for the first time, we create a new table for that event.
  • When a syncing, defined event is modified in the Heap interface, we tear down the existing table for that event, create a new one in its place, and populate the new table.
  • When a custom event property is seen for the first time, we add a new property column to the corresponding event table.
  • When a custom user property is seen for the first time, we add a new user property column to the users table.
  • When user-level properties are updated with heap.addUserProperties, we update corresponding rows in the users table.
  • When an anonymous user gets identified with an existing identity via heap.identify, we reflect this 'user migration' with a row in the user_migrations describing which user (from_user_id) was migrated where (to_user_id).