Prerequisites
In order to connect Google BigQuery to Cube, you need to provide service account
credentials. Cube requires the service account to have BigQuery Data Viewer
and BigQuery Job User roles enabled. If you plan to use pre-aggregations,
the account will need the BigQuery Data Editor role instead of BigQuery Data Viewer.
You can learn more about acquiring
Google BigQuery credentials here.
In Cube Cloud, you can authenticate with OIDC workload identity
federation instead of a key file — the same roles
apply to the impersonated service account.
- The Google Cloud Project ID for the BigQuery project
- A set of Google Cloud service credentials which allow access to the BigQuery project
- The Google Cloud region for the BigQuery project
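The roles described above can be granted with the gcloud CLI. The following is a minimal sketch, not an official procedure: `grant_bq_roles` is a hypothetical helper, the project ID and service account email are whatever you pass in, and the role IDs are the standard identifiers for the BigQuery Data Viewer, Data Editor, and Job User roles. It assumes gcloud is installed and that you are allowed to edit the project's IAM policy.

```shell
# grant_bq_roles is a hypothetical helper, not part of Cube or gcloud.
# Usage: grant_bq_roles PROJECT_ID SERVICE_ACCOUNT_EMAIL [viewer|editor]
grant_bq_roles() {
  project_id=$1
  sa_email=$2
  # Pre-aggregations need Data Editor; plain querying only needs Data Viewer.
  if [ "${3:-viewer}" = "editor" ]; then
    data_role=roles/bigquery.dataEditor
  else
    data_role=roles/bigquery.dataViewer
  fi
  # Bind the data role and the Job User role to the service account.
  for role in "$data_role" roles/bigquery.jobUser; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="serviceAccount:${sa_email}" \
      --role="$role" || return 1
  done
}
```

Pass `editor` as the third argument if you plan to use pre-aggregations.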
Setup
Manual
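Add the BigQuery connection variables to a .env file in your Cube project: at
minimum CUBEJS_DB_BQ_PROJECT_ID and CUBEJS_DB_BQ_KEY_FILE. Instead of a key file
path, you can Base64-encode the key file and set the result as
CUBEJS_DB_BQ_CREDENTIALS.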
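A .env file for a key-file setup might look like the sketch below. The project ID and key path are placeholders, and CUBEJS_DB_TYPE is assumed from Cube's general data source configuration rather than taken from this section.

```shell
# .env sketch for a key-file setup; every value is a placeholder.
# CUBEJS_DB_TYPE is assumed from Cube's general configuration, not this section.
CUBEJS_DB_TYPE=bigquery
CUBEJS_DB_BQ_PROJECT_ID=my-bigquery-project-12345
CUBEJS_DB_BQ_KEY_FILE=/path/to/my/keyfile.json

# Alternative to CUBEJS_DB_BQ_KEY_FILE: the same JSON key, Base64-encoded on one line.
# CUBEJS_DB_BQ_CREDENTIALS=<output of: base64 < /path/to/my/keyfile.json | tr -d '\n'>
```

The `tr -d '\n'` step strips the line wrapping that some base64 implementations add, so the value stays on a single line.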
Cube Cloud
In some cases you’ll need to allow connections from your Cube Cloud deployment
IP address to your database. You can copy the IP address from either the
Database Setup step in deployment creation, or from Settings →
Configuration in your deployment.
OIDC workload identity federation
Instead of a service account key file, Cube Cloud deployments can authenticate
to BigQuery with OIDC workload identity federation. A Workload Identity
Federation provider in your GCP project trusts Cube’s OIDC issuer, and the
driver authenticates through the GCP default credential chain, so there is no
JSON key to provision or rotate.

Select OIDC workload identity federation in the connection wizard, or set the
equivalent environment variables. GCP_SERVICE_ACCOUNT_EMAIL selects the service
account Cube impersonates; leave it unset to authenticate as the federated
principal directly. See the GCP OIDC guide for the Workload Identity Pool,
provider, and IAM setup.
Cube Cloud also supports connecting to data sources within private VPCs
if single-tenant infrastructure is used. Check out the
VPC connectivity guide for details.
Environment Variables
| Environment Variable | Description | Possible Values | Required |
|---|---|---|---|
| CUBEJS_DB_BQ_PROJECT_ID | The Google BigQuery project ID to connect to | A valid Google BigQuery Project ID | ✅ |
| CUBEJS_DB_BQ_KEY_FILE | The path to a JSON key file for connecting to Google BigQuery | A valid Google BigQuery JSON key file | ✅ ¹ |
| CUBEJS_DB_BQ_CREDENTIALS | A Base64 encoded JSON key file for connecting to Google BigQuery | A valid Google BigQuery JSON key file encoded as a Base64 string | ❌ |
| CUBEJS_DB_BQ_LOCATION | The Google BigQuery dataset location to connect to. Required if used with pre-aggregations outside of US. If not set, the BigQuery driver fails with a "Dataset was not found in location US" error | A valid Google BigQuery regional location | ⚠️ |
| CUBEJS_DB_EXPORT_BUCKET | The name of a bucket in cloud storage | A valid bucket name from cloud storage | ❌ |
| CUBEJS_DB_EXPORT_BUCKET_TYPE | The cloud provider where the bucket is hosted | gcp | ❌ |
| CUBEJS_DB_MAX_POOL | The maximum number of concurrent database connections to pool. Default is 40 | A valid number | ❌ |
| CUBEJS_CONCURRENCY | The number of concurrent queries to the data source | A valid number | ❌ |
¹ Not required if the key is supplied through CUBEJS_DB_BQ_CREDENTIALS.
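To illustrate how the Required column plays out for key-based authentication, here is a small preflight check. It is only a sketch: `check_bq_env` is a hypothetical helper rather than anything Cube ships, it assumes that either CUBEJS_DB_BQ_KEY_FILE or CUBEJS_DB_BQ_CREDENTIALS is enough to supply the key, and it does not cover OIDC workload identity federation.

```shell
# check_bq_env is a hypothetical helper, not part of Cube.
# It returns non-zero when a variable needed for key-based auth is missing.
check_bq_env() {
  rc=0
  # The project ID is always required.
  if [ -z "${CUBEJS_DB_BQ_PROJECT_ID:-}" ]; then
    echo "error: CUBEJS_DB_BQ_PROJECT_ID is not set" >&2
    rc=1
  fi
  # Assumption: one of the two credential variables must be present.
  if [ -z "${CUBEJS_DB_BQ_KEY_FILE:-}" ] && [ -z "${CUBEJS_DB_BQ_CREDENTIALS:-}" ]; then
    echo "error: set CUBEJS_DB_BQ_KEY_FILE or CUBEJS_DB_BQ_CREDENTIALS" >&2
    rc=1
  fi
  # A missing location is only a warning: it matters for pre-aggregations outside US.
  if [ -z "${CUBEJS_DB_BQ_LOCATION:-}" ]; then
    echo "warning: CUBEJS_DB_BQ_LOCATION is not set" >&2
  fi
  return "$rc"
}
```

Running `check_bq_env` before starting Cube surfaces a missing variable early instead of at the first query.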
Pre-Aggregation Feature Support
count_distinct_approx
Measures of type count_distinct_approx can
be used in pre-aggregations when using Google BigQuery as a source database. To
learn more about Google BigQuery’s support for approximate aggregate functions,
click here.
Pre-Aggregation Build Strategies
To learn more about pre-aggregation build strategies, head
here.
| Feature | Works with read-only mode? | Is default? |
|---|---|---|
| Batching | ❌ | ✅ |
| Export Bucket | ❌ | ❌ |
Batching
No extra configuration is required to configure batching for Google BigQuery.
Export bucket
Google Cloud Storage
For improved pre-aggregation performance with large datasets, enable export
bucket functionality by setting the CUBEJS_DB_EXPORT_BUCKET and
CUBEJS_DB_EXPORT_BUCKET_TYPE environment variables.

When using an export bucket, remember to assign the BigQuery Data Editor and
Storage Object Admin roles to your BigQuery service account.
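As a sketch, the export bucket settings might look like this in a .env file. The bucket name is a placeholder; `gcp` is the only bucket type this page lists for BigQuery.

```shell
# Export bucket sketch; the bucket name is a placeholder for your own GCS bucket.
CUBEJS_DB_EXPORT_BUCKET=my-export-bucket
CUBEJS_DB_EXPORT_BUCKET_TYPE=gcp
```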