How to configure the “Export Collections to BigQuery” extension and use SQL “full” power on your Firestore data

Renaud Tarnec
Firebase Tips & Tricks
8 min read · Jul 30, 2021


Photo by Pietro Jeng on Unsplash

A few months ago, Firebase published an Extension that allows exporting the documents of a Cloud Firestore collection to BigQuery, the serverless, highly scalable, enterprise data warehouse offered by Google Cloud.

Being able to export your Firestore data to BigQuery opens many possibilities in terms of data transformation, analysis, visualization, and reporting with business intelligence tools like Google Data Studio or QlikView. As a matter of fact, BigQuery supports a standard SQL dialect that is ANSI:2011 compliant.

Also very interesting: you can easily cross-reference your Firestore data with other data sources, whether they are other Firestore databases, data sources in object storage (Cloud Storage), or transactional databases like Cloud SQL.

You can also generate real-time insights for your users, manipulate data in Google Sheets without requiring SQL knowledge, or bring ML to your data with BigQuery ML, all of that from your Firestore data imported into BigQuery.

Possibilities are endless…

With the “Export Collections to BigQuery” extension, you can send realtime, incremental updates from any Firestore collection to BigQuery. It listens for document changes in your specified Firestore collection, then exports the changes to BigQuery. In other words, the data in BigQuery is a mirror of your content in Firestore.

Since an extension is a packaged solution, all you have to do is install and configure the extension. Let’s see in detail how to do that.

The easiest way to install and manage the extension is through the Firebase console¹.
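If you prefer the command line, the extension can also be installed with the Firebase CLI (see note 1). A minimal sketch, assuming the Firebase CLI is already installed and "your-project-id" is a placeholder:

    # Install the extension into your project; the CLI then prompts for the
    # same configuration parameters described below.
    firebase ext:install firebase/firestore-bigquery-export --project=your-project-id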

In the console, click the “Extensions” menu item at the bottom of the vertical menu. The Extensions page opens: find the “Export Collections to BigQuery” card and click “Install”. The configuration panel opens. It is divided into four steps, which we describe below:

  • Step 1 - Review APIs enabled and resources created: basically this is an “information” step. It indicates the APIs and resources to be enabled. Just click “Next”.
  • Step 2 - Review billing and usage: Another “information” step about billing. Click “Next”.
  • Step 3 - Review access granted to this extension: Again, this is an “information” step. Click “Next”.
  • Step 4 - Configure extension: OK, here starts the extension configuration! There are several elements you need to configure, described below.

For the rest of this article, we will take the example of a Firestore database containing orders: we will mirror an orders collection which contains documents with three fields: date (a Timestamp), amount, and country (as ISO country codes, like BE, ES or GB). Let’s start the configuration!

  • You first need to choose the locations (i.e. regions) for the Cloud Function and the BigQuery Dataset. If possible choose the same regions for these two services as well as for the Firestore database, in order to minimize billing costs. If not possible, try to choose the closest regions.
  • Then you have to configure the path (or name) of the Firestore collection that will be mirrored in BigQuery. If you want to mirror a subcollection, use the {wildcard} notation like products/{productId}/orders. In our case we select the orders collection, as explained above.
  • After configuring the collection, you need to enter the desired BigQuery dataset. As explained in the BigQuery doc, datasets are top-level containers that are used to organize and control access to your tables and views. We choose firebase_orders for our example.
  • Next, you need to enter the Table ID, which is the prefix that will be used for the corresponding table and view inside the BigQuery dataset. We choose to use orders: we will see later how this appears in the BigQuery console.
  • Finally, you must decide if you want to partition the BigQuery table and BigQuery view created by the extension. We will select “none” for the moment but at the end of the article we explain this option.
  • So, now that we have completed the extension config, click on the “Install Extension” button. After a few minutes the extension is installed and it is time to look at how it works!

Let’s test the extension: in Firestore, we create a few documents in the orders collection, as shown below.

“orders” Collection in the Firestore database
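The screenshot is not reproduced here; for illustration, the two documents could contain values like the following (the values are invented, and the date field is written as an ISO string for readability, whereas in Firestore it is stored as a Timestamp):

    [
      { "date": "2021-07-01T10:00:00Z", "amount": 100, "country": "BE" },
      { "date": "2021-07-02T09:30:00Z", "amount": 50,  "country": "ES" }
    ]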

As soon as we create the first document in the collection, the extension creates the firebase_orders dataset in BigQuery with two resources:

  • A table of raw data that stores a full change history of the documents within the collection. This table includes a number of metadata fields so that BigQuery can display the current state of the data. The main metadata fields are timestamp, document_name, and the operation for the document change. Note that the table is named orders_raw_changelog, using the prefix we configured before (see the query sketch after this list).
  • A view, named orders_raw_latest, which represents the current state of the data within the collection.
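Here is a minimal query sketch against the raw changelog table (the project ID is a placeholder; the dataset and table names are the ones configured above):

    -- Change history of the mirrored collection: one row per document write,
    -- with the operation performed and the raw JSON payload.
    SELECT timestamp, document_name, operation, data
    FROM `your-project-id.firebase_orders.orders_raw_changelog`
    ORDER BY timestamp DESC
    LIMIT 100;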

The image below shows the BigQuery console, which is part of the Google Cloud Console (not of the Firebase console). Use the following URL to open the BigQuery console: https://console.cloud.google.com/bigquery?project=<your-project-id>.

BigQuery console (SQL workspace) in the Google Cloud Console

We can see the firebase_orders dataset (in the red rectangle) with its table and view (in the green rectangle). On the right pane we see the schema of the orders_raw_latest view (in the purple rectangle). This view mirrors our Firestore orders collection.

We can note that the schema contains several fields, like document_name, document_id and data. The data field is actually a JSON representation of the current document state.

Let’s use the “QUERY” button at the top to query this BigQuery view. A SELECT * FROM SQL query returns two rows, corresponding to the two documents in the Firestore database, as shown below:
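A minimal sketch of that query, with the project ID as a placeholder:

    -- Current state of the mirrored collection: one row per Firestore document.
    SELECT *
    FROM `your-project-id.firebase_orders.orders_raw_latest`;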

If you execute the SELECT data FROM ... SQL query you’ll get, for each Firestore doc, its JSON representation, as shown below:
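Again as a sketch, with the project ID as a placeholder:

    -- The data column holds each Firestore document serialized as JSON.
    SELECT data
    FROM `your-project-id.firebase_orders.orders_raw_latest`;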

Having the Firestore data as JSON is not very handy if we want to query individual fields and take advantage of the full power of BigQuery’s SQL querying capabilities!

Fortunately, there are several options to solve that.

A first option is to use BigQuery’s JSON functions, as in the following example:
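A sketch of such a query, using the JSON functions on the data column (the amount is cast to a number; extracting the date field is omitted here because it depends on how the Timestamp is serialized in the JSON):

    -- Extract individual Firestore fields from the JSON stored in the data column.
    SELECT
      document_id,
      JSON_EXTRACT_SCALAR(data, '$.country') AS country,
      CAST(JSON_EXTRACT_SCALAR(data, '$.amount') AS FLOAT64) AS amount
    FROM `your-project-id.firebase_orders.orders_raw_latest`;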

From there you can start using SQL aggregate functions, like the following example which uses SUM().
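For example, a sketch computing the total order amount per country (this also illustrates the GROUP BY case discussed just below):

    -- Aggregate the mirrored orders: total amount per country.
    SELECT
      JSON_EXTRACT_SCALAR(data, '$.country') AS country,
      SUM(CAST(JSON_EXTRACT_SCALAR(data, '$.amount') AS FLOAT64)) AS total_amount
    FROM `your-project-id.firebase_orders.orders_raw_latest`
    GROUP BY country
    ORDER BY total_amount DESC;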

This is where it starts to get really interesting! Doing such aggregate calculations within Firestore, a NoSQL database, is not straightforward.

The classical way with Firestore is to maintain counters, which can rapidly become cumbersome if you need to aggregate over several dimensions (e.g. if you want to simulate the use of SUM() or COUNT() in conjunction with a GROUP BY clause). By mirroring your data to BigQuery you can very easily take advantage of these SQL capabilities.

Another possibility to directly access the individual fields of the Firestore documents is to use the dedicated script provided with the Extension.

This script simply creates a BigQuery view, based on a JSON schema configuration file that you provide, using the same built-in BigQuery JSON functions that we used above to extract the content from the JSON representation of the Firestore document (stored in the data field of the orders_raw_latest view).

I’m not going to reproduce here the detailed instructions found in the documentation. Just note that it is an easy three-step process: create a “schema file” in JSON format, set up the necessary credentials, and run the script.

For setting up the credentials, I recommend using the gcloud CLI: it requires installing the Google Cloud SDK, but you avoid creating a new service account in your Google Cloud project.
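A minimal sketch of that setup, assuming the Google Cloud SDK is already installed (check the script’s documentation for the exact requirements):

    # Create Application Default Credentials that the script can pick up,
    # and make sure the right project is selected.
    gcloud auth application-default login
    gcloud config set project your-project-id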

Let’s try the script with the following schema file:
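Here is a sketch of it, assuming it is saved as orders_schema.json (the script reuses the file name in the generated view names):

    {
      "fields": [
        { "name": "date", "type": "timestamp" },
        { "name": "amount", "type": "number" },
        { "name": "country", "type": "string" }
      ]
    }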

After running the fs-bq-schema-views script with this schema file, we can see two new views in the BigQuery console, as shown below (in the red rectangle).
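For reference, the invocation that produces these views looks roughly like the following; the flag names are taken from memory of the script’s README, so double-check them against the extension documentation:

    # Generate the schema-based views in the firebase_orders dataset
    # (verify the flag names against the fs-bq-schema-views documentation).
    npx @firebaseextensions/fs-bq-schema-views \
      --non-interactive \
      --project=your-project-id \
      --dataset=firebase_orders \
      --table-name-prefix=orders \
      --schema-files=./orders_schema.json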

We can also see that the schema of the orders_schema_orders_schema_latest view contains the three fields defined in the schema file (blue rectangle) and that we can build an SQL query with these fields (green rectangles).

Three more things to note before the end of this article:

  • If you want to export different collections, you need to install several instances of the extension in your Firebase project². Each installed instance has its own configuration. You can even declare multiple instances that export to the same dataset and table, so that you mix the data from several Firestore collections in the same BigQuery table (and view).
  • As mentioned above, when you configure the extension, you can partition the BigQuery table and view created by the extension. A partitioned table is a special table that is divided into segments, called partitions, which make it easier to manage and query your data and improve query performance. When configuring the extension, you can select the granularity of partitioning based on one of HOUR, DAY, MONTH or YEAR, which generates one partition per hour, day, month or year, respectively. More details on partitioned tables can be found in the BigQuery doc.
  • Once configured, the extension only exports to BigQuery the Firestore documents that are created or changed. It does not export the existing documents (i.e. the ones that were created before the extension was installed). To “backfill” your BigQuery dataset with all the documents in the collection, you can run a specific import script provided with the extension (a sketch of the invocation follows this list). See the doc for more details.
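As a sketch, the backfill invocation for our example could look like this; again, the flag names are from memory of the import script’s README and should be verified against the docs:

    # Backfill existing Firestore documents of the orders collection into BigQuery
    # (verify the flag names against the fs-bq-import-collection documentation).
    npx @firebaseextensions/fs-bq-import-collection \
      --non-interactive \
      --project=your-project-id \
      --source-collection-path=orders \
      --dataset=firebase_orders \
      --table-name-prefix=orders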

That’s it! Enjoy querying your Firestore documents via SQL and making your end users happy with, for example, real-time insights, easy data manipulation in Google Sheets or visualization via tools like QlikView!

[1] You can also use the Firebase Command Line Interface (CLI), see the extension doc.

[2] Remember, as explained above, if you want to export a subcollection, use the {wildcard} notation.
