Introduction to Cloud Firestore for SQL Developers (Part 1)

Ezfire

The relational database, such as PostgreSQL and MySQL, has long been, and arguably still is, the most popular choice of database when building web applications. It's maturity, performance and the flexibility of the SQL language coupled with a rich ecosystem of tools and libraries, such as ORMs, make it an obvious choice in many instances.

Relation databases are not without their challenges however. In particular, relational databases can be difficult to scale horizontally, and performance degrades quickly for all but the simplest queries on large datasets. In response, there has been an emergence in recent years of so called non-relational (or NoSQL) databases that use different concepts and tradeoffs to achieve easier scalability. Google Cloud Firestore is one such NoSQL database that focuses on extreme scalability for apps that will grow quickly.

In this article, we will look at Cloud Firestore through the lens of a SQL developer, compare and contrast the difference between the Cloud Firestore data model and the relational data model, and translate some basic concepts and query patterns from SQL into the Firestore world. All Firestore queries included in this article can be run directly on Ezfire by signing up for our app.

But before we get to the queries, what is Cloud Firestore?

What is Cloud Firestore?

Cloud Firestore is a non-relational, schemaless, document-oriented datastore available on Firebase and Google Cloud. In Cloud Firestore, your data is made up of collections of JSON documents, each with a unique id. Cloud Firestore models the relationship between collections and documents very similar to how one would model restful resources, associating with each document a path, e.g. users/user_id. As such, unlike other document stores, in Cloud Firestore, individual documents can have nested collections allowing you to encode "ownership" directly into your data structure. For example, suppose you have a collection of users with a document with id reospeedwagon. Then you can add a collection of addresses for this user nested directly under the user document at the path users/reospeedwagon/addresses.

The focus of the design of Cloud Firestore was on ease of use and scalability. In particular, Cloud Firestore claims that query performance scales with the size of data queried and not the size of the dataset. To achieve this, when querying Firestore, unlike SQL, it is required that all queries have an associated index. This puts some limitations on the kinds of queries that are possible, which we will discuss later.

Comparing Firestore to SQL

To look at the differences between Cloud Firestore and SQL more concretely, lets assume we are building an app, and we have need for some data on countries and all the cities in those countries. The first question we can ask is "How will these two data models differ?"

Data Models

In SQL world, it is obvious that we would need to have 2 tables with 1 foreign key.

-- Example tables for Postgres
CREATE TABLE countries (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  alpha_2 TEXT NOT NULL,
  alpha_3 TEXT NOT NULL
);

CREATE TABLE cities (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  country_id INTEGER REFERENCES countries(id)
);

In the context of Cloud Firestore, there are a few ways that we could model this same relationship:

Since document stores support nested arrays in the JSON document, we could have one countries collection with all cities included as a nested array.
We can have one root collection, countries, with a nested collection, cities, for each document in the countries collection.
We can have two root collections, countries and cities, with some kind of reference to each city's country as a field in the city's JSON document.

Which approach is the best to choose? Well it all depends on the situation. Nested arrays are a good choice when the cardinality of the data to be nested is low, and is expected to stay low. For example, if we were nesting states instead of cities, an array would be a good choice. However, since the number of cities in a country is in fact quite large, this isn't a good option for our data model.

With option 2, we don't have a requirement of low cardinality like we do with the array solution. However, when using a nested collection, it is required that you identify the parent document when making a query. This imposes a restriction on the way that we query cities that may not satisfy our requirements e.g. maybe we will need to find all cities in the world with populations greater than 100 000.

How about option 3? Conceptually this looks the closest to the tables we've defined in our SQL implementation. Furthermore, Firestore supports a reference data type that allows you to store a reference to a document as a field in another document, very similar to a foreign key. Using TypeScript notation, this data model would look something like:

import { DocumentReference } from "firebase/firestore";

type Country {
  id: string;
  name: string;
  alpha2: string;
  alpha3: string;
}

type City {
  id: string;
  name: string;
  country: DocumentReference;
}

This data model gives us the flexibility we need to reasonably work with the cities collection, it should be the one we pick. Furthermore, using the reference data type in our model makes retrieving the country of the city as easy at the foreign key.

Now, that we how our data model established, how can we actually use it?

Queries

To compare and contrast the querying in Cloud Firestore vs SQL, we need to address the elephant in the room, and that is the question "What is the query language of Cloud Firestore?" Truthfully, Cloud Firestore does not have a natural query language like SQL, but instead has a number of language specific SDKs.

On Ezfire, we allow users to write and execute queries against their production Cloud Firestore databases by exposing the Cloud Firestore Node.js API on an object called db. In this section we will translate some basic SQL queries into Cloud Firestore queries you can run on Ezfire.

Reading Data

So, how does data fetching work? In SQL, we know we just use the SELECT statement like this:

-- Given the id of some country
SELECT * FROM countries WHERE id = <country_id>;

To translate the above SQL query into Firestore language, we would write something like:

db.collection("countries").doc("<country_id>").get();

To break down what this means, the db.collection("countries").doc("<country_id>") portion of the above expression is building a reference to a document with id <country_id> in the collection countries and .get() portion attempts to read that document from the database.

What about if we want to select on something other than the id? Let's say we need to find the document for the United States using is ISO alpha-2 code, US. The corresponding SQL query would look something like:

SELECT * FROM countries WHERE alpha_2 = 'US' LIMIT 1;

We can easily do the same thing in Firestore with:

db.collection("countries").where("alpha2", "==", "US").limit(1).get();

Writing Data

What about if we need to write data to our database? Creating new rows in a SQL table is done with the INSERT statement like so:

INSERT INTO countries (name, alpha_2, alpha_3) VALUES ('United States', 'US', 'USA');

To translate this for Firestore, we have:

db.collection("countries").doc().create({
  name: "United States",
  alpha2: "US",
  alpha3: "USA",
});

In the above query, doc() returns a document reference with a random unique id, and create() attempts to write the given data to that document reference in the database.

Unlike in SQL, there is no way to update multiple documents in a collection with a single query. Updates must be performed be specifying the full document reference, and look like:

db.collection("countries").doc("<id>").update({
  name: "United States of America",
});

This principle also holds true for deletes:

db.collection("countries").doc("<id>").delete();

Conclusion

In this article we introduced Cloud Firestore, and looked at how to translate some basic concepts and queries from the SQL world. In the next article in this series, we will talk about how to handle more complex SQL concepts such as joins and transactions. Thanks for reading!

If you are interested in getting more experience with Cloud Firestore, sign up for our app and try out some of the queries in this article on our sample database.

Tutorial