Introduction to Cloud Firestore for SQL Developers (Part 2)

Ezfire

In the previous article in this series, we started looking at the NoSQL database Cloud Firestore from the perspective of a SQL developer. We introduced the basics of data modeling with Cloud Firestore, contrasted that with relational databases, and translated the most basic SQL queries into queries for Cloud Firestore. If you have yet read Part 1, I would encourage you to do so first.

In this article, we are going to look at some more advanced, but fundamental, SQL concepts and show how you can translate them into the Cloud Firestore world.

As in the previous article, all Firestore queries included in this article can be run directly on Ezfire by signing up for our app.

But with that out of the way, let's take a look at one of the most fundamental relation concepts, joins.

Joins

Joins are a fundamental part of working with relational databases. They allow the user to write queries that span multiple tables by stitching that tables together based on the constraints of the queries. Joins make it easy to pull related sets of data together in a single query, such as a user and all their posts:

SELECT * FROM users INNER JOIN posts ON users.id = posts.user_id WHERE users.id = $1;

Document stores, such as Firestore, don't often support join like operations at the database level. You instead need to perform the join manually at the application level. This looks something like this:

Promise.all([
  db.collection("users").doc(userId).get(),
  db.collection("posts").where("userId", "==", userId).get(),
]);

Things become a little bit more complicated if you need to perform multiple joins. For example if we wanted to load a user, all their posts and the associated comments we could write in SQL:

SELECT * FROM users
  INNER JOIN posts ON users.id = posts.user_id
  INNER JOIN comments ON posts.id = comments.post_id
  WHERE users.id = $1;

Querying data in this way is just not as easy in Cloud Firestore as it is in SQL. To do this with Cloud Firestore, you need perform an additional query for each individual post to load the comments for that post. The creates a lot of network traffic and is understandably quite slow.

So what is the alternative? Well if we remember that each document in our posts collection is an arbitrary JSON document, we'll notice that we can add an array of comments directly to our post document, so that every time you load the post, you have all the comments. Altering our data model in this way eliminates the need for another join at the cost of adding potentially a lot more data to the post document.

What is important to note here however is that neither of these alternatives are fundamentally the "right" way to do things. What method works best will depend largely on your application and how you use your data.

Ordering

When writing queries with SQL, we can sort the results of a query in ascending or descending order using the ORDER BY keyword. Ordered queries look something like this:

-- Fetch a user's posts sorted newest to oldest.
SELECT * posts WHERE posts.user_id = $1 ORDER BY created_at DESC;

-- Fetch a user's posts sorted oldest to newest.
SELECT * posts WHERE posts.user_id = $1 ORDER BY created_at ASC;

You can sort results in Cloud Firestore in a very similar way using the orderBy method. Below is an example of this in action:

// Fetch a user's posts sorted newest to oldest.
db.collection("posts").where("userId", "==" userId).orderBy("createdAt", "desc").get();

// Fetch a user's posts sorted oldest to newest.
db.collection("posts").where("userId", "==" userId).orderBy("createdAt", "asc").get();

One caveat in the case of Cloud Firestore is that to sort a query on a given field, you need to have an index on your database that supports that sort order. For example, to run the queries above, you would need two composite indexes on userId and createdAt. One must be ascending on createdAt and the other descending on createdAt.

Transactions

When performing multiple reads and writes with a relational database, we use a transaction to maintain consistency between those operations. The transaction ensures that all operations within the transaction happen atomically, that is, either all operations succeed or none of them succeed. This allows you to enforce consistent modifications to your data by ensuring that related mutations are applied together.

The canonical example of this is transferring money from one account to another. For this to work correctly, you would need to subtract the transferred amount from the first account and add it to the second. If one of these operations succeeds but the other fails, you will have either created or destroyed some money in your system, and left it in an inconsistent state. That is just not good.

Below is an example of the SQL you would write to perform the above transfer:

BEGIN TRANSACTION;

UPDATE accounts SET amount = amount - 100 WHERE id = $1;

UPDATE accounts SET amount = amount + 100 WHERE id = $2;

COMMIT TRANSACTION;

Cloud Firestore also supports these kinds of ACID transactions with a couple caveats:

Each transaction can only perform a maximum of 500 operations including reads and writes.
All reads in the transactions must happen before any writes.

To translate the above SQL query into Firestore, we can use the runTransaction function.

db.runTransaction(async tx => {
  const account1 = db.collection("accounts").doc("1");
  const account2 = db.collection("accounts").doc("2");
  tx.update(account1, { amount: FieldValue.increment(-100) });
  tx.update(account2, { amount: FieldValue.increment(100) });
});

The runTransaction function returns a promise that resolves when the transaction is applied successfully or rejects if the transaction fails.

Let's take a look at a more complex example. Sometimes we need to load data first in our transaction to avoid conflicting writes happening at the same time. Take for example the case above where you have an array of comments in each one of your posts. Each time a user comments on the post, you need to add that comment to the array of comments in the post, and save the post document. Now what if two users are writing comments and submit them concurrently? If we don't first read the post in our transaction, its possible that one of these two comments will be missing from the final state, as the two updates submitted weren't aware of the others changes.

When using transactions, during the commit phase, Firestore will check if any of the documents read in the transaction were modified during the time that the transaction was taking place. If any were modified, they whole transaction block will be ran again to ensure a result consistent with what is in the database.

As an example of how to make use of this, consider the following query:

db.runTransaction(async tx => {
  const ref = db.collection("posts").doc("1");
  const post = (await tx.get(ref)).data();
  post.comments = [...post.comments, { text: "This is a good post" }];
  tx.set(ref, post);
});

In the above query, we first read the post from our transaction, modify the comments array by adding a new comment, and save this very same post once again. Doing modifications to your documents following this pattern will ensure that you do not have a problem with concurrent writes.

Conclusion

In this article we took at some more advanced fundamental SQL concepts and how they translate to the Cloud Firestore world. In some cases the translations were fairly straightforward, like transactions which are full supported in Firestore. In others, such as join, we needed to be a bit more clever about how we solved the problem. Thanks for reading, I hope you learned something useful.

If you are interested in getting more experience with Cloud Firestore, sign up for our app and try out some of the queries in this article on our sample database.

Tutorial

Introduction to Cloud Firestore for SQL Developers (Part 2)

In this article we will discuss how to translate some advanced fundamentals of SQL into NoSQL in the context of Cloud Firestore.

Joins

Ordering

Transactions

Conclusion

Ready to get started?