Data Modeling Patterns for Cloud Firestore

Ezfire

Because relational databases are so widespread and have such a long history, many web developers are familiar with how to model data for them. Mapping a collection of domain objects and their relationships to tables and relations is a rote practice with well-defined patterns.

Non-relational databases, and document stores like Cloud Firestore in particular, give you additional freedom in the shape your data can take. On the other hand, they limit the kinds of queries you can run. As a result, there is a different set of considerations and effective patterns to employ when modeling your data.

In this article, we are going to take a look at some data modeling patterns that are useful when working with Cloud Firestore. Effective use of these patterns can save you a lot of trouble in a growing application, and can even save you money in the long run by reducing the number of reads dispatched to your database. Let's get started!

Nested Arrays

When working with a relational database, "has many" relationships always require a separate table to encode the relationship: if a user has many posts, we need a posts table with a foreign key referencing the user. Document stores like Cloud Firestore, however, give you more flexibility in how you store these relationships. As with a relational database, you can use a separate collection to store "has many" relationships, but you also have the option of storing them as nested arrays inside the parent document.

In the case of a user with posts, a nested array may not be a good choice, because there is potentially no limit to the number of posts a user can create, while a single Cloud Firestore document can be at most 1 MiB in size. However, consider orders on an e-commerce website. There is a reasonable limit on the number of items in any one order (real orders will likely never exceed 1000 items, for example). Storing that amount of data as a nested array on the order document is reasonable, and in fact makes loading the order in its entirety faster and easier.
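Before embedding a bounded list, it can help to check that a realistic worst case stays well under the 1 MiB document limit. The sketch below approximates the storage size rules described in Firestore's documentation (strings count as their UTF-8 bytes plus 1, numbers as 8 bytes, booleans and null as 1 byte, arrays as the sum of their elements, plus 32 bytes per document). It is an estimate only: it assumes maps are sized as the sum of their keys and values, ignores the document name and other value types such as timestamps and references, and the 1000-item sample order is hypothetical.

```typescript
// Rough storage-size estimate for a Firestore document. Assumptions: only
// strings, numbers, booleans, null, arrays and maps are handled, maps are
// sized as the sum of their keys and values, and the document name is ignored.
const MAX_DOCUMENT_BYTES = 1048576; // 1 MiB

type Value = string | number | boolean | null | Value[] | { [key: string]: Value };

// Number of UTF-8 bytes needed to encode a JavaScript (UTF-16) string.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff) {
      bytes += 4; // a surrogate pair encodes one 4-byte code point
      i++;
    } else bytes += 3;
  }
  return bytes;
}

// Strings (both values and field names) count as UTF-8 bytes + 1.
function stringSize(s: string): number {
  return utf8Length(s) + 1;
}

function valueSize(v: Value): number {
  if (v === null || typeof v === "boolean") return 1;
  if (typeof v === "number") return 8;
  if (typeof v === "string") return stringSize(v);
  if (Array.isArray(v)) {
    // Arrays: sum of the sizes of their elements.
    return v.reduce<number>((sum, item) => sum + valueSize(item), 0);
  }
  // Maps: sum of key sizes plus value sizes.
  let total = 0;
  for (const key of Object.keys(v)) {
    total += stringSize(key) + valueSize(v[key]);
  }
  return total;
}

// Field names and values, plus 32 bytes of per-document overhead.
function documentSize(fields: { [key: string]: Value }): number {
  return valueSize(fields) + 32;
}

// Hypothetical worst-case order with 1000 line items.
const items: Value[] = [];
for (let i = 0; i < 1000; i++) {
  items.push({
    name: "Example product " + i,
    image: "https://example.com/images/product-" + i + ".png",
    upc: "000000000000",
    quantity: 1,
    price: 999,
  });
}
const order: { [key: string]: Value } = { total: 999000, currency: "USD", itemCount: 1000, items };
console.log(documentSize(order) < MAX_DOCUMENT_BYTES); // true
```

With short item fields like these, even 1000 items use only a small fraction of the limit; large embedded values (long descriptions, base64 images) would change that quickly.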

Let's make this a little more concrete. Consider the following slice of a domain model:

Orders *---> OrderItems

We can quickly map these to documents in the database with the following shapes:

type Order = {
  id: string;
  total: number;
  currency: string;
  itemCount: number;
}

type OrderItem = {
  name: string;
  image: string;
  upc: string;
  quantity: number;
  price: number;
}

However, since we know that we should realistically never have more than 1000 order items, we can just include that data directly in the order document. This produces the following augmented order document:

type Order = {
  id: string;
  total: number;
  currency: string;
  itemCount: number;
  items: OrderItem[];
}

Now our "has many" relationship lives entirely in the orders collection. As a result, loading a complete order takes a single document read instead of one read for the order plus one read per item, which improves performance and reduces cost with Cloud Firestore.
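A minimal sketch of assembling such an order on the client or server before writing it. The `buildOrder` helper and the `MAX_LINE_ITEMS` cap are hypothetical, and it assumes prices are integers in minor units (cents) and that `itemCount` means the total quantity across line items; the original shapes do not pin down either choice. Computing `total` and `itemCount` from the embedded array keeps the derived fields consistent with the items they summarize.

```typescript
// Assumptions: prices are integers in minor units (cents), and itemCount
// is the total quantity across all line items.
type OrderItem = {
  name: string;
  image: string;
  upc: string;
  quantity: number;
  price: number;
};

type Order = {
  id: string;
  total: number;
  currency: string;
  itemCount: number;
  items: OrderItem[];
};

// Hypothetical cap, based on the "never more than 1000 items" estimate.
const MAX_LINE_ITEMS = 1000;

// Build the whole order, items included, as one document so a single read
// returns everything. Derived fields are computed from the embedded items
// so they cannot drift out of sync with them.
function buildOrder(id: string, currency: string, items: OrderItem[]): Order {
  if (items.length > MAX_LINE_ITEMS) {
    throw new Error(`Order ${id} has ${items.length} line items (max ${MAX_LINE_ITEMS})`);
  }
  let total = 0;
  let itemCount = 0;
  for (const item of items) {
    total += item.price * item.quantity;
    itemCount += item.quantity;
  }
  return { id, total, currency, itemCount, items };
}

const order = buildOrder("order-123", "USD", [
  { name: "Mug", image: "https://example.com/mug.png", upc: "012345678905", quantity: 2, price: 1200 },
  { name: "Poster", image: "https://example.com/poster.png", upc: "098765432109", quantity: 1, price: 2500 },
]);
console.log(order.total, order.itemCount); // 4900 3
```

With the namespaced web SDK used elsewhere in this article, the result could then be stored with `db.collection("orders").doc(order.id).set(order)` and loaded back, items and all, with a single `get()`.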

Polymorphism

Sooner or later, most sophisticated object-oriented data models require some polymorphism, and in a web app that means storing polymorphic data. Fortunately, the schemaless design of document stores like Cloud Firestore makes them well suited to this: documents in the same collection can have different fields. Storing polymorphic data is as easy as adding a type field to your documents.

Let's look at an example. Consider again the case of an app where users have many posts. In an app like Reddit, posts can have multiple different types, such as text posts, link posts and image posts. This is a prime example of polymorphic data.

To model this data for Cloud Firestore, we can write the following document shapes:

type Post = {
  type: "Text" | "Image" | "Link";
  title: string;
  user: DocumentReference;
}

type TextPost = Post & {
  type: "Text";
  text: string;
}

type ImagePost = Post & {
  type: "Image";
  src: string;
  alt: string;
}

type LinkPost = Post & {
  type: "Link";
  link: string;
}

With this data model, we can easily store polymorphic data in the posts collection. To resolve the type of the post after reading from the database, we just need to look at the type field:

// Assumes `db` is an initialized Firestore instance.
db.collection("posts")
  .doc(postId)
  .get()
  .then(doc => {
    if (!doc.exists) {
      throw new Error(`Post ${postId} not found`);
    }
    return doc.data();
  })
  .then(data => {
    // Check the type field to see what kind of post this is.
    if (data.type === "Text") {
      return data.text;
    } else if (data.type === "Image") {
      return data.src;
    } else {
      // Must be a Link post
      return data.link;
    }
  });
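In TypeScript, the type field can act as the discriminant of a union, so the compiler narrows each branch and reports any post type that is left unhandled. The sketch below assumes the raw data arrives as an untyped record (which is what `doc.data()` effectively gives you) and validates it before use. To keep it runnable without the Firebase SDK, the `user` DocumentReference is replaced by a plain `userPath` string, and `parsePost` and `primaryContent` are hypothetical helper names.

```typescript
// Assumption: `userPath` (e.g. "users/alice") stands in for the
// DocumentReference so the example runs without the Firebase SDK.
type BasePost = { title: string; userPath: string };
type TextPost = BasePost & { type: "Text"; text: string };
type ImagePost = BasePost & { type: "Image"; src: string; alt: string };
type LinkPost = BasePost & { type: "Link"; link: string };
type AnyPost = TextPost | ImagePost | LinkPost;

// Read a required string field or throw.
function str(data: Record<string, unknown>, key: string): string {
  const value = data[key];
  if (typeof value !== "string") {
    throw new Error(`Expected string field "${key}"`);
  }
  return value;
}

// Validate raw document data and narrow it to one post variant.
function parsePost(data: Record<string, unknown>): AnyPost {
  const base: BasePost = { title: str(data, "title"), userPath: str(data, "userPath") };
  switch (data["type"]) {
    case "Text":
      return { ...base, type: "Text", text: str(data, "text") };
    case "Image":
      return { ...base, type: "Image", src: str(data, "src"), alt: str(data, "alt") };
    case "Link":
      return { ...base, type: "Link", link: str(data, "link") };
    default:
      throw new Error(`Unknown post type: ${String(data["type"])}`);
  }
}

// Exhaustive switch: adding a variant to AnyPost without a matching case
// turns the assignment to `never` into a compile-time error.
function primaryContent(post: AnyPost): string {
  switch (post.type) {
    case "Text":
      return post.text;
    case "Image":
      return post.src;
    case "Link":
      return post.link;
    default: {
      const unhandled: never = post;
      throw new Error(`Unhandled post: ${JSON.stringify(unhandled)}`);
    }
  }
}

const post = parsePost({
  type: "Image",
  title: "Sunset",
  userPath: "users/alice",
  src: "https://example.com/sunset.jpg",
  alt: "A sunset",
});
console.log(primaryContent(post)); // https://example.com/sunset.jpg
```

Validating at the boundary means a malformed or unknown document fails loudly in one place instead of surfacing later as an `undefined` field.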

Master / Aggregate Pattern

One of the most interesting things about Cloud Firestore is its pay-per-use pricing, which differs from the majority of hosted database products on the market. At $0.06 per 100,000 reads, $0.18 per 100,000 writes and $0.02 per 100,000 deletes (at the time of writing), it's easy to see how a heavily trafficked app could quickly become very expensive compared to a database billed at a fixed hourly rate. At the same time, this pricing structure makes optimizing for cost straightforward: perform fewer operations.

More often than not, a typical app performs many more reads than writes, so it can be very helpful to reduce the number of reads required to render a page. In the NoSQL world, this is often done by denormalizing data from one collection into another so that a single document read provides all the necessary data. In this way, we trade a small number of extra writes for a significant reduction in reads. This not only results in faster load times but also reduces cost with Cloud Firestore.

This is where the Master / Aggregate pattern comes in. With this pattern, you split your collections into master collections and aggregate collections. Master collections act as the source of truth for your data, while aggregate collections contain documents built from data denormalized from the master collections. This lets you fetch a curated set of data by reading a single document that contains everything you need. Each time a master collection is updated, the changes are propagated to the aggregate collections (for example, by a Cloud Functions trigger) to keep them up to date.

To make this more concrete, consider an app whose users can create posts, e.g. a site like Reddit. You would need a users collection and a posts collection to store this data, which could look something like:

type User = {
  name: string;
  email: string;
}

type Post = {
  createdAt: Timestamp;
  content: string;
  user: DocumentReference;
}

Now suppose you have a requirement for a user profile page that shows the user's information and their last ten posts. We could do this by loading both the user document and the last 10 posts:

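// Assumes `db` is an initialized Firestore instance.
const userRef = db.collection("users").doc(userId);

Promise.all([
  userRef.get(),
  db
    .collection("posts")
    // Posts store their author as a DocumentReference, so filter on it.
    // Combining this filter with orderBy on createdAt needs a composite index.
    .where("user", "==", userRef)
    .orderBy("createdAt", "desc")
    .limit(10)
    .get(),
]);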

This works and will likely perform well, but every load of the profile page costs 11 document reads: one for the user and one for each of the ten posts. If profile pages are heavily trafficked, the cost will quickly add up.
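The difference is easy to quantify with the read price quoted earlier. A minimal sketch, assuming a hypothetical 1,000,000 profile views per month and ignoring the free quota, storage, and the write cost of keeping any denormalized copy in sync; `monthlyReadCostUSD` is a hypothetical helper.

```typescript
// Read price quoted earlier in the article: $0.06 per 100,000 reads.
// Assumption: actual prices vary by location and change over time.
const READ_PRICE_PER_100K_USD = 0.06;

// Monthly read cost for a page that needs `readsPerView` document reads,
// rounded to whole cents.
function monthlyReadCostUSD(views: number, readsPerView: number): number {
  const reads = views * readsPerView;
  return Math.round((reads / 100000) * READ_PRICE_PER_100K_USD * 100) / 100;
}

const views = 1000000; // hypothetical monthly traffic
const separateReads = monthlyReadCostUSD(views, 1 + 10); // user doc + 10 posts
const singleRead = monthlyReadCostUSD(views, 1); // one pre-built document
console.log(separateReads, singleRead); // 6.6 0.6
```

Because cost scales linearly with reads per view, cutting 11 reads to 1 cuts the read bill by the same factor of 11 at any traffic level.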

The solution here is to introduce an aggregate collection called userProfiles that pulls data from both the users collection and the posts collection when they update. The document would look something like this:

type UserProfile = {
  name: string;
  recentPosts: {
    createdAt: Timestamp;
    content: string;
  }[]
}

Using Cloud Functions for Firebase triggers on Cloud Firestore, such as onWrite, for both the users collection and the posts collection, we can sync the required data to the userProfiles collection whenever the user document is updated or a new post is created. With this technique, we need only 1 document read to get all the data for the profile page instead of 11, at the cost of one extra write to the userProfiles document whenever the user is updated or a new post is created.
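The trigger wiring depends on your Cloud Functions setup, but the denormalization each trigger performs can be written as pure functions that a handler would call before writing the result to the user's userProfiles document. A minimal sketch: `applyUserUpdate`, `applyNewPost` and `RECENT_POSTS_LIMIT` are hypothetical names, and `createdAt` is a plain number of milliseconds instead of a Timestamp so the example runs without the Firebase SDK.

```typescript
type User = { name: string; email: string };
// Assumption: createdAt is milliseconds since the epoch, not a Timestamp.
type RecentPost = { createdAt: number; content: string };
type UserProfile = { name: string; recentPosts: RecentPost[] };

// Hypothetical constant matching the "last ten posts" requirement.
const RECENT_POSTS_LIMIT = 10;

// Run when the master users document changes: copy only the fields the
// profile page needs (email is deliberately not denormalized).
function applyUserUpdate(profile: UserProfile | undefined, user: User): UserProfile {
  return { name: user.name, recentPosts: profile ? profile.recentPosts : [] };
}

// Run when a post is created: add it and keep only the newest posts.
function applyNewPost(profile: UserProfile, post: RecentPost): UserProfile {
  const recentPosts = profile.recentPosts
    .concat([post])
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, RECENT_POSTS_LIMIT);
  return { name: profile.name, recentPosts };
}

let profile = applyUserUpdate(undefined, { name: "Alice", email: "alice@example.com" });
for (let i = 1; i <= 12; i++) {
  profile = applyNewPost(profile, { createdAt: i * 1000, content: "Post " + i });
}
console.log(profile.recentPosts.length); // 10
console.log(profile.recentPosts[0].content); // Post 12
```

This sketch only handles new posts; edits and deletes would need their own update logic. In a real trigger, the read-modify-write of the profile document would likely be wrapped in a Firestore transaction so that two posts created at the same moment do not overwrite each other's update.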

db.collection("userProfiles").doc(userId).get();

The result is a faster app and big savings on cost.

Conclusion

In this article, we looked at a few useful data modeling patterns for non-relational databases, and specifically Cloud Firestore. Using these patterns effectively will improve your developer experience, improve application performance and reduce infrastructure costs for your Cloud Firestore database. Thanks for reading; I hope you learned something useful.

If you are interested in getting more experience with Cloud Firestore, sign up for our app and try out some of the queries in this article on our sample database.
