Apollo Client is a JavaScript GraphQL client commonly used in React applications. It streamlines communication with a GraphQL backend and, thanks to caching, improves the user experience.
In this article, we go over ways to keep the Apollo cache up to date after successful data mutations.
One of Apollo's features is a normalized cache. When configured correctly, it detects situations where two different query results return parts of the same entity, and merges those results into a single object in memory.
This means that when two queries overlap, the result of the later query can update the values read by the earlier one, if those fields happened to change in the meantime.
Notice how the `UserComponent` updated and read the latest `age` (pointed to by the blue arrow) that was fetched by a query initiated by a different component (`UserAgeComponent`). This is thanks to Apollo's knowledge that both queries refer to the same object in memory, so updates propagate to all readers, provided the query fetch policy permits using the cache.
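To make the scenario concrete, here is a minimal sketch of what two such overlapping queries could look like (the query shapes and field names are made up for illustration):

```js
import { gql } from '@apollo/client';

// Hypothetical query issued by UserComponent: fetches the whole profile.
const USER_PROFILE = gql`
  query UserProfile {
    user(id: "alice") {
      id
      name
      age
    }
  }
`;

// Hypothetical query issued later by UserAgeComponent. It overlaps with
// USER_PROFILE on the same User entity, so the freshly fetched `age`
// updates the single cached object that both components read from.
const USER_AGE = gql`
  query UserAge {
    user(id: "alice") {
      id
      age
    }
  }
`;
```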
The Apollo cache is also normalized. This means multiple different fields can refer to the same object in memory.
The band CFowl is referenced both in the list of Alice's `likedBands` and as her `favoriteBand`. Apollo can identify that it is, in fact, the same band, and thus stores it as a single object in memory, rather than two different objects that just happen to look the same.

When CFowl releases a new song, if the Apollo cache learns about it, all components that subscribed to that information will get updated, regardless of how Apollo found out about the `songs` property having changed.
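Conceptually, the normalized cache for this example could look roughly like this (the cache keys and values are made up for illustration):

```js
// A rough sketch of the normalized cache contents. Both fields hold a
// reference (__ref) to the same "Band:cfowl" entry, so an update to its
// `songs` field reaches every component reading either field.
const normalizedCache = {
  'User:alice': {
    __typename: 'User',
    likedBands: [{ __ref: 'Band:cfowl' }],
    favoriteBand: { __ref: 'Band:cfowl' },
  },
  'Band:cfowl': {
    __typename: 'Band',
    name: 'CFowl',
    songs: [/* updated when Apollo learns about a new song */],
  },
};
```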
Such a normalized cache is useful when building user interfaces. We often display the same data in multiple places and benefit from storing the freshest data for components to use, ensuring UI consistency.
Note that for normalization to work, we need to tell Apollo how to identify objects of a given type, so it can check whether two objects represent the same conceptual entity. The documentation covers that in great detail.
In the case of Splitgraph's GraphQL API, most of the time we can rely on the `nodeId` field of objects, but we could have also used the same fields that we use for the `selectionSet` during schema-stitching (for example, caching `User` objects based on their `username`).
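For illustration, here is a sketch of how such identification could be configured. The `keyFields` option is Apollo's standard mechanism; the exact type names below mirror the examples in this article:

```js
import { InMemoryCache } from '@apollo/client';

// Identify objects by their `nodeId` field; alternatively, `User`
// objects could be identified by their `username`.
const cache = new InMemoryCache({
  typePolicies: {
    SeafowlTable: { keyFields: ['nodeId'] },
    SeafowlSchema: { keyFields: ['nodeId'] },
    User: { keyFields: ['username'] },
  },
});
```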
If you are interested in finding out more about how the Apollo cache works, we recommend this guide from the authors of the `apollo-augmented-hooks` library.
All would be well, except for the fact that most applications are not read-only. Users can change the content of the application and expect the change to propagate throughout the UI.
As we all know, cache invalidation is one of the two hard problems in computer science (the others being naming things and off-by-one errors, pun intended).
More concretely, when a mutation finishes successfully, we should update the cache so components display the updated data. Apollo documents the available approaches:
Let's go over these and see their pros and cons.
Each GraphQL mutation returns information back to the caller. Mutation responses are merged into the cache the same way query responses are.
This means that for simple mutations that update existing objects, it should be enough to return the updated data and let Apollo merge it into the normalized cache.
For example, when updating the description of a repository, notice how the `description` property is included in the `repository` in the mutation response:
```graphql
mutation UpdateRepositoryDescription(
  $namespace: String!
  $repository: String!
  $description: String!
) {
  updateRepositoryDescription(
    namespace: $namespace
    repository: $repository
    description: $description
  ) {
    repository {
      id
      description
    }
  }
}
```
This approach requires no additional network requests, since the updated data is included in the response to the original mutation request. There is also no extra work required by the application: the cache is updated automatically.
It is not the Holy Grail, though. It is difficult to express insertions or deletions in this way.
For example, when creating a new repository, the mutation response would have to contain all the possible lists in which the repository could appear. The situation is similar when deleting repositories. This could easily lead to overfetching, since the UI most likely does not need all these lists; it just needs to update the lists it already had in the cache.
Updates to lists are easier to express using the other approaches to Apollo cache updates. Let's move on to them now.
When executing a mutation in Apollo, we can supply an `update` function. It will be called with the mutation response and can modify the normalized data in the cache in any way we want. For example, this lets us add or remove objects from lists.
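As a sketch, here is what an `update` function appending a newly created repository to a cached list could look like (the mutation document and field names below are hypothetical):

```js
import { gql, useMutation } from '@apollo/client';

// Hypothetical mutation document, for illustration only.
const CREATE_REPOSITORY = gql`
  mutation CreateRepository($name: String!) {
    createRepository(name: $name) {
      repository {
        id
        name
      }
    }
  }
`;

function useCreateRepository() {
  return useMutation(CREATE_REPOSITORY, {
    update(cache, { data }) {
      const newRepository = data.createRepository.repository;
      cache.modify({
        fields: {
          // Append a reference to the new repository to the cached
          // top-level `repositories` list.
          repositories(existingRefs = [], { toReference }) {
            return [...existingRefs, toReference(newRepository)];
          },
        },
      });
    },
  });
}
```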
The function runs in the browser and does not require any additional network calls, which makes it fast (unless you do lots of computations in your `update` function, but still, it is synchronous).
Again, not the Holy Grail. The syntax for cache updates is quite verbose. We also run the risk of updating the cache in a way that differs from what the API would return. Moreover, we need to modify all the data that the mutation could have changed. This increases the risk of displaying inconsistent data.
Let's see if these problems could be solved by refetching queries.
The third way of updating the cached data that Apollo supports is refetching queries. After a mutation is done, Apollo can refetch a set of queries to read the latest data.
This approach results in the highest consistency with the API, since we refresh the data directly from the source. The cost is additional network requests, which consume the user's bandwidth and take time, during which the user sees stale information.
There are two main ways to specify which queries to refetch, each with its upsides and downsides.
The most basic way is to manually specify the queries to refetch in the `refetchQueries` array. It initially requires very little work, since you only need to specify query names, or you can refetch active or all queries (using the `apolloClient.refetchQueries` function).
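For example (`client` being the `ApolloClient` instance, as in the later examples; the variable values and the `RepositoryOverview` query name are made up):

```js
// Option 1: name the queries to refetch alongside the mutation.
await client.mutate({
  mutation: UPDATE_REPOSITORY_DESCRIPTION, // the mutation shown earlier
  variables: {
    namespace: 'splitgraph',
    repository: 'my-repo',
    description: 'A new description',
  },
  refetchQueries: ['RepositoryOverview'], // hypothetical query name
});

// Option 2: refetch all active queries, at the cost of overfetching.
await client.refetchQueries({ include: 'active' });
```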
The ease of use comes at a maintenance cost: the list of queries to refetch must be kept in sync as the application's queries change, and refetching active or all queries easily fetches more data than the mutation actually affected.
Both of these downsides make this solution hard to maintain. To achieve full consistency, each mutation needs to be reviewed when adding, modifying, or removing GraphQL queries. This adds coupling between components that normally should not know about each other.
All in all, it is easy to start with this approach of manually specifying queries to refetch, and we recommend switching to the other solution (described below) as the application grows.
The other approach is a twist on the earlier approach of updating the cache in the frontend. The difference is that instead of modifying the cached values in place, we mark them as invalid. Apollo will then refetch queries that included these fields in their previous results.
```js
await client.refetchQueries({
  updateCache(cache) {
    cache.modify({
      // `apple` is an object fetched earlier; it must contain the
      // fields that `cache.identify` needs (e.g. `__typename` and the
      // key fields configured for its type).
      id: cache.identify(apple),
      fields: {
        color(_value, { INVALIDATE }) {
          return INVALIDATE;
        },
      },
    });
  },
});
```
This approach scales better to more queries and mutations, and it does not introduce additional coupling between components. If the `updateCache` function is implemented right and correctly approximates the effects of the mutation, Apollo will refetch all queries which returned those invalidated fields.
The ease of use of this method depends on the type of mutation. Let's see that with an example. Consider the following GraphQL response:
```json
{
  "data": {
    "seafowlDatabase": {
      "nodeId": "WyJzZWFmb3dsX2RhdGFiYXNlIiwiR2VsaW8iXQ==",
      "name": "my-app",
      "schemas": [
        {
          "nodeId": "WyJzZWFmb3dsX3NjaGVtYSIsIkdlbGlvIiwiR2VsaW8iXQ==",
          "name": "maps",
          "tables": [
            {
              "nodeId": "WyJzZWFmb3dsX3RhYmxlIiwiR2VsaW8iLCJHZWxpbyIsImNpdGllcyJd",
              "name": "cities",
              "__typename": "SeafowlTable"
            },
            {
              "nodeId": "WyJzZWFmb3dsX3RhYmxlIiwiR2VsaW8iLCJhYmMiLCIxMiJd",
              "name": "lakes",
              "__typename": "SeafowlTable"
            }
          ],
          "__typename": "SeafowlSchema"
        }
      ],
      "__typename": "SeafowlDatabase"
    }
  }
}
```
It represents a Seafowl database with its contents. It is used in the sidebar of the Splitgraph Console.
Let's suppose we want to rename the `lakes` table to `waters` and assume that the API cannot include the updated object in the mutation response. Thus, we send a mutation:
```graphql
mutation UpdateTableName {
  updateTableName(
    databaseName: "my-app"
    schemaName: "maps"
    oldTableName: "lakes"
    newTableName: "waters"
  ) {
    mutationId
  }
}
```
Then, after the mutation is done, we can refetch all queries that referenced the `name` field of the affected table:
```js
await client.refetchQueries({
  updateCache(cache) {
    cache.modify({
      id: cache.identify(lakesTable),
      fields: {
        name: (_value, { INVALIDATE }) => INVALIDATE,
      },
    });
  },
});
```
This seems like overkill, because we could have updated the cache directly instead of refetching the queries. Still, refetching queries after updates of existing objects is easy, assuming we can `cache.identify` the `lakesTable`.
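Note that `cache.identify` only needs the `__typename` and the key fields, so a minimal object is enough. Assuming the cache identifies `SeafowlTable` objects by `nodeId` (as sketched earlier), the `lakesTable` could be as small as this, with the `nodeId` taken from the response above:

```js
const lakesTable = {
  __typename: 'SeafowlTable',
  nodeId: 'WyJzZWFmb3dsX3RhYmxlIiwiR2VsaW8iLCJhYmMiLCIxMiJd',
};

// Resolves to the table's cache ID, e.g. something like
// 'SeafowlTable:{"nodeId":"WyJzZWFmb3dsX3RhYmxlIiwiR2VsaW8iLCJhYmMiLCIxMiJd"}'
const cacheId = cache.identify(lakesTable);
```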
Let's consider a different type of mutation. Instead of updating the table name, we want to delete the `lakes` table altogether. We send the following mutation:
```graphql
mutation DeleteTable {
  deleteTable(databaseName: "my-app", schemaName: "maps", tableName: "lakes") {
    mutationId
  }
}
```
Again, after the mutation is done, we can refetch all queries that referenced any field of the deleted table:
```js
await client.refetchQueries({
  updateCache(cache) {
    cache.modify({
      id: cache.identify(table),
      // NOTE: notice `fields` is a function, not an object.
      // This way we invalidate ALL table fields without
      // knowing their names.
      fields: (_value, { INVALIDATE }) => INVALIDATE,
    });
  },
});
```
This is just right. Apollo finds which queries referenced that deleted table, and it refetches them.
Note that returning the modified objects (in this case, the `SeafowlSchema`s that contained the deleted `lakes` table) in the mutation response would not scale well. The deleted table could have appeared in many queries and be returned by various parents. For example, the `lakes` table could also have been returned by a query that returns the latest table in the entire Seafowl instance:
```json
{
  "data": {
    "latestTableInInstance": {
      "nodeId": "WyJzZWFmb3dsX3RhYmxlIiwiR2VsaW8iLCJhYmMiLCIxMiJd",
      "name": "lakes",
      "__typename": "SeafowlTable"
    }
  }
}
```
The `deleteTable` mutation result would have to contain both the updated `maps` `SeafowlSchema` and this `latestTableInInstance` top-level query. Whether to include one or the other is not known statically, because it depends on what is in the Apollo cache at the time of sending the mutation. Thus, letting Apollo determine the queries to refetch is perfect for mutations that delete objects.
Mutations that add new objects are the most difficult to represent as cache updates. Apollo does not know in which queries the added object could appear. Thus, we need to determine all possible parents of the added object, identify them in the cache, and invalidate the field that references the added object.
To see that in practice, let's imagine we add a new `mountains` `SeafowlTable` using the following mutation:
```graphql
mutation CreateTable($sourceSQLQuery: String!) {
  createTable(
    databaseName: "my-app"
    schemaName: "maps"
    tableName: "mountains"
    sourceSQLQuery: $sourceSQLQuery
  ) {
    mutationId
  }
}
```
Now it is our job to identify all objects that could include that `mountains` table and tell Apollo to refetch queries that reference them. This is a hard problem, because we do not know what the IDs of these parent objects could be. In our example, we need to refetch:

- the `maps` `SeafowlSchema`
- the top-level `latestTableInInstance` query

Let's try that:
```js
await client.refetchQueries({
  updateCache(cache) {
    cache.modify({
      // TODO: somehow get access to `mapsSeafowlSchema`
      id: cache.identify(mapsSeafowlSchema),
      fields: {
        tables: (_value, { INVALIDATE }) => INVALIDATE,
      },
    });
    // Invalidate the top-level query
    cache.modify({
      fields: {
        latestTableInInstance: (_value, { INVALIDATE }) => INVALIDATE,
      },
    });
  },
});
```
Doing it right requires having access to the `mapsSeafowlSchema` object (the parent of the newly-added table) so we can identify it in the cache. Depending on the structure of the code, this may not be trivial if the component triggering the mutation did not require access to the schema. For example, it could have only known the names of the database (`my-app`) and the `SeafowlSchema` (`maps`) in which to create the table.
To identify the parent `SeafowlSchema` of the added table, the component may first need to fetch that `SeafowlSchema` from the API (the request could be served from the cache, in which case this is almost instant, but that is not guaranteed), and only then invalidate it, so that Apollo also refetches the other queries.
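A sketch of that flow could look as follows. The `SchemaByName` query and its field names are made up; `client.query` may serve it from the cache, depending on the fetch policy:

```js
import { gql } from '@apollo/client';

// Hypothetical query resolving the parent schema from the names we know.
const SCHEMA_BY_NAME = gql`
  query SchemaByName($databaseName: String!, $schemaName: String!) {
    seafowlSchema(databaseName: $databaseName, schemaName: $schemaName) {
      nodeId
    }
  }
`;

// Fetch the parent schema first (possibly served from the cache)...
const { data } = await client.query({
  query: SCHEMA_BY_NAME,
  variables: { databaseName: 'my-app', schemaName: 'maps' },
});

// ...then invalidate its `tables` field, so that queries which
// returned it (in any component) get refetched.
await client.refetchQueries({
  updateCache(cache) {
    cache.modify({
      id: cache.identify(data.seafowlSchema),
      fields: {
        tables: (_value, { INVALIDATE }) => INVALIDATE,
      },
    });
  },
});
```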
What about other types in the GraphQL schema that can return this new table? They also need to be considered to achieve full consistency. Moreover, when the GraphQL schema changes, the `updateCache` functions need to be re-evaluated, since there could now be more fields that need to be invalidated when adding a new table.
All of the above means that invalidating the cache after adding new objects in a mutation is a difficult problem to solve correctly.
There is a small caveat: invalidating a field only refetches active queries that reference that field. If no active query returned the invalidated field, nothing is refetched.
Consider the following scenario:

1. Component `A` renders and fetches a query.
2. `A` is unmounted (and, thus, the query is unsubscribed).
3. A mutation invalidates a field returned by that query. Since the query is no longer active, it is not refetched.
4. `A` is rendered again. It reads the query result from the cache. It does not refetch the query, even though it was marked as invalid in an earlier step.

Not refetching the queries despite data being invalidated is a problem that has been reported on GitHub.
The workaround involves using the `DELETE` sentinel value instead of `INVALIDATE` during cache modifications. This way the field is deleted from the cache, and Apollo is forced to fetch the query from the network, as it will not return partial data (unless you tell it to).
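In the table-renaming example from earlier, the workaround would look like this (same shape as before, with `DELETE` swapped in):

```js
await client.refetchQueries({
  updateCache(cache) {
    cache.modify({
      id: cache.identify(lakesTable),
      fields: {
        // DELETE removes the field from the cache entirely, so any
        // query that needs it must go back to the network, even if it
        // was inactive when the field was removed.
        name: (_value, { DELETE }) => DELETE,
      },
    });
  },
});
```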
Let's compare the presented ways of keeping the Apollo cache up to date. We will assess them based on ease of implementation, ease of maintenance, applicability, UX, and the risk of overfetching or of displaying inconsistent data:
| Technique | Ease of implementation | Ease of maintenance | Applicability | UX | Risk of overfetching | Risk of inconsistency |
| --- | --- | --- | --- | --- | --- | --- |
| Including modified data in the mutation's response | Easy (if the API exposes the modified data) | Easy | Only updates of existing objects | Instant | Low | Low |
| Updating the cache directly in the frontend | Difficult | Easy if done correctly from the start¹ | All types of mutations | Instant | None | High⁴ |
| Refetching queries: manually specifying queries to refetch | Easy | Difficult² | All types of mutations | Delayed update³ | High | Low |
| Refetching queries: marking cached fields as invalid | Easy for updates and deletes, difficult for insertions | Easy if done correctly from the start¹ | All types of mutations | Delayed update³ | Low | Low |
Apollo offers many ways of keeping its cache consistent with the backend data. There is no single approach that works best in every scenario. Our recommendation is:

- For updates, include the modified data in the mutation's response. This requires the least work, leads to the best UX, and is easy to maintain.
- For deletions, mark the fields of the deleted object as invalid. When pressed for time, start by manually specifying queries to refetch, and gradually move towards implementing the `updateCache` function that invalidates fields.
- For insertions, aim to mark the fields of parent objects as invalid. If that turns out to be difficult, use a manual list of queries to refetch and review it periodically. This amortizes the maintenance and implementation costs.
All in all, cache invalidation remains a hard problem, and using Apollo is no exception.