If you are using graphql, you are likely to be making queries on a graph of data (surprise surprise). But it’s easy to implement inefficient code with naive loading of a graph of data.
Using java-dataloader will help you make this process more efficient by both caching and batching requests for that graph of data items. If the data loader has seen a data item before, it will have cached the value and will return it without having to ask for it again.
Imagine we have the StarWars query outlined below. It asks us to find a hero, their friends’ names and their friends’ friends’ names. It is likely that many of these people will be friends in common.
The result of this query is displayed below. You can see that Han, Leia, Luke and R2-D2 are a tight-knit bunch of friends and share many friends in common.
A naive implementation would call a DataFetcher to retrieve a person object every time it was invoked.
In this case it would be 15 calls over the network, even though the group of people have a lot of common friends. With dataloader you can make the graphql query much more efficient.
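To see where the 15 calls come from, the following library-free sketch walks the hero, friends and friends-of-friends levels and counts every single lookup. The friend lists are assumed sample data in the style of the usual Star Wars demo, and NaiveFetchSketch and fetchPerson are hypothetical names that are not part of graphql-java.

```java
import java.util.List;
import java.util.Map;

final class NaiveFetchSketch {
    // Assumed sample data: person -> that person's friends.
    static final Map<String, List<String>> FRIENDS = Map.of(
            "R2-D2", List.of("Luke", "Han", "Leia"),
            "Luke", List.of("Han", "Leia", "C-3PO", "R2-D2"),
            "Han", List.of("Luke", "Leia", "R2-D2"),
            "Leia", List.of("Luke", "Han", "C-3PO", "R2-D2"),
            "C-3PO", List.of("Luke", "Han", "Leia", "R2-D2"));

    static int networkCalls = 0;

    // Stand-in for a remote lookup of one person; returns that person's friends.
    static List<String> fetchPerson(String name) {
        networkCalls++;
        return FRIENDS.get(name);
    }

    // Walks hero -> friends -> friends of friends, fetching every person it meets.
    static int countNaiveCalls(String hero) {
        networkCalls = 0;
        for (String friend : fetchPerson(hero)) {               // 1 call for the hero
            for (String friendOfFriend : fetchPerson(friend)) { // 3 calls for the friends
                fetchPerson(friendOfFriend);                    // 4 + 3 + 4 = 11 calls
            }
        }
        return networkCalls;
    }

    public static void main(String[] args) {
        System.out.println(countNaiveCalls("R2-D2")); // prints 15
    }
}
```

Every repeated name costs another call here, which is the waste that caching and batching remove.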
As graphql descends each level of the query (e.g. as it processes hero and then friends and then for each their friends), the data loader is called to “promise” to deliver a person object. At each level dataloader.dispatch() will be called to fire off the batch requests for that part of the query. With caching turned on (the default), any previously returned person will be returned as-is for no cost.
In the above example there are only 5 unique people mentioned, and with caching and batching retrieval in place there will be only 3 calls to the batch loader function. 3 calls over the network or to a database is much better than 15 calls, you will agree.
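The arithmetic can be reproduced with a small simulation. This is only a sketch under stated assumptions: TinyLoader is a hypothetical, heavily simplified stand-in written for this example (it is not the java-dataloader API), the friend lists are assumed sample data, and dispatch is called by hand once per query level.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class BatchingSketch {
    // Assumed sample data: person -> that person's friends.
    static final Map<String, List<String>> FRIENDS = Map.of(
            "R2-D2", List.of("Luke", "Han", "Leia"),
            "Luke", List.of("Han", "Leia", "C-3PO", "R2-D2"),
            "Han", List.of("Luke", "Leia", "R2-D2"),
            "Leia", List.of("Luke", "Han", "C-3PO", "R2-D2"),
            "C-3PO", List.of("Luke", "Han", "Leia", "R2-D2"));

    /** Hypothetical, heavily simplified stand-in for a data loader (not thread safe). */
    static final class TinyLoader {
        private final Function<List<String>, Map<String, List<String>>> batchFunction;
        private final Map<String, CompletableFuture<List<String>>> cache = new HashMap<>();
        private final List<String> queuedKeys = new ArrayList<>();
        private int batchCalls = 0;

        TinyLoader(Function<List<String>, Map<String, List<String>>> batchFunction) {
            this.batchFunction = batchFunction;
        }

        // Returns a promise at once; a key seen before gets its cached promise back.
        CompletableFuture<List<String>> load(String key) {
            CompletableFuture<List<String>> promise = cache.get(key);
            if (promise == null) {
                promise = new CompletableFuture<>();
                cache.put(key, promise);
                queuedKeys.add(key);
            }
            return promise;
        }

        // Sends every queued key to the batch function in a single call.
        void dispatch() {
            if (queuedKeys.isEmpty()) {
                return;
            }
            List<String> keys = new ArrayList<>(queuedKeys);
            queuedKeys.clear();
            batchCalls++;
            Map<String, List<String>> values = batchFunction.apply(keys);
            for (String key : keys) {
                cache.get(key).complete(values.get(key));
            }
        }

        int batchCalls() { return batchCalls; }

        int uniqueKeys() { return cache.size(); }
    }

    // Loads hero, friends and friends of friends, dispatching once per level.
    static TinyLoader run(String hero) {
        TinyLoader loader = new TinyLoader(keys -> {
            Map<String, List<String>> result = new HashMap<>();
            for (String key : keys) {
                result.put(key, FRIENDS.get(key));
            }
            return result;
        });

        CompletableFuture<List<String>> heroPromise = loader.load(hero);
        loader.dispatch();                           // batch 1: the hero

        List<CompletableFuture<List<String>>> friendPromises = new ArrayList<>();
        for (String friend : heroPromise.join()) {
            friendPromises.add(loader.load(friend));
        }
        loader.dispatch();                           // batch 2: the hero's friends

        for (CompletableFuture<List<String>> friendPromise : friendPromises) {
            for (String friendOfFriend : friendPromise.join()) {
                loader.load(friendOfFriend);         // mostly cache hits
            }
        }
        loader.dispatch();                           // batch 3: only people not seen yet
        return loader;
    }

    public static void main(String[] args) {
        TinyLoader loader = run("R2-D2");
        // prints "5 people, 3 batch calls"
        System.out.println(loader.uniqueKeys() + " people, " + loader.batchCalls() + " batch calls");
    }
}
```

The third batch contains a single key because everyone else at that level was already promised by an earlier load.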
If you use capabilities like java.util.concurrent.CompletableFuture.supplyAsync() then you can make it even more efficient by making the remote calls asynchronous to the rest of the query. This will make it even more timely since multiple calls can happen at once if need be.
Here is how you might put this in place:
One thing to note is the above only works if you use DataLoaderDispatcherInstrumentation, which makes sure dataloader.dispatch() is called. If this was not in place, then all the promises to data would never be dispatched to the batch loader function and hence nothing would ever resolve.
The only execution strategy that works with DataLoader is graphql.execution.AsyncExecutionStrategy. This is because this execution strategy knows when the best time to dispatch() your load calls is. It does this by deeply tracking how many fields are outstanding and whether they are list values and so on.
Other execution strategies such as ExecutorServiceExecutionStrategy can’t do this, and hence if the data loader code detects you are not using AsyncExecutionStrategy then it will simply dispatch the data loader as each field is encountered. You may still get caching of values but you will not get batching of them.
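The difference between the two dispatch timings can be shown with a small simulation. It is a sketch only: TinyLoader is a hypothetical stand-in that merely records batch sizes, and the two methods imitate the behaviour described above rather than reproducing graphql-java's own code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class EagerDispatchSketch {
    /** Hypothetical stand-in for a data loader that records the size of every batch. */
    static final class TinyLoader {
        private final Set<String> seenKeys = new HashSet<>();
        private final List<String> queuedKeys = new ArrayList<>();
        final List<Integer> batchSizes = new ArrayList<>();

        // Queues a key unless it has been seen before (the caching part).
        void load(String key) {
            if (seenKeys.add(key)) {
                queuedKeys.add(key);
            }
        }

        // Sends the queued keys off as one batch (the batching part).
        void dispatch() {
            if (!queuedKeys.isEmpty()) {
                batchSizes.add(queuedKeys.size());
                queuedKeys.clear();
            }
        }
    }

    // Dispatches as each field is encountered: caching still helps, batching does not.
    static List<Integer> dispatchPerField(List<String> keys) {
        TinyLoader loader = new TinyLoader();
        for (String key : keys) {
            loader.load(key);
            loader.dispatch();
        }
        return loader.batchSizes;
    }

    // Dispatches once, after every field has asked for its data.
    static List<Integer> dispatchOnce(List<String> keys) {
        TinyLoader loader = new TinyLoader();
        for (String key : keys) {
            loader.load(key);
        }
        loader.dispatch();
        return loader.batchSizes;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("Luke", "Han", "Luke", "Leia");
        System.out.println(dispatchPerField(keys)); // prints [1, 1, 1]
        System.out.println(dispatchOnce(keys));     // prints [3]
    }
}
```

In both runs the repeated key is served from the cache, but only the deferred dispatch groups the remaining keys into one call.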
If you are serving web requests then the data can be specific to the user requesting it. If you have user-specific data then you will not want to cache data meant for user A and later give it to user B in a subsequent request.
The scope of your DataLoader instances is important. You might want to create them per web request to ensure data is only cached within that web request and no more.
If your data can be shared across web requests then you might want to scope your data loaders so they survive longer than a single web request.
But if you are doing per request data loaders then creating a new set of DataLoader objects per request is super cheap. It’s the GraphQLSchema creation that can be expensive, especially if you are using graphql SDL parsing.
Structure your code so that the schema is statically held, perhaps in a static variable or in a singleton IoC component, but build out a new set of GraphQL objects on each request.
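The shape of that structure can be sketched without any library code. Everything here is a stand-in chosen for illustration: the string SCHEMA plays the role of the expensive GraphQLSchema, and the hypothetical RequestLoaders class plays the role of the cheap per-request DataLoader set.

```java
import java.util.HashMap;
import java.util.Map;

final class PerRequestSketch {
    static int schemaBuilds = 0;

    // Stand-in for the expensive GraphQLSchema: built once, shared by every request.
    static final String SCHEMA = buildSchema();

    static String buildSchema() {
        schemaBuilds++;
        return "schema";
    }

    /** Hypothetical per-request state: a loader cache that dies with the request. */
    static final class RequestLoaders {
        private final String user;
        private final Map<String, String> cache = new HashMap<>();

        RequestLoaders(String user) {
            this.user = user;
        }

        // Caches per key, but only for the lifetime of this one request.
        String load(String key) {
            return cache.computeIfAbsent(key, k -> k + " as seen by " + user);
        }
    }

    static String handleRequest(String user, String key) {
        RequestLoaders loaders = new RequestLoaders(user); // cheap, created per request
        return SCHEMA + ": " + loaders.load(key);
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("A", "profile")); // prints "schema: profile as seen by A"
        System.out.println(handleRequest("B", "profile")); // prints "schema: profile as seen by B"
        System.out.println(schemaBuilds);                  // prints 1
    }
}
```

Because each request gets its own cache, a value cached for user A can never be handed to user B, while the schema is still built only once.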
The data loader code pattern works by combining all the outstanding data loader calls into more efficient batch loading calls.
graphql-java tracks what outstanding data loader calls have been made and it is its responsibility to call dispatch() in the background at the most optimal time, which is when all graphql fields have been examined and dispatched.
However there is a code pattern that will cause your data loader calls to never complete and it MUST be avoided. This bad pattern consists of making an asynchronous off-thread call to a DataLoader in your data fetcher.
The following will not work (it will never complete).
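As a library-free illustration of this pattern, the sketch below uses a hypothetical TinyLoader in place of a real DataLoader, and a CountDownLatch to force the ordering in which the engine dispatches before the off-thread load() runs. Only the call characterDataLoader.load(argId) mirrors the text; every other name is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class OffThreadLoadSketch {
    /** Hypothetical stand-in for a DataLoader; dispatch() resolves whatever is queued. */
    static final class TinyLoader {
        private final List<String> queuedKeys = new ArrayList<>();
        private final Map<String, CompletableFuture<String>> promises = new HashMap<>();

        synchronized CompletableFuture<String> load(String key) {
            queuedKeys.add(key);
            return promises.computeIfAbsent(key, k -> new CompletableFuture<>());
        }

        synchronized void dispatch() {
            for (String key : queuedKeys) {
                promises.get(key).complete("character-" + key);
            }
            queuedKeys.clear();
        }
    }

    // Returns true if the value promised by the off-thread load() arrives in time.
    static boolean offThreadLoadCompletes(String argId) {
        TinyLoader characterDataLoader = new TinyLoader();
        CountDownLatch engineHasDispatched = new CountDownLatch(1);

        // BAD: load() runs on another thread, after the data fetcher has returned.
        CompletableFuture<CompletableFuture<String>> offThread =
                CompletableFuture.supplyAsync(() -> {
                    try {
                        engineHasDispatched.await();
                    } catch (InterruptedException e) {
                        throw new IllegalStateException(e);
                    }
                    return characterDataLoader.load(argId);
                });
        CompletableFuture<String> result = offThread.thenCompose(promise -> promise);

        characterDataLoader.dispatch(); // the "engine" dispatches; nothing is queued yet
        engineHasDispatched.countDown();

        try {
            result.get(200, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the late load() was never dispatched
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(offThreadLoadCompletes("1000")); // prints false
    }
}
```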
In the example above, the call to characterDataLoader.load(argId) can happen some time in the future on another thread. The graphql-java engine has no way of knowing when it’s a good time to dispatch outstanding DataLoader calls and hence the data loader call might never complete as expected and no results will be returned.
Remember a data loader call is just a promise to actually get a value later, when it’s an optimal time for all outstanding calls to be batched together. The optimal time is when the graphql field tree has been examined and all field values are currently dispatched.
The following is how you can still have asynchronous code, by placing it into the BatchLoader itself.
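A library-free sketch of this arrangement follows. TinyLoader is again a hypothetical stand-in for a DataLoader, and the body of getTheseCharacters is invented for the example; the points to notice are that load() is called on the calling thread and that only the batch function runs elsewhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

final class AsyncBatchLoaderSketch {
    // Hypothetical blocking lookup, e.g. one remote call for a whole batch of keys.
    static List<String> getTheseCharacters(List<String> keys) {
        return keys.stream().map(key -> "character-" + key).collect(Collectors.toList());
    }

    /** Hypothetical stand-in for a DataLoader whose batch function is asynchronous. */
    static final class TinyLoader {
        private final List<String> queuedKeys = new ArrayList<>();
        private final Map<String, CompletableFuture<String>> promises = new ConcurrentHashMap<>();

        // Only records the key and hands back a promise; no lookup happens here.
        CompletableFuture<String> load(String key) {
            CompletableFuture<String> promise = promises.get(key);
            if (promise == null) {
                promise = new CompletableFuture<>();
                promises.put(key, promise);
                queuedKeys.add(key);
            }
            return promise;
        }

        // Runs the batch function off-thread and completes the promises when it returns.
        CompletableFuture<Void> dispatch() {
            List<String> keys = new ArrayList<>(queuedKeys);
            queuedKeys.clear();
            return CompletableFuture.supplyAsync(() -> getTheseCharacters(keys))
                    .thenAccept(values -> {
                        for (int i = 0; i < keys.size(); i++) {
                            promises.get(keys.get(i)).complete(values.get(i));
                        }
                    });
        }
    }

    static String fetchCharacter(String argId) {
        TinyLoader characterDataLoader = new TinyLoader();
        // GOOD: load() is called on the calling thread and returns immediately.
        CompletableFuture<String> promise = characterDataLoader.load(argId);
        characterDataLoader.dispatch(); // later, the "engine" dispatches the queued keys
        return promise.join();
    }

    public static void main(String[] args) {
        System.out.println(fetchCharacter("1000")); // prints "character-1000"
    }
}
```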
Notice above that characterDataLoader.load(argId) returns immediately. This will enqueue the call for data until a later time when all the graphql fields are dispatched.
Then later when the DataLoader is dispatched, its BatchLoader function is called. This code can be asynchronous so that if you have multiple batch loader functions they can all run at once. In the code above CompletableFuture.supplyAsync(() -> getTheseCharacters(keys)); will run the getTheseCharacters method in another thread.