We're not replacing SQL as our favorite query language anytime soon 😄, but we rely on GraphQL to implement many of Splitgraph's internal APIs. Thanks to GraphQL schema stitching, we were able to add new backend services which extend the set of fields published by older services. Clients don't need to specify which service provides which field; the schema stitching logic determines this automatically.
Projects like Apollo Client, graphql-playground and PostGraphile have made working with GraphQL a smooth experience, but the declarative query language has proven to be even more important than the tooling. It's given us the flexibility to develop services which can be used independently, but also as components implementing the following unified API:
type Query {
  """Returns a single `Repository` identified
  by its `namespace` and `repositoryName`."""
  metadataRepository(
    namespace: String!,
    repositoryName: String!): Repository
  authorizationRepository(
    namespace: String!,
    repositoryName: String!): Repository
  """Returns a single `User` based on their username."""
  authorizationUser(username: String!): User
  metadataUser(username: String!): User
}
type Repository {
  namespace: String!,
  repositoryName: String!,
  description: String,
  url: String,
  """Lists all users who have permission to read this repository."""
  usersWithReadAccess: [User!]!
}
type User {
  username: String!,
  fullName: String,
  """Lists all repositories this user has permission to read."""
  repositories: [Repository!]!
}
The "dashboard" page which greets users upon login is one of the many clients of this API. It needs to display the current user's name as well as a link to each repository they have access to. A single GraphQL query can fetch the required fields:
query Dashboard($username: String!) {
  metadataUser(username: $username) {
    fullName,
    repositories {
      description,
      url
    }
  }
}
Two separate services store the data requested by the Dashboard query. The client sends the query to the gateway service, which uses schema stitching to combine the APIs of the backing metadata and authorization services into the unified schema shown above. As the gateway service executes the Dashboard query, it queries the backing services.
The metadata service is powered by PostGraphile and predates the authorization service. Before developing the latter, Splitgraph's internal GraphQL API schema looked similar to the following:
type Repository {
  namespace: String!
  repositoryName: String!
  description: String
  url: String
}
type User {
  username: String!
  fullName: String
}
type Query {
  metadataRepository(
    namespace: String!,
    repositoryName: String!): Repository
  metadataUser(username: String!): User
}
We built the authorization service based on Ory Keto to keep track of user-repository permissions.
The authorization service's schema extends the types introduced by the metadata service:
"""This Repository type is merged with the Repository
type declared in the metadata service schema."""
type Repository {
  """namespace and repositoryName are the key fields
  by which the repository types in the two
  subschemas can be merged."""
  namespace: String!
  repositoryName: String!
  usersWithReadAccess: [User!]!
}
"""Similar to Repository, User will be merged
with the metadata service's User type."""
type User {
  """username is the key field of User."""
  username: String!
  repositories: [Repository!]!
}
type Query {
  """authorization service specific fields for retrieving
  User and Repository objects."""
  authorizationRepository(
    namespace: String!,
    repositoryName: String!): Repository
  authorizationUser(username: String!): User
}
The main idea is simple: the different fields of a type may be distributed among several services as long as the corresponding subschemas all use the same type name. By default, the "stitched" schema just combines all the fields of any given type from all subschemas, but accidental type merging can be avoided. Single-service GraphQL schemas operate under the closed world assumption that all fields belonging to a type are declared in the schema. Stitching leads to an open world model where any type may be extended with additional fields declared in a new subschema.
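To make the open world model concrete, here is a minimal sketch of how the fields of same-named types could be combined. It is not how graphql-tools implements stitching; the `SubschemaTypes` shape and the `mergeTypes` helper are hypothetical names introduced for illustration, and each subschema is reduced to a plain map from type name to field names.

```typescript
// Hypothetical, simplified model: a subschema is a map from type name to field names.
type SubschemaTypes = Record<string, string[]>;

// Field lists copied from the metadata and authorization schemas shown above.
const metadataTypes: SubschemaTypes = {
  Repository: ["namespace", "repositoryName", "description", "url"],
  User: ["username", "fullName"],
};
const authorizationTypes: SubschemaTypes = {
  Repository: ["namespace", "repositoryName", "usersWithReadAccess"],
  User: ["username", "repositories"],
};

// Union the fields of types which share a name across all subschemas.
function mergeTypes(subschemas: SubschemaTypes[]): SubschemaTypes {
  const merged: SubschemaTypes = {};
  for (const subschema of subschemas) {
    for (const typeName of Object.keys(subschema)) {
      const fields = merged[typeName] || (merged[typeName] = []);
      for (const field of subschema[typeName]) {
        // Key fields such as username appear in both subschemas; keep one copy.
        if (fields.indexOf(field) === -1) fields.push(field);
      }
    }
  }
  return merged;
}

const unified = mergeTypes([metadataTypes, authorizationTypes]);
console.log(unified.User); // prints [ 'username', 'fullName', 'repositories' ]
```

Adding a third subschema that declares more `User` fields would extend the unified type without touching the first two, which is the sense in which the stitched schema is open.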
At first, the doubled fields in the Query type may seem superfluous. They have the same return types after all: User and Repository.
Why they're necessary becomes apparent as soon as we examine how the Dashboard query is processed by the gateway service.
The gateway service uses the merging flow to execute queries and delegate to the other services.
With the proper subschema configuration, the schema stitching code can determine which service should be queried for a particular field.
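As a rough illustration of that routing decision, the sketch below groups the requested `User` fields by the service that declares them. The ownership table is read off the two schemas above; `userFieldOwners` and `planUserQuery` are hypothetical names, and the real stitching code derives this information from the subschemas rather than from a hand-written table.

```typescript
type ServiceName = "metadata" | "authorization";

// Which service declares which non-key field of User (taken from the schemas above).
// The username key field is declared by both services, so it needs no routing.
const userFieldOwners: Record<string, ServiceName> = {
  fullName: "metadata",
  repositories: "authorization",
};

// Group the requested fields by the service that has to be asked for them.
function planUserQuery(requested: string[]): Record<string, string[]> {
  const plan: Record<string, string[]> = {};
  for (const field of requested) {
    const owner = userFieldOwners[field];
    if (owner === undefined) throw new Error("Unknown User field: " + field);
    (plan[owner] = plan[owner] || []).push(field);
  }
  return plan;
}

// The Dashboard query selects fullName and repositories on User.
const plan = planUserQuery(["fullName", "repositories"]);
console.log(plan); // prints { metadata: [ 'fullName' ], authorization: [ 'repositories' ] }
```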
Consider the steps required to compute results for the Dashboard query for user mrDorp.
The client submits the Dashboard query to the gateway service. The query selects the fullName and repositories fields on the result of metadataUser.
Since metadataUser comes from the metadata service's subschema, the gateway queries it for the fullName field.
metadataUser(username: "mrDorp") {
  __typename
  fullName
  username
}
The username field is added implicitly since it's the key field used to join the User types in the two services. __typename is always added by the schema stitching code. The response to the query is:
{
  "data": {
    "metadataUser": {
      "__typename": "User",
      "username": "mrDorp",
      "fullName": "Ralph Dorp"
    }
  }
}
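The implicit additions to the selection set in this step can be sketched as follows. This is only an illustration of the idea, not the graphql-tools implementation; `withMergeFields` and `renderSelection` are hypothetical helpers.

```typescript
// Add __typename and the key fields to the fields the client selected,
// without duplicating anything that is already present.
function withMergeFields(selected: string[], keyFields: string[]): string[] {
  const result: string[] = ["__typename"];
  for (const field of selected.concat(keyFields)) {
    if (result.indexOf(field) === -1) result.push(field);
  }
  return result;
}

// Render a flat list of field names as a GraphQL selection set.
function renderSelection(fields: string[]): string {
  return "{ " + fields.join(" ") + " }";
}

// The client asked the metadata service only for fullName; username is the key field of User.
const selection = withMergeFields(["fullName"], ["username"]);
console.log(renderSelection(selection)); // prints "{ __typename fullName username }"
```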
The repositories field of the User type belongs to the authorization service, so the gateway queries it next. The subschema configuration specifies that the top-level authorizationUser field may be queried to get the User fields declared in the service's subschema.
authorizationUser(username: "mrDorp") {
  __typename
  username
  repositories {
    __typename
    namespace
    repositoryName
  }
}
Note that the Dashboard query selected the description and url fields of each Repository object, but these fields are declared by the metadata service schema. In the request headed for the authorization service, only the repositories' key fields can be selected. Just as in the previous query, the gateway service implicitly adds the username key field to the selection set. The response is the following:
{
  "data": {
    "authorizationUser": {
      "__typename": "User",
      "username": "mrDorp",
      "repositories": [
        {
          "__typename": "Repository",
          "namespace": "austintexas-gov",
          "repositoryName": "austin-high-school-graduation-rates-xeb7-q8v3"
        },
        {
          "__typename": "Repository",
          "namespace": "bts-gov",
          "repositoryName": "county-transportation-profiles-qdmf-cxm3"
        }
      ]
    }
  }
}
Having obtained the repositories' key fields, the gateway may query additional Repository fields from the metadata service. It consults the stitching configuration for the top-level field to query, in this case metadataRepository.
metadataRepository(
  namespace: "austintexas-gov",
  repositoryName: "austin-high-school-graduation-rates-xeb7-q8v3") {
  __typename
  namespace
  repositoryName
  description
  url
}
metadataRepository(
  namespace: "bts-gov",
  repositoryName: "county-transportation-profiles-qdmf-cxm3") {
  __typename
  namespace
  repositoryName
  description
  url
}
Just as with the query for the authorizationUser field, the key fields (namespace and repositoryName) and __typename are implicitly added to the query to allow merging of result objects.
We created a PostGraphile plugin using GraphQL's DataLoader to combine the two queries above into a single GraphQL query (drop us a line if you're interested in using it). Unfortunately, the current schema does not allow for such optimization, so each repository's metadata fields are queried separately. The metadata service responds with:
{
  "data": {
    "metadataRepository": {
      "__typename": "Repository",
      "namespace": "austintexas-gov",
      "repositoryName": "austin-high-school-graduation-rates-xeb7-q8v3",
      "description": "Graduation rates for Austin high schools for years 2012 to 2016 provided by the Texas Education Agency.",
      "url": "https://www.splitgraph.com/austintexas-gov/austin-high-school-graduation-rates-xeb7-q8v3"
    }
  }
}
{
  "data": {
    "metadataRepository": {
      "__typename": "Repository",
      "namespace": "bts-gov",
      "repositoryName": "county-transportation-profiles-qdmf-cxm3",
      "description": "Profiles of transportation features of U.S. counties",
      "url": "https://www.splitgraph.com/bts-gov/county-transportation-profiles-qdmf-cxm3"
    }
  }
}
The gateway has all the fields required to respond to the Dashboard query. Key fields which weren't selected in the Dashboard query are discarded once the objects have been merged.
{
  "data": {
    "metadataUser": {
      "__typename": "User",
      "fullName": "Ralph Dorp",
      "repositories": [
        {
          "__typename": "Repository",
          "description": "Graduation rates for Austin high schools for years 2012 to 2016 provided by the Texas Education Agency.",
          "url": "https://www.splitgraph.com/austintexas-gov/austin-high-school-graduation-rates-xeb7-q8v3"
        },
        {
          "__typename": "Repository",
          "description": "Profiles of transportation features of U.S. counties",
          "url": "https://www.splitgraph.com/bts-gov/county-transportation-profiles-qdmf-cxm3"
        }
      ]
    }
  }
}
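The whole walkthrough can be condensed into a small simulation. The sketch below replaces the two remote services with in-memory functions that return the data from the example responses, and hard-codes the order of calls for the Dashboard query. The real gateway derives that order generically from the stitching configuration; the function bodies and the `descriptions` lookup here are stand-ins introduced for illustration.

```typescript
interface RepositoryKey {
  namespace: string;
  repositoryName: string;
}

// Stand-in data taken from the example responses above.
const descriptions: Record<string, string> = {
  "austintexas-gov/austin-high-school-graduation-rates-xeb7-q8v3":
    "Graduation rates for Austin high schools for years 2012 to 2016 provided by the Texas Education Agency.",
  "bts-gov/county-transportation-profiles-qdmf-cxm3":
    "Profiles of transportation features of U.S. counties",
};

// In-memory stand-ins for the top-level fields of the metadata service.
function metadataUser(username: string) {
  return { __typename: "User", username, fullName: "Ralph Dorp" };
}
function metadataRepository(namespace: string, repositoryName: string) {
  const path = namespace + "/" + repositoryName;
  return {
    __typename: "Repository",
    namespace,
    repositoryName,
    description: descriptions[path],
    url: "https://www.splitgraph.com/" + path,
  };
}

// In-memory stand-in for the authorization service: it only knows key fields.
function authorizationUser(username: string) {
  const repositories: RepositoryKey[] = [
    { namespace: "austintexas-gov", repositoryName: "austin-high-school-graduation-rates-xeb7-q8v3" },
    { namespace: "bts-gov", repositoryName: "county-transportation-profiles-qdmf-cxm3" },
  ];
  return { __typename: "User", username, repositories };
}

function dashboard(username: string) {
  // 1. fullName (plus the username key field) comes from the metadata service.
  const fromMetadata = metadataUser(username);
  // 2. The repositories' key fields come from the authorization service, joined on username.
  const fromAuthorization = authorizationUser(fromMetadata.username);
  // 3. One metadataRepository call per repository, joined on namespace and repositoryName.
  const repositories = fromAuthorization.repositories.map((key) => {
    const repo = metadataRepository(key.namespace, key.repositoryName);
    // Key fields which the Dashboard query did not select are dropped here.
    return { description: repo.description, url: repo.url };
  });
  return { fullName: fromMetadata.fullName, repositories };
}

const result = dashboard("mrDorp");
console.log(JSON.stringify(result, null, 2));
```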
While processing the Dashboard query, the gateway had to decide which service to query for each requested field. This is determined by the subschema configuration.
Below is the simplified code for the gateway service, which is based on the TypeScript examples for remote schemas and schema stitching:
import { stitchSchemas } from "@graphql-tools/stitch";
import { fetch } from "cross-fetch";
import { print, GraphQLSchema } from "graphql";
import { introspectSchema, wrapSchema } from "@graphql-tools/wrap";
import type { AsyncExecutor } from "@graphql-tools/utils/executor";
import { SubschemaConfig } from "@graphql-tools/delegate";

const authorizationServiceUrl = "http://api.splitgrph.com/authorzation/graphql";
const metadataServiceUrl = "http://api.splitgrph.com/metadata/graphql";

// Builds an executor which forwards a query to a remote GraphQL service over HTTP.
const makeRemoteExecutor: (serviceUrl: string) => AsyncExecutor = (
  serviceUrl: string
) => async ({ document, variables }) => {
  const query = print(document);
  const result = await fetch(serviceUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query, variables }),
  });
  return result.json();
};

// Introspects the remote service's schema and pairs it with its merge configuration.
const defineSubSchema = async (
  serviceUrl: string,
  merge?: SubschemaConfig["merge"]
): Promise<SubschemaConfig> => {
  const executor = makeRemoteExecutor(serviceUrl);
  const schema = wrapSchema({
    schema: await introspectSchema(executor),
    executor,
  });
  return { executor, schema, merge };
};

// Stitches both subschemas into the unified schema served by the gateway.
export const makePublicSchema = async (): Promise<GraphQLSchema> =>
  stitchSchemas({
    subschemas: [
      await defineSubSchema(authorizationServiceUrl, {
        User: {
          fieldName: "authorizationUser",
          selectionSet: "{ username }",
          args: ({ username }) => ({ username }),
        },
        Repository: {
          fieldName: "authorizationRepository",
          selectionSet: "{ namespace repositoryName }",
          args: ({ namespace, repositoryName }) => ({
            namespace,
            repositoryName,
          }),
        },
      }),
      await defineSubSchema(metadataServiceUrl, {
        User: {
          fieldName: "metadataUser",
          selectionSet: "{ username }",
          args: ({ username }) => ({ username }),
        },
        Repository: {
          fieldName: "metadataRepository",
          selectionSet: "{ namespace repositoryName }",
          args: ({ namespace, repositoryName }) => ({
            namespace,
            repositoryName,
          }),
        },
      }),
    ],
  });
The GraphQLSchema object returned by makePublicSchema can be used to create an HTTP GraphQL endpoint.
In order to query a subschema, its URL must be known. Conveniently, a GraphQL query can be used to obtain an endpoint's schema. This is what defineSubSchema() does when it calls introspectSchema(). The object passed to defineSubSchema() is the merge configuration. Consider the fragment:
await defineSubSchema(authorizationServiceUrl, {
  User: {
    fieldName: "authorizationUser",
    selectionSet: "{ username }",
    args: ({ username }) => ({ username }),
  }
It declares that the User fields declared by the authorization service schema can be obtained by passing the username field of the pre-existing User object as the argument to the top-level authorizationUser field.
This was employed in the second request of the query process described earlier, when the User.repositories field was merged with the User object obtained from the metadata service in the previous step.
fieldName specifies the field of the top-level Query type to consult. selectionSet defines the set of fields to be selected from the existing User object to get authorizationUser's arguments, typically the key field or fields. selectionSet could contain fields not yet available on the current User instance. In such a case, an additional subschema query would be made before the field referred to by fieldName is queried.
The args function transforms the existing fields on the User object to be used as arguments to authorizationUser. In most cases, this is the identity function, but it can be useful for things like string to number conversion, especially when stitching schemas for APIs outside of our control. For example, one could extend the types in the official Shopify GraphQL Schema with custom fields routed to an internal service.
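A non-identity args function might look like the sketch below. Everything in it is hypothetical: the `Product` type, the `internalProduct` field and its numeric `id` argument are made up for illustration, and `MergeEntry` is a minimal local stand-in for the shape of one merge entry, not the actual type exported by graphql-tools.

```typescript
// Minimal local stand-in for one entry of the merge configuration used above.
interface MergeEntry<Obj, Args> {
  fieldName: string;
  selectionSet: string;
  args: (original: Obj) => Args;
}

// Hypothetical case: the external schema exposes Product.id as a string,
// while the internal service's internalProduct field expects a numeric id.
const productMerge: MergeEntry<{ id: string }, { id: number }> = {
  fieldName: "internalProduct",
  selectionSet: "{ id }",
  // Convert the string key of the existing object into the number the service expects.
  args: ({ id }) => ({ id: parseInt(id, 10) }),
};

console.log(productMerge.args({ id: "42" })); // prints { id: 42 }
```

Keeping the conversion inside args means neither the external API nor the internal service has to change its key representation.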
It would also be possible to start by querying the authorization service:
query Dashboard2($username: String!) {
  authorizationUser(username: $username) {
    fullName,
    repositories {
      description,
      url
    }
  }
}
The steps for executing the Dashboard2 query would be:
1. Query the authorization service for the username key field and the repositories field.
2. Add the User.fullName field to the existing User object instance by querying metadataUser.
3. Merge the url and description fields with the existing Repository objects using metadataRepository.
Schema stitching has enabled us to extend an existing service's GraphQL API with new fields served by a new service. The "stitching" is seamless in the sense that clients don't need to know which field belongs to which service's subschema. We evolved our API without affecting earlier queries.