Welcome to sgr
sgr is the open-source component at the core of Splitgraph. It's a tool that
allows the user to manipulate data images (snapshots of SQL tables at a given
point in time) as if they were code repositories by versioning, pushing and
pulling them.
PostgreSQL compatibility
sgr works on top of PostgreSQL and uses SQL for all versioning and internal
operations. You can "check out" data into actual PostgreSQL tables, offering
read/write performance and feature parity with PostgreSQL and allowing you to
query it with any SQL client. The client application has no idea that it's
talking to an sgr table, and you don't need to rewrite any of your tools to use
sgr. Anything that works with PostgreSQL will work with sgr.
Building data with Splitfiles
sgr also defines the declarative Splitfile language with Dockerfile-like
caching semantics that allows you to build Splitgraph repositories in a
composable, maintainable and reproducible way. When you build data with
Splitfiles, you get provenance tracking. You can inspect an image's metadata to
find the exact upstream images, tables and columns that went into it. With one
command, sgr can use this provenance data to rebuild an image against a newer
version of its upstream dependencies. You can easily integrate sgr into your
existing CI pipelines to keep your data up-to-date and stay on top of changes
to its inputs.
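To make the idea of Dockerfile-like caching and provenance tracking more concrete, here is a minimal Python sketch. It is not sgr's implementation: the hashing scheme, the function names (step_hash, build, provenance) and the placeholder command strings are all assumptions made for illustration, and the commands are not real Splitfile syntax. The sketch only shows the general technique of deriving each image's identity from its parent and the command that produced it.

```python
import hashlib


def step_hash(parent_hash: str, command: str) -> str:
    """Derive an image ID from the parent image ID and the command text.

    Hypothetical scheme for illustration; sgr may compute IDs differently.
    """
    return hashlib.sha256(f"{parent_hash}\n{command}".encode("utf-8")).hexdigest()


def build(commands, cache, base_hash="0" * 64):
    """Apply build commands in order, reusing cached images where possible.

    `cache` maps image hash -> record of how the image was produced.
    Returns the final image hash and the commands that had to be executed.
    """
    current = base_hash
    executed = []
    for command in commands:
        image = step_hash(current, command)
        if image not in cache:
            # Cache miss: a real tool would run the command here.
            executed.append(command)
            cache[image] = {"parent": current, "command": command}
        current = image
    return current, executed


def provenance(image, cache):
    """Walk parent links to list, in order, the commands behind an image."""
    chain = []
    while image in cache:
        record = cache[image]
        chain.append(record["command"])
        image = record["parent"]
    return list(reversed(chain))
```

In this sketch, building the same list of commands twice executes nothing the second time, and changing a later command re-executes only that command and anything after it. Because every record keeps a link to its parent, the same structure can be walked backwards to recover how an image was produced.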
Layered querying
You do not need to download the full Splitgraph image to query it. Instead, you can query Splitgraph images with layered querying, which will download only the regions of the table relevant to your query, using bloom filters and other metadata. This is useful when you're exploring large datasets from your laptop, or when you're only interested in a subset of data from an image. This is still completely transparent to the client application, which sees a PostgreSQL schema that it can talk to using the Postgres wire protocol.
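The following Python sketch illustrates how bloom filters and per-chunk metadata can limit how much of a table needs to be downloaded. It is a simplified model, not sgr's actual storage format: the chunk layout, the BloomFilter class, the min/max range check (standing in for the "other metadata") and the helper names index_chunk and chunks_to_fetch are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Small bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting the value with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all(self.bits & (1 << pos) for pos in self._positions(value))


def index_chunk(rows, column):
    """Build the metadata kept for one chunk: value range plus bloom filter."""
    values = [row[column] for row in rows]
    bloom = BloomFilter()
    for value in values:
        bloom.add(value)
    return {"min": min(values), "max": max(values), "bloom": bloom}


def chunks_to_fetch(chunk_metadata, value):
    """Return indices of chunks that may hold rows where column == value."""
    return [
        i
        for i, meta in enumerate(chunk_metadata)
        if meta["min"] <= value <= meta["max"]
        and meta["bloom"].might_contain(value)
    ]
```

Only the metadata is consulted when planning the query. A chunk is fetched only if both the range check and the bloom filter say it might contain a match, so a lookup for a single key typically touches a small fraction of the table.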
Adding data to sgr
sgr does not limit your data sources to Postgres databases. It includes
first-class support for importing and querying data from other databases using
Postgres foreign data wrappers. You can create Splitgraph repositories from or
query data in MongoDB, MySQL, CSV files, other Postgres databases,
Elasticsearch clusters or Snowflake warehouses using the same interface.
Decentralized data sharing
Finally, sgr is peer-to-peer. You can push and pull data images between sgr
installations and use it as a standalone tool to supercharge your data
workflows. Splitgraph is also an sgr peer, letting you publish your datasets
and make them easily queryable by your Web applications.