Working with Splitgraph
A sample Splitgraph query
Your application will mostly interact with Splitgraph by running SQL queries, either on data that you add or on public data.
Here's a sample Splitgraph query:
SELECT COUNT(*) FROM "splitgraph/socrata:20200809".datasets
Splitgraph organizes data in collections of tables called repositories. In this case, splitgraph/socrata is the repository we're querying. Repository names have two parts:
- Namespace, in this case splitgraph (this is similar to a GitHub/Docker organization)
- Repository, in this case socrata
Splitgraph repositories can be versioned or live.
A live repository acts as a "proxy" to a remote database. When you query a live repository, Splitgraph translates the inbound query to the remote database's query language and forwards it.
A versioned repository consists of multiple versions, or images. Each image is stored in a columnar format, inspired by modern cloud data warehouses like Snowflake.
The above splitgraph/socrata repository is versioned. In the example query, we're querying a certain human-readable tag (20200809) that Splitgraph attached to the image to denote its version.
If you omit the version, Splitgraph will use the latest version of the dataset. These are equivalent:
SELECT COUNT(*) FROM "splitgraph/socrata".datasets
SELECT COUNT(*) FROM "splitgraph/socrata:latest".datasets
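To make the naming rules concrete, here is a minimal Python sketch that splits a repository reference into its namespace, repository, and tag, and falls back to latest when no tag is given. The names RepoRef and parse_repo_ref are hypothetical helpers written for this illustration; they are not part of Splitgraph or the sgr CLI.

```python
from typing import NamedTuple


class RepoRef(NamedTuple):
    """Parts of a reference such as 'splitgraph/socrata:20200809'."""
    namespace: str
    repository: str
    tag: str


def parse_repo_ref(ref: str) -> RepoRef:
    """Split 'namespace/repository[:tag]' into its parts (hypothetical helper)."""
    # Everything after the first ':' is the tag, if one is present.
    name, _, tag = ref.partition(":")
    # The first '/' separates the namespace from the repository name.
    namespace, slash, repository = name.partition("/")
    if not slash or not namespace or not repository:
        raise ValueError(f"expected 'namespace/repository', got {ref!r}")
    # As described above, omitting the tag means the latest version.
    return RepoRef(namespace, repository, tag or "latest")
```

With this helper, parse_repo_ref("splitgraph/socrata") and parse_repo_ref("splitgraph/socrata:latest") return the same value, mirroring the two equivalent queries.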
If you're familiar with PostgreSQL, it might help to treat repositories as schemas (in fact, "splitgraph/socrata" is a schema in the above query).
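Because the schema name contains characters such as / and :, it has to be written as a quoted identifier, which is why the sample queries wrap it in double quotes. The following Python sketch builds such a query using the standard PostgreSQL quoting rule (wrap the name in double quotes and double any embedded double quote). The function names quote_ident and count_query are hypothetical and exist only for this illustration.

```python
from typing import Optional


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def count_query(repository: str, table: str, tag: Optional[str] = None) -> str:
    """Build a row-count query against a repository treated as a schema."""
    # The schema is the repository name, optionally followed by ':tag'.
    schema = repository if tag is None else f"{repository}:{tag}"
    return f"SELECT COUNT(*) FROM {quote_ident(schema)}.{quote_ident(table)}"


print(count_query("splitgraph/socrata", "datasets", "20200809"))
```

This sketch also quotes the table name. For an all-lowercase name like datasets, PostgreSQL treats the quoted and unquoted forms as the same identifier.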
Discovering data
You can attach metadata like READMEs or topics to Splitgraph repositories to make them discoverable by other people. You can also make a repository private and control who can access it.
You can use Splitgraph's data catalog to search for repositories, or add your own.
Adding data
There are multiple ways to add data to Splitgraph:
- Uploading a CSV file from the Web or the sgr CLI
- Setting up one of the over 100 SaaS sources or live queries to popular databases
- Writing to the Splitgraph DDN
- Pushing a data image from the sgr CLI (advanced)
Splitgraph can also run dbt for you on a schedule or on-demand, offering a simple way to transform repositories.
Once your dataset is published, you can add metadata like topics or a README file to make it easier for data consumers to discover.
You can also use the splitgraph.yml format to programmatically manage your repositories.
Finally, you can manage who can access or edit a given repository using Splitgraph's sharing options.
Consuming data
Splitgraph allows you to query data using a variety of methods:
- Built-in Web IDE that offers CSV downloads
- Data Delivery Network (DDN), a PostgreSQL-compatible endpoint that SQL clients can connect to
- HTTP API that lets you run SQL queries over HTTP
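As a rough illustration of the HTTP option, the Python sketch below prepares a POST request that carries a SQL query as JSON, using only the standard library. Both the endpoint URL and the request body shape (a JSON object with an "sql" key) are assumptions made for this example and are not taken from this page; check the HTTP API documentation for the actual endpoint, payload, and authentication before using it. The function name build_sql_request is hypothetical.

```python
import json
import urllib.request


def build_sql_request(endpoint: str, sql: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request carrying a SQL query."""
    # Assumption: the API accepts a JSON object with the query under "sql".
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint; substitute the real HTTP API URL.
request = build_sql_request(
    "https://example.invalid/sql",
    'SELECT COUNT(*) FROM "splitgraph/socrata:20200809".datasets',
)
```

Passing the prepared request to urllib.request.urlopen would send it; the sketch stops short of that so it can be run without network access.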