Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the state_of_delaware_nibrs_crime_report_totals_by
table in this repository, by referencing it like:
"delaware-gov/state-of-delaware-nibrs-crime-report-totals-by-8njr-fz7r:latest"."state_of_delaware_nibrs_crime_report_totals_by"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"agency_code", -- Two character code used to identify a given jurisdiction.
"adult_arrests", -- Number of adults arrested for the given offense. There can be more than one arrest for a single criminal offense.
"clearance_rate", -- The percentage of offenses that were cleared for that specific offense.
"year", -- The calendar year in which crimes occurred.
"number_of_offenses_received_by_law_enforcement", -- Total number of offenses reported by that agency for a specific offense.
"county", -- County the agency is located within. Towns/agencies with one geographic jurisdiction that spans two counties are identified by the county that contains the majority of their offenses. Multi-county/Statewide agencies are identified as “State”.
"occur_frequency_based_on_main_offense", -- Since some crimes occur frequently and others infrequently, the Delaware SAC has further classified the 24 main NIBRS offenses into four rough categories to make it easier to generate useful charts and figures. These categories are: Low; Low-Moderate; Moderate; High. The list of crimes included in each category may be found further in this document.
"offenses_cleared_by_law_enforcement", -- Total number of offenses that were considered to have been solved by law enforcement for one of several reasons .For crime reporting and analysis purposes, an offense is considered ‘cleared’ or solved when a suspect is either arrested and subsequently referred for prosecution or cannot be arrested but is otherwise clearly identifiable.
"sub_offense_sub_category", -- The Assault main offense has two primary sub-categories (Aggravated Assault and Simple Assault) and each sub-category, unlike the other 23 main offenses, has its own set of sub-categories (e.g., Aggravated Assault with a Firearm). This section contains those categories.
"sub_offense", -- Most of the 24 main offenses have sub-categories of offenses (e.g., Shoplifting is a sub-category of Larceny-Theft). This field identifies those sub-categories. For the main offenses that have no sub-categories, their names are repeated here.
"main_offense", -- There are 24 main offenses/crimes in the FBI's National Incident Based Reporting System (NIBRS). These offenses are termed “Group A” by the FBI.
"offense_type", -- This is a categorization scheme the Delaware Statistical Analysis Center uses to group Delaware's NIBRS data for reporting purposes. There are four categories: Violent, Serious Property, Drug, and Other Property and Social Offenses. The list of offenses in each category can be found further in this document.
"ucr_code", -- An alphanumeric code that the FBI uses to identify crimes reported in the National Incident Based Reporting System. The list of crimes can be found further in this document.
"jurisdiction", -- The law enforcement agency that completed the offense report. A report filed by an agency that entered another agencies jurisdiction would be filed under the reporting agency and not the jurisdiction it fell under. (Ex: Delaware State Police (DSP) entering a town that does not have a police officer present would be reported under the responding DSP Troop.) All towns that DSP provides coverage for and those with no full-time police officers are designated with an asterisk since a significant portion of the reports for that jurisdiction would be completed by the covering agency.
"juvenile_arrests" -- Number of juveniles arrested for the given offense. There can be more than one arrest for a single criminal offense.
FROM
"delaware-gov/state-of-delaware-nibrs-crime-report-totals-by-8njr-fz7r:latest"."state_of_delaware_nibrs_crime_report_totals_by"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query delaware-gov/state-of-delaware-nibrs-crime-report-totals-by-8njr-fz7r
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because delaware-gov/state-of-delaware-nibrs-crime-report-totals-by-8njr-fz7r:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone delaware-gov/state-of-delaware-nibrs-crime-report-totals-by-8njr-fz7r
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout delaware-gov/state-of-delaware-nibrs-crime-report-totals-by-8njr-fz7r:latest
This will download all the objects for the latest
tag of delaware-gov/state-of-delaware-nibrs-crime-report-totals-by-8njr-fz7r
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in delaware-gov/state-of-delaware-nibrs-crime-report-totals-by-8njr-fz7r:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered delaware-gov/state-of-delaware-nibrs-crime-report-totals-by-8njr-fz7r:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, delaware-gov/state-of-delaware-nibrs-crime-report-totals-by-8njr-fz7r
is just another Postgres schema.