Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the closed_iowa_vocational_rehabilitation_cases
table in this repository, by referencing it like:
"mydata-iowa-gov/closed-iowa-vocational-rehabilitation-cases-gcya-t3y9:latest"."closed_iowa_vocational_rehabilitation_cases"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"client_location_point", -- Geocoded location based on city, state, and zip code.
"client_location", -- Reported location at closure.
"annual_wage_change", -- Annual Wage (Closure) - Annual Wage (Application). Null values are associated with cases where the individual was not employed at time of case closure.
"hourly_wage_change", -- Hourly Wage (Closure) - Hourly Wage (Application). Null values are associated with cases where the individual was not employed at time of case closure.
"closure_monthly_wage", -- Reported monthly wage where individual was employed at case closure. Null values are associated with cases where the individual was not employed at time of case closure.
"closure_income_primary_support", -- Yes indicates that personal income was the individual's primary support at time of case closure. Null values are associated with cases where the individual was not employed at time of case closure.
"closure_employed", -- Yes indicates that the individual was employed at time case was closed.
"age_closure", -- Age of individual at case closure
"status_name", -- Type of closure
"months_open", -- Number of months the case was open from effective date of application to effective date of closure.
"closure_date", -- Effective date of case closure
"application_income_primary_support", -- Yes indicates personal income received by the individual associated with the case was their primary support available at time of application
"application_annual_wage", -- Annual wage reported at time of application
"application_monthly_wage", -- Monthly wage reported at time of application)
"application_hourly_wage", -- Hourly wage reported at time of application
"age_application", -- Age of individual at time of application
"application_date", -- Application effective date
"disability_category", -- This column includes SD, MSD and other where services were provided for the case. Significantly Disabled (SD) includes individuals who have a significant physical or mental impairment which seriously limits one or more functional capacities (mobility, communication, self-care, self-direction, interpersonal skills, work tolerance, or work skills) in terms of an employment outcome. Most Significantly Disabled (MSD) are individuals who are seriously limited in three or more functional capacities in terms of an employment outcome.
"veteran_status", -- Veteran indicates individual associated with the case self-identified as a Veteran. Veterans are those who served in the active military, naval or air service, and was honorable discharged. Null values are where individual did not self-identify veteran status.
"rsa_transition_student", -- Transition students are individuals who where 24 years old or younger at time of application
"minority_group", -- Values of minority or non-minority based on how individual associated with case self-identified their racial group. Minority represents those who identified as Black or African American, American Indian or Alaska Native, Asian, Native Hawaiian or Other Pacific Islander, Hispanic or Latino. Null values are where individual did not self-identify race.
"gender", -- Gender of the individual associated with the vocational rehabilitation case
"office_area", -- Area office in which the counselor managing the case was from at time of closure
"case_id", -- ID assigned to the case
"federal_fiscal_year", -- Federal fiscal year (October 1 - September 30, numbered after year it ends) that the case was closed
"monthly_wage_change", -- Monthly Wage (Closure) - Monthly Wage (Application). Null values are associated with cases where the individual was not employed at time of case closure.
"closure_hourly_wage", -- Reported hourly wage of individual where employed at case closure. Null values are associated with cases where the individual was not employed at time of case closure.
"closure_soc_description", -- Standard Occupational Classification (SOC) description published by the U.S. Department of Labor for individuals who were employed at time of case closure. Null values are associated with cases where the individual was not employed at time of case closure.
"closure_soc_category", -- Standard Occupational Classification (SOC) Category published by the U.S. Department of Labor where individual was employed at closure. Null values are associated with cases where the individual was not employed at time of case closure.
"competitive_employment", -- Yes indicates the individual exited program in employment in integrated settings with or without ongoing support services, self-employment, or BEP (Business Enterprise Program) employment with hourly rate of earnings equal to or higher than the state minimum wage. Null values are associated with cases where the individual was not employed at time of case closure.
"rsa_closure_reason", -- Reason case was closed based on categories established by the Rehabilitation Services Administration (RSA)
"closure_description", -- Description of employment status for successful closures. If "Not Answered" the case was not closed successfully
"meet_sd_categories", -- Yes indicates the case met at least the states minimum wage at closure and was MSD/SD categories where met.
"service_provided", -- Yes indicates services were provided by Iowa Vocational Rehabilitation Services on the specific case.
"client_county", -- County where individual resided at closure
"closure_annual_wage", -- Reported annual wage where individuals were employed at case closure. Null values are associated with cases where the individual was not employed at time of case closure.
"closure_hours_worked_week", -- Reported hours worked per week where individuals was employed at case closure. Null values are associated with cases where the individual was not employed at time of case closure.
"closure_primary_support", -- Identifies the individuals primary means of support at case closure
"status_code", -- Code associated with the status of the case. Name associated with code is provided in Status Name.
"disability_type", -- Primary disability category determined at eligibility. Impairments include: Cognitive, Psychosocial Mental, Physical, Deaf & Hard of Hearing, and Visual & Communicative
":@computed_region_e7ym_nrbf",
":@computed_region_uhgg_e8y2",
":@computed_region_i9mz_6gmt",
":@computed_region_wnea_7qqw",
":@computed_region_3r5t_5243"
FROM
"mydata-iowa-gov/closed-iowa-vocational-rehabilitation-cases-gcya-t3y9:latest"."closed_iowa_vocational_rehabilitation_cases"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query mydata-iowa-gov/closed-iowa-vocational-rehabilitation-cases-gcya-t3y9
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because mydata-iowa-gov/closed-iowa-vocational-rehabilitation-cases-gcya-t3y9:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone mydata-iowa-gov/closed-iowa-vocational-rehabilitation-cases-gcya-t3y9
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout mydata-iowa-gov/closed-iowa-vocational-rehabilitation-cases-gcya-t3y9:latest
This will download all the objects for the latest
tag of mydata-iowa-gov/closed-iowa-vocational-rehabilitation-cases-gcya-t3y9
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in mydata-iowa-gov/closed-iowa-vocational-rehabilitation-cases-gcya-t3y9:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered mydata-iowa-gov/closed-iowa-vocational-rehabilitation-cases-gcya-t3y9:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, mydata-iowa-gov/closed-iowa-vocational-rehabilitation-cases-gcya-t3y9
is just another Postgres schema.