Query the Data Delivery Network
The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the places_census_tract_data_gis_friendly_format_2021 table in this repository by referencing it like:
"cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti:latest"."places_census_tract_data_gis_friendly_format_2021"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"geolocation", -- Latitude, Longitude of city centroid (Format: Point(Longitude Latitude))
"teethlost_crude95ci", -- Estimated confidence interval for crude prevalence of all teeth lost among adults aged >=65 years
"teethlost_crudeprev", -- Model-based estimate for crude prevalence of all teeth lost among adults aged >=65 years, 2018
"stroke_crude95ci", -- Estimated confidence interval for crude prevalence of stroke among adults aged >=18 years
"stroke_crudeprev", -- Model-based estimate for crude prevalence of stroke among adults aged >=18 years, 2019
"sleep_crude95ci", -- Estimated confidence interval for crude prevalence of sleeping less than 7 hours among adults aged >=18 years
"sleep_crudeprev", -- Model-based estimate for crude prevalence of sleeping less than 7 hours among adults aged >=18 years, 2018
"phlth_crude95ci", -- Estimated confidence interval for crude prevalence of physical health not good for >=14 days among adults aged >=18 years
"phlth_crudeprev", -- Model-based estimate for crude prevalence of physical health not good for >=14 days among adults aged >=18 years, 2019
"obesity_crude95ci", -- Estimated confidence interval for crude prevalence of obesity among adults aged >=18 years
"obesity_crudeprev", -- Model-based estimate for crude prevalence of obesity among adults aged >=18 years, 2019
"mhlth_crude95ci", -- Estimated confidence interval for crude prevalence of mental health not good for >=14 days among adults aged >=18 years
"mammouse_crude95ci", -- Estimated confidence interval for crude prevalence of mammography use among women aged 50–74 years
"lpa_crude95ci", -- Estimated confidence interval for crude prevalence of no leisure-time physical activity among adults aged >=18 years
"lpa_crudeprev", -- Model-based estimate for crude prevalence of no leisure-time physical activity among adults aged >=18 years, 2019
"kidney_crude95ci", -- Estimated confidence interval for crude prevalence of chronic kidney disease among adults aged >=18 years
"kidney_crudeprev", -- Model-based estimate for crude prevalence of chronic kidney disease among adults aged >=18 years, 2019
"highchol_crude95ci", -- Estimated confidence interval for crude prevalence of high cholesterol among adults aged >=18 years who have been screened in the past 5 years
"highchol_crudeprev", -- Model-based estimate for crude prevalence of high cholesterol among adults aged >=18 years who have been screened in the past 5 years, 2019
"diabetes_crude95ci", -- Estimated confidence interval for crude prevalence of diagnosed diabetes among adults aged >=18 years
"depression_crude95ci", -- Estimated confidence interval for crude prevalence of depression among adults aged >=18 years, 2019
"depression_crudeprev", -- Model-based estimate for crude prevalence of depression among adults aged >=18 years, 2019
"dental_crudeprev", -- Model-based estimate for crude prevalence of visits to dentist or dental clinic among adults aged >=18 years, 2018
"csmoking_crude95ci", -- Estimated confidence interval for crude prevalence of current smoking among adults aged >=18 years
"csmoking_crudeprev", -- Model-based estimate for crude prevalence of current smoking among adults aged >=18 years, 2019
"corew_crude95ci", -- Estimated confidence interval for crude prevalence of older adult women aged >=65 years who are up to date on a core set of clinical preventive services: Flu shot past year, PPV shot ever, Colorectal cancer screening, and Mammogram past 2 years
"corew_crudeprev", -- Model-based estimate for crude prevalence of older adult women aged >=65 years who are up to date on a core set of clinical preventive services: Flu shot past year, PPV shot ever, Colorectal cancer screening, and Mammogram past 2 years, 2018
"corem_crude95ci", -- Estimated confidence interval for crude prevalence of older adult men aged >=65 years who are up to date on a core set of clinical preventive services: Flu shot past year, PPV shot ever, Colorectal cancer screening
"corem_crudeprev", -- Model-based estimate for crude prevalence of older adult men aged >=65 years who are up to date on a core set of clinical preventive services: Flu shot past year, PPV shot ever, Colorectal cancer screening, 2018
"copd_crude95ci", -- Estimated confidence interval for crude prevalence of chronic obstructive pulmonary disease among adults aged >=18 years
"copd_crudeprev", -- Model-based estimate for crude prevalence of chronic obstructive pulmonary disease among adults aged >=18 years, 2019
"colon_screen_crude95ci", -- Estimated confidence interval for crude prevalence of fecal occult blood test, sigmoidoscopy, or colonoscopy among adults aged 50–75 years
"cholscreen_crude95ci", -- Estimated confidence interval for crude prevalence of cholesterol screening among adults aged >=18 years
"cholscreen_crudeprev", -- Model-based estimate for crude prevalence of cholesterol screening among adults aged >=18 years, 2019
"checkup_crude95ci", -- Estimated confidence interval for crude prevalence of visits to doctor for routine checkup within the past year among adults aged >=18 years
"checkup_crudeprev", -- Model-based estimate for crude prevalence of visits to doctor for routine checkup within the past year among adults aged >=18 years, 2019
"chd_crude95ci", -- Estimated confidence interval for crude prevalence of coronary heart disease among adults aged >=18 years
"chd_crudeprev", -- Model-based estimate for crude prevalence of coronary heart disease among adults aged >=18 years, 2019
"cervical_crude95ci", -- Estimated confidence interval for crude prevalence of cervical cancer screening among adult women aged 21–65 years
"cervical_crudeprev", -- Model-based estimate for crude prevalence of cervical cancer screening among adult women aged 21–65 years, 2018
"casthma_crude95ci", -- Estimated confidence interval for crude prevalence of current asthma among adults aged >=18 years
"casthma_crudeprev", -- Model-based estimate for crude prevalence of current asthma among adults aged >=18 years, 2019
"cancer_crude95ci", -- Estimated confidence interval for crude prevalence of cancer (excluding skin cancer) among adults aged >=18 years
"cancer_crudeprev", -- Model-based estimate for crude prevalence of cancer (excluding skin cancer) among adults aged >=18 years, 2019
"bpmed_crude95ci", -- Estimated confidence interval for crude prevalence of taking medicine for high blood pressure control among adults aged >=18 years with high blood pressure
"bpmed_crudeprev", -- Model-based estimate for crude prevalence of taking medicine for high blood pressure control among adults aged >=18 years with high blood pressure, 2019
"bphigh_crude95ci", -- Estimated confidence interval for crude prevalence of high blood pressure among adults aged >=18 years
"bphigh_crudeprev", -- Model-based estimate for crude prevalence of high blood pressure among adults aged >=18 years, 2019
"binge_crudeprev", -- Model-based estimate for crude prevalence of binge drinking among adults aged >=18 years, 2019
"arthritis_crude95ci", -- Estimated confidence interval for crude prevalence of arthritis among adults aged ≥18 years
"arthritis_crudeprev", -- Model-based estimate for crude prevalence of arthritis among adults aged >=18 years, 2019
"access2_crude95ci", -- Estimated confidence interval for crude prevalence of current lack of health insurance among adults aged 18 - 64 years
"tractfips", -- Census tract FIPS code
"countyname", -- County name
"stateabbr", -- State abbreviation
"dental_crude95ci", -- Estimated confidence interval for crude prevalence of visits to dentist or dental clinic among adults aged >=18 years
"mhlth_crudeprev", -- Model-based estimate for crude prevalence of mental health not good for >=14 days among adults aged >=18 years, 2019
"mammouse_crudeprev", -- Model-based estimate for crude prevalence of mammography use among women aged 50–74 years, 2018
"ghlth_crude95ci", -- Estimated confidence interval for crude prevalence of fair or poor health among adults aged >=18 years
"ghlth_crudeprev", -- Model-based estimate for crude prevalence of fair or poor health among adults aged >=18 years, 2019
"diabetes_crudeprev", -- Model-based estimate for crude prevalence of diagnosed diabetes among adults aged >=18 years, 2019
"colon_screen_crudeprev", -- Model-based estimate for crude prevalence of fecal occult blood test, sigmoidoscopy, or colonoscopy among adults aged 50–75 years, 2018
"binge_crude95ci", -- Estimated confidence interval for crude prevalence of binge drinking among adults aged >=18 years
"access2_crudeprev", -- Model-based estimate for crude prevalence of current lack of health insurance among adults aged 18-64 years, 2019
"totalpopulation", -- 2010 Census population count
"countyfips", -- County FIPS code
"statedesc" -- State name
FROM
"cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti:latest"."places_census_tract_data_gis_friendly_format_2021"
LIMIT 100;
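Both parts of the table reference above must be double-quoted as PostgreSQL identifiers, because the repository name contains characters such as "/", "-" and ":". When a script assembles the reference, a small helper keeps the quoting consistent. The following is a minimal sketch; the function names quote_ident and ddn_table_ref are hypothetical and are not part of any Splitgraph library.

```python
REPO = "cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti"
TABLE = "places_census_tract_data_gis_friendly_format_2021"


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def ddn_table_ref(repository: str, table: str, tag: str = "latest") -> str:
    """Build the "namespace/repository:tag"."table" form shown above."""
    schema = f"{repository}:{tag}"
    return quote_ident(schema) + "." + quote_ident(table)


# Example: a short query against two columns from the table above.
query = f"SELECT stateabbr, countyname FROM {ddn_table_ref(REPO, TABLE)} LIMIT 5"
```

Passing a different tag selects another version of the dataset, which matches the "any version" wording earlier in this section.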
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. With a client ready, you'll be able to query cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti with SQL in under 60 seconds.
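Any Postgres client library works as well as a desktop SQL client. The sketch below connects from Python under a few assumptions: the database name "ddn" and the use of account credentials as user and password are assumptions to verify against your own connection details, and psycopg2 is a third-party package that must be installed separately. Only the host and port come from the text above.

```python
from urllib.parse import quote

DDN_HOST = "data.splitgraph.com"  # from the text above
DDN_PORT = 5432                   # from the text above
DDN_DBNAME = "ddn"                # assumption: confirm in your connection settings


def ddn_dsn(user: str, password: str) -> str:
    """Build a libpq-style connection URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{DDN_HOST}:{DDN_PORT}/{DDN_DBNAME}"
    )


def fetch_rows(dsn: str, sql: str) -> list:
    """Run one query and return all rows (requires psycopg2)."""
    import psycopg2  # imported lazily so ddn_dsn works without it

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()
```

A call such as fetch_rows(ddn_dsn(user, password), "SELECT 1") is a quick way to check that the connection works before running the full query.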
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone and sgr checkout.
Cloning Data
Because cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti:latest is a Splitgraph Image, you can clone the data from Splitgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to check out the latest tag:
sgr checkout cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti:latest
This will download all the objects for the latest tag of cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti and load them into the Splitgraph Engine. How long the checkout takes depends on your connection speed and the size of the data. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti:latest is 0 bytes. If a dataset is too big to download all at once, or you only need to query a subset of it, you can use a layered checkout:
sgr checkout --layered cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti:latest
This will not download all the data. Instead, it creates a schema of foreign tables that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
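The clone and checkout steps above, including the layered variant, can be scripted. The following minimal sketch assumes sgr is installed and on your PATH; it uses only the sgr clone, sgr checkout and --layered forms shown in this section, and the helper names sgr_commands and run_all are hypothetical.

```python
import subprocess

REPO = "cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti"


def sgr_commands(repository: str, tag: str = "latest", layered: bool = False) -> list:
    """Return the sgr invocations for a clone followed by a checkout."""
    checkout = ["sgr", "checkout"]
    if layered:
        # Layered checkout: foreign tables, objects downloaded lazily on query.
        checkout.append("--layered")
    checkout.append(f"{repository}:{tag}")
    return [["sgr", "clone", repository], checkout]


def run_all(commands: list) -> None:
    """Run each command in order, raising CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(sgr_commands(REPO, layered=True)) clones the metadata and then performs a layered checkout of the latest tag; omitting layered=True performs the regular checkout instead.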
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti is just another Postgres schema.
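As an illustration of treating the checked-out repository as an ordinary schema, the sketch below builds a query that counts rows (census tracts) per state and runs it against the local engine. The helper names are hypothetical, psycopg2 is a third-party package, and the connection parameters (host, port, user, password, dbname) are placeholders that depend on how your engine was configured.

```python
REPO = "cdc-gov/places-census-tract-data-gis-friendly-format-2021-mb5y-ytti"
TABLE = "places_census_tract_data_gis_friendly_format_2021"


def tracts_per_state_sql(schema: str, table: str, limit: int = 10) -> str:
    """Build a query counting rows per state in the checked-out schema."""
    quoted_schema = '"' + schema.replace('"', '""') + '"'
    quoted_table = '"' + table.replace('"', '""') + '"'
    return (
        f"SELECT stateabbr, COUNT(*) AS tracts "
        f"FROM {quoted_schema}.{quoted_table} "
        f"GROUP BY stateabbr ORDER BY tracts DESC LIMIT {int(limit)}"
    )


def query_local_engine(sql: str, **conn_params) -> list:
    """Run a query on the local engine (requires psycopg2).

    conn_params are passed to psycopg2.connect, for example
    host=..., port=..., user=..., password=..., dbname=...
    """
    import psycopg2  # imported lazily so the SQL builder works without it

    conn = psycopg2.connect(**conn_params)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()
```

Note that the local schema name is the repository name without the :latest tag, since the tag was already chosen at checkout time.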