Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the public_transit_services_and_reducedfare_programs
table in this repository, by referencing it like:
"pa-gov/public-transit-services-and-reducedfare-programs-wgkx-3uws:latest"."public_transit_services_and_reducedfare_programs"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"provider_add2", -- The physical and/or mailing address of the transportation service provider's corporate headquarters.
"geocoded_column", -- Latitude and Longitude of the service providers headquarters.
":@computed_region_rayf_jjgk",
":@computed_region_r6rf_p9et",
"service_name", -- Type of public transportation services offered, ex. bus, rail, shared-ride, etc. Service Descriptions - Public transportation is available in every county in Pennsylvania, with a wide range of services including: (1) Fixed-route Bus - Services provided on a repetitive, fixed-schedule basis along a specific route with vehicles stopping to pick up and deliver passengers to specific locations; each fixed route trip serves the same origins and destinations. (2) Intercity Passenger Bus - Publicly funded, fixed-route service that operates with limited stops between two urbanized areas or that connects rural areas to an urbanized area. These routes are considered essential links in the regional/statewide network of intercity bus services but cannot be financially supported solely from user fares. (3) Passenger Rail - Passenger train service consisting of local travel which operates between a central city and outlying areas or express service between major population centers, with few intermediate stops. (4) Shared-ride/Demand Response - Community transportation services provided via passenger cars, vans, or small buses operating in response to calls from passengers to the transit operator, who then dispatches a vehicle to pick-up the passengers and transport them to their destinations. Passengers must be willing to share the vehicles with other passengers.
"program_name", -- Type of accessibility (seniors, persons with disabilities) or reduced-fare programs available. Program Descriptions - Pennsylvania offers programs to make transportation accessible to everyone, including seniors and persons with disabilities. These programs includes: (1) ADA Complementary Paratransit Service - Paratransit or other special service required by the Americans with Disabilities Act (ADA) as a complement to fixed-route service that is comparable to the level of fixed-route service provided to individuals without disabilities. Eligible individuals are functionally unable to use fixed-route transportation because of their disability. (2) Persons with Disabilities Half-fare Program - Individuals with disabilities, presenting a Commonwealth of Pennsylvania reduced-fare Transit Identification Care or Medicare Card at the time of fare payment, ride for half-fare on fixed-route transit service during non-peak periods. (3) Persons with Disabilities Program - Persons with disabilities pay a portion (approx. 15%) of the public fare for advance reservation, shared-ride transportation service. (4) Senior Free Transit - Senior citizens, 65 and older, ride free on local fixed-route service or for $1.00 on local rail service whenever the local public transit system is operating. Schedules and hours of service are available by contacting the local public transit agency. Senior citizens must present a Commonwealth I.D. Card or Medicare Card. (5) Senior Shared-ride Program - Registered senior citizens, 65 and older, pay only 15% of the public fare for advances registration, shared-ride transportation service.
"generic_county_longitude", -- Generic Longitude point within each county to assist in the creation of visualizations such as maps.
"id", -- Record identifier.
"county_name", -- The name of the county.
":@computed_region_amqz_jbr4",
":@computed_region_d3gw_znnf",
":@computed_region_nmsq_hqvv",
"provider_zip", -- The zip code of the transportation service provider's corporate headquarters.
"provider_website", -- The website of the transportation service provider.
"provider_add1", -- The physical and/or mailing address of the transportation service provider's corporate headquarters.
"fips_county_code", -- The Federal Information Processing Standard (FIPS) code is a three-digit number used by the United States government to identify counties.
"provider_city", -- The city of the transportation service provider's corporate headquarters.
"pennsylvania_county_code", -- Two-digit numbers used by the commonwealth to identify counties. NOTE: Pennsylvania county codes are in alphabetical order from 01-67. In this sequence Philadelphia's code is 51. PennDOT county codes are numbered the same way, except for Philadelphia, which is included at the end as number 67.
"provider_phone", -- The phone number for the transportation service provider.
"generic_county_latitude", -- Generic Latitude point within each county to assist in the creation of visualizations such as maps.
"penndot_county_code", -- Two-digit numbers used by PennDOT to identify counties.
"provider_st", -- The state of the transportation service provider's corporate headquarters.
"provider_name", -- The name of the company providing transportation services to a county.
"generic_county_location" -- Generic Latitude and Longitude point within each county to assist in the creation of visualizations such as maps.
FROM
"pa-gov/public-transit-services-and-reducedfare-programs-wgkx-3uws:latest"."public_transit_services_and_reducedfare_programs"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query pa-gov/public-transit-services-and-reducedfare-programs-wgkx-3uws
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because pa-gov/public-transit-services-and-reducedfare-programs-wgkx-3uws:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone pa-gov/public-transit-services-and-reducedfare-programs-wgkx-3uws
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout pa-gov/public-transit-services-and-reducedfare-programs-wgkx-3uws:latest
This will download all the objects for the latest
tag of pa-gov/public-transit-services-and-reducedfare-programs-wgkx-3uws
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in pa-gov/public-transit-services-and-reducedfare-programs-wgkx-3uws:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered pa-gov/public-transit-services-and-reducedfare-programs-wgkx-3uws:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, pa-gov/public-transit-services-and-reducedfare-programs-wgkx-3uws
is just another Postgres schema.