Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the police_incidents
table in this repository, by referencing it like:
"dallasopendata/police-incidents-qv6i-rri7:latest"."police_incidents"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"geocoded_column_address",
"city", -- City in which incident occurred
"zip_code", -- Zipcode in which incident occured
"y_cordinate", -- Y Cordinate
"upzdate", -- Date incident was last updated. Internal use
"nibrs_crime_category", -- NIBRS Crime Category
"nibrs_crime", -- NIBRS Crime
"ucr_offense", -- UCR Offense Name
"penalcode", -- State Penal Violation Code number
"cjis", -- Criminal Justice Information Services Code (CJIS)
"elenum", -- Reporting officers assigned element number
"ro1name", -- Responding Officer #1 Name
"callreceived", -- Date and time of the related call
"cfs_number", -- CFS Number
"eyear", -- The year the offense was entered into the system . Internal use
"edate", -- The date the incident record was created. Internal use
"reporteddate", -- The date of the incident as reported to the police
"year2", -- Year of the indent based on the Date of Occurrence (Date2)
"date2_of_occurrence_2", -- The second date of the date occurrence of the incident (Ex: incident occurred between 1/1/2016 and 1/2/2016)
"year1", -- Year of the indent based on the Date of Occurrence (Date1). Internal use
"date1", -- The first date of the date occurrence of the incident (Ex: incident occurred between 1/1/2016 and 1/2/2016)
"taag", -- Geographic areas targeted for higher than average crime
"division", -- Geographic area comprised of census blocks where incident occurred (smallest police geography)
"ra", -- Geographic area comprised of reporting areas where incident occurred
"apt", -- Apartment number
"incident_address", -- Address where incident occurred
"objattack", -- The target item… Parkinglot, Motor Vehicle
"premise", -- Location type where incident took place For example, Apartment Parking, Residence
"signal", -- Police call signal generated by Communications ( Type of 911 call dispacthed)
"geocoded_column_state",
"emonth", -- The month the offense was entered into the system. Internal use
"geocoded_column_zip",
"state", -- State in which incident occurred
"x_coordinate", -- X Coordinate
"nibrs_type", -- NIBRS Type
"nibrs_code", -- NIBRS Code
"ucrcode", -- UCR Code
"gang", -- Yes or no if offense is gang realated
"weaponused", -- Weapon Used
"mo", -- Short description of the offense
"followup2", -- Investigating Unit 2
"assoffbadge", -- Assisting Officer Badge No
"reptoff", -- Reporting Officer Badge No
"involvement", -- Person can be; victim, reporting person, witness
"calldispatched", -- Date and time related call was dispatched
"callcleared", -- Date and time related call was cleared
"callorgdate", -- The date the related call was received
"edatedayofyear", -- The calender number of the year the offense was entered. Internal use
"district", -- Geographic area comprised of city council districts where incident occurred
"offincident", -- Type of Incident
"servyr", -- Year associated with the incident number
"nibrs_crimeagainst", -- NIBRS Crime Against
"etime", -- The time the offense was entered into the system. Internal use
"offensecode", -- UCR Offense code
"geocoded_column_city",
"day1", -- Day of the indent based on the Date of Occurrence (Date1). Internal use
"geocoded_column", -- Location1
"date1dayofyear", -- The calender number of the year 1‐365 based on Date1. Internal use
"community", -- Community Prosecution Areas as designated by the City Community Prosecutors
"ro1badge", -- Responding Officer #1 Badge No
"comprace", -- Victim Race
"ro2badge", -- Responding officer #2 Badge number
"followup1", -- 1st Assigned investigative unit
"compethnicity", -- Victim Ethnicity
"time1", -- The first (starting) time of the time occurrence of the incident (Ex: incident occurred between 8:00am and 5:00pm)
"family", -- Yes or no if the offense is family violence
"status", -- Status of the offense
"hate", -- Yes or no if offense is a hate crime
"month1", -- Month (starting) of the indent based on the Date of Occurrence (Date1). Internal use
"watch", -- Police watch 1st 2nd or 3rd (1st watch = Late Night, 2nd watch = Days and 3rd watch = Evenings)
"eday", -- The day the offense was entered into the system. Internal use
"compsex", -- Victim Gender
"nibrs_group", -- NIBRS Group
"ucr_offdesc", -- UCR Offense description. Internal use
"hatecrimedescriptn", -- Hate Crime Description
"ucr_disp", -- UCR Disposition of the incident
"reviewbadgenum", -- Reviewing Officer Badge No
"date2dayofyear", -- The calender number of the year 1‐365 based on Date2. Internal use
"servnumid", -- Incident number plus year code plus offense number (Ex: -02 means there is two offense with this one incident) Internal use
"incidentnum", -- An RMS generated incident number (report number) with the year
"ro2name", -- Responding officer #2 Name
"victimtype", -- Victim Type
"time2", -- The second(end) time of the time occurrence of the incident (Ex: incident occurred between 8:00am and 5:00pm)
"day2", -- Day of the indent based on the Date of Occurrence (Date1)
"beat", -- Geographic area comprised of beats where incident occurred
"splrpt", -- No longer applies.. PreRMS
"sector", -- Geographic area comprised of Sectors where incident occurred
"drug", -- Yes or no if incident is drug related
"month2", -- Month (end) of the indent based on the Date of Occurrence (Date2)
"type", -- Offense category Part1 or Part2 or Not coded. Internal use
":@computed_region_2f7u_b5gs",
":@computed_region_sjyw_rtbm"
FROM
"dallasopendata/police-incidents-qv6i-rri7:latest"."police_incidents"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query dallasopendata/police-incidents-qv6i-rri7
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because dallasopendata/police-incidents-qv6i-rri7:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone dallasopendata/police-incidents-qv6i-rri7
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout dallasopendata/police-incidents-qv6i-rri7:latest
This will download all the objects for the latest
tag of dallasopendata/police-incidents-qv6i-rri7
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in dallasopendata/police-incidents-qv6i-rri7:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered dallasopendata/police-incidents-qv6i-rri7:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, dallasopendata/police-incidents-qv6i-rri7
is just another Postgres schema.