Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the major_safety_events
table in this repository, by referencing it like:
"datahub-transportation-gov/major-safety-events-9ivb-8ae9:latest"."major_safety_events"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"derailment_type", -- Identifies the type of rail derailment.
"other_worker_fatalities", -- Count of Other worker (e.g.,commercial worker / utilites worker / etc.) fatalities for the given event.
"suicide_fatalities", -- Count of Suicide related fatalities for the given event.
"people_waiting_or_leaving", -- Count of Person waiting / leaving from transit fatalities for the given event.
"transit_vehicle_operator", -- Count of Transit vehicle operator fatalities for the given event.
"bicyclist_fatalities", -- Count of Pedestrian: bicyclist fatalities for the given event.
"number_of_transit_vehicles", -- Identifies the number of transit vehicles involved in the event.
"total_injuries", -- Total number of involved injuries as a result of the event.
"towed_y_n", -- Identifier of collisions involving towing away from the scene for a transit roadway vehicle or other non-transit roadway vehicle that incurred disabling damage as a result of a collision.
"intentional_y_n", -- Identifies if the security event was intentional or not,
"suicide_serious_injuries", -- Count of serious Suicide related injuries for the given event.
"pedestrian_crossing_tracks", -- Count of Pedestrian: person crossing tracks fatalities for the given event.
"pedestrian_walking_along", -- Count of Pedestrian: person walking along tracks fatalities for the given event.
"transit_vehicle_rider_serious", -- Count of serious Transit vehicle rider Injuries for the given event.
"right_of_way_condition", -- Identifies the right-of-way (ROW) conditions for non-rail or rail collision or rail derailment.
"path_condition", -- Identifies the pathway (roadway) conditions for a non-rail collision.
"weather", -- Identifies the weather at the time of the event.
"other_fire_fuel_description", -- Text explanation from the 'Please describe" box used when "Other'' is selected for the Fire Fuel
"_5_digit_ntd_id", -- A five-digit identifying number for each agency used in the current NTD system.
"agency", -- The legal name of the entity reporting to the NTD.
"uace_code", -- The Census Bereau's Urban Area Census Code (UACE). This five-digit code uniquely identifies an urban area and remains unchanged from census to census.
"fire_type", -- Identifies the source of the fire. A fire is an uncontrolled combustion made evident by flame that requires suppression by equipment or personnel.
"evacuation_location", -- Identifies what was evacuated.
"longitude", -- The geographic coordinate that specifies the east–west position of a point on the Earth's surface.NTD requires a minimum of 4 decimal places, and if 7 decimals are not provided, the NTD systems fills to 7 decimals.
"number_of_vehicles_involved", -- Identifies the number of non-transit vehicles involved in the event.
"mode_name", -- A system for carrying transit passengers described by specific right-of-way (ROW), technology and operational features.
"incident_number", -- A unique system-generated identification number for each Major Safety Event.
"incident_date", -- The date on which the event occurred.
"incident_time", -- The actual time of day the event occurred.
"event_type", -- The type of event that occurred.
"assault_homicide_person_type_desc", -- Indicates whether the person assaulted is a transit operator, other transit worker, not a transit worker, or not specified. Only applies to assaults.
"assault_homicide_transit_worker_flag", -- When the value is True, the event is reported as an Assault on Transit Worker of any type. Otherwise, the value is false. Note, the value will appear as False for all events prior to 2023.
"occupant_of_other_vehicle_2", -- Count of serious Occupant of other vehicle Injuries for the given event.
"bicyclist_injuries", -- Count of Pedestrian: bicyclist Injuries for the given event.
"people_waiting_or_leaving_2", -- Count of serious Person waiting / leaving from transit Injuries for the given event.
"people_waiting_or_leaving_1", -- Count of Person waiting / leaving from transit Injuries for the given event.
"transit_vehicle_operator_1", -- Count of Transit vehicle operator Injuries for the given event.
"other_worker_serious_injuries", -- Count of serious Other worker (e.g.,commercial worker / utilites worker / etc.) Injuries for the given event.
"pederstiran_in_crosswalk_1", -- Count of serious Pedestrian: in crosswalk Injuries for the given event.
"other_event_type_description", -- Text explanation from the 'Please describe" box used when "Other Security Event " is selected
"runaway_train_flag", -- Identifier of events involving a runaway train. [A train which is no longer under the control of a driver regardless of whether the operator is physically on the vehicle at the time. This is limited to revenue vehicles.]
"collision_with", -- Identifies what the transit vehicle collided with: Motor Vehicle, Non-Transit Motor Vehicle, Non-Rail Transit Vehicle, Rail Transit Vehicle/Rail Passenger Train – Revenue Service, Rail Transit Vehicle/Rail Passenger Train – Not in Revenue Service, Rail Transit Vehicle/Rail Transit Maintenance or Service Vehicle on the rail fixed guideway, Person, Animal, Fixed Object, Transit vehicle, Vessel, Dock/Terminal center, Non-revenue vehicle, Other.
"property_damage_type", -- When a value is reported, this indicates Substantial Damage, which is defined as damage to transit or non-transit property including vehicles, facilities, equipment, rolling stock, or infrastructure that disrupts the operations of the rail transit agency and adversely affects the structural strength, performance, or operating characteristics of the property, requiring towing, rescue, on-site maintenance, or immediate removal prior to safe operation.
"life_safety_y_n", -- Identifier of events involving an evacuation for life-safety reasons. [A reportable evacuation is a condition that occurs when persons depart from transit vehicles or facilities for life safety reasons.]
"location_type", -- Location of the event
"other_fatalities", -- Count of Other fatalities for the given event.
"approximate_address", -- Identifies the exact or approximate address or location of the event.
"latitude", -- The geographic coordinate that specifies the north–south position of a point on the Earth's surface. NTD requires a minimum of 4 decimal places, and if 7 decimals are not provided, the NTD systems fills to 7 decimals.
"latlon", -- This is a point field containing the georeference for the location of the event (combines and formats Latitude and Longitude).
"safety_security", -- Identifier of a Safety (SFT) or (Security Event). [Safety Event: A collision, derailment, fire, hazardous material spill, act of nature (Act of God), evacuation, or OSONOC occurring on transit right-of-way, in a transit revenue facility, in a transit maintenance facility, or involving a transit revenue vehicle and meeting established NTD thresholds.] OR [Security event: An occurrence of a bomb threat, bombing, arson, hijacking, sabotage, cyber security event, assault, robbery, rape, burglary, suicide, attempted suicide (not involving a transit vehicle), larceny, theft, vandalism, homicide, CBR (chemical/biological/radiological) or nuclear release, or other event.]
"fire_fuel", -- For vehicle fires, identifies the type of fuel used to power the vehicle.
"current_condition", -- Identifies the water current conditions at the time of the ferryboat collision
"tide", -- Identifies the tide conditions at the time of the ferryboat collision.
"road_configuration", -- Identifies the roadway configuration at the scene of non-rail collision.
"intersection", -- Identifies the intersection control device located at the intersection for a non-rail collision or rail collision.
"other_serious_injuries", -- Count of serious Other Injuries for the given event.
"other_injuries", -- Count of Other Injuries for the given event.
"property_damage", -- Non-rail mode: The estimated dollar value of all property that is damaged in a Reportable Event. This includes transit-owned property and other vehicles and property involved in the event that are not owned by the transit agency (excludes personal property such as cell phones and computers). Property damage also includes the cost of clearing wreckage.
"total_fatalities", -- Total number of involved fatalities as a result of the event
"rail_alignment", -- Identifies the rail alignment of the track on which the collision or derailment occurred.
"rail_grade_crossing_control", -- Identifies the grade crossing control device used at the grade crossing for non-rail or rail collision or rail derailment.
":@computed_region_m2nu_4dib", -- This column was automatically created in order to record in what polygon from the dataset 'US Counties' (m2nu-4dib) the point in column 'latlon' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
":@computed_region_8fe2_rd7y", -- This column was automatically created in order to record in what polygon from the dataset 'United States' (8fe2-rd7y) the point in column 'latlon' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
"service_stop_control_device", -- Identifies the service stop control device located at the service stop for a rail collision.
"occupant_of_other_vehicle_1", -- Count of Occupant of other vehicle Injuries for the given event.
"transit_employee_serious", -- Count of serious Transit employee or contractor Injuries for the given event.
"manufacturer", -- Identifies the manufacturer of the transit vehicle for collisions.
"transit_employee_fatalities", -- Count of Transit employee or contractor fatalities for the given event. Includes "Other Transit Staff" inside the vehicle. Does not include vehicle operators.
"pedestrian_not_in_crosswalk", -- Count of Pedestrian: not in crosswalk fatalities for the given event.
"occupant_of_other_vehicle", -- Count of Occupant of other vehicle fatalities for the given event.
"pedestrian_walking_along_1", -- Count of Pedestrian: person walking along tracks Injuries for the given event.
"total_serious_injuries", -- Sum of the Count of all serious person type injuries for the given event.
"event_category", -- Overall category of the event type
"self_evacuation_y_n", -- Identifies a Self-evacuation, which occur when people vacate transit property without direction from transit personnel or another authority, are not automatically reportable.
"pederstiran_in_crosswalk", -- Count of Pedestrian: in crosswalk Injuries for the given event.
"bicyclist_serious_injuries", -- Count of serious Pedestrian: bicyclist Injuries for the given event.
"other_worker_injuries", -- Count of Other worker (e.g.,commercial worker / utilites worker / etc.) Injuries for the given event.
"transit_employee_injuries", -- Count of Transit employee or contractor Injuries for the given event. Includes "Other Transit Staff" inside the vehicle. Does not include vehicle operators.
"pedestrian_in_crosswalk", -- Count of Pedestrian: in crosswalk fatalities for the given event.
"transit_vehicle_rider", -- Count of vehicle passenger fatalities for the given event
"rail_bus_ferry", -- "Groups modes based on whether they are: Rail Running Non-Rail Surface Transportation Ferryboat"
"non_rail_transit_vehicle", -- Identifies the form of non-rail passenger conveyance used for revenue operations.
"type_of_service", -- Describes how public transportation services are provided by the transit agency: directly operated (DO) or purchased transportation which also includes from Taxi Operators and Transportation Network Companies (PT, TX, TN) services.
"lighting", -- Identifies the indoor or outdoor lighting conditions at the time of the collision or derailment.
"track_configuration", -- Identifies the railway configuration at the scene for derailments.
"year", -- The calendar year in which the event occurred.
"assault_homicide_type_desc", -- Identifies the type of assault on a transit worker as physical, non-physical, or not specified. Only applies to assaults.
"person_list", -- For each injury or fatality in an event, the following information is supplied separated by semicolons: I (Injury) or F (Fatality), H (Homicide or Assault) or S (Suicide or Attempted Suicide), T (if the person was trespassing) or U (if the trespassing status is unknown), Person Type, Age Group, Gender, Y (if medical transport was required)
"incident_description", -- A 2000 character field used to describe the details of the event.
"pedestrian_not_in_crosswalk_2", -- Count of serious Pedestrian: not in crosswalk Injuries for the given event.
"suicide_injuries", -- Count of Suicide related injuries for the given event.
"pedestrian_not_in_crosswalk_1", -- Count of Pedestrian: not in crosswalk Injuries for the given event.
"transit_vehicle_operator_2", -- Count of serious Transit vehicle operator Injuries for the given event.
"event_type_group", -- A derived column to allow users to filter based on broader event type (e.g., Collision, Derailment, Assault).
"evacuation_comment", -- The details of the evacuation. [A reportable evacuation is a condition that occurs when persons depart from transit vehicles or facilities for life safety reasons. Evacuations to a location that may put passengers or patrons in imminent danger (such as controlled rail right-of-way) must also be reported.]
"mode", -- Abbreviation of mode
"manufacturer_description", -- Text explanation from the 'Please describe" box used when "Other'' is selected for the Manufacturer type.
"transit_vehicle_rider_injuries", -- Count of Transit vehicle rider Injuries for the given event.
"fixed_route_flag", -- Fixed Route Flag
"hazmat_type", -- Identifies the type of substance that caused the Hazardous Material Spill. [The spill or release of any amount of hazardous material that creates an imminent danger to life, health, or the environment and requires special attention be given to clean up the material. ]
"hazmat_type_description", -- Text explanation from the 'Please describe" box used when "Other'' is selected for the "Haz Mat Type"
"pedestrian_crossing_tracks_1", -- Count of Pedestrian: person crossing tracks Injuries for the given event.
"pedestrian_crossing_tracks_2", -- Count of serious Pedestrian: person crossing tracks Injuries for the given event.
"pedestrian_walking_along_2", -- Count of serious Pedestrian: person walking along tracks Injuries for the given event.
"vehicle_action", -- Identifies the Non-rail, ferry or rail vehicle action at the time of the collision or derailment.
"action_type", -- Identifies the Non-rail, ferry or rail vehicle impact point at the time of the collision.
"transit_y_n", -- Identifies the number of transit (Y) and non-transit (N) vehicles involved in the event.
"fuel_type", -- Identifies the type of fuel used to power the transit vehicle involved.
"vehicle_speed" -- Identifies the actual or estimated speed at the time of the collision or derailment.
FROM
"datahub-transportation-gov/major-safety-events-9ivb-8ae9:latest"."major_safety_events"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query datahub-transportation-gov/major-safety-events-9ivb-8ae9
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because datahub-transportation-gov/major-safety-events-9ivb-8ae9:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone datahub-transportation-gov/major-safety-events-9ivb-8ae9
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout datahub-transportation-gov/major-safety-events-9ivb-8ae9:latest
This will download all the objects for the latest
tag of datahub-transportation-gov/major-safety-events-9ivb-8ae9
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in datahub-transportation-gov/major-safety-events-9ivb-8ae9:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered datahub-transportation-gov/major-safety-events-9ivb-8ae9:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, datahub-transportation-gov/major-safety-events-9ivb-8ae9
is just another Postgres schema.