Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the 2014_its_world_congress_connected_vehicle_test_bed
table in this repository, by referencing it like:
"datahub-transportation-gov/2014-its-world-congress-connected-vehicle-test-bed-3trh-hz5x:latest"."2014_its_world_congress_connected_vehicle_test_bed"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"spat_int_status", -- This attribute contains the Advance Traffic Controller status information on the intersection. Data is encoded in hexadecimal. Converting the hexadecimal to binary gives an 8 bit unsigned integer with bits 76543210. Each bit has value 0 or 1 and indicates whether or not a condition is present at an intersection. For the ISD dataset, this data element takes only one value: {0} which indicates the intersection is operating normally.
"timetolive", -- This field contains the time remaining before the ISD message expires. This field is enumerated as follows: 0 – 1 minute, 1 – 30 minutes, 2- one day, 3 – one week, 4 – one month, 5 – one year
"serviceregion_nwcorner_lat", -- This field contains the latitude of the northwest corner of the rectangle that the SDW presides over, expressed in 1/10th integer microdegrees. The value 900000001 shall be used when unavailable.
"map_msgcnt", -- This field contains a sequence number within a stream of messages with the same msgID and from the same sender. A sender may initialize this element to any value in the range 0-127 when sending the first message with a given msgID, or if the sender has changed identity since sending the most recent message with that msgID. Two further use cases exist when the sender has not changed identity: When the rest of the message content to be sent changes, the msgCnt shall be set equal to one greater than the value used in the most recent message sent with the same msgID. When the message content has not changed, the msgCnt is not changed. For this element the value after 127 is zero. The receipt of a non-sequential msgCnt value (from the same sending device and message type) implies that one or more messages from that sending device may have been lost, unless msgCnt has been reinitialized due to an identity change
"map_int_refpoint_lat", -- This field contains the latitude of a reference point at the described intersection, expressed in 1/10th integer microdegrees. The value 900000001 shall be used when unavailable.
"mapdata_int_refpoint_long", -- This field contains the longitude of a reference point at the described intersection, expressed in 1/10th integer microdegrees. The value 1800000001 shall be used when unavailable.
"map_int_approach_drivinglanes_laneattributes", -- The laneAttributes data element relates the allowed (possible) movements from a motorized vehicle lane. Note that in practice these values may be further restricted by vehicle class, local regulatory environment and other changing conditions. There are 17 enumerated values for the Lane Attributes data element.
"map_crc", -- This attribute is a two byte data element calculated over the payload bytes of the message (starting with the initial sequence and ending with the last data element before the CRC itself and including all tag, length, and values bytes found in between). It is always placed as the very last data element in the message. The generating polynomial used is the "CRC-CCITT" commonly expressed as x16 + x12 + x5 + 1. An initial seed value of zero shall be used. Note that because the first byte of every DSRC message is never zero (it is 0x30), framing errors due to incorrectly clocking initial zero values cannot occur. Note that the MSB byte is always transmitted first, following the typical ASN bytes order. When a well formed DSRC message (including its last two bytes holding the CRC value) is decoded and input to the CRC process, the resulting CRC should always be the value zero.
"spat_timestamp_hour", -- Hour when the SpaT message was generated in GMT.
"spat_int_states_laneset", -- This attribute represents movement within lanes at an intersection. Data is provided in hexadecimal. Each movement/lane pair is provided as a double-octet (four character) hex string. The first two characters represent the movement, while the last two characters represent the lane. The movement is obtained by converting the hexadecimal value to an 8 bit binary string with bits 76543210. A movement is present with a logical 1 for any given bit. Bit 0 represents straight motion, bit 1 represents a left turn, bit 2 represents a right turn, and bit 3 represents a U-turn. No other bits are used. For example, a laneSet string of 0x0703 has the movement characters 07, which are written as 00000111 in binary. Bits 0, 1, and 2 are in the logical 1 position, indicating straight, left, and right movement. The string has lane characters 03 which directly indicate lane 3. Therefore, a laneSet of 0x0703 indicates straight/left/right movement on lane 3. There is no limit to the number of movement lane/pairs possible. For example, a laneSet of 0x08040302 indicates a U-Turn on lane 4 (with movement string 00001000 in binary) and straight/left on lane 2 (with movement string 00000011 in binary). Values in decimal should be converted back to the full binary bit string in order to interpret the laneSet field.
"spat_int_states_currstate", -- This attribute defines the current state of a particular known movement and depends on the type of lane that the currState applies to.
"spat_int_states_timetochange", -- This attribute specifies the time remaining, in tenths of a second, before the signal phase will change to the next phase. The maximum time remaining is 12000 (120.0 seconds = 2 minutes). 0 indicates no time remaining. A special value of 12001 indicates indefinite time remaining (greater than 2 minutes) and a special value of 12002 indicates an unknown amount of time remaining.
"serviceregion_nwcorner_long", -- This field contains the longitude of the northwest corner of the rectangle that the SDW presides over, expressed in 1/10th integer microdegrees. The value 1800000001 shall be used when unavailable.
"map_msgid", -- The msgID is a data element used in each message to define which type of message follows from the message set defined by this Standard. This data element is always the first value inside the message and is used to tell the receiving application how to interpret the remaining bytes (i.e. what message structure has been used). For the ISD dataset, this data element takes only one value: “mapData”
"map_int_id", -- This field contains an ID (number) that globally and uniquely defines an intersection within a country or region. Assignment rules for this value are established elsewhere and may use regional assignment schemas that vary.
"map_int_approach_id", -- This field contains a unique index value for an approach or egress in an intersection for the convenience of human users. It is typically used along with an optional human readable string name for the object.
"spat_timestamp_day", -- Day when the SPaT message was generated in GMT.
"map_layertype", -- This field specifies the type of information found in a layer of a geographic map fragment (such as an intersection). For the ISD dataset, this data element takes only one value: “intersectionData”
"map_int_approach_drivinglanes_nodelist", -- This field contains the sequence of signed offset values for determining the Xs and Ys (and, possibly Width or Zs when present) using the current intersection position to build a path for the enclosing reference lane relating to a lane in the current intersection. Please consult page 77 of the SAE J2735 Standard for more information.
"dialogid", -- This field contains the ID of the type of data being transmitted. In this instance all entries will be “intersectionSitDataDep” indicating that intersection situation data (ISD) is being transmitted / presented here.
"spat_timestamp_month", -- Month when the SPaT message was generated in GMT.
"bundlenumber", -- This field contains the bundle number of the ISD message
"spat_timestamp_second", -- Second when the SPaT message was generated in GMT.
"map_int_approach_name", -- This field contains an optional human readable name for the approach lanes to the intersection.
"spat_int_id", -- The unique ID number for the SPaT message.
"spat_timestamp_year", -- Year when the SPaT message was generated in GMT.
"spat_int_states_stateconfidence", -- This field describes the confidence of the current signal phase and its estimated time values. Values are enumerated as follows: 0 – unknownEstimate, 1 – minTime, 2- maxTime, 3 - timeLikelyToChange
"spat_int_states_yellstateconfidence", -- This field describes the confidence of the yellow signal phase and its estimated time values. Values are enumerated as follows: 0 – unknownEstimate, 1 – minTime, 2- maxTime, 3 - timeLikelyToChange
"serviceregion_secorner_long", -- This field contains the longitude of the southeast corner of the rectangle that the SDW presides over, expressed in 1/10th integer microdegrees. The value 1800000001 shall be used when unavailable.
"map_int__approach_drivinglanes_lanenumber", -- This field contains a unique index value for a lane used to refer to that lane by other objects in the intersection map data structure. Lanes may be ingress (inbound traffic) or egress (outbound traffic) in nature, as well as barriers and other types of specialty lanes. All lanes are numbered. The laneNumber, in conjunction with the intersections_id, forms a regionally unique way to address a specific lane in that intersection.
"map_int_approaches_approach_drivinglanes_", -- This field contains the width of a lane in in centimeters. The maximum value would be a lane of over 327 centimeters.
"seqid", -- This field communicate the message content type. All entries in this instance will be “data” indicating that the data portion of the transmitted message is presented.
"serviceregion_secorner_lat", -- This field contains the latitude of the southeast corner of the rectangle that the SDW presides over, expressed in 1/10th integer microdegrees. The value 900000001 shall be used when unavailable
"map_int_name", -- This field contains an optional human readable name for the intersection.
"spat_timestamp_minute", -- Minute when the SPaT message was generated in GMT.
"requestid" -- This field contains the ID (number) of the ISD message.
FROM
"datahub-transportation-gov/2014-its-world-congress-connected-vehicle-test-bed-3trh-hz5x:latest"."2014_its_world_congress_connected_vehicle_test_bed"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query datahub-transportation-gov/2014-its-world-congress-connected-vehicle-test-bed-3trh-hz5x
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because datahub-transportation-gov/2014-its-world-congress-connected-vehicle-test-bed-3trh-hz5x:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone datahub-transportation-gov/2014-its-world-congress-connected-vehicle-test-bed-3trh-hz5x
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout datahub-transportation-gov/2014-its-world-congress-connected-vehicle-test-bed-3trh-hz5x:latest
This will download all the objects for the latest
tag of datahub-transportation-gov/2014-its-world-congress-connected-vehicle-test-bed-3trh-hz5x
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in datahub-transportation-gov/2014-its-world-congress-connected-vehicle-test-bed-3trh-hz5x:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered datahub-transportation-gov/2014-its-world-congress-connected-vehicle-test-bed-3trh-hz5x:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, datahub-transportation-gov/2014-its-world-congress-connected-vehicle-test-bed-3trh-hz5x
is just another Postgres schema.