Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the water_resources_work_orders
table in this repository, by referencing it like:
"citydata-mesaaz-gov/water-resources-work-orders-u9pq-7g95:latest"."water_resources_work_orders"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"work_order_category", -- Select the category for this work order. This field has default setting, but can be changed. This field uses a list to populate and is Enterprise wide.
"submit_to_open_month", -- Month submit to open was populated
"submit_to_open_month_date",
"submit_to_opened_date", -- Date submit to field was opened
"units_accomplished", -- Total number of units accomplished. Waste Water group uses this field to track miles of sewer line cleaned.
"work_completed_by_id", -- Unique ID of person who completed the work
"work_order_description", -- This field is automatically populated with the work order description selected when the work order was created.
"submit_to_month", -- Month submit to field was populated
"submit_to_id", -- Unique ID of person the work order is submitted to
"work_order_labor_cost", -- Total cost of labor
"work_order_map_scale", -- GIS Attribute that defines the map scale
"work_order_template_id", -- Unique ID of the work order template
"service_interruption_ratio", -- Used by the Water Service Interruption performance measure, this number represents the Total Number of Customer Accounts divided by the Total Number of Water Service Interruptions that month.
"requested_by", -- The user who requested the work order
"project_finish_date", -- Project date for the work to be completed
"project_start_month_date",
"project_name", -- Select from the drop-down list to tie the work order to a project.
"project_finish_year", -- Year the project was finished
"project_finish_month_date",
"printed_month_date",
"printed_date", -- This field is automatically populated with the date work order was printed
"is_reactive", -- Check box selected if work is reactive.
"initiated_date", -- The date when the work order was created
"initiated_by_id", -- Unique Id of person who initiated work order
"initiated_by", -- The user who created the work order.
"initiated_month_date",
"entity_type", -- This field is automatically populated with the entity type (asset type) selected when the work order was created
"contractor_id", -- Unique Contractor ID, for joining Contractor table/dataset
"cancelled_by", -- Field automatically populated with the login of the user who cancelled the work order
"cancel_reason", -- If the work order has been canceled, this field appears and a reason is entered.
"district", -- Enterprise wide field. Water uses this field to designate which work group the Process Controls group did work for.
"cycle_interval", -- Identifies the number for the cycle interval, 1, 30, etc.
"work_order_id", -- This field is automatically populated with the unique work order ID and cannot be changed.
"work_order_equipment_cost", -- Unique ID custom field category
"work_order_cost", -- Total cost of work order
"work_order_closed_month_date",
"units_locked", -- Yes or No field to lock the units of measure so it can not be changed
"supervisor_id", -- The unique ID of supervisor
"submit_to_year", -- Year submit to was populated
"submit_to_date", -- Date submit to field was populated
"submit_to", -- The person who the work order is submitted to.
"requested_by_id", -- The unique ID of who requested the work order
"project_id", -- Automatically populated with unique ID for project selected
"legally_billable", -- Check box selected if work legal and billable
"close_by_id", -- Login ID of user who closed the work order
"cancelled_month_date",
"cancelled_date", -- The date work order is canceled. This field is automatically populated with the date.
"cancelled_by_id", -- Unique ID of person who cancelled work order
"actual_start_year", -- The year work actually started
"actual_start_month_date",
"actual_start_month", -- The month work actually started
"actual_start_date", -- The date and time work actually started
"actual_finish_month_date",
"work_order_closed_by", -- Automatically populated by the person that closed the work order.
"project_start_year", -- Year the project was started
"actual_finish_date", -- The date and time work was actually finished
"work_order_material_cost", -- Total material cost
"work_completed_by", -- Name of person who completed the work
"submit_to_month_date",
"cancelled_month", -- The month work was cancelled
"work_order_permit_cost", -- Total permit cost
"primary_contract_id", -- Unique ID of contractor primary contract.
"supervisor", -- The supervisor of person doing the work.
"asset_unattached", -- Yes or No field, identifies if an asset is attached or not
"asset_group", -- Enterprise wide field. Selects a specific group within a domain. i.e. Water domain has two groups Water and Waste Water.
"map_page", -- Used to automatically populate with the GIS city section and city quarter on certain point features.
"cycle_from", -- Identifies which field the cycle will start from. Either Actual Finish Date or Projected Start Date.
"cancelled", -- Check box to mark work order cancelled.
"account_number", -- This list is made of old RC #'s. This is an Enterprise Wide field and the list is the same for all Domains
"map_template_name", -- Map template name used by the work order
"project_finish_month", -- Month the project was finished
"from_month", -- The month breakout of the From Date value
"cancelled_year", -- The year work was cancelled
"stage", -- Identifies whether work order is actual or proposed
"scheduled_month", -- Month the work order scheduled to begin
"is_it_reportable", -- For sanitary sewer overflow, if the event is large enough to be reportable.
"shop", -- Enter a shop or warehouse. This field is not used by the Water domain
"line_item_cost", -- Lists cost of a line item identified on a contract
"work_order_closed_year", -- Year work order was closed
"submit_to_opened_by_id", -- The unique ID of person submitted to.
"source_work_order_id", -- Parent work order ID
"project_start_date", -- Project start date for the work to begin
"initiated_year", -- The year the work order was created
"work_order_output", -- Not sure what this field is for. It is populated with A or C.
"work_order_custom_field", -- Unique ID custom field category
"service_interruption_duration", -- The length of time in hours of water service interruption if any.
"scheduled_date", -- Projected date for the work to begin
"printed_year", -- Year the work order was printed
"number_of_days_before", -- Number of days before the work order is printed
"initiated_by_app", -- Automatically populated by the app that created the work order. All records = Internal
"from_month_date",
"created_by_cycle", -- Yes or No field and identifies if the work order was created by the preventative cyclical work orders.
"actual_finish_year", -- The year work was actually finished
"resolution", -- Select final outcome. This is a list field and is not populated
":@computed_region_fcpr_wj2n",
"submit_to_opened_by", -- Name of person the submit to was opened by
"project_start_month", -- Month the project was started
"actual_finish_month", -- The month work was actually finished
"scheduled_month_date",
"scheduled_year", -- Year the work order scheduled to begin
"units_description", -- List of unit description, i.e. Each, Gallons, Hours, Feet, Inches, etc
"initiated_month", -- The month the work order was created
"from_year", -- The year breakout of the From Date value
"domain_id", -- Unique field to identify domain 1=Transportation, 2=Water, 3=Energy, 4=Facilities, 5=Parks, 10=Mesa Citywide
"work_order_map_extent", -- GIS Attribute that defines map extent
"work_order_close_date", -- Date work order was closed
"printed_month", -- Month the work order was printed
"expense_type", -- Maintenance and Capital Improvement
"work_order_closed_month", -- Month work order was closed
"submit_to_open_year", -- Year submit to open was populated
"status", -- Condition, or situation of the work order Open, Complete, Closed etc
"cycle_interval_unit", -- Identifies cycle interval, Days, Weeks, Months, or Years.
"update_map", -- Check this box if work order has custom fields with new data to update GIS
"from_date", -- The next cycle work order will be created by either Actual Finish Date, Projected Start Date, or user can Pick a Date.
"priority", -- Priority rating for the work order 1-5 with 1 being the highest priority.
"contract_billable" -- Check box to mark if the work order is legal billable
FROM
"citydata-mesaaz-gov/water-resources-work-orders-u9pq-7g95:latest"."water_resources_work_orders"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query citydata-mesaaz-gov/water-resources-work-orders-u9pq-7g95
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because citydata-mesaaz-gov/water-resources-work-orders-u9pq-7g95:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone citydata-mesaaz-gov/water-resources-work-orders-u9pq-7g95
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout citydata-mesaaz-gov/water-resources-work-orders-u9pq-7g95:latest
This will download all the objects for the latest
tag of citydata-mesaaz-gov/water-resources-work-orders-u9pq-7g95
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in citydata-mesaaz-gov/water-resources-work-orders-u9pq-7g95:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered citydata-mesaaz-gov/water-resources-work-orders-u9pq-7g95:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, citydata-mesaaz-gov/water-resources-work-orders-u9pq-7g95
is just another Postgres schema.