wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci

Query the Data Delivery Network

The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
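For example, you can connect with psql (any Postgres client works). This is a minimal sketch: it assumes your Splitgraph API key and secret act as the Postgres username and password, and that the database name is ddn; check your Splitgraph account settings for your exact credentials.

# Connect to the DDN; SPLITGRAPH_API_KEY / SPLITGRAPH_API_SECRET are assumed
# to hold your Splitgraph credentials.
psql "postgresql://$SPLITGRAPH_API_KEY:$SPLITGRAPH_API_SECRET@data.splitgraph.com:5432/ddn?sslmode=require"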

For example, you can query the municipal_stormwater_permit_outfall_data table in this repository, by referencing it like:

"wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci:latest"."municipal_stormwater_permit_outfall_data"

or in a full query, like:

SELECT
    ":id", -- Socrata column ID
    "fraction_analyzed", -- Indicates the fraction (total, dissolved, or suspended) of an aqueous sample that was analyzed.
    "lab_analysis_date", -- The analysis date reported by the lab.
    "new_result_units", -- Final units column following data compilation  
    "sample_collection_method", -- Code assigned to a procedure or method used to collect a sample.
    "sample_preparation_method", -- Code assigned to procedure or method used to prepare a sample.
    "sample_id", -- Primary ID to identify a sample.  May be selected by the sampler or assigned by the lab.
    "location_name", -- UNIQUE descriptive name for a field Location.
    "location_id", -- UNIQUE ID to identify the field location in EIM.
    "result_lab_name", -- Name of lab that analyzed the sample.
    "result_comment", -- Comments about the Result Value
    "paramstatus", -- A flag used in coding R analysis; include or omit
    "paramgroup", -- A group of Parameters that are related in some way, such as metals analytes, pesticide compounds, or PCBs. This allows rapid searching for related Parameters.
    "result_validation_level", -- Third-party or independent expert data validation following the most updated versions of USEPA guidance and national functional guidelines. Pre-August 2013 data follows older guidance.
    "parameter", -- Name of the parameter reported for the result. Parameters are most often thought of as chemical analytes, but also include things like temperature, fish weight, flow, etc.
    "original_eim_units", -- Units of measure associated with a Result Value.
    "original_eim_result_value", -- The Results associated with a particular Study or field Location. Results represent specific information about an environmental Parameter.  Results are determined by field measurements, physical observations, or analytical procedures performed on samples. Results are the basis for the analysis of environmental conditions.
    "lab_analysis_time", -- The analysis time reported by the lab.
    "result_data_qualifier", -- Standard annotations for documenting issues with Result Values, such as non-detects or estimates.
    "wetseason", -- Wet season is October 1 through April 30
    "sample_preparation_method_description", -- Description of procedure or method used to prepare a sample.
    "season", -- Winter=1; spring=2; summer=3; fall=4
    "original_eim_parameter_name", -- Name of the parameter reported for the result. Parameters are most often thought of as chemical analytes, but also include things like temperature, fish weight, flow, etc.
    "original_eim_parameter_cas", -- A unique number assigned by the Chemical Abstracts Service (CAS) Division of the American Chemical Society to each distinct chemical substance recorded in the Chemical Registry System.
    "nondetect_flag", -- A true/false flag as to whether the data is non-detect 
    "lab_analysis_date_accuracy", -- Indicates if the Lab Analysis Date is accurate to the day, week, month, year or unknown.  Except for historical data, most cases are Day.
    "result_basis", -- Physical state in which the analyte concentration was reported - either as the sample was received by the lab (wet weight) or adjusted to remove moisture (dry weight).
    "field_filtered_flag", -- Indicates if a sample was filtered in the field (not the lab).
    "sample_matrix", -- Describes the environmental medium which was measured or from which a sample was taken.
    "sample_source", -- Describes the environmental resource which was measured or from which a sample was taken.  More specific than Sample Matrix.
    "sample_replicate_flag", -- Indicates that the sample is a field replicate. Field replicates are separate samples identically collected as close as possible to the same point in space and time as the original sample. They are stored in separate containers, each of which is identically processed and analyzed. Field replicates provide insight into field and laboratory procedure variability (and in some cases contaminant distribution).
    "sample_field_replicate_id", -- Secondary ID to identify a field replicate sample. Field replicates are separate samples identically collected as close as possible to the same point in space and time as the original sample. They are stored in separate containers, each of which is identically processed and analyzed. Field replicates provide insight into field and laboratory procedure variability (and in some cases contaminant distribution).
    "field_collection_comment", -- Comments or descriptive information about the collection of data in the field.
    "new_fraction_analyzed", -- Final fraction column following data compilation  
    "field_collector", -- Name or type of organization that collected the data.
    "type", -- Land use type or precipitation measurement. Land use types are either commercial (COM), industrial (IND), low density residential (LDR), or high density residential (HDR).
    "new_result_value", -- Final results column following data compilation  
    "plot_data", -- A true/false flag used in coding R plots
    "sample_method_other_type", -- Additional data input field for collection, preparation, or preservation method.
    "result_method", -- Code assigned to a procedure or method used to derive a result.  Includes lab (analytical), field (measurement), and derivation methods.
    "sample_percent_sorted", -- Percent of sample that is sorted. For benthic macroinvertebrate and periphyton count data.
    "study_name", -- UNIQUE title that describes the study.
    "access_id", -- Not Used.
    "digestion_method", -- Indicates the degree of digestion or breakdown performed on a solid sample prior to analysis.
    "field_collection_end_date", -- The date that data collection in the field ended.
    "sample_use", -- Indicates that the sample was collected for a specific purpose, namely background, reference, or test. Commonly used for sediment data.
    "field_collection_end_time", -- Time that field data collection ended, in local time.
    "field_collection_start_date", -- Date that field collection began.
    "field_collection_start_time", -- Time that field data collection began, in local time.
    "storm_event_qualifier", -- Codes such as NQ1, NQ2
    "sample_composite_flag", -- Indicates that the sample is a composite created by combining two or more discrete samples collected spatially and/or temporally.
    "result_method_description", -- Description of a procedure or method used to derive a result.  Includes lab (analytical), field (measurement), and derivation methods.
    "study_specific_location_id", -- Unique ID to identify the field location within a particular Study.
    "sample_collection_method_description", -- Description of a procedure or method used to collect a sample.
    "sample_sub_id", -- Secondary ID to identify a set of field split samples. Mostly used for sediment data. For split samples with the same Sample ID or with different Sample IDs.
    "result_additional_comment", -- Additional comments about the Result Value
    "sample_method_other_description", -- Additional data input field for collection, preparation, or preservation method.
    "field_collection_type", -- General type of data collection conducted in field.
    "sample_method_other", -- Additional data input field for collection, preparation, or preservation method.
    "paramclass", -- A class of Parameters that are related at a high level, such as inorganic and organic contaminants, or measurements.
    "study_id" -- UNIQUE ID to identify the Study in EIM.
FROM
    "wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci:latest"."municipal_stormwater_permit_outfall_data"
LIMIT 100;

Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci with SQL in under 60 seconds.
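As a sketch, assuming the same credential environment variables as in the connection example above, you could run a quick one-off query against this repository from the command line:

# Count the rows in the table referenced above (assumes the credential
# variables from the earlier sketch are set).
psql "postgresql://$SPLITGRAPH_API_KEY:$SPLITGRAPH_API_SECRET@data.splitgraph.com:5432/ddn?sslmode=require" \
  -c 'SELECT count(*) FROM "wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci:latest"."municipal_stormwater_permit_outfall_data";'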

Query Your Local Engine

Install Splitgraph Locally
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
 

Read the installation docs.

Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.

It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr can manage the image, container and volume for you.
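For example, a typical first run might look like the sketch below; the exact subcommands can vary between sgr versions, so consult sgr engine --help.

# Create and start a local engine (sgr pulls the Docker image and manages the
# container and volume for you). Subcommand names may differ across versions.
sgr engine add
# List registered engines to confirm the new engine is running.
sgr engine list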

There are a few ways to ingest data into the local engine.

For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
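As an illustrative sketch (the host, credentials and options below are placeholders, and the flags may differ between sgr versions; see sgr mount --help), mounting an external Postgres database with the built-in postgres_fdw handler might look like:

# Mount a remote Postgres schema into the local engine as the schema
# "example_data". All connection details here are placeholders.
sgr mount postgres_fdw example_data \
  -c "user:password@db.example.com:5432" \
  -o '{"dbname": "upstream_db", "remote_schema": "public"}'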

For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone and sgr checkout.

Cloning Data

Because wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci:latest is a Splitgraph Image, you can clone the data from Splitgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.

First, install Splitgraph if you haven't already.

Clone the metadata with sgr clone

This will be quick, and does not download the actual data.

sgr clone wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci

Checkout the data

Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest tag:

sgr checkout wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci:latest

This will download all the objects for the latest tag of wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
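For example, with psql pointed at the local engine (this sketch assumes the default engine on localhost:5432 with the sgr user and the splitgraph database; adjust to match your setup):

# Query the checked-out table on the local engine; the connection details are
# assumptions based on the default engine configuration.
psql postgresql://sgr@localhost:5432/splitgraph \
  -c 'SELECT count(*) FROM "wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci"."municipal_stormwater_permit_outfall_data";'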

Alternatively, use "layered checkout" to avoid downloading all the data

The data in wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci:latest is 0 bytes. If this is too big to download all at once, or if you only need to query a subset of it, you can use a layered checkout:

sgr checkout --layered wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci:latest

This will not download all the data, but it will create a schema of foreign tables that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
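For example, a narrow query like the sketch below (same local-engine connection assumptions as above) only pulls down the objects needed to produce those rows:

# Select a few columns with a LIMIT; Splitgraph fetches only the objects the
# query touches rather than the whole image.
psql postgresql://sgr@localhost:5432/splitgraph -c '
  SELECT "parameter", "new_result_value", "new_result_units"
  FROM "wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci"."municipal_stormwater_permit_outfall_data"
  LIMIT 20;'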

Read the layered querying documentation to learn about when and why you might want to use layered queries.

Query the data with your existing tools

Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, wa-gov/municipal-stormwater-permit-outfall-data-d958-q2ci is just another Postgres schema.
