Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the waste_management_resource_conservation_and
table in this repository, by referencing it like:
"pa-gov/waste-management-resource-conservation-and-8aqs-mzku:latest"."waste_management_resource_conservation_and"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"county_code", -- The FIPS county code is a five-digit Federal Information Processing Standard (FIPS) code (FIPS 6-4) which uniquely identifies counties and county equivalents in the United States, certain U.S. possessions, and certain freely associated states.Each State has its own 2-digit number and each County within the state has its own 3-digit number which are combined into a 5-digit number to uniquely identify every US county. For more technical details : Federal Information Processing Standards Publications (FIPS PUBS) are issued by the National Institute of Standards and Technology (NIST) after approval by the Secretary of Commerce pursuant to Section 111 (d) of the Federal Property and Administrative Services Act of 1949 as amended by the Computer Security Act of 1987, Public Law 100-235. Federal Information Processing Standard (FIPS) 6-4, Counties and Equivalent Entities of the U.S., Its Possessions, and Associated Areas -- 90 Aug 31 , provides the names and codes that represent the counties and other entities treated as equivalent legal and/or statistical subdivisions of the 50 States, the District of Columbia, and the possessions and freely associated areas of the United States. Counties are considered to be the "first-order subdivisions" of each State and statistically equivalent entity, regardless of their local designations (county, parish, borough, etc.). Information gathered from census data - https://www.census.gov/library/reference/code-lists/ansi.html
"rhwc_waste_code_desc", -- Description of code used to describe hazardous waste. (WasteCode) (For Codes and Descriptions, see RCRA Waste Codes.xlsx in the attachment on the Primer page).
"hd_waste_code_id", -- Hazardous waste codes describing the handler's hazardous waste streams. Key field for the next three (3) columns.
"generated", -- Code indicating that the handler is engaged in generating waste on site. (Y/N)
"universal_waste_type_desc", -- Description of code indicating the type of universal waste.
"universal_waste_owner", -- Indicates the agency that defines the universal waste type. (Defaults to 'HQ')
"rhswc_waste_code_type", -- Code used to describe hazardous waste. (For Codes and Descriptions, see RCRA Waste Codes.xlsx attachment in the primer page.)
"hd_sec_waste_code_id", -- Hazardous waste codes describing the handler's hazardous waste streams. Key field for the next three (3) columns.
"land_based_unit_ind_text", -- Descriptive text describing the code to indicate if the Hazardous Secondary Material (HSM) is being managed in a Land Based Unit.
"land_based_unit_ind", -- Code to indicate if the Hazardous Secondary Material (HSM) is being managed in a Land Based Unit.
"actl_short_tons_qnty", -- The actual amount of Hazardous Secondary Material (HSM) generated by the Handler.
"estimated_short_tons_qnty", -- The estimated amount of Hazardous Secondary Material (HSM) generated by the Handler.
"fac_type_code_desc", -- Description of the type of facility generating Hazardous Secondary (HSM).
"fac_type_code", -- Type of facility generating Hazardous Secondary Material.
"hsm_seq_num", -- Unique number identifying the Hazardous Secondary Material Activity (HSM) Activity for the Handler.
"hd_sec_material_activity", -- Hazardous Secondary Material activity (HSM) of the Handler. Key field for the next eight (8) columns.
"zip", -- Mailing address information for the owner/operator.
"city", -- Mailing address information for the owner/operator.
"street2", -- Mailing address information for the owner/operator.
"street1", -- Mailing address information for the owner/operator.
"fax", -- Contact fax number for the owner/operator.
"email_address", -- Email address data for the owner/operator.
"title", -- Title of the contact person or the title of the person who certified the handler information reported to the authorizing agency.
"org_name", -- Organization Name.
"middle_initial", -- Middle Initial of Contact information for the owner/operator.
"first_name", -- First Name of Contact information for the owner/operator.
"rho_notes", -- Notes for the facility Owner/Operator.
"date_became_current", -- Date indicating when the owner/operator became current.
"owner_op_type", -- Code indicating the owner/operator type.
"hd_ownerop_id", -- Handler owner and operator information. Key field for the next twenty-four (24) columns.
"naics_code", -- The North American Industry Classification System (NAICS) Code that identifies the business activities of the facility.
"hd_naics_id", -- North American Industry Classification Status codes reported for the handler. Key field for the next four (4) columns.
"rhlc_contact_first_name", -- Large Quantity Generator (LQG) First name of the consolidation contact.
"rhlc_mail_city", -- Large Quantity Generator (LQG) Consolidation mailing city address information.
"rhlc_mail_street2", -- Large Quantity Generator (LQG) Consolidation supplemental mailing address information.
"rhlc_mail_street1", -- Large Quantity Generator (LQG) Consolidation mailing street address information.
"rhlc_handler_id", -- Large Quantity Generator (LQG) Code that uniquely identifies the consolidation handler.
"rhlc_seq_number", -- Large Quantity Generator (LQG) Unique number that identifies the Consolidation.
"hd_lqg_consolidation_id", -- Large Quantity Generator (LQG) consolidation info for a Handler. Key field for the next nineteen (19) columns.
"date_closed", -- Date closed.
"rhewc_waste_code_text", -- Descriptive text describing the Waste Code.
"rhewc_waste_code", -- Code used to describe hazardous waste.
"hd_episodic_waste_code_id", -- Episodic waste code details for Handler Episodic Waste. Key field for the next three (3) columns.
"waste_desc", -- Waste description.
"rhew_seq_number", -- Unique number that identifies the episodic waste.
"hd_episodic_waste_id", -- Episodic waste info for a Handler Episodic Event. Key field for the next three (3) columns
"rhee_end_date", -- Episodic event end date.
"rhee_start_date", -- Episodic event start date.
"rhee_contact_fax", -- Fax number of the contact of the episodic event.
"rhee_contact_phone", -- Telephone number of the contact of the episodic event.
"rhee_contact_last_name", -- Last name of the contact of the episodic event.
"rhee_contact_first_name", -- First name of the contact of the episodic event.
"event_other_desc", -- Other description of the episodic event.
"event_type_desc", -- Description of the type of the episodic event.
"hd_episodic_event_id", -- Identifier of an Episodic event info for a Handler. Key field for the next fifteen (15) columns
"cert_email_text", -- Email address of a person who certified the handler information reported to the authorizing agency.
"cert_middle_initial", -- Middle initial of a person who certified the handler information reported to the authorizing agency.
"cert_first_name", -- First name of a person who certified the handler information reported to the authorizing agency.
"cert_title", -- Title of the contact person or the title of the person who certified the handler information reported to the authorizing agency.
"cert_signed_date", -- Date on which the handler information was certified by the reporting site.
"cert_seq", -- Sequence number for each certification for the handler.
"hd_certification_id", -- Certification information for the person who certified report to the authorizing agency. Key field for the next seven (7) columns.
"recycler_notes", -- Notes for recycling hazardous waste.
"effc_date", -- The Effective Date of the action: 1. Hazardous Secondary Material notification in Handler, 2. Corrective Action Authority, 3. Financial Assurance Mechanism.
"state_waste_generator_desc", -- Description of code indicating that the handler is engaged in the generation of hazardous waste.
"transfer_facility_ind", -- Code indicating that the handler is a Hazardous Waste Transfer Facility (not to be confused with a used oil transfer facility). (Y/N)
"short_term_gen_ind", -- Code indicating that the handler is engaged in short-term hazardous waste generation activities. (Y/N)
"furnace_exemption", -- Code indicating that the handler qualifies for the Smelting, Melting, and Refining Furnace Exemption. (Y/N)
"tsd_activity", -- Code indicating that the handler is engaged in the treatment, storage, or disposal of hazardous waste. (Y/N)
"importer_activity", -- Code indicating that the handler is engaged in importing hazardous waste into the United States.
"state_district", -- Code indicating the state-designated legislative district(s) in which the site is located.
"land_type", -- Code indicating current ownership status of the land on which the facility is located. (For Codes and Descriptions, refer to RCRA Handler Land Type Codes.xlsx document attached in the primer page).
"used_oil_refiner", -- Code indicating that the handler is engaged in re-refining used oil activities. (Y/N)
"used_oil_processor", -- Code indicating that the handler is engaged in processing used oil activities. (Y/N)
"contact_zip", -- Handler contact zip mailing address information.
"contact_state", -- Handler contact state mailing address information. State USPS Code.
"contact_city", -- Handler contact city mailing address information; Mailing Address City Name.
"contact_street2", -- Handler contact supplemental mailing address information.
"contact_street1", -- Handler contact mailing address information.
"contact_fax", -- Handler contact contact fax number.
"contact_phone", -- Handler contact phone number.
"contact_email_address", -- Handler contact email address data.
"contact_title", -- Handler contact title of the contact person or the title of the person who certified the handler information reported to the authorizing agency.
"contact_last_name", -- Handler contact information; Last Name.
"contact_middle_initial", -- Handler contact information; Middle Initial.
"mail_zip", -- Handler zip code mailing address information.
"mail_state", -- Handler state mailing address information.
"mail_city", -- Handler mailing address information.
"mail_street2", -- Handler supplemental mailing address information
"mail_street1", -- Handler mailing address information.
"location_zip", -- Handler zip code location address information.
"location_city", -- Handler city location address information.
"location_street2", -- Handler street location supplemental address information.
"location_street1", -- Handler street location address information.
"han_notes", -- Notes regarding the handler.
"county_name", -- Name of the county in which the facility is located.
"accessibility", -- Code indicating the reason why the handler is not accessible for normal RCRA tracking and processing.
"tsd_date", -- The date that operation of the facility commenced, the date construction on the facility commenced, or the date that operation is expected to begin.
"acknowledge_date", -- Date information was received for the handler.
"handler_name", -- Name of the Handler.
"receive_date", -- Date that the form (indicated by the associated Source) was received from the handler by the appropriate authority.
"seq_number", -- Sequence number for each source record about a handler.
"source_type_desc", -- Description of Code indicating the source of information for the associated data (activity, wastes, etc.).
"universal_waste_type", -- Code indicating the type of universal waste.
"mail_addr_num_txt", -- Mailing Address Street Number for the owner/operator.
"expected_closure_date", -- Date of expected closure.
"est_qnty", -- The quantity of waste that is handled by each process code. This element pertains only to Part A submissions.
"rhee_contact_title", -- Title of the contact of the episodic event.
"recognized_trader_importer", -- Indicates that the Handler is participating in Import Trading activity. (Y/N)
"recycling_ind", -- Indicates the facility has a recycling process which the product has levels of hazardous constituents that are not comparable to or unable to be compared to a legitimate product or intermediate but that the recycling is still legitimate (Y/N)
"onsite_burner_exemption", -- Code indicating that the handler qualifies for the Small Quantity Onsite Burner Exemption. (Y/N)
"used_oil_transporter", -- Code indicating that the handler is engaged in used oil transportation and/or transfer facility activities. (Y/N)
"contact_street_number", -- Handler contact Address Street Number.
"contact_org_name", -- Handler contact information; Organization Formal Name.
"off_site_receipt", -- Code indicating that the handler, whether public or private, currently accepts hazardous waste from another site (site identified by a different EPA ID). If information is also available on the specific processes and wastes which are accepted, it is indicated by a flag at the process unit level (Process Unit Group Commercial Status).
"rhwc_waste_code_type", -- Code used to describe hazardous waste. (For Codes and Descriptions, see RCRA Waste Codes.xlsx in the attachment on the Primer page).
"rhwc_waste_code_owner", -- Indicates the agency that owns the data record. (Defaults to 'HQ')
"hd_universal_waste_id", -- Information about universal waste generated by the handler. Key field for the next five (5) columns.
"rhswc_waste_code_type_desc", -- Description of code used to describe hazardous waste. (For Codes and Descriptions, see RCRA Waste Codes.xlsx attachment in the primer page).
"phone", -- Telephone Number data for the owner/operator.
"last_name", -- Last Name of Contact information for the owner/operator.
"owner_op_type_desc", -- Description of code indicating the owner/operator type.
"owner_op_ind_desc", -- Description of code indicating whether the data is associated with a current or previous owner or operator. The system will allow multiple current owners and operators.
"owner_op_ind", -- Code indicating whether the data is associated with a current or previous owner or operator. The system will allow multiple current owners and operators.
"naics_desc", -- Description of the North American Industry Classification System Code that identifies the business activities of the facility.
"naics_seq", -- Sequence number for each North American Industry Classification Status (NAICS) code for the handler.
"rhlc_contact_phone_ext", -- Large Quantity Generator (LQG) Telephone number extension of the consolidation contact.
"rhlc_mail_state", -- Large Quantity Generator (LQG) Consolidation mailing state address information.
"rhlc_handler_name", -- Large Quantity Generator (LQG) Name of the consolidation handler
"in_compliance_ind", -- Type of in compliance. (Y/N)
"closure_type_desc", -- Description of the type of the closure.
"event_type", -- Type of the episodic event.
"cert_last_name", -- Last name of a person who certified the handler information reported to the authorizing agency.
"recycler_ind", -- Code indicating whether or not the handler performs recycling that produces a product containing hazardous material (Y/N)
"college_ind", -- Indicates whether or not the Handler is a College or University opting into SubPart K. (Y/N)
"fed_waste_generator_desc", -- Description of code indicating that the handler is engaged in the generation of hazardous waste.
"state_waste_generator", -- Code indicating that the handler is engaged in the generation of hazardous waste.
"underground_injection_activity", -- Code indicating that the handler generates and or treats, stores, or disposes of hazardous waste and has an injection well located at the installation. (Y/N)
"state_district_owner", -- Owner of the state district code. Usually 2-digit postal code (i.e. KS).
"land_type_desc", -- Description of code indicating current ownership status of the land on which the facility is located.
"used_oil_transfer_facility", -- Code indicating that the handler owns or operates a used oil transfer facility. (Y/N)
"used_oil_market_burner", -- Code indicating that the handler directs shipments of used oil to burners. (Y/N)
"contact_phone_ext", -- Handler contact telephone number extension.
"contact_first_name", -- Handler contact information; First Name.
"location_country", -- Handler location address information; Country Name.
"location_state", -- Handler state location address information; State USPS Code.
"county_code_owner", -- Indicates the agency that defines the county code.
"non_notifier", -- Flag indicating that the handler has been identified through a source other than Notification and is suspected of conducting Resource Conservation and Recovery Act (RCRA) -regulated activities without proper authority.
"source_type", -- Code indicating the source of information for the associated data (activity, wastes, etc.).
"handler_id", -- Code that uniquely identifies the handler.
"rhee_contact_middle_initial", -- Middle initial of the contact of the episodic event.
"contact_country", -- Handler contact country mailing address information.
"rhee_contact_email_address", -- Email address of the contact of the episodic event.
"rhee_contact_phone_ext", -- Phone extension of the contact of the episodic event.
"hd_lqg_closure_id", -- Large Quantity Generator (LQG) closure info for a Handler. Key field for the next six (6) columns
"event_owner", -- Owner of the episodic event. Defaults to 'HQ'
"mail_country", -- Handler country mailing address information.
"rhswc_waste_code_owner", -- Indicates the agency that owns the data record.
"accumulated", -- Code indicating that the handler is engaged in accumulating waste on site. (Y/N)
"fac_code_owner_name", -- Owner of the Facility Code. Should be HQ or the state code (i.e. KS)
"country", -- Mailing address information for the owner/operator.
"state", -- Mailing address information for the owner/operator.
"phone_ext", -- Telephone number extension for the owner/operator.
"date_ended_current", -- Date indicating when the owner/operator changed to a different owner/operator.
"owner_op_seq", -- Sequential number used to order multiple occurrences of owners and operators.
"naics_owner", -- Indicates the agency that defines the NAICS Code. (Defaults to 'HQ')
"rhlc_contact_phone", -- Large Quantity Generator (LQG) Telephone Number data of the consolidation contact.
"rhlc_contact_org_name", -- Large Quantity Generator (LQG) Name of the consolidation contact organization.
"rhlc_contact_fax", -- Large Quantity Generator (LQG) Contact fax number of the consolidation contact.
"rhlc_contact_title", -- Large Quantity Generator (LQG) Title of the consolidation contact person.
"rhlc_contact_middle_initial", -- Large Quantity Generator (LQG) Middle initial of the consolidation contact.
"rhlc_mail_zip", -- Large Quantity Generator (LQG) Consolidation mailing zip address information.
"rhlc_mail_country", -- Large Quantity Generator (LQG) Consolidation mailing county address information.
"rhlc_contact_email_address", -- Large Quantity Generator (LQG) Email address of the consolidation contact.
"rhlc_contact_last_name", -- Large Quantity Generator (LQG) Last name of the consolidation contact.
"closure_type", -- Type of the closure.
"new_closure_date", -- New closure date.
"waste_code_owner", -- Owner and definer of the waste code. Defaults to 'HQ'
"rhee_contact_org_name", -- Contact organization name of the episodic event.
"recycler_activity", -- Code indicating that the handler is engaged in recycling hazardous waste. (Y/N)
"subpart_p_withdrawal", -- Indicates if it's a Subpart P withdrawal (Y/N)
"financial_assurance_ind", -- Indicates whether or not the facility has provided Financial Assurance for the HSM Activities
"withdrawal_ind", -- Indicates whether or not the Handler is withdrawing from Sub Part K. (Y/N)
"state_waste_generator_owner", -- Indicates the agency that defines the generator status type. (Defaults to 'PA')
"reverse_distributor", -- Indicates that the Handler is a reverse distributor (Y/N)
"manifest_broker", -- Identifies that Handler is Manifest Broker. (Y/N)
"healthcare_fac", -- Indicates that the Handler is a health care facility (Y/N)
"slab_importer_ind", -- Indicates that the Handler is participating in Slab Import activity. (Y/N)
"fed_waste_generator", -- Code indicating that the handler is engaged in the generation of hazardous waste.
"mixed_waste_generator", -- Code indicating that the handler is engaged in generating mixed waste (waste that is both hazardous and radioactive). (Y/N)
"used_oil_burner", -- Code indicating that the handler is engaged in the burning of used oil fuel. (Y/N)
"used_oil_spec_marketer", -- Code indicating that the handler is a marketer who first claims the used oil meets the specifications. (Y/N)
"fed_waste_generator_owner", -- Indicates the agency that defines the generator status type. (Defaults to 'HQ')
"transporter_activity", -- Code indicating that the handler is engaged in the transportation of hazardous waste. ( Y/N)
"notification_rsn_code", -- Indicates the reason for notifying Hazardous Secondary Material.
"non_profit_ind", -- Indicates whether or not the Handler is a Non-Profit opting into SubPart K. (Y/N)
"universal_waste_dest_facility", -- Code indicating that the handler treats, disposes of, or recycles hazardous waste on site. (Y/N)
"recognized_trader_exporter", -- Indicates that the Handler is participating in Export Trading activity. (Y/N)
"slab_exporter_ind", -- Indicates that the Handler is participating in Slab Export activity. (Y/N)
"hospital_ind", -- Indicates whether or not the Handler is a Hospital opting into SubPart K. (Y/N)
"acknowledge_flag", -- Top level of all information about the handler.
"activity_location" -- Indicates the location of the agency regulating the handler.
FROM
"pa-gov/waste-management-resource-conservation-and-8aqs-mzku:latest"."waste_management_resource_conservation_and"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query pa-gov/waste-management-resource-conservation-and-8aqs-mzku
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because pa-gov/waste-management-resource-conservation-and-8aqs-mzku:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone pa-gov/waste-management-resource-conservation-and-8aqs-mzku
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout pa-gov/waste-management-resource-conservation-and-8aqs-mzku:latest
This will download all the objects for the latest
tag of pa-gov/waste-management-resource-conservation-and-8aqs-mzku
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in pa-gov/waste-management-resource-conservation-and-8aqs-mzku:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered pa-gov/waste-management-resource-conservation-and-8aqs-mzku:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, pa-gov/waste-management-resource-conservation-and-8aqs-mzku
is just another Postgres schema.