Splitgraph has been acquired by EDB!

pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5

Query the Data Delivery Network

The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.

For example, you can query the emissions_inventory_system_eis_emissions_2017 table in this repository, by referencing it like:

"pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5:latest"."emissions_inventory_system_eis_emissions_2017"

or in a full query, like:

SELECT
    ":id", -- Socrata column ID
    "emis_unit_proc_id", -- A designator used to uniquely identify an emissions process.
    "unit_effectivedate", -- The date on which the identifier became effective.
    "reductionefficiency", -- The percent reduction achieved for the pollutant when all control measures are operating as designed.
    "controlmeasure", -- Identifies the piece of equipment or practice that is used to reduce one or more pollutants. 11 Single Cyclone 13 Multiple Cyclone 14 Centrifugal Separator 21 Spray Chamber 22 Cyclone Scrubber 23 Orfice Scrubber 24 Mechanical Scrubber 25 Venturi Scrubber 26 Packed Tower 27 Wet Filters 28 Water Curtain 29 Water Spray 31 Baghouse - Manual Cleaning 32 Baghouse - Mechanical Shakers 33 Baghouse - Pneumatic Shakers 34 Baghouse - Bag Collapse 35 Baghouse - Sonic Cleaning 36 Baghouse - Reverse Air Flow 37 Baghouse - Reverse Air Jets 39 Baghouse - Unknown Cleaning Mechanism 41 Single Stage Electrostatic Precipitator 42 Two Stage Electrostatic Precipitator 51 Settling Chamber 52 Impingement Separator 53 Panel Filters 54 Enclosure 55 Gravel Bed Filter 56 Annular Ring Filter 57 Demister 58 Continuous Medium Filter 61 Direct Flame Incin. w/o heat exchange 62 Catalytic Incinerator w/o heat exchange 63 Direct Flame Incin. with heat exchange 64 Catalytic Incinerator with heat exchange 71 Fixed Bed Adsorber 72 Continuous Adsorber 81 Surface Condenser 82 Contact Condenser 91 Packed Tower Adsorber 92 Plate or Tray Tower Adsorber 93 Spray Tower Adsorber 94 Spray Chamber Adsorber 95 Venturi Chamber Adsorber 96 Impingement Scrubber 97 Adsorption Column 101 Selective Catalytic Reduction 102 Selective Non-Catalytic Reduction 103 Non-Selective Catalytic Reduction 104 Oxidation Catalyst 105 Low NOx Burners 106 Water Injection 107 Steam Injection 108 Flue Gas Recirculation (FGR) 111 Heat Recovery Steam Generator (HRSG)
    "controlapproachdescription", -- Identifies the overall control system or approach, including capture effectiveness, where applied at an emissions unit to reduce the amount of pollutants released into the environment.
    "unitdesigncapacity", -- The measure of the size of the unit based on the maximum continuous throughput capacity of the unit. Measured in units of  “E6BTU/HR” (million Btu/hr).
    "emis_unit_id", -- Identifiers by which the emissions unit is known or has been known.
    "effectivedate", -- Date that the Air Quality primary facility was created in eFACTS.
    "longitudemeasure", -- The measure of the angular distance on a meridian east or est of the prime meridian. 
    "ems_unt_prc_rpt_pr_sp_cl", -- An identifier identifying the group of the next 5 columns.
    "latitudemeasure", -- The measure of the angular distance on a meridian north or south of the equator. 
    "locationaddresspostalcode", -- The code the represents a U.S. Zip code or International postal code. 
    "locationcountycode", -- State's County Code. For Pennsylvania the county codes are from 01 - 67 in Alphabetical order.
    "locationcountyname", -- Pennsylvania County Name
    "processcomment", -- Any comments regarding the emissions process.
    "geocoded_column", -- Georeferenced point column with Latitude and Longitude for use in creating map visuals. A generic point is given for each county to aid in the creation of placing metrics on map by county. 
    "averagehoursperday", -- The average number of hours per day that the emissions process is active within the reporting period.
    "facilitysiteid", -- Identifiers by which the facility site is known or has been know, and the system associated with the identifier
    "sp_cl_parameternumeratorumc", -- The numerator unit of measure for the parameter.  When this is null, then Parameter Value is a percentage as described in the Parameter Comment. Code values are as follows: E6FT3S: MILLION STANDARD CUBIC FEET,  E3GAL: 1000 GALLONS, TON:TONS,  E6BTU:MILLION BTUS
    "percentspringactivity", -- The percentage of the annual activity that occurred during the Spring months (March, April, May).
    "sp_cl_parametercomment", -- Any comments regarding the parameter. Null - TON,  Code values are as follows: E6FT3S: MILLION STANDARD CUBIC FEET,  E3GAL: 1000 GALLONS, TON:TONS,  E6BTU:MILLION BTUS
    "pollutant", -- The pollutant which is controlled by the control measure. The number prefixes are part of the chemical name describing the isomer ( ions or molecules with identical formulas but distinct structures ) part of the name.  For example - 1,2,3,4,5,6 refer to the positions on the cyclohexane molecule where a chlorine atom is substituted for a hydrogen.  The carbon atoms on the cyclohexane ring can be numbered 1 through 6.  Due to the symmetry of this particular molecule, it doesn’t really matter where we start counting on the ring.  What does matter is that each number is only mentioned once, which means there is only 1 chlorine atom attached to each carbon.  One can envision other forms of Hexachlorocyclohexane in which multiple chlorines are attached to a single carbon atom.  For example: 1,1,2,2,3,3 Hexachlorocyclohexane would have two chlorine atoms attached to each of 3 sequential carbon atoms, with none on the other three…  Now 1,2,3,4,5,6 Hexachlorocyclohexane can refer to a mixture of isomers of that molecule.  This is because the ring of cyclohexane’s carbon atoms locks the attached hydrogens and chlorines onto one side of the ring, and this naming convention is not specific enough to tell which chlorine atoms are locked “above” the ring and which are “below it”.  So in order to know which specific isomer we are talked about, a more specific name is required.  This is where the other numbers can help.  “58-89-9” is the  Chemical Abstracts Service (registration) number, sometimes abbreviated CAS # or CASRN.  This is really not part of the name of the molecule, but rather a unique serial number assigned to the molecule.  The number itself is pretty much arbitrarily assigned by the American Chemical Society, but it is important in that it is unique, like a bar code for the molecule.  The number is typically written in 3 parts separated by dashes, with two to seven digits in the first part, two in the second, and one in the third.  If we google CAS#58-89-9  we find out that we are working with the “gamma” isomer of 1,2,3,4,5,6 Hexachlorocyclohexane where the chlorines are arranged around the ring in an up-up-down-up-up-down configuration.  This is also known as “Lindane” a chemical which has been used as an insecticide and a pharmaceutical.
    "unitcomment", -- Any comments regarding the emissions unit activity.
    "locationaddressstatecode", -- The alphabetic codes that represent the name of the principal administrative subdivision of the United States, Canada, or Mexico. Standard 2-digit state code.
    "localityname", -- The name of the city, town, village, or other locality.
    "proc_identifier", -- An identifier by which an element is referred to in another system.
    "proc_controlapproachdesc", -- Description of the overall control system or approach applied to an emissions unit or process.
    "proc_approachcomment", -- Comments regarding the control approach.
    "emis_unit_proc_rpt_prd_id", -- An identifier identifying the group of the next 13 columns.
    "actualhoursperperiod", -- Actual number of hours the process is active or operating during for the reporting period.
    "calculationmaterial", -- Name of the material or fuel processed.
    "sp_cl_parametertype", -- Name of the parameter that describes the type of activity, throughput or input used in the calculation.
    "averageweeksperperiod", -- The average number of weeks that the emissions process is active within the reporting period.
    "percentwinteractivity", -- The percentage of the annual activity that occurred during the Winter months (December, January, February).
    "denominatorunitofmeasurecode", -- The denominator for the unit of measure of the reported emission factor  (in TONS).
    "percentfallactivity", -- The percentage of the annual activity that occurred during the Fall months (September, October, November).
    "reporting_pollutant", -- Code identifying the pollutant for which emissions are reported.
    "emissioncalculationmethod", -- Defines the method used to calculate emissions.
    "totalemissions", -- Total calculated or estimated amount of the pollutant (in TONS).
    "emissionfactor", -- The emission factor used for the emissions value if a calculated value was provided  (in TONS). The EMISSION_FACTOR = ROUND(source emission amount/throughput amount).
    "emis_unit_proc_rpt_prd_emis", -- An identifier identifying the group of the previous 5 columns.
    "emissionsyear", -- Year of the emission (discharge of criteria air pollutants (CAPs) and hazardous air pollutants (HAPs))
    ":@computed_region_rayf_jjgk",
    ":@computed_region_r6rf_p9et",
    ":@computed_region_amqz_jbr4",
    ":@computed_region_d3gw_znnf",
    "unitoperationdate", -- The date in which the unit commenced operational activities
    "period_startdate", -- The date on which the reporting period began.  Applies to the reporting of episodic or event emissions only.
    "period_enddate", -- The date on which the reporting period ended.
    "release_point_desc", -- Identifiers by which the emissions release point is known or has been known.
    "averagepercentemissions", -- The average annual percent of an emissions process that is vented through a release point.
    "sp_cl_parameterdenominatorumc", -- The denominator unit of measure for the parameter.  When this is null, then Parameter Value is a percentage as described in the Parameter Comment. Code values are as follows: E6FT3S: MILLION STANDARD CUBIC FEET,  E3GAL: 1000 GALLONS, TON:TONS,  E6BTU:MILLION BTUS
    "sp_cl_parametervalue", -- The value of the parameter.
    "unitdescription", -- Text description of the emissions unit.
    "calculationparametervalue", -- Activity or throughput of the process for a given time period.
    "parameterunitofmeasure", -- Code and description for the unit of measure for calculation parameter value.
    "averagedaysperweek", -- The average number of days per week that the emissions process is active within the reporting period.
    "percentsummeractivity", -- The percentage of the annual activity that occurred during the Summer months (June, July, August).
    "facilitysiteidentifier", -- A state's designated identifier by which the facility site is referred to by a system. The name of the facility is in the Facility Site Name  column.
    ":@computed_region_nmsq_hqvv",
    "stateandcountyfipscode", -- The list is from FIPS Counties codes used for the identification of the Counties and County equivalents of the United States.  The FIPS county code is a five-digit Federal Information Processing Standard (FIPS) code which uniquely identifies counties and county equivalents in the United States, certain U.S. possessions, and certain freely associated states. The first 2-digits are the State code and the last 3-digits are the county codes. 42001 Adams            42003 Allegheny        42005 Armstrong        42007 Beaver           42009 Bedford          42011 Berks            42013 Blair            42015 Bradford         42017 Bucks            42019 Butler           42021 Cambria          42023 Cameron          42025 Carbon           42027 Centre           42029 Chester          42031 Clarion          42033 Clearfield       42035 Clinton          42037 Columbia         42039 Crawford         42041 Cumberland       42043 Dauphin          42045 Delaware         42047 Elk              42049 Erie             42051 Fayette          42053 Forest           42055 Franklin         42057 Fulton           42059 Greene           42061 Huntingdon       42063 Indiana          42065 Jefferson        42067 Juniata          42069 Lackawanna       42071 Lancaster        42073 Lawrence         42075 Lebanon          42077 Lehigh           42079 Luzerne          42081 Lycoming         42083 McKean           42085 Mercer           42087 Mifflin          42089 Monroe           42091 Montgomery       42093 Montour          42095 Northampton      42097 Northumberland   42099 Perry            42101 Philadelphia     42103 Pike             42105 Potter           42107 Schuylkill       42109 Snyder           42111 Somerset         42113 Sullivan         42115 Susquehanna      42117 Tioga            42119 Union            42121 Venango          42123 Warren           42125 Washington       42127 Wayne            42129 Westmoreland     42131 Wyoming          42133 York            
    "unittype", -- Identifies the type of emissions unit activity. 100 Boiler. 270  Incinerator. 180 Process Heater 200 Furnace 220 Calciner 280 Flare 350 Process Equipment Fugitive Leaks 400 Storage Tank 430 Degreaser 450 Spray Booth or Coating Line 450 Spray Booth or Coating Line 470 Printing Line 470 Printing Line 470 Printing Line 480 Gasoline Loading Rack or Arm 600 Chemical Reactor 610 Oxidation Unit 620 Distillation Columns/Stripper 640 Mixer 680 Cooling Tower 720 Crusher 730 Grinder 740 Screen 760 Conveyor 770 Transfer Point 780 Silo 785 Open Storage Pile 255 Dryer
    "sourceclassificationcode", -- EPA Source Classification code that identifies an emissions process. The description is given in the next field, processdescription.
    "processdescription" -- A text description of the emissions process.
FROM
    "pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5:latest"."emissions_inventory_system_eis_emissions_2017"
LIMIT 100;

Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5 with SQL in under 60 seconds.

Query Your Local Engine

Install Splitgraph Locally

bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"

Read the installation docs.

Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.

It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgrcan manage the image, container and volume for you.

There are a few ways to ingest data into the local engine.

For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.

For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr cloneand sgr checkout.

Cloning Data

Because pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5:latest is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.

First, install Splitgraph if you haven't already.

Clone the metadata with `sgr clone`

This will be quick, and does not download the actual data.

sgr clone pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5

Checkout the data

Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest tag:

sgr checkout pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5:latest

This will download all the objects for the latest tag of pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5 and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.

Alternatively, use "layered checkout" to avoid downloading all the data

The data in pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5:latest is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:

sgr checkout --layered pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5:latest

This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.

Read the layered querying documentation to learn about when and why you might want to use layered queries.

Query the data with your existing tools

Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5 is just another Postgres schema.

pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5

Query the Data Delivery Network

Query Your Local Engine

Cloning Data

Clone the metadata with `sgr clone`

Checkout the data

Alternatively, use "layered checkout" to avoid downloading all the data

Query the data with your existing tools

Related Documentation:

Product

Support

Company

Community

Query the Data Delivery Network

Query Your Local Engine

Cloning Data

Clone the metadata with sgr clone

Checkout the data

Alternatively, use "layered checkout" to avoid downloading all the data

Query the data with your existing tools

Related Documentation:

Product

Support

Company

Community

Clone the metadata with `sgr clone`