pa-gov/agriculture-production-data-in-pennsylvania-1850-a5ik-vwk6
Loading...

Query the Data Delivery Network

Query the DDN

The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.

For example, you can query the agriculture_production_data_in_pennsylvania_1850 table in this repository, by referencing it like:

"pa-gov/agriculture-production-data-in-pennsylvania-1850-a5ik-vwk6:latest"."agriculture_production_data_in_pennsylvania_1850"

or in a full query, like:

SELECT
    ":id", -- Socrata column ID
    "home_made_manufactures", -- The cash value of all articles manufactured within the year preceding the 1st day of June 1850, in or by the family, whether for home use or for sale. If the raw material has been purchased for such manufacture, the value of such raw material should not be included: the object being to ascertain the value of manufacturers by the family from their own productions, or the value of the labor expended on the production of others. This discrimination is important. 
    "beeswax_honey", -- Beeswax Honey measured in whole number of pounds raised on the farm during the year ending the 1st day of June 1850. 
    "maple_sugar", -- Maple Sugar measured in whole number of pounds raised on the farm during the year ending the 1st day of June 1850. 
    "silk_cocoons", -- Silk Cocoons measured in whole number of pounds raised on the farm during the year ending the 1st day of June 1850. 
    "hemp", -- Hemp measured in whole number of tons raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848.
    "hops", -- Hops measured in whole number of pounds raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848.
    "hay", -- Hay measured in whole number of tons raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848.
    "butter", -- Butter measured in pounds on the farm during the year ending the 1st day of June. 
    "buckwheat", -- Buckwheat measured in whole number of bushels raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848.
    "barley", -- Barley measured in whole number of bushels raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848.
    "sweet_potatoes", -- Sweet Potatoes measured in whole number of bushels raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848.
    "irish_potatoes", -- Irish Potatoes measured in whole number of bushels raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848.
    "indian_corn", -- Indian Corn measured in whole number of bushels raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848. 
    "rye", -- Rye measured in whole number of bushels raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848. 
    "live_stock_value", -- The cash value of all live stock on hand on the 1st day of June 1850. 
    "sheep", -- Whole number of Sheep which belong to the farm on the 1st day of June 1850, which are one year old and older.
    "other_cattle", -- Whole number of other cattle (the number of all cattle not before enumerated) which belong to the farm on the 1st day of June 1850, which are one year old and older.
    "working_oxen", -- Whole number of Working Oxen which belong to the farm on the 1st day of June 1850, which are one year old and older.
    "milch_cows", -- Whole number of milch cows which belong to the farm on the 1st day of June 1850, which are one year old and older. Milch Cows are Cattle that are reared for their milk. More commonly called a Dairy or Milk cow today. 
    "horses", -- Whole number of Horses which belong to the farm on the 1st day of June 1850, which are one year old and older. 
    "implements_machinery", -- Aggregate value of all farming or planting implements and machinery used to cultivate and produce crops including wagons, thrashing machines, cotton gins and sugar mills. All implements and machinery used to cultivate and produce crops and fit the same for market or consumption on the 1st day of June 1850.
    "unimproved_land", -- Number of acres or unimproved land on the 1st day of June 1850 such as a wood lot or other land, the timber or range of which is used for farm purposes. It is not necessary that is should be contiguous to the improved land. 
    "improved_land", -- Number of acres of improved land on the 1st day of June 1850, by which is meant cleared and used for grazing, grass or tillage or which is now fallow, connected with or belonging to the farm which the assistant marshal is reporting. It is not necessary that is should be contiguous, but it must be owned or managed by the person whose name is inserted in the column. 
    "farms", -- The returns of all farms or plantations on the 1st day of June 1850, the produce of which amounts to $100 in value, are to be included in this schedule; but it is not intended to include the returns of small lots, owned or worked by persons following mechanical or other pursuits, where the productions are not $100 in value. In 1850, $100 is roughly equal to $3,000 in 2017.  
    "type", -- Type of PA Municipality. Such as Township, Borough, city, etc. 
    "municipality", -- Name of the Pennsylvania Municipality. In 2017 there are 2,561 municipalities. 
    "county", -- Name of the Pennsylvania County. Pennsylvania has 67 counties.  Counties omitted from the Data and reason:  Cameron - Did not exist until March 29, 1860; created from parts of Clinton, Elk, McKean, and Potter Counties Forest - No farms were recorded in this county; mostly forest area Lackawanna - Did not exist until August 13, 1878; created from the eastern half of Luzerne County Snyder - Did not exist until March 2, 1855; created from the southern half of Union County
    "clover_seed", -- Clover Seed measured in whole number of bushels raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848.
    "peas_beans", -- Peas and Beans measured in whole number of bushels raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been shown in 1848.
    "wool", -- Wool measured in whole number of pounds raised on the farm during the year ending the 1st day of June 1850. 
    "tobacco", -- Tobacco measured in whole number of pounds raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848. 
    "wheat", -- Wheat measured in whole number of bushels raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848. 
    "swine", -- Whole number of Swine which belong to the farm on the 1st day of June 1850, which are one year old and older.
    "asses_mules", -- Whole number of Asses and Mules which belong to the farm on the 1st day of June 1850, which are one year old and older.
    "cheese", -- Cheese measured in pounds on the farm during the year ending the 1st day of June. 
    "orchard_products_value", -- The cash value of all Orchard Products on hand on the 1st day of June 1850. 
    "animals_slaughtered", -- The cash value of animals slaughtered raised on the farm during the year ending the 1st day of June 1850. 
    "other_seeds", -- Other Seeds measured in whole number of bushels raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848.
    "flax", -- Flax measured in whole number of pounds raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848.
    "molasses", -- Molasses measured in whole number of gallons raised on the farm during the year ending the 1st day of June 1850. 
    "oats", -- Oats measured in whole number of bushels raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848. 
    "flaxseed", -- Flaxseed measured in pounds raised on the farm during the year ending the 1st day of June 1850. The grain which is gathered in that year is meant, though it may have been sown in 1848.
    "wine", -- Wine measured in gallons on the farm during the year ending the 1st day of June 1850. 
    "garden_produce_value", -- The cash value of all Garden Produce on hand on the 1st day of June 1850. 
    "cash_value" -- Actual cash value of whole number of improved and unimproved acres on the 1st day of June 1850.
FROM
    "pa-gov/agriculture-production-data-in-pennsylvania-1850-a5ik-vwk6:latest"."agriculture_production_data_in_pennsylvania_1850"
LIMIT 100;

Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query pa-gov/agriculture-production-data-in-pennsylvania-1850-a5ik-vwk6 with SQL in under 60 seconds.

Query Your Local Engine

Install Splitgraph Locally
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
 

Read the installation docs.

Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.

It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgrcan manage the image, container and volume for you.

There are a few ways to ingest data into the local engine.

For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.

For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr cloneand sgr checkout.

Cloning Data

Because pa-gov/agriculture-production-data-in-pennsylvania-1850-a5ik-vwk6:latest is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.

First, install Splitgraph if you haven't already.

Clone the metadata with sgr clone

This will be quick, and does not download the actual data.

sgr clone pa-gov/agriculture-production-data-in-pennsylvania-1850-a5ik-vwk6

Checkout the data

Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest tag:

sgr checkout pa-gov/agriculture-production-data-in-pennsylvania-1850-a5ik-vwk6:latest

This will download all the objects for the latest tag of pa-gov/agriculture-production-data-in-pennsylvania-1850-a5ik-vwk6 and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.

Alternatively, use "layered checkout" to avoid downloading all the data

The data in pa-gov/agriculture-production-data-in-pennsylvania-1850-a5ik-vwk6:latest is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:

sgr checkout --layered pa-gov/agriculture-production-data-in-pennsylvania-1850-a5ik-vwk6:latest

This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.

Read the layered querying documentation to learn about when and why you might want to use layered queries.

Query the data with your existing tools

Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, pa-gov/agriculture-production-data-in-pennsylvania-1850-a5ik-vwk6 is just another Postgres schema.

Related Documentation:

Loading...