Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the equityprioritycommunities2020acs2018
table in this repository, by referencing it like:
"bayareametro-gov/equityprioritycommunities2020acs2018-8ukr-26na:latest"."equityprioritycommunities2020acs2018"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"shape_area",
"pct_hus_re", -- Severely rent-burdened household as a share of total renter-occupied housing units. Values for percent of severely rent-burdened households are from ACS table calculation B25070_010E/B08201_001E.
"pct_spfam", -- Single-parent families as a share all families. Percent of single-parent families values are the result of ACS table calculation (B11004_010E + B11004_016E)/B11004_001E.
"zvhh_1_2", -- Zero vehicle households make up over regional mean plus .5 standard deviation of the total households in a given Census Tract. Allowed values are: 0 (False) and 1 (True).
"disab_1_2", -- Disabled people make up over regional mean plus .5 standard deviation of the total civilian noninstitutionalized population in a given Census Tract. Allowed values are: 0 (False) and 1 (True).
"below2_1_2", -- Low-income (incomes less than 200% of the federal poverty level) people make up over regional mean plus .5 standard deviation of the total population for whom poverty status is determined in a given Census Tract. Allowed values are: 0 (False) and 1 (True).
"lep_1_2", -- People with limited English proficiency make up over regional mean plus .5 standard deviation of the total population 5 years and over in a given Census Tract. Allowed values are: 0 (False) and 1 (True).
"spfam_1_2", -- Single-parent families make up over regional mean plus .5 standard deviation of the families in a given Census Tract. Allowed values are: 0 (False) and 1 (True).
"poc_1_2", -- Persons of Color make up over regional mean plus .5 standard deviation of the total population in a given Census Tract. Allowed values are: 0 (False) and 1 (True).
"over75_1_2", -- Seniors aged 75 and over make up over regional mean plus .5 standard deviation of the total population in a given Census Tract. Allowed values are: 0 (False) and 1 (True).
"pct_zvhhs", -- Zero-vehicle households as a share of total households. Values for percent of zero-vehicle households are from ACS table calculation B08201_002E/B08201_001E.
"pct_disab", -- Disabled population as a share of total civilian noninstitutionalized population. Disabled population percent values are from ACS table calculation (C18108_001E - (C18108_005E + C18108_009E + C18108_013E))/C18108_001E.
"pct_below2", -- Low-Income population as a share of total population for whom poverty status is determined. Values for the percent of population that is low-income are the result of ACS table calculation C17002_001E - C17002_008E/C17002_001E
"pct_lep", -- Limited English proficiency population as a share of total population 5 years and over. Limited English proficiency as percent of population values are the result of ACS table calculation (B16005_007E + B16005_008E + B16005_012E + B16005_013E + B16005_017E + B16005_018E + B16005_022E + B16005_023E + B16005_029E + B16005_030E + B16005_034E + B16005_035E + B16005_039E + B16005_040E + B16005_044E + B16005_045E)/B16005_001E.
"pct_over75", -- Seniors aged 75 years and over as a share of total population. Percent of persons aged 75 years and over values are the result of ACS table calculation (B01001_023E + B01001_024E + B01001_025E + B01001_047E + B01001_048E + B01001_049E)/B01001_001E.
"pop_zvhhs", -- Number of persons living in households that do not own a personal vehicle. Zero-vehicle household population values are from ACS table B08201_002E.
"pop_below2", -- Low-income population. Low-income designation is determined as number of persons living in a household with incomes less than 200% of the federal poverty level established by the Census Bureau. The low-income (< 200% of federal poverty level) population values are the result of ACS table calculation C17002_001E - C17002_008E.
"pop_lep", -- Number of limited English proficiency population as a share of total population aged 5 years and over. Limited English proficiency (LEP) refers to anyone above the age of 5 who reported speaking English less than “very well,” as classified by the United States Census Bureau. Though most LEP individuals are immigrants, there are a number who were born in the United States, most to immigrant parents. Limited English proficiency population values are the result of ACS table calculation B16005_007E + B16005_008E + B16005_012E + B16005_013E + B16005_017E + B16005_018E + B16005_022E + B16005_023E + B16005_029E + B16005_030E + B16005_034E + B16005_035E + B16005_039E + B16005_040E + B16005_044E + B16005_045E.
"pop_spfam", -- Number of persons living in families headed by a single individual. Single-parent family population values are the result of ACS table calculation B11004_010E + B11004_016E.
"pop_over75", -- Number of seniors aged 75 years and older. Population of persons older than 75 years of age values are the result of ACS table calculation B01001_023E + B01001_024E + B01001_025E + B01001_047E + B01001_048E + B01001_049E.
"tot_pop_ov", -- Universe: Total population 5 years and over. Values for number of persons aged five years and older are from ACS table B16005_001E.
"tot_fam", -- Universe: Families. Families are defined as households in which one or more persons in the household are related to the householder (formerly, the head of the household) by birth, marriage, or adoption. The census tabulates only one family per household. Values for number of families are from ACS table B11004_001E.
"tot_hh", -- Universe: Total households. A household is defined as an occupied housing unit. Household type is identified by the presence of relatives and the number of persons living in the household. Family households, with or without children, include married couples and other families—a male or female householder with no spouse present. Non-family households may be a group of unrelated persons or a single person living alone. Number of households values are from ACS table B08201_001E.
"tot_pop_ci", -- Universe: Total civilian noninstitutionalized population. Represents number of persons who do not live in an institutionalized setting. The civilian noninstitutional population refers to people 16 years of age and older residing in the 50 States and the District of Columbia who are not inmates of institutions (penal, mental facilities, homes for the aged), and who are not on active duty in the Armed Forces. Civilian noninstitutionalized population values are from ACS table C18108_001E.
"tot_pop", -- Universe: Total Population. Population values are from ACS table B03002_001E.
"tract", -- Census Tract identifier. Census tracts are small, relatively permanent statistical subdivisions of a county. Value comes from the Unites States Census' TIGERweb REST API Generalized_ACS2018/Tracts_Blocks 2018 web service. The service is located at https://tigerweb.geo.census.gov/tigerwebmain/TIGERweb_restmapservice.html
"state_fip", -- State Federal Information Processing Series (FIPS) code. The only value for this column is 06 (California).
"geoid", -- Geographic identifier assigned by the United States Census Bureau. Data users rely on GEOIDs to join demographic data from census and survey tables to the appropriate geographic object. The geographic objects for this feature set are Census Tracts.
"objectid", -- Internal feature number.
"shape_length",
"c2040_2050", -- Difference between Plan Bay Area 2040 Equity Priority Communities and Plan Bay Area 2050 Equity Priority Communities. Allowed values are: 1 (New Equity Priority Community tract), -1 (No longer a Equity Priority Community tract), and 0 (No difference in Equity Priority Community status).
"epc_2040", -- The Census Tract meets MTC's Plan Bay Area 2040 Equity Priority Community criteria. Allowed values are: 0 (False) and 1 (True).
"hus_re_1_2", -- Severely rent-burdened households make up over regional mean plus .5 standard deviation of total renter-occupied housing units in a given Census Tract. Allowed values are: 0 (False) and 1 (True).
"pct_poc", -- People of Color population as a share of total population. People of Color percent values are the result of ACS table calculation (B03002_001E - B03002_003E)/B03002_001E.
"pop_hus_re", -- Number of persons living in a severely rent-burdened household. Severe rent burden is defined as persons or families paying more than 50 percent of their income for rental housing. Rent burdened population values are from ACS table B25070_010E.
"pop_disabi", -- Number of persons with a disability. For the American Community Survey (ACS) data used in this feature set, disability is defined as the product of interactions among individuals’ bodies; their physical, emotional, and mental health; and the physical and social environment in which they live, work, or play. Disability exists where this interaction results in limitations of activities and restrictions to full participation at school, at work, at home, or in the community. The ACS identifies serious difficulty with four basic areas of functioning – hearing, vision, cognition, and ambulation. The disabled population values are the result of ACS table calculation C18108_001E - (C18108_005E + C18108_009E + C18108_013E).
"pop_poc", -- Number of persons who report as being a race or ethnicity other than non-Hispanic white. People of Color population values are the result of ACS table calculation B03002_001E - B03002_003E.
"tot_pop_po", -- Universe: Total population for whom poverty status is determined. Low-income population values are from ACS table C17002_001E.
"county_fip", -- County Federal Information Processing Standards (FIPS) code. Allowed values are: 001 (Alameda County), 013 (Contra Costa County), 041 (Marin County), 055 (Napa County), 075 (San Francisco, City and County of), 081 (San Mateo County), 085 (Santa Clara County), 095 (Solano County), and 097 (Sonoma County).
"epc_class", -- The Equity Priority Community ranking as determined by standard deviation (SD) from regional mean. Allowed values are: High (0.5 standard deviation from regional mean), Higher (1 standard deviation from regional mean), Highest (1.5 standard deviation from regional mean.), and NA (Not Applicable - Census tract is not an Equity Priority Community).
"epc_2050", -- The Census Tract meets MTC's Plan Bay Area 2050 Equity Priority Community criteria. Allowed values are: 0 (False) and 1 (True).
"epc_2035", -- The Census Tract meets MTC's Plan Bay Area 2035 Equity Priority Community criteria. Allowed values are: 0 (False) and 1 (True).
"the_geom" -- Feature geometry.
FROM
"bayareametro-gov/equityprioritycommunities2020acs2018-8ukr-26na:latest"."equityprioritycommunities2020acs2018"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query bayareametro-gov/equityprioritycommunities2020acs2018-8ukr-26na
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because bayareametro-gov/equityprioritycommunities2020acs2018-8ukr-26na:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone bayareametro-gov/equityprioritycommunities2020acs2018-8ukr-26na
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout bayareametro-gov/equityprioritycommunities2020acs2018-8ukr-26na:latest
This will download all the objects for the latest
tag of bayareametro-gov/equityprioritycommunities2020acs2018-8ukr-26na
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in bayareametro-gov/equityprioritycommunities2020acs2018-8ukr-26na:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered bayareametro-gov/equityprioritycommunities2020acs2018-8ukr-26na:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, bayareametro-gov/equityprioritycommunities2020acs2018-8ukr-26na
is just another Postgres schema.