Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the housing_database_project_level_files_inactive
table in this repository, by referencing it like:
"cityofnewyork-us/housing-database-project-level-files-inactive-br6q-ssj3:latest"."housing_database_project_level_files_inactive"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"classainit", -- Number of units that initially existed in the building at the time of the job application, as reported by the applicant. This field is edited by DCP to only count Class A units, which are units in houses or apartment buildings intended for long-term residential use (greater than 30 days), and typically do not require the use of shared kitchens and bathrooms. The definition of Class A and other unit types is available here: https://www1.nyc.gov/assets/buildings/pdf/MultipleDwellingLaw.pdf
"pl_firm07", -- A value of 1 means that some portion of the tax lot falls within the 1% annual chance floodplain as determined by FEMA’s 2007 Flood Insurance Rate Map. Note that buildings on the tax lot may or may not be in the portion of the tax lot that is within the 1% annual chance floodplain.
"pl_pfirm15", -- A value of 1 means that some portion of the tax lot falls within the 1% annual chance floodplain as determined by FEMA’s 2015 Preliminary Flood Insurance Rate Map. Note that buildings on the tax lot may or may not be in the portion of the tax lot that is within the 1% annual chance floodplain.
"policepcnt", -- NYC Police Precinct code.
"job_desc", -- The general description of the work being applied for. This field is free text, and is filled out by the applicant.
"residflag", -- This field is used to identify jobs in buildings containing residential uses. A value of "residential" indicates that the job affects residential units in some way through new construction, alteration, or demolition. Only those jobs with a value of "residential" are included in the housing database. Manual research was conducted at DCP to help ensure that all work on buildings with residences receive this flag, though some records may remain misclassified.
"schcommnty", -- NYC Community School District code.
"schelmntry", -- NYC Elementary School Zone code.
"schmiddle", -- NYC Middle School Zone code.
"schsubdist", -- NYC School Subdistrict code.
"zoningdst3", -- The tertiary zoning district of the tax lot per the applicant at time of application.
"zoningdst2", -- The secondary zoning district of the tax lot per the applicant at time of application.
"zoningdst1", -- The primary zoning district of the tax lot per the applicant at time of application. For more information see http://www1.nyc.gov/site/planning/zoning/about-zoning.page.
"specldst1", -- The primary special zoning district of the tax lot per the applicant at time of application. Other zoning designations may appear in this field, such as industrial business zones (IBZ), mandatory inclusionary housing (MIH) areas, or other zoning designations. This field is provided by the applicant, and is likely inconsistent.
"units_co", -- Number of dwelling units provided on the temporary or final certificates of occupancy. This unit count may include Class A units in addition to Class B units and hotel units.
"the_geom", -- Geometry field used for mapping.
"specldst2", -- The secondary special zoning district of the tax lot per the applicant at time of application.
"compltyear", -- Year the job was completed. For new buildings and alterations, this is defined as the year of the first certificate of occupancy issuance. For demolitions, this is the year that the demolition was permitted (reached status Q).
"councildst", -- NYC City Council District code.
"datecomplt", -- DCP's best estimate of completion date for all jobs. For new buildings and alterations, date complete is equal to the date of the earliest certificate of occupancy. For demolitions, date complete is equal to status Q (permit issued), since demolitions do not receive certificates of occupancy. Blank indicates no certificate of occupancy has been issued. Typically, a building can be considered complete at this stage. Large buildings with many units may have units receiving certificates of occupancy over a longer period of time.
"datefiled", -- Date of job status A (pre-filing application). This is the first step in the process for all job applications. The job application # is assigned at this status. This occurs when the applicant submits any part of the application (even a single form), in person or electronically.
"datelstupd", -- The date of the last update to the DOB record for the job filing.
"permityear", -- Year the job was permitted. For all job types, this is defined as the status Q year.
"datepermit", -- Date of job status Q (first partial permit issuance). This is when construction work may begin. This field should be used for identifying the number of permits approved in a given year.
"bldg_class", -- Building Class code. A code used by Department of Finance (DOF) and Department of Buildings (DOB) describing the major use of a structure.
"dcpedited", -- List of fields that were edited by DCP. This only lists fields where original DOB source data was overwritten by DCP, and doesn't include fields where DCP recodes data as part of the standard methodology.
"firebattln", -- NYC Fire Battalion code.
"bin", -- Building Identification Number (BIN), supplied by Geosupport, that identifies each unique building in the city.
":@computed_region_f5dn_yrer", -- This column was automatically created in order to record in what polygon from the dataset 'Community Districts' (f5dn-yrer) the point in column 'the_geom' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
"version", -- Version of the Developments Database.
"cdtaname20", -- 2020 Community District Tabulation Area (NTA) descriptive name.
"firecmpany", -- NYC Fire Company code.
"floorsinit", -- The existing number of stories/floors in the building, as reported by the applicant.
"floorsprop", -- The number of stories/floors in the building after the work is done, as reported by the applicant.
"geomsource", -- Source of the geographic coordinates for the record.
"cenblock20", -- 2020 Census Block FIPS code.
"nonresflag", -- This field is used to identify jobs in buildings containing non-residential uses. A value of "non-residential" indicates that the job affects some use type beyond residential, including commercial, industrial, or community facility uses. Mixed-use buildings will have values in both the Resid_Flag and Nonresid_Flag, since they contain both residential and nonresidential uses, but having flags in both of those columns does not necessarily mean that it is a mixed-use development since the flags do not distinguish between the initial and proposed uses in the building.
"centract20", -- 2020 Census Tract FIPS code.
"hotelinit", -- Number of hotel units that initially existed in the building at the time of the job application, as determined through DCP research. Note that hotel units are a subtype of Class B units which are typically occupied as for-profit businesses for short-term (less than 30 days), full-service lodging.
"hotelprop", -- Number of hotel units proposed in the job application after the proposed work has been completed, as determined through DCP research.
"bctcb2020", -- The Borough - 2020 Census Tract - Census Block code.
"classaprop", -- Number of units proposed in the job application after the proposed work has been completed, as reported by the applicant and edited by DCP to count only Class A units.
"boro", -- The NYC borough code where the proposed work will take place.
"cdta2020", -- 2020 Community District Tabulation Area (NTA) code.
"classanet", -- Net change in Class A unit count between the number of units existing at the time of application and the number of units proposed.
"bct2020", -- The Borough - 2020 Census Tract code.
"bbl", -- Borough-Block-Lot tax ID number of the parcel where the proposed work will take place.
"job_status", -- DCP recode of DOB's status label. This describes the status of the job at the date of the data vintage. For example, a job marked as ""3. Permitted"" was at that status as of June 30, 2020 if using version 20Q2 of the DCP Housing Database. More details on each DOB status is available here: https://www1.nyc.gov/assets/buildings/pdf/bisjobstatus.pdf Jobs typically move through status A through X over time as they reach certain approval milestones: 1. Filed: job application is at status A - G at the time of publication. Application submitted, but review is not yet in progress. 2. Plan Examination: application is at status H - P. Plan examination is in progress, but not yet approved. 3. Permitted: application is at status Q and R and may begin construction. 4. Partial Complete: application at status U and X, and CO issued for NB or A1 job type, and the CO is a Temporary CO AND less than 80% of the units are completed for a building with 20 or more units. 5. Complete: For new buildings and alterations, application is at status U and X, or a CO has been issued. For demolitions, the application is at status X. DCP has decided to mark demolitions as complete when they reach status X, but list the completion date as equal to status Q because this is likely when the building must be vacated, and it appears that many buildings are physically demolished some time before receiving sign off (status X). 9. Withdrawn: application is at status 3. The application has been withdrawn by the applicant.
"commntydst", -- NYC Community District code.
"job_type", -- DOB's type category for the job application. More information is available here. The following types are included in this database: New Building (NB): an application to build a new structure. “NB” cannot be selected if any existing building elements are to remain—for example a part of an old foundation, a portion of a façade that will be incorporated into the construction, etc. Alteration Type I (A1): a major alteration that will change the use, egress, or occupancy of the building. Demolition (DM): an application to fully or partially demolish an existing building. Note that many demolition permits are only for partial demolitions and for garages (these are also captured).
"addressst", -- The street name for the building where the proposed work will take place.
"latitude", -- Latitude in WGS84 / SRID:4326
"addressnum", -- The house number for the building where the proposed work will take place.
":@computed_region_yeji_bk3q", -- This column was automatically created in order to record in what polygon from the dataset 'Borough Boundaries' (yeji-bk3q) the point in column 'the_geom' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
"firedivsn", -- NYC Fire Division code.
"enlargemnt", -- This indicates if the work to be done under the application will result in a horizontal and/or vertical enlargement, as reported by the applicant. Values include: Horizontal, Vertical, Horizontal and Vertical, [blank].
"landmark", -- Indicates that the building has been designated as a landmark building by the Landmarks Preservation Commission.
":@computed_region_92fq_4b7q", -- This column was automatically created in order to record in what polygon from the dataset 'City Council Districts' (92fq-4b7q) the point in column 'the_geom' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
"longitude", -- Longitude in WGS84 / SRID:4326
"nta2020", -- 2020 Neighborhood Tabulation Area (NTA) code.
"ntaname20", -- 2020 Neighborhood Tabulation Area (NTA) descriptive name.
"job_number", -- The DOB job application number assigned when the applicant begins the application. This is the unique identifier for the application submitted to the Department of Buildings (DOB). It may contain several work types, and more work types may be added as the application review and the work continues. It is a 9-digit number where the first digit indicates the borough where the building is located.
"occ_init", -- Description of the existing occupancy type at the time of the job application. This indicates what a site was used for before the proposed job. It is a more descriptive recode of the occupancy code that the applicant submitted to DOB as the initial occupancy type of the building. Note that the applicant may only choose one occupancy code, even in a building containing multiple uses, so this code is only able to describe one of many possible uses.
"occ_prop", -- Description of the proposed occupancy type at the time of the job application. This indicates what a site will be used for after the proposed job is complete. It is a more descriptive recode of the occupancy code that the applicant submitted to DOB as the proposed occupancy type of the building. Note that the applicant may only choose one occupancy code, even in a building containing multiple uses, so this code is only able to describe one of many possible uses.
"otherbinit", -- Number of Class B units (excluding hotel units) that initially existed in building at the time of the job application, as determined through DCP research. Other Class B units include all dwellings that are not Class A units or hotels, and may include single room occupancy units, dormitories, certain kinds of supportive housing and assisted living, homeless shelters, convents and monasteries, among many other forms of temporary lodging or lodging with communal kitchens or bathrooms. The definition of Class B and other unit types is available here: https://www1.nyc.gov/assets/buildings/pdf/MultipleDwellingLaw.pdf
":@computed_region_sbqj_enih", -- This column was automatically created in order to record in what polygon from the dataset 'Police Precincts' (sbqj-enih) the point in column 'the_geom' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
"otherbprop", -- Number of Class B units (excluding hotel units) proposed in the job application after the proposed work has been completed, as determined through DCP research.
"ownership" -- This indicates whether the property is government owned, the ownership structure, and non-profit status, as reported by the applicant and recoded by DCP.
FROM
"cityofnewyork-us/housing-database-project-level-files-inactive-br6q-ssj3:latest"."housing_database_project_level_files_inactive"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query cityofnewyork-us/housing-database-project-level-files-inactive-br6q-ssj3
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because cityofnewyork-us/housing-database-project-level-files-inactive-br6q-ssj3:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone cityofnewyork-us/housing-database-project-level-files-inactive-br6q-ssj3
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout cityofnewyork-us/housing-database-project-level-files-inactive-br6q-ssj3:latest
This will download all the objects for the latest
tag of cityofnewyork-us/housing-database-project-level-files-inactive-br6q-ssj3
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in cityofnewyork-us/housing-database-project-level-files-inactive-br6q-ssj3:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered cityofnewyork-us/housing-database-project-level-files-inactive-br6q-ssj3:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, cityofnewyork-us/housing-database-project-level-files-inactive-br6q-ssj3
is just another Postgres schema.