Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the green_jobs_green_new_york_gjgny_residential_loan
table in this repository, by referencing it like:
"ny-gov/green-jobs-green-new-york-gjgny-residential-loan-9evn-drxk:latest"."green_jobs_green_new_york_gjgny_residential_loan"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"months_since_origination", -- The number of months since the loan was originated
"contract_date", -- The date the loan proceeds were paid to the contractor/installer upon completion of work, which triggers commencement of the loan
"setup_date", -- The date the loan was set up in the loan servicer's system
"type_of_bankruptcy", -- Identifies if the borrower has had a Chapter 7 or Chapter 13 bankruptcy since the loan has been issued. Blank field indicates no data available
"cancel_reason", -- The loan servicer’s loan cancellation reason on the loan. Blank field indicates no data available
"original_loan_amount", -- The original principal amount borrowed by the borrower(s)
"account_code", -- The loan servicer’s account status on the loan
"contractor_name", -- Name of company implementing the measures financed. Contractor Name is not available for small business and multifamily loans, and field will be blank
"property_county", -- The county of the address where the clean energy improvements were installed
"original_term", -- The original term of the loan in months
"total_late_charge_paid", -- The total late charge paid on the loan. Zero values indicate no late charges
"payment_amount", -- The monthly loan payment amount (principal and interest) due from the borrower each due date
"last_payment_amount", -- The last amount the borrower paid on the loan
"last_payment_date", -- The last date the borrower paid on the loan. Blank field indicates no data available
"credit_score_name_2", -- The FICO credit score of the co-borrower upon credit application (if any). Blank field indicates no data available
"credit_score_current_high", -- The higher of the credit scores of borrower and co-borrower (if any). FICO scores are not obtained for successor borrowers. Blank field indicates no data available
"interest_rate", -- The annual interest rate on the loan
"reporting_period", -- The time period covered by the dataset.
"loan_type", -- Either Smart Energy Loan for loans repaid through direct payment to Loan Servicer by statement billing or automated ACH payment; or On-Bill Recovery Loan for loans repaid through consumer's electric utility bill
"utility", -- The name of the electric/gas utility for any On-Bill Recovery Loan. This field applicable only for On-Bill Recovery Loans. Blank field indicates no data available
"original_maturity_date", -- The original maturity date of the loan
"current_maturity_date", -- The current maturity date of the loan. Maturity Date may change for On-bill Recovery loans only due to a change in the utility account holder
"credit_score", -- The FICO credit score of the co-borrower upon credit application (if any). Blank field indicates no data available
"debt_to_income", -- The ratio of the borrower/co-borrower's annual fixed debt obligations (as reported on the credit report) as a percentage of reported and verified income. DTI is not obtained for successor borrowers. Blank field indicates no data available
"first_payment_date", -- The date the first loan installment payment is due
"next_payment_due_date", -- The next date a loan installment payment is due
"current_balance", -- The current principal balance due from the borrower. Zero values indicate loan has been repaid in full
"unapplied_cash", -- Funds that have been received, but haven’t yet been used to reduce the principal balance of the loan (generally due to prepayments which do not equal the next full monthly installment due)
"delinquent_amount", -- The total principal and interest amount past due. Negative values indicate the amount the customer has pre-paid. Positive values indicate the amount past due
"total_interest_paid", -- The total interest paid on the loan. Negative values indicate the amount the customer has pre-paid. Positive values indicate the amount past due
"underwriting", -- Tier 1: Minimum FICO: 640 (680 if self-employed for 2 years+; 720 if self-employed < 2 years) Maximum DTI Ratio: up to 50% Bankruptcy: No bankruptcy, foreclosure or repossession within last 7 years Judgements: No combined outstanding collections, judgements, charge-offs or tax liens > $2,000 Tier 2: Minimum FICO: 540 Maximum DTI Ratio: up to 80% for FICO 680+, up to 75% for FICO 600-679; Up to 70% for FICO 540-599; Up to 100% for Assisted Home Performance with ENERGY STAR® incentives; up to 80% for Affordable Soar or Pellet Stove incentives Bankruptcy: No bankruptcy, foreclosure or repossession within last 2 years Judgements: No combined outstanding collections, judgements, charge-offs or tax liens > $2,000
"pledged", -- Indicates if the loan has been pledged for payments/security on bonds issued
"georeference", -- Open Data/Socrata-generated geocoding information.
"payments_remaining", -- The number of installment payments remaining on the loan to pay off the remaining principal balance
"days_delinquent", -- The number of days the borrower is delinquent from the next payment due date (if the number is negative, this represents payments made in advance of the next payment due date)
"installation_city", -- The city of the address where the clean energy improvements were installed
"installation_zip", -- The zip code of the address where the clean energy improvements were installed
"successor_number", -- S-0 values indicate an original On-Bill Recovery loan that was assumed by (transferred to) a successor borrower and utility account holder at the installation address. S-1, S-2, S-3, S-4, S-5, and S-6 values indicate an On-Bill Recovery loan has been assumed by a successor property owner from the original/previous borrower upon sale or transfer of the property and the number (1, 2, 3, 4, 5, or 6) indicates the number of times the loan was transferred to another borrower at the installation address. This field is applicable only for On-Bill Recovery Loans, if the loan is a Smart Energy Loan or an On-Bill Recovery Loan with no successor, field is blank
"account_code_date", -- The date the account code was effective on the loan
"purpose", -- Indicates the type of measures financed with the loan: Energy Efficiency (EE); Solar (PV); EE & PV = Energy Efficiency and Photovoltaic; EE & RHNY = Energy Efficiency and Renewable Heat NY; Renewable Heat NY (RHNY); Solar Thermal
"payments_made" -- The number of loan installment payments (principal and interest) paid on the loan to date, including any pre-payments
FROM
"ny-gov/green-jobs-green-new-york-gjgny-residential-loan-9evn-drxk:latest"."green_jobs_green_new_york_gjgny_residential_loan"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query ny-gov/green-jobs-green-new-york-gjgny-residential-loan-9evn-drxk
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because ny-gov/green-jobs-green-new-york-gjgny-residential-loan-9evn-drxk:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone ny-gov/green-jobs-green-new-york-gjgny-residential-loan-9evn-drxk
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout ny-gov/green-jobs-green-new-york-gjgny-residential-loan-9evn-drxk:latest
This will download all the objects for the latest
tag of ny-gov/green-jobs-green-new-york-gjgny-residential-loan-9evn-drxk
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in ny-gov/green-jobs-green-new-york-gjgny-residential-loan-9evn-drxk:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered ny-gov/green-jobs-green-new-york-gjgny-residential-loan-9evn-drxk:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, ny-gov/green-jobs-green-new-york-gjgny-residential-loan-9evn-drxk
is just another Postgres schema.