Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the last_minute_contributions_to_candidates_and
table in this repository, by referencing it like:
"wa-gov/last-minute-contributions-to-candidates-and-mppc-zjn9:latest"."last_minute_contributions_to_candidates_and"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"candidacy_id", -- The unique id assigned to a candidate or political committee. The filer id is consistent across election years with the exception that an individual running for a second office in the same election year will receive a second filer id. There is no correlation between the two filer ids. For a candidate and single-election-year committee such as a ballot committee, the combination of filer_id and election_year uniquely identifies a campaign.
"election_year", -- The election year for which the report was filed.
"committee_id", -- The unique identifier of a committee. For a continuing committee, this id will be the same for all the years that the committee is registered. Single year committees and candidate committees will have a unique id for each year even though the candidate or committee organization might be the same across years. Surplus accounts will have a single committee id across all years.
"amended_by", -- This field only applies to records which have been superseded by an amendment. The value is the report number of the newer version of the report.
"initiative_type",
"for_or_against", -- Ballot initiative committees are formed to either support or oppose an initiative. This field represents whether a committee “supports” (for) or “opposes” (against) a ballot initiative.
"jurisdiction", -- The political jurisdiction associated with the office of a candidate.
"party", -- The political party as declared by the candidate or committee on their form C1 registration. Contains only "Major parties" as recognized by Washington State law.
"recipient_zip", -- The US zip code of the candidate or political committee receiving the contribution. Contributions from outside the United States may contain foreign postal codes in this field.
"contributor_city", -- The city of the individual or organization making the contribution.
"filer_id", -- Unique identifier for the filing entity within this data set
"contributor_address", -- The street address of the individual or organization making the contribution.
"amount", -- Amount of contribution reported
"contribution_date", -- Date the contribution was received or made.
"aggregate", -- The total contributions received by the recipient during the special reporting period.
"amends", -- This field only applies to records which amend a prior report. The value is the report number of the previous version of the report that is superseded by this record.
"user_data", -- JSON representation of the report as it was filed to the PDC.
"recipient_city", -- The city of the candidate or political committee receiving the contribution.
"position", -- The position associated with an office. This field typically applies to judicial and local office that have multiple positions or seats. This field does not apply to political committees.
"description", -- The reported description of the transaction. This field does not apply to cash contributions.
"jurisdiction_type", -- The type of jurisdiction this office is: Statewide, Local, etc.
"cash_or_in_kind", -- What kind of contribution this is, if known.
"primary_general", -- Indicates the whether the contribution
"reported_by", -- Indicates whether the contribution was reported by the contributor or the recipient of the contribution.
"conduit", -- In the case of an earmarked contribution this is the name of the committee that is a conduit for the contribution. When this is present the recipient fields indicate the candidate that ultimately benefited from the contribution, and the conduit indicates the name of the committee that received the funds.
"contributor_name", -- The name of the individual or organization making the contribution as reported. The names appearing here have not been normalized and the same entity may be represented by different names in the dataset.
"filer_name", -- Name of filing entity submitting the report
"recipient_state", -- The state of the candidate or political committee receiving the contribution.
"office", -- The office sought by the candidate. Does not apply to political committees.
"recipient_address", -- The street address of the candidate or political committee receiving the contribution.
"contributor_state", -- The state of the individual or organization making the contribution.
"receipt_date", -- Date report was filed to the PDC.
"pdc_report_url", -- A link to a printable version of the original report as it was filed to the PDC.
"jurisdiction_county", -- The county associated with the jurisdiction of a candidate. Multi-county jurisdictions as reported as the primary county. This field will be empty for political committees and when a candidate jurisdiction is statewide.
"ballot_number", -- If the committee is a Statewide Ballot Initiative Committee a ballot number will appear once a ballot number is assigned by the Secretary of State. Local Ballot Initiatives will not have a ballot number. This field will contain a number only if the Secretary of State issues a number.
"filer_type", -- Indicates if this record was reported by a candidate, political committee, lobbyist client or lobbyist firm
"id", -- PDC internal identifier that corresponds to a single contribution or correction record. When combined with the origin value, this number uniquely identifies a single row.
"legislative_district_by", -- The Washington State legislative district. This field only applies to candidates where the office is "state senator" or "state representative."
"recipient_name", -- The name candidate or political committee receiving the contribution as reported. The names appearing here have not been normalized and the same entity may be represented by different names in the dataset.
"contributor_zip" -- The US zip code of the individual or organization making the contribution. Contributions from outside the United States may contain foreign postal codes in this field.
FROM
"wa-gov/last-minute-contributions-to-candidates-and-mppc-zjn9:latest"."last_minute_contributions_to_candidates_and"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query wa-gov/last-minute-contributions-to-candidates-and-mppc-zjn9
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because wa-gov/last-minute-contributions-to-candidates-and-mppc-zjn9:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone wa-gov/last-minute-contributions-to-candidates-and-mppc-zjn9
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout wa-gov/last-minute-contributions-to-candidates-and-mppc-zjn9:latest
This will download all the objects for the latest
tag of wa-gov/last-minute-contributions-to-candidates-and-mppc-zjn9
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in wa-gov/last-minute-contributions-to-candidates-and-mppc-zjn9:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered wa-gov/last-minute-contributions-to-candidates-and-mppc-zjn9:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, wa-gov/last-minute-contributions-to-candidates-and-mppc-zjn9
is just another Postgres schema.