Metabase
NOTE: This is an sgr
-specific article. To learn how to use Splitgraph with
Metabase, see the
corresponding Splitgraph page.
You can connect a Metabase instance to a sgr
engine, allowing you to visualize datasets that were cloned and checked out on
the engine. Metabase also works with
layered querying, letting it query huge
datasets by only downloading a required fraction of the data.
Make sure to use Metabase version v0.36.8 or above. Earlier versions have an
issue with the Metabase
Explore Data view: it doesn't support schemata with slashes in them that sgr
uses by default to check data out into.
Start Metabase
Here is a sample Docker Compose service definition for Metabase running against
a sgr
engine:
version: "3"
services:
engine:
image: splitgraph/engine:${DOCKER_TAG-stable-postgis}
ports:
- 5432:5432
environment:
- POSTGRES_USER=sgr
- POSTGRES_PASSWORD=supersecure
- POSTGRES_DB=splitgraph
- SG_LOGLEVEL=INFO
- SG_CONFIG_FILE=/.sgconfig
expose:
- 5432
volumes:
- ${HOME}/.splitgraph/.sgconfig:/.sgconfig
metabase:
image: metabase/metabase
ports:
- 3000:3000
You can also use a sgr
engine managed by
sgr engine
: you will have to connect
it to the Docker network set up by Metabase. For example:
$ docker network connect metabase_default splitgraph_engine_default
Configure connection
Go to localhost:3000
and set up your Metabase account.
Use engine
as the database hostname if you're using the Compose file or
splitgraph_engine_default
(or the Docker container name) if you're using
sgr engine
.
Browse Data issue
As discussed, the "Browse Data" functionality currently doesn't work on sgr
,
since Splitgraph schemas contain slashes in them.
Write SQL
Click on "Write SQL" and select the sgr
database that you set up.
Sample datasets and queries
This sections contains some sample datasets that are hosted on Splitgraph and are ready for visualization in Metabase.
You can use layered querying to query
these datasets, which lazily downloads required fragments on the fly. For
maximum performance, you can check the image out into a real PostgreSQL table
and create indexes on it (use sgr checkout
instead of
sgr checkout --layered
)
Tweets of Congress
Tweets of Congress image on Splitgraph
$ sgr clone splitgraph/congress_tweets
$ sgr checkout --layered splitgraph/congress_tweets:latest
SELECT DATE(time) AS date,
COUNT(1) AS tweets
FROM "splitgraph/congress_tweets".tweets
WHERE text ILIKE '%coronavirus%'
AND time > '2020-01-01'
GROUP BY date;
WITH hourly_counts AS (
SELECT DATE(time) AS date,
EXTRACT(HOUR FROM time) AS hour,
COUNT(1) AS tweets
FROM "splitgraph/congress_tweets".tweets
GROUP BY date, hour
) SELECT hour, AVG(tweets)
FROM hourly_counts
GROUP BY hour;
Domestic US Flights
Domestic US Flights image on Splitgraph
$ sgr clone splitgraph/domestic_us_flights
$ sgr checkout --layered splitgraph/domestic_us_flights:latest
SELECT
origin_airport,
SUM(passengers) AS total_passengers
FROM "splitgraph/domestic_us_flights".flights
WHERE fly_month BETWEEN '2008-01-01' AND '2009-01-01'
GROUP BY origin_airport
ORDER BY total_passengers DESC;
2016 US Election votes
2016 US Election image on Splitgraph
$ sgr clone splitgraph/2016_election
$ sgr checkout --layered splitgraph/2016_election:latest
SELECT
state_postal,
SUM (CASE WHEN candidate_normalized = 'trump' THEN votes ELSE 0 END) / SUM(votes) * 100 AS trump_vote_percentage
FROM "splitgraph/2016_election".precinct_results
GROUP BY state_postal;
GeoNames
$ sgr clone splitgraph/geonames
$ sgr checkout --layered splitgraph/geonames:latest
SELECT
name, latitude, longitude, elevation
FROM "splitgraph/geonames".all_countries
WHERE feature_code = 'PPL' AND elevation IS NOT NULL
ORDER BY elevation DESC
LIMIT 100;
SELECT
name, latitude, longitude, country_code
FROM "splitgraph/geonames".all_countries
WHERE feature_code = 'PPL' AND name = 'Cambridge'
London Wards
London Wards image on Splitgraph
This dataset is PostGIS-enabled. Whilst Metabase doesn't support PostGIS columns directly, you can convert them to latitude and longitude and show them using Metabase's Pin visualization.
$ sgr engine upgrade --image splitgraph/engine:stable-postgis
$ sgr sql "CREATE EXTENSION IF NOT EXISTS postgis"
$ sgr clone splitgraph/london_wards
$ sgr checkout --layered splitgraph/london_wards:latest
WITH ward_centres AS (
SELECT name, ST_Centroid(ST_Transform(ST_SetSRID(geom, 27700), 4326)) AS centroid
FROM "splitgraph/london_wards".city_merged_2018
)
SELECT name, ST_Y(centroid) AS lat, ST_X(centroid) AS lon
FROM ward_centres;