Splitgraph has been acquired by EDB! Read the blog post.

SQL

SQL {command}

Runs a (potentially arbitrary) SQL statement. There is some preprocessing and validation done on this SQL statement:

  • Only basic DDL (CREATE/DROP/ALTER TABLE) and DML (UPDATE/INSERT/DELETE) and function calls are allowed
  • The command is executed with the OUTPUT schema being the default search_path, so non-schema-qualified tables will reference the current image.
  • In schema-qualified tables, the schema is treated as a reference to a Splitgraph image, for example, "namespace/repository:latest".table_name. This gets temporarily mounted in the engine at execution time as a layered checkout with the effect that queries in SQL can perform joins or queries on other Splitgraph images. This is recorded in image provenance as normal.
  • There's a blacklist of table names that can't be used in Splitfiles. These names reference Postgres system tables and are always accessible by any SQL statement, making it possible for outside state to be leaked into the image.

The image hash produced by this command is a combination of the current OUTPUT hash and the hash of the SQL statement, with any image references rewritten to have image hashes, not tags.

The SQL statement gets reformatted before hashing, meaning that minor formatting changes to the SQL will produce the same image hash.

Note that non-deterministic SQL statements, for example SELECT random(), will produce the same hash given the same inputs.