splitgraph.hooks package
Submodules
splitgraph.hooks.external_objects module
Hooks for registering handlers to upload/download objects from external locations into Splitgraph’s cache.
- class splitgraph.hooks.external_objects.ExternalObjectHandler(params: Dict[Any, Any])
Bases:
object
Framework for dumping objects from the Splitgraph cache to an external location, so that objects can be stored somewhere other than the actual remote engine.
External object handlers must extend this class and be registered in the Splitgraph config.
For an example of how this can be used, see splitgraph.hooks.s3: it's a handler that allows objects to be uploaded to S3 or an S3-compatible host using the Minio API. It's registered in the config as follows:
    [external_handlers]
    S3=splitgraph.hooks.s3.S3ExternalObjectHandler
The protocol and the URLs returned by this handler are stored in splitgraph_meta.external_objects and used to download the objects back into the Splitgraph cache when they are needed.
- download_objects(objects: List[Tuple[str, str]], remote_engine: PsycopgEngine) Sequence[str]
Download objects from the external location into the Splitgraph cache.
- Parameters
objects – List of (object_id, object_url) tuples, where object_url is the location this handler previously uploaded the object to.
remote_engine – An instance of the Engine class that the objects will be registered on.
- Returns
A list of object IDs that have been successfully downloaded.
- upload_objects(objects: List[str], remote_engine: PsycopgEngine) Sequence[Tuple[str, str]]
Upload objects from the Splitgraph cache to an external location.
- Parameters
objects – List of object IDs to upload.
remote_engine – An instance of the Engine class that the objects will be registered on.
- Returns
A list of (object ID, URL) tuples for the successfully uploaded objects.
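As an illustration, here is a minimal sketch of a custom handler that records objects as files under a local directory. The FileExternalObjectHandler class, its "path" parameter and the _export_object/_import_object helpers are hypothetical; a real handler (such as splitgraph.hooks.s3) also has to move the object contents in and out of the Splitgraph cache, which this sketch leaves to those helpers. It assumes the base class __init__ accepts the params dict as documented above:

    from typing import Any, Dict, List, Sequence, Tuple

    from splitgraph.hooks.external_objects import ExternalObjectHandler


    class FileExternalObjectHandler(ExternalObjectHandler):
        """Hypothetical handler that stores objects as files under a local directory."""

        def __init__(self, params: Dict[Any, Any]) -> None:
            super().__init__(params)
            # "path" is a made-up handler parameter for this sketch.
            self.path = params["path"]

        def upload_objects(
            self, objects: List[str], remote_engine
        ) -> Sequence[Tuple[str, str]]:
            uploaded = []
            for object_id in objects:
                url = "%s/%s" % (self.path, object_id)
                # _export_object is a hypothetical helper that would dump the
                # object's contents from the Splitgraph cache to the given URL.
                self._export_object(object_id, url)
                uploaded.append((object_id, url))
            return uploaded

        def download_objects(
            self, objects: List[Tuple[str, str]], remote_engine
        ) -> Sequence[str]:
            downloaded = []
            for object_id, url in objects:
                # _import_object is a hypothetical helper that would load the
                # object's contents from the URL back into the Splitgraph cache.
                self._import_object(object_id, url)
                downloaded.append(object_id)
            return downloaded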
- splitgraph.hooks.external_objects.get_external_object_handler(name: str, handler_params: Dict[Any, Any]) splitgraph.hooks.external_objects.ExternalObjectHandler
Load an external protocol handler by its name, initializing it with optional parameters.
- splitgraph.hooks.external_objects.register_upload_download_handler(name: str, handler_class: Callable[[...], splitgraph.hooks.external_objects.ExternalObjectHandler]) None
Register an external protocol handler. See the docstring for get_external_object_handler for the required signatures of the handler functions.
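A hedged sketch of registering and retrieving a handler at runtime. The "FILE" protocol name and the FileExternalObjectHandler class (with its "path" parameter) come from the hypothetical sketch above:

    from splitgraph.hooks.external_objects import (
        get_external_object_handler,
        register_upload_download_handler,
    )

    # Register the hypothetical handler under the "FILE" protocol name,
    # then retrieve an instance of it initialized with its parameters.
    register_upload_download_handler("FILE", FileExternalObjectHandler)
    handler = get_external_object_handler("FILE", {"path": "/var/splitgraph/objects"})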
splitgraph.hooks.mount_handlers module
Extra wrapper code for mount handlers
- splitgraph.hooks.mount_handlers.mount(mountpoint: str, mount_handler: str, handler_kwargs: Dict[str, Any], overwrite: bool = True, tables: Optional[TableInfo] = None) None
Mounts a foreign database via an FDW (without creating new Splitgraph objects)
- Parameters
mountpoint – Mountpoint to import the new tables into.
mount_handler – The type of the mounted database.
handler_kwargs – Dictionary of options to pass to the mount handler.
overwrite – Delete the foreign server if it already exists. Used by mount_postgres for data pulls.
tables – List of table names to mount, or a mapping of table names to their schemas.
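For instance, mounting a remote Postgres schema through this generic entry point might look like the following sketch. The handler name "postgres_fdw" is an assumption, and the connection details are placeholders; the kwargs are passed through to the handler (here, the keys documented for mount_postgres below):

    from splitgraph.hooks.mount_handlers import mount

    mount(
        "remote_data",
        "postgres_fdw",
        handler_kwargs={
            "server": "db.example.com",
            "port": 5432,
            "username": "reader",
            "password": "secret",
            "dbname": "source_db",
            "remote_schema": "public",
        },
    )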
- splitgraph.hooks.mount_handlers.mount_postgres(mountpoint, **kwargs) None
Mount a Postgres database.
Mounts a schema on a remote Postgres database as a set of foreign tables locally.
- Parameters
mountpoint – Schema to mount the remote into.
server – Database hostname.
port – Port the Postgres server is running on.
username – A read-only user that the database will be accessed as.
password – Password for the read-only user.
dbname – Remote database name.
remote_schema – Remote schema name.
extra_server_args – Dictionary of extra arguments to pass to the foreign server.
tables – Tables to mount (default: all). If a list, IMPORT FOREIGN SCHEMA will be used. If a dictionary, it must have the format {"table_name": {"col_1": "type_1", ...}}.
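A hedged example of calling mount_postgres directly, using the dictionary form of tables to pin down column types. The hostname, credentials and table definition are placeholders:

    from splitgraph.hooks.mount_handlers import mount_postgres

    mount_postgres(
        "remote_data",
        server="db.example.com",
        port=5432,
        username="reader",
        password="secret",
        dbname="source_db",
        remote_schema="public",
        # Dictionary form: mount only "customers" with an explicit schema.
        tables={"customers": {"id": "integer", "name": "text"}},
    )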
splitgraph.hooks.s3 module
Plugin for uploading Splitgraph objects from the cache to an external S3-like object store
- class splitgraph.hooks.s3.S3ExternalObjectHandler(params: Dict[Any, Any])
Bases:
splitgraph.hooks.external_objects.ExternalObjectHandler
Uploads/downloads objects to/from an S3-compatible host using the Minio client.
The handler is “attached” to a given registry which manages issuing pre-signed GET/PUT URLs.
The handler supports a threads parameter specifying the number of threads used to upload the objects.
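For example, the handler can be instantiated through the external handler registry with a custom thread count (a sketch; the value 8 is arbitrary):

    from splitgraph.hooks.external_objects import get_external_object_handler

    # Look up the handler registered under "S3" and use 8 upload/download threads.
    handler = get_external_object_handler("S3", {"threads": 8})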
- download_objects(objects: List[Tuple[str, str]], remote_engine: PsycopgEngine) List[str]
Download objects from Minio.
- Parameters
objects – List of (object ID, object URL) tuples, where the URL is the object ID the object is stored under.
- upload_objects(objects: List[str], remote_engine: PsycopgEngine) List[Tuple[str, str]]
Upload objects to Minio
- Parameters
remote_engine – Remote Engine class
objects – List of object IDs to upload
- Returns
List of tuples with successfully uploaded objects and their URLs.
- splitgraph.hooks.s3.get_object_download_urls(remote_engine, remote_object_ids)
- splitgraph.hooks.s3.get_object_upload_urls(remote_engine, objects)
splitgraph.hooks.s3_server module
S3 registry-side routines called from the Python stored procedure that are aware of the actual S3 access creds and generate pre-signed URLs to upload/download objects.
- splitgraph.hooks.s3_server.delete_objects(client: minio.api.Minio, object_ids: List[str]) None
Delete objects stored in Minio
- Parameters
client – Minio client
object_ids – List of Splitgraph object IDs to delete
- splitgraph.hooks.s3_server.get_object_download_urls(s3_host: str, object_ids: List[str]) List[List[str]]
Return a list of pre-signed URLs that each part of an object can be downloaded from.
- Parameters
s3_host – S3 host that the objects are stored on
object_ids – List of object IDs
- Returns
A list of lists [(object URL, object footer URL, object schema URL)]
- splitgraph.hooks.s3_server.get_object_upload_urls(s3_host: str, object_ids: List[str]) List[List[str]]
Return a list of pre-signed URLs that each part of an object can be uploaded to.
- Parameters
s3_host – S3 host that the objects are stored on
object_ids – List of object IDs
- Returns
A list of lists [(object URL, object footer URL, object schema URL)]
- splitgraph.hooks.s3_server.list_objects(client: minio.api.Minio) List[str]
List objects stored in Minio
- Parameters
client – Minio client
- Returns
List of Splitgraph object IDs
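A short sketch of how these registry-side helpers might be driven; the Minio endpoint and credentials are placeholders:

    from minio import Minio

    from splitgraph.hooks.s3_server import delete_objects, list_objects

    client = Minio(
        "objectstorage:9000",
        access_key="minioclient",
        secret_key="miniopassword",
        secure=False,
    )

    # Enumerate all Splitgraph objects in the bucket, then delete them.
    object_ids = list_objects(client)
    delete_objects(client, object_ids)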
splitgraph.hooks.splitfile_commands module
A framework for custom Splitfile commands. The execution flow is as follows:
When the Splitfile executor finds an unknown command, it looks for an entry in the config file:
    [commands]
    RUN=splitgraph.plugins.Run
The command class must extend PluginCommand and is instantiated at every invocation.
The command’s calc_hash() method is run first. The resulting command context hash is combined with the current image hash to produce the new image hash: if it already exists, the image is simply checked out.
Otherwise (or if calc_hash() is undefined or returns None), execute(), where the actual command should be implemented, is run. If it returns a hash, that hash is used for the new image; if that hash already exists, the existing image is checked out instead. If the command returns None, a random hash is generated for the new image.
- class splitgraph.hooks.splitfile_commands.PluginCommand
Bases:
object
Base class for custom Splitfile commands.
- calc_hash(repository, args)
Calculates the command context hash for this custom command. If either the command context hash or the previous image hash has changed, then the image hash produced by this command will change. Consequently, two commands with the same command context hashes are assumed to have the same effect on any Splitgraph images.
This is supposed to be a lightweight method intended for pre-flight image hash calculations (without performing the actual transformation). If it returns None, the actual transformation is run anyway.
For example, for a command that imports some data from an external URL, this could be the hash of the last-modified timestamp provided by the external data vendor. If the timestamp is unchanged, the data is unchanged, so the actual command won’t be re-executed.
- Parameters
repository – SG Repository object pointed to a schema with the checked out image the command is being run against.
args – Positional arguments to the command
- Returns
Command context hash (a string of 64 hexadecimal digits)
- execute(repository, args)
Execute the custom command against the target schema, optionally returning the new image hash. The contract for the command is as follows (though it is not currently enforced by the runtime):
- Has to use get_engine().run_sql (or run_sql_batch) to interact with the engine.
- Can only write to the schema with the checked-out repository (run_sql runs non-schema-qualified statements against the correct schema).
- Can inspect splitgraph_meta (e.g. to find the current HEAD) for the repository.
- Can’t alter the versioning of the repository.
- Parameters
repository – SG Repository object pointed to a schema with the checked out image the command is being run against.
args – Positional arguments to the command
- Returns
Command context hash (a string of 64 hexadecimal digits). If calc_hash() had previously returned a hash, this hash is ignored. If both this command and calc_hash() return None, the hash is randomly generated.
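To make the contract above concrete, here is a minimal sketch of a custom command. The class name and the marker table are hypothetical, as is registering it under a CREATE_MARKER entry in the [commands] config section; the sketch assumes get_engine() from splitgraph.engine returns the engine whose run_sql should be used:

    import hashlib

    from splitgraph.engine import get_engine
    from splitgraph.hooks.splitfile_commands import PluginCommand


    class CreateMarkerCommand(PluginCommand):
        """Hypothetical command that writes its arguments into a marker table."""

        def calc_hash(self, repository, args):
            # The command's effect depends only on its arguments, so the command
            # context hash is just a digest of them (64 hex digits, as required).
            return hashlib.sha256("\n".join(args).encode("utf-8")).hexdigest()

        def execute(self, repository, args):
            engine = get_engine()
            # Non-schema-qualified statements run against the checked-out schema.
            engine.run_sql("CREATE TABLE IF NOT EXISTS marker (value TEXT)")
            for arg in args:
                engine.run_sql("INSERT INTO marker (value) VALUES (%s)", (arg,))
            # calc_hash() above already produced a deterministic hash, so
            # returning None here is fine.
            return None

Once registered in the config, such a command would be invoked from a Splitfile by its registered name followed by its positional arguments.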
Module contents
Various hooks for extending Splitgraph, including:
- External object handlers (splitgraph.hooks.external_objects) that allow downloading/uploading objects to/from locations other than the remote Splitgraph engine.
- Data sources (splitgraph.hooks.data_sources) that allow adding data to Splitgraph, e.g. by using the Postgres engine’s FDW interface to mount other external databases on the engine.
- Splitfile commands (splitgraph.hooks.splitfile_commands) that define custom data transformation steps compatible with the Splitfile framework.