splitgraph.hooks.data_source package
Submodules
splitgraph.hooks.data_source.base module
- class splitgraph.hooks.data_source.base.DataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
Bases:
abc.ABC
- credentials_schema: Dict[str, Any] = {'type': 'object'}
- abstract classmethod get_description() str
- classmethod get_icon() Optional[bytes]
- abstract classmethod get_name() str
- get_raw_url(tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None, expiry: int = 3600) Dict[str, List[Tuple[str, str]]]
Get a list of public URLs for each table in this data source, e.g. to export the data as CSV. These may be temporary (e.g. pre-signed S3 URLs) but should be accessible without authentication.
- Parameters
tables – A TableInfo object overriding the table params of the source
expiry – The URL should be valid for at least this many seconds
- Returns
Dict of table_name -> list of (mimetype, raw URL)
- abstract introspect() IntrospectionResult
- params_schema: Dict[str, Any] = {'type': 'object'}
- supports_load = False
- supports_mount = False
- supports_sync = False
- table_params_schema: Dict[str, Any] = {'type': 'object'}
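The class above follows a capability-flag pattern: subclasses flip `supports_load`/`supports_mount`/`supports_sync` and implement the abstract hooks such as `get_name` and `introspect`. A minimal standalone sketch of the same pattern (the `FakeCSVSource` class and its canned schema are hypothetical illustrations, not part of splitgraph):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

class DataSourceSketch(ABC):
    """Standalone sketch of the DataSource capability-flag pattern."""
    params_schema: Dict[str, Any] = {"type": "object"}
    supports_load = False
    supports_mount = False
    supports_sync = False

    @classmethod
    @abstractmethod
    def get_name(cls) -> str: ...

    @abstractmethod
    def introspect(self) -> Dict[str, Any]:
        """Return a map of table name -> (columns, table params)."""

class FakeCSVSource(DataSourceSketch):
    """Hypothetical concrete source used only to illustrate the pattern."""
    supports_load = True

    @classmethod
    def get_name(cls) -> str:
        return "fake_csv"

    def introspect(self) -> Dict[str, Any]:
        # A real source would inspect the remote; we return a canned schema.
        return {"events": ([("id", "integer"), ("payload", "text")], {})}
```

Consumers can then dispatch on the capability flags (e.g. only offer `sgr mount` for sources with `supports_mount`) without inspecting the concrete class.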
- class splitgraph.hooks.data_source.base.LoadableDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
Bases:
splitgraph.hooks.data_source.base.DataSource, abc.ABC
- load(repository: splitgraph.core.repository.Repository, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None) str
- supports_load = True
- class splitgraph.hooks.data_source.base.MountableDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
Bases:
splitgraph.hooks.data_source.base.DataSource, abc.ABC
- abstract mount(schema: str, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None, overwrite: bool = True) Optional[List[splitgraph.core.types.MountError]]
Instantiate the data source as foreign tables in a schema
- supports_mount = True
- class splitgraph.hooks.data_source.base.SyncableDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
Bases:
splitgraph.hooks.data_source.base.LoadableDataSource, splitgraph.hooks.data_source.base.DataSource, abc.ABC
- supports_load = True
- supports_sync = True
- sync(repository: splitgraph.core.repository.Repository, image_hash: Optional[str], tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None) str
- class splitgraph.hooks.data_source.base.TransformingDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None, image_mounter: Optional[splitgraph.core.image_mounting.ImageMounter] = None)
Bases:
splitgraph.hooks.data_source.base.DataSource, abc.ABC
Data source that runs transformations between Splitgraph images. Takes in an extra parameter, an ImageMounter instance to manage temporary image checkouts.
- abstract get_required_images() List[Tuple[str, str, str]]
Get images required by this data source.
- Returns
List of tuples (namespace, repository, hash_or_tag)
- mount_required_images() Generator[Dict[Tuple[str, str, str], str], None, None]
Mount all images required by this data source into temporary schemas. On exit from this context manager, unmounts them.
- Returns
Map of (namespace, repository, hash_or_tag) -> schema where the image is mounted.
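The mount/unmount lifecycle above is the standard generator-based context-manager pattern. A standalone sketch of it (the `mount_images` helper and `tmp_mount_*` schema names are hypothetical, not splitgraph's actual implementation):

```python
from contextlib import contextmanager
from typing import Dict, Iterable, Iterator, Tuple

ImageRef = Tuple[str, str, str]  # (namespace, repository, hash_or_tag)

@contextmanager
def mount_images(images: Iterable[ImageRef]) -> Iterator[Dict[ImageRef, str]]:
    """Sketch: mount each image into a temporary schema, unmount on exit."""
    mounted: Dict[ImageRef, str] = {}
    try:
        for i, ref in enumerate(images):
            schema = f"tmp_mount_{i}"  # placeholder schema name
            # ... real code would check the image out into `schema` here ...
            mounted[ref] = schema
        yield mounted
    finally:
        for schema in mounted.values():
            pass  # real code would DROP SCHEMA ... CASCADE here
```

The `finally` block guarantees cleanup even if the transformation inside the `with` body raises.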
- splitgraph.hooks.data_source.base.get_ingestion_state(repository: splitgraph.core.repository.Repository, image_hash: Optional[str]) Optional[SyncState]
- splitgraph.hooks.data_source.base.getrandbits(k) → x. Generates an int with k random bits.
- splitgraph.hooks.data_source.base.prepare_new_image(repository: splitgraph.core.repository.Repository, hash_or_tag: Optional[str], comment: str = 'Singer tap ingestion') Tuple[Optional[splitgraph.core.image.Image], str]
splitgraph.hooks.data_source.fdw module
- class splitgraph.hooks.data_source.fdw.ElasticSearchDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
Bases:
splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource
- commandline_help: str = 'Mount an ElasticSearch instance.\n\nMount a set of tables proxying to a remote ElasticSearch index.\n\nThis uses a fork of postgres-elasticsearch-fdw behind the scenes. You can add a column\n`query` to your table and set it as `query_column` to pass advanced ES queries and aggregations.\nFor example:\n\n\x08\n```\n$ sgr mount elasticsearch target_schema -c elasticsearch:9200 -o@- <<EOF\n {\n "tables": {\n "table_1": {\n "schema": {\n "id": "text",\n "@timestamp": "timestamp",\n "query": "text",\n "col_1": "text",\n "col_2": "boolean"\n },\n "options": {\n "index": "index-pattern*",\n "rowid_column": "id",\n "query_column": "query"\n }\n }\n }\n }\nEOF\n\x08\n```\n'
- credentials_schema: Dict[str, Any] = {'properties': {'password': {'type': 'string'}, 'username': {'type': 'string'}}, 'type': 'object'}
- classmethod get_description() str
- get_fdw_name()
- classmethod get_name() str
- get_server_options()
- params_schema: Dict[str, Any] = {'properties': {'host': {'type': 'string'}, 'port': {'type': 'integer'}}, 'required': ['host', 'port'], 'type': 'object'}
- table_params_schema: Dict[str, Any] = {'properties': {'index': {'description': 'ES index name or pattern to use, for example, "events-*"', 'type': 'string'}, 'query_column': {'description': 'Name of the column to use to pass queries in', 'type': 'string'}, 'score_column': {'description': 'Name of the column with the document score', 'type': 'string'}, 'scroll_duration': {'description': 'How long to hold the scroll context open for, default 10m', 'type': 'string'}, 'scroll_size': {'description': 'Fetch size, default 1000', 'type': 'integer'}, 'type': {'description': 'Pre-ES7 doc_type, not required in ES7 or later', 'type': 'string'}}, 'required': ['index'], 'type': 'object'}
- class splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
Bases:
splitgraph.hooks.data_source.base.MountableDataSource, splitgraph.hooks.data_source.base.LoadableDataSource, abc.ABC
- commandline_help: str = ''
- commandline_kwargs_help: str = ''
- credentials_schema: Dict[str, Any] = {'type': 'object'}
- classmethod from_commandline(engine, commandline_kwargs) splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource
Instantiate an FDW data source from commandline arguments.
- abstract get_fdw_name()
- get_remote_schema_name() str
Override this if the FDW supports IMPORT FOREIGN SCHEMA
- get_server_options() Mapping[str, str]
- get_table_options(table_name: str, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None) Dict[str, str]
- get_table_schema(table_name: str, table_schema: List[splitgraph.core.types.TableColumn]) List[splitgraph.core.types.TableColumn]
- get_user_options() Mapping[str, str]
- introspect() IntrospectionResult
- mount(schema: str, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None, overwrite: bool = True) Optional[List[splitgraph.core.types.MountError]]
Instantiate the data source as foreign tables in a schema
- params_schema: Dict[str, Any] = {'type': 'object'}
- preview(tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]]) PreviewResult
- supports_load = True
- supports_mount = True
- table_params_schema: Dict[str, Any] = {'type': 'object'}
- class splitgraph.hooks.data_source.fdw.MongoDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
Bases:
splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource
- commandline_help: str = 'Mount a Mongo database.\n\nMounts one or more collections on a remote Mongo database as a set of foreign tables locally.'
- commandline_kwargs_help: str = 'tables: A dictionary of form\n```\n{\n "table_name": {\n "schema": {"col1": "type1"...},\n "options": {"database": <dbname>, "collection": <collection>}\n }\n}\n```\n'
- credentials_schema: Dict[str, Any] = {'properties': {'password': {'type': 'string'}, 'username': {'type': 'string'}}, 'required': ['username', 'password'], 'type': 'object'}
- classmethod get_description() str
- get_fdw_name()
- classmethod get_name() str
- get_server_options()
- get_table_schema(table_name, table_schema)
- get_user_options()
- params_schema: Dict[str, Any] = {'properties': {'host': {'type': 'string'}, 'port': {'type': 'integer'}}, 'required': ['host', 'port'], 'type': 'object'}
- table_params_schema: Dict[str, Any] = {'properties': {'collection': {'type': 'string'}, 'database': {'type': 'string'}}, 'required': ['database', 'collection'], 'type': 'object'}
- class splitgraph.hooks.data_source.fdw.MySQLDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
Bases:
splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource
- commandline_help: str = 'Mount a MySQL database.\n\nMounts a schema on a remote MySQL database as a set of foreign tables locally.'
- commandline_kwargs_help: str = 'dbname: Remote MySQL database name (required)\ntables: Tables to mount (default all). If a list, then will use IMPORT FOREIGN SCHEMA.\nIf a dictionary, must have the format\n {"table_name": {"schema": {"col_1": "type_1", ...},\n "options": {[get passed to CREATE FOREIGN TABLE]}}}.\n '
- credentials_schema: Dict[str, Any] = {'properties': {'password': {'type': 'string'}, 'username': {'type': 'string'}}, 'required': ['username', 'password'], 'type': 'object'}
- classmethod get_description() str
- get_fdw_name()
- classmethod get_name() str
- get_remote_schema_name() str
Override this if the FDW supports IMPORT FOREIGN SCHEMA
- get_server_options()
- get_table_options(table_name: str, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
- get_user_options()
- params_schema: Dict[str, Any] = {'properties': {'dbname': {'type': 'string'}, 'host': {'type': 'string'}, 'port': {'type': 'integer'}}, 'required': ['host', 'port', 'dbname'], 'type': 'object'}
- class splitgraph.hooks.data_source.fdw.PostgreSQLDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
Bases:
splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource
- commandline_help: str = 'Mount a Postgres database.\n\nMounts a schema on a remote Postgres database as a set of foreign tables locally.'
- commandline_kwargs_help: str = 'dbname: Database name (required)\nremote_schema: Remote schema name (required)\nextra_server_args: Dictionary of extra arguments to pass to the foreign server\ntables: Tables to mount (default all). If a list, then will use IMPORT FOREIGN SCHEMA.\nIf a dictionary, must have the format\n {"table_name": {"schema": {"col_1": "type_1", ...},\n "options": {[get passed to CREATE FOREIGN TABLE]}}}.\n '
- credentials_schema: Dict[str, Any] = {'properties': {'password': {'type': 'string'}, 'username': {'type': 'string'}}, 'required': ['username', 'password'], 'type': 'object'}
- classmethod get_description() str
- get_fdw_name()
- classmethod get_name() str
- get_remote_schema_name() str
Override this if the FDW supports IMPORT FOREIGN SCHEMA
- get_server_options()
- get_table_options(table_name: str, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
- get_user_options()
- params_schema: Dict[str, Any] = {'properties': {'dbname': {'description': 'Database name', 'type': 'string'}, 'host': {'description': 'Remote hostname', 'type': 'string'}, 'port': {'description': 'Port', 'type': 'integer'}, 'remote_schema': {'description': 'Remote schema name', 'type': 'string'}}, 'required': ['host', 'port', 'dbname', 'remote_schema'], 'type': 'object'}
- table_params_schema: Dict[str, Any] = {'type': 'object'}
- splitgraph.hooks.data_source.fdw.create_foreign_table(schema: str, server: str, table_name: str, schema_spec: List[splitgraph.core.types.TableColumn], extra_options: Optional[Dict[str, str]] = None)
- splitgraph.hooks.data_source.fdw.import_foreign_schema(engine: PsycopgEngine, mountpoint: str, remote_schema: str, server_id: str, tables: List[str], options: Optional[Dict[str, str]] = None) List[splitgraph.core.types.MountError]
- splitgraph.hooks.data_source.fdw.init_fdw(engine: PsycopgEngine, server_id: str, wrapper: str, server_options: Optional[Mapping[str, Optional[str]]] = None, user_options: Optional[Mapping[str, str]] = None, role: Optional[str] = None, overwrite: bool = True) None
Sets up a foreign data server on the engine.
- Parameters
engine – PostgresEngine
server_id – Name to call the foreign server, must be unique. Will be deleted if exists.
wrapper – Name of the foreign data wrapper (must be installed as an extension on the engine)
server_options – Dictionary of FDW options
user_options – Dictionary of user options
role – The name of the role for which the user mapping is created; defaults to public.
overwrite – If the server already exists, delete and recreate it.
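The parameters above correspond roughly to a `DROP SERVER` / `CREATE SERVER` / `CREATE USER MAPPING` sequence in Postgres. The helper below is a hedged sketch of what that SQL looks like, not the exact statements init_fdw emits (and it does no quoting of option values, so it is for illustration only):

```python
from typing import List, Mapping, Optional

def fdw_setup_sql(server_id: str, wrapper: str,
                  server_options: Optional[Mapping[str, str]] = None,
                  user_options: Optional[Mapping[str, str]] = None,
                  role: str = "public", overwrite: bool = True) -> str:
    """Sketch: SQL that a foreign-server setup roughly corresponds to."""
    def _opts(opts: Optional[Mapping[str, str]]) -> str:
        if not opts:
            return ""
        pairs = ", ".join("%s '%s'" % (k, v) for k, v in opts.items())
        return " OPTIONS (%s)" % pairs

    stmts: List[str] = []
    if overwrite:
        stmts.append('DROP SERVER IF EXISTS "%s" CASCADE;' % server_id)
    stmts.append('CREATE SERVER "%s" FOREIGN DATA WRAPPER %s%s;'
                 % (server_id, wrapper, _opts(server_options)))
    # The user mapping ties a local Postgres role to remote credentials.
    stmts.append('CREATE USER MAPPING IF NOT EXISTS FOR %s SERVER "%s"%s;'
                 % (role, server_id, _opts(user_options)))
    return "\n".join(stmts)
```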
Module contents
- splitgraph.hooks.data_source.get_data_source(data_source: str) Type[splitgraph.hooks.data_source.base.DataSource]
Returns a class for a given data source
- splitgraph.hooks.data_source.get_data_sources() List[str]
Returns the names of all registered data sources.
- splitgraph.hooks.data_source.merge_jsonschema(left: Dict[str, Any], right: Dict[str, Any]) Dict[str, Any]
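merge_jsonschema combines two object schemas, e.g. a base credentials_schema with a subclass's additions. A plausible standalone re-implementation, shown as a sketch rather than the library's actual code:

```python
from typing import Any, Dict

def merge_jsonschema_sketch(left: Dict[str, Any], right: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two JSONSchema 'object' schemas: union properties and required keys."""
    result = dict(left)
    result["properties"] = {**left.get("properties", {}),
                            **right.get("properties", {})}
    # Preserve order while deduplicating the union of required fields.
    required = list(left.get("required", []))
    required += [k for k in right.get("required", []) if k not in required]
    if required:
        result["required"] = required
    return result
```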
- splitgraph.hooks.data_source.register_data_source(name: str, data_source_class: Type[splitgraph.hooks.data_source.base.DataSource]) None
Registers a data source class under a given name.
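The three module-level functions above implement a simple name-to-class registry. A minimal sketch of that pattern (standalone; the splitgraph registry may differ in details such as loading sources lazily from config):

```python
from typing import Dict, List

_REGISTRY: Dict[str, type] = {}

def register_data_source(name: str, data_source_class: type) -> None:
    """Register a data source class under a name."""
    _REGISTRY[name] = data_source_class

def get_data_source(name: str) -> type:
    """Look up a registered class, failing loudly on unknown names."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError("Data source %s not found" % name) from None

def get_data_sources() -> List[str]:
    return list(_REGISTRY)
```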