splitgraph.ingestion.csv package
Submodules
splitgraph.ingestion.csv.common module
- class splitgraph.ingestion.csv.common.CSVOptions(autodetect_header, autodetect_dialect, autodetect_encoding, autodetect_sample_size, schema_inference_rows, delimiter, quotechar, header, encoding, ignore_decode_errors)
Bases: tuple
- autodetect_dialect: bool
Alias for field number 1
- autodetect_encoding: bool
Alias for field number 2
- autodetect_header: bool
Alias for field number 0
- autodetect_sample_size: int
Alias for field number 3
- delimiter: str
Alias for field number 5
- encoding: str
Alias for field number 8
- classmethod from_fdw_options(fdw_options)
- header: bool
Alias for field number 7
- ignore_decode_errors: bool
Alias for field number 9
- quotechar: str
Alias for field number 6
- schema_inference_rows: int
Alias for field number 4
- to_csv_kwargs()
- to_table_options()
Turn this into a dict of table options that can be plugged back into CSVDataSource.
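Since CSVOptions is a NamedTuple, instances are immutable and built from keyword arguments named after the fields above. A minimal sketch (all fields are passed explicitly rather than relying on defaults; feeding the output of to_table_options() back through from_fdw_options() is an assumption based on the two docstrings):

```python
from splitgraph.ingestion.csv.common import CSVOptions

# Build an options tuple explicitly; field names match the class definition above.
options = CSVOptions(
    autodetect_header=False,
    autodetect_dialect=False,
    autodetect_encoding=False,
    autodetect_sample_size=65536,
    schema_inference_rows=100000,
    delimiter=";",
    quotechar='"',
    header=True,
    encoding="utf-8",
    ignore_decode_errors=False,
)

# Serialize into a dict of table options for CSVDataSource...
table_options = options.to_table_options()
# ...and (assumption) rebuild an equivalent tuple from those FDW-style options.
roundtripped = CSVOptions.from_fdw_options(table_options)
```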
- splitgraph.ingestion.csv.common.autodetect_csv(stream: io.RawIOBase, csv_options: splitgraph.ingestion.csv.common.CSVOptions) → splitgraph.ingestion.csv.common.CSVOptions
Autodetect the CSV dialect, encoding, header etc.
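A sketch of running autodetection over a local file (opening with buffering=0 yields an io.RawIOBase stream as the signature expects; assuming CSVOptions provides defaults for the fields not set here):

```python
from splitgraph.ingestion.csv.common import CSVOptions, autodetect_csv

# buffering=0 gives a raw (io.RawIOBase) binary stream, matching the signature above.
with open("data.csv", "rb", buffering=0) as stream:
    detected = autodetect_csv(
        stream,
        CSVOptions(
            autodetect_header=True,
            autodetect_dialect=True,
            autodetect_encoding=True,
        ),  # assumption: the remaining fields have defaults mirroring params_schema
    )

print(detected.delimiter, detected.quotechar, detected.encoding, detected.header)
```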
- splitgraph.ingestion.csv.common.dump_options(options: Dict[str, Any]) → Dict[str, str]
- splitgraph.ingestion.csv.common.get_s3_params(fdw_options: Dict[str, Any]) → Tuple[minio.api.Minio, str, str]
- splitgraph.ingestion.csv.common.load_options(options: Dict[str, str]) → Dict[str, Any]
- splitgraph.ingestion.csv.common.log_to_postgres(*args, **kwargs)
- splitgraph.ingestion.csv.common.make_csv_reader(response: io.IOBase, csv_options: splitgraph.ingestion.csv.common.CSVOptions) → Tuple[splitgraph.ingestion.csv.common.CSVOptions, _csv._reader]
- splitgraph.ingestion.csv.common.pad_csv_row(row: List[str], num_cols: int, row_number: int) → List[str]
Preprocess a CSV file row to make the parser more robust.
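For example, pad_csv_row can be slotted into a manual read loop so that every parsed row is normalized to the expected column count before use (the exact padding/truncation behaviour is an assumption based on the docstring):

```python
import csv

from splitgraph.ingestion.csv.common import pad_csv_row

with open("data.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    for row_number, row in enumerate(reader, start=1):
        # Normalize ragged rows to len(header) fields (assumed behaviour).
        row = pad_csv_row(row, num_cols=len(header), row_number=row_number)
        print(dict(zip(header, row)))
```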
splitgraph.ingestion.csv.fdw module
- class splitgraph.ingestion.csv.fdw.CSVForeignDataWrapper(fdw_options, fdw_columns)
Bases: object
Foreign data wrapper for CSV files stored in S3 buckets or served over HTTP.
- can_sort(sortkeys)
- execute(quals, columns, sortkeys=None)
Main Multicorn entry point.
- explain(quals, columns, sortkeys=None, verbose=False)
- get_rel_size(quals, columns)
- classmethod import_schema(schema, srv_options, options, restriction_type, restricts)
- splitgraph.ingestion.csv.fdw.log_to_postgres(*args, **kwargs)
- splitgraph.ingestion.csv.fdw.report_errors(table_name: str)
Context manager that ignores exceptions and serializes them to JSON using PG’s notice mechanism instead. The data source is meant to load these to report on partial failures (e.g. failed to load one table, but not others).
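A sketch of how a data source might wrap per-table loads in this context manager so that one bad table doesn't abort the rest (load_single_table is a hypothetical helper):

```python
from splitgraph.ingestion.csv.fdw import report_errors

for table_name in ["orders", "customers", "events"]:
    # Exceptions raised inside the block are swallowed and emitted as JSON
    # notices instead of propagating.
    with report_errors(table_name):
        load_single_table(table_name)  # hypothetical per-table ingestion step
```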
Module contents
- class splitgraph.ingestion.csv.CSVDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
Bases: splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource
- commandline_help: str = 'Mount CSV files in S3/HTTP.\n\nIf passed an URL, this will live query a CSV file on an HTTP server. If passed\nS3 access credentials, this will scan a bucket for CSV files, infer their schema\nand make them available to query over SQL. \n\nFor example: \n\n\x08\n```\nsgr mount csv target_schema -o@- <<EOF\n {\n "s3_endpoint": "cdn.mycompany.com:9000",\n "s3_access_key": "ABCDEF",\n "s3_secret_key": "GHIJKL",\n "s3_bucket": "data",\n "s3_object_prefix": "csv_files/current/",\n "autodetect_header": true,\n "autodetect_dialect": true,\n "autodetect_encoding": true\n }\nEOF\n```\n'
- commandline_kwargs_help: str = "s3_access_key:\ns3_secret_key:\nconnection:\nautodetect_header: Detect whether the CSV file has a header automatically.\nautodetect_dialect: Detect the CSV file's dialect (separator, quoting characters etc) automatically.\nautodetect_encoding: Detect the CSV file's encoding automatically.\nautodetect_sample_size: Sample size, in bytes, for encoding/dialect/header detection.\nschema_inference_rows: Number of rows to use for schema inference.\nencoding: Encoding of the CSV file.\nignore_decode_errors: Ignore errors when decoding the file.\nheader: First line of the CSV file is its header.\ndelimiter: Character used to separate fields in the file.\nquotechar: Character used to quote fields."
- credentials_schema: Dict[str, Any] = {'properties': {'s3_access_key': {'type': 'string'}, 's3_secret_key': {'type': 'string'}}, 'type': 'object'}
- classmethod from_commandline(engine, commandline_kwargs) → splitgraph.ingestion.csv.CSVDataSource
Instantiate an FDW data source from commandline arguments.
- classmethod get_description() → str
- get_fdw_name()
- classmethod get_name() → str
- get_raw_url(tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None, expiry: int = 3600) → Dict[str, List[Tuple[str, str]]]
Get a list of public URLs for each table in this data source, e.g. to export the data as CSV. These may be temporary (e.g. pre-signed S3 URLs) but should be accessible without authentication.
Parameters:
tables – A TableInfo object overriding the table params of the source
expiry – The URL should be valid for at least this many seconds
Returns:
Dict of table_name -> list of (mimetype, raw URL)
- get_remote_schema_name() → str
Override this if the FDW supports IMPORT FOREIGN SCHEMA
- get_server_options()
- get_table_options(table_name: str, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None) → Dict[str, str]
- classmethod migrate_params(params: Params) → Params
- params_schema: Dict[str, Any] = {'properties': {'autodetect_dialect': {'default': True, 'description': "Detect the CSV file's dialect (separator, quoting characters etc) automatically", 'type': 'boolean'}, 'autodetect_encoding': {'default': True, 'description': "Detect the CSV file's encoding automatically", 'type': 'boolean'}, 'autodetect_header': {'default': True, 'description': 'Detect whether the CSV file has a header automatically', 'type': 'boolean'}, 'autodetect_sample_size': {'default': 65536, 'description': 'Sample size, in bytes, for encoding/dialect/header detection', 'type': 'integer'}, 'connection': {'oneOf': [{'type': 'object', 'required': ['connection_type', 'url'], 'properties': {'connection_type': {'type': 'string', 'const': 'http'}, 'url': {'type': 'string', 'description': 'HTTP URL to the CSV file'}}}, {'type': 'object', 'required': ['connection_type', 's3_endpoint', 's3_bucket'], 'properties': {'connection_type': {'type': 'string', 'const': 's3'}, 's3_endpoint': {'type': 'string', 'description': 'S3 endpoint (including port if required)'}, 's3_region': {'type': 'string', 'description': 'Region of the S3 bucket'}, 's3_secure': {'type': 'boolean', 'description': 'Whether to use HTTPS for S3 access'}, 's3_bucket': {'type': 'string', 'description': 'Bucket the object is in'}, 's3_object': {'type': 'string', 'description': 'Limit the import to a single object'}, 's3_object_prefix': {'type': 'string', 'description': 'Prefix for object in S3 bucket'}}}], 'type': 'object'}, 'delimiter': {'default': ',', 'description': 'Character used to separate fields in the file', 'type': 'string'}, 'encoding': {'default': 'utf-8', 'description': 'Encoding of the CSV file', 'type': 'string'}, 'header': {'default': True, 'description': 'First line of the CSV file is its header', 'type': 'boolean'}, 'ignore_decode_errors': {'default': False, 'description': 'Ignore errors when decoding the file', 'type': 'boolean'}, 'quotechar': {'default': '"', 'description': 'Character used to quote fields', 'type': 'string'}, 'schema_inference_rows': {'default': 100000, 'description': 'Number of rows to use for schema inference', 'type': 'integer'}}, 'type': 'object'}
- supports_load = True
- supports_mount = True
- supports_sync = False
- table_params_schema: Dict[str, Any] = {'properties': {'autodetect_dialect': {'default': True, 'description': "Detect the CSV file's dialect (separator, quoting characters etc) automatically", 'type': 'boolean'}, 'autodetect_encoding': {'default': True, 'description': "Detect the CSV file's encoding automatically", 'type': 'boolean'}, 'autodetect_header': {'default': True, 'description': 'Detect whether the CSV file has a header automatically', 'type': 'boolean'}, 'autodetect_sample_size': {'default': 65536, 'description': 'Sample size, in bytes, for encoding/dialect/header detection', 'type': 'integer'}, 'delimiter': {'default': ',', 'description': 'Character used to separate fields in the file', 'type': 'string'}, 'encoding': {'default': 'utf-8', 'description': 'Encoding of the CSV file', 'type': 'string'}, 'header': {'default': True, 'description': 'First line of the CSV file is its header', 'type': 'boolean'}, 'ignore_decode_errors': {'default': False, 'description': 'Ignore errors when decoding the file', 'type': 'boolean'}, 'quotechar': {'default': '"', 'description': 'Character used to quote fields', 'type': 'string'}, 's3_object': {'description': 'S3 object of the CSV file', 'type': 'string'}, 'schema_inference_rows': {'default': 100000, 'description': 'Number of rows to use for schema inference', 'type': 'integer'}, 'url': {'description': 'HTTP URL to the CSV file', 'type': 'string'}}, 'type': 'object'}
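A sketch of constructing the data source directly with a credentials/params payload shaped after credentials_schema and params_schema above, then mounting it. Engine retrieval via splitgraph.engine.get_engine and the mount() call inherited from the FDW data source base class are assumptions; the endpoint, bucket and keys are placeholders:

```python
from splitgraph.engine import get_engine
from splitgraph.ingestion.csv import CSVDataSource

source = CSVDataSource(
    engine=get_engine(),
    credentials={"s3_access_key": "ABCDEF", "s3_secret_key": "GHIJKL"},
    params={
        "connection": {
            "connection_type": "s3",
            "s3_endpoint": "cdn.mycompany.com:9000",
            "s3_bucket": "data",
            "s3_object_prefix": "csv_files/current/",
        },
        "autodetect_header": True,
        "autodetect_dialect": True,
        "autodetect_encoding": True,
    },
)

# Mount all discovered CSV files as foreign tables in target_schema
# (assumed to be provided by the FDW data source base class).
source.mount("target_schema")

# Pre-signed URLs for each table, valid for at least an hour.
urls = source.get_raw_url(expiry=3600)
```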
- class splitgraph.ingestion.csv.CSVIngestionAdapter
Bases: splitgraph.ingestion.common.IngestionAdapter
- static create_ingestion_table(data, engine, schema: str, table: str, **kwargs)
- static data_to_new_table(data, engine: PostgresEngine, schema: str, table: str, no_header: bool = True, **kwargs)
- static query_to_data(engine, query: str, schema: Optional[str] = None, **kwargs)
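A heavily hedged sketch of driving the two documented static methods by hand: create the target table from a CSV buffer, then load its rows. The two-step sequence, the header handling and the engine retrieval are all assumptions about how the IngestionAdapter base class uses these hooks:

```python
import io

from splitgraph.engine import get_engine
from splitgraph.ingestion.csv import CSVIngestionAdapter

engine = get_engine()
buf = io.StringIO("id,name\n1,alice\n2,bob\n")

# Assumption: this infers a schema from the buffer and creates staging.users.
CSVIngestionAdapter.create_ingestion_table(buf, engine, schema="staging", table="users")

# Assumption: no_header=False tells the loader the buffer starts with a header row.
buf.seek(0)
CSVIngestionAdapter.data_to_new_table(buf, engine, schema="staging", table="users", no_header=False)
```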
- splitgraph.ingestion.csv.copy_csv_buffer(data, engine: PsycopgEngine, schema: str, table: str, no_header: bool = False, **kwargs)
Copy CSV data from a buffer into a given schema/table
- splitgraph.ingestion.csv.query_to_csv(engine: PsycopgEngine, query, buffer, schema: Optional[str] = None)
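These two helpers can be paired to round-trip data through CSV: dump a query's results to a buffer, then copy that buffer into an existing table. Engine retrieval, the schema/table names and the requirement that the target table already exists are assumptions:

```python
import io

from splitgraph.engine import get_engine
from splitgraph.ingestion.csv import copy_csv_buffer, query_to_csv

engine = get_engine()
buf = io.StringIO()

# Export: write the query's result set into the buffer as CSV.
query_to_csv(engine, "SELECT id, name FROM users", buf, schema="source_schema")

# Import: copy the buffer into a pre-existing table in another schema.
# no_header=False assumes the exported CSV starts with a header row.
buf.seek(0)
copy_csv_buffer(buf, engine, "target_schema", "users_copy", no_header=False)
```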