splitgraph.ingestion package
Subpackages
Submodules
splitgraph.ingestion.common module
- class splitgraph.ingestion.common.IngestionAdapter
Bases: object
- abstract create_ingestion_table(data, engine, schema: str, table: str, **kwargs)
- abstract data_to_new_table(data, engine, schema: str, table: str, no_header: bool = True, **kwargs)
- abstract query_to_data(engine, query: str, schema: Optional[str] = None, **kwargs)
- to_data(query: str, image: Optional[Union[splitgraph.core.image.Image, str]] = None, repository: Optional[splitgraph.core.repository.Repository] = None, use_lq: bool = False, **kwargs)
- to_table(data, repository: splitgraph.core.repository.Repository, table: str, if_exists: str = 'patch', schema_check: bool = True, no_header: bool = False, **kwargs)
- splitgraph.ingestion.common.add_timestamp_tags(repository: splitgraph.core.repository.Repository, image_hash: str)
- splitgraph.ingestion.common.build_commandline_help(json_schema)
- splitgraph.ingestion.common.dedupe_sg_schema(schema_spec: List[splitgraph.core.types.TableColumn], prefix_len: int = 59) → List[splitgraph.core.types.TableColumn]
Some foreign schemas have columns that are longer than 63 characters where the first 63 characters are the same between several columns (e.g. odn.data.socrata.com). This routine renames columns in a schema to make sure this can’t happen (by giving duplicates a number suffix).
- splitgraph.ingestion.common.generate_column_names(schema_spec: List[splitgraph.core.types.TableColumn], prefix: str = 'col_') → List[splitgraph.core.types.TableColumn]
Replace empty column names with autogenerated ones
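A minimal sketch of these two helpers used together, assuming TableColumn accepts the (ordinal, name, type, is_pk) shape described under schema_compatible below; the column names are made up for illustration:

```python
from splitgraph.core.types import TableColumn
from splitgraph.ingestion.common import dedupe_sg_schema, generate_column_names

# Two columns that share the same over-63-character prefix, plus one unnamed column.
long_prefix = "measurement_" + "x" * 70
schema = [
    TableColumn(1, long_prefix + "_total", "integer", False),
    TableColumn(2, long_prefix + "_count", "integer", False),
    TableColumn(3, "", "text", False),
]

schema = generate_column_names(schema)  # empty names get autogenerated 'col_...' names
schema = dedupe_sg_schema(schema)       # truncated duplicate prefixes get a numeric suffix

for column in schema:
    print(column)
```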
- splitgraph.ingestion.common.merge_tables(engine: splitgraph.engine.postgres.engine.PsycopgEngine, source_schema: str, source_table: str, source_schema_spec: List[splitgraph.core.types.TableColumn], target_schema: str, target_table: str, target_schema_spec: List[splitgraph.core.types.TableColumn])
- splitgraph.ingestion.common.schema_compatible(source_schema: List[splitgraph.core.types.TableColumn], target_schema: List[splitgraph.core.types.TableColumn]) → bool
Quick check to see if a dataframe with target_schema can be written into source_schema. There are some implicit type conversions that SQLAlchemy/Pandas can do, so we don’t want to immediately fail if the column types aren’t exactly the same (e.g. bigint vs numeric). Most errors should be caught by PG itself.
Schema is a list of (ordinal, name, type, is_pk).
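For example, a check along the following lines would be expected to pass even though the value column’s type differs (numeric vs bigint), since such implicit conversions are tolerated and left for PostgreSQL to vet:

```python
from splitgraph.core.types import TableColumn
from splitgraph.ingestion.common import schema_compatible

# Schema of the existing table vs the schema of an incoming dataframe.
table_schema = [TableColumn(1, "id", "integer", True), TableColumn(2, "value", "numeric", False)]
df_schema = [TableColumn(1, "id", "integer", True), TableColumn(2, "value", "bigint", False)]

# Expected to be True: the columns line up and the numeric/bigint difference
# is one of the implicit conversions the docstring above allows for.
print(schema_compatible(table_schema, df_schema))
```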
splitgraph.ingestion.inference module
- splitgraph.ingestion.inference.infer_sg_schema(sample: Sequence[List[str]], override_types: Optional[Dict[str, str]] = None, primary_keys: Optional[List[str]] = None)
- splitgraph.ingestion.inference.parse_bigint(integer: str)
- splitgraph.ingestion.inference.parse_boolean(boolean: str)
- splitgraph.ingestion.inference.parse_int(integer: str)
- splitgraph.ingestion.inference.parse_json(json_s: str)
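A sketch of schema inference on a small CSV-like sample, assuming the first row of the sample is the header row (the data itself is made up):

```python
from splitgraph.ingestion.inference import infer_sg_schema

# Assumption: the first row is the header, subsequent rows are data.
sample = [
    ["id", "price", "metadata"],
    ["1", "3.14", '{"source": "csv"}'],
    ["2", "2.71", '{"source": "api"}'],
]

schema = infer_sg_schema(sample, override_types={"price": "numeric"}, primary_keys=["id"])
print(schema)  # a list of TableColumn entries with the inferred PostgreSQL types
```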
splitgraph.ingestion.pandas module
Routines that ingest/export CSV files to/from Splitgraph images using Pandas
- class splitgraph.ingestion.pandas.PandasIngestionAdapter
Bases: splitgraph.ingestion.common.IngestionAdapter
- static create_ingestion_table(data, engine, schema: str, table: str, **kwargs)
- static data_to_new_table(data, engine: PsycopgEngine, schema: str, table: str, no_header: bool = True, **kwargs)
- static query_to_data(engine, query: str, schema: Optional[str] = None, **kwargs)
- splitgraph.ingestion.pandas.df_to_table(df: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], repository: splitgraph.core.repository.Repository, table: str, if_exists: str = 'patch', schema_check: bool = True) → None
Writes a Pandas DataFrame to a checked-out Splitgraph table. Doesn’t create a new image.
- Parameters
df – Pandas DataFrame to insert.
repository – Splitgraph Repository object. Must be checked out.
table – Table name.
if_exists – Behaviour if the table already exists: ‘patch’ means that primary keys that already exist in the table will be updated and ones that don’t will be inserted. ‘replace’ means that the table will be dropped and recreated.
schema_check – If False, skips checking that the dataframe is compatible with the target schema.
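A sketch of writing a DataFrame into a checked-out repository, based on the signature above; the repository and table names are hypothetical, and the repository is assumed to already exist and be checked out:

```python
import pandas as pd
from splitgraph.core.repository import Repository
from splitgraph.ingestion.pandas import df_to_table

repo = Repository("myuser", "weather")  # hypothetical repository, assumed checked out

# The DataFrame index ('city') is intended to serve as the table's primary key (assumption).
df = pd.DataFrame({"city": ["London", "Oslo"], "temp_c": [11.5, 4.2]}).set_index("city")

# 'patch' upserts by primary key; if_exists='replace' would drop and recreate the table.
df_to_table(df, repository=repo, table="readings", if_exists="patch")
```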
- splitgraph.ingestion.pandas.df_to_table_fast(engine: PsycopgEngine, df: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], target_schema: str, target_table: str)
- splitgraph.ingestion.pandas.sql_to_df(sql: str, image: Optional[Union[splitgraph.core.image.Image, str]] = None, repository: Optional[splitgraph.core.repository.Repository] = None, use_lq: bool = False, **kwargs) → pandas.core.frame.DataFrame
Executes an SQL query against a Splitgraph image, returning the result.
Extra **kwargs are passed to Pandas’ read_sql_query.
- Parameters
sql – SQL query to execute.
image – Image object, image hash/tag (str) or None (use the currently checked out image).
repository – Repository the image belongs to. Must be set if image is a hash/tag or None.
use_lq – Whether to use layered querying or check out the image if it’s not checked out.
- Returns
A Pandas dataframe.
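For instance, querying an image referenced by tag via layered querying, based on the signature above (repository and table names hypothetical):

```python
from splitgraph.core.repository import Repository
from splitgraph.ingestion.pandas import sql_to_df

repo = Repository("myuser", "weather")  # hypothetical repository

# Query the image tagged 'latest' through layered querying, without checking it out.
df = sql_to_df(
    "SELECT city, temp_c FROM readings",
    image="latest",
    repository=repo,
    use_lq=True,
)
print(df.head())
```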