splitgraph.core.indexing package
Submodules
splitgraph.core.indexing.bloom module
Bloom filtering on fragments for equality queries.
- splitgraph.core.indexing.bloom.describe(index_tuple: Tuple[int, str]) str
Returns a pretty-printed summary of the bloom filter
- Parameters
index_tuple – Tuple of (k, base64-encoded fingerprint) returned by generate_bloom_index
- Returns
String
- splitgraph.core.indexing.bloom.filter_bloom_index(engine: PsycopgEngine, object_ids: List[str], quals: Any) List[str]
Runs a bloom filter on given qualifiers using the given objects’ previously-generated fingerprints.
- Parameters
engine – Object engine
object_ids – Object IDs
quals – List of qualifiers
- Returns
List of object IDs that might match the qualifiers in quals (including IDs that don’t have a bloom index).
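The pruning rule described above (drop an object only when its index proves that nothing can match, and keep objects that have no index) can be sketched as follows. This is a minimal illustration, not the library's implementation: `filter_objects` and `clause_might_match` are hypothetical names, the qualifiers are assumed to be in CNF as `(column, operator, value)` tuples, and plain Python sets stand in for the bloom fingerprints.

```python
from typing import Any, Dict, List, Sequence, Set, Tuple

Qual = Tuple[str, str, Any]  # (column, operator, value); assumed shape


def clause_might_match(clause: Sequence[Qual], index: Dict[str, Set[Any]]) -> bool:
    """A clause is an OR of qualifiers: it might match if any qualifier might."""
    for column, op, value in clause:
        # Only equality on an indexed column can be ruled out.
        if op != "=" or column not in index:
            return True
        if value in index[column]:  # stand-in for a bloom filter lookup
            return True
    return False


def filter_objects(
    object_ids: List[str],
    indexes: Dict[str, Dict[str, Set[Any]]],
    quals: Sequence[Sequence[Qual]],
) -> List[str]:
    """Keep objects that have no index or that might satisfy every clause."""
    result = []
    for object_id in object_ids:
        index = indexes.get(object_id)
        if index is None or all(clause_might_match(c, index) for c in quals):
            result.append(object_id)
    return result


indexes = {"o1": {"id": {1, 2}}, "o2": {"id": {3}}}
survivors = filter_objects(["o1", "o2", "o3"], indexes, [[("id", "=", 3)]])
```

Here `o1` is discarded because its index rules out `id = 3`, while `o3` is kept because it has no index at all. A real bloom filter can also return false positives, so it may keep objects that a set-based check would discard.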
- splitgraph.core.indexing.bloom.generate_bloom_index(engine: PsycopgEngine, object_id: str, changeset: Optional[Dict[Tuple[str, ...], Tuple[bool, Dict[str, Any], Dict[str, Any]]]], column: str, probability: Optional[float] = None, size: Optional[int] = None) Tuple[int, str]
Generates a bloom filter signature for a given column and a given fragment. A bloom filter can tell whether an item is definitely not in a set or possibly in it.
The tradeoff is between the probability of a false positive (item said to be in the set when it actually isn’t) and the size of the filter.
Bloom filters also have an extra parameter, k: the number of bits in the signature that each item sets. Its optimal value follows from the number of distinct items and either the false-positive probability or the filter size, so the user doesn’t pass it explicitly.
- Parameters
engine – Object engine the fragment is cached in.
object_id – Fragment ID
changeset – Optional. If specified, the old column values are included in the index.
column – Column name to generate the index on.
probability – Probability of a false positive. Either this or the size of the filter must be specified, but not both.
size – Size of the filter, in bytes.
- Returns
Tuple of (k, base64-encoded fingerprint) to be inserted into the index.
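To make the probability/size tradeoff and the role of k concrete, here is a minimal standalone sketch. It uses the textbook bloom filter formulas (m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2) and is not the library's code: `bloom_params`, `make_fingerprint` and `might_contain` are hypothetical helpers, and the SHA-256-based hashing scheme is an assumption chosen for simplicity.

```python
import base64
import hashlib
import math
from typing import Any, Iterable, Iterator, Optional, Tuple


def bloom_params(n_items: int, probability: Optional[float] = None,
                 size: Optional[int] = None) -> Tuple[int, int]:
    """Return (size in bytes, k) from exactly one of probability or size."""
    if (probability is None) == (size is None):
        raise ValueError("Specify exactly one of probability or size")
    n_items = max(n_items, 1)
    if size is None:
        # Optimal number of bits for the requested false-positive rate.
        bits = math.ceil(-n_items * math.log(probability) / math.log(2) ** 2)
        size = max(1, math.ceil(bits / 8))
    # Optimal number of bits each item sets for this size and item count.
    k = max(1, round(size * 8 / n_items * math.log(2)))
    return size, k


def _bit_positions(value: Any, k: int, n_bits: int) -> Iterator[int]:
    """Derive k bit positions for a value (hashing scheme is an assumption)."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % n_bits


def make_fingerprint(values: Iterable[Any], k: int, size: int) -> Tuple[int, str]:
    """Build a (k, base64-encoded fingerprint) tuple for the given values."""
    bits = bytearray(size)
    for value in values:
        for pos in _bit_positions(value, k, size * 8):
            bits[pos // 8] |= 1 << (pos % 8)
    return k, base64.b64encode(bytes(bits)).decode("ascii")


def might_contain(index_tuple: Tuple[int, str], value: Any) -> bool:
    """False means definitely absent; True means possibly present."""
    k, encoded = index_tuple
    bits = base64.b64decode(encoded)
    return all(bits[pos // 8] & (1 << (pos % 8))
               for pos in _bit_positions(value, k, len(bits) * 8))


values = ["a", "b", "c"]
size, k = bloom_params(len(values), probability=0.01)
index_tuple = make_fingerprint(values, k, size)
```

Every value that was added is always reported as possibly present. A value that was never added is usually rejected, but can be accepted with roughly the requested false-positive probability.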
splitgraph.core.indexing.range module
- splitgraph.core.indexing.range.extract_min_max_pks(engine: PsycopgEngine, fragments: List[str], table_pks: List[str], table_pk_types: List[str]) Any
Extract minimum/maximum PK values for given fragments.
- Parameters
engine – Engine the objects live on
fragments – IDs of objects
table_pks – List of columns forming the table primary key
table_pk_types – List of types for table PK columns
- Returns
List of min/max primary key for every object.
- splitgraph.core.indexing.range.filter_range_index(metadata_engine: PsycopgEngine, object_ids: List[str], quals: Any, column_types: Dict[str, str]) List[str]
- splitgraph.core.indexing.range.generate_range_index(object_engine: PsycopgEngine, object_id: str, table_schema: TableSchema, changeset: Optional[Dict[Tuple[str, ...], Tuple[bool, Dict[str, Any], Dict[str, Any]]]], columns: Optional[List[str]] = None) Dict[str, Tuple[splitgraph.core.indexing.range.T, splitgraph.core.indexing.range.T]]
Calculate the minimum/maximum values of every column in the object (including deleted values).
- Parameters
object_engine – Engine the object is located on
object_id – ID of the object.
table_schema – Schema of the table
changeset – Changeset (old values will be included in the index)
columns – Columns to run the index on (default all)
- Returns
Dictionary of {column: (min, max)}
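As an illustration of what such an index contains, the sketch below computes per-column minimum and maximum values over the current rows and any old (replaced or deleted) rows. It works on plain dictionaries in memory instead of an engine. `range_index` is a hypothetical helper, and passing old rows as a separate list is an assumption made for this example, not the changeset format used by the library.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Any]


def range_index(
    rows: Iterable[Row],
    old_rows: Iterable[Row] = (),
    columns: Optional[List[str]] = None,
) -> Dict[str, Tuple[Any, Any]]:
    """Return {column: (min, max)} over current and old row values."""
    all_rows = list(rows) + list(old_rows)
    if columns is None:
        # Default to every column seen in any row.
        columns = sorted({column for row in all_rows for column in row})
    result = {}
    for column in columns:
        # NULLs (None) do not take part in the min/max calculation.
        values = [row[column] for row in all_rows if row.get(column) is not None]
        if values:
            result[column] = (min(values), max(values))
    return result


rows = [{"id": 2, "name": "b"}, {"id": 5, "name": "a"}]
old_rows = [{"id": 1, "name": "z"}]
index = range_index(rows, old_rows)
```

Including the old values widens the ranges (here `id` starts at 1, not 2), so a query on a value that the fragment deletes or overwrites still selects that fragment.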
- splitgraph.core.indexing.range.quals_to_sql(quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]], column_types: Dict[str, str]) Tuple[psycopg2.sql.Composable, Tuple]
Convert a list of qualifiers in CNF to a fragment of a Postgres query.
- Parameters
quals – Qualifiers in CNF
column_types – Dictionary of column names and their types
- Returns
SQL Composable object and a tuple of arguments to be mogrified into it.
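The CNF convention (an outer list of clauses joined with AND, each clause a list of qualifiers joined with OR) can be illustrated with a simplified converter. This is a sketch under assumptions: `cnf_to_sql` is a hypothetical function that returns a plain string with `%s` placeholders, not a `psycopg2.sql.Composable`. It ignores column types, and the operator whitelist and the `TRUE`/`FALSE` handling of empty input are choices made for this example.

```python
from typing import Any, Optional, Sequence, Tuple

Qual = Tuple[str, str, Any]  # (column, operator, value)
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}  # assumed whitelist


def quote_ident(name: str) -> str:
    """Quote a column name as a Postgres identifier."""
    return '"' + name.replace('"', '""') + '"'


def cnf_to_sql(quals: Optional[Sequence[Sequence[Qual]]]) -> Tuple[str, tuple]:
    """Return (SQL fragment with %s placeholders, arguments)."""
    if not quals:
        return "TRUE", ()
    clauses = []
    args = []
    for clause in quals:
        parts = []
        for column, op, value in clause:
            if op not in ALLOWED_OPERATORS:
                raise ValueError(f"Unsupported operator: {op}")
            parts.append(f"{quote_ident(column)} {op} %s")
            args.append(value)  # values travel separately from the SQL text
        # An empty OR clause can never be satisfied.
        clauses.append("(" + " OR ".join(parts) + ")" if parts else "FALSE")
    return " AND ".join(clauses), tuple(args)


sql, args = cnf_to_sql([[("a", "=", 1), ("b", ">", 2)], [("c", "<", 3)]])
```

Keeping the values out of the SQL text and passing them as arguments lets the database driver handle escaping, which is the reason the documented function returns a query fragment and an argument tuple separately.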