Base
- group Base
Basic data structures and preprocessing.
-
struct AllDimsCache
- #include <all_dims_cache.h>
Caches the frequency tables of all dimensions for a given df.
-
struct ContingencyTable
- #include <contingency_table.h>
Represents a contingency table for a subset of variables.
Key design: radix_weights[i] is the product of the cardinalities of the variables to the right of i (i.e. var_ids[i+1], var_ids[i+2], …), so the least-significant “digit” is the last variable in var_ids. make_key uses these precomputed weights directly, with no extra multiplication loop.
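The mixed-radix key layout described above can be sketched as follows. This is only an illustration under stated assumptions: `compute_radix_weights` and `make_key_sketch` are hypothetical free functions, and the 64-bit key type and `uint8_t` value codes are assumptions rather than the library's actual signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: weights[i] is the product of cards[i+1], cards[i+2], ...
// so the last variable gets weight 1 (the least-significant "digit").
std::vector<std::uint64_t> compute_radix_weights(const std::vector<std::size_t>& cards) {
    std::vector<std::uint64_t> weights(cards.size(), 1);
    for (std::size_t i = cards.size(); i-- > 1;) {
        weights[i - 1] = weights[i] * cards[i];
    }
    return weights;
}

// Hypothetical key builder: one multiply-add per variable, using the
// precomputed weights instead of re-multiplying cardinalities per row.
std::uint64_t make_key_sketch(const std::vector<std::uint8_t>& values,
                              const std::vector<std::uint64_t>& weights) {
    assert(values.size() == weights.size());
    std::uint64_t key = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        key += static_cast<std::uint64_t>(values[i]) * weights[i];
    }
    return key;
}
```

For cardinalities {2, 3, 4} the weights come out as {12, 4, 1}, so the value tuple (1, 2, 3) maps to 1*12 + 2*4 + 3*1 = 23.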
Public Functions
-
ContingencyTable(const std::vector<size_t> &var_ids, const DataframeWrapper &df)
Construct a new ContingencyTable from a subset of variables.
- Parameters:
var_ids – The column indices of the variables to include (must be sorted ascending).
df – The DataframeWrapper containing the data.
-
ContingencyTable marginalize_to(const std::vector<size_t> &var_ids_tgt) const
Marginalize this contingency table S down to a subset T ⊆ S.
- Parameters:
var_ids_tgt – Sorted ascending subset of this->var_ids.
- Returns:
New ContingencyTable defined on var_ids_tgt with counts aggregated.
Complexity: O(nnz(S) * |T|), where nnz(S) == counts.size().
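A minimal sketch of how such a marginalization could reach the stated O(nnz(S) * |T|) cost: each non-zero entry of S is decoded only at the |T| kept positions and re-encoded with the target's weights. The `SparseTable` struct, its `cards` field, the hash-map `counts` container, and the helper names are all assumptions made for illustration, not the library's actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for ContingencyTable (field layout is assumed).
struct SparseTable {
    std::vector<std::size_t> var_ids;          // sorted ascending
    std::vector<std::size_t> cards;            // cardinality of each variable
    std::vector<std::uint64_t> radix_weights;  // product of cards to the right
    std::unordered_map<std::uint64_t, std::uint64_t> counts;  // key -> count
};

// weights[i] = cards[i+1] * cards[i+2] * ...
std::vector<std::uint64_t> suffix_products(const std::vector<std::size_t>& cards) {
    std::vector<std::uint64_t> w(cards.size(), 1);
    for (std::size_t i = cards.size(); i-- > 1;) w[i - 1] = w[i] * cards[i];
    return w;
}

SparseTable marginalize_sketch(const SparseTable& src,
                               const std::vector<std::size_t>& var_ids_tgt) {
    SparseTable tgt;
    tgt.var_ids = var_ids_tgt;
    // Locate each target variable inside src.var_ids (both sorted ascending).
    std::vector<std::size_t> pos;
    std::size_t j = 0;
    for (std::size_t id : var_ids_tgt) {
        while (j < src.var_ids.size() && src.var_ids[j] != id) ++j;
        assert(j < src.var_ids.size());  // target must be a subset of source
        pos.push_back(j);
        tgt.cards.push_back(src.cards[j]);
    }
    tgt.radix_weights = suffix_products(tgt.cards);
    // One pass over the non-zero entries, |T| digit extractions per entry.
    for (const auto& entry : src.counts) {
        std::uint64_t tgt_key = 0;
        for (std::size_t t = 0; t < pos.size(); ++t) {
            const std::uint64_t digit =
                (entry.first / src.radix_weights[pos[t]]) % src.cards[pos[t]];
            tgt_key += digit * tgt.radix_weights[t];
        }
        tgt.counts[tgt_key] += entry.second;  // aggregate collapsed cells
    }
    return tgt;
}
```

Extracting a digit with a division and a modulo avoids decoding the full tuple, which is why the per-entry cost depends on |T| rather than |S|.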
-
struct DataframeWrapper
- #include <dataframe_wrapper.h>
A wrapper struct for a pandas DataFrame.
This struct stores the data from a pandas DataFrame in two layouts: column-major and row-major. It also stores the mapping between column names and column indices, and the mapping between column values and column value indices. Values are stored as uint8_t to minimize data size and improve the cache hit rate (even uint8_t is still redundant).
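The storage scheme described above can be illustrated without pandas or pybind11. The sketch below is an assumption-laden stand-in: `EncodedFrame` and `encode_frame` are hypothetical names, the input is a plain vector of string columns instead of a `py::object`, and value codes are assigned in first-seen order, which may differ from what the real wrapper does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for DataframeWrapper's stored state.
struct EncodedFrame {
    std::vector<std::string> col_names;
    std::unordered_map<std::string, std::size_t> col_index;  // name -> column index
    std::vector<std::unordered_map<std::string, std::uint8_t>> value_index;  // per column
    std::vector<std::vector<std::uint8_t>> col_major;  // [column][row]
    std::vector<std::vector<std::uint8_t>> row_major;  // [row][column]
};

EncodedFrame encode_frame(const std::vector<std::string>& names,
                          const std::vector<std::vector<std::string>>& columns) {
    assert(names.size() == columns.size());
    EncodedFrame f;
    f.col_names = names;
    const std::size_t n_rows = columns.empty() ? 0 : columns[0].size();
    f.row_major.assign(n_rows, std::vector<std::uint8_t>(columns.size()));
    for (std::size_t c = 0; c < columns.size(); ++c) {
        assert(columns[c].size() == n_rows);
        f.col_index[names[c]] = c;
        f.value_index.emplace_back();
        f.col_major.emplace_back();
        f.col_major[c].reserve(n_rows);
        auto& codes = f.value_index[c];
        for (std::size_t r = 0; r < n_rows; ++r) {
            auto it = codes.find(columns[c][r]);
            if (it == codes.end()) {
                assert(codes.size() < 256);  // codes must fit in uint8_t
                const auto code = static_cast<std::uint8_t>(codes.size());
                it = codes.emplace(columns[c][r], code).first;
            }
            f.col_major[c].push_back(it->second);  // contiguous per column
            f.row_major[r][c] = it->second;        // contiguous per row
        }
    }
    return f;
}
```

Keeping both layouts trades memory for locality: per-column scans read `col_major`, while per-row key construction reads `row_major`.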
Public Functions
-
DataframeWrapper(const py::object &dataframe)
Construct a new Dataframe Wrapper from a pandas DataFrame.
The constructor extracts the column names, value mappings, and stores the data in both column-major and row-major formats.
- Parameters:
dataframe – The pandas DataFrame to wrap.