Base

group Base

Basic data structures and preprocessing.

DataframeWrapper
ContingencyTable

struct AllDimsCache

#include <all_dims_cache.h>

Represents a cache of the frequency tables of all dimensions for a given df.

struct ContingencyTable

#include <contingency_table.h>

Represents a contingency table for a subset of variables.

Key design: radix_weights[i] is the product of the cardinalities of the variables to the right of i (i.e. var_ids[i+1], var_ids[i+2], …). Thus the least-significant “digit” is the last variable in var_ids. make_key uses these precomputed weights directly, with no extra multiplication loop.
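The radix-weight layout described above can be illustrated with a minimal sketch. The helper names compute_radix_weights and linear_key are hypothetical and are not part of the documented API; the sketch only assumes that each variable has a known cardinality and that values are already encoded as indices in [0, cardinality).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the library): mixed-radix weights.
// weights[i] is the product of cardinalities[j] for all j > i,
// so the last variable always has weight 1 (least-significant digit).
std::vector<std::size_t> compute_radix_weights(
    const std::vector<std::size_t>& cardinalities) {
    std::vector<std::size_t> weights(cardinalities.size(), 1);
    // Walk right to left, accumulating the product of cardinalities.
    for (std::size_t i = cardinalities.size(); i-- > 1;) {
        weights[i - 1] = weights[i] * cardinalities[i];
    }
    return weights;
}

// Hypothetical helper: linear key as the weighted sum of value indices.
std::size_t linear_key(const std::vector<std::size_t>& value_indices,
                       const std::vector<std::size_t>& weights) {
    std::size_t key = 0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        key += value_indices[i] * weights[i];
    }
    return key;
}
```

For example, three variables with cardinalities 2, 3 and 4 give weights 12, 4 and 1, so the value indices (1, 2, 3) map to key 1*12 + 2*4 + 3*1 = 23, the last of the 24 possible cells.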

Public Functions

ContingencyTable(const std::vector<size_t> &var_ids, const DataframeWrapper &df)

Construct a new ContingencyTable from a subset of variables.

Parameters:

var_ids – The column indices of the variables to include (must be sorted ascending).
df – The DataframeWrapper containing the data.

ContingencyTable marginalize_to(const std::vector<size_t> &var_ids_tgt) const

Marginalize this contingency table S down to a subset T ⊆ S.

Parameters:

var_ids_tgt – Sorted ascending subset of this->var_ids.

Returns:

New ContingencyTable defined on var_ids_tgt with counts aggregated.

Complexity: O(nnz(S) * |T|), where nnz(S) == counts.size().
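One way to reach the stated O(nnz(S) * |T|) bound is sketched below. This is not the library's implementation: SparseTable and marginalize are hypothetical stand-ins, and the sketch assumes that counts is a sparse map from linear key to count and that per-variable cardinalities are stored alongside the radix weights. For each non-zero cell of S, it extracts the |T| relevant digits from the source key and re-encodes them with the target weights.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a sparse contingency table.
struct SparseTable {
    std::vector<std::size_t> var_ids;        // sorted ascending
    std::vector<std::size_t> cardinalities;  // one per entry of var_ids
    std::vector<std::size_t> radix_weights;  // product of cardinalities to the right
    std::unordered_map<std::size_t, std::size_t> counts;  // linear key -> count
};

// Precondition (assumed): var_ids_tgt is a sorted subset of src.var_ids.
SparseTable marginalize(const SparseTable& src,
                        const std::vector<std::size_t>& var_ids_tgt) {
    SparseTable tgt;
    tgt.var_ids = var_ids_tgt;

    // Locate each target variable inside src.var_ids (single merge-style pass).
    std::vector<std::size_t> pos;
    for (std::size_t i = 0, j = 0;
         i < src.var_ids.size() && j < var_ids_tgt.size(); ++i) {
        if (src.var_ids[i] == var_ids_tgt[j]) {
            pos.push_back(i);
            tgt.cardinalities.push_back(src.cardinalities[i]);
            ++j;
        }
    }

    // Radix weights of the target table, built right to left.
    tgt.radix_weights.assign(pos.size(), 1);
    for (std::size_t i = pos.size(); i-- > 1;) {
        tgt.radix_weights[i - 1] = tgt.radix_weights[i] * tgt.cardinalities[i];
    }

    // For every non-zero source cell, decode the kept digits and re-encode them.
    for (const auto& entry : src.counts) {
        std::size_t tgt_key = 0;
        for (std::size_t t = 0; t < pos.size(); ++t) {
            const std::size_t p = pos[t];
            const std::size_t digit =
                (entry.first / src.radix_weights[p]) % src.cardinalities[p];
            tgt_key += digit * tgt.radix_weights[t];
        }
        tgt.counts[tgt_key] += entry.second;  // aggregate counts
    }
    return tgt;
}
```

The outer loop visits each of the nnz(S) stored cells once and the inner loop does |T| digit extractions, which matches the complexity quoted above.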

template<typename RowLike> inline size_t make_key(const RowLike &row) const noexcept

Create a linear key from a row-like object (any type that supports operator[]). Uses precomputed radix_weights for clarity and speed.
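The following sketch shows why a RowLike template parameter is convenient: the same function body works for a std::vector, a raw pointer into a row-major buffer, or any other type with operator[]. KeyMaker is a hypothetical stand-in, and the sketch assumes that the row is indexed by the original column index (var_ids[i]); the real make_key may index rows differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in holding only what key construction needs.
struct KeyMaker {
    std::vector<std::size_t> var_ids;        // columns included in the table
    std::vector<std::size_t> radix_weights;  // precomputed, one per var_id

    // Works with any RowLike that supports row[column_index].
    // Assumption: row is indexed by the original column index.
    template <typename RowLike>
    std::size_t make_key(const RowLike& row) const noexcept {
        std::size_t key = 0;
        for (std::size_t i = 0; i < var_ids.size(); ++i) {
            key += static_cast<std::size_t>(row[var_ids[i]]) * radix_weights[i];
        }
        return key;
    }
};
```

With var_ids = {0, 2} and radix_weights = {3, 1}, a row whose encoded values are (1, 9, 2) yields key 1*3 + 2*1 = 5, whether the row is passed as a std::vector<std::uint8_t> or as a const std::uint8_t* to its first element.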

struct DataframeWrapper

#include <dataframe_wrapper.h>

A wrapper struct for a pandas DataFrame.

This struct stores the data from a pandas DataFrame in two formats: column-major and row-major. It also stores the mapping between column names and column indices, and the mapping between column values and column value indices.

Values are stored as uint8_t to minimize data size and improve the cache hit rate (uint8_t is still wider than strictly needed).
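As a rough illustration of this storage scheme, the sketch below encodes string-valued columns into uint8_t codes and keeps both layouts. EncodedFrame and encode are hypothetical names, not the wrapper's actual members, and the sketch works on plain C++ containers rather than a pandas object. It assumes all columns have the same length and that each column has at most 256 distinct values.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the dual-layout storage.
struct EncodedFrame {
    std::size_t n_rows = 0;
    std::size_t n_cols = 0;
    std::vector<std::vector<std::uint8_t>> columns;  // column-major: columns[c][r]
    std::vector<std::uint8_t> rows;                  // row-major: rows[r * n_cols + c]
    std::vector<std::map<std::string, std::uint8_t>> value_to_index;  // per column
};

// Assumptions: equal-length columns, at most 256 distinct values per column.
EncodedFrame encode(const std::vector<std::vector<std::string>>& raw_columns) {
    EncodedFrame f;
    f.n_cols = raw_columns.size();
    f.n_rows = raw_columns.empty() ? 0 : raw_columns[0].size();
    f.columns.assign(f.n_cols, std::vector<std::uint8_t>(f.n_rows));
    f.rows.assign(f.n_rows * f.n_cols, 0);
    f.value_to_index.resize(f.n_cols);

    for (std::size_t c = 0; c < f.n_cols; ++c) {
        auto& mapping = f.value_to_index[c];
        for (std::size_t r = 0; r < f.n_rows; ++r) {
            const std::string& value = raw_columns[c][r];
            auto it = mapping.find(value);
            if (it == mapping.end()) {
                // Assign codes in order of first appearance within the column.
                it = mapping.emplace(
                    value, static_cast<std::uint8_t>(mapping.size())).first;
            }
            f.columns[c][r] = it->second;         // column-major copy
            f.rows[r * f.n_cols + c] = it->second;  // row-major copy
        }
    }
    return f;
}
```

Keeping both layouts trades memory for access speed: scanning a single variable reads one contiguous column, while building a key for one observation reads one contiguous row.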

Public Functions

DataframeWrapper(const py::object &dataframe)

Construct a new DataframeWrapper from a pandas DataFrame.

The constructor extracts the column names, value mappings, and stores the data in both column-major and row-major formats.

Parameters:

dataframe – The pandas DataFrame to wrap.
