Base

group Base

Basic data structures and preprocessing

struct AllDimsCache
#include <all_dims_cache.h>

Caches the frequency tables of all dimensions for a given df.

struct ContingencyTable
#include <contingency_table.h>

Represents a contingency table for a subset of variables. Key design: radix_weights[i] is the product of the cardinalities of the variables to the right of i (i.e. var_ids[i+1], var_ids[i+2], …). Thus the least-significant "digit" is the last variable in var_ids. make_key uses these precomputed weights directly, with no extra multiplication loop.
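The radix layout described above can be illustrated with a minimal sketch. The helper name compute_radix_weights and its cards argument (the cardinality of each variable, in var_ids order) are assumptions made for illustration and are not part of the documented API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the library): cards[i] is the number of
// distinct values of the i-th variable in var_ids.
std::vector<std::size_t> compute_radix_weights(const std::vector<std::size_t>& cards) {
    std::vector<std::size_t> weights(cards.size(), 1);
    // Walk right to left: the last variable keeps weight 1, and each earlier
    // weight is the product of all cardinalities to its right.
    for (std::size_t i = cards.size(); i-- > 1;) {
        assert(cards[i] > 0);
        weights[i - 1] = weights[i] * cards[i];
    }
    return weights;
}
```

For cardinalities {2, 3, 4} this gives weights {12, 4, 1}, so the last variable is the least-significant digit.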

Public Functions

ContingencyTable(const std::vector<size_t> &var_ids, const DataframeWrapper &df)

Construct a new ContingencyTable from a subset of variables.

Parameters:
  • var_ids – The column indices of the variables to include (must be sorted ascending).

  • df – The DataframeWrapper containing the data.

ContingencyTable marginalize_to(const std::vector<size_t> &var_ids_tgt) const

Marginalize this contingency table S down to a subset T ⊆ S.

Parameters:

var_ids_tgt – Sorted ascending subset of this->var_ids.

Returns:

New ContingencyTable defined on var_ids_tgt with counts aggregated.

Complexity: O(nnz(S) * |T|), where nnz(S) == counts.size().
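One way to reach the stated O(nnz(S) * |T|) bound is to decode only the digits of T from each source key and re-encode them with the target weights. The sketch below assumes counts is a sparse map from linear key to count. The function marginalize_sketch, the Counts alias, and the parameters src_cards, tgt_pos, and tgt_weights are hypothetical names introduced here, not the library's actual members.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Assumed sparse storage: linear key -> count.
using Counts = std::unordered_map<std::size_t, std::size_t>;

// Hypothetical free function (not the library's marginalize_to).
// src_weights / src_cards: radix weights and cardinalities of S.
// tgt_pos[j]: position within S of the j-th variable of T.
// tgt_weights: radix weights of T.
Counts marginalize_sketch(const Counts& src,
                          const std::vector<std::size_t>& src_weights,
                          const std::vector<std::size_t>& src_cards,
                          const std::vector<std::size_t>& tgt_pos,
                          const std::vector<std::size_t>& tgt_weights) {
    assert(src_weights.size() == src_cards.size());
    assert(tgt_pos.size() == tgt_weights.size());
    Counts out;
    for (const auto& kv : src) {               // nnz(S) iterations
        std::size_t tgt_key = 0;
        for (std::size_t j = 0; j < tgt_pos.size(); ++j) {   // |T| iterations
            const std::size_t p = tgt_pos[j];
            // Extract the digit of variable p from the source key.
            const std::size_t digit = (kv.first / src_weights[p]) % src_cards[p];
            tgt_key += digit * tgt_weights[j];
        }
        out[tgt_key] += kv.second;             // aggregate counts
    }
    return out;
}
```

Each source entry is visited once and costs |T| digit extractions, which matches the complexity given above under the sparse-map assumption.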

template<typename RowLike>
inline size_t make_key(const RowLike &row) const noexcept

Create a linear key from a row-like object (supports operator[]). Uses precomputed radix_weights for clarity and speed.
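As a rough illustration of the key computation, the sketch below treats the key as a dot product between the selected row values and radix_weights. It assumes the row is indexed by column index (row[var_ids[i]]), which the documentation does not state. make_key_sketch is a hypothetical free function standing in for the member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ContingencyTable::make_key.
// Assumption: row is indexed by column index, so row[var_ids[i]] is the
// encoded value of the i-th variable of the table.
template <typename RowLike>
std::size_t make_key_sketch(const RowLike& row,
                            const std::vector<std::size_t>& var_ids,
                            const std::vector<std::size_t>& radix_weights) {
    assert(var_ids.size() == radix_weights.size());
    std::size_t key = 0;
    for (std::size_t i = 0; i < var_ids.size(); ++i) {
        // One multiply-add per variable; the weights are already products.
        key += static_cast<std::size_t>(row[var_ids[i]]) * radix_weights[i];
    }
    return key;
}
```

With var_ids {0, 2, 3}, weights {12, 4, 1}, and a row {1, 0, 2, 3}, the key is 1*12 + 2*4 + 3*1 = 23.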

struct DataframeWrapper
#include <dataframe_wrapper.h>

A wrapper struct for a pandas DataFrame.

This struct stores the data from a pandas DataFrame in two formats: column-major and row-major. It also stores the mapping between column names and column indices, and the mapping between column values and column value indices. Values are stored as uint8_t to minimize data size and improve the cache hit rate (uint8_t is still redundant).
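The dual layout and value mapping can be pictured with a small standalone sketch. It does not use pybind11 or the real DataframeWrapper. The struct EncodedTable, the function encode_sketch, and the choice to assign codes in order of first appearance are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical container mirroring the description above.
struct EncodedTable {
    std::size_t n_rows = 0;
    std::size_t n_cols = 0;
    std::vector<std::uint8_t> col_major;  // element (r, c) at [c * n_rows + r]
    std::vector<std::uint8_t> row_major;  // element (r, c) at [r * n_cols + c]
    std::vector<std::map<std::string, std::uint8_t>> value_index;  // per column
};

// Encode string columns as uint8_t codes and store them in both layouts.
// Assumption: codes are assigned in order of first appearance per column.
EncodedTable encode_sketch(const std::vector<std::vector<std::string>>& columns) {
    EncodedTable t;
    t.n_cols = columns.size();
    t.n_rows = columns.empty() ? 0 : columns[0].size();
    t.col_major.resize(t.n_rows * t.n_cols);
    t.row_major.resize(t.n_rows * t.n_cols);
    t.value_index.resize(t.n_cols);
    for (std::size_t c = 0; c < t.n_cols; ++c) {
        assert(columns[c].size() == t.n_rows);
        auto& index = t.value_index[c];
        for (std::size_t r = 0; r < t.n_rows; ++r) {
            auto it = index.find(columns[c][r]);
            if (it == index.end()) {
                assert(index.size() < 256);  // codes must fit in uint8_t
                it = index.emplace(columns[c][r],
                                   static_cast<std::uint8_t>(index.size())).first;
            }
            t.col_major[c * t.n_rows + r] = it->second;
            t.row_major[r * t.n_cols + c] = it->second;
        }
    }
    return t;
}
```

Keeping both layouts trades memory for locality: column scans read col_major contiguously, while per-row key construction reads row_major contiguously.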

Public Functions

DataframeWrapper(const py::object &dataframe)

Construct a new DataframeWrapper from a pandas DataFrame.

The constructor extracts the column names, value mappings, and stores the data in both column-major and row-major formats.

Parameters:

dataframe – The pandas DataFrame to wrap.