facet.data.partition.ContinuousRangePartitioner

class facet.data.partition.ContinuousRangePartitioner(max_partitions=None)

Partition numerical values into adjacent intervals of the same length.

The range of the intervals and the interval size are computed from the max_partitions attribute and the lower_bound and upper_bound arguments passed to fit().

Partition boundaries and interval sizes are chosen with interpretability in mind: the interval size is always a power of 10, or 2 or 5 times a power of 10 (e.g. 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, and so on), and the boundaries are round values aligned to that size.
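The rounding rule for the interval size can be sketched in plain Python. This is a minimal illustration, not FACET's implementation: the helper name `nice_step` is hypothetical, and it simply returns the smallest value of the form 1, 2, or 5 times a power of 10 that is at least as large as a given raw width.

```python
import math


def nice_step(raw_step: float) -> float:
    """Round raw_step up to the next value in ..., 0.1, 0.2, 0.5, 1, 2, 5, 10, ...

    Hypothetical helper for illustration only; not part of the FACET API.
    """
    if raw_step <= 0:
        raise ValueError("raw_step must be positive")
    # 10**k / m for m in (1, 2, 5) covers 1 * 10**k, 5 * 10**(k-1), 2 * 10**(k-1);
    # for each m pick the smallest k with 10**k / m >= raw_step, then keep the minimum
    return min(10 ** math.ceil(math.log10(raw_step * m)) / m for m in (1, 2, 5))
```

For instance, a raw width of 0.14 is rounded up to 0.2, and a raw width of 3 is rounded up to 5.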

The intervals also satisfy the following conditions:

  • lower_bound is within the first interval

  • upper_bound is within the last interval

For example, with max_partitions = 10, lower_bound = 3.3, and upper_bound = 4.7, the resulting partitioning would be: [3.2, 3.4), [3.4, 3.6), [3.6, 3.8), [3.8, 4.0), [4.0, 4.2), [4.2, 4.4), [4.4, 4.6), [4.6, 4.8]
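The example can be reproduced with a short sketch under stated assumptions; it is not FACET's own code. The helper names `nice_step` and `range_partitions` are hypothetical, the raw width is assumed to be (upper_bound - lower_bound) / (max_partitions - 1), and the boundaries are assumed to be multiples of the rounded width, as in the example.

```python
import math
from typing import List, Tuple


def nice_step(raw_step: float) -> float:
    # smallest value of the form 1, 2 or 5 times a power of 10 that is >= raw_step
    return min(10 ** math.ceil(math.log10(raw_step * m)) / m for m in (1, 2, 5))


def range_partitions(
    lower_bound: float, upper_bound: float, max_partitions: int
) -> List[Tuple[float, float]]:
    """Hypothetical sketch: split [lower_bound, upper_bound] into equal, round intervals."""
    if max_partitions < 2:
        raise ValueError("max_partitions must be at least 2")
    if upper_bound <= lower_bound:
        raise ValueError("upper_bound must be greater than lower_bound")
    # assumption: dividing by (max_partitions - 1) leaves room for the extra
    # interval that aligning the boundaries to multiples of the step can add
    step = nice_step((upper_bound - lower_bound) / (max_partitions - 1))
    # indices of the first and last boundary, both multiples of the step
    first = math.floor(lower_bound / step)
    last = math.ceil(upper_bound / step)
    # round away floating-point noise such as 3.4000000000000004
    edges = [round(k * step, 10) for k in range(first, last + 1)]
    return list(zip(edges[:-1], edges[1:]))
```

Under these assumptions, `range_partitions(3.3, 4.7, 10)` uses a width of 0.2 and returns eight adjacent intervals from (3.2, 3.4) to (4.6, 4.8). Dividing by max_partitions - 1 keeps the count within the limit, because ceil(u / s) - floor(l / s) < (u - l) / s + 2 <= max_partitions + 1.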

Bases

RangePartitioner[float64, float]

Metaclasses

ABCMeta

Parameters

max_partitions (Optional[int]) – the maximum number of partitions to generate; must be at least 2 (default: 20)

Method summary

fit

Calculate the partitioning for the given observed values.

Attribute summary

DEFAULT_MAX_PARTITIONS

frequencies_

The count of values allocated to each partition.

is_categorical

False, since this partitioner creates numerical ranges rather than categories.

is_fitted

True if this object is fitted, False otherwise.

max_partitions

The maximum number of partitions to be generated by this partitioner.

partition_bounds_

Return the endpoints of the intervals that delineate each partition.

partition_width_

The width of each partition.

partitions_

The values representing the partitions.

Definitions

fit(values, *, lower_bound=None, upper_bound=None, **fit_params)

Calculate the partitioning for the given observed values.

The lower and upper bounds of the range to be partitioned can be provided as optional arguments. If no bounds are provided, the partitioner automatically chooses the lower and upper outlier thresholds based on the Tukey test, i.e., \([Q_1 - 1.5 \cdot \mathit{iqr}, Q_3 + 1.5 \cdot \mathit{iqr}]\) where \(Q_1\) and \(Q_3\) are the first and third quartiles of the observed values and \(\mathit{iqr} = Q_3 - Q_1\) is the inter-quartile range.
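The Tukey thresholds can be illustrated with the standard library. This is a minimal sketch, not the library's code: the helper name `tukey_fences` is hypothetical, it returns the thresholds themselves rather than the bounds that fit() finally uses, and it relies on the default "exclusive" interpolation of `statistics.quantiles`, which may differ from the quantile method the library uses.

```python
import statistics
from typing import Sequence, Tuple


def tukey_fences(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return Tukey's outlier thresholds (Q1 - k * iqr, Q3 + k * iqr)."""
    # with n=4, statistics.quantiles returns the three quartiles [Q1, Q2, Q3]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr
```

For the values 1 to 8 plus a single 100, the quartiles are 2.5 and 7.5, so the thresholds are (-5.0, 15.0) and the value 100 falls outside the range to be partitioned.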

Parameters
  • values (ndarray[Any, dtype[float64]]) – a sequence of observed values as the empirical basis for calculating the partitions

  • lower_bound (Union[float64, float, int, None]) – the inclusive lower bound of the elements to partition

  • upper_bound (Union[float64, float, int, None]) – the inclusive upper bound of the elements to partition

  • fit_params (Any) – optional fitting parameters

Return type

ContinuousRangePartitioner

Returns

self

property frequencies_: numpy.ndarray[Any, numpy.dtype[numpy.int64]]

The count of values allocated to each partition.

Return type

ndarray[Any, dtype[int64]]
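Counting values per partition can be sketched with binary search over the boundaries. This is an illustration only: the helper name `count_per_partition` is hypothetical, values outside the partitioned range (and NaN) are assumed to be skipped, and the last interval is treated as closed to match the example's final interval; the library's handling of values that fall exactly on the outermost boundary may differ.

```python
from bisect import bisect_right
from typing import List, Sequence


def count_per_partition(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per interval [edges[i], edges[i + 1]); the last interval is closed."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        # the chained comparison is False for NaN, so NaN is skipped as well
        if not edges[0] <= value <= edges[-1]:
            continue
        # interval whose lower edge is the last edge <= value;
        # min() folds a value equal to the final edge into the last interval
        index = min(bisect_right(edges, value) - 1, len(counts) - 1)
        counts[index] += 1
    return counts
```

With edges [0, 1, 2, 3], the values [-1, 0, 0.5, 1, 2.9, 3, 4] yield the counts [2, 1, 2].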

property is_categorical: bool

False, since this partitioner creates numerical ranges rather than categories.

Return type

bool

property is_fitted: bool

True if this object is fitted, False otherwise.

Return type

bool

property max_partitions: int

The maximum number of partitions to be generated by this partitioner.

Return type

int

property partition_bounds_: Sequence[Tuple[T_Values_Scalar, T_Values_Scalar]]

Return the endpoints of the intervals that delineate each partition.

Return type

Sequence[Tuple[TypeVar(T_Values_Scalar, int, float), TypeVar(T_Values_Scalar, int, float)]]

Returns

sequence of tuples (x, y) for every partition, where x is the inclusive lower bound of a partition range, and y is the exclusive upper bound of a partition range
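Given bounds in this (x, y) form, finding the partition that contains a value amounts to a half-open range check. The helper `find_partition` below is hypothetical and uses a linear scan for clarity; for many partitions, a binary search over the lower bounds would be faster.

```python
from typing import Optional, Sequence, Tuple


def find_partition(
    value: float, bounds: Sequence[Tuple[float, float]]
) -> Optional[int]:
    """Return the index of the range with x <= value < y, or None if there is none."""
    for index, (lower, upper) in enumerate(bounds):
        if lower <= value < upper:
            return index
    return None  # value lies outside every partition range
```

Because each upper bound is exclusive, a value on a shared boundary such as 3.4 belongs to the partition that starts there, not to the one that ends there.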

property partition_width_: T_Values_Scalar

The width of each partition.

Return type

TypeVar(T_Values_Scalar, int, float)

property partitions_: Sequence[T_Values]

The values representing the partitions.

Return type

Sequence[TypeVar(T_Values, bound=generic)]
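The relationship between the bounds, the common width, and one representative value per partition can be sketched as follows. This is an assumption-laden illustration: the helper `width_and_midpoints` is hypothetical, and taking each partition's midpoint as its representative value is an assumption made here, not something this page states about partitions_.

```python
from typing import List, Sequence, Tuple


def width_and_midpoints(
    bounds: Sequence[Tuple[float, float]], ndigits: int = 10
) -> Tuple[float, List[float]]:
    """Derive the common width and (assumed) midpoint of each (lower, upper) pair."""
    if not bounds:
        raise ValueError("no partitions given")
    # rounding absorbs floating-point noise such as 0.19999999999999973
    widths = {round(upper - lower, ndigits) for lower, upper in bounds}
    if len(widths) != 1:
        raise ValueError("partitions do not all have the same width")
    # assumption: each partition is represented by the centre of its range
    midpoints = [round((lower + upper) / 2, ndigits) for lower, upper in bounds]
    return widths.pop(), midpoints
```

For the bounds (3.2, 3.4), (3.4, 3.6), and (3.6, 3.8), this yields a width of 0.2 and the midpoints 3.3, 3.5, and 3.7.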