facet.data.Sample

class facet.data.Sample(observations, *, target_name, feature_names=None, weight_name=None)

A collection of observations, comprising features, one or more target variables and optional sample weights.

A Sample object serves to keep features, targets and weights aligned, ensuring a more readable and robust ML workflow. It provides basic methods for accessing features, targets and weights, and for selecting subsets of features and observations.

The underlying data structure is a DataFrame.

Supports len(), returning the number of observations in this sample.

Parameters
  • observations (DataFrame) – a table of observational data; each row represents one observation, and the names of all used columns must be strings

  • target_name (Union[str, Iterable[str]]) – the name of the column representing the target variable, or an iterable of names representing multiple targets

  • feature_names (Optional[Iterable[str]]) – optional iterable of strings naming the columns that represent features; if omitted, all non-target and non-weight columns are considered features

  • weight_name (Optional[str]) – optional name of a column representing the weight of each observation
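The default rule for feature_names can be illustrated with a small standalone sketch. It uses plain Python lists rather than a DataFrame, and the helper name infer_feature_names and the column names are hypothetical; it is not facet's implementation, only one way to read "all non-target and non-weight columns".

```python
from typing import Iterable, List, Optional, Union


def infer_feature_names(
    columns: Iterable[str],
    target_name: Union[str, Iterable[str]],
    weight_name: Optional[str] = None,
) -> List[str]:
    """Return all columns that are neither a target nor the weight column.

    Hypothetical helper for illustration only; not part of facet.
    """
    # A single target name is a string; wrap it so it is not split into characters.
    targets = {target_name} if isinstance(target_name, str) else set(target_name)
    excluded = targets | ({weight_name} if weight_name is not None else set())
    # Keep the original column order (an assumption of this sketch).
    return [column for column in columns if column not in excluded]


# Made-up column names for illustration.
columns = ["age", "income", "region", "weight", "churned"]
features = infer_feature_names(columns, target_name="churned", weight_name="weight")
print(features)  # ['age', 'income', 'region']
```

With facet installed, the corresponding call would follow the signature above, for example Sample(observations=df, target_name="churned", weight_name="weight"), where df is a DataFrame holding these columns.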

Method summary

drop

Return a copy of this sample, dropping the features with the given names.

keep

Return a new sample which only includes the features with the given names.

subsample

Return a new sample with a subset of this sample's observations.

Attribute summary

IDX_FEATURE

Default name for the feature index (= column index) used when returning a features table.

IDX_OBSERVATION

Default name for the observations index (= row index) of the underlying data frame.

IDX_TARGET

Default name for the target series or target index (= column index) used when returning the targets.

feature_names

The column names of all features in this sample.

features

The features for all observations.

index

Row index of all observations in this sample.

target

The target variable(s) for all observations.

target_name

The column name of the target in this sample, or a list of column names if this sample has multiple targets.

weight

A series indicating the weight for each observation; None if no weights are defined.

weight_name

The column name of weights in this sample; None if no weights are defined.

Definitions

drop(*, feature_names)

Return a copy of this sample, dropping the features with the given names.

Parameters

feature_names (Union[str, Collection[str]]) – name(s) of the features to be dropped

Return type

Sample

Returns

copy of this sample, excluding the features with the given names

keep(*, feature_names)

Return a new sample which only includes the features with the given names.

Parameters

feature_names (Union[str, Iterable[str]]) – name(s) of the features to be selected

Return type

Sample

Returns

copy of this sample, containing only the features with the given names
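drop and keep are complementary selections over the feature names, and both accept either a single name or several. The sketch below mimics that on a plain list of names. The helpers to_name_set, keep_names and drop_names are hypothetical, and preserving the original column order is an assumption of the sketch; the text above does not state which order facet uses.

```python
from typing import Collection, List, Set, Union

Names = Union[str, Collection[str]]


def to_name_set(feature_names: Names) -> Set[str]:
    """Treat a bare string as one name, not as a collection of characters."""
    return {feature_names} if isinstance(feature_names, str) else set(feature_names)


def keep_names(all_features: List[str], feature_names: Names) -> List[str]:
    """Return only the listed features (hypothetical stand-in for keep)."""
    selected = to_name_set(feature_names)
    return [name for name in all_features if name in selected]


def drop_names(all_features: List[str], feature_names: Names) -> List[str]:
    """Return all features except the listed ones (hypothetical stand-in for drop)."""
    dropped = to_name_set(feature_names)
    return [name for name in all_features if name not in dropped]


all_features = ["age", "income", "region"]
kept = keep_names(all_features, ["age", "region"])
remaining = drop_names(all_features, "income")  # a single name as a plain string
print(kept, remaining)
```

In this example, keeping "age" and "region" gives the same feature list as dropping "income".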

subsample(*, loc=None, iloc=None)

Return a new sample with a subset of this sample’s observations.

Select observations either by index (loc) or by integer index (iloc). Exactly one of the two arguments must be provided when calling this method; passing both or neither is not allowed.

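Parameters

  • loc – indices of the observations to select

  • iloc – integer indices of the observations to select
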
Return type

Sample

Returns

copy of this sample, comprising only the observations in the given rows
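The "exactly one of loc and iloc" rule can be sketched without facet or pandas. The example below assumes, following pandas naming, that loc refers to index labels and iloc to integer positions. The function select_observations, the dict-based row store and the ValueError are all assumptions made for illustration, not facet's actual behaviour.

```python
from typing import Any, Dict, Optional, Sequence


def select_observations(
    rows: Dict[Any, dict],
    *,
    loc: Optional[Sequence[Any]] = None,
    iloc: Optional[Sequence[int]] = None,
) -> Dict[Any, dict]:
    """Select rows by index label (loc) or by integer position (iloc).

    Hypothetical stand-in for illustration only; not part of facet.
    """
    # Both None or both given: the comparison is True in either case.
    if (loc is None) == (iloc is None):
        raise ValueError("specify exactly one of loc and iloc")
    if loc is not None:
        return {label: rows[label] for label in loc}
    labels = list(rows)  # dicts preserve insertion order
    return {labels[position]: rows[labels[position]] for position in iloc}


# Made-up observations keyed by their index label.
rows = {"a": {"x": 1}, "b": {"x": 2}, "c": {"x": 3}}
by_label = select_observations(rows, loc=["a", "c"])
by_position = select_observations(rows, iloc=[0, 2])
print(by_label == by_position)  # True
```

Calling select_observations(rows) or passing both arguments raises the ValueError in this sketch.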

IDX_FEATURE = 'feature'

Default name for the feature index (= column index) used when returning a features table.

IDX_OBSERVATION = 'observation'

Default name for the observations index (= row index) of the underlying data frame.

IDX_TARGET = 'target'

Default name for the target series or target index (= column index) used when returning the targets.

property feature_names: List[str]

The column names of all features in this sample.

Return type

List[str]

property features: pandas.DataFrame

The features for all observations.

Return type

DataFrame

property index: pandas.Index

Row index of all observations in this sample.

Return type

Index

property target: Union[pandas.Series, pandas.DataFrame]

The target variable(s) for all observations.

Represented as a series if there is only a single target, or as a data frame if there are multiple targets.

Return type

Union[Series, DataFrame]

property target_name: Union[str, List[str]]

The column name of the target in this sample, or a list of column names if this sample has multiple targets.

Return type

Union[str, List[str]]

property weight: Optional[pandas.Series]

A series indicating the weight for each observation; None if no weights are defined.

Return type

Optional[Series]

property weight_name: Optional[str]

The column name of weights in this sample; None if no weights are defined.

Return type

Optional[str]