facet.data.Sample

class facet.data.Sample(observations, *, target_name, feature_names=None, weight_name=None)

A collection of observations, comprising features, one or more target variables and optional sample weights.

A Sample object serves to keep features, targets and weights aligned, ensuring a more readable and robust ML workflow. It provides basic methods for accessing features, targets and weights, and for selecting subsets of features and observations.

The underlying data structure is a DataFrame.

Supports len(), returning the number of observations in this sample.

Parameters

observations (DataFrame) – a table of observational data; each row represents one observation, and the names of all used columns must be strings
target_name (Union[str, Iterable[str]]) – the name of the column representing the target variable; or an iterable of names representing multiple targets
feature_names (Optional[Iterable[str]]) – optional iterable of strings naming the columns that represent features; if omitted, all non-target and non-weight columns are considered features
weight_name (Optional[str]) – optional name of a column representing the weight of each observation

Method summary

`drop`	Return a copy of this sample, dropping the features with the given names.
`keep`	Return a new sample which only includes the features with the given names.
`subsample`	Return a new sample with a subset of this sample's observations.

Attribute summary

`IDX_FEATURE`	Default name for the feature index (= column index) used when returning a features table.
`IDX_OBSERVATION`	Default name for the observations index (= row index) of the underlying data frame.
`IDX_TARGET`	Default name for the target series or target index (= column index) used when returning the targets.
`feature_names`	The column names of all features in this sample.
`features`	The features for all observations.
`index`	Row index of all observations in this sample.
`target`	The target variable(s) for all observations.
`target_name`	The column name of the target in this sample, or a list of column names if this sample has multiple targets.
`weight`	A series indicating the weight for each observation; `None` if no weights are defined.
`weight_name`	The column name of weights in this sample; `None` if no weights are defined.

Definitions

drop(*, feature_names)

Return a copy of this sample, dropping the features with the given names.

Parameters: feature_names (Union[str, Collection[str]]) – name(s) of the features to be dropped
Return type: Sample
Returns: copy of this sample, excluding the features with the given names

keep(*, feature_names)

Return a new sample which only includes the features with the given names.

Parameters: feature_names (Union[str, Iterable[str]]) – name(s) of the features to be selected
Return type: Sample
Returns: copy of this sample, containing only the features with the given names

subsample(*, loc=None, iloc=None)

Return a new sample with a subset of this sample’s observations.

Select observations either by index labels (loc) or by integer positions (iloc). Exactly one of the two arguments must be provided.

Parameters

loc (Union[slice, Sequence[Any], None]) – index labels of the observations to select
iloc (Union[slice, Sequence[int], None]) – integer positions of the observations to select

Return type

Sample

Returns

copy of this sample, comprising only the observations in the given rows
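The names `loc` and `iloc` match the pandas indexers of the same names. That they behave the same way here is an assumption based on the parameter types, not something this page states. The plain-pandas sketch below, with made-up data, shows how label-based and position-based selection differ when the row labels are not simply 0, 1, 2, and so on.

```python
import pandas as pd

# Hypothetical frame whose row labels differ from the row positions.
df = pd.DataFrame({"x": [10, 20, 30, 40]}, index=[100, 101, 102, 103])

# Label-based selection: rows whose index labels are 101 and 103.
by_label = df.loc[[101, 103]]

# Position-based selection: the second and fourth rows (positions 1 and 3).
by_position = df.iloc[[1, 3]]
```

Here both selections pick the same two rows. Under the assumption above, the analogous calls on a sample built from `df` would be `sample.subsample(loc=[101, 103])` and `sample.subsample(iloc=[1, 3])`.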

IDX_FEATURE = 'feature'

Default name for the feature index (= column index) used when returning a features table.

IDX_OBSERVATION = 'observation'

Default name for the observations index (= row index) of the underlying data frame.

IDX_TARGET = 'target'

Default name for the target series or target index (= column index) used when returning the targets.

property feature_names: List[str]

The column names of all features in this sample.

Return type: List[str]

property features: pandas.DataFrame

The features for all observations.

Return type: DataFrame

property index: pandas.Index

Row index of all observations in this sample.

Return type: Index

property target: Union[pandas.Series, pandas.DataFrame]

The target variable(s) for all observations.

Represented as a series if there is only a single target, or as a data frame if there are multiple targets.

Return type: Union[Series, DataFrame]

property target_name: Union[str, List[str]]

The column name of the target in this sample, or a list of column names if this sample has multiple targets.

Return type: Union[str, List[str]]

property weight: Optional[pandas.Series]

A series indicating the weight for each observation; None if no weights are defined.

Return type: Optional[Series]

property weight_name: Optional[str]

The column name of weights in this sample; None if no weights are defined.

Return type: Optional[str]
