facet.simulation.UnivariateTargetSimulator

class facet.simulation.UnivariateTargetSimulator(model, sample, *, confidence_level=0.95, n_jobs=None, shared_memory=None, pre_dispatch=None, verbose=None)

Univariate simulation of the absolute output of a regression model.

The simulation is carried out for one specific feature x[i] of a model, and for a range of values v[1], …, v[n] for x[i], determined by a Partitioner object.

For each value v[j] of the partitioning, a Sample of historical observations is modified by assigning value v[j] for feature x[i] for all observations, i.e., assuming that feature x[i] has the constant value v[j].

Then the regressor is used to predict the output for all observations, and the mean output across all observations is calculated for each regressor and value v[j], along with the standard error of the mean as a basis for obtaining confidence intervals.

Note that sample weights are not taken into account for simulations; each observation has the same weight in the simulation even if different weights have been specified for the sample.
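The procedure described above can be sketched in plain Python. This is a minimal illustration rather than the library's implementation: the helper simulate_feature_sketch, the dictionary-based observations, and the toy_predict function standing in for a fitted regressor are all hypothetical names introduced here.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Callable, Dict, List, Sequence, Tuple

Observation = Dict[str, float]


def simulate_feature_sketch(
    predict: Callable[[Observation], float],
    observations: Sequence[Observation],
    feature: str,
    values: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (value, mean prediction, standard error of the mean) per value."""
    results = []
    for v in values:
        # Overwrite the simulated feature with the constant v in every observation.
        modified = [{**obs, feature: v} for obs in observations]
        predictions = [predict(obs) for obs in modified]
        # Unweighted mean and standard error of the mean across all observations.
        sem = stdev(predictions) / sqrt(len(predictions))
        results.append((v, mean(predictions), sem))
    return results


def toy_predict(obs: Observation) -> float:
    # Hypothetical stand-in for a fitted regressor: y = 2 * x0 + x1
    return 2.0 * obs["x0"] + obs["x1"]


observations = [
    {"x0": 1.0, "x1": 0.0},
    {"x0": 2.0, "x1": 1.0},
    {"x0": 3.0, "x1": 2.0},
]
simulated = simulate_feature_sketch(toy_predict, observations, "x0", [0.0, 1.0])
```

Every observation contributes equally to the mean, which mirrors the note that sample weights are ignored during simulation.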

Bases

UnivariateRegressionSimulator

Metaclasses

ABCMeta

Parameters
  • model (BaseUnivariateSimulator) – a fitted learner to use for calculating simulated outputs

  • sample (Sample) – the sample to be used for baseline calculations and simulations

  • confidence_level (float) – the width \(\alpha\) of the confidence interval to be estimated for simulation results (default: 0.95)

  • n_jobs (Optional[int]) – number of jobs to use in parallel; if None, use joblib default (default: None)

  • shared_memory (Optional[bool]) – if True, use threads in the parallel runs; if False or None, use multiprocessing (default: None)

  • pre_dispatch (Union[int, str, None]) – number of batches to pre-dispatch; if None, use joblib default (default: None)

  • verbose (Optional[int]) – verbosity level used in the parallel computation; if None, use joblib default (default: None)
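To make the role of confidence_level concrete, the sketch below turns a mean output and its standard error into lower and upper bounds. It assumes a two-sided interval under a normal approximation; the library may use a different distribution (for example Student's t), and the function name confidence_bounds is hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def confidence_bounds(
    mean_output: float, sem: float, confidence_level: float = 0.95
) -> Tuple[float, float]:
    """Return (lower, upper) bounds of a two-sided interval around the mean."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    # Assumption: normal approximation; quantile at 0.5 + level / 2
    # leaves (1 - level) / 2 probability in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    return mean_output - z * sem, mean_output + z * sem


lower, upper = confidence_bounds(10.0, 1.0, confidence_level=0.95)
```

A wider confidence level yields a larger quantile and therefore a wider interval around the same mean.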

Method summary

baseline

Calculate the expectation value of the simulation result, based on historically observed actuals.

expected_output

Calculate the mean of actually observed values for the target.

simulate_feature

Simulate the average target uplift when fixing the value of the given feature across all observations.

Attribute summary

output_unit

Unit of the output values calculated by the simulation.

n_jobs

Number of jobs to use in parallel; if None, use joblib default.

shared_memory

If True, use threads in the parallel runs; if False or None, use multiprocessing.

pre_dispatch

Number of batches to pre-dispatch; if None, use joblib default.

verbose

Verbosity level used in the parallel computation; if None, use joblib default.

model

The learner pipeline used to conduct simulations.

sample

The sample to be used in baseline calculations and simulations.

confidence_level

The width of the confidence interval used to calculate the lower/upper bound of the simulation.

Definitions

baseline()

Calculate the expectation value of the simulation result, based on historically observed actuals.

Return type

float

Returns

the expectation value of the simulation results

expected_output()

Calculate the mean of actually observed values for the target.

Return type

float

Returns

mean observed value of the target

simulate_feature(feature_name, *, partitioner, **partitioner_params)

Simulate the average target uplift when fixing the value of the given feature across all observations.

Simulations are run for a set of values determined by the given partitioner, which is fitted to the observed values for the feature being simulated.

Parameters
  • feature_name (str) – the feature to run the simulation for

  • partitioner (Partitioner[TypeVar(T_Value, bound= generic)]) – the partitioner of feature values to run simulations for

  • partitioner_params (Any) – additional parameters to pass to the partitioner

Return type

UnivariateSimulationResult[TypeVar(T_Value, bound= generic)]

Returns

a mapping of output names to simulation results
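As a rough illustration of how a set of simulation values might be derived from the observed values of a feature, the sketch below splits the observed range into equal-width bins and returns their central values. It is not the library's Partitioner: the function equal_width_partitions and its max_partitions argument are hypothetical, and a real partitioner may round bin boundaries or handle outliers differently.

```python
from typing import List, Sequence


def equal_width_partitions(
    observed: Sequence[float], max_partitions: int = 5
) -> List[float]:
    """Return the central values of equal-width bins spanning the observed range."""
    if not observed:
        raise ValueError("observed must contain at least one value")
    if max_partitions < 1:
        raise ValueError("max_partitions must be at least 1")
    low, high = min(observed), max(observed)
    if low == high:
        # All observed values are identical: a single partition suffices.
        return [float(low)]
    width = (high - low) / max_partitions
    # The centre of bin i lies half a bin width past its lower bound.
    return [low + width * (i + 0.5) for i in range(max_partitions)]


values = equal_width_partitions([0.0, 2.0, 4.0, 10.0], max_partitions=5)
```

Each returned value would then be assigned to the feature for all observations in turn, one simulation per value.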

property output_unit: str

Unit of the output values calculated by the simulation.

Return type

str