facet.simulation.UnivariateProbabilitySimulator
- class facet.simulation.UnivariateProbabilitySimulator(model, sample, *, confidence_level=0.95, n_jobs=None, shared_memory=None, pre_dispatch=None, verbose=None)
Univariate simulation of positive class probabilities based on a binary classifier.
The simulation is carried out for one specific feature x[i] of a model, and for a range of values v[1], …, v[n] for x[i], determined by a Partitioner object.
For each value v[j] of the partitioning, a Sample of historical observations is modified by assigning value v[j] to feature x[i] for all observations, i.e., assuming that feature x[i] has the constant value v[j].
Then the classifier is used to predict the positive class probabilities for all observations, and the mean probability across all observations is calculated for each classifier and value v[j], along with the standard error of the mean as a basis for obtaining confidence intervals.
Note that sample weights are not taken into account for simulations; each observation has the same weight in the simulation even if different weights have been specified for the sample.
Also, care should be taken to re-calibrate classifiers trained on weighted samples as the weighted samples will impact predicted class probabilities.
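The procedure described above can be illustrated with a minimal, dependency-free sketch. It does not use the facet API: `predict_positive_proba` is a hypothetical stand-in for a fitted binary classifier, and `simulate` is an illustrative helper, not a library function. Observations are unweighted, matching the note on sample weights.

```python
import math
import statistics


def predict_positive_proba(row):
    """Hypothetical stand-in for a fitted binary classifier (fixed logistic model)."""
    z = 0.8 * row["x0"] - 0.5 * row["x1"]
    return 1.0 / (1.0 + math.exp(-z))


def simulate(observations, feature, values):
    """For each value v, set `feature` to v in every observation,
    then compute the mean predicted probability and its standard error."""
    results = {}
    for v in values:
        # Assume the feature has the constant value v for all observations
        modified = [{**row, feature: v} for row in observations]
        probas = [predict_positive_proba(row) for row in modified]
        mean = statistics.fmean(probas)
        # Standard error of the mean: sample standard deviation / sqrt(n)
        sem = statistics.stdev(probas) / math.sqrt(len(probas))
        results[v] = (mean, sem)
    return results


observations = [
    {"x0": 0.2, "x1": 1.0},
    {"x0": 1.5, "x1": -0.3},
    {"x0": -0.7, "x1": 0.4},
    {"x0": 0.9, "x1": 2.1},
]
simulation = simulate(observations, feature="x0", values=[-1.0, 0.0, 1.0])
for value, (mean, sem) in simulation.items():
    print(f"x0 = {value:+.1f}: mean probability {mean:.3f}, SEM {sem:.3f}")
```

Because the stand-in model has a positive coefficient for `x0`, the mean probability rises as the simulated value increases; all other features keep their historical values.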
- Bases
BaseUnivariateSimulator[ClassifierDF]
- Metaclasses
- Parameters
model (BaseUnivariateSimulator) – a fitted learner to use for calculating simulated outputs
sample (Sample) – the sample to be used for baseline calculations and simulations
confidence_level (float) – the width \(\alpha\) of the confidence interval to be estimated for simulation results
n_jobs (Optional[int]) – number of jobs to use in parallel; if None, use joblib default (default: None)
shared_memory (Optional[bool]) – if True, use threads in the parallel runs; if False or None, use multiprocessing (default: None)
pre_dispatch (Union[int, str, None]) – number of batches to pre-dispatch; if None, use joblib default (default: None)
verbose (Optional[int]) – verbosity level used in the parallel computation; if None, use joblib default (default: None)
Method summary
baseline
Calculate the expectation value of the simulation result, based on historically observed actuals.
expected_output
Calculate the actual observed frequency of the positive class as the baseline of the simulation.
simulate_feature
Simulate the average target uplift when fixing the value of the given feature across all observations.
Attribute summary
Unit of the output values calculated by the simulation.
n_jobs
Number of jobs to use in parallel; if None, use joblib default.
shared_memory
If True, use threads in the parallel runs; if False or None, use multiprocessing.
pre_dispatch
Number of batches to pre-dispatch; if None, use joblib default.
verbose
Verbosity level used in the parallel computation; if None, use joblib default.
model
The learner pipeline used to conduct simulations.
sample
The sample to be used in baseline calculations and simulations.
confidence_level
The width of the confidence interval used to calculate the lower/upper bound of the simulation.
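As an illustration of how a confidence level can turn a mean and its standard error into lower and upper bounds, here is a minimal sketch. It assumes a normal approximation, which may differ from the distribution the library actually uses; `confidence_bounds` is a hypothetical helper and not part of facet.

```python
from statistics import NormalDist


def confidence_bounds(mean, sem, confidence_level=0.95):
    """Two-sided interval around `mean` under a normal approximation (an assumption)."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    # Quantile that leaves (1 - confidence_level) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    return mean - z * sem, mean + z * sem


lower, upper = confidence_bounds(mean=0.40, sem=0.02, confidence_level=0.95)
print(f"95% interval: [{lower:.4f}, {upper:.4f}]")
```

A wider confidence level yields a larger quantile and therefore a wider interval around the same mean.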
Definitions
- baseline()
Calculate the expectation value of the simulation result, based on historically observed actuals.
- Return type
- Returns
the expectation value of the simulation results
- expected_output()
Calculate the actual observed frequency of the positive class as the baseline of the simulation.
- Return type
- Returns
observed frequency of the positive class
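The observed frequency of the positive class is the share of historical observations whose target equals the positive class. A minimal sketch, with a hypothetical helper name and made-up labels, and without sample weights:

```python
def observed_positive_frequency(targets, positive_class):
    """Fraction of observed targets equal to the positive class (unweighted)."""
    if not targets:
        raise ValueError("targets must not be empty")
    return sum(1 for t in targets if t == positive_class) / len(targets)


targets = ["yes", "no", "no", "yes", "no"]
frequency = observed_positive_frequency(targets, positive_class="yes")
print(frequency)  # 2 of 5 observations are positive
```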
- simulate_feature(feature_name, *, partitioner, **partitioner_params)
Simulate the average target uplift when fixing the value of the given feature across all observations.
Simulations are run for a set of values determined by the given partitioner, which is fitted to the observed values for the feature being simulated.
- Parameters
feature_name (str) – the feature to run the simulation for
partitioner (Partitioner[TypeVar(T_Value, bound=generic)]) – the partitioner of feature values to run simulations for
partitioner_params (Any) – additional parameters to pass to the partitioner
- Return type
UnivariateSimulationResult[TypeVar(T_Value, bound=generic)]
- Returns
a mapping of output names to simulation results
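To make the idea of a partitioner fitted to observed feature values concrete, the sketch below derives a small set of simulation values from the observed range of a feature. `EqualWidthPartitioner`, its `fit` method, and its `partitions_` attribute are hypothetical and do not reproduce the interface of facet's Partitioner classes; equal-width binning is only one possible strategy.

```python
class EqualWidthPartitioner:
    """Hypothetical partitioner: splits the observed range into equal-width
    partitions and represents each partition by its central value."""

    def __init__(self, n_partitions=5):
        if n_partitions < 1:
            raise ValueError("n_partitions must be at least 1")
        self.n_partitions = n_partitions
        self.partitions_ = []

    def fit(self, values):
        """Fit the partitions to the observed values of one feature."""
        lower, upper = min(values), max(values)
        width = (upper - lower) / self.n_partitions
        # Central value of each partition; these are the values to simulate
        self.partitions_ = [
            lower + (i + 0.5) * width for i in range(self.n_partitions)
        ]
        return self


observed = [2.0, 3.5, 4.0, 7.5, 10.0]
partitioner = EqualWidthPartitioner(n_partitions=4).fit(observed)
print(partitioner.partitions_)
```

Each central value would then play the role of one v[j] in a simulation, with the feature set to that value for every observation.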