Concepts#
The aim of this package is to provide different metrics that measure bias in continuous scoring systems.
Fairness Concepts#
Given a score or prediction \(S\in\mathbb{R}\), a binary target variable \(Y\in\{0,1\}\) and a (binary) group variable \(A\in\{a,b\}\), we can formulate the following concepts for group fairness:
- Independence Bias
A score fulfills independence fairness if the score distribution is independent of the group, i.e. \(S\perp \!\!\! \perp A\). Note that a (hypothetical) perfect predictor (where predictions are equal to the outcome) will fail independence fairness if the base rates of the outcome differ between groups. For this reason, we do not recommend using it if a reliable target variable is available.
To quantify the bias, one can measure the difference between the two distributions \(S|A=a\) and \(S|A=b\). In our package, this is done via WassersteinMetric (with fairness_type='IND'), which computes the Wasserstein distance between these two distributions.
- Separation Bias
A score fulfills separation fairness if the score distribution is independent of the group given the outcome, i.e. \(S\perp \!\!\! \perp A \,|\, Y\).
We implement two metrics to measure the magnitude of the separation bias:
Equal Opportunity compares the score distributions of samples with favorable outcome, i.e. \(S|A=a,Y=0\) and \(S|A=b,Y=0\).
Predictive Equality compares the score distributions of samples with unfavorable outcome, i.e. \(S|A=a,Y=1\) and \(S|A=b,Y=1\).
Both metrics are implemented in WassersteinMetric (with either fairness_type='EO' or fairness_type='PE'). Both settings compute the Wasserstein distance between the distributions mentioned above.
- Sufficiency / Calibration Bias
A score fulfills calibration fairness if the outcome is independent of the group given the score, i.e. \(Y\perp \!\!\! \perp A \,|\, S\). For scores, we use calibration and sufficiency bias synonymously. Note that for fairness of (binary) decisions, the term sufficiency is used. In this sense, sufficiency is the broader concept.
In this package, the magnitude of the calibration bias can be measured via CalibrationMetric, which measures the differences between calibration curves.
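The three concepts above can be made concrete with a small, dependency-free sketch. It does not use the package's classes: `wasserstein_1d`, `select` and `calibration_curve` are hypothetical helpers written for this illustration, the toy data is made up, and the binned calibration gap is only one simple way to compare calibration curves (it assumes scores in \([0,1]\)), not necessarily what CalibrationMetric does.

```python
from bisect import bisect_right


def wasserstein_1d(x, y):
    """1-Wasserstein distance of two empirical samples (area between their CDFs)."""
    xs, ys = sorted(x), sorted(y)
    grid = sorted(set(xs) | set(ys))
    area = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        area += abs(cdf_x - cdf_y) * (right - left)
    return area


def select(scores, target, group, g, y=None):
    """Return (scores, targets) of group g, optionally restricted to outcome y."""
    rows = [(s, t) for s, t, a in zip(scores, target, group)
            if a == g and (y is None or t == y)]
    return [s for s, _ in rows], [t for _, t in rows]


def calibration_curve(scores, target, n_bins):
    """Observed outcome rate per equal-width score bin on [0, 1] (None if empty)."""
    hits, counts = [0] * n_bins, [0] * n_bins
    for s, t in zip(scores, target):
        b = min(int(s * n_bins), n_bins - 1)
        counts[b] += 1
        hits[b] += t
    return [h / c if c else None for h, c in zip(hits, counts)]


# Toy data (made up): score S, outcome Y, group A for eight samples.
scores = [0.1, 0.4, 0.5, 0.9, 0.2, 0.5, 0.8, 1.0]
target = [0, 0, 1, 1, 0, 0, 1, 1]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

s_a, y_a = select(scores, target, group, "a")
s_b, y_b = select(scores, target, group, "b")

# Independence: compare S|A=a with S|A=b.
ind_bias = wasserstein_1d(s_a, s_b)
# Equal Opportunity: compare S|A=a,Y=0 with S|A=b,Y=0.
eo_bias = wasserstein_1d(select(scores, target, group, "a", y=0)[0],
                         select(scores, target, group, "b", y=0)[0])
# Predictive Equality: compare S|A=a,Y=1 with S|A=b,Y=1.
pe_bias = wasserstein_1d(select(scores, target, group, "a", y=1)[0],
                         select(scores, target, group, "b", y=1)[0])

# Calibration: mean absolute gap between the binned group-wise curves.
curve_a = calibration_curve(s_a, y_a, n_bins=2)
curve_b = calibration_curve(s_b, y_b, n_bins=2)
gaps = [abs(p - q) for p, q in zip(curve_a, curve_b)
        if p is not None and q is not None]
cal_bias = sum(gaps) / len(gaps)

print(ind_bias, eo_bias, pe_bias, cal_bias)
```

On this toy data the three Wasserstein-based values differ (roughly 0.15, 0.1 and 0.2), which shows that conditioning on the outcome can change the measured bias.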
Further Metrics#
Besides the metrics above, there are other ways to measure differences between groups. The downside of these metrics is the lack of a clear connection to one of the fairness concepts. Nevertheless, this package also provides some of these metrics:
- ROC / ABROCA Metrics
These metrics measure the Absolute Between-ROC Area (ABROCA). ROC-based measures are available through ROCBiasMetric. We distinguish the following two metrics, which compare different ROC curves:
Setting bias_type='roc' computes the area between the group-wise ROC curves.
Setting bias_type='xroc' (cross-ROC) builds ROC curves with \(Y=0\) samples from one group and \(Y=1\) samples from the other group.
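As a rough illustration of the two variants, the sketch below computes the area between two ROC curves in plain Python. It does not call ROCBiasMetric. The helpers `roc_curve`, `tpr_at` and `area_between` are hypothetical, the toy data is made up, and the code assumes that \(Y=1\) is the positive class and that higher scores indicate \(Y=1\). The integral is approximated with a midpoint rule, and the cross-ROC part follows one plausible reading of the description above.

```python
def roc_curve(pos, neg):
    """ROC points (fpr, tpr); a sample is predicted positive if score >= threshold."""
    points = [(0.0, 0.0)]
    for thr in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(s >= thr for s in pos) / len(pos)
        fpr = sum(s >= thr for s in neg) / len(neg)
        points.append((fpr, tpr))
    return points


def tpr_at(points, fpr):
    """Linearly interpolate the TPR of a ROC curve at a given FPR."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 > x0 and x0 <= fpr <= x1:  # skip vertical segments
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
    return points[-1][1]


def area_between(curve_1, curve_2, n=1000):
    """Midpoint-rule approximation of the area between two ROC curves."""
    mids = [(i + 0.5) / n for i in range(n)]
    return sum(abs(tpr_at(curve_1, f) - tpr_at(curve_2, f)) for f in mids) / n


# Toy data (made up): scores split by group and outcome.
pos_a, neg_a = [0.5, 0.9], [0.1, 0.4]  # group a: Y=1 and Y=0 samples
pos_b, neg_b = [0.8, 1.0], [0.2, 0.5]  # group b: Y=1 and Y=0 samples

# 'roc': each curve uses positives and negatives from the same group.
roc_bias = area_between(roc_curve(pos_a, neg_a), roc_curve(pos_b, neg_b))

# 'xroc': each curve mixes positives of one group with negatives of the other.
xroc_bias = area_between(roc_curve(pos_a, neg_b), roc_curve(pos_b, neg_a))

print(roc_bias, xroc_bias)
```

Here both groups are perfectly separable on their own, so the group-wise ROC curves coincide and `roc_bias` is zero, while the cross-ROC comparison still detects that group b receives systematically higher scores.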
API Concepts#
Metrics#
In this package, each bias metric is implemented as an instance of the BaseBiasMetric class.
The main method of this class is bias(), which takes three arrays and
some metadata to compute the bias. These three arrays are:
The score value of each sample
The target variable / the actual outcome
The attribute or group each sample belongs to
For convenience, a BaseBiasMetric instance is also callable. Calling a metric
returns the bias as a single float value.
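The calling pattern described here can be mimicked with a toy class. The sketch below is not the package's implementation: `ToyBiasResult` and `ToyMeanGapMetric` are hypothetical names, the `groups` argument stands in for the unspecified metadata, and the "metric" is just the absolute gap between group-wise mean scores.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ToyBiasResult:
    """Hypothetical result object holding a single bias value."""
    bias: float


class ToyMeanGapMetric:
    """Hypothetical metric: absolute gap between the group-wise mean scores."""

    def bias(self, scores, target, attribute, groups):
        # `target` is part of the three-array interface; this toy metric ignores it.
        first, second = groups
        mean_first = mean(s for s, a in zip(scores, attribute) if a == first)
        mean_second = mean(s for s, a in zip(scores, attribute) if a == second)
        return ToyBiasResult(bias=abs(mean_first - mean_second))

    def __call__(self, scores, target, attribute, groups):
        # Calling the metric returns only the float value of the result.
        return self.bias(scores, target, attribute, groups).bias


# Toy data (made up): the three arrays described above.
scores = [0.1, 0.4, 0.5, 0.9, 0.2, 0.5, 0.8, 1.0]
target = [0, 0, 1, 1, 0, 0, 1, 1]
attribute = ["a", "a", "a", "a", "b", "b", "b", "b"]

metric = ToyMeanGapMetric()
result = metric.bias(scores, target, attribute, groups=("a", "b"))
value = metric(scores, target, attribute, groups=("a", "b"))
print(result, value)
```

Keeping `bias()` and `__call__` separate lets code that only needs a number stay short, while code that needs more detail can work with the result object.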
Bias Results#
The method bias() returns a BiasResult
object. In its basic form, this class only contains a single bias value. The idea is to extend this class to return
further data specific to some bias metrics.
Most notable is TwoGroupBiasResult, which is currently supported by every bias metric.
Besides the pure bias value, it also contains a split into positive and negative bias.
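How the split is computed is not specified here. One plausible reading for a CDF-based distance is sketched below: the area between the two empirical CDFs is divided into the part where the first group's CDF lies above the second group's and the part where it lies below. `ToyTwoGroupResult` and its fields `pos` and `neg` are hypothetical names for this illustration, not the attributes of TwoGroupBiasResult.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class ToyTwoGroupResult:
    """Hypothetical result: total bias plus its positive and negative parts."""
    bias: float
    pos: float
    neg: float


def signed_cdf_areas(x, y):
    """Split the area between two empirical CDFs by the sign of their difference."""
    xs, ys = sorted(x), sorted(y)
    grid = sorted(set(xs) | set(ys))
    pos = neg = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        diff = bisect_right(xs, left) / len(xs) - bisect_right(ys, left) / len(ys)
        if diff > 0:
            pos += diff * (right - left)
        else:
            neg += -diff * (right - left)
    return ToyTwoGroupResult(bias=pos + neg, pos=pos, neg=neg)


# Toy scores (made up): group a is more spread out than group b.
result = signed_cdf_areas([0.1, 0.9], [0.4, 0.6])
print(result)
```

In this example both parts are about 0.15: the first group is over-represented at low scores and at high scores, which a single bias value of about 0.3 would not reveal.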
Plots#
The package contains a number of plots that visualize the bias and help to understand it better. Each plot takes an axes object, which makes it possible to combine multiple plots into one bias figure.
See the fairscoring.plots module for a list of supported plots.
Examples for their usage can be found in the examples section.