Adult - Bias#
This notebook computes the gender bias of scores developed the on the adult dataset. It using different bias metrics.
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder, MinMaxScaler, LabelEncoder
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression
import xgboost as xgb
from fairscoring.metrics import bias_metric_pe, bias_metric_eo, bias_metric_cal, \
WassersteinMetric, CalibrationMetric
from fairscoring.metrics.roc import bias_metric_roc, bias_metric_xroc
from tqdm.notebook import tqdm
from warnings import simplefilter
# ignore all future warnings
simplefilter(action='ignore', category=FutureWarning)
Load and pre-process data#
Load Adult data#
feature_names=['age', 'workclass', 'fnlwgt', 'education', 'education_num', 'marital_status',
'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss',
'hours_per_week', 'native_country', 'income']
dataURL = 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'
df = pd.read_csv(dataURL, delimiter=', ',header=None, names=feature_names, engine="python")
Feature Engineering#
df['native_country_bin']=df['native_country']=='United-States'
num_features=['age', 'capital_gain', 'capital_loss','hours_per_week', 'education_num']
cat_features=['workclass', 'education', 'marital_status', 'occupation', 'race', 'sex',
'native_country_bin'] #'relationship'
# Reduce categories
df = df.replace({'workclass': {
'?': 'Other/Unknown',
'Federal-gov': 'Government',
'Local-gov': 'Government',
'Never-worked': 'Other/Unknown',
'Private': 'Private',
'Self-emp-inc': 'Self-emp',
'Self-emp-not-inc': 'Self-emp',
'State-gov': 'Government',
'Without-pay': 'Other/Unknown'},
'education': {
'10th': '1-12th',
'11th': '1-12th',
'12th': '1-12th',
'1st-4th': '1-12th',
'5th-6th': '1-12th',
'7th-8th': '1-12th',
'9th': '1-12th',
'Assoc-acdm': 'Assoc',
'Assoc-voc': 'Assoc',
'Bachelors': 'University/College',
'Doctorate': 'University/College',
'HS-grad': 'HS-grad',
'Masters': 'University/College',
'Preschool': '1-12th',
'Prof-school': 'University/College',
'Some-college': 'University/College'},
'marital_status': {
'Married-AF-spouse': 'Married',
'Married-civ-spouse': 'Married',
'Married-spouse-absent': 'Married',
'Divorced': 'Div/Sep/Wid',
'Separated': 'Div/Sep/Wid',
'Widowed': 'Div/Sep/Wid'},
'relationship': {
'Husband': 'Spouse/Partner',
'Wife': 'Spouse/Partner',
'Unmarried': 'Unmarried'},
'occupation': {
'Adm-clerical': 'White-Collar',
'Craft-repair': 'Blue-Collar',
'Exec-managerial': 'White-Collar',
'Farming-fishing': 'Blue-Collar',
'Handlers-cleaners': 'Blue-Collar',
'Machine-op-inspct': 'Blue-Collar',
'Other-service': 'Service',
'Priv-house-serv': 'Service',
'Prof-specialty': 'Professional',
'Protective-serv': 'Service',
'Tech-support': 'Service',
'Transport-moving': 'Blue-Collar',
'?': 'Other/Unknown',
'Armed-Forces': 'Other/Unknown'}
})
Encoding#
# Store gener column
gender_column = df["sex"].copy()
ordinal_enc = OrdinalEncoder().fit(df[cat_features])
df[cat_features]=ordinal_enc.transform(df[cat_features])
df[cat_features]=df[cat_features].astype(int)
# Undo Encoding gender
df["sex"] = gender_column
categorical=pd.get_dummies(df[cat_features].astype(str))
numerical=MinMaxScaler().fit_transform(df[num_features])
encoder = LabelEncoder()
target=encoder.fit_transform(df['income'])
Training#
Train-Test Split#
log_reg_data = pd.concat([pd.DataFrame(categorical), pd.DataFrame(numerical)], axis=1)
log_reg_data = log_reg_data.rename(columns={0: 'age', 1: 'capital_gain', 2: 'capital_loss',
3: 'hours_per_week', 4: 'education_num'})
X_train, X_test, y_train, y_test = train_test_split(
log_reg_data, target, test_size=0.3, random_state=43)
Train LogReg Model#
Cross-Validation to check for stability#
shuffle = KFold(n_splits=5, shuffle=True, random_state=2579)
logreg = LogisticRegression(max_iter=1000)
ROC_Values=cross_val_score(logreg, X_train , y_train, cv=shuffle, scoring="roc_auc")
print('\nROC AUC values for 5-fold Cross Validation:\n',ROC_Values)
print('\nStandard Deviation of ROC AUC of the models:', round(ROC_Values.std(),3))
print('\nFinal Average ROC AUC of the model:', round(ROC_Values.mean(),3))
ROC AUC values for 5-fold Cross Validation:
[0.90249847 0.89177676 0.8820583 0.89022666 0.8969462 ]
Standard Deviation of ROC AUC of the models: 0.007
Final Average ROC AUC of the model: 0.893
Final Model#
logreg.fit(X_train, y_train)
y_pred = logreg.predict_proba(X_test)[:,1]
y_pred_train = logreg.predict_proba(X_train)[:,1]
roc_score_logreg = roc_auc_score(y_test, y_pred)
roc_score_logreg_train = roc_auc_score(y_train, y_pred_train)
print('The ROC-AUC of the Logistic Regression is', roc_score_logreg)
print('The train-ROC-AUC of the Logistic Regression is', roc_score_logreg_train)
The ROC-AUC of the Logistic Regression is 0.8975588173788007
The train-ROC-AUC of the Logistic Regression is 0.8942243079704495
Train debiased LogReg Model#
Remove Gender Information#
X_train.columns[[22,23]]
Index(['sex_Female', 'sex_Male'], dtype='object')
X_train_wosex = X_train.drop(X_train.columns[[22,23]], axis=1)
X_test_wosex = X_test.drop(X_train.columns[[22,23]], axis=1)
Cross-Validation to check for stability#
shuffle = KFold(n_splits=5, shuffle=True, random_state=2579)
logreg_wosex = LogisticRegression(max_iter=1000)
ROC_Values=cross_val_score(logreg_wosex, X_train_wosex, y_train, cv=shuffle, scoring="roc_auc")
print('\nROC AUC values for 5-fold Cross Validation:\n',ROC_Values)
print('\nStandard Deviation of ROC AUC of the models:', round(ROC_Values.std(),3))
print('\nFinal Average ROC AUC of the model:', round(ROC_Values.mean(),3))
ROC AUC values for 5-fold Cross Validation:
[0.90207961 0.89145549 0.88137445 0.88927664 0.89602997]
Standard Deviation of ROC AUC of the models: 0.007
Final Average ROC AUC of the model: 0.892
Final Model#
logreg_wosex = LogisticRegression(max_iter=1000)
logreg_wosex.fit(X_train_wosex, y_train)
y_pred_wosex = logreg_wosex.predict_proba(X_test_wosex)[:,1]
y_pred_train_wosex = logreg_wosex.predict_proba(X_train_wosex)[:,1]
roc_score_logreg_wosex = roc_auc_score(y_test, y_pred_wosex)
roc_score_logreg_wosex_train = roc_auc_score(y_train, y_pred_train_wosex)
print('The ROC-AUC of the Logistic Regression is', roc_score_logreg_wosex)
print('The train-ROC-AUC of the Logistic Regression is', roc_score_logreg_wosex_train)
The ROC-AUC of the Logistic Regression is 0.8968531931820867
The train-ROC-AUC of the Logistic Regression is 0.8935059284878036
Train XGBoost Model#
Cross-Validation to check for stability#
shuffle = KFold(n_splits=5, shuffle=True, random_state=2579)
xgb_model = xgb.XGBClassifier()
ROC_Values=cross_val_score(xgb_model, X_train , y_train, cv=shuffle, scoring="roc_auc")
print('\nROC AUC values for 5-fold Cross Validation:\n',ROC_Values)
print('\nStandard Deviation of ROC AUC of the models:', round(ROC_Values.std(),3))
print('\nFinal Average ROC AUC of the model:', round(ROC_Values.mean(),3))
ROC AUC values for 5-fold Cross Validation:
[0.92175832 0.9203497 0.91333443 0.91947067 0.92419827]
Standard Deviation of ROC AUC of the models: 0.004
Final Average ROC AUC of the model: 0.92
Final Model#
xgb_model.fit(X_train, y_train)
y_pred_xgb = xgb_model.predict_proba(X_test)[:,1]
y_pred_train_xgb = xgb_model.predict_proba(X_train)[:,1]
roc_score_xgb = roc_auc_score(y_test, y_pred_xgb)
roc_score_xgb_train = roc_auc_score(y_train, y_pred_train_xgb)
print('The ROC-AUC of the Logistic Regression is', roc_score_xgb)
print('The train-ROC-AUC of the Logistic Regression is', roc_score_xgb_train)
The ROC-AUC of the Logistic Regression is 0.9221733121562541
The train-ROC-AUC of the Logistic Regression is 0.9495695617402895
Bias Measures#
Prepare Dataset#
attribute = df.loc[X_test.index,"sex"]
groups = ['Female', 'Male']
favorable_target = encoder.transform([">50K"])[0]
models = [
("LogReg", y_pred),
("LogReg (debiased)", y_pred_wosex),
("XGBoost", y_pred_xgb)
]
List of bias metrics#
metrics = [
bias_metric_eo, # Standardized Equal Opportunity
bias_metric_pe, # Standardized Predictive Equality
bias_metric_cal, # Standardized Calibration Equality
bias_metric_roc, # ROC-Bias
bias_metric_xroc, # xROC-Bias
WassersteinMetric(fairness_type="EO",name="Equal Opportunity (U)", score_transform="rescale"),
WassersteinMetric(fairness_type="PE",name="Predictive Equality (U)", score_transform="rescale"),
CalibrationMetric(weighting="scores",name="Calibration (U)", score_transform="rescale"),
]
Compute Bias Metrics#
Compute all bias metrics for the dataset
results = []
for metric in tqdm(metrics):
for model, scores in models:
# Compute bias
bias = metric.bias(
scores, y_test, attribute,
groups=groups,
favorable_target=favorable_target,
min_score=0, max_score=1,
n_permute=1000, seed=2579)
# Store result
results.append((metric, model, bias))
Result Table#
This corresponds to table 3 in the publication.
# Models vertically arranged
results = [[
metric.name,
model,
f"{bias.bias:.3f}",
f"{bias.pos_component:.0%}",
f"{bias.neg_component:.0%}",
f"{bias.p_value:.2f}" ] for metric, model, bias in results
]
df_v = pd.DataFrame(results, columns=["metric", "model", "total", "pos", "neg", "p-value"])
df_v.set_index(["metric", "model"], inplace=True)
# Models horizontally arranged
model_names = [name for name, _ in models]
blocks = [df_v[df_v.index.get_level_values(1) == name] for name in model_names]
for i in range(len(blocks)):
blocks[i].set_index(blocks[i].index.droplevel("model"))
blocks[i] = blocks[i].reset_index()
blocks[i].drop("model", axis=1, inplace=True)
if i == 0:
metric_col = blocks[i]["metric"]
blocks[i].drop("metric", axis=1, inplace=True)
df_h = pd.concat([metric_col] + blocks, axis=1, keys=[""]+model_names)
df_h.set_index(df_h.columns[0],inplace=True)
df_h.index.names = ["Metric"]
df_h
| LogReg | LogReg (debiased) | XGBoost | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| total | pos | neg | p-value | total | pos | neg | p-value | total | pos | neg | p-value | |
| Metric | ||||||||||||
| Equal Opportunity | 0.107 | 0% | 100% | 0.00 | 0.069 | 0% | 100% | 0.00 | 0.057 | 1% | 99% | 0.00 |
| Predictive Equality | 0.164 | 0% | 100% | 0.00 | 0.121 | 0% | 100% | 0.00 | 0.143 | 0% | 100% | 0.00 |
| Calibration | 0.052 | 22% | 78% | 0.00 | 0.045 | 55% | 45% | 0.01 | 0.050 | 52% | 48% | 0.00 |
| ROC bias | 0.050 | 98% | 2% | 0.00 | 0.051 | 98% | 2% | 0.00 | 0.033 | 98% | 2% | 0.00 |
| xROC bias | 0.205 | 0% | 100% | 0.00 | 0.151 | 0% | 100% | 0.00 | 0.129 | 0% | 100% | 0.00 |
| Equal Opportunity (U) | 0.161 | 0% | 100% | 0.00 | 0.104 | 0% | 100% | 0.00 | 0.087 | 0% | 100% | 0.00 |
| Predictive Equality (U) | 0.118 | 0% | 100% | 0.00 | 0.098 | 0% | 100% | 0.00 | 0.101 | 0% | 100% | 0.00 |
| Calibration (U) | 0.105 | 20% | 80% | 0.00 | 0.102 | 50% | 50% | 0.00 | 0.138 | 62% | 38% | 0.00 |