How can we ensure Machine Learning models are fair?

Philip Botros
Philip is an Applied Scientist at Onfido, specializing in the detection of fraudulent documents with Machine Learning models.

There has been a lot of focus on fairness in machine learning recently—it’s become clear that there is a problem that needs solving. It is less clear, however, how to solve this problem or, in the first place, how to translate the notion of fairness to something measurable. 

This article examines how we can create a mathematical definition of fairness that (1) can be used to measure the fairness of existing systems and (2) can be incorporated into the machine learning pipeline to actively mitigate bias. We’ll also expand on what constitutes a fair representation of data and how such a representation can be learned with modern machine learning models.

How does bias creep into current models?

Our reliance on algorithmic decision making is increasing every day. In most instances, these scenarios rely on machine learning techniques that recognize patterns from historical data. While this is often a successful strategy, it can pose a significant threat when these patterns are based on biases found in the data. 

Broadly, these biases arise in two ways.

  1. A standard machine learning model, which builds on historical data, can incorporate the biases found in the data during training. This can lead to subsequent predictions being made based on these biases. 

  2. A less obvious case arises when the data is not necessarily biased, but there is simply less data available for a minority group. With less data to work with, modern machine learning techniques in particular are more likely to produce modeling inaccuracies for that group.

So what’s the problem with these modeling inaccuracies?

Well, unjust biases can have real-world impacts. There are growing concerns that credit assignments, mortgage approvals or the provision of healthcare aren’t as fair as they should be, because of biases based on historical data. This has motivated large tech organizations to release supporting software for various parts of the machine learning lifecycle.

Google has released multiple fairness diagnostic tools, which cover both diagnostics of current models and analysis of long-term impacts. Earlier this year it also released the TensorFlow Constrained Optimization Library, a software library for training fair models. Microsoft recently (May 2020) released Fairlearn, a tool for assessing and improving fairness. And IBM released the AI Fairness 360 Open Source Toolkit last year to help mitigate bias throughout the whole lifecycle.

For businesses that operate all over the world, like Onfido, unjust biases can be a real problem. Every individual or group should have the same set of opportunities, regardless of gender, age or ethnicity. For those of us who work with machine learning models, it is imperative that we try to minimize cases of unjust bias and understand how bias arises in our models.

How do we mathematically denote a notion of fairness?

We have to define a mathematical representation of fairness before we can incorporate it in our model. Broadly speaking, fairness in machine learning can be divided into two categories: 

  1. Group fairness 

  2. Individual fairness

This article will focus on group fairness, which aims to make the output of a model independent of membership in a sensitive (sub)group of the population of interest.

This could mean equality with respect to a multitude of metrics. A widely chosen metric is demographic parity, which requires the rate of positive predictions to be equal across groups. Another example is equality of opportunity, where the false rejection rate, defined as the percentage of legitimate applicants who are rejected, is equalized across groups.
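
To make the false rejection rate concrete, here is a minimal sketch in plain Python. The function names and the toy data are made up for illustration; labels use 1 for a legitimate applicant and predictions use 1 for an accepted applicant.

```python
def false_rejection_rate(y_true, y_pred):
    """Share of legitimate applicants (y_true == 1) that were rejected (y_pred == 0)."""
    legit = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not legit:
        raise ValueError("no legitimate applicants in this group")
    return sum(1 for p in legit if p == 0) / len(legit)


def frr_by_group(y_true, y_pred, groups):
    """Return {group: false rejection rate} for every group value present."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, s in enumerate(groups) if s == g]
        rates[g] = false_rejection_rate([y_true[i] for i in idx],
                                        [y_pred[i] for i in idx])
    return rates


# Toy data (illustrative only): five applicants in each of two groups.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

rates = frr_by_group(y_true, y_pred, groups)
frr_gap = abs(rates[0] - rates[1])
print(rates, frr_gap)
```

In this toy example group 0 loses one of its four legitimate applicants and group 1 loses two of four, so the two false rejection rates differ by 0.25. Equality of opportunity would ask for that difference to be close to zero.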

Let’s zoom in on the simplest metric, demographic parity, to illustrate how we can enforce this in practice. Demographic parity states that, for any group of interest in our dataset, the probability of being assigned to the predicted positive class should be equal. A practical example of this is a university admitting students from a disadvantaged background at the same rate as students from a privileged background, regardless of differences in the quality of their underlying applications. If we denote membership of a specific group with either S=0 or S=1, and P(Y|X) as the probability the classifier assigns to outcome Y given a data point X, a straightforward optimization metric for a classifier with respect to two groups is:

min |E[P(Y|X, S=0)] - E[P(Y|X, S=1)]|,

(note that we take the expectation over each whole group, since we want the groups to be equal on average; the outer vertical bars denote the absolute value of the difference).

This metric simply measures the expected difference in classification predictions between two different groups. Minimizing this as an additional objective during training would force the model to, on average, output the same probability, regardless of group membership.
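
As a minimal sketch, the metric can be estimated from a batch of predicted probabilities by replacing each expectation with a group average. The function name and the toy numbers below are assumptions for illustration.

```python
def demographic_parity_gap(probs, groups):
    """Estimate |E[P(Y|X, S=0)] - E[P(Y|X, S=1)]| from per-sample probabilities."""
    p0 = [p for p, s in zip(probs, groups) if s == 0]
    p1 = [p for p, s in zip(probs, groups) if s == 1]
    if not p0 or not p1:
        raise ValueError("both groups need at least one sample")
    return abs(sum(p0) / len(p0) - sum(p1) / len(p1))


# Toy predicted probabilities of the positive class (illustrative only).
probs = [0.9, 0.7, 0.8, 0.4, 0.3, 0.5]
groups = [0, 0, 0, 1, 1, 1]

gap = demographic_parity_gap(probs, groups)
print(round(gap, 3))
```

Here the average predicted probability is 0.8 for group 0 and 0.4 for group 1, so the gap is 0.4. A model that satisfies demographic parity on this batch would have a gap of zero.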

Managing a trade-off between fairness and classification accuracy

In an ideal world, we would like to learn representations that are completely fair while maintaining the highest classification accuracy our model is able to achieve. In practice, adding these fairness constraints will have a negative impact on classification accuracy.

For example, the objective functions in common use optimize for accuracy directly, so any additional constraint we add pulls the model away from this original objective. However, most fair models provide a parameter that we can set to control this trade-off between fairness and accuracy. This is illustrated in Figure 1.

FIGURE 1: Illustration of decision boundaries under different fairness constraints [1]

As the fairness constraint gets stricter in the figure (measured by the p%-rule), the accuracy decreases. This is because the model learns decision boundaries that are less optimal from a classification perspective in order to satisfy the increasingly strong fairness constraint.

This is also visible graphically: the decision boundary moves further away from the close-to-optimal solid blue line as the measure of fairness increases. In practice, the right trade-off depends on the problem at hand and the objectives of the policy makers using the algorithm.
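
The p%-rule used in the figure is not defined in the text. As commonly defined, it is the ratio between the smaller and the larger positive-prediction rate of the two groups, expressed as a percentage, so 100% means perfect demographic parity. The sketch below assumes that definition; the function name and data are illustrative.

```python
def p_percent_rule(y_pred, groups):
    """Ratio, in percent, of the smaller to the larger positive rate of groups 0 and 1."""
    rate = {}
    for g in (0, 1):
        preds = [p for p, s in zip(y_pred, groups) if s == g]
        rate[g] = sum(preds) / len(preds)
    if max(rate.values()) == 0:
        return 100.0  # nobody is accepted in either group
    return 100.0 * min(rate.values()) / max(rate.values())


# Toy binary decisions (illustrative only): 1 = positive outcome.
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

score = p_percent_rule(y_pred, groups)
print(round(score, 1))
```

In this example group 0 is accepted at a rate of 0.8 and group 1 at a rate of 0.4, so the classifier satisfies only a 50%-rule.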

Optimizing for fairness at different stages of the pipeline

Regularization during training is one way of adding fairness to machine learning models: we optimize for the chosen fairness constraints by adding them to the objective function. This works under the assumption that the model and the relevant data are both available to a particular vendor or practitioner. That isn’t necessarily a valid assumption in every scenario, so different use cases require different strategies.
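
A minimal sketch of fairness regularization during training, assuming a logistic regression model trained by gradient descent in plain Python. The penalty is the squared demographic parity gap (squared rather than absolute so that it is differentiable everywhere), and the weight lam, the synthetic data and all function names are assumptions for illustration, not a method taken from the cited work.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, s, lam, lr=0.1, steps=500):
    """Gradient descent on: mean cross-entropy + lam * (demographic parity gap)^2."""
    n, d = len(X), len(X[0])
    n0, n1 = s.count(0), s.count(1)
    # c[i] is the derivative of the signed gap with respect to p[i]
    c = [1.0 / n0 if si == 0 else -1.0 / n1 for si in s]
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        p = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]
        gap = sum(ci * pi for ci, pi in zip(c, p))  # mean(p | S=0) - mean(p | S=1)
        # derivative of the full loss with respect to each logit
        g = [(pi - yi) / n + lam * 2.0 * gap * ci * pi * (1.0 - pi)
             for pi, yi, ci in zip(p, y, c)]
        for j in range(d):
            w[j] -= lr * sum(gi * x[j] for gi, x in zip(g, X))
        b -= lr * sum(g)
    return w, b


def evaluate(X, y, s, w, b):
    """Return (accuracy, demographic parity gap) of the trained model."""
    p = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]
    acc = sum((pi > 0.5) == (yi == 1) for pi, yi in zip(p, y)) / len(y)
    m0 = sum(pi for pi, si in zip(p, s) if si == 0) / s.count(0)
    m1 = sum(pi for pi, si in zip(p, s) if si == 1) / s.count(1)
    return acc, abs(m0 - m1)


# Synthetic data (illustrative only): x1 is correlated with the group, x2 is not.
random.seed(0)
s = [i % 2 for i in range(200)]
X, y = [], []
for si in s:
    x1 = random.gauss(1.0 if si == 1 else -1.0, 1.0)
    x2 = random.gauss(0.0, 1.0)
    X.append([x1, x2])
    y.append(1 if x1 + x2 > 0 else 0)

acc_plain, gap_plain = evaluate(X, y, s, *train(X, y, s, lam=0.0))
acc_fair, gap_fair = evaluate(X, y, s, *train(X, y, s, lam=5.0))
print(f"lam=0: accuracy={acc_plain:.2f}, gap={gap_plain:.2f}")
print(f"lam=5: accuracy={acc_fair:.2f}, gap={gap_fair:.2f}")
```

Raising lam pushes the model to rely less on the group-correlated feature, which shrinks the gap between the groups. Sweeping lam over a range of values is one simple way to explore the fairness and accuracy trade-off discussed earlier.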

Another option is to preprocess the training datasets [2]. Here, the features and the sensitive information are decorrelated before training, with minimal impact on the data or decision rules. This is interesting in cases where the party that holds the data has no access to the training pipeline. As an example, think about a vendor that sells data, or a party that releases data, for use in downstream tasks and has no influence on the training or decision process thereafter.
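
As a deliberately simple stand-in for such preprocessing (not the method of the cited work), the sketch below shifts each group of a numeric feature so that all groups share the overall mean. This removes the linear correlation between the feature and a binary sensitive attribute while changing each value only by a constant per group. The feature name and values are made up.

```python
def covariance(a, b):
    """Population covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def decorrelate(column, groups):
    """Shift every group so that its mean equals the overall mean of the column."""
    overall = sum(column) / len(column)
    group_mean = {}
    for g in set(groups):
        vals = [x for x, s in zip(column, groups) if s == g]
        group_mean[g] = sum(vals) / len(vals)
    return [x - group_mean[s] + overall for x, s in zip(column, groups)]


# Toy feature (illustrative only) whose level differs strongly between groups.
income = [20.0, 25.0, 30.0, 50.0, 55.0, 60.0]
groups = [0, 0, 0, 1, 1, 1]

fair_income = decorrelate(income, groups)
print(fair_income)
print(covariance(income, groups), covariance(fair_income, groups))
```

After the shift, the covariance between the feature and the group indicator is zero, while the ordering of individuals within each group is preserved. Note that this only removes differences in group means; non-linear dependence on the sensitive attribute can remain.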

Lastly, post-processing to obtain fairness is also possible. This is done by adjusting the classifier after training, when the training pipeline is unavailable or retraining is costly [3]. The classifier is recalibrated after training by setting its decision threshold so that a chosen fairness criterion is satisfied as well as possible.
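
A minimal sketch of threshold-based post-processing, assuming we only have the scores of an already trained classifier. It picks one threshold per group so that every group is accepted at the same target rate, which corresponds to demographic parity; the cited work targets equality of opportunity instead, which would additionally require labels. The function names, scores and target rate are illustrative.

```python
def threshold_for_rate(scores, target_rate):
    """Score of the k-th highest sample, where k is about target_rate * len(scores).

    Accepting every score >= this value admits at least k samples.
    """
    ranked = sorted(scores, reverse=True)
    k = max(1, round(target_rate * len(ranked)))
    return ranked[k - 1]


def group_thresholds(scores, groups, target_rate):
    """Return {group: threshold} so each group is accepted at the target rate."""
    return {
        g: threshold_for_rate([x for x, s in zip(scores, groups) if s == g], target_rate)
        for g in set(groups)
    }


# Toy classifier scores (illustrative only); group 1 receives lower scores overall.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.6, 0.5, 0.4, 0.3, 0.2]
groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

thresholds = group_thresholds(scores, groups, target_rate=0.4)
decisions = [int(x >= thresholds[s]) for x, s in zip(scores, groups)]
print(thresholds, decisions)
```

With a single global threshold of 0.65, three members of group 0 and none of group 1 would be accepted. With the per-group thresholds, two members of each group are accepted, without retraining the classifier.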

What constitutes a fair representation and how do we learn one?

Most work relating to fairness focuses on optimization during training. But before we can actually incorporate fairness constraints in our training pipeline we have to think about how we want our learned representations to look. These representations are what the model uses to predict the outcome instead of the original (biased) data.

We can define the requirements for learned fair representations in the following way:

  1. They should be fair, i.e. approximately independent from the sensitive group S
  2. They should include information about the original data point X
  3. They should still incorporate the relevant label information Y to correctly classify that data point

A natural solution to achieve fair representations is to use an encoder-decoder architecture to project the original data to a learned fair space. This space should remove as much of the sensitive information as possible while keeping the relevant information to perform our classification task. Interestingly, most of the methods share the same underlying decomposition of the objective function:

L = α * accuracy + β * reconstruction ability + γ * fairness measure

The accuracy denotes the average classification performance of the algorithm. The reconstruction ability shows how well the model can reconstruct the original data points from the encoded representation. Lastly, the fairness measure can denote any measure we define with respect to our fairness objective, for instance, demographic parity. These three components are weighted by α, β and γ respectively.
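
The three-part objective can be sketched as a single function. This is an illustration under stated assumptions: the classification term is written as a cross-entropy loss (so that all three terms are minimized), the reconstruction term is a mean squared error, the fairness term is the demographic parity gap, and the weight names alpha, beta and gamma are hypothetical.

```python
import math


def group_mean(values, s, g):
    picked = [v for v, si in zip(values, s) if si == g]
    return sum(picked) / len(picked)


def fair_representation_loss(y, y_prob, x, x_rec, s, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of classification, reconstruction and fairness terms."""
    n = len(y)
    eps = 1e-12  # avoids log(0)
    # classification: mean binary cross-entropy between labels and predictions
    cls = -sum(yi * math.log(pi + eps) + (1 - yi) * math.log(1 - pi + eps)
               for yi, pi in zip(y, y_prob)) / n
    # reconstruction: squared error between inputs and their reconstructions, per sample
    rec = sum((a - b) ** 2 for xi, ri in zip(x, x_rec) for a, b in zip(xi, ri)) / n
    # fairness: demographic parity gap between the two groups
    fair = abs(group_mean(y_prob, s, 0) - group_mean(y_prob, s, 1))
    total = alpha * cls + beta * rec + gamma * fair
    return total, {"classification": cls, "reconstruction": rec, "fairness": fair}


# Toy values (illustrative only): four samples with two features each.
y = [1, 0, 1, 0]
y_prob = [0.8, 0.4, 0.6, 0.2]
s = [0, 0, 1, 1]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
x_rec = [[0.5, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.5]]

total, parts = fair_representation_loss(y, y_prob, x, x_rec, s)
total_fair, _ = fair_representation_loss(y, y_prob, x, x_rec, s, gamma=5.0)
print(parts, total, total_fair)
```

Increasing gamma makes the same fairness gap count more heavily in the total, which is how the weights steer the trade-off between the three goals during training.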

This objective can be optimized in multiple ways. Recent methods usually employ high capacity (deep) neural networks together with adversarial methods or latent variable models.

One direction uses a latent variable model. We can learn a model with a set of latent variables Z that keep a high probability of reconstructing the original data P(X|Z). At the same time, these latent variables should keep the relevant label information P(Y|Z) and minimize the availability of information about the sensitive group P(S|Z). This problem formulation can be represented as a probabilistic graphical model and these additional constraints can, in principle, be built into a Variational Auto-Encoder [4].

The other direction uses an adversarial network [5, 6]. The core idea is to have two networks competing with each other. The base network wants to learn representations that can be reconstructed and classified well. The adversarial network wants to predict the sensitive variable from the produced representation. In this competitive game, the base network is trained to minimize the adversary’s ability to predict the sensitive variable, which removes the sensitive information from the representations it produces. At the same time, maximizing the base network’s reconstruction and classification performance produces representations that retain information about the original data point and the label. Together, this formulation of the optimization problem will create representations that fulfill the characteristics of fair representations.
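
One way to write this competing game down is as a min-max problem. The notation is an assumption for illustration and is not taken from the cited papers: f is the encoder, g the classifier and k the decoder of the base network, h is the adversary, each L is a prediction loss, and the three weights are non-negative constants.

```latex
\min_{f,\,g,\,k}\;\max_{h}\;
  \alpha\,\mathcal{L}_{\mathrm{cls}}\bigl(g(f(X)),\,Y\bigr)
  + \beta\,\mathcal{L}_{\mathrm{rec}}\bigl(k(f(X)),\,X\bigr)
  - \gamma\,\mathcal{L}_{\mathrm{adv}}\bigl(h(f(X)),\,S\bigr)
```

Because the adversary's loss enters with a negative sign, the adversary h maximizes the objective by predicting S as well as it can, while the base network minimizes the objective by classifying and reconstructing well and by making S as hard to predict as possible.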

FIGURE 2: Representations clustered based on the sensitive group before and after applying the fair model [4].

The figure illustrates how we go from the original representations (left), which can be clustered well based on their sensitive attribute, to representations that are almost indistinguishable after applying the fair model (right).

Fair models mean fair decisions in the real world

If we want to deploy machine learning models in the real world, it is of utmost importance to ensure that these models do not exhibit biases that could limit the opportunities of certain groups. By formalizing a mathematical notion of fairness, we create several options in the pipeline to minimize biases in models.

At the data stage, biases can be removed from the dataset before training the model, which helps preserve fairness when releasing or selling the data to other parties. During the model training stage, fairness constraints can actively be incorporated in the objective function to guide a model to arrive at a fairer solution. Lastly, if retraining the model is costly or the training pipeline is unavailable, a classifier can be adjusted after training by setting a threshold which satisfies the chosen fairness constraints.

Integrating and monitoring these fairness constraints in one of the stages of a real-world system can lead to algorithms providing the same level of opportunity for every group. At Onfido, since we process checks from all over the world, fairness is a key issue. We follow the recent developments in this space to keep ourselves up to date so that we can provide the same opportunities for as many groups as possible.

Bias mitigation at Onfido 

We’ve taken several steps to ensure our models are as fair as possible. You can read more about how we achieved this in a paper we recently published at WACV 2020. It covers in depth how we reduced geographic performance differentials for face recognition. The paper is linked here.

We also recently won the CogX award "Best Innovation in Algorithmic Bias Mitigation". We’re extremely proud that our FaceMatch algorithm has achieved market-leading accuracy while being the fairest it’s ever been across ethnicities. This recent upgrade to our FaceMatch algorithm was developed in consultation with the Information Commissioner’s Office (ICO) as part of their new Privacy Sandbox.

As a company, we continue to not only look for ways that we can mitigate bias in our own models, but also advocate for fairness across technologies. By mitigating bias in technologies, we can achieve real-world improvements.


You can read more about our approach to artificial intelligence in our Center of Applied AI pages.



  1. Fairness Constraints: Mechanisms for Fair Classification

  2. Discrimination- and privacy-aware pattern

  3. Equality of Opportunity in Supervised Learning 

  4. Variational Fair Autoencoder

  5. Learning Adversarially Fair and Transferable Representations

  6. Censoring Representations with an Adversary
