
According to a recent survey of 200 business executives, financial institutions are increasingly concerned with fraud detection challenges at customer onboarding. More than 1 in 3 execs rank synthetic fraud (37%) and deepfakes (35%) as top concerns. It’s no wonder executives are concerned — the FTC estimates that $8.8 billion was lost to fraud in 2022.
Financial services as an industry are at the front line of the fight against fraud. But as that same survey reveals, fraud is not the only challenge financial services are facing. More than 4 in 5 execs (84%) say they are less than completely confident in their company's ability to keep up with changes in data privacy regulation. And nearly half of executives (46%) say there are too many steps for customers in their onboarding process.
Financial institutions need to achieve a perfect balance between detecting fraud and:
- Onboarding more users (in other words, minimizing friction)
- Respecting regulations (onboarding customers in a compliant way)
- Controlling spend (ensuring that none of the above is costing the business too much money)
No single priority can be allowed to dominate the others, so striking a good balance across them presents a major challenge for the industry. This is where financial fraud detection using machine learning comes in. Machine learning, a branch of artificial intelligence (AI), can help make tasks like know your customer (KYC) checks and fraud detection scalable. While training fraud detection machine learning algorithms poses challenges of its own, the technology offers significant advantages.
What is fraud detection using machine learning?
But first, let’s start with the basics — defining fraud detection machine learning.
Machine learning is a type of artificial intelligence (AI) that is able to learn from existing data without requiring explicit instructions. While AI has been around since at least the 1960s, machine learning rose to prominence more recently. It really took off with the advent of deep learning in the early 2010s.
The power of machine learning comes from the fact that statistical models can infer patterns from data with little guidance from humans. On some tasks they now match or outperform humans, including image recognition (machines interpret and categorize images or videos), translation (machines translate text or speech from one language to another, e.g. Google Translate), and gameplay (machines play games such as chess and Go, e.g. AlphaGo).
Fraud detection is the process of identifying and preventing fraud. So fraud detection using machine learning is the process of using a trained computer system to catch fraud.
How to use AI and machine learning in fraud detection
Fraud detection often happens as part of the identity verification step during customer onboarding. Why? Because businesses are required to meet compliance requirements such as KYC and AML, and take necessary steps to prevent financial crime. Preventing fraud also helps protect their customers, their business reputation and bottom line.
Fraud detection using machine learning in banking and other financial institutions offers several benefits. It’s faster, cheaper and often more accurate than relying on humans to identify fraud. But training machine learning solutions to get the best results out of them requires a thoughtful approach. Fraud detection is challenging — for machines as well as humans — and those challenges should be addressed when training algorithms.
The challenges of training machine learning algorithms
- Fraud prevention is at odds with low friction. If stopping fraud were the only concern, businesses could do so tomorrow by blocking 100% of customers. But any business that did would go bust pretty quickly. Businesses need strong fraud prevention capabilities to protect their customers and bottom line. But they also need to offer low-friction, seamless and intuitive sign-up processes for genuine customers. Making the sign-up process as easy and frictionless as possible is key for businesses' long-term growth.
- Fraud is dynamic and ever-changing. Fraudsters are innovative and invent new methods of attack all the time. They are constantly looking for ways to get past business defenses. Remove one route, and they’ll try and try again until they find another. Fraud detection solutions must be innovative and adaptable to get ahead of the latest attack vectors.
- There are thousands of identity documents. Each country has several different ID types, and each of these IDs will have several versions in circulation at one time. Extrapolate this across the globe and that’s thousands of different documents, each with its own format, standards and character sets. Training models to detect fraud across all these documents and their different variations is a challenge, to say the least.
- ‘Noisy’ data can impact the algorithms. There is no guaranteed ground truth in fraud detection. Comparison with genuine specimen documents is one of the best ways to detect forgeries, and a lot of fraud is obvious and easy to catch. But for highly sophisticated forgeries, even the best experts might struggle to tell the difference. This is what’s called a subtle signal: sometimes the difference between a fraudulent document and a genuine one is smaller than the difference between two genuine samples. Labels can therefore be wrong, and the performance of fraud detection machine learning models is reliant on the quality of the data used to train the algorithms.
- Not enough data. Following the previous point, it can be hard to get enough good-quality data to train algorithms. Sometimes we might only see a few documents with a certain type of fraud. But to accurately train machine learning models, you need large data sets.
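The first challenge in the list above, the tension between blocking fraud and keeping friction low, can be made concrete with a small sketch. It assumes a model that outputs a fraud score between 0 and 1 and a business that blocks every application at or above a chosen threshold. The scores, labels, thresholds and the names `Outcome` and `sweep_thresholds` are invented for illustration and do not come from any real system.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    threshold: float
    fraud_caught: float      # share of fraudulent applications blocked
    genuine_blocked: float   # share of genuine applications blocked (friction)


def sweep_thresholds(scores, labels, thresholds):
    """Block every application whose fraud score is >= threshold.

    scores: hypothetical model outputs in [0, 1]; labels: 1 = fraud, 0 = genuine.
    """
    fraud = [s for s, y in zip(scores, labels) if y == 1]
    genuine = [s for s, y in zip(scores, labels) if y == 0]
    results = []
    for t in thresholds:
        caught = sum(s >= t for s in fraud) / len(fraud)
        blocked = sum(s >= t for s in genuine) / len(genuine)
        results.append(Outcome(t, caught, blocked))
    return results


# Illustrative, made-up scores: not real data.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.55, 0.70, 0.90, 0.95]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

results = sweep_thresholds(scores, labels, [0.0, 0.5, 0.8, 1.01])
for o in results:
    print(f"threshold={o.threshold:.2f}  fraud caught={o.fraud_caught:.0%}  "
          f"genuine blocked={o.genuine_blocked:.0%}")
```

A threshold of 0.0 blocks everyone, which stops all fraud and all genuine customers alike. Raising the threshold lets more genuine customers through but eventually lets fraud through as well, which is the balance the business has to choose.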
What machine learning models are used for fraud detection?
Every fraud detection machine learning algorithm is slightly different. But generally, training machine learning algorithms for fraud detection involves the following steps:
- Data sourcing: The first step is to source the data that will train the model. Often, this raw data is noisy and unlabelled.
- QC / labeling: Human experts label and curate the data to weed out low-quality data.
- Model training: Train the model using large-scale computing platforms.
- Model evaluation: Evaluate the performance of the model against an unseen dataset (the holdout set) to see if it performs as expected.
- Model deployment: If the model evaluation is satisfactory, the final step is to deploy the model to production.
- Monitoring: Monitor the model's performance in production, and adjust and improve it as needed.
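The steps above can be sketched end to end in a few lines. This is a toy illustration only: the data is synthetic, the "model" is a simple nearest-centroid classifier chosen so the example needs nothing beyond the standard library, and every function name and threshold (`source_data`, `MIN_ACCURACY` and so on) is a hypothetical stand-in rather than a description of how any production system works.

```python
import random
from statistics import mean


def source_data(n, rng):
    """Step 1: stand-in for raw data. Each row is (features, label).

    Labels are 1 = fraud, 0 = genuine, or None when the row is still unlabeled.
    """
    rows = []
    for _ in range(n):
        is_fraud = rng.random() < 0.3
        centre = 2.0 if is_fraud else 0.0
        features = (rng.gauss(centre, 0.5), rng.gauss(centre, 0.5))
        label = None if rng.random() < 0.1 else int(is_fraud)
        rows.append((features, label))
    return rows


def quality_control(rows):
    """Step 2: keep only rows that a (simulated) human expert has labeled."""
    return [(x, y) for x, y in rows if y is not None]


def train(rows):
    """Step 3: fit a nearest-centroid model, one centroid per class."""
    centroids = {}
    for cls in (0, 1):
        points = [x for x, y in rows if y == cls]
        centroids[cls] = tuple(mean(p[i] for p in points) for i in range(2))
    return centroids


def predict(model, x):
    """Return the class whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda cls: dist2(model[cls]))


def evaluate(model, holdout):
    """Step 4: accuracy on data the model never saw during training."""
    return sum(predict(model, x) == y for x, y in holdout) / len(holdout)


def monitor(model, live_rows, expected_fraud_rate, tolerance=0.1):
    """Step 6: report drift when the predicted fraud rate leaves the expected band."""
    rate = sum(predict(model, x) for x, _ in live_rows) / len(live_rows)
    return abs(rate - expected_fraud_rate) > tolerance


rng = random.Random(0)
labeled = quality_control(source_data(500, rng))
split = int(len(labeled) * 0.8)
train_set, holdout_set = labeled[:split], labeled[split:]

model = train(train_set)
accuracy = evaluate(model, holdout_set)

# Step 5: deploy only if the holdout result clears a bar chosen in advance.
MIN_ACCURACY = 0.9  # hypothetical acceptance threshold
deployed = accuracy >= MIN_ACCURACY
print(f"holdout accuracy={accuracy:.2f}, deployed={deployed}")
```

Keeping the holdout set out of training is what makes the evaluation meaningful: a model scored on data it has already seen would look better than it really is.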

What to look for in fraud detection machine learning solutions
There are a lot of technical processes involved in building and training fraud detection machine learning models. It takes time, resources and a lot of expertise. So many financial institutions will use specialist, third-party solutions to help their fraud detection efforts.
So what should businesses look for in a fraud detection solution that leverages AI and machine learning? What makes one fraud detection machine learning model better than another?
Data is key
As mentioned earlier, the best results come from training machine learning models on large volumes of high-quality data. This gives businesses a higher degree of certainty that
a) They’re catching more fraud, and
b) They’re catching fraud more accurately.
At Onfido, we train our fraud detection machine learning models on a wide range of genuine and fraudulent datasets. By exposing the model to genuine and fraudulent samples it gets better over time at detecting the difference between the two.
We even develop our own fraud samples in-house to ensure we have enough high-quality data to train the models. We also have an industrial-grade data cycle and machine learning operations, which helps us maintain state-of-the-art models in production for the benefit of our customers.
Look for expertise
Machine learning models aren’t built overnight. The best fraud detection solutions draw on years of research, training and expertise.
Onfido’s unique applied scientist and analytics team combines decades of experience in industrial research. They’ve built our fraud detection technology over the past 10 years, leveraging existing off-the-shelf models plus building specialized, in-house models. We have developed unique expertise leveraging unsupervised, supervised and self-supervised machine learning. Combining classical computer vision with deep learning, we've designed several patented algorithms specifically for fraud detection.
Solid infrastructure
Identity verification is not a nice-to-have, but a critical piece of infrastructure of the internet. Our customers and end users expect outstanding robustness and reliability from Onfido. This is why we've built a robust cloud-based computing platform that can handle and monitor our traffic in real-time.
In addition, the data we handle is very sensitive since it contains private information of end users. We take strong measures to protect this data. We enforce the data deletion policy agreed upon with our customers using programmatic data deletion across our entire platform. Finally, a dedicated in-house security team ensures that we hold state-of-the-art security standards within the company.
Relationships with regulatory bodies
AI governance, how we handle personal data, and the future of fraud detection are all key considerations when building a successful long-term product. At Onfido, we have developed strong ties with regulatory bodies such as the ICO in the UK.
Anti-bias considerations
Fraud detection solutions need to work equally well, for everybody. It’s crucial to build fair models and prevent bias from creeping in, especially when biometrics are involved.
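One simple way to check whether a solution works "equally well" is to compare error rates across groups of users. The sketch below computes the false rejection rate (genuine users wrongly rejected) per group. The groups, counts and the function name `false_rejection_rates` are made up for illustration. It is one possible metric among many, not a description of any specific vendor's methodology.

```python
from collections import defaultdict


def false_rejection_rates(records):
    """Share of genuine users wrongly rejected, per group.

    records: iterable of (group, is_genuine, was_rejected) tuples.
    """
    genuine = defaultdict(int)
    rejected = defaultdict(int)
    for group, is_genuine, was_rejected in records:
        if is_genuine:
            genuine[group] += 1
            rejected[group] += int(was_rejected)
    return {g: rejected[g] / genuine[g] for g in genuine}


# Made-up outcomes for two hypothetical groups, A and B.
records = (
    [("A", True, False)] * 95 + [("A", True, True)] * 5
    + [("B", True, False)] * 90 + [("B", True, True)] * 10
    + [("B", False, True)] * 3  # fraud attempts are ignored by this metric
)

rates = false_rejection_rates(records)
gap = max(rates.values()) - min(rates.values())
print(rates, f"gap={gap:.2f}")
```

A large gap between groups would suggest the model is adding more friction for some genuine users than for others, which is worth investigating before deployment.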
Learn more about what we’ve been doing at Onfido to define, measure, and mitigate biometric bias in our whitepaper Building AI without Bias.
The benefits of fraud detection using machine learning
Efficient and scalable
Businesses that opt for fraud detection using machine learning will be able to process information much more quickly and in much larger volumes. This reduces the number of verification checks that go for manual review. In turn, internal fraud teams will spend less time manually reviewing documents.
Manual reviews also aren’t scalable. For one thing, they are limited to business hours. For another, as businesses grow or see a sudden increase in applications, manual processes can cause bottlenecks. This pushes up costs as businesses have to hire more reviewers to cope with demand. Ultimately, these slowdowns are going to turn genuine end-users away.
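To illustrate how automation reduces the manual review queue, here is a minimal sketch that routes each application by a hypothetical fraud score: clear cases are decided automatically and only the uncertain middle band goes to a human. The cut-off values and the `route` function are assumptions chosen for the example.

```python
def route(fraud_score, approve_below=0.2, reject_above=0.9):
    """Map a hypothetical fraud score in [0, 1] to an onboarding decision."""
    if fraud_score < approve_below:
        return "approve"
    if fraud_score > reject_above:
        return "reject"
    return "manual_review"


# Made-up scores for eight applications.
scores = [0.03, 0.10, 0.15, 0.45, 0.60, 0.95, 0.01, 0.08]
decisions = [route(s) for s in scores]
manual_share = decisions.count("manual_review") / len(decisions)
print(decisions, f"manual review share={manual_share:.0%}")
```

Widening or narrowing the middle band is a business decision: a narrower band means fewer manual reviews but more reliance on the model's confidence.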
Repeatable and deterministic
It’s much easier (and quicker, and cheaper) to improve a machine-learning model than to train a human analyst. For example, border guards have years of training to help them identify fraudulent documents. Even then, humans are more likely to make errors, for example, if they’re tired. Comparatively, we can deploy a new machine-learning model across the globe in minutes.
More accurate
We can train machine learning algorithms to detect minute changes in documents that could point to potential signs of fraud. For highly sophisticated forgeries, even an expertly trained human eye might struggle to detect fraud. Fraud detection machine learning algorithms can pinpoint small changes across document layout, fonts, data consistency and much, much more.
Cost effective
As a business grows, it will naturally onboard more customers and have to deal with more fraud. Relying on manual reviews means hiring more analysts to keep up with demand. Comparatively, a single machine-learning system can go through all the data you throw at it, regardless of the volume. This is much more scalable for businesses that see seasonal ebbs and flows in sign-ups. Fraud detection machine learning systems can help combat fraud as onboarding volumes grow, without dramatically increasing risk management costs.
For businesses whose priority is to stay ahead of fast-moving fraud, machine learning is a critical asset. It enables scalable, robust and cost-effective fraud detection, helping the industry manage a difficult trade-off between fast customer onboarding and strong fraud protection. While machine learning requires advanced skills and infrastructure, done right it will bring unique benefits to the field moving forward.
Learn how the Onfido Real Identity Platform can help you strike the perfect balance between fraud prevention and customer acquisition.