Application fraud prediction using a machine learning model adds ~10% revenue

Advanced machine learning models to identify fraudulent customers at the point of application


Fraud is a billion-dollar business and it is increasing every year. Traditional methods of data analysis have long been used to detect fraud. They require complex and time-consuming investigations that deal with different domains of knowledge such as finance, economics, business practices and law. Fraud often consists of many instances or incidents involving repeated transgressions using the same method. Fraud instances can be similar in content and appearance but are usually not identical.

The first industries to use data analysis techniques to prevent fraud were the telephone companies, the insurance companies and the banks (Decker 1998). One early example of successful implementation of data analysis techniques in the banking industry is the FICO Falcon fraud assessment system, which is based on a neural network shell.

In general, the primary reason to use data analytics techniques to tackle fraud is that many internal control systems have serious weaknesses. To effectively test, detect, validate, correct errors in and monitor control systems against fraudulent activities, business entities and organizations rely on specialized data analytics techniques such as data mining, data matching, the sounds-like function, regression analysis, clustering analysis and gap analysis. Techniques used for fraud detection fall into two primary classes: statistical techniques and artificial intelligence.
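
As an illustration of the sounds-like function used in data matching, the sketch below implements American Soundex, one common phonetic code. The choice of Soundex and the function name `soundex` are assumptions made for illustration; the case study does not say which phonetic method, if any, was used.

```python
# Minimal sketch of a "sounds-like" match using American Soundex.
# Assumption: Soundex is used here only as a familiar example of a phonetic code.

# Map consonants to Soundex digits; vowels, H, W and Y have no digit.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code of a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (result + "000")[:4]  # pad with zeros or truncate to 4 characters


def sounds_like(name_a, name_b):
    """True when two names share the same Soundex code."""
    return soundex(name_a) == soundex(name_b) != ""


match = sounds_like("Robert", "Rupert")   # both encode to R163
no_match = sounds_like("Robert", "Lloyd")
```

Matching on a phonetic code rather than the exact spelling lets near-duplicate names on different applications be grouped for review, at the cost of some false matches.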


Application fraud occurs when a person falsifies an application to obtain credit from a banking institution. It can be classified into several categories: assumed identity, where an individual illegally obtains another individual's personal information and opens accounts in his or her name, using partially legitimate information; financial fraud, where an individual provides false information about his or her financial status to acquire credit; and account takeover, where an individual illegally obtains a valid customer's personal information. The biggest challenge is identifying these categories in the training data when framing the problem for machine learning.

About the Client

Equifax Inc. is a global information solutions company that uses trusted unique data, innovative analytics, technology and industry expertise to power organizations and individuals around the world by transforming knowledge into insights that help make more informed business and personal decisions. Equifax operates primarily in the business-to-business sector, selling consumer credit and insurance reports and related analytics to businesses in a range of industries. Business customers include retailers, insurance firms, healthcare providers, utilities, government agencies, as well as banks, credit unions, personal and specialty finance companies and other financial institutions.

In 2010, Equifax established a presence in the Indian market and was licensed by the RBI to operate as a Credit Information Company. Equifax India, registered as Equifax Credit Information Services Private Limited (ECIS), is a joint venture between Equifax Inc., USA and seven leading Indian financial institutions.

Business Impact

The bureau-data-driven fraud score has low correlation with credit risk scores and is therefore effective in dual-score strategies. It can also be used, to a very limited extent, on new-to-credit applicants based on address information provided by the applicant, and on enquiry-only records. The continuous learning approach keeps the model abreast of emerging fraud hotspots.

Most importantly, it allows the exploration of “links” between fraud cases; methods based on application and document data are far less effective in capturing such links.

  • Less than 20% correlation with the current credit risk score in production.

  • Overall accuracy of more than 65%, with segments such as two-wheeler loans reaching ~78%.

  • Estimated increase in revenue for Equifax of ~10%.
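
The low-correlation result above can be checked with a simple calculation. The sketch below reads "less than 20% correlation" as a Pearson coefficient below 0.20 in absolute value, which is an assumption; the function names, the threshold parameter and the sample scores are all hypothetical.

```python
# Minimal sketch: check whether a fraud score is weakly correlated with a
# credit risk score, so the two can be combined in a dual-score strategy.
# Assumption: "correlation" means the Pearson coefficient.
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant score")
    return cov / sqrt(var_x * var_y)


def usable_in_dual_score(fraud_scores, risk_scores, max_abs_corr=0.20):
    """True when the two scores are weakly correlated (hypothetical threshold)."""
    return abs(pearson(fraud_scores, risk_scores)) < max_abs_corr


# Illustrative values only, not client data.
fraud_scores = [100, 200, 300, 400]
risk_scores = [700, 500, 500, 700]
corr = pearson(fraud_scores, risk_scores)
ok = usable_in_dual_score(fraud_scores, risk_scores)
```

A fraud score that is nearly uncorrelated with the risk score adds information the risk score does not already carry, which is what makes a two-score decision grid worthwhile.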


The client, Equifax India, approached Voxco Intelligence to understand how fraud risk could be estimated by leveraging alternate data and machine learning in the context of generic credit bureau solutions. The biggest challenge for the client had been the high correlation of the bureau risk score with the application fraud score.

Solution Delivered

Voxco Intelligence developed a machine learning solution to predict fraud risk leveraging the unstructured header information present in the credit bureau file. A self-learning loop was also created to update the model based on monthly data updates. The feature library created was generic enough to be applied to the entire bureau file.

A random extract of the bureau data was used for building the feature library. The extract consisted of the current and past header data (address, phone number, etc.) and the trades and inquiry data. The standard features from the inquiry and trades data were created using the current feature library that is already being used by Equifax India. The features from the trades and inquiry data were used to create interactions with the header data and also possible segmentations. Some additional features to identify fraud (from first payment default and straight roll) were also created to understand geo clusters or linkages that were further used to create features around the geo fraud risk index.
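
To make the idea of linkage features from header data concrete, the sketch below counts how many other applicants share a phone number or address with each application. The field names (`applicant_id`, `phone`, `address`), the function name and the sample records are hypothetical; the actual feature library is not described in this case study.

```python
# Minimal sketch of header-data linkage features.
# Assumption: each application carries an applicant id, a phone number and a
# normalized address; real bureau header data would need cleaning first.
from collections import defaultdict


def linkage_features(applications):
    """For each applicant, count the other applicants sharing a phone or address."""
    by_phone = defaultdict(set)
    by_address = defaultdict(set)
    for app in applications:
        by_phone[app["phone"]].add(app["applicant_id"])
        by_address[app["address"]].add(app["applicant_id"])

    features = {}
    for app in applications:
        aid = app["applicant_id"]
        features[aid] = {
            # Exclude the applicant itself from each count.
            "shared_phone": len(by_phone[app["phone"]] - {aid}),
            "shared_address": len(by_address[app["address"]] - {aid}),
        }
    return features


# Illustrative records only.
applications = [
    {"applicant_id": "A1", "phone": "111", "address": "12 Lake Rd"},
    {"applicant_id": "A2", "phone": "111", "address": "9 Hill St"},
    {"applicant_id": "A3", "phone": "222", "address": "12 Lake Rd"},
    {"applicant_id": "A4", "phone": "333", "address": "4 Park Ave"},
]
features = linkage_features(applications)
```

Counts like these could then be aggregated by locality to approximate a geo fraud risk index, although how that index was actually built is not stated here.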

Once the feature library was created, confirmed application fraud and first payment default with straight roll to write-off were used as the bad definition. A deep learning methodology was used to identify the features that went into the model. Voxco Intelligence leveraged its proprietary parameter tuning methodologies to fine-tune the model in the development and cross-validation samples. Further validation was also performed on the out-of-time validation sample.
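
The bad definition and the out-of-time sample described above can be sketched as follows. The reading that first payment default counts as bad only when combined with a straight roll to write-off is an interpretation of the text; the field names, the cutoff date and the sample records are hypothetical.

```python
# Minimal sketch: label "bad" accounts and hold out an out-of-time sample.
# Assumption: a record is bad if fraud is confirmed, or if it has a first
# payment default AND rolled straight to write-off.
from datetime import date


def is_bad(record):
    """Apply the bad definition to one account record."""
    fpd_straight_roll = (record["first_payment_default"]
                         and record["straight_roll_to_write_off"])
    return record["confirmed_fraud"] or fpd_straight_roll


def out_of_time_split(records, cutoff):
    """Records before the cutoff go to development, the rest to out-of-time."""
    development = [r for r in records if r["application_date"] < cutoff]
    out_of_time = [r for r in records if r["application_date"] >= cutoff]
    return development, out_of_time


# Illustrative records only.
records = [
    {"confirmed_fraud": True, "first_payment_default": False,
     "straight_roll_to_write_off": False, "application_date": date(2020, 1, 15)},
    {"confirmed_fraud": False, "first_payment_default": True,
     "straight_roll_to_write_off": True, "application_date": date(2020, 2, 10)},
    {"confirmed_fraud": False, "first_payment_default": True,
     "straight_roll_to_write_off": False, "application_date": date(2020, 3, 5)},
    {"confirmed_fraud": False, "first_payment_default": False,
     "straight_roll_to_write_off": False, "application_date": date(2020, 7, 1)},
]
labels = [is_bad(r) for r in records]
development, out_of_time = out_of_time_split(records, cutoff=date(2020, 6, 1))
```

Splitting by application date rather than at random tests whether the model still holds up on applications received after the period it was trained on.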

The final models were also validated across different segments, such as two-wheeler loans, credit cards, gold loans, and other unsecured/semi-secured products. Additionally, Voxco Intelligence helped to validate the models on two different sets of Equifax customers.

Voxco Intelligence also provided knowledge sharing and training sessions to Equifax's internal team to develop their capabilities and to ensure a smooth transition and maintenance of the models developed.

Voxco Intelligence played a key role in helping Equifax leverage the power of unstructured data and significantly reduce fraud related risks across our financial services customers.
Nimilita Chatterjee
Senior VP – Data & Analytics, EQUIFAX