Fraud detection in healthcare reimbursement transactions

Nov 12, 2020

Application Track:

Ready Made



Proposed by:


Summary of the entity:

Almerys is an expert in collecting, storing and processing sensitive data. It provides products and services in digital trust, sovereignty and privacy-by-design, ranging from tier 4+ data centres to identity management, electronic signature, dynamic consent management, transactional payments, legal and probative archiving of sensitive data, and personalized services brokering. Most of the data processed by Almerys concerns health, in particular the reimbursement of medical expenses.

Summary of the challenge:

The objective of the challenge is to develop solutions that allow anti-fraud agencies and insurance companies to detect fraud committed by healthcare practitioners and patients in reimbursement operations. The solution should exploit historical datasets of healthcare reimbursement requests, both to detect fraud in past operations and to propose mechanisms that prevent fraud.

Description of the global challenge:

In many European countries, part of the cost of medical care is reimbursed through public and/or private health insurance. The healthcare reimbursement system is based on trust: a patient’s care needs must be real and approved by a healthcare professional, and the health costs declared by the practitioner must correspond to the medical acts that were actually carried out. Unfortunately, abuses occur and many cases of fraud are recorded. Since the early 2000s, when the reimbursement of medical care became widely paperless, the anti-fraud services of health insurers have been able to use big data and artificial intelligence to detect fraud in real time. Fraud comes from patients (fake prescriptions), healthcare professionals (fake care) or a combination of both. In a country like France (70 million inhabitants), 261 million euros of fraud were detected in 2018. Each year, progress in artificial intelligence algorithms makes it possible to detect more new cases of fraud.

The goal of this complex challenge is to work on a real dataset of medical reimbursement data to detect health insurance fraud. It is composed of two sub-challenges: the first focuses on detecting fraud from historical data, and the second on prevention using prediction models.

Sub-challenges composing this experiment:

This challenge is composed of 2 sub-challenges:

  • Potential Fraud Behaviour Profile (REACH-2020-READYMADE-ALMERYS_2.1)
  • Predictive model for fraud prevention (REACH-2020-READYMADE-ALMERYS_2.2)

Expected global results:

  • Solution that allows stakeholders from the healthcare, insurance and anti-fraud domains to identify new cases of fraud with machine learning and clustering algorithms.
  • Develop solutions based on models that describe the behaviour of fraud perpetrators, highlighting profiles with “red flags” for insurance companies.
  • Develop solutions based on predictive models that detect potential fraud perpetrators and provide a fraud likelihood score for opticians prone to committing fraud.
  • Based on the behaviour of opticians who are part of a fraud cluster, create new anti-fraud rules that make it possible to block reimbursement requests upstream.




Summary of the sub-challenge:

The objective of the challenge is to develop solutions based on a model that describes the behaviour of opticians who commit fraud, drawing on analysis of historical data, blacklisted opticians and domain knowledge.

Description of the challenge:

The model should provide a set of traits of individual subjects, in this case opticians, based on historical data provided by Almerys. This profile could contain information about unusual age ranges of patients asking for reimbursements, unusual price variations for products with the same reference, and other characteristics.
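As an illustration of such traits, the following minimal Python sketch computes two per-optician indicators: the mean relative deviation of prices from the median price for the same product reference, and the median patient age. The record layout, identifiers and values are hypothetical and are not taken from the Almerys dataset.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (optician_id, product_ref, price_eur, patient_age).
transactions = [
    ("OPT-A", "LENS-01", 100.0, 45),
    ("OPT-A", "LENS-01", 105.0, 52),
    ("OPT-B", "LENS-01", 98.0, 38),
    ("OPT-B", "LENS-01", 102.0, 61),
    ("OPT-C", "LENS-01", 180.0, 8),
    ("OPT-C", "LENS-01", 175.0, 9),
]

def optician_profiles(rows):
    """Build a small trait profile per optician from transaction rows."""
    # Median price per product reference, across all opticians.
    prices_by_ref = defaultdict(list)
    for _, ref, price, _ in rows:
        prices_by_ref[ref].append(price)
    ref_median = {ref: median(p) for ref, p in prices_by_ref.items()}

    # Collect relative price deviations and patient ages per optician.
    deviations = defaultdict(list)
    ages = defaultdict(list)
    for optician, ref, price, age in rows:
        deviations[optician].append((price - ref_median[ref]) / ref_median[ref])
        ages[optician].append(age)

    return {
        optician: {
            "mean_price_deviation": sum(devs) / len(devs),
            "median_patient_age": median(ages[optician]),
        }
        for optician, devs in deviations.items()
    }

profiles = optician_profiles(transactions)
for optician, traits in sorted(profiles.items()):
    print(optician, traits)
```

In this toy data, OPT-C stands out on both traits (prices far above the reference median and very young patients). Whether such values are actually unusual would have to be judged against the real distribution and domain knowledge.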

Clustering algorithms must group opticians according to their behaviour (along several axes of analysis) and show which clusters contain opticians who are already labelled as fraudsters. Within these clusters, it is then necessary to determine whether opticians not known to our anti-fraud unit show suspicious behaviour or movements.
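The clustering step can be sketched as follows, assuming two hypothetical behavioural features per optician that are already on comparable scales. A tiny k-means stands in for whichever clustering algorithm a solution actually uses. The feature names, identifiers, the number of clusters and the 50% fraud-share cut-off are all illustrative assumptions.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def suspects_by_cluster(features, known_fraudsters, k=2, min_fraud_share=0.5):
    """Return unlabelled opticians that share a cluster with many known fraudsters."""
    ids = list(features)
    labels = kmeans([features[i] for i in ids], k)
    suspects = []
    for j in range(k):
        members = [i for i, lab in zip(ids, labels) if lab == j]
        if not members:
            continue
        share = sum(i in known_fraudsters for i in members) / len(members)
        if share >= min_fraud_share:
            suspects += [i for i in members if i not in known_fraudsters]
    return sorted(suspects)

# Hypothetical features: (mean price deviation, share of patients under 12).
features = {
    "OPT-01": (0.02, 0.10), "OPT-02": (-0.01, 0.12), "OPT-03": (0.03, 0.08),
    "OPT-04": (0.70, 0.60), "OPT-05": (0.65, 0.55), "OPT-06": (0.72, 0.58),
    "OPT-07": (0.00, 0.11),
}
known_fraudsters = {"OPT-04", "OPT-05"}  # hypothetical blacklist

suspects = suspects_by_cluster(features, known_fraudsters)
print(suspects)
```

Here OPT-06 is not blacklisted but falls in the same cluster as two blacklisted opticians, so it is the one returned for review. Real features would normally need scaling before any distance-based clustering.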

The model should describe a profile with “red flags”. Indeed, experience has shown that practitioners use the same techniques to commit fraud while trying to avoid suspicion. Anti-fraud agencies and insurance companies can watch for these red flags during manual inspections to detect fraud.

In a more general setting, beyond the stated healthcare sector of opticians, the solution should allow insurance companies and anti-fraud agencies to create fraud detection rules from domain expertise. This means using not only statistical analysis of historical data but also techniques to retrieve and formalize domain knowledge, such as Description Logic and ontologies.

Expected outcomes:

Develop solutions based on models that describe the behaviour of fraud perpetrators, indicating a profile with “red flags” for insurance companies. Provide a list of IDs of suspected or fraudulent opticians (based on their behaviour) to be communicated to the staff of our anti-fraud unit.




Summary of the sub-challenge:

The objective of the challenge is to develop solutions that use predictive models for early detection of fraud, with the aim of using them as fraud prevention tools.

Description of the challenge:

Develop a model that assigns a fraud likelihood score to a reimbursement request, based on a weighted calculation of potential fraud conditions formalized as rules. Such a model could be embedded in a solution deployed at point-of-sale level to help prevent fraud, together with an additional manual verification step. Solution developers must take care not to specify overprotective fraud rules, which could lead to the “false positive trap” and harm customer satisfaction.

The solution could involve scoring with a combination of rules (as opposed to a large number of individual rules), with a weight attached to each rule, generating a single coefficient of fraud likelihood that reflects how a transaction performed against multiple fraud indicators.

The weight attached to each rule will correspond to the severity of the rule. For example, a rule written to capture transactions made at a blacklisted optician location will be assigned a larger weight than a rule that captures transaction quantities above the 99th percentile. The solution could also assign a score to each individual healthcare provider, in this case an optician, based on historical data of their reimbursement requests, but the assignment must be explainable with proof of reasoning.
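A minimal sketch of such weighted rule scoring is shown below. The two rules mirror the examples in the text, but the weights, the blacklisted location identifier, the 99th-percentile value and the transaction fields are hypothetical placeholders. Normalising by the total weight is one possible way to obtain a single coefficient between 0 and 1, not a prescribed formula.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    weight: float                      # severity: larger means more serious
    condition: Callable[[dict], bool]  # True when the rule is triggered

BLACKLISTED_LOCATIONS = {"LOC-13"}  # hypothetical blacklist
QUANTITY_P99 = 4                    # hypothetical 99th percentile of quantities

RULES = [
    Rule("blacklisted_location", 5.0, lambda t: t["location"] in BLACKLISTED_LOCATIONS),
    Rule("quantity_above_p99", 1.0, lambda t: t["quantity"] > QUANTITY_P99),
]

def fraud_score(transaction, rules=RULES):
    """Return (score in [0, 1], names of triggered rules) for one transaction."""
    triggered = [r for r in rules if r.condition(transaction)]
    total_weight = sum(r.weight for r in rules)
    score = sum(r.weight for r in triggered) / total_weight
    # The triggered rule names give a simple, explainable trace of the score.
    return score, [r.name for r in triggered]

score, reasons = fraud_score({"location": "LOC-13", "quantity": 10})
print(score, reasons)
```

Returning the triggered rule names alongside the score keeps the result explainable: a reviewer can see which conditions contributed and with what weight.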

Expected outcomes:

Develop solutions based on a predictive model that detects potential fraud perpetrators and provides a fraud likelihood score for opticians prone to committing fraud. Provide a list of rules which, combined with this optician fraud probability score, allows reimbursement requests to be blocked upstream with a false positive rate of less than 5%.
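One way to check the false positive requirement is to evaluate candidate blocking thresholds on labelled historical requests. The sketch below assumes the false positive rate is measured as the share of legitimate requests that would be blocked. The scores and labels are invented for illustration.

```python
def false_positive_rate(scores, is_fraud, threshold):
    """Share of legitimate requests whose score reaches the blocking threshold."""
    legit = [s for s, fraud in zip(scores, is_fraud) if not fraud]
    if not legit:
        return 0.0
    return sum(s >= threshold for s in legit) / len(legit)

def lowest_threshold_under(scores, is_fraud, max_fpr=0.05):
    """Smallest observed score that keeps the false positive rate below max_fpr.

    Returns None if no observed score satisfies the constraint.
    """
    # The rate cannot increase as the threshold rises, so the first
    # threshold that satisfies the constraint is the lowest one.
    for threshold in sorted(set(scores)):
        if false_positive_rate(scores, is_fraud, threshold) < max_fpr:
            return threshold
    return None

# Hypothetical labelled history: 40 legitimate and 4 fraudulent requests.
legit_scores = [0.1] * 30 + [0.3] * 8 + [0.6, 0.9]
fraud_scores = [0.7, 0.8, 0.9, 0.95]
scores = legit_scores + fraud_scores
is_fraud = [False] * len(legit_scores) + [True] * len(fraud_scores)

threshold = lowest_threshold_under(scores, is_fraud)
print(threshold, false_positive_rate(scores, is_fraud, threshold))
```

Choosing the lowest threshold that satisfies the constraint blocks as many fraudulent requests as possible within the allowed false positive rate. In practice the threshold should be validated on data not used to design the rules.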

How do we apply?

Read the Guidelines for Applicants

Doubts or questions? Read more about REACH on the About Us page,

have a look at our FAQ section or drop us an email at