Data science/data manipulation in order to gain insights from the market

Nov 6, 2022

Application Track:

Ready Made



Proposed by: almerys



Summary of the entity:

Almerys is an expert in collecting, storing and processing sensitive data. It provides products and services in digital trust, sovereignty and privacy-by-design, ranging from tier 4+ data centres to identity management, electronic signature, dynamic consent management, transactional payments, legal and probative archiving of sensitive data, and personalized service brokering. Most of the data processed by almerys concerns health, in particular the reimbursement of medical expenses.

Summary of the challenge:

The objective is to derive market trends and insights from historical healthcare reimbursement transactions. The resulting data-grounded feedback can then be supplied to insurance companies to help them better orient their policies and offers and deliver higher value to healthcare practitioners and patients.


In many European countries, the health system allows part of the cost of medical care to be reimbursed through public and/or private health insurance. The reimbursement system is based on trust: patients' care needs must be genuine and approved by healthcare professionals, after which insurance companies can reimburse the patients.

Insurance companies need to understand the market and how it evolves so that they can adapt their insurance packages, providing the best value for patients while remaining profitable. This requires analysing the market from a statistical point of view, drawing insights from historical transaction datasets of reimbursement requests submitted by healthcare practitioners. As an example of this type of dataset, the supplied dataset contains reimbursement requests from opticians.

The challenge entails applying a set of statistical analysis techniques to gain insights from the market, such as:

  • Factor Analysis: Shed light on which combination of aspects, characteristics or priorities matters most to a given type (group) of customers.
  • Cluster Analysis: Identify the various customer segments and determine whether any of them share characteristics (e.g. objectives, pain points, perceptions, demographics, preferences, etc.) that set them distinctly apart from other segments.
  • Multiple Regression: Predict the value of one variable from the values of two or more other variables.
  • Discriminant Analysis: Predict membership in a group (population or cluster) from the measured values of other variables.
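The sketch below, which is not part of the challenge material, illustrates three of the listed techniques with NumPy on synthetic data: multiple regression (ordinary least squares), cluster analysis (k-means, one common clustering method) and discriminant analysis (Fisher's two-class linear discriminant, one form of discriminant analysis). Factor analysis is not shown. The column meanings (patient age, frame price, lens price), the three generated segments and the "reimbursed amount" formula are hypothetical assumptions chosen so that each result can be checked.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical synthetic stand-in for three numeric claim features
# (NOT the real dataset): patient age, frame price (EUR), lens price (EUR).
# Three segments with distinct means are generated on purpose.
means = np.array([[10.0, 60.0, 80.0],     # segment 0
                  [40.0, 150.0, 150.0],   # segment 1
                  [70.0, 110.0, 350.0]])  # segment 2
sds = np.array([2.0, 8.0, 10.0])
per_segment = 200
true_segment = np.repeat(np.arange(3), per_segment)
X = means[true_segment] + rng.normal(0.0, 1.0, (3 * per_segment, 3)) * sds

# --- Multiple regression: ordinary least squares ---
# Hypothetical target built from known coefficients (10 + 0.5*frame + 0.3*lens)
# so that the fitted coefficients can be compared with the truth.
frame, lens = X[:, 1], X[:, 2]
reimbursed = 10.0 + 0.5 * frame + 0.3 * lens + rng.normal(0.0, 2.0, len(X))
design = np.column_stack([np.ones(len(X)), frame, lens])
coef, *_ = np.linalg.lstsq(design, reimbursed, rcond=None)


# --- Cluster analysis: k-means (Lloyd's algorithm) ---
def kmeans(data, k, iters=100, seed=0):
    """k-means with farthest-point initialisation; assumes no cluster empties."""
    r = np.random.default_rng(seed)
    centers = [data[r.integers(len(data))]]
    while len(centers) < k:
        # Next initial center: the point farthest from all centers chosen so far.
        d = np.min([np.linalg.norm(data - c, axis=1) for c in centers], axis=0)
        centers.append(data[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each row to its nearest center, then move centers to the means.
        labels = np.linalg.norm(data[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([data[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.linalg.norm(data[:, None] - centers[None], axis=2).argmin(axis=1)
    return labels, centers


# Standardise so that prices do not dominate age in the distance computation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
labels, centers = kmeans(Z, k=3)


# --- Discriminant analysis: Fisher's two-class linear discriminant ---
def fisher_lda(data, y):
    """Return projection w and threshold separating class 0 from class 1."""
    a, b = data[y == 0], data[y == 1]
    # Within-class scatter matrix.
    sw = (len(a) - 1) * np.cov(a, rowvar=False) + (len(b) - 1) * np.cov(b, rowvar=False)
    w = np.linalg.solve(sw, b.mean(axis=0) - a.mean(axis=0))
    threshold = w @ (a.mean(axis=0) + b.mean(axis=0)) / 2
    return w, threshold


mask = true_segment < 2  # predict membership in segment 0 vs segment 1
w, thr = fisher_lda(X[mask], true_segment[mask])
predicted = (X[mask] @ w > thr).astype(int)
accuracy = float(np.mean(predicted == true_segment[mask]))

print("OLS coefficients (intercept, frame, lens):", np.round(coef, 3))
print("cluster sizes:", np.bincount(labels))
print("LDA accuracy, segment 0 vs 1:", accuracy)
```

On real claims the segments are unknown and rarely this well separated, so the number of clusters and the choice of features would themselves need to be investigated.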

Building on these results (market analysis, etc.), the next step is to:

  • Extract outliers from the dataset along several axes of analysis.
  • Analyse these data in depth to identify possible fraudulent activity.
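A minimal sketch of extracting outliers along several axes, assuming each claim is a dict with hypothetical numeric fields such as `frame_price` and `lens_price` (the real column names are not given here). It uses the modified z-score of Iglewicz and Hoaglin, based on the median and the median absolute deviation (MAD) and flagging values with |z| > 3.5; this is one robust choice among many, not a method prescribed by the challenge.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores (Iglewicz and Hoaglin): 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median: no robust spread to scale by.
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(claims, axes, threshold=3.5):
    """Map claim_id -> list of axes on which the claim is an outlier."""
    flagged = {}
    for axis in axes:
        scores = robust_z_scores([c[axis] for c in claims])
        for claim, z in zip(claims, scores):
            if abs(z) > threshold:
                flagged.setdefault(claim["claim_id"], []).append(axis)
    return flagged


# Hypothetical sample claims (not taken from the real dataset).
claims = [
    {"claim_id": 1, "frame_price": 120, "lens_price": 200},
    {"claim_id": 2, "frame_price": 135, "lens_price": 210},
    {"claim_id": 3, "frame_price": 110, "lens_price": 190},
    {"claim_id": 4, "frame_price": 125, "lens_price": 205},
    {"claim_id": 5, "frame_price": 980, "lens_price": 215},
    {"claim_id": 6, "frame_price": 130, "lens_price": 900},
]

print(flag_outliers(claims, ["frame_price", "lens_price"]))
# {5: ['frame_price'], 6: ['lens_price']}
```

Flagged claims are candidates for the in-depth analysis, not evidence of fraud. Applying the same function within groups (for instance per optician or per region, if such fields exist) adds further axes of analysis.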


Dataset: Historical data of reimbursement requests from opticians to insurance companies (824,198 records)

All sensitive data (name, address, etc.) has already been anonymized with Gnubila's Anonymizer tool.

To ensure patient anonymity, patients' Social Security Numbers (SSNs) have been replaced by generated IDs. Because patient data is still being processed, traceability features are needed to log how it is used. This can be achieved with the ProRegister tool provided in the REACH ToolBox.
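One common way to replace SSNs with generated IDs while keeping a patient's records linkable is a keyed hash (HMAC), paired with an append-only usage log. The sketch below is an illustrative assumption: it does not describe how Gnubila's Anonymizer or the ProRegister tool actually work, and the key, the `P-` prefix and the log fields are hypothetical. A secret key matters because SSNs come from a small, structured space, so an unkeyed hash could be reversed by brute force.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical placeholder: a real key would come from a secrets store.
SECRET_KEY = b"replace-with-a-key-from-a-secrets-store"


def pseudonymize_ssn(ssn: str, key: bytes = SECRET_KEY) -> str:
    """Replace an SSN with a stable generated ID (HMAC-SHA256, truncated)."""
    digest = hmac.new(key, ssn.encode("utf-8"), hashlib.sha256).hexdigest()
    # 16 hex chars = 64 bits; collisions are negligible at ~1e6 patients.
    return "P-" + digest[:16]


def log_usage(log: list, user: str, patient_id: str, purpose: str) -> dict:
    """Append one traceability record for an access to patient data."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "patient_id": patient_id,
        "purpose": purpose,
    }
    log.append(json.dumps(entry, sort_keys=True))
    return entry


usage_log: list = []
pid = pseudonymize_ssn("1234567890123")  # fake SSN for illustration
log_usage(usage_log, "analyst-1", pid, "market analysis")
print(pid, usage_log[0])
```

The same SSN always maps to the same ID, so reimbursement histories can still be aggregated per patient without exposing the SSN itself.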

Expected outcomes:

A map of market trends derived from the raw historical transactions of opticians' reimbursement requests (price of glasses, frames, type of vision correction, patient age, geolocation, etc.).
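A minimal sketch of turning raw claims into simple trend tables, assuming hypothetical fields `year`, `region`, `age` and `total_price` (the dataset's actual columns are not described here). It computes the median price per year and region, and per 10-year age band; the median is used because it is less sensitive to a few extreme prices than the mean.

```python
from collections import defaultdict
from statistics import median


def age_band(age: int, width: int = 10) -> str:
    """Bucket an age into a band label such as '30-39'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def price_trend(claims, key_fields, value_field="total_price"):
    """Median of value_field for each combination of key_fields."""
    groups = defaultdict(list)
    for c in claims:
        groups[tuple(c[f] for f in key_fields)].append(c[value_field])
    return {k: median(v) for k, v in sorted(groups.items())}


# Hypothetical sample claims (not taken from the real dataset).
claims = [
    {"year": 2020, "region": "North", "age": 34, "total_price": 310.0},
    {"year": 2020, "region": "North", "age": 67, "total_price": 450.0},
    {"year": 2020, "region": "South", "age": 12, "total_price": 180.0},
    {"year": 2021, "region": "North", "age": 45, "total_price": 330.0},
    {"year": 2021, "region": "North", "age": 71, "total_price": 500.0},
    {"year": 2021, "region": "South", "age": 15, "total_price": 200.0},
]
for c in claims:
    c["age_band"] = age_band(c["age"])

print(price_trend(claims, ["year", "region"]))
# {(2020, 'North'): 380.0, (2020, 'South'): 180.0, (2021, 'North'): 415.0, (2021, 'South'): 200.0}
print(price_trend(claims, ["age_band"]))
```

The same grouping works for any other axis mentioned above, such as the type of vision correction.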
