Meet Algomo, the startup working on a machine learning model capable of handling the summarization and the necessary infrastructure for inference
Hi Algomo! Can you present your team and company to us?
Hi! We are Algomo and we are a London-based startup that helps companies offer their customer service in 100+ languages.
By leveraging advancements in multilingual AI, we have developed a unique technology and platform where a single AI is fluent in 109 languages. All the other commercial players combined (including Google, Microsoft, Amazon and IBM) support just 45. With the additional 64 languages, we are the first company to offer automated customer support to 1 billion people globally, and our product is being designed to be affordable and easy to use, even for SMEs.
Algomo was founded in February 2020, and we are currently a core team of 6 people (including Charis Sfyrakis, Gustavo Cilleruelo, Filip Krawczyk, Daniel Duma and Waleed Khoury).
How did you learn about REACH Incubator and what made you apply?
We receive a monthly newsletter listing all European open calls, and we try to identify those that would help us bring our product to the next level. REACH had multiple challenges that could be relevant, and of all of them, the auto-summarisation task was clearly the best fit. Firstly, it involves NLP, which is the core of our business and our area of expertise, so we felt comfortable delivering it. Secondly, and more importantly, the know-how we’d gain would enable us to release a very important feature: the automatic generation of training data for our chatbot.
You are tackling VRT’s challenge Dutch Abstractive Auto Summarization. How do you think experimenting with and solving their data value chain challenge will set the foundations for scaling your solution?
As a conversational AI company, we need a lot of training data to help our bots scale easily across multiple languages. In the absence of enough data and content specialists, especially for low-resource languages, we need an easy way to generate such data, so that a human only needs to approve or, at worst, correct automatically generated training data. The methodology and technologies we currently use to create Dutch summaries for VRT are the same ones we use to generate the paraphrased data that serves as training data for our bots. Moreover, the summarization product we will build for VRT will also be used internally, and the more it is used, the better the underlying summarization algorithm will perform.
Lastly, VRT provides the necessary testbed as well as a launchpad that will help us scale the solution to other news agencies globally.
Please present your solution and elaborate on how it differentiates from the competition.
Our solution consists of a trained machine learning model capable of handling the summarization and the necessary infrastructure for inference.
For this task, VRT has experience with sequence-to-sequence models, which we will also fine-tune as a baseline. We will also experiment with autoregressive models, which have been shown to achieve state-of-the-art results.
Besides modeling, we will also build a robust training pipeline that will allow us to experiment with and train different models in parallel. We are also putting considerable effort into detecting hallucinations in the generated summaries (i.e., cases where a summary is grammatically and syntactically correct but factually wrong).
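To make the idea of a hallucination concrete, here is a minimal sketch of one very simple check. It is an illustration only and not a description of Algomo's detection method: the function name `unsupported_tokens` and the heuristic itself (flagging numbers and capitalised words in a summary that never occur in the source text) are assumptions chosen for the example.

```python
import re


def unsupported_tokens(source: str, summary: str) -> list[str]:
    """Return numbers and capitalised words in the summary that never occur in the source.

    Hypothetical heuristic for illustration: a fluent summary can still
    mention a figure or a name that the source text does not contain.
    """
    # Case-insensitive vocabulary of the source article.
    source_tokens = {t.lower() for t in re.findall(r"\w+", source)}

    flagged = []
    # Split the summary into sentences so that sentence-initial capitals
    # are not mistaken for names.
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        tokens = re.findall(r"\w+", sentence)
        for i, tok in enumerate(tokens):
            is_number = tok.isdigit()
            is_name = i > 0 and tok[0].isupper()
            if (is_number or is_name) and tok.lower() not in source_tokens:
                flagged.append(tok)
    return flagged


source = "VRT published 12 articles about Brussels on Monday."
faithful = "VRT published 12 articles about Brussels."
hallucinated = "VRT published 15 articles about Antwerp."

print(unsupported_tokens(source, faithful))
print(unsupported_tokens(source, hallucinated))
```

A check like this only catches surface-level mismatches; a summary can reuse every name and number from the source and still state something false, so it would at best be a first filter before human review.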
Another major difference between our approach and other commercial solutions is that our summarization will be multilingual: for a given long text, we could generate summaries in multiple languages in a single pass.
Do you foresee any obstacles in successfully developing and commercializing your solution?
Training the ML models required for language generation is never an easy task, as they are large both in the number of parameters and in the amount of data required for successful fine-tuning. This will require careful planning and engineering.
For inference, which models end up being used in production and how they are deployed depend heavily on VRT’s needs. For us, a scalable and automated fine-tuning pipeline for language models will become the backbone of our product, allowing us to quickly adapt our chatbot engine to new domains and clients.