Speech/no-speech activity detection
Summary of the entity:
The Centre for Research and Technology Hellas (CERTH) is one of the largest research centres in Greece and ranks first in northern Greece. Its mission is to promote the triplet Research – Development – Innovation by conducting high-quality research and developing innovative products and services, while building strong partnerships with industry and strategic collaborations with academia and other research and technology organisations in Greece and abroad.
More than 800 people work at CERTH, most of them scientists. CERTH has received numerous awards and distinctions and is listed among the Top-20 EU research centres with the highest participation in H2020 competitive research grants.
It is active in a large number of application sectors (energy, buildings and construction, health, manufacturing, robotics, (cyber)security, transport, smart cities, space, agri-food, marine and blue growth, water, etc.) and technology areas such as data and visual analytics, data mining, machine and deep learning, virtual and augmented reality, image processing, computer and cognitive vision, human-computer interaction, IoT and communication technologies, navigation technologies, cloud and computing technologies, distributed ledger technologies (blockchain), (semantic) interoperability, system integration, mobile and web applications, hardware design and development, smart grid technologies and solutions, and social media analysis.
Summary of the challenge:
The goal of this challenge is to recognize speech and non-speech activity inside a building.
Stakeholder: internal company stakeholders (employees), in particular software engineers – DATA ANALYSIS
Several challenges in audio-based event detection relate to the distance between the acoustic sensor and the appliance associated with an activity, the selection of features and classifiers for an indoor environment, and the choice of Deep Neural Network parameters and their impact on recognition accuracy. For speech/non-speech activity detection, the fundamental problem is designing a general system that performs equally well in very noisy and in quiet environments.
In recent years, there has been increasing interest in Deep Learning architectures, mainly because they can achieve high classification accuracy directly from raw data, without the need for human-engineered features.
However, when dealing with sensitive data such as audio, one must consider deploying these computationally expensive algorithms on single-board computers. This requires optimization that keeps a fair trade-off between computational cost and classification accuracy, as well as storing the data on the device. In the case of a cloud-based system, one must ensure that Blockchain-as-a-Service is used.
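A classical baseline for the noisy-versus-quiet problem is an energy-based detector whose decision threshold follows the background level. The sketch below is a minimal, standard-library illustration assuming mono samples already scaled to [-1, 1]; the function names and the parameters `frame_len`, `margin_db` and `alpha` are hypothetical choices, not part of the challenge (400 samples is 25 ms at an assumed 16 kHz sample rate). It flags any sufficiently loud sound, so it only separates activity from background; distinguishing speech from other sounds would still need the feature and classifier choices discussed above.

```python
import math
from typing import List, Sequence


def frame_energies_db(samples: Sequence[float], frame_len: int) -> List[float]:
    """Log energy (dB) of each non-overlapping frame; a trailing partial frame is dropped."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(mean_square + 1e-12))  # epsilon avoids log10(0)
    return energies


def detect_speech(samples: Sequence[float], frame_len: int = 400,
                  margin_db: float = 10.0, alpha: float = 0.95) -> List[bool]:
    """Label each frame True (active) or False (background) with an adaptive threshold."""
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    # Initial noise floor: 10th percentile of the frame energies. This assumes
    # the recording contains at least some background-only frames.
    noise_floor = sorted(energies)[len(energies) // 10]
    labels = []
    for energy in energies:
        active = energy > noise_floor + margin_db
        labels.append(active)
        if not active:
            # Update the background estimate only on inactive frames, so the
            # threshold rises in a noisy room and falls in a quiet one without
            # being dragged upwards by the activity itself.
            noise_floor = alpha * noise_floor + (1.0 - alpha) * energy
    return labels
```

Because the threshold is always `margin_db` above the tracked background, the same parameters can work in a quiet room and in a considerably noisier one, provided the activity stays at least that much louder than the background.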
REACH Data Providers:
- Audio data (wav files)
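The challenge does not state the sample format of the WAV files. Assuming 16-bit PCM, the standard-library sketch below loads one file into a list of floats in [-1, 1], down-mixing multi-channel recordings to mono by averaging; `read_wav_mono` is a hypothetical helper name.

```python
import array
import sys
import wave
from typing import List, Tuple


def read_wav_mono(path: str) -> Tuple[int, List[float]]:
    """Return (sample_rate, samples in [-1, 1]) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        pcm.byteswap()
    # Interleaved channels: average each frame's channels, then scale to [-1, 1].
    samples = [
        sum(pcm[i:i + n_channels]) / (n_channels * 32768.0)
        for i in range(0, len(pcm), n_channels)
    ]
    return rate, samples
```

Files in another format (for example 24-bit or floating-point WAV) would need a different decoding step.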
To create a data value chain that allows:
- To develop an accurate algorithm for recognizing human activities;
- To recognize more than 95% of the speech/non-speech activities inside the building.
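The 95% target can be checked frame by frame against a labelled reference. The challenge does not say whether the figure is an overall accuracy or a per-class rate; the sketch below reports both and assumes, for `meets_target`, that every class present must individually exceed the target. The function name `vad_scores` and the frame-level labelling are illustrative assumptions.

```python
from typing import Dict, Optional, Sequence


def vad_scores(reference: Sequence[bool], predicted: Sequence[bool],
               target: float = 0.95) -> Dict[str, object]:
    """Frame-level scores for a speech (True) / non-speech (False) detector."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(reference, predicted))
    speech_hits = sum(1 for r, p in pairs if r and p)
    non_speech_hits = sum(1 for r, p in pairs if not r and not p)
    n_speech = sum(1 for r in reference if r)
    n_non_speech = len(reference) - n_speech

    def rate(hits: int, total: int) -> Optional[float]:
        # None when a class does not occur in the reference at all.
        return hits / total if total else None

    recalls = {
        "speech_recall": rate(speech_hits, n_speech),
        "non_speech_recall": rate(non_speech_hits, n_non_speech),
    }
    present = [v for v in recalls.values() if v is not None]
    return {
        "accuracy": (speech_hits + non_speech_hits) / len(reference),
        **recalls,
        # Assumed reading of "more than 95%": each class present must exceed it.
        "meets_target": all(v > target for v in present),
    }
```

With 5 speech frames out of 100, a detector that always answers "non-speech" reaches an accuracy of 0.95 but a speech recall of 0, which is why the sketch checks each class separately.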