INSIGHTS FROM THE DATA EXPERTS’ PERSPECTIVE
Context
The most prominent Data Experts from companies and institutions within the REACH Ecosystem were asked concrete questions about the Data Value Chain. A total of 10 experts were interviewed to better understand the challenges and opportunities within the European Data Sharing Economy. Below are our findings.
Insights and Trends
Political Agenda and Regulations
Data-driven innovation is a very hot topic on the political agenda, which explains why the European Commission (EC) is pushing to foster data spaces and their sustainable economic development. A range of regulations coming from the EC (the Digital Services Act, the Data Act, AI regulation, etc.) is changing the landscape and how we deal with data governance and AI. These efforts are steering stakeholders and industry in specific directions. Both industry and policymakers are watching these efforts closely. It is yet to be seen what effect they will have on the industry, as they will have a number of effects, some of them unintended. Hopefully they will create opportunities for new types of data intermediaries and provide a legal basis for tackling data monopolies with new energy.
Standards
There is a certain consensus on the need for standards and for compliance with common rules and specifications. Experts expect that in 1-2 years' time most data spaces will be interoperable, so whoever uses the data will be able to use it in "a normal way".
Customer Trends
The market is seeing more and more customers demanding data-driven analytics and real-time data as a service. Exploring opportunities with customers has become much more hands-on and dynamic.
Data value chains are seen as a way for traditional industries to enhance their business models. For example, large hotel groups are creating departments run by a CDO (Chief Data Officer). These departments establish data value chains with other types of stakeholders to create multi-stakeholder exploitation models based on their data flows.
Stakeholder interactions and data spaces
New associations and initiatives are being orchestrated to enable data exchange at the pan-European and global level. One example is the Data Spaces Business Alliance, where the main players are pushing to make this happen. The ongoing conversation and the realisation of data spaces are inherently an opportunity for data spaces to build trust. Data spaces are supposed to speed up the processes between data providers and "data exploiters".
The big challenge is to demonstrate how these data spaces will generate benefits and increase Europe's productivity and competitiveness. In the beginning they were related to traditional sectors. Step by step, new data spaces are emerging (such as tourism), with efforts to showcase their benefits to stakeholders. We are starting to see how these data spaces can work together, merging infrastructure and sharing data. It is now time to demonstrate how they will interoperate with each other and across sectors.
Holistically, we are also seeing greater trust emerging in data exchange from the technical standpoint, on both the supplier and the adopter side, as maturity levels rise.
Big companies such as Google, Amazon and Facebook are open-sourcing key software they have been using to develop machine learning (ML) and deep learning. LinkedIn open-sourced what it has been using as a feature store. McKinsey open-sourced a framework for producing reproducible, maintainable and modular data science code that follows good software engineering practices. Big companies are providing these tools, and the community is leveraging and building on them.
Tech and Data
We are seeing drastic development on the tech and data side. The focus is now on value-based engineering, where we are much more intentional about how we apply technology internally in business and in the public sphere. Conversations are constantly being held around AI and ML. We now have tools to study the ethics behind their implementation and to design these systems to match our (human) values.
New ways of leveraging non-labelled data are emerging. In the ML space, self-supervised learning (SSL) methods for creating models (a compression of non-labelled data) have appeared. Together with sufficient computing power, they allow businesses to take advantage of forms of data that are not useful by themselves. Conventional supervised ML needs labelled data, but with SSL one can pretrain a model on unlabelled data and then reuse it in a transfer learning setting. For example, large image datasets exist online, but labelling them is very costly. Now these images can be used to pretrain a model to recognise visual patterns. This extremely powerful pretrained model can then be fine-tuned on your own, smaller labelled dataset to make it more capable and more predictive. In this way, we can create powerful models without having to label huge amounts of data.
Quite a few companies (e.g., OpenAI and Hugging Face) are providing AI and ML as a service. Not every company can afford to train these models, so many companies now offer pretrained models to other companies as out-of-the-box solutions.
Reinforcement learning changes the approach to training AI agents (e.g., in robotics and gaming). It allows predictive models and AI systems to reach unprecedented results, much more efficiently than previous ML models. We may see novel technologies acting on their own. These will require novel standardization, new ways of thinking and new regulations, which in turn will establish the next paradigm shift.
Furthermore, technology has become cheaper and more accessible. Platforms, open-source components, data management tools, libraries and programming languages are emerging at an exponential rate. Alongside this, companies and public officials are more conscious of how data is managed and used when creating data-driven solutions.
Federated AI proposes that the AI moves closer to the data supplier. This brings clear advantages: security, safety, new features and intelligence can all be handled locally.
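To make the federated idea more concrete, below is a minimal sketch of federated averaging (FedAvg). FedAvg is one common federated learning scheme; the experts did not name a specific algorithm. Each hypothetical client trains a tiny linear model on data that never leaves it, and a server only averages the returned weights. The client datasets, function names and hyperparameters are illustrative assumptions, not a reference implementation.

```python
import random

def local_train(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one client's private data.

    Only the resulting weights leave the client; the raw (x, y) pairs do not.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error for the model y ~ w*x + b
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_averaging(clients, rounds=200):
    """Server loop: broadcast the global model, collect local updates,
    and average them weighted by each client's number of samples."""
    global_w, global_b = 0.0, 0.0
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        updates = [local_train((global_w, global_b), data) for data in clients]
        global_w = sum(len(d) * uw for d, (uw, _) in zip(clients, updates)) / total
        global_b = sum(len(d) * ub for d, (_, ub) in zip(clients, updates)) / total
    return global_w, global_b


rng = random.Random(0)

def make_client(lo, hi, n=50):
    # Hypothetical private dataset: points on y = 3x - 2 with x drawn from [lo, hi]
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 3 * x - 2) for x in xs]

# Three hypothetical clients whose data cover different ranges of x
clients = [make_client(0, 1), make_client(1, 2), make_client(-1, 0)]
w, b = federated_averaging(clients)
print(f"global model: w={w:.2f}, b={b:.2f}")
```

No client ever shares its raw points, yet the averaged model recovers the shared relationship. Real deployments typically add secure aggregation and privacy safeguards on top of this basic loop.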
Edge-to-cloud computing will enable data to be processed across the continuum from edge devices to the cloud, making data management more efficient. Combined with other technologies (5G, 6G), it will enable real-time analysis and immersive technologies. At the current pace of technological development, we will see VR becoming an everyday reality.
We can confidently conclude that data science is a strong trend that will continue to develop and expand significantly.
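As a rough illustration of why processing at the edge makes data management more efficient, here is a minimal sketch. It assumes a hypothetical edge node that buffers raw sensor readings and forwards only one compact summary per fixed-size window to the cloud. The window size, summary fields and function names are assumptions chosen for illustration.

```python
from statistics import mean

def summarise_window(readings):
    """Reduce one window of raw sensor readings to a compact summary record."""
    return {
        "count": len(readings),
        "mean": mean(readings),
        "min": min(readings),
        "max": max(readings),
    }

def edge_node(stream, window_size=100):
    """Buffer raw readings locally and yield one summary per full window.

    Only the summaries would be sent on to the cloud; the raw readings stay on
    the device. A trailing partial window is flushed at the end of the stream.
    """
    buffer = []
    for reading in stream:
        buffer.append(reading)
        if len(buffer) == window_size:
            yield summarise_window(buffer)
            buffer = []
    if buffer:
        yield summarise_window(buffer)

# Hypothetical sensor stream: 1,050 temperature readings
raw = [20.0 + (i % 10) * 0.1 for i in range(1050)]
summaries = list(edge_node(raw, window_size=100))
print(len(raw), "raw readings ->", len(summaries), "summaries sent to the cloud")
# prints: 1050 raw readings -> 11 summaries sent to the cloud
```

This design trades detail for bandwidth. Which statistics to keep at the edge depends on the analytics the cloud side needs.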
Signals of disruption
Data value chain experts were asked whether they had noticed any signals of disruption and what those were. In a nutshell, there are several opportunities in Big Data, especially related to traceability, trustworthiness and monetisation, and arising from the combination of multiple domains.
Experts mentioned that disruptions could start as early as the first stage of a data value chain: data creation. Data is produced, but it is still unclear whether we exploit all possible data sources or whether we create and collect truly valuable data. Other disruptions may be caused by security, privacy and governance issues related to data.
Regarding data creation and data sharing, there are two cases. On the one hand, some companies cannot produce data themselves but need it, and find it difficult to access third-party data. On the other hand, some companies have data but do not create proper data spaces with governance and do not exploit its value. Change could come, for example, from clients who are not afraid to upload data to a cloud such as AWS and who move from company-owned infrastructure to cloud solutions that support the whole chain.
Moreover, it is important to recognise data sharing as a new dimension in the value chain, and this adds many implications to the process. For example, the data holder/data owner does not have to be the one processing or getting value out of the data. Furthermore, some big tech companies hold large usage datasets because they have big audiences for their (free) digital services. With these datasets they can train their artificial intelligence services and machine learning systems.
Nonetheless, some experts note that the main problems in the data value chain space are the uncontrolled growth of stored data (data lakes) and concerns about GDPR and data privacy. From a technological point of view, the never-ending growth of data has pushed legacy data storage and management technologies to their performance limits. This has driven the emergence of new technological solutions that sometimes go against well-established principles (e.g., NoSQL or data denormalisation).
From an industry point of view, disruption could come from the emergence of a plethora of data-driven companies and business models. Some data experts see a huge opportunity that nobody is leveraging: companies making data analysis a mandatory, regular activity in their own business to support decisions and efforts. However, businesses working in the data space see data governance and the creation of processes to check data quality as the main issues for further disruption.
One good example, according to the surveyed experts, is the growth of communities fostering the data economy in different domains. The recent alliance between the main European initiatives promoting data spaces (BDVA, IDSA, FIWARE, GAIA-X) may be the final push for the data economy to take off in Europe.
Barriers
Data
New data sources bring new challenges that need to be resolved with new technologies. For example, IoT data arrives at very high speed and brings novel challenges, which these technologies may in turn resolve even faster. Meanwhile, some companies occasionally face large amounts of unstructured, noisy, unlabelled and uncorrelated incoming data. Efficient data sharing also depends on proper documentation and a consistent flow and availability of data, and ensuring these remains a barrier as well.
Determining the value of the data is a trade-off – both a challenge and an opportunity.
Mindset
AI and ML have changed the landscape drastically by emphasising the importance of data. Businesses need to step up their digital transformation efforts. That means collecting data systematically, using it for machine learning, improving how they work and what they offer, and delivering value to customers.
Companies are realizing that innovation is fostered around data. However, this is not yet common practice, especially among traditional companies.
Regulations and the Ecosystem
The sheer amount of regulation makes it hard to understand and difficult to follow. The Data Act will put constraints on how data can be used, which will affect companies and how they use it.
Data monopolies exist because big internet companies rely on data collection. When those business models become less important and rely less on viral content and clicks, we will see more sharing. A further barrier is the lack of alignment in the multi-stakeholder ecosystem (multi-stakeholder exploitation agreements / data standardisation).
Opportunities for growth
99% of companies need to digitise to stay competitive. A sharp rise in workload is leading to a huge increase in data-driven projects. This in turn brings job opportunities, cross-sector involvement and collaboration, and data access. We have more chances to access data sources than ever before, which is exciting!
If we concentrate on the non-tech part as much as on the tech part, we could make a huge change. Data providers (industry, banks, telecoms managing data for others) hold all the data. Meanwhile, data value chain ecosystems and SMEs within traditional ecosystems are also taking part in the progress and generating data. However, they do not know how to deal with this data because they lack the profiles and expertise, so they need external professionals to do it. There is room for growth here!
The Big Data ecosystem is huge, and it is difficult to bet on particular parts of it. Generative neural networks in deep learning provide opportunities (new audio, video and text generated automatically by AI models). We are letting machines invent. This brings many positive aspects, but also negative ones, such as fake news and videos made to manipulate the viewer. This technological and cultural change is spilling over into politics, culture and music, and may lead towards a new informal/formal world order.
Stream analytics and adaptive machine learning models make it possible to maintain predictive power as the data changes. The need for them will grow even greater, so more discussion and education need to happen around this.
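To make "adaptive" concrete, here is a minimal sketch of online learning on a stream. It uses a hypothetical linear model updated by one stochastic gradient step per event, with a constant learning rate so that it keeps tracking recent data after a simulated abrupt concept drift. The class, the synthetic stream and the learning rate are assumptions for illustration only. Production stream analytics would typically rely on a dedicated streaming framework and explicit drift detection.

```python
import random

class OnlineLinearModel:
    """Linear model y ~ w*x + b, updated by one SGD step per incoming event."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # One gradient step on the squared error of this single observation.
        # A constant learning rate keeps the model responsive to recent data.
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error
        return abs(error)


def stream(rng, n, slope, intercept):
    # Hypothetical event stream following y = slope*x + intercept
    for _ in range(n):
        x = rng.uniform(-1, 1)
        yield x, slope * x + intercept


rng = random.Random(42)
model = OnlineLinearModel()

# Regime 1: the model learns y = 2x + 1 from the stream
for x, y in stream(rng, 2000, slope=2.0, intercept=1.0):
    model.update(x, y)
print(f"after regime 1: w={model.w:.2f}, b={model.b:.2f}")

# Sudden concept drift: the relationship becomes y = -x + 3
errors = []
for x, y in stream(rng, 2000, slope=-1.0, intercept=3.0):
    errors.append(model.update(x, y))
print(f"after drift:    w={model.w:.2f}, b={model.b:.2f}")
print(f"mean abs error, first 50 events after drift: {sum(errors[:50]) / 50:.2f}")
print(f"mean abs error, last 50 events:              {sum(errors[-50:]) / 50:.2f}")
```

A model fitted once and then frozen would keep predicting the old relationship after the drift. The online model instead recovers its predictive power as new events arrive.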
5G and future 6G will allow a huge number of sleeping devices to be fully connected, generating massive amounts of data uploaded to the cloud. This massive data then needs to be managed and turned from sleeping data into insightful, real-time, value-adding analytics.
A good list of easily reproducible real-world use cases, with tutorials on how to complete each phase of the chain, would help enormously, provided each stage can be communicated effectively.
Standardization is sometimes understood only from the technological perspective (e.g., blockchain and DLT). It needs to include non-technological people, environments and processes in order to establish a universal model that is clearly defined and governed.
Finally, we are seeing many people motivated by the impact that data and AI can have on our struggle against climate change. Solving this, the biggest problem our society faces, is an opportunity for growth as well.