Thursday, July 27, 2023 - 06:00

With data rising as one of the most valuable assets for companies today, businesses feel the need to store as much of it as they can. However, given the ever-increasing amount of data generated by IoT, digitization, wearables and many other new sources, the data available to companies usually exceeds what they can manage. And if you cannot manage your data, there is no way you can monetize it.

Studies have shown that up to 95% of organizations suffer from the data decision gap [1], meaning they cannot rely on data to make accurate decisions. Similarly, reports state that between 60% and 73% of data is never used for analytics [2] and that only 32% of companies can gain tangible and measurable value from their data [3]. The impact is immeasurable, as companies lose the opportunity to understand their customers better, make better pricing decisions, or even avoid fraud.

Poor-quality data alone is estimated to cost companies around 10-30% of their revenue [4]. This is especially dramatic considering that small and medium-sized enterprises (SMEs) represent 99% of the EU's productive fabric and are more affected by these barriers than large enterprises or public administrations. The nature of the data decision gap and its consequences differ for each type of organization.

The barriers to data monetization are numerous, and we show some of them and their consequences in the figure below.  

[Figure: Barriers to data monetization and their consequences]

One barrier that is usually neglected, given the high availability of computing resources in the cloud, is the lack of hardware resources. This remains a problem for non-specialized SMEs that cannot afford cloud resources or do not have the personnel to leverage them. There is also a lack of open solutions for data governance and quality, which, as seen above, is critical.

Without tools that make data findable and usable by non-technical personnel, data usage stays very low due to poor findability, accessibility or interoperability. Without tools to assess the quality of their data, organizations are left with bad, unreliable data that cannot feed reliable models, projections or anything else decision-makers could consume to increase revenue. Poor data quality can also introduce biases into AI models (e.g., regarding race or gender) or raise energy consumption, as unreliable AI models need to be re-trained more often, repeating an energy-intensive process.
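To make the idea of automated quality assessment concrete, here is a minimal, purely illustrative sketch (not part of DATAMITE's actual codebase; all names are hypothetical) computing two simple quality indicators, completeness and uniqueness, over a tabular dataset:

```python
# Hypothetical sketch of two basic data-quality indicators;
# DATAMITE's actual quality module is not shown here.

def completeness(rows, field):
    """Fraction of rows where `field` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def uniqueness(rows, field):
    """Fraction of non-empty values of `field` that are distinct."""
    values = [r.get(field) for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return len(set(values)) / len(values)

# Toy dataset with one missing and one duplicated email address.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "a@example.com"},
    {"id": 4, "email": "b@example.com"},
]

print(f"email completeness: {completeness(customers, 'email'):.2f}")  # 0.75
print(f"email uniqueness:  {uniqueness(customers, 'email'):.2f}")     # 0.67
```

Even indicators this simple, computed continuously, can flag datasets that should not be fed into models or reports before they mislead decision-makers.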

There is also a skills problem, from both the technical and the business point of view. Because data has only recently become a central asset, there is a shortage of specialized IT professionals who can effectively prepare and manage it.

From the business side, there is still a lack of indicators, business models, and understanding of how data can be exploited internally by organizations. Data also has great potential to be leveraged externally through sharing, exchange or trading. However, there is a lack of both a data-sharing culture and data-sharing tools that motivate companies to step in that direction and allow them to do so in a reliable and trustworthy way, knowing that they can define terms of use that will be enforced.

With these barriers in mind, we proposed DATAMITE. The project addresses the precise nature of these barriers by developing a simple but impactful technical framework (its high-level architecture is depicted in the figure below) that enables European enterprises and public administrations to overcome them and facilitates the monetization of their data.

[Figure: High-level architecture of the DATAMITE framework]

The core goal is to help users better monetize, govern and trust their data through a set of key modules: Data Governance, Quality, Security, Sharing and Supporting Tools. Building these modules on top of existing open source components ensures interoperability with current leading storage technologies. Community uptake is driven by intuitive graphical interfaces and by detailed documentation and tailored training materials addressing both technical and business aspects.

DATAMITE aims to be an enabler, unleashing monetization potential at both internal and external levels. At the internal level, users gain tools to improve the governance and quality management of their data and its adherence to the FAIR (Findable, Accessible, Interoperable, Reusable) principles, and can upskill on technical and business aspects. Data thus becomes trustworthy, reducing the data decision gap and ensuring the reliability needed for other paradigms like AI. Similarly, the project will foster an open source community around it to disseminate the technical and business materials produced in the project, let other users produce their own materials, and assist in the upskilling of European professionals.

At an external level, DATAMITE keeps users in control of their data and provides new sources of revenue and interaction with other stakeholders. To do so, it focuses on data sovereignty, integrating tools that let data providers define the conditions under which their data may be used, and by whom, and exploring the best ways to enforce those conditions. The framework also follows a plugin-based approach to facilitate sharing data in different ecosystems such as the International Data Spaces (IDS), data markets, the European AI-on-Demand platform (AIoD) or the European Open Science Cloud (EOSC). In addition, the architecture envisioned for DATAMITE enables Digital Innovation Hub (DIH) sandboxing, allowing DIHs to guide the onboarding of SMEs and low-tech companies into the data economy. Together, DATAMITE's solutions are a catalyst to boost data monetization in the European productive fabric.
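The plugin-based sharing approach could be sketched as follows. This is a purely illustrative Python sketch; the interface and class names are assumptions, not DATAMITE's real API. The idea is that each target ecosystem gets one connector behind a common interface, so new ecosystems can be added without touching the core:

```python
# Illustrative sketch of a plugin-based sharing layer; all names here
# are hypothetical and do not reflect DATAMITE's actual interfaces.
from abc import ABC, abstractmethod

class SharingPlugin(ABC):
    """Common interface that each target ecosystem implements."""
    @abstractmethod
    def publish(self, dataset: dict) -> str:
        ...

class EoscPlugin(SharingPlugin):
    def publish(self, dataset: dict) -> str:
        # A real connector would call the ecosystem's publishing API here.
        return f"published '{dataset['name']}' to EOSC"

class DataMarketPlugin(SharingPlugin):
    def publish(self, dataset: dict) -> str:
        return f"listed '{dataset['name']}' on a data market"

# Registry: adding an ecosystem means registering one more plugin,
# with no changes to the core sharing logic.
PLUGINS = {"eosc": EoscPlugin(), "market": DataMarketPlugin()}

def share(dataset: dict, target: str) -> str:
    return PLUGINS[target].publish(dataset)

print(share({"name": "grid-telemetry"}, "eosc"))
```

The same pattern accommodates connectors for IDS or AIoD: each one encapsulates the ecosystem-specific protocol, while data providers interact with a single, uniform sharing operation.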

Three Use Cases Across Six Pilots Will Validate DATAMITE

The framework will be validated in six pilots across five domains, grouped into the following three use cases:

1) Data exchange within large corporations.

Pilot 1 focuses on data accessibility from multiple domains and its exchange within a conglomerate of companies: Grupo Gimeno. ITI will act as a DIH sandbox, where Giditek will try the stack before adopting it. Finally, data sharing will be tested by publishing data to EOSC.

Pilot 2 focuses on data exchange within a large Telco company, OTE, where data is stored over different data lakes. The main goal is to facilitate how the company exchanges and consumes data. As in Pilot 1, data sharing will be tested by publishing data to EOSC. 

2) Data exchange through (energy) Data Spaces 

In Pilot 3, HEDNO will aim to enhance its internal data management while testing the use of data spaces to improve data exchange with data providers. Among other things, it will validate data governance and quality tools.

Pilot 4 has a two-fold purpose. On the one hand, E-Redes will aim to improve its current technology stack. On the other hand, it will test the use of data spaces as a potential new environment for offering its open data.

3) Interaction with other European initiatives 

Pilot 5, led by PSNC, will focus on the governance, quality and exploitation of agrifood data from the eDWIN platform. The main goal is to improve the way data is managed, as well as the mechanisms needed to improve its quality. In addition, plugins for publishing data to data markets for trading will be validated.

Pilot 6 will validate the integration of DATAMITE with the AIoD platform. To do so, CINECA, which also aims to improve how it manages the data of its HPC infrastructure internally, will validate the plugins for sharing datasets with the European AIoD platform.

In conclusion, DATAMITE will deliver a modular, open source and multi-domain framework to improve DATA Monetizing, Interoperability, Trading and Exchange, in the form of software modules, training and business materials for European companies, empowering them to become new relevant players in the data economy.

This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101092989.   

Project Consortium

The DATAMITE consortium, led by Instituto Tecnológico de Informática, comprises a balanced team of complementary organisations, including research and development partners, academic partners, large industrial partners, SMEs, and a standardisation body. Each organisation provides unique expertise. The consortium is characterised by multidisciplinarity, complementary knowledge and objectives, and an excellent track record in research and development projects.
 

References:

[1] "Data in Context: Closing the Data Decision Gap," by Quantexa (https://bit.ly/35JgGI8).

[2] "Underuse of Analytics could be costing organisations millions," by Marcus Dervin, from CEO magazine (https://bit.ly/3LAcQA4).

[3] "Closing the Data-Value gap," by Accenture (https://accntu.re/3xaWHx0).

[4] "What is data quality? Why is it important?" by Ataccama (https://bit.ly/3NLdKvI).

 


About the Author

Jordi Arjona Aroca


Jordi Arjona Aroca is the coordinator of the Distributed Systems group at the Instituto Tecnológico de Informática.