Understanding Data Mesh: Principles and Implementation

Data mesh is a decentralized data architecture where business domains own and manage their own data as products. Organizations adopt it to eliminate bottlenecks caused by centralized data teams, scale analytics across complex organizations, and align data ownership with domain expertise. It is built on four principles: domain ownership, data as a product, self-serve infrastructure, and federated governance.

TL;DR

Data Mesh = decentralized data ownership. Each business domain (finance, HR, etc.) owns and manages its own data. It is not a tool; it is a mindset and framework for scaling data in large organizations. It is built on 4 principles:

  • Domain ownership
  • Data as a product
  • Self-serve data platform
  • Federated governance

Solves central data team bottlenecks and speeds up decision-making. Requires cultural + architectural shift, not just tech changes. Best for complex, scaling organizations with multiple data domains.

What is Data Mesh?

At its core, data mesh is not a technology, tool, or plug-and-play solution; it is a strategic framework designed to address the complexities of managing data at scale. The traditional centralized models of data management, characterized by siloed departments and a monolithic data team overseeing all integrations, are being reimagined with Data Mesh.

Data mesh proposes a decentralized approach where ownership and responsibility for data are distributed across various domains or departments within an organization, such as accounting, HR, operations, and finance.

This paradigm shift enables each domain or department to manage its data pipelines, create and maintain its data models, and perform analytics, all while contributing to a cohesive, interconnected data ecosystem. The central data team transitions from being the sole custodian of data logic and integrations to facilitating infrastructure and tools that empower domain teams to independently manage their data assets.

The term data mesh was coined by Zhamak Dehghani in 2019 and is based on four fundamental principles that guide its implementation:

  • Domain-Oriented Decentralized Data Ownership and Architecture: Each business domain becomes the custodian of its data, responsible for its collection, processing, and availability. This principle ensures that data management is closely aligned with the domain’s specific needs and expertise, fostering faster decision-making and increased agility.
  • Data as a Product: Data mesh advocates for treating data not as a byproduct of business processes but as a valuable product with defined customers (i.e., other domains or analytics teams). This approach necessitates a focus on data quality, usability, and accessibility, encouraging domains to provide data products that meet the needs of their consumers.
  • Self-Serve Data Infrastructure as a Platform: To support decentralized data ownership, a self-serve data infrastructure platform is essential. This platform acts as the foundation for domain-owned data products. It empowers domain teams to build, manage, and deploy their data pipelines and APIs independently. The platform provides domain-agnostic functionalities like data processing tools and standardized workflows, enabling efficient data management. Crucially, this infrastructure enforces pre-defined rules and ensures data quality, security, and privacy compliance across the entire mesh. This approach fosters a balance between domain autonomy and centralized governance, allowing for increased agility in data product development while adhering to essential data management principles.
  • Federated Computational Governance: While data mesh promotes domain autonomy, it also recognizes the need for overarching governance to ensure interoperability, consistency, and compliance across the organization. Federated governance models enable the balancing of local domain autonomy with global standards and best practices. Acting as a central body, the governance group sets the guidelines for all data products within the mesh. Standardization is key, ensuring that data from different domains can be seamlessly integrated and analyzed. The governance group fosters knowledge sharing and best practices, promoting a unified approach to data management across the organization. Crucially, this body also ensures adherence to both internal data policies and relevant industry regulations. This centralized oversight fosters a trustworthy and compliant data ecosystem, where diverse data products can flow freely and empower data-driven decision-making throughout the organization.

When Does Data Mesh Make Sense?

Data Mesh is not required for every organization. However, it becomes increasingly valuable as data environments grow in complexity, scale, and organizational fragmentation.

The Problem with Centralized Data Teams

As organizations scale, traditional data warehouses and centralized data teams often become bottlenecks. They struggle to handle the growing volume and variety of data while keeping up with increasing analytical demands from business stakeholders.

This creates a critical gap: decision-makers depend on timely insights, but the data team cannot deliver fast enough. As a result, data-driven decision-making slows down, directly impacting competitiveness.

Why Data Teams Get Overwhelmed

  • Constantly fixing broken data pipelines due to upstream changes
  • Limited time to explore and understand domain-specific data
  • Need to build deep domain expertise for every request
  • Balancing operational maintenance with analytical work

Even highly capable data teams end up spending more time maintaining infrastructure than delivering insights.

The Disconnect with Domain Teams

At the same time, many organizations operate with domain-driven structures. Product and business teams are already organized around specific domains and often run on decentralized microservices architectures.

These teams:

  • Deeply understand their domain’s data and operational needs
  • Own and manage their applications and APIs
  • Move quickly and independently in product development

However, despite their expertise, they remain dependent on centralized data teams for analytics and insights. This dependency creates friction and slows down decision-making.

The Breaking Point at Scale

As the organization grows, both sides feel the pressure:

  • Data teams become overloaded and reactive
  • Domain teams become blocked waiting for insights
  • Business decisions slow down

The Shift: From Centralized to Domain-Owned Data

This is where Data Mesh comes in.

Instead of relying on a central data team, Data Mesh shifts responsibility for data ownership, management, and analysis to domain teams.

What Data Mesh Enables

  • Domain teams own and serve their data as products
  • Faster access to insights without central bottlenecks
  • Scalable analytics aligned with organizational structure
  • Cross-domain data access similar to APIs in microservices

By decentralizing data ownership, organizations reduce dependency on a single team and enable faster, more autonomous decision-making across the business.

Core Components of a Data Mesh

1. Data Products: Data products are essential components within a data mesh architecture. They serve as logical units designed to process and store domain-specific data for analytical purposes. These products connect to various data sources, perform necessary transformations, and serve datasets through designated output ports. Examples of output ports include datasets in BigQuery and messages in Kafka topics. Each data product is owned and operated by a domain team responsible for its entire lifecycle, including monitoring data quality, ensuring availability, and managing costs.
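As a rough illustration of that structure, a data product can be modeled as a small object with an owning team, a transformation, and named output ports. The sketch below is hypothetical: the class names (`DataProduct`, `OutputPort`), the `orders` example, and the in-memory lists that stand in for a BigQuery dataset or a Kafka topic are assumptions made for illustration, not part of any specific platform.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class OutputPort:
    """A named place where consumers read the product's data (hypothetical)."""
    name: str
    kind: str  # e.g. "table" or "topic"; stands in for BigQuery or Kafka
    rows: list = field(default_factory=list)


@dataclass
class DataProduct:
    """A domain-owned unit: source rows in, transformation, output ports out."""
    name: str
    owner_team: str
    transform: Callable[[list], list]
    output_ports: dict = field(default_factory=dict)

    def add_port(self, port: OutputPort) -> None:
        self.output_ports[port.name] = port

    def refresh(self, source_rows: list) -> None:
        """Run the transformation and publish the result to every port."""
        result = self.transform(source_rows)
        for port in self.output_ports.values():
            port.rows = list(result)


def completed_orders(rows: list) -> list:
    """Example transformation: keep only completed orders."""
    return [r for r in rows if r["status"] == "completed"]


orders = DataProduct("orders", owner_team="checkout", transform=completed_orders)
orders.add_port(OutputPort("orders_v1", kind="table"))
orders.refresh([
    {"order_id": 1, "status": "completed", "amount": 20.0},
    {"order_id": 2, "status": "cancelled", "amount": 5.0},
])
published = orders.output_ports["orders_v1"].rows
```

In a real mesh the ports would write to managed storage or a stream, and the owning team would also attach quality monitoring and cost tracking to the product.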

2. Data Contracts: In the context of data mesh architecture, data contracts play a crucial role in facilitating data exchange between providers and consumers. A data contract specifies the structure, format, quality, and terms of use for exchanging data. It includes essential details such as the data product provider, usage terms, schema, quality attributes, service-level objectives, and billing information. By defining these parameters, data contracts ensure a common understanding of data semantics, quality expectations, and compliance requirements among all stakeholders involved in data exchange.
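One way to make a contract concrete is to express it as data and check incoming batches against it. The following is a minimal sketch, assuming a hypothetical `DataContract` class and a `violations` helper; the field names (`max_null_fraction`, `freshness_hours`) are invented for illustration and do not follow any particular contract specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract between a data product provider and its consumers."""
    provider: str
    schema: dict              # column name -> expected Python type
    max_null_fraction: float  # quality attribute: share of missing values allowed
    freshness_hours: int      # service-level objective for data staleness
    terms_of_use: str


def violations(contract: DataContract, rows: list) -> list:
    """Return human-readable contract violations for a batch of rows."""
    problems = []
    for column, expected_type in contract.schema.items():
        values = [row.get(column) for row in rows]
        nulls = sum(v is None for v in values)
        if rows and nulls / len(rows) > contract.max_null_fraction:
            problems.append(f"{column}: too many missing values")
        if any(v is not None and not isinstance(v, expected_type) for v in values):
            problems.append(f"{column}: expected {expected_type.__name__}")
    return problems


contract = DataContract(
    provider="checkout",
    schema={"order_id": int, "amount": float},
    max_null_fraction=0.0,
    freshness_hours=24,
    terms_of_use="internal analytics only",
)
good_batch = [{"order_id": 1, "amount": 20.0}]
bad_batch = [{"order_id": "1", "amount": None}]
good_report = violations(contract, good_batch)
bad_report = violations(contract, bad_batch)
```

Here the good batch produces no violations, while the bad batch is flagged for a wrongly typed `order_id` and a missing `amount`.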

3. Federated Governance: The federated governance group is the body responsible for establishing and enforcing global policies within a data mesh environment. These policies define rules and standards governing various aspects of data mesh operations, including data product development, interoperability, documentation, access control, privacy, and compliance. By establishing consistent policies across all domain teams participating in the data mesh, federated governance ensures alignment, coherence, and compliance with organizational objectives and regulatory requirements.
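Global policies become easier to enforce consistently when they are written as code that every data product's metadata must pass. The sketch below assumes a hypothetical metadata dictionary and three invented policies (`must_have_owner`, `must_be_documented`, `pii_requires_restricted_access`); a real governance group would define its own rules and run them on the platform.

```python
from typing import Callable, Optional

# Metadata a domain team might publish for a data product (hypothetical fields).
Metadata = dict
# A policy returns None when satisfied, or a message describing the violation.
Policy = Callable[[Metadata], Optional[str]]


def must_have_owner(meta: Metadata) -> Optional[str]:
    return None if meta.get("owner_team") else "missing owner_team"


def must_be_documented(meta: Metadata) -> Optional[str]:
    return None if meta.get("description") else "missing description"


def pii_requires_restricted_access(meta: Metadata) -> Optional[str]:
    if meta.get("contains_pii") and meta.get("access") != "restricted":
        return "PII data must use restricted access"
    return None


GLOBAL_POLICIES = [must_have_owner, must_be_documented, pii_requires_restricted_access]


def check_product(meta: Metadata) -> list:
    """Apply every global policy and collect the violations."""
    results = (policy(meta) for policy in GLOBAL_POLICIES)
    return [message for message in results if message is not None]


report = check_product({
    "name": "employees",
    "owner_team": "hr",
    "description": "",
    "contains_pii": True,
    "access": "public",
})
```

Keeping each policy as a small function lets the governance group add or retire rules without touching domain code.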

4. Transformations: Transformations represent the process through which data undergoes various stages of preprocessing, integration, and aggregation within a data mesh architecture. Raw operational data is cleaned, structured, and transformed into meaningful events and entities. External data from other teams is integrated, and aggregations are performed to derive actionable insights. These transformations are essential for maintaining data consistency, quality, and relevance throughout the data lifecycle, ultimately enabling informed decision-making and analytical insights.

5. Ingesting: Ingesting operational data into the data platform is a critical aspect of data mesh architecture. Domain teams employ various ingestion methods, including streaming ingestion, change data capture, or batch processing, depending on their specific requirements and use cases. Domain events and entity states are ingested to capture relevant business facts and maintain data integrity. Ingestion processes make data available for analytics, reporting, and decision-making with a latency that depends on the method chosen (near real time for streaming and change data capture, periodic for batch), thereby driving operational efficiency and agility.
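To show how change data capture relates to entity state, the sketch below replays a list of change events to rebuild the current state of each entity. The event shape (`op`, `id`, `data`) is an assumption made for this example; real CDC tools emit their own formats.

```python
def apply_changes(state: dict, events: list) -> dict:
    """Replay change events in order to rebuild the current entity state."""
    current = dict(state)
    for event in events:
        key = event["id"]
        if event["op"] == "delete":
            current.pop(key, None)
        else:  # "insert" and "update" both store the latest values
            current[key] = event["data"]
    return current


events = [
    {"op": "insert", "id": 1, "data": {"status": "new"}},
    {"op": "update", "id": 1, "data": {"status": "paid"}},
    {"op": "insert", "id": 2, "data": {"status": "new"}},
    {"op": "delete", "id": 2, "data": None},
]
entity_state = apply_changes({}, events)
```

The events themselves are the domain events worth keeping for analytics, while `entity_state` is the snapshot that results from them.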

6. Clean Data: Clean data forms the foundation for effective data analytics and decision-making within a data mesh environment. Domain teams are responsible for cleaning and preprocessing ingested data to ensure accuracy, consistency, and reliability. Preprocessing steps include structuring unstructured data, mitigating structural changes, deduplicating entries, ensuring completeness, and fixing outliers. By ensuring data cleanliness, domain teams enhance the quality and reliability of analytical insights derived from the data, thereby enabling data-driven decision-making and organizational effectiveness.
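Two of the preprocessing steps listed here, ensuring completeness and deduplicating entries, can be sketched in a few lines. The function name `clean`, the column names, and the choice to keep the last row seen for each key are assumptions for illustration; outlier handling is left out because it depends heavily on the domain.

```python
def clean(rows: list, required: tuple, key: str) -> list:
    """Drop incomplete rows, then keep the last row seen for each key."""
    complete = [r for r in rows if all(r.get(c) is not None for c in required)]
    latest = {}
    for row in complete:
        latest[row[key]] = row  # later duplicates overwrite earlier ones
    return list(latest.values())


raw = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 1, "amount": 20.0},   # duplicate entry
    {"order_id": 2, "amount": None},   # incomplete: amount is missing
    {"order_id": 3, "amount": 12.5},
]
cleaned = clean(raw, required=("order_id", "amount"), key="order_id")
```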

7. Analytics: Analytics play a central role in extracting insights and value from data within a data mesh architecture. Domain teams leverage various analytical techniques, including SQL queries, data visualization tools, and machine learning methods, to gain actionable insights from analytical data. SQL queries facilitate data exploration, join operations, and aggregations, while data visualization tools enable users to visualize trends, anomalies, and key performance indicators. Machine learning methods support advanced analytics, correlation analyses, and prediction models, enabling organizations to derive valuable insights and drive informed decision-making.
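As a small example of the SQL side of this, the sketch below joins two datasets and aggregates revenue by region. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for an analytical engine; the `orders` and `customers` tables and their values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 20.0), (2, 10, 5.0), (3, 11, 12.5);
    INSERT INTO customers VALUES (10, 'EMEA'), (11, 'APAC');
""")

# Join the two datasets and aggregate order amounts per region.
revenue_by_region = conn.execute("""
    SELECT c.region, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
conn.close()
```

In a mesh, the two tables would typically be output ports of data products owned by different domains.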

8. Data Platform: The data platform serves as the backbone of a data mesh architecture, providing essential capabilities for data ingestion, storage, querying, visualization, and machine learning. Analytical capabilities enable domain teams to build analytical data models and perform analytics for data-driven decision-making. Data product capabilities empower domain teams to create, monitor, discover, and access data products in a self-service manner. The data platform also supports policy automation, cross-domain data access, and compliance management, ensuring efficient and effective data management and governance within the organization.

9. Enabling Team: The enabling team plays a crucial role in promoting and facilitating data mesh adoption within the organization. Comprising specialists with extensive knowledge in data analytics, engineering, and platform usage, the enabling team provides guidance, support, and learning materials to domain teams on their journey to become full members of the data mesh. They act as advocates for data mesh adoption, helping domain teams understand the principles, practices, and benefits of data mesh architecture. By fostering collaboration, upskilling, and knowledge sharing among domain teams, the enabling team accelerates the adoption and implementation of data mesh within the organization.

Implementing Data Mesh

Adopting a data mesh architecture requires a significant paradigm shift within an organization. This shift involves moving from centralized to decentralized data management, prioritizing domain expertise in data handling, and embracing a product-oriented view of data.

Here are the key steps and considerations for implementing a Data Mesh:

  1. Adopt Domain-driven Design
     • Identify Domains: Break down the organization into distinct business domains based on the products or services it offers. Each domain should have a clear boundary and encompass a specific business capability.
     • Domain Teams: Form autonomous teams around these domains, with the responsibility for their data, from production to consumption.
  2. Decentralize Data Ownership
     • Data as a Product: Treat data managed by each domain as a standalone product, with a focus on quality, usability, and user needs.
     • Domain Data Teams: Ensure each domain has a data team or data product owner responsible for the lifecycle of the data products, including their creation, maintenance, and retirement.
  3. Establish a Self-serve Data Infrastructure
     • Platform Approach: Develop or adopt a self-serve data platform that enables domain teams to easily access, publish, and manage data products without deep technical expertise in data infrastructure.
     • Tools and Technologies: Provide tools for data ingestion, processing, storage, and access that support a wide range of data product types and needs.
  4. Implement Federated Computational Governance
     • Governance Framework: Create a federated governance model that balances autonomy with coherence. This involves setting cross-domain standards and policies for data security, privacy, quality, and interoperability.
     • Compliance and Standardization: Ensure that while domains operate independently, they adhere to organizational standards and legal regulations.
  5. Embrace Data Product Thinking
     • User-centric Design: Design data products with the end-user in mind, focusing on ease of discovery, access, and integration.
     • Quality and Documentation: Prioritize data quality and provide comprehensive documentation to ensure data products are trustworthy and easily understandable.
  6. Cultivate a Culture of Collaboration
     • Cross-domain Collaboration: Encourage and facilitate collaboration between domain teams to share best practices, learnings, and data products.
     • Continuous Learning: Foster a culture of continuous improvement and learning, where feedback from data consumers drives the evolution of data products.
  7. Technological Foundations
     • Interoperability: Adopt technologies and standards that promote interoperability among data products across domains.
     • Scalable Architecture: Ensure the data architecture is scalable and flexible to accommodate future growth and changes in business requirements.
  8. Continuous Monitoring and Feedback Loops
     • Monitoring: Implement monitoring tools to track the usage, performance, and quality of data products.
     • Feedback Loops: Establish mechanisms for collecting feedback from data consumers to continuously improve data products.
  9. Training and Education
     • Upskilling: Offer training and resources to domain teams to build their capabilities in data management, governance, and product development.
     • Best Practices: Share best practices and lessons learned across the organization to elevate the overall data literacy and maturity.
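
The monitoring step in the list above can start with something as simple as a freshness check. The sketch below assumes a hypothetical catalog that maps each data product name to its last refresh time and flags products older than an allowed age; the 24-hour threshold and the product names are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_updated: datetime, max_age: timedelta, now: datetime) -> bool:
    """True when the product was refreshed within its allowed age."""
    return now - last_updated <= max_age


def stale_products(catalog: dict, max_age: timedelta, now: datetime) -> list:
    """Names of data products whose last refresh is older than max_age."""
    return sorted(
        name for name, updated in catalog.items()
        if not is_fresh(updated, max_age, now)
    )


# A fixed "now" keeps the example deterministic.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
catalog = {
    "orders": datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc),
    "employees": datetime(2023, 12, 30, 9, 0, tzinfo=timezone.utc),
}
stale = stale_products(catalog, max_age=timedelta(hours=24), now=now)
```

Usage and quality metrics can follow the same pattern: a small check per product, run on a schedule by the platform and reported back to the owning team.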

Implementing a Data Mesh is a significant undertaking that requires commitment from all levels of the organization. It’s not just a technical implementation but a new way of thinking about and working with data.

Data Mesh vs Traditional Model

| Factor | Data Mesh | Data Warehouse | Data Lake | Data Fabric |
|---|---|---|---|---|
| Ownership | Distributed across domains | Central data team | Central data team | Federated with central metadata |
| Architecture | Decentralized, domain-driven | Centralized, monolithic | Centralized, schema-on-read | Unified metadata layer across sources |
| Scalability | Scales with domain autonomy | Limited by warehouse capacity | Scales storage, not analytics | Scales via automation |
| Data quality | Domain-level SLAs | Central team enforces quality | Often inconsistent | Automated governance policies |
| Best for | Large orgs with many domains | Structured reporting at scale | Raw storage, ML pipelines | Hybrid/multi-cloud environments |
| Knowi fit | Query across domain data products without ETL | Native SQL connectors | Native S3, Redshift support | Cross-source joins without data movement |

Transitioning from traditional pre-data mesh governance to a data mesh governance model marks a significant shift in how organizations approach data management. In the pre-data mesh era, a centralized team was tasked with overseeing data quality, security, and compliance with regulations, maintaining centralized custodianship of data, and striving for a well-defined, static structure of data governed through manual processes aimed at preventing errors. This team also worked independently from domains, using centralized technology and measuring success based on the volume of governed data.

In contrast, the data mesh governance model introduces a federated team composed of domain representatives, responsible for defining the criteria for data quality, security aspects, and regulatory requirements, which are then built into and monitored by a self-serve platform.

This model champions federated custodianship of data, with a focus on modeling polysemes (data elements that span multiple domains), thus enabling a dynamic, continuously evolving topology of the mesh. Success in this model is measured by the network effect, illustrating the connections and consumption of data across the mesh, with an emphasis on detecting errors and enabling recovery through the platform’s automated processes. This shift not only democratizes data management but also promotes a more agile, responsive approach to data governance that aligns with the fluid nature of modern data landscapes.
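
One simple way to put a number on that network effect is to count how many distinct domains consume each data product. The sketch below is only an assumption about how such a metric might be computed; the usage records and domain names are invented.

```python
from collections import Counter

# (consuming domain, data product) pairs; hypothetical usage records.
consumption = [
    ("marketing", "orders"),
    ("finance", "orders"),
    ("finance", "employees"),
    ("marketing", "orders"),  # repeated reads count once per consumer
]

# Deduplicate so each consumer-product link is counted a single time.
unique_links = set(consumption)
consumers_per_product = Counter(product for _, product in unique_links)
```

A rising number of links over time suggests the mesh is being used across domains, not just within them.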

Conclusion

As the data landscape continues to evolve, data mesh offers a forward-thinking framework that aligns with the needs of modern, data-driven organizations. By decentralizing data ownership, treating data as a product, and fostering a culture of collaboration and innovation, companies can unlock new levels of efficiency, agility, and strategic insight. While the journey to a fully realized data mesh architecture may be complex and challenging, the potential rewards for those who navigate it successfully are substantial.

Frequently Asked Questions

What is a data mesh?

A data mesh is a decentralized data architecture where individual business domains own, manage, and serve their data as products. It shifts responsibility from a central data team to domain teams, enabling faster, more scalable analytics.

What are the core principles of data mesh?

Data mesh is built on four principles: domain-oriented ownership, data as a product, self-serve data infrastructure, and federated governance to ensure consistency across domains.

Why do companies adopt data mesh?

Companies adopt data mesh to eliminate bottlenecks from centralized data teams, improve scalability, and enable domain experts to directly manage and analyze their own data.

What is the difference between data mesh and traditional data architecture?

Traditional architectures rely on centralized data teams and warehouses, while data mesh distributes ownership across domains, making data management more scalable and aligned with business needs.

Is data mesh a tool or a technology?

No. Data mesh is not a tool – it’s a strategic and organizational approach to managing data at scale, supported by the right platforms and infrastructure.

Sherry Quach

Sherry is a Data Analyst at Knowi having previously worked at the California Emerging Infections Program analyzing public health infectious disease data. Sherry is skilled in data visualizations, SQL, data analysis, and business intelligence. Sherry holds a BS, Molecular and Cellular Biology from University of California, Berkeley and has contributed to research papers including Characteristics and Maternal and Birth Outcomes of Hospitalized Pregnant Women with Laboratory-Confirmed COVID-19 — COVID-NET, 13 States and COVID-19–Associated Hospitalizations Among Health Care Personnel — COVID-NET, 13 States.
