Data Cloud Architecture: A Comprehensive Overview and Its Benefits
The concept of Data Architecture is tailored to solve various challenges dictated by business goals, market strategies, and the evolution of IT maturity. For decades, aligning IT with business needs has posed significant challenges, particularly as organizations expand globally and diversify their product offerings across multiple platforms. This growth leads to an increase in data volume, putting pressure on IT systems to meet rising demands. As organizations adapted, data silos emerged, prompting leaders to seek a "single source of truth" to enhance operational efficiency.
Throughout the years, various data architecture models have surfaced to tackle these issues, facilitating the collection of data from operational systems for analysis by different stakeholders, including lines of business (LOBs), partners, suppliers, and even customers. Approaches such as Data Warehousing, Data Lakes, Data Meshes, and Data Fabrics aim to democratize data access across enterprises. Each model has its strengths and weaknesses; however, they often struggle to extend collaboration beyond the confines of the organization, limiting potential innovations with partners and clients.
The Data Cloud Architecture (DCA) is a top-down, business-oriented framework designed to enhance collaboration between internal and external entities while sharing resources. DCA emphasizes four fundamental principles, aiming to build on existing architectural patterns rather than disrupt them. Business leaders must acknowledge the significant investments made in IT over the years, while IT must recognize the rapid evolution and immense opportunities presented by cloud computing.
The DCA encompasses the following key elements:
- Manage Code and Data as Assets: AI/ML models, large language models, and various forms of code and data can be relocated or referenced, allowing work to be brought to the data or vice versa.
- Business Entities Collaborate: Internal lines of business, partners, suppliers, and customers can own and collaboratively manage assets in near real time.
- Governing Constellations: Trust relationships form the foundation for access control and asset discoverability.
- Agnostic and Interoperable: The framework is cloud-agnostic and does not depend on any specific underlying data architecture.
As the amount of data and the systems managing it continue to grow, various data architecture models have emerged. The diagram above contextualizes these architectures with two main considerations: the types of problems they address (technical or business) and how data management is organized (centralized or distributed across multiple locations). Organizations often combine different architectures to fulfill various functions. The DCA supports implementations specific to business entities while fostering collaboration within a constellation.
Manage Code and Data as Assets
In the Data Cloud Architecture, an asset is defined as any digital representation—whether logical or physical—that holds sufficient value for a business entity to warrant management and collaboration. Teams frequently find themselves bogged down by low-priority tasks, hindering their ability to concentrate on core business operations. By adopting the DCA framework, the focus shifts to developing and managing high-value assets, which hold significance both within and outside the organization, while alleviating the burden of managing lower-value tasks such as infrastructure and storage. The DCA enables organizations to optimize their data asset management, driving new revenue opportunities and reducing costs through efficient task management. Ownership of assets remains with the business entity, even when they are shared for updates, maintenance, or governance. Assets are categorized into logical and physical types.
Physical Assets: These assets are digital representations, often existing as files within a file system or object store. They can consist of one or multiple basic representations (files). For instance, metadata, configuration files, and associated code assets could collectively be regarded as a composite asset, though they are simply referred to as a DCA Asset for clarity. Physical assets are created and maintained by the owning business entity within its infrastructure and can be shared with consuming entities either through copying or direct access.
Logical Assets: These assets consist of links to physical assets maintained by the owning business entity. A DCA Asset can incorporate both logical and physical elements and is treated as a single asset for simplicity. Even if the physical version of an asset resides outside the consuming business entity's infrastructure, it should be accessible and operable as if it were local.
In the illustration above, Business Entity 2 is the owner, sharing a logical asset with Business Entity 1 (indicated by a dotted line). Similarly, Business Entity 5 operates as a marketplace, physically sharing an asset with Business Entity 3.
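As a minimal sketch of the distinction, the physical and logical asset types could be modeled as follows. The class names (`PhysicalAsset`, `LogicalAsset`), the field names, and the sample asset are hypothetical and are not defined by the DCA; the only idea taken from the text is that a logical asset is a link to a physical asset that the owning entity continues to maintain.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PhysicalAsset:
    """One or more files kept in the owning entity's infrastructure."""
    name: str
    owner: str
    files: Tuple[str, ...]  # e.g. data, metadata, configuration, code


@dataclass(frozen=True)
class LogicalAsset:
    """A link to a physical asset; no files are copied."""
    name: str
    consumer: str
    target: PhysicalAsset

    @property
    def owner(self) -> str:
        # Ownership always stays with the entity that owns the target.
        return self.target.owner

    @property
    def files(self) -> Tuple[str, ...]:
        # The consumer sees the same files as if they were local.
        return self.target.files


# Hypothetical composite asset owned by Business Entity 2.
forecast = PhysicalAsset(
    name="demand-forecast",
    owner="business-entity-2",
    files=("forecast.parquet", "metadata.json", "score.py"),
)

# Business Entity 1 receives a logical share (the dotted line in the illustration).
shared = LogicalAsset(name="demand-forecast", consumer="business-entity-1", target=forecast)

print(shared.owner)
print(shared.files)
```

Because the logical asset delegates to its target, ownership never moves to the consumer in this sketch, which mirrors the statement that ownership remains with the business entity even when an asset is shared.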
Business Entities Collaborate
Before delving into how the Data Cloud Architecture fosters collaboration, it's essential to define "Business Entity." The term varies among organizations, reflecting their unique structures. Within the DCA framework, a Business Entity should be defined at a level that allows for ownership and collaboration on specific data assets while remaining flexible to various collaborative scenarios. Business Entities create and manage their assets while also consuming those shared by others. Consequently, both owning and consuming entities must possess control over the infrastructure (storage and compute) they manage, facilitating the execution and transfer of assets based on the DCA's implementation.
Organizations make technological choices based on perceived requirements for success. While this promotes operational efficiency, it can complicate collaboration across different Business Entities regarding data assets. Whether defined as distinct lines of business within a company, independent subsidiaries, partner ecosystems, or revenue-generating customers, the ability to collaborate on assets is crucial for maximizing overall efficiency. The DCA provides a flexible collaboration framework for all types of Business Entities, enabling diverse interactions—not just unilateral data sharing. While initial collaborations may occur internally, the DCA lays the groundwork for engaging new partners and customers, enhancing decision-making and expediting the launch of revenue-generating products. By implementing the DCA, organizations can integrate new assets across their operations and optimize their value through improved collaboration.
Governing Constellations
Business Entities can be grouped to form a Constellation, which is a collection of entities that agree on governance and operating standards and share the goal of collaborating on assets. Entities can be part of multiple Constellations, as long as the trust agreements established in each are upheld. A minimum of two Business Entities is required to constitute a Constellation.
The example above illustrates two Constellations. Business Entity 2, which produces assets, is part of both Constellation 1 and Constellation 2, sharing assets physically with Business Entity 1 and logically with Business Entity 4. Since Business Entities 1 and 4 do not belong to the same Constellation, they cannot share assets without establishing their own trust relationship.
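The sharing rule in this example can be sketched as a simple membership check. This is an illustration under stated assumptions, not a prescribed DCA implementation: the function names (`validate`, `common_constellations`, `can_share`) and the entity identifiers are hypothetical, and a real deployment would also consult the terms of the trust relationship rather than membership alone.

```python
from typing import Dict, Set

# Membership taken from the example: Business Entity 2 sits in both Constellations.
CONSTELLATIONS: Dict[str, Set[str]] = {
    "constellation-1": {"entity-1", "entity-2"},
    "constellation-2": {"entity-2", "entity-4"},
}


def validate(constellations: Dict[str, Set[str]]) -> None:
    """Reject any Constellation with fewer than two Business Entities."""
    for name, members in constellations.items():
        if len(members) < 2:
            raise ValueError(f"{name} needs at least two Business Entities")


def common_constellations(a: str, b: str, constellations: Dict[str, Set[str]]) -> Set[str]:
    """Return the Constellations that contain both entities."""
    return {name for name, members in constellations.items() if a in members and b in members}


def can_share(a: str, b: str, constellations: Dict[str, Set[str]]) -> bool:
    """Two entities may share assets only if they have a Constellation in common."""
    return bool(common_constellations(a, b, constellations))


validate(CONSTELLATIONS)
print(can_share("entity-2", "entity-1", CONSTELLATIONS))  # True
print(can_share("entity-2", "entity-4", CONSTELLATIONS))  # True
print(can_share("entity-1", "entity-4", CONSTELLATIONS))  # False
```

Business Entities 1 and 4 fail the check because no Constellation contains both; adding a new Constellation with the two of them as members would model the separate trust relationship the text calls for.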
Trust Relationship
Collaboration hinges on trust. Many organizations struggle to grant access to their own systems and data. Regulatory concerns often exacerbate this issue, as restrictions are rooted in processes rather than technology. The DCA does not assume any particular level of trust between Business Entities but suggests frameworks for establishing restrictions and employing common security protocols. A Trust Relationship is a formal agreement that defines how assets can be accessed and shared within a single Constellation or across multiple Constellations. Its scope may include open standard transfer protocols, data security policies, and overall asset discoverability.
Business Entities: As owners of assets, Business Entities hold the responsibility for granting and managing access to those resources unless they explicitly transfer this responsibility to another entity. Access to DCA Assets can be broadly available, allowing any designated asset to be instantly accessible to other entities within the Constellation. However, it is generally advisable to apply at least minimal restrictions so that usage can be monitored and the business value derived from these assets can be assessed.
Assets: Trust encompasses various aspects concerning assets, including quality and timeliness, as well as confidence in their sources. Security measures such as encryption, role-based access control, and data masking should be evaluated based on well-defined security standards established by the Business Entities.
Discoverability: The adage "you don’t know what you don’t know" applies here; as the number of assets increases, it becomes progressively more difficult for consuming entities to locate and request access to the desired resources. Effective metadata management is essential for good governance within the DCA. While the DCA imposes no requirements regarding metadata, it is highly recommended that metadata be made discoverable as the Constellation expands.
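To make the link between trust and discoverability concrete, here is a small sketch of a metadata catalog that only returns entries visible to the searching entity's Constellation. Everything in it is an assumption made for illustration: the `CatalogEntry` fields, the `discover` function, and the sample entries are hypothetical, and the DCA itself imposes no metadata schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CatalogEntry:
    asset: str
    owner: str
    description: str
    tags: FrozenSet[str]
    visible_to: FrozenSet[str]  # Constellations allowed to discover this entry


CATALOG: List[CatalogEntry] = [
    CatalogEntry("demand-forecast", "entity-2", "Weekly demand forecast by region",
                 frozenset({"forecast", "sales"}), frozenset({"constellation-1"})),
    CatalogEntry("supplier-scores", "entity-2", "Supplier quality scores",
                 frozenset({"supplier", "quality"}), frozenset({"constellation-2"})),
    CatalogEntry("sales-history", "entity-1", "Daily sales history",
                 frozenset({"sales"}), frozenset({"constellation-1"})),
]


def discover(keyword: str, constellation: str, catalog: List[CatalogEntry]) -> List[str]:
    """Return names of assets that match the keyword and are visible to the Constellation."""
    needle = keyword.lower()
    return sorted(
        entry.asset
        for entry in catalog
        if constellation in entry.visible_to
        and (needle in entry.tags or needle in entry.description.lower())
    )


print(discover("sales", "constellation-1", CATALOG))  # ['demand-forecast', 'sales-history']
print(discover("sales", "constellation-2", CATALOG))  # []
```

Filtering on `visible_to` before matching keywords keeps the existence of an asset hidden from entities outside the trust relationship, which is one possible reading of how trust governs discoverability.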
Agnostic and Interoperable
Agnosticism refers to an approach that emphasizes architecture frameworks and technology-independent principles, allowing teams the flexibility to use preferred patterns and platforms for asset management and collaboration. Organizations often conflate open standards with "open source"; however, in this context, Business Entities are not restricted by specific data architectures. The objective of a Constellation leveraging the DCA is to maintain an independent set of platform and data architecture decisions at the Business Entity level while ensuring interoperability.
The DCA promotes a business-first, top-down methodology for asset collaboration, incorporating principles such as:
Data Architecture
The Data Cloud Architecture permits Business Entities to utilize any necessary data architectural framework or pattern for asset development and deployment. This flexibility fosters collaboration across different entities. Organizations must consider the rationale behind their architectural choices to ensure alignment with overarching business goals.
Asset Formats: The responsibility for defining a usable asset format rests with the owning Business Entity, which should specify the metadata required for collaboration. The DCA facilitates either the transfer of assets or direct access to them, depending on the implementation (logical vs. physical). Importantly, metadata is treated as part of the same asset.
Consumption Considerations: Owning Business Entities can also act as consuming entities, creating a network of interconnected architectures. This "network of networks" enables collaboration among lines of business, partners, and customers. The DCA does not impose restrictions on how consuming entities utilize the assets, although methods for asset discovery should be established.
Public Cloud/Internet (Networking, Storage, and Compute)
The Data Cloud Architecture is made possible by the emergence of public cloud infrastructure, which offers elastic, on-demand scalability for storage and processing, alongside advancements in internet connectivity that enable global Business Entity collaboration. Beyond network connectivity, each entity is responsible for the storage and compute infrastructure it needs for asset management.
Networking: To facilitate interoperability, Business Entities must be network-accessible. The DCA imposes no restrictions on inter-entity interactions. While all entities may connect via the internet, security and IT teams may impose internal (company-wide) or external (partner and customer) access restrictions.
Storage: Assets will be stored by the owning Business Entity using networked storage solutions (such as public cloud object storage) accessible to consuming entities. Depending on the implementation (logical vs. physical), assets may also be replicated closer to the consuming entity. Under the DCA, storage ownership can reside with either Business Entity, while the owning entity retains control over the asset. This relationship may be reciprocal, allowing owning entities to consume assets from those with whom they share resources.
Compute: Business Entities exchange assets to derive business value. To utilize presented assets, consuming entities leverage their processing capabilities, making public cloud infrastructure appealing. By establishing interoperability standards in advance, the DCA ensures that assets function as intended. Hyperscalers offer varied capabilities at different costs, and consuming entities are likely to exploit these offerings for asset utilization.
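The Storage paragraph above distinguishes assets that are replicated closer to the consumer (physical) from assets that are accessed in place (logical). The sketch below illustrates one practical consequence of that choice, using in-memory dictionaries as hypothetical stand-ins for each entity's object store; the function names and the sample file are assumptions, not part of the DCA.

```python
from typing import Dict

# In-memory stand-ins for each entity's object store (hypothetical).
owner_store: Dict[str, bytes] = {"prices.csv": b"sku,price\nA,10\n"}
consumer_store: Dict[str, bytes] = {}   # physical copies held by the consumer
consumer_links: Dict[str, str] = {}     # logical references back to the owner


def share_physical(key: str) -> None:
    """Replicate the asset's bytes into the consumer's storage."""
    consumer_store[key] = owner_store[key]


def share_logical(key: str) -> None:
    """Record only a reference; the bytes stay with the owner."""
    consumer_links[key] = key


def read_logical(key: str) -> bytes:
    """Resolve the reference against the owner's store at read time."""
    return owner_store[consumer_links[key]]


share_physical("prices.csv")
share_logical("prices.csv")

# The owner publishes a new version of the asset.
owner_store["prices.csv"] = b"sku,price\nA,12\n"

print(read_logical("prices.csv"))      # reflects the owner's update
print(consumer_store["prices.csv"])    # still the earlier copy until re-replicated
```

In this toy model a logical share always reads the owner's current version, while a physical share trades freshness for locality and has to be replicated again after each update.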
In prior examples, Business Entities 1 and 2 were depicted within a Constellation, as were Business Entities 2 and 4. For clarity, these connections remain intact (Constellations 1 and 2). In this diagram, Business Entities 3, 4, and 5 form Constellation 3, with Business Entity 5 functioning as a marketplace, either creating its own assets or housing assets for which ownership has been transferred.
Summary
At the core of the DCA is the adaptability of work processes and the establishment of trust among entities. As organizations increasingly adopt multi-cloud strategies to leverage diverse technological and economic advantages, the Data Cloud Architecture serves to unify these elements, allowing data to be moved to where the work is or vice versa. With advancements in technology (such as GenAI and GPUs), the placement and movement of data, along with the operation of custom code, must be carefully architected rather than treated as an afterthought. While siloed teams can achieve effectiveness, savvy leaders understand that collaboration yields greater results; thus, fostering collaborative mechanisms is one of the most effective paths to innovation. A well-designed architecture empowers businesses to execute initiatives swiftly and allows IT departments to partner with the business, delivering new capabilities in days rather than months or years. A robust architecture is one that can adapt, evolve, and incrementally incorporate new capabilities. The Data Cloud Architecture empowers organizations to define their own boundaries (internal, partners, customers) and evolve alongside changing business needs. In a fast-paced world, IT requires a modern data architecture built on public infrastructure to democratize assets, unlock new revenue streams, and enhance operational efficiency while preserving trust.