I’ve been having discussions lately about data strategies. And since I live in London, these conversations often end up in a pub over beer and evolve into philosophical debates on trends in data management, data warehousing, business intelligence, etc. So I thought I’d start a series on the topic and see where it leads.
One area that has changed the most in the last 5 years or so is the role of the data producer. A data producer is “a user interface, automation, service or device that collects data that is relevant to a business. In many cases, multiple systems and processes may produce data for the same data entity. For example, a customer record may be updated by marketing, sales, point of sale systems, customer service and self-service tools used directly by the customer.” In banking, data producers would be ATMs, websites, systems that process transactions, external vendors, customer relationship management systems, etc.
For most of my career in data, the systems that produce data have been somewhat detached from the downstream uses of the data in analytics and reporting. The producers captured the information but it was largely up to the data teams to make sense of it, make it usable, and try (usually with limited success) to get the producers to clean it up when they found issues.
This has never been particularly practical. But as data has scaled, it has become impossible. Streaming data, big data, unstructured data, Internet of Things, and the rapid evolution of data science mean that data and data needs are growing much too quickly for a data integration team to be responsible for the quality of the data – we simply can’t keep up with demand. As a result, the industry has had a substantial change in attitude and started holding data producers accountable and responsible for both making their data available to downstream teams and ensuring the quality of their data in the process.
From this, the concept of a ‘producer contract’ has evolved. The producer contract is an agreement between the data teams and the producers of data that outlines the roles & responsibilities of each team. A producer contract may include agreements on:
- Recency/Timeliness – how long from when the data is produced until the producers make it available in the data system (likely a data lake)? This could be seconds, minutes, hours, days, etc. depending on capabilities and business needs.
- Growth – data systems must be able to accommodate the size of the data, so there must be a method for producers to plan for and communicate expected future storage needs.
- Change Management – agreements regarding how changes to the data, and issues with it, will be communicated.
- Data Privacy – rules for treatment of personally identifiable information and accommodations for data privacy regulations.
- Data Cataloging – ensuring data producers provide information to users on the data they are producing so that it can be properly understood by power users.
- Schemas – agreement on shared schemas to ensure the data system can be centrally managed to scale.
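To make the list above a bit more concrete, here is a minimal sketch of how a producer contract might be written down and checked in Python. Everything in it is an assumption for illustration: the `ProducerContract` class, its field names, the `violations` helper, and the ATM example values are hypothetical and not taken from any particular tool or standard. It only checks timeliness and schema agreement; the other items are recorded but not enforced.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ProducerContract:
    """Hypothetical record of what a producer agrees to (illustrative field names)."""
    producer: str
    max_delivery_delay: timedelta            # Recency/Timeliness
    expected_monthly_growth_gb: float        # Growth
    change_notice_channel: str               # Change Management
    pii_fields: frozenset = frozenset()      # Data Privacy
    catalog_description: str = ""            # Data Cataloging
    schema: dict = field(default_factory=dict)  # Schemas: field name -> type name


def violations(contract, record, produced_at, available_at):
    """Return readable descriptions of how one record breaks the contract."""
    problems = []
    # Timeliness: how long did the record take to reach the data system?
    delay = available_at - produced_at
    if delay > contract.max_delivery_delay:
        problems.append(f"late by {delay - contract.max_delivery_delay}")
    # Schema: compare the record's field names with the agreed ones.
    missing = sorted(set(contract.schema) - set(record))
    if missing:
        problems.append(f"missing fields: {', '.join(missing)}")
    unexpected = sorted(set(record) - set(contract.schema))
    if unexpected:
        problems.append(f"unexpected fields: {', '.join(unexpected)}")
    return problems


# Example values are made up for illustration.
atm_contract = ProducerContract(
    producer="atm-network",
    max_delivery_delay=timedelta(minutes=15),
    expected_monthly_growth_gb=50.0,
    change_notice_channel="data-changes@example.com",
    pii_fields=frozenset({"card_number"}),
    catalog_description="Cash withdrawals and deposits made at ATMs",
    schema={"transaction_id": "string", "card_number": "string", "amount": "decimal"},
)

produced = datetime(2024, 1, 1, 12, 0)
record = {"transaction_id": "t1", "amount": "20.00", "branch": "x"}
for problem in violations(atm_contract, record, produced, produced + timedelta(minutes=20)):
    print(problem)
```

In this sketch the contract is plain data that both teams can read and review, while the check is a separate function that the data team could run as records arrive. A real agreement would likely cover more than a few fields can express.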
The increased involvement of the data producers is one of my favorite trends in data management – it keeps producers fully engaged in how their data is used downstream. And it means data teams can spend less time on routine data quality and acquisition and more time thinking about key metrics (often derived from data from multiple producers) that drive the larger business. A win-win for everyone but especially me!