Introducing Data Contracts: collect events correctly the first time
Never collect bad data again. Data contracts help you collect the behavioral customer data you need.
Nate Wardwell
November 14, 2023
“Garbage in, garbage out.” It may be a cliché, but the maxim holds: if you have a bad data foundation, your data-driven analytics and operations will be fundamentally flawed.
To solve this, we built data contracts inside Hightouch Events. Data contracts ensure that the event tracking taking place on your website and app is not, in fact, “garbage” but instead consistently meets the standards your business needs. Unlike a traditional CDP, we don’t charge extra to proactively manage data quality as we collect events: this is a foundation that any event collection service should include. Read on to learn why data contracts have become so popular and how we’ve specifically developed them at Hightouch to provide best-in-class event collection for the data warehouse.
What are Data Contracts, and Why Do They Matter?
Generally, data contracts are agreements that define the structure of data and help establish validation rules for detecting problems before they reach other systems. Data contracts allow multiple parties to establish and enforce standards for their data. Without data contracts, teams must either learn to live with bad data or clean their data after the fact in their data warehouse.
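To make the idea concrete, here is a minimal sketch of a contract as a declared structure plus a validation rule. The `EventContract` class, the `violations` function, and the "Order Completed" event are all hypothetical names chosen for illustration; this is not Hightouch's contract format or API.

```python
from dataclasses import dataclass


# Hypothetical contract format, for illustration only.
@dataclass(frozen=True)
class EventContract:
    event_name: str
    required: dict  # property name -> expected Python type


def violations(contract, event):
    """Return a list of readable violations; an empty list means the event conforms."""
    problems = []
    if event.get("event") != contract.event_name:
        problems.append(f"unexpected event name: {event.get('event')!r}")
    props = event.get("properties", {})
    for name, expected_type in contract.required.items():
        if name not in props:
            problems.append(f"missing property: {name}")
        elif not isinstance(props[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(props[name]).__name__}"
            )
    return problems


# Assumed example contract: an order event needs a string ID and a float total.
ORDER_COMPLETED = EventContract("Order Completed", {"order_id": str, "total": float})

good = {"event": "Order Completed", "properties": {"order_id": "A-1", "total": 19.99}}
bad = {"event": "Order Completed", "properties": {"order_id": 42}}

print(violations(ORDER_COMPLETED, good))
print(violations(ORDER_COMPLETED, bad))
```

The conforming event yields no violations, while the second event is flagged both for the wrong `order_id` type and for the missing `total`, which is the kind of problem a contract is meant to catch before the data reaches other systems.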
Data contracts offer several key benefits:
- Create collaboration between data producers and consumers. Data contracts allow data consumers, typically data or analytics teams, to clearly define the format data should follow. Data producers, typically engineering teams, interact with data contracts as their code on websites and apps creates new data. Data contracts create a feedback loop that ultimately ensures engineering work matches the outputs that data teams need.
- Manage changes proactively to maintain data quality. Engineering teams can add data contracts to their version control and testing processes to proactively catch data violations before deploying changes to production.
- Ensure data quality reactively as bad data occurs. When bad data is created in production, data contracts allow teams to quarantine those errors and resolve them. This also allows engineering teams to debug their production code and correct it going forward.
Data contracts are as much a code-driven feature (a system that enforces data governance) as they are a cultural change (a system that creates collaboration between data producers and consumers). They have become a mainstream tactic in many modern data tools, both as part of the core product strategy at industry giants like Monte Carlo and dbt and in dedicated startups like Gable.AI.
“The goal of data contracts is to improve collaboration between data producers and data consumers, bridging a visibility gap that has caused decades' worth of data quality issues at all levels of the stack. Contracts bring version control and change management to the data space, resulting in visibility, awareness, and the prevention of breaking changes through human-in-the-loop review and conversation. Those conversations bring awareness to how upstream data is being used downstream, its importance, and its impact on the business.”
Chad Sanderson, CEO, Gable.AI
Data Contracts in Hightouch Events
Hightouch Events allows companies to easily collect and load events into the warehouse to support analytics and the Composable CDP. We built a user-friendly data contracts feature directly into Hightouch Events so that data and engineering teams can enforce data quality for behavioral events collected on websites and apps.
Hightouch Events’ data contracts have several key features that make it easy to enforce data quality:
- Highly accessible interface. We built a simple visual interface that makes it easy for anyone to define new data contracts and understand existing ones created by other team members.
- Violation quarantine and write back. When data violates a contract, we write it to a dedicated warehouse table, separate from your main event tables. This allows teams to easily find and debug errors and then replay corrected data into their main behavioral event tables.
- Programmatic management. Technical teams can manage data contracts at scale and programmatically via both Git and APIs.
- Debug mode. Debug mode allows engineers to proactively test code changes to see if they introduce event data violations.
- Violation alerting. It’s easy to set up alerts for different types of data violations to Slack and whatever other tools you use to manage internal communications.
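The quarantine-and-replay flow described in the list above can be sketched in a few lines. This is an illustrative model only: plain Python lists stand in for warehouse tables, and `route_events`, `replay`, and the `user_id` rule are hypothetical names and assumptions, not part of Hightouch Events.

```python
def route_events(events, is_valid):
    """Split events into (accepted, quarantined) using a validation predicate."""
    accepted, quarantined = [], []
    for event in events:
        (accepted if is_valid(event) else quarantined).append(event)
    return accepted, quarantined


def replay(quarantined, fix, is_valid):
    """Apply a fix to quarantined events; return (replayed, still_quarantined)."""
    return route_events([fix(event) for event in quarantined], is_valid)


# Assumed rule for this sketch: every event must carry a string user_id.
def has_string_user_id(event):
    return isinstance(event.get("user_id"), str)


# Assumed fix: cast a non-string user_id to a string, leaving other events as-is.
def stringify_user_id(event):
    fixed = dict(event)
    if "user_id" in fixed:
        fixed["user_id"] = str(fixed["user_id"])
    return fixed


events = [{"user_id": "u1"}, {"user_id": 2}, {}]

# Lists stand in for the main event table and the violations table.
main_table, violations_table = route_events(events, has_string_user_id)

# After debugging, replay the corrected events into the main table.
replayed, still_bad = replay(violations_table, stringify_user_id, has_string_user_id)
main_table.extend(replayed)
```

Keeping violations in a separate table means the main table only ever contains conforming events, and anything the fix cannot repair (here, the event with no `user_id` at all) stays quarantined for further debugging.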
[Demo video: Hightouch Events’ data contracts in action]
Getting Started
Hightouch Events allows companies to easily collect behavioral data on their website or app and store it in the warehouse. We’ve built features like data contracts to ensure that this data is high quality and ready for your production use cases. Traditional CDPs charge extra for their data quality features, meaning you either have to settle for messy data or pay a premium to clean it up later. Hightouch Events is built for the modern data team and includes data contracts out of the box. To get a demo of Hightouch Events and see if it’s right for your organization, reach out to us today.