When startups launch, they often have a small number of data sources and people managing their data assets, so it’s easy to keep their data taxonomy clean.
But as those companies scale, so does the complexity of their data and the number of inputs involved. If they struggle to manage this information, they may face a number of costs.
“Reliable and relevant analytics are important from the get-go, and they are not achievable without some data governance,” says Avo CEO and Co-founder Stefanía Ólafsdóttir. “Scaling product and go-to-market efforts make data reliability and product delivery speed more important, while a growing team makes data reliability more difficult to manage. In other words, the more you need data governance, the more difficult it becomes.”
According to Gartner research, “organizations believe poor data quality to be responsible for an average of $15 million per year in losses.” And with growing data privacy regulations, such as the GDPR and the CCPA, companies have to be careful about what data they track, and they must be able to access and remove specific data as required.
Instituting a data governance framework increases the integrity of your data, because everyone is working from the same taxonomy and standards. With this framework, you minimize data misunderstandings by creating clear and effective data practices.
What Is a Data Governance Framework?
A data governance framework is a system used to maintain the integrity of your data and make it more useful while also helping you avoid time-consuming data cleanup. It involves building processes around educating your team about your data standards, outlining the data standards, and providing resources for questions about the standards.
There are three pillars to the framework:
- Education: How and when you share your data standards with your team
- Instrumentation: What your data standards are and how to use them
- Maintenance: Who is in charge of updating the standards, and who to go to with questions
Following these pillars is key for organizations that want to use product analytics. Tracking user behavioral data isn’t meaningful unless you’re assessing high-quality data.
Why You Need a Data Governance Framework
It is easy for data to depreciate in value if there is no framework in place to dictate how data should be added.
Consider a company where the data team for Product A labels its events with underscores between words, and the team for Product B uses hyphens. The team for Product C doesn’t follow any convention at all, and some of its events are even labeled “test.” If an executive wants a report on what led to purchases across products, but the report only pulls events with hyphens, the data is incomplete.
Incomplete data can, at best, lead to embarrassing misinterpretations. For example, a graph on a report might show that the majority of customer acquisition happens through email, when the marketing team can easily show that’s the case for only a subset of products.
At worst, incomplete or misleading data can drive decisions that hurt your bottom line, such as shuttering valuable features or inadvertently failing to comply with government regulations.
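The Product A, B, and C scenario above can be made concrete with a small sketch. The event names and counts below are invented purely for illustration, and the `normalize` helper is a hypothetical cleanup step rather than anything a specific analytics tool provides. It shows how a report keyed to one naming convention undercounts purchases, and how much cleanup is needed after the fact when no shared standard exists.

```python
import re

# Hypothetical event names from three product teams (illustrative data only).
events = [
    "purchase-completed",   # Product B: hyphens
    "purchase-completed",
    "purchase_completed",   # Product A: underscores
    "PurchaseCompleted",    # Product C: no convention
    "test",                 # stray test event
]

def naive_purchase_count(names):
    """Count purchases the way a report keyed to the hyphen convention would."""
    return sum(1 for name in names if name == "purchase-completed")

def normalize(name):
    """Map common separator and casing variants to one canonical form."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)  # split camelCase words
    return re.sub(r"[\s_\-]+", " ", spaced).strip().lower()

def normalized_purchase_count(names):
    """Count purchases after normalizing every name first."""
    return sum(1 for name in names if normalize(name) == "purchase completed")

print(naive_purchase_count(events))       # purchases the hyphen-only report sees
print(normalized_purchase_count(events))  # purchases actually recorded
```

Normalizing after the fact only recovers the variants someone thought to handle, which is why agreeing on the convention up front is the more reliable fix.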
3 Pillars of a Data Governance Framework
Build a culture around keeping data assets organized by establishing a plan for educating your team, instrumenting a taxonomy, and identifying roles for maintenance.
Pillar 1: Education
Create an education plan to outline how your teams learn about your data governance standards and how they can access the standards.
Your education plan should address the following:
- Why data and data readability matter to your company
- How a member of your team can read the data
- How the data is governed, including how it’s managed and maintained
- What the instrumentation workflow is
At Amplitude, we recommend including information about your data governance during new-hire onboarding. Highlighting these rules shows team members that data integrity is a priority from the beginning.
Because data evolves as your systems and processes change, it’s good to maintain a data taxonomy within a single source of truth. This resource helps your team understand and read the data easily. Include a data dictionary in your education plan, as well as a guide for adding to the dictionary.
House your education plan and data dictionary in a shared Google Drive or an internal document repository so both trainers and trainees have access. Or use a tracking plan tool to host all your data governance documentation and create a single source of truth for all data stakeholders.
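A data dictionary can start as a very small structure. The sketch below is one possible shape, not a prescribed format: the `EventDefinition` fields (`name`, `description`, `owner`, `properties`) and the `DataDictionary` class are assumptions made for illustration. The `add` method stands in for the "guide for adding to the dictionary" by rejecting duplicates and undocumented events.

```python
from dataclasses import dataclass, field

@dataclass
class EventDefinition:
    """One entry in the data dictionary (field names are hypothetical)."""
    name: str
    description: str
    owner: str
    properties: dict = field(default_factory=dict)  # property name -> description

class DataDictionary:
    """In-memory single source of truth for event definitions."""

    def __init__(self):
        self._events = {}

    def add(self, event):
        # Enforce the rules for adding entries: no duplicates, no blank descriptions.
        if event.name in self._events:
            raise ValueError(f"'{event.name}' is already defined")
        if not event.description.strip():
            raise ValueError("every event needs a description")
        self._events[event.name] = event

    def lookup(self, name):
        """Return the definition for an event name, or None if it is undocumented."""
        return self._events.get(name)

dictionary = DataDictionary()
dictionary.add(EventDefinition(
    name="complete purchase",
    description="User finished checkout and payment was accepted.",
    owner="checkout team",
    properties={"order_total": "Order value in USD"},
))
```

In practice the same fields could live in a spreadsheet, a repository, or a tracking plan tool; what matters is that there is one place to look up what an event means and who owns it.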
Pillar 2: Instrumentation
A comprehensive instrumentation workflow and taxonomy style guide sets the foundation for clean, easy-to-understand data assets.
When building your taxonomy guide, be sure to address the following:
- What the taxonomy for your data is, including its syntax and nomenclature
- Who designs the taxonomy
- How the data is constructed and applied
If your company has been around for more than a few years, accounting for every possible event and property in your taxonomy could be a monumental task. Start with the most important metrics that you track with every new product and feature.
If you aren’t sure what those metrics are, work with your engineering and design teams to identify the metrics that answer questions such as, “What does success look like?” and “What does failure look like?” Spell out these metrics in your taxonomy guide, and use the guide’s templated syntax to address additional data needs as they arise.
The next step in the instrumentation pillar is to identify who will design the taxonomy: a lead product manager, an engineer, or a designer. At Amplitude, we’ve found the product manager is most likely to design the taxonomy because they are the person most likely to be querying the data.
With your key metrics established and the person designing the taxonomy identified, it’s time to build your taxonomy style guide. The primary goal of any taxonomy is to be functional. Make sure the nomenclature is consistent, human-readable, and descriptive. Include elements such as casing (all lowercase, for example), syntax (verb plus noun, with spaces), and verb tense (present versus past tense). These elements should be consistent across events and properties. For more details on the different elements, check out our help documents.
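A style guide is easier to follow when its rules can be checked automatically. The sketch below assumes one particular set of choices from the elements listed above: all lowercase, verb plus noun separated by spaces, and present tense. The `APPROVED_VERBS` list and the `check_event_name` function are hypothetical; your own guide may choose different casing, syntax, or tense.

```python
import re

# Hypothetical list of approved present-tense verbs; a real guide would define its own.
APPROVED_VERBS = {"view", "click", "complete", "start", "submit"}

# Two or more lowercase words separated by single spaces.
NAME_PATTERN = re.compile(r"[a-z]+( [a-z]+)+")

def check_event_name(name):
    """Return a list of style problems for an event name (empty if it conforms)."""
    problems = []
    lowered = name.lower()
    if name != lowered:
        problems.append("use all lowercase")
    if not NAME_PATTERN.fullmatch(lowered):
        problems.append("use two or more words separated by single spaces")
    first_word = re.split(r"[\s_\-]+", lowered)[0]
    if first_word not in APPROVED_VERBS:
        problems.append(f"start with an approved present-tense verb, not '{first_word}'")
    return problems

for candidate in ["complete purchase", "Complete Purchase", "purchase_completed"]:
    print(candidate, "->", check_event_name(candidate) or "ok")
```

A check like this could run during code review or in a tracking plan tool, so that naming problems surface before an event ships rather than in a report months later.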
Also understand that all taxonomies are imperfect. Even if you get your taxonomy exactly the way it should be, your company’s goals, focus, and products will change. Revisit your taxonomy regularly to account for changes in the company and key metrics.
Once you have a taxonomy guide drafted, share it with the engineering team, and expect to make some compromises based on technical constraints. When your taxonomy is ready, make the documentation available in a shared Google Doc, Airtable base, GitHub repo, or tracking plan so anyone can access it during development.
Pillar 3: Maintenance
The final piece of the data governance framework is to clearly outline who to contact for questions and updates to the framework, as well as how the documents in the framework will be distributed.
When documenting the maintenance pillar of your framework, be sure to address the following:
- Who the data governors are
- How your team reports data issues
- How your team resolves data issues
Your data governors are the people most able to address questions related to specific aspects of your data governance framework, such as general data issues or suggested updates to the taxonomy design. This can be a single point of contact (e.g., a PM who helped craft the original foundational taxonomy), or it can be a data governance team that consists of cross-functional members from within product development. In terms of reporting data issues, determine whether team members should contact the appropriate data governor directly or whether there is a specific channel team members should use.
The last step in the maintenance pillar is to document how the pieces of the framework will be distributed. Are they all located in the same Google Drive or GitHub repo? Will different teams include them in their own documentation repositories? Or is everything in a data governance tool? Knowing where this information lives makes it easier to update when there are changes.
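Whichever model you choose, it helps to write the routing down explicitly. The sketch below is a minimal illustration under assumed names: the issue areas, governor labels, and channel names are all hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Governor:
    """Who handles an issue and where to report it (values are hypothetical)."""
    name: str
    channel: str

# Hypothetical mapping of issue areas to the governor responsible for them.
GOVERNORS = {
    "taxonomy": Governor("lead PM", "#data-taxonomy"),
    "instrumentation": Governor("data governance team", "#data-issues"),
}

# Fallback so that no report goes unrouted.
DEFAULT_GOVERNOR = Governor("data governance team", "#data-issues")

def route_issue(area):
    """Return the governor for an issue area, falling back to the default."""
    return GOVERNORS.get(area.strip().lower(), DEFAULT_GOVERNOR)

print(route_issue("Taxonomy"))
print(route_issue("billing"))
```

Including a default route means that an issue in an area nobody anticipated still reaches someone who can triage it.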
High-Quality Data Requires a Solid Framework
There is no magic wand for clean, useful data, but establishing a clear data governance framework will help. With these guidelines, your team will have all of the information they need to maintain high-quality, standardized data assets.
Looking for help with your taxonomy? Reach out to a product analytics expert to see how Amplitude helps customers streamline their taxonomies during the onboarding process.