At Amplitude, we spend a lot of time talking to our customers and helping them succeed with product analytics. One thing we have observed is that companies that do a good job with data management—in other words, companies that keep clean data, fix mistakes, approve events to ensure they are correct, and plan events to make sure data is consistent—tend to get more value from their product analytics system. We have dug into this over the past few years, and really tried to understand why this is the case. Through this work, we have come up with some principles that we call The Four Truths of Data Management. They are as follows:
1. The Goal of Taxonomy Is Functionality
The first truth is the concept of functionality. Functionality is the ability of a taxonomy to answer questions. Although a well-organized and complete taxonomy is rewarding in its own right (at least to data nerds like us), unless there is direct value generated from that taxonomy, most people aren’t going to put in the effort to maintain that taxonomy.
The ultimate goal of a taxonomy is to facilitate answering questions that help you improve your product (and make it more profitable). And functionality isn’t just about measuring the most complex question that a data scientist could theoretically answer. It also measures how usable your data is for someone who is brand new to analytics and just wants to calculate some simple metrics. It measures not only depth of analysis but also breadth, or the number of people who can use it to answer questions.
2. All Taxonomies Are Imperfect
The second truth is the concept that all taxonomies are imperfect. No taxonomy can answer every question for every user. There is going to be some data that you haven’t captured, or even if the data is there, it may not be structured in a way conducive to answering a given question. Also, there is typically going to be a tradeoff between breadth and depth, so you will need to think about how to structure the data such that it is useful to the largest number of people possible.
Another thing to remember is that needs evolve over time. Even if you design a taxonomy that can answer every conceivable question today, and it is also extremely usable by everyone in your company, it is likely that in six months or a year your requirements will change. If your company is becoming more product-led and learning faster, you will likely have new questions, and your current taxonomy will likely be unable to answer some of those questions.
3. Data Investment Won’t Happen Until the Gap Between Current State and Desired State Becomes Clear
This leads us to our third truth, otherwise known as “Oh no! We don’t have the data!” Organizations often have limited resources to build their taxonomy and to instrument event data. Until they hit the point where they have a critical question that their analytics system can’t answer, they often won’t invest any more. This makes sense, because everyone wants to spend money on the things that yield the most value, and an investment in analytics often takes time to show significant value. The problem is that the gap between the current state of the taxonomy and the desired state often becomes obvious only at that moment, when the data needed to answer the question is already missing. And when this happens, the only way to bridge that gap is to invest more in something that hasn’t yet shown much ROI.
4. Effective Data Management Allows You to Fix Existing Data and to Proactively Prevent Future Problems
The first three truths help us understand the current state of the data management world and the desired outcome. And this leads us to the fourth truth, or the way to end all of the suffering. This is the truth of data management.
Data management involves two things:
- retroactively cleaning up your existing data by fixing mistakes and removing unneeded events and properties, and
- proactively planning and vetting incoming data.
Retroactively fixing your existing data will enable you to improve functionality without a huge investment. This will then give you the leverage you need to set up a structure for proactively planning and approving events, which will lead to even larger and more sustainable gains in functionality. Thus, by using data management correctly, you can both rectify problems quickly and set up a structure that will prevent new problems from arising.
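To make the two halves concrete, here is a minimal sketch in plain Python. It is not how Govern or any other product works; the plan, the rename rules, the event names, and both function names are hypothetical and exist only for illustration. The first function stands in for retroactive cleanup (merging misnamed events and dropping unneeded ones), and the second stands in for proactive vetting of incoming events against a planned taxonomy.

```python
# Hypothetical tracking plan: approved event names and their required properties.
PLAN = {
    "Song Played": {"song_id", "source"},
    "Checkout Completed": {"order_id", "total"},
}

# Hypothetical cleanup rules for data that has already been collected.
RENAMES = {"song_played": "Song Played", "PlaySong": "Song Played"}
DROPPED_EVENTS = {"Debug Ping"}


def clean_existing(events):
    """Retroactive step: merge misnamed events and drop unneeded ones."""
    cleaned = []
    for event in events:
        name = RENAMES.get(event["name"], event["name"])
        if name in DROPPED_EVENTS:
            continue
        # Copy the event so the original record is left untouched.
        cleaned.append({**event, "name": name})
    return cleaned


def validate_incoming(event):
    """Proactive step: list the problems found; an empty list means approved."""
    name = event.get("name")
    if name not in PLAN:
        return [f"unplanned event: {name!r}"]
    missing = PLAN[name] - set(event.get("properties", {}))
    return [f"missing property: {prop}" for prop in sorted(missing)]
```

In this sketch, cleanup is a cheap pass over data you already have, while validation is a gate that only pays off once a plan exists. That ordering mirrors the argument above: quick fixes first, then the structure that keeps new problems out.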
Effective data management is not a magic bullet; solving data problems takes continuous effort. But by investing in sophisticated data management systems, you ensure that your team and your colleagues—regardless of their specific role or SQL knowledge—are able to garner immediate insights from data.
Of course, you also need the tools to make sure that data is not just accessible, but clean, secure, and manageable. That’s where capabilities like Govern from Amplitude come in. With Govern, you can manipulate data post-ingestion, addressing some of the issues that traditionally limit depth and breadth of usage. Additionally, Govern provides an easy setup process for planning and verifying incoming data, which leads to fewer data problems down the road.
Instead of a magic bullet, think of data management like a magic trampoline, designed to catapult you to the next level of maturity by reducing pain and improving results. You’ll still need to do some work on your end, but you’ll have the tools to get a boost. It’s that boost that justifies the investment of time and money. With an effective and reliable system like Govern, you have proactive tools to manage your taxonomy.
This post about the Four Truths of Data Management is just the first part in a three-part series on data management. The second post will explain how to unlock functionality for your company. The final post will lay out a maturity model for data management, which will help you determine exactly where your organization stands today and see what it would mean to move to the next level of maturity.