With the advent of big data analytics, powered by the ease of moving data to the cloud, the pressure on companies to get data right to make million-dollar decisions in a few seconds has become paramount. How can organizations set themselves up for success when it comes to data? What are some of the foundational elements to have in place? In this episode, Jason Hare shares the principles of establishing a data governance plan for forward-thinking organizations.
Kimberly Boyd: Welcome to Pragmatism in Practice, a podcast from Thoughtworks, where we share stories of practical approaches to becoming a modern digital business. I'm Kimberly Boyd, global head of customer marketing and insights at Thoughtworks. It's a pleasure to join the program as a new host. I'm excited to speak today with our guest who is helping Thoughtworks lead the frontier of data governance on behalf of our clients, first, because I'm admittedly a neophyte when it comes to understanding data governance, but also because becoming a data-driven company is increasingly a priority for companies across the globe.
How can organizations set themselves up for success when it comes to data? What are some of the foundational elements they need to have in place? Those are a few of the topics we'll dive into today. I'd like to welcome Jason Hare who has recently taken on a new role at Thoughtworks as data governance subject matter expert in principle. Jason, welcome to Pragmatism in Practice.
Jason Hare: Thank you very much.
Kimberly: To begin, let's talk about why data governance matters. Why do today's business leaders need to care about it?
[00:01:00] Jason: Sure. Well, data governance has been around for a very long time. It was called a lot of different things. In the past, mostly, it was due to regulatory downward pressure, think of HIPAA or Sarbanes-Oxley, things like that. With the advent of big data analytics, powered by the ease of moving data to the cloud, suddenly, the pressure on companies to get data right to make million-dollar decisions in a few seconds has become pretty paramount. Having the policies, and having that part of your data management lifecycle that holds the authority to operate, the standards, the guidelines, and whatnot, has become absolutely essential and existential for companies to survive against their competition.
[00:01:48] Kimberly: Well, that makes a lot of sense. I think, like you said, there are million, billion-dollar decisions being made in an instant in today's increasingly digital world. It really sounds like when we get to the essence of data governance, it's all the fundamentals to allow organizations to really be successful and turn data into impact, into action, and ultimately into value. Where should organizations begin when it comes to data governance?
[00:02:18] Jason: That's a great question. I think it also speaks to where there is some confusion about what data governance really is. Data governance is not data management. If we think of, for example, all the parts of the implementation of data management like quality control, security, all that stuff, those are things that are based on the policies and standards and guidelines that are inside of data governance. Data governance is more about the policies and the auditability, the authority, and the accountability of what data management does.
If you think about using, let's say, the analogy of finance, you have auditors and you have accountants, and you never want the accountants to be the auditors. There are really good reasons for that, especially with today's growing privacy concerns, which I'm sure we'll talk about in a minute. The data governance part of the overall data management part of a company is responsible for deciding who is responsible for what, what role is responsible, and who exercises the authority about how data is going to be managed.
It is very important to keep those separate for, amongst other things, privacy regulation. That's especially what it does. The last thing that data governance does is it orchestrates between stakeholders, processes, and technology. It's really important that data governance aligns with the cultural principles of an organization so that it can manage its data as a strategic asset.
[00:04:06] Kimberly: It sounds like when it comes to data governance, a lot of it is setting up the structure and the decision-making body of who gets to see what and why and protecting that all-important privacy that you just mentioned. Who typically has ownership for data governance in an organization? Is it always housed in the same place or is it, like you said, based on the cultural makeup of an organization?
[00:04:31] Jason: I want to throw out a little factoid real quick. Harvard Business Review did a study in 2017 that underlies why data governance is so important. It determined in its 2017 study that 70% of all companies surveyed had employees with inappropriate access to data. Let's just think about that for just a sec. How big of a problem is that? It's a huge problem. Where does data governance typically sit? Where it should sit ideally, is with the business side of things, not IT.
When we think of data governance we think of people, process, and policy. Those are the three pieces, and they are enabled by technology. Notice that the first thing I said was not technology. Technology is certainly the enabler, it builds capacity and capability within data management informed by data governance, but data governance is really a business function. It's really a people, process, and culture function. That's one of the misunderstandings of what data governance really is: it's not an IT practice per se; data management is an IT practice. That's the implementation of the policies of data governance, but data governance itself is definitely business-oriented.
[00:05:59] Kimberly: What can be done to help realign the misconception that data governance is a tech thing and not a business thing and really get the business side of organizations to take more ownership when it comes to data governance?
[00:06:13] Jason: That's another great question. I'll give you a specific example. One of the things that's influenced by data governance is the data management practice of data quality. If you think about quality, traditional data quality is described as having six dimensions. I'll just throw out a few of them: accuracy, completeness, timeliness, and relevancy.
When it comes to accuracy and completeness, let's say, there's the threshold for what is fit for purpose, that is, what data can be used in the business. The threshold for that quality dimension is set by the business. The data quality team on the IT side is informed of that, and then writes the business rules and does the data quality management on the technical side to meet the business requirements that were, again, set by the business.
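The split Jason describes, where the business sets the threshold and the data quality team implements the check, could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the `QualityThreshold` record, the `completeness` and `meets_threshold` helpers, the sample customer rows, and the 95% figure are hypothetical and do not come from the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityThreshold:
    """A fitness-for-purpose rule owned by the business (hypothetical shape)."""
    dimension: str   # e.g. "completeness"
    field: str       # the column the rule applies to
    minimum: float   # lowest acceptable score, between 0.0 and 1.0


def completeness(records, field):
    """Share of records whose `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def meets_threshold(records, threshold):
    """The technical check the data quality team writes against the business rule."""
    return completeness(records, threshold.field) >= threshold.minimum


# Illustrative data only: two of four customers have an email address.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": None},
]

# The business decides the threshold; IT only measures against it.
email_rule = QualityThreshold(dimension="completeness", field="email", minimum=0.95)

score = completeness(customers, "email")
fit_for_purpose = meets_threshold(customers, email_rule)
```

Keeping the threshold in its own record, separate from the function that measures it, mirrors the point that the number belongs to the business while the measurement belongs to data management.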
[00:07:09] Kimberly: When it also comes to getting more widespread adoption and I think buy-in for data governance, what have you commonly seen in numerous years of experience in the data governance space that can help accelerate organizations on that journey?
[00:07:27] Jason: I'd say in the past 10 years it has always started in IT. In every data governance project I've been involved in, contrary to what I just said about how it should be a business-driven endeavor, it's always started out as an IT endeavor, which is always a mistake. The first thing that I try to do is find champions within the business community, the business side of the organization, and say, "Hey, folks in the sales department," people that have to generate revenue for the organization, "we need you to take ownership of these existential things."
Let's just say again data quality, these dimensions of data quality, for example, completeness. What's the threshold of completeness for this particular data set and do we have enterprise data sets? Enterprise data sets would be data sets that are used by not just one line of business but the entire organization, and those would be the existential data sets. The things that keep the business continuity going.
The first thing I will do is, I'm usually going to start out in the IT department because that's where people usually put data governance, then I will champion my way out. I would find other high-powered, high-interest, influential stakeholders, usually not the CIO or the CTO because, again, they're IT, but somebody on the business side, the CEO or CFO, somebody like that, to charter my data governance organization. Then I get my business champions, and then the whole process starts to shift from IT to the business side.
[00:09:22] Kimberly: Is there anything else that is a common pitfall or perhaps challenge beyond just getting data governance to reside in the right space in an organization?
[00:09:35] Jason: There is. Usually, it's minimized. I would say there's a couple of problems that usually arise. One is that any one of the dozens and dozens and dozens of data governance and data management products out there, I won't name any names but they all equally have their ups and downs to them, there's often this idea that we could buy a solution or build a solution, a technology solution to a people and process and policy problem. That never has worked. In the multiple decades I've been doing this, that has never worked, not once.
[00:10:18] Kimberly: No. I think in the history of technology, it's probably never worked.
[00:10:21] Jason: Yes. What I'm talking about in data governance is probably a business problem, in general. I didn't start out in data governance, I actually found my way here by accident. When I was a, let's say a web developer, I would find that we would just buy more software and more server space and more whatever to try to solve a process problem, and it never worked. That doesn't work in data governance either. You really have to have the roles and responsibilities of who is going to take ownership of the aspects of governing the implementation of your data management and policies.
All those policies, all those things that say, for example, these are the agreed-upon business terms, this is what a customer is, this is how we define customer. Here's the policy that that links to, and we have an enterprise business glossary generated by the business that goes through a workstream to get that term approved. It's not just one individual that says, "Oh, I'm going to define customer," and just put it over here and everybody's going to agree to that. That's not how that works. It's a council, a committee, a group of people from all parts of the organization that agree upon what these base definitions are and they put them into an enterprise business glossary, for example.
That's one artifact that is commonly used. I'd say that when I've talked to customers and I'll say, for example, "What do you think the value is that you're delivering to your constituents, the people that pay money to your company?" I will get different answers. The problem with that is there's not a common definition of what is the customer, what is the thing we're delivering, and what is a reasonable threshold of quality that we're going to deliver in terms of data or customer experience. Without that governance policy, without that data management framework being driven by a policy, you're going to have bad data and you're going to make bad decisions. That's going to cost you customers.
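The glossary workflow Jason outlines, where a term is proposed, linked to a policy, and only becomes official once a council agrees, might be sketched in Python as follows. This is a sketch under stated assumptions, not a description of any product or of Thoughtworks practice: the `GlossaryTerm` and `BusinessGlossary` names are hypothetical, and requiring approval from every council member is an assumed rule chosen to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """One proposed business term and the policy it links to (hypothetical shape)."""
    name: str
    definition: str
    policy: str
    approvals: set = field(default_factory=set)


class BusinessGlossary:
    """Terms become official only after every council member approves them.

    Unanimous approval is an assumption made for this sketch; a real
    council could use a quorum or some other rule.
    """

    def __init__(self, council):
        self.council = frozenset(council)
        self.proposed = {}
        self.approved = {}

    def propose(self, term):
        self.proposed[term.name] = term

    def approve(self, name, member):
        """Record one approval; return True once the term is official."""
        if member not in self.council:
            raise ValueError(f"{member} is not on the governance council")
        term = self.proposed[name]
        term.approvals.add(member)
        if term.approvals == self.council:
            self.approved[name] = self.proposed.pop(name)
        return name in self.approved


glossary = BusinessGlossary(council={"sales", "finance", "it"})
glossary.propose(GlossaryTerm(
    name="customer",
    definition="A party with at least one paid order",  # illustrative wording only
    policy="POL-001",                                   # hypothetical policy identifier
))

first = glossary.approve("customer", "sales")     # one voice is not enough
second = glossary.approve("customer", "finance")
final = glossary.approve("customer", "it")        # the whole council has now agreed
```

The point of the design is that no single caller can move a term into `approved`; the definition of "customer" only becomes shared vocabulary after the group signs off.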
[00:12:50] Kimberly: Yes. It's interesting. It's something that probably seems fundamental but if no one stopped to define that and give that a common understanding, that can probably have compounding troublesome impacts down the road.
[00:13:04] Jason: It does. I'll give you an example of somebody that did it right. One of my favorite ways to spend money is on music. I am a big Spotify fan. I love Spotify. I have my playlist of songs and it seems to know what I want to listen to before I even listen to it. How does it do that? It does that with a couple of basic principles. One, it has a very sound data governance policy that is separate from data management. The reason it does that is what its discovery product does. If anybody out here is a Spotify fan, I'm sure there's lots of you that are because Spotify, through data governance, is now the largest music streaming company on earth.
[00:13:53] Kimberly: And podcast, so it's very relevant today.
[00:13:57] Jason: Yes. This might even be on a Spotify podcast somewhere. What they did was they started using data to track what people listen to, and they built a recommendation engine to give you your discovery list. Every Monday, you get discovery stuff; you get your list of your favorite tunes that you listened to last summer, for example. These turned out to be extremely popular. That's good data, sound data management, sound data labeling, sound metadata, sound schemas, all that stuff. What was the other thing they did that made that possible? That is looking at the ethical use of data. That's also where data governance is separate from data management.
Data management will do whatever data governance tells it to do, ideally. If we say, "Well, we could just track and reuse whatever people listen to in whatever way we want to," that's not an ethical use of data. In fact, that violates a lot of privacy laws both locally at home here, statewide like CCPA, for example, the California Consumer Privacy Act, but also GDPR. That could be a big problem. Spotify started out with a couple of Swedish gentlemen who decided that there's a monetization opportunity in defeating music piracy, which 10 years ago, 12 years ago almost killed the music industry. It came fairly close. It was a big problem.
They did this by, one, offering to pay per listen to the artist, to the record companies, whoever owned the rights to the music. Some small fee for every time somebody played one of these songs. That had to be tracked. It had to be tracked in an ethical way that respected the user's privacy. GDPR gave them a look over and they were fine, so this company was able to grow. It did it through a couple of things. One, it did what we said of having a data governance set of policies that exercise authority and control and consensual, communicated decision-making over the management of its data assets. It did that. Also, it placed a big premium on privacy.
Sometimes I get asked, "How do we monetize the ethical use of data?" There you go, the Spotify example is a pretty good example of why you should be ethical with how you use data. It doesn't track users and then reuse that data to re-target users with other stuff. It doesn't do that. Spotify just has a policy of no go on that. That's one of the reasons why it's such a trusted brand. It creates this delightful user experience for its consumers, consumers don't have to worry about their data being sold all over creation, and Spotify makes money, the artists make money, and for the consumer, in my humble opinion, it's a great value.
[00:17:17] Kimberly: It's a really great example of you're doing the right things with data but then by doing the right things and being ethical and having these processes in place it in turn also enables a lot of value to be delivered both to the organization and ultimately to the customer or the listener, all of us on the other end of that Spotify app. I think it's really interesting that you mentioned the role of ethics in data governance because with the predominant rise of ESG, and I think ethics on the forefront now more than ever for organizations, and rightly so, it sounds like data governance is, or at least should be a central component of how organizations think about their ethical policies.
Have you found that most organizations are making those linkages or is there still work to be done there?
[00:18:11] Jason: I think there's still work to be done. I don't think anybody starts out to think, "Oh, how can I unethically use data?" That has never been said by anybody that I've ever met. It's just that, one, there seems to be a disconnect between the use of the term data ethics and the use of the term revenue generation. There's this wet blanket mentality whenever I bring this up, like, "Hey, what's the ethical framework in which we are going to operate?" Because, fun fact, most privacy legislation globally is based on the OECD's eight principles of privacy.
OECD is the Organisation for Economic Co-operation and Development. Many years ago, they came up with these eight principles of user privacy. That's what GDPR is based on, it's what CCPA is based on. It's really the basis of all privacy law that's currently in the world right now, and that's becoming an exploding issue. Each state right now in the United States is getting ready to pass some kind of data privacy legislation that is like CCPA. For example, Virginia passed one, I think it was in April. That's going to go into effect next year. If Virginia is doing it, everybody's going to do it.
We're going to have 50 states with 50 different privacy rules, how are we going to manage all that? Well, one, it starts with a good set of solid data governance policies that looks at how to use data in an ethical way. It's going to not just be to create these delightful experiences and create authentic brands that consumers trust, but it's also going to have, again, that downward regulatory pressure and that data sovereignty that's going to be popping up more and more and more.
I think the companies that get ahead of it, that start to orchestrate how to deliver data and in what form, and that work out how their governance, risk, and compliance rules can be semi-automated so that they're not so much overhead, are also the companies that are really going to thrive in the next five years.
[00:20:28] Kimberly: If I'm a digital leader at an organization and I'm forward-thinking and I want to get ahead like you said of this impending potential data legislation, what are the two or three things I need to start doing now so I can be in a good, prepared place when that time comes and legislation occurs?
[00:20:48] Jason: One, data governance doesn't act alone on this. There's data assurance as well. Data assurance is more of an information security kind of function. Compliance is not the same as security. Most InfoSec people would agree with that. There is overlap between data governance and data security, compliance, and assurance. In fact, a lot of data governance is concerned with the InfoSec triad: confidentiality, integrity, and availability. Confidentiality: the right person should look at the right data and not have that 70% Harvard Business Review statistic going on. That's just bad business.
Integrity means, is the data that I am looking at actually the real data? Am I looking at data that has been adulterated, corrupted, whatever? Then there's availability. Can I get access to it? Do I have the right people getting access to that data in a way that makes it useful? Really, there's just a couple of metrics here. One is having a defensible posture, that is, you have the information security defense-in-depth apparatus all set up. Then you have a sound set of data governance policies and a managed, proactive data governance practice.
You have those things. Ultimately, the two metrics that really power any kind of data effort are trust and reuse. Do you trust the data that you are using? Is that data being reused? You may have heard back in the days of big data that-- I even hate to say this word because I'm afraid it's going to just get picked up and that's the quote that comes out of this. Data is not the new oil. It is not that because data's primary metric is reuse, right? The more a data set, a data source, a data whatever, data object is reused, the more value it has. Data is a resource that could be reused over and over and over again. The more it's reused, the more value it has. It could also be enriched with other data.
If we're talking about data management, the entire data management life cycle, you have initiation, creation, enrichment, and then eventually you have some end state for that data. Whether it be to continue to exist, to be re-enriched, or eventually to its end of data life, it's deleted when it's no longer in use. We could have a whole nother podcast on e-discovery and the legal implications of bad data management but we'll leave that for another time.
[00:23:54] Kimberly: You were just sharing a little bit of what organizations can do to be forward-thinking and get ahead of the data governance legislation. For organizations that are already a little more forward-thinking about data as well and are perhaps thinking about data as a product, or thinking about concepts like data mesh, how does data governance adapt to a model like data mesh, for example?
[00:24:18] Jason: That's a great question. That's something that I've been thinking about since I started at Thoughtworks, which has been a very exciting four months. I just got here, but I was aware of data mesh and other decentralized ways of storing data quite a while ago. How data governance responds to that is, one, the core functions of governance itself, not data management, again, we're separating those two things, the thing about governance itself, it sets policies; it sets KPIs; it sets all those standards and things that power everything else.
That really doesn't change, but the operating model, that is, the people that have roles and responsibilities around that data and how it's used in its decentralized, decomposed form, that model would probably have to be not centralized. You might have a central governance policy repository of some kind, some place where there are ultimate standards and ultimate authority to operate, but then the operating model where the, let's say data product owners, or in old-timey data governance speak, data stewards if you will, how those people operate would be in a much more federated way.
That is they would be close to the knowledge workers and not so much in the centralized pyramid of the data governance organization.
[00:25:58] Kimberly: I want to circle back to something we talked about a little earlier. A lot of the emphasis has been around people and process and very much not on technology when it comes to data governance. Given the challenge of finding talent in today's market, specifically finding data talent in today's market, I think is additionally challenging. So, we're lucky to have you here with us at Thoughtworks, Jason.
[00:26:27] Jason: I'm honored to be here.
[00:26:30] Kimberly: What do you need to look for in terms of talent to staff and build up your data governance function? What type of skills, background, capability are best suited for people to lead a data governance capability?
[00:26:51] Jason: One, it's a small community. We all know each other. I'm trying to think of data governance people. The men and women that make up the data governance community all came from different backgrounds and, as I said, every last one of them came into data governance accidentally.
What I would look for, what I find to be the best data practitioners are people that look at business problems holistically, and what I mean is they look at the cultural principles of what drives the company, and from there, the thing that the company feels it delivers in terms of value, whether that's a bicycle or whether that's a bunch of streaming music or whether it's financial services, whatever that is, the organization has a mission and that mission drives everything they do and how they think about themselves.
When a data governance person, I would say a really good one, comes in, that's the first thing they analyze: what is important to this organization business-wise? How do they think they generate revenue and how do they think they deliver value to whoever it is that buys their stuff? Once we know that, then we can say, "Okay, well, here's how we can prioritize which data needs to be governed now and which data needs to be governed at some other point."
You want to think of it sort of as the Gartner Magic Quadrant of data. In the upper right-hand corner, you have the high-impact stuff that really makes a difference between whether the company has a bottom line or not; that's the important, prioritized stuff. Then in the lower left-hand corner, you have the kibbles and bits, the stuff that is probably redundant, obsolete, and trivial data, or what we call in the biz, ROT. There's usually a lot more of that stuff than you think there is. There's a lot more ROT in any organization than it even knows about.
We try to, again, look at what is important to the company or organization, whether it's a government, whether it's a pizza manufacturer, whether they make diodes, whatever it is they do, maybe they make cars, whatever it is they do, they ascribe a value to that. There are some underlying principles by which they operate and decisions are made. If you understand that, then you can start to figure out, "How do I build the roles and responsibilities that should make decisions about data, and have that stick?" Those kinds of people that can think like that, those are good data governance folks.
[00:30:01] Kimberly: It sounds like you could secretly recruit, since you mentioned that most people in the data governance community have come to it kind of by happenstance: find those individuals in organizations who are culture champions, who have a clear, fundamental understanding of the value drivers of the organization, and convert them and also make them data champions. Sounds like it could be a winning formula.
[00:30:27] Jason: Yes. You have to have a thick skin too because nobody-- Sometimes when you say what has to be done, it doesn't sound all that much fun. Change is hard. Change management, cultural change within an organization, changing the way you look at data, changing the way you use levers to derive value from data is hard work. Finding those stewards, those product managers, and those champions that will really move your business forward in the direction you want to go, that's much harder than it sounds. It even sounds hard when I say it, but in real life, it's even more difficult than that.
You have to have the stick-to-it-iveness, and as the data governance person, you have to be the person cheering them on: "You've got this, you can do it. Here's how we've delivered value in the last quarter," for example. "You can now tell that to your shareholders and they'll be happy with you, and let's go on to the next step."
[00:31:39] Kimberly: You have to be a champion but also not take things personally, is what I hear. You mentioned that it's hard. Maybe to close out our conversation here, you could talk a little bit about, what's at stake if organizations-- What's at stake on the positive side for those who get this right? What's at stake for those who either ignore it or don't move on it quickly enough?
[00:32:08] Jason: What's at stake is, can you make data-driven, intelligence-driven, data-based decisions? If you're making a decision, are you doing it anecdotally, based on your gut, what your gut tells you is the right thing to do, or are you making decisions based on what the market, your customers, and your back office are telling you is happening? If it's the latter, you're doing great because definitely, your competitors are using real-time streaming analytics to make decisions in microseconds, being that now we're all consumers in real time, 24 by 7.
Think about your smartphone and think about how much business you transact on that phone every day. The companies that you do business with, one, you trust them, or I hope you trust them. [chuckles] They should be exhibiting those markers of trust that they're not abusing your data, but second, they're delivering a customer experience to you that is at the very least not annoying. For example, I have an insurance app for my motorcycle on my phone. It's very intuitive. Whoever designed that did a great job. They didn't do it based on instinct; they did it based on data. They did A/B testing. They figured out, this is what people want to interact with.
Now, some other insurance app that I used to have was terrible. It bogged down, it broke, I could barely read it. It was awful. They probably did not do a lot of testing, or if they did, they had bad data and they made bad decisions. Really, the consequence of getting it right is a delightful consumer experience or customer retention. It's hard to get new customers. It's much easier to keep the ones you've got and then grow from there.
Second, if you get it wrong, well, let's say there are the regulatory problems you're going to face. There are the dissatisfied customers you're going to face, then there's your competition that is going to steamroller you because they are using streaming real-time analytics. They do trust their data and their data is being used to make decisions.
[00:34:38] Kimberly: Lots of compelling reasons across the board on why data governance matters. Jason, I know I've learned a lot today. Admittedly, I came in not knowing much but definitely converted to being a champion over the course of our conversation for the importance of data governance and just the far-reaching impacts it has for the organization. Thank you so much for your time today.
[00:35:05] Jason: Thank you as well. Those were great questions. One thing about data governance people, if you have two data governance people in the same room, you’re going to get five different definitions of what data governance is, so I welcome anybody's feedback to challenge me and to have a discussion with me on what they think I just said and how much they either agree or disagree with it.
[00:35:30] Kimberly: All right. Well, challenge issued. Hopefully, you receive some comments or questions or challenging remarks on your social media channels. Thanks so much for joining us for this episode of Pragmatism in Practice. If you'd like to listen to similar podcasts, please visit us at thoughtworks.com/podcasts. Or if you enjoyed the show, help spread the word by rating us on your preferred podcast platform.