Where to Start with Data Quality
Guest writer Sara Hanks builds our foundational knowledge with her blog post on data: “Where to Start with Data Quality.”
Fall is my favorite season for two reasons: beautiful scenery and the National Football League. This year, I joined a fantasy football league for the first time. The league uses the ESPN fantasy football app, which is great because its built-in analytics make it easy for a novice like me to play. Each week, my team is matched up against another, and the app predicts the outcome for each player as well as for the team overall. I can use those predictions to decide whom to bench or trade. All of this is possible because a large volume of high-quality data feeds the prediction algorithm.
According to Joseph Juran, quality means “fitness for use.” According to Philip Crosby, it means “conformance to requirements.” Data quality encompasses both definitions: the data must meet its requirements and be fit to provide the insights needed to make decisions in real time. Here are 10 elements to consider when evaluating data quality:
- Accurate – the data needs to be correct
- Complete – the data needs to have no missing values
- Consistent – the data needs to be defined the same way across all IT systems
- Valid – the format of the data needs to match its defined structure, such as a date field containing a valid date
- Singular – the data should not be duplicated
- Seamless – the data needs to move from one system to another without being corrupted or lost
- Repeatable – if two people record the same data, they both record it the same way
- Preserved – the data needs to be retained according to the retention policies
- Compliant – the data needs to adhere to privacy laws and internal policies
- Accessible – the data needs to be democratized so that people can consume it according to their skill sets
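Some of these elements can be checked automatically, while others (such as Accurate and Repeatable) usually need a person to compare the data against reality. The Python sketch below covers three of the automatable ones, Complete, Valid, and Singular, for records held as dictionaries. The `check_records` helper, the field names (`customer_id`, `name`, `signup_date`), and the choice of ISO 8601 (`YYYY-MM-DD`) as the valid date format are all hypothetical, chosen only for illustration; a real check would use your own schema and rules.

```python
from datetime import date


def check_records(records, required_fields, date_field, key_field):
    """Hypothetical helper: return the indexes of records that fail three of the
    quality elements: Complete (missing values), Valid (date format), Singular (duplicate keys)."""
    issues = {"incomplete": [], "invalid": [], "duplicate": []}
    seen_keys = set()
    for i, record in enumerate(records):
        # Complete: every required field is present and non-empty
        if any(record.get(field) in (None, "") for field in required_fields):
            issues["incomplete"].append(i)

        # Valid: a non-empty date field must parse as an ISO 8601 date (YYYY-MM-DD)
        value = record.get(date_field)
        if value not in (None, ""):
            try:
                date.fromisoformat(value)
            except (TypeError, ValueError):
                issues["invalid"].append(i)

        # Singular: each key value should appear only once (missing keys are skipped here)
        key = record.get(key_field)
        if key is not None:
            if key in seen_keys:
                issues["duplicate"].append(i)
            else:
                seen_keys.add(key)
    return issues


# Example with made-up customer records
customers = [
    {"customer_id": 1, "name": "Acme", "signup_date": "2024-01-15"},
    {"customer_id": 2, "name": "", "signup_date": "2024-02-30"},
    {"customer_id": 1, "name": "Acme Corp", "signup_date": "01/15/2024"},
]
result = check_records(customers, ["customer_id", "name", "signup_date"], "signup_date", "customer_id")
print(result)  # {'incomplete': [1], 'invalid': [1, 2], 'duplicate': [2]}
```

Returning record indexes rather than a single pass/fail flag lets the data owner see which records need attention and which element each one violates.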
To achieve a high level of data quality, the data needs to have a clear owner. The owner is usually the team responsible for executing the business process that creates the data. For example, customer data might be owned by the sales team, and supplier data by the purchasing team. The IT team supports the data owners by putting proper controls in place to detect issues when the data is moved and stored.
Getting started with a data quality plan can be overwhelming, so it is best for businesses to prioritize their data first. I recommend starting with the foundational data your business runs on. I consider the quality of this data part of the cost of doing business: it is just as necessary as closing the books at the end of the month. The next area I recommend tackling is the data used to generate operational KPIs. Finally, focus on the data needed for transformation efforts.
Once the scope of the data quality plan is set, establish a baseline of the current data quality. An audit of the data helps data owners understand that baseline. The audit is a deep dive into a sample of the data that is representative of the data as a whole. During the audit, the data owner will need to get hands-on with the data and interview the people who work with it to understand how accurate it is. At the end of the audit, calculate a metric for how much of the data is considered defective, such as the percentage of sampled records with at least one defect. The audit findings then inform recommendations and plans to improve data quality.
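The baseline metric can be as simple as a defect rate: the number of defective records in the sample divided by the number of records sampled. The Python sketch below draws a random sample and applies a rule that flags a record as defective. The `audit_defect_rate` function, the `contacts` data, and the "missing email" rule are hypothetical examples; in a real audit, the rules would come from the data owner, and the interviews described above would catch defects that no automated rule can.

```python
import random


def audit_defect_rate(records, is_defective, sample_size, seed=None):
    """Hypothetical helper: draw a random sample (without replacement) and
    return the share of sampled records that the is_defective rule flags."""
    if not records:
        raise ValueError("no records to audit")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    sample = rng.sample(records, min(sample_size, len(records)))
    defects = sum(1 for record in sample if is_defective(record))
    return defects / len(sample)


# Made-up data: every tenth contact is missing an email address
contacts = [
    {"id": i, "email": "" if i % 10 == 0 else f"user{i}@example.com"}
    for i in range(1000)
]
baseline = audit_defect_rate(contacts, lambda r: not r["email"], sample_size=200, seed=42)
print(f"Baseline defect rate: {baseline:.1%}")
```

Rerunning the same measurement after each improvement gives a simple way to track whether the defect rate is actually going down.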
At a minimum, the data quality plan must include a process for ensuring that new data is created with high quality. The process needs to define who has the authority to create data and how existing data is updated.
Improving data quality takes time and resources, so start small and drive incremental improvements over time.