6 data quality dimensions: a comprehensive overview
Data quality is an essential concern for any company. Poor-quality data can cost a company considerable time and resources to correct. According to Gartner, poor data quality costs companies an average of $12.9 million per year.
The high cost of imperfect data quality has prompted companies to take action and implement routine data quality checks.
What is a data quality dimension?
Data is the foundation – the majority of business decisions are based on it, making it crucial to ensure that employees are working with high-quality datasets.
There are six dimensions that help measure data quality:
- accuracy,
- completeness,
- consistency,
- uniqueness,
- validity,
- timeliness.
The critical role of data quality dimensions
Data quality dimensions allow companies to assess their data and expose its flaws by comparing it against the company’s data quality standards.
Becoming aware of the flaws within the system enables leaders to take action and implement solutions that will improve the quality.
As industries become more reliant on big data and the importance of data science in IT increases, data quality dimensions can be implemented and monitored by data scientists to help businesses achieve their goals.
Six data quality dimensions with examples
Accuracy, completeness, consistency, uniqueness, validity, and timeliness are the six data quality dimensions that many businesses use to rate the quality of their data.
Each of these dimensions plays a critical role in data preprocessing, helping the company avoid the negative impacts of bad data quality. These components are intertwined, and poor-quality data can affect several dimensions at the same time.
For example, data analytics plays a big role in the finance industry. Data on customer activity is analysed to place each customer in a category. If the data is of poor quality, the analysis may place a customer in a category that does not align with their interests, possibly causing dissatisfaction.
If the data meets the standards set by the company’s data quality dimensions, the customer is placed in the correct category that is aligned with their interests.
See related articles on the different steps and tasks involved in data workflows:
- Data transformation: the complete guide for effective data management
- Data automation for business growth: everything you need to know
- Data cleaning: benefits, process and best practices
- Data modelling: a guide to techniques and best practices
Accuracy
The first data quality dimension is accuracy. Data accuracy describes how closely the data matches its real-world counterpart. Accurate data contains no errors and can be verified against a trusted source.
For example, customer records containing details that have not been personally verified by the client, such as an incorrect birthday or a former home address, would be inaccurate. Recording a phone number and home address that the customer has personally confirmed makes the data accurate.
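A minimal sketch of such an accuracy check, assuming customer records in a pandas DataFrame are compared against a trusted, customer-verified reference dataset (the column names and sample values are invented for illustration):

```python
import pandas as pd

# Customer records as captured in the CRM (hypothetical sample data)
crm = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "phone": ["+48 123 456 789", "+48 555 000 111", "+48 222 333 444"],
})

# Trusted reference values verified directly with the customers
verified = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "phone": ["+48 123 456 789", "+48 555 000 112", "+48 222 333 444"],
})

# Compare each record against its verified counterpart
merged = crm.merge(verified, on="customer_id", suffixes=("_crm", "_verified"))
merged["accurate"] = merged["phone_crm"] == merged["phone_verified"]

print(f"Accuracy: {merged['accurate'].mean():.0%}")  # share of records matching the trusted source
```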
Completeness
The dimension of completeness focuses on whether or not the dataset contains the necessary components to serve its function.
An example commonly experienced by customers is filling out shipping information for a package. Fields such as the recipient’s name, street address, city, state/province, country, and zip code are required. When all of these fields are filled in, the data is considered complete and the shipping process can run smoothly.
However, if the customer does not fill these out and the form is missing information, it may lead to difficulties in delivering the package, such as delays or the inability to contact the recipient.
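As a rough illustration, a completeness check can simply verify that every required shipping field is present and non-empty; the field names below are assumptions made for the example:

```python
import pandas as pd

REQUIRED_FIELDS = ["recipient", "street", "city", "state", "country", "zip_code"]

# Hypothetical shipping records, one of which is missing information
orders = pd.DataFrame([
    {"recipient": "Jane Doe", "street": "1 Main St", "city": "Austin",
     "state": "TX", "country": "USA", "zip_code": "73301"},
    {"recipient": "John Roe", "street": None, "city": "Austin",
     "state": "TX", "country": "USA", "zip_code": None},
])

# A record is complete only if every required field is filled in
complete_mask = orders[REQUIRED_FIELDS].notna().all(axis=1)
print(f"Complete records: {complete_mask.mean():.0%}")
print(orders.loc[~complete_mask, REQUIRED_FIELDS])  # rows that could delay delivery
```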
Consistency
Consistency measures whether data that is used multiple times and stored across multiple datasets holds the same values everywhere.
Testing data consistency measures the percentage of values matching throughout various datasets. Consistent data holds a high percentage of matching data values.
Inconsistencies can vary in severity. For example, formatting issues like different date formats are easily fixed, because it is essentially the same information presented in different ways. On the other hand, having multiple addresses listed within customer data becomes difficult to fix without risking deleting the correct data entry.
Customer data often comes from multiple sources, such as the several forms a patient fills out in a hospital setting. A simple human error may result in two different addresses being recorded for the same patient.
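One rough way to quantify consistency is to compare the same field across two sources and report the share of matching values; the hospital-style datasets below are made up for the sketch:

```python
import pandas as pd

# The same patients recorded in two separate systems (hypothetical data)
admissions = pd.DataFrame({
    "patient_id": [101, 102, 103],
    "address": ["12 Oak Ave", "3 Elm St", "9 Pine Rd"],
})
billing = pd.DataFrame({
    "patient_id": [101, 102, 103],
    "address": ["12 Oak Ave", "4 Elm St", "9 Pine Rd"],  # one address disagrees
})

# Join on the patient ID and count how often the address values match
merged = admissions.merge(billing, on="patient_id", suffixes=("_adm", "_bill"))
consistency = (merged["address_adm"] == merged["address_bill"]).mean()
print(f"Address consistency across systems: {consistency:.0%}")
```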
Uniqueness
Uniqueness is considered to be the most important data quality dimension. Data uniqueness is tested by checking for duplicate records within data fields and across datasets.
High uniqueness means the data contains minimal duplication or overlap. Improving the uniqueness of an organisation’s data typically involves removing duplicates, and to keep data unique, companies can assign identifying codes that make the data easier to organise.
For example, Austin Chia from the Pragmatic Institute suggests creating unique customer IDs and comparing their count with the total number of customers the company serves. Any difference is a sign of duplicate data, and the company should take action to fix it.
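Following that idea, a small sketch of a uniqueness check might compare the number of distinct customer IDs with the total number of records and surface the duplicates (the data is invented for illustration):

```python
import pandas as pd

# Hypothetical customer table in which one customer appears twice
customers = pd.DataFrame({
    "customer_id": [1001, 1002, 1002, 1003],
    "name": ["Ann", "Ben", "Ben", "Cara"],
})

total = len(customers)
distinct = customers["customer_id"].nunique()
print(f"{total} records, {distinct} unique customer IDs")

# Any gap between the two counts points to duplicate records to review
if distinct < total:
    duplicates = customers[customers.duplicated("customer_id", keep=False)]
    print("Possible duplicate records:")
    print(duplicates)
```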
Validity
Validity focuses on whether the data is represented correctly and whether it abides by the rules set for that type of data. Rules range from standard norms to business rules set by the company, such as formatting or the ability to be used across different systems. These business rules also establish the standards against which data validity is measured.
Customers often encounter examples of data validity when filling out online forms. Data is valid when the customer enters the correct information in the expected format, such as a 10-digit phone number preceded by the country calling code. It becomes invalid if the customer enters only 9 digits or forgets the country calling code, as such a number does not exist.
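A hedged sketch of a validity rule for the phone-number example, using a regular expression that expects a country calling code followed by ten digits; the exact pattern is an assumption and would in practice follow the company’s own business rules:

```python
import re

# Assumed business rule: "+", a 1-3 digit country code, then exactly 10 digits
PHONE_RULE = re.compile(r"^\+\d{1,3}\d{10}$")

def is_valid_phone(value: str) -> bool:
    """Return True if the value satisfies the assumed phone-number rule."""
    return bool(PHONE_RULE.match(value.replace(" ", "")))

print(is_valid_phone("+1 2025550123"))  # True: country code plus 10 digits
print(is_valid_phone("202555012"))      # False: 9 digits, no country code
```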
Timeliness
Data timeliness focuses on how quickly and easily data can be accessed when it is needed. Timeliness is measured by the time it takes to retrieve data from multiple sets or sources. An example can be seen in the finance industry, especially the stock market, where acquiring up-to-date data quickly is the difference between making and losing money.
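One rough way to monitor timeliness is to check how old the most recent records in each data feed are against a freshness threshold; the feed names, timestamps, and 15-minute threshold below are assumptions made for the sketch:

```python
from datetime import datetime, timedelta, timezone

# Last-updated timestamps of a few market data feeds (hypothetical values)
feed_updates = {
    "stock_prices": datetime.now(timezone.utc) - timedelta(seconds=5),
    "fx_rates": datetime.now(timezone.utc) - timedelta(minutes=45),
}

# Assumed business rule: data older than 15 minutes is no longer timely
THRESHOLD = timedelta(minutes=15)

for feed, last_update in feed_updates.items():
    age = datetime.now(timezone.utc) - last_update
    status = "fresh" if age <= THRESHOLD else "stale"
    print(f"{feed}: last updated {age.total_seconds():.0f}s ago -> {status}")
```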
How do you ensure the quality and integrity of data?
If a company has not updated its data infrastructure, it risks falling behind competitors that have adopted new technology and, for example, migrated to the cloud. It also risks losing data integrity if it lacks the security measures needed to keep data safe. Given how valuable data is today, it is very important to keep up with the latest security technology.
There are many data services available to assist companies in handling their data. Data modernization services are available to help companies transition their data to a new infrastructure. These services also include help in using and maintaining data quality dimensions. At Future Processing, we are more than happy to assist you and ensure that your data is of high quality and fully supports your decision-making – contact us to make the most out of your data.