
What is Data Quality and why does it matter?

 

Data Quality refers to how fit your data is for serving its intended purpose. Good quality data should be reliable, accurate and accessible.

What is Data Quality?

Good quality data allows organisations to make informed decisions and ensure regulatory compliance. Bad data should be viewed as at least as costly as any other form of debt. For highly regulated sectors such as government and financial services, achieving and maintaining good data quality is key to avoiding data breaches and regulatory fines.

Data is arguably any organisation's most valuable asset, and its quality can be improved through a combination of people, processes and technology. Common data quality issues include duplication, incomplete fields and manual input (human) error. Identifying these errors by eye can take a significant amount of time, so automating data quality monitoring with technology improves operational efficiency and reduces risk.

The data quality dimensions described below apply regardless of where the data physically resides and whether measurement is carried out on a batch or real-time basis (also known as scheduling or streaming). They help provide a consistent view of data quality across data lineage platforms and into data governance tools.

How to measure Data Quality:

According to Gartner, data quality is typically measured against six main dimensions: Accuracy, Completeness, Uniqueness, Timeliness, Validity (also known as Integrity) and Consistency.

Accuracy

Data accuracy is the extent to which data correctly represents the real-world entity or event it describes and can be confirmed against an independently verified source. For example, an email address incorrectly recorded in an email list can lead to a customer not receiving information, and an inaccurate date of birth can deprive an employee of certain benefits. The accuracy of data is linked to how it is preserved throughout its journey. Data accuracy can be supported by sound data governance and is essential for highly regulated industries such as finance and banking.

Completeness

Complete data is a prerequisite for any product or service. Completeness measures whether the data is sufficient to guide and inform future business decisions: it is the proportion of required values that are actually reported. The dimension primarily concerns mandatory fields, but in some circumstances it applies to optional values as well.

Uniqueness

Uniqueness means that a given entity is recorded only once. Duplication is a significant issue and is especially common when integrating multiple data sets; it is combated by applying the correct rules when unifying candidate records. A high uniqueness score implies few duplicates are present, which builds trust in the data and in any analysis based on it. Data uniqueness can also improve data governance and, in turn, speed up compliance.

Timeliness

Timeliness means data is updated frequently enough to meet business requirements. It is important to understand how often the data changes and, consequently, how often it will need to be updated; timeliness should therefore be understood in terms of the data's volatility.

Validity

Validity refers to whether values conform to the expected data type, range, format or precision; it is also referred to as data integrity. Invalid data affects the completeness of the data, so it is important to define rules that ignore or resolve invalid values in order to preserve completeness.

Consistency

Inconsistent data is one of the biggest challenges facing organisations because it is difficult to assess and requires planned testing across numerous data sets. Consistency is closely linked with accuracy: a data set that scores highly in both is a high-quality data set.
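
To make a few of these dimensions concrete, here is a minimal sketch of how completeness, uniqueness and validity might be scored on a small table with pandas. The customer columns and the simple email-format rule are illustrative assumptions, not Datactics code.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "customer_id": [101, 102, 102, 104],
    "email": ["a@example.com", None, "b@example", "c@example.com"],
    "country": ["GB", "GB", "gb", "IE"],
})

# Completeness: share of populated values per column.
completeness = df.notna().mean()

# Uniqueness: share of rows whose key appears exactly once.
uniqueness = (~df["customer_id"].duplicated(keep=False)).mean()

# Validity: share of emails matching a simple format rule.
email_rule = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
validity = df["email"].dropna().apply(lambda v: bool(email_rule.match(v))).mean()

print(completeness, uniqueness, validity, sep="\n")
```

In practice each score would be tracked per rule and per column over time rather than computed ad hoc.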

How does Datactics help with measuring Data Quality?

Datactics is a core component of any data quality strategy. The Self-Service Data Quality platform is fully interoperable with off-the-shelf business intelligence tools such as Power BI, MicroStrategy, Qlik and Tableau. This means that data stewards, Heads of Data and Chief Data Officers can rapidly integrate the platform to provide fine-detail dashboards on the health of data, measured to consistent data standards.

The platform enables data leaders to conduct a data quality assessment, understanding the health of data against business rules and highlighting areas of poor data quality against consistent data quality metrics.

These business rules can relate to how the data is to be viewed and used as it flows through an organisation, or at a policy level – for example, a customer's credit rating or a company's legal entity identifier (LEI).

Once a baseline has been established the Datactics platform can perform data cleansing, with results over time displayed in data quality dashboards. These help data and business leaders to build the business case and secure buy-in for their overarching data management strategy.

What part does Machine Learning play?

Datactics uses Machine Learning (ML) techniques to propose fixes to broken data, and to uncover patterns and rules within the data itself. Datactics takes a "fully explainable" AI approach, ensuring humans in the loop can always understand why and how an AI or ML model has reached a specific decision.
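
As an illustration only (this is not the Datactics ML implementation), the sketch below shows the spirit of an explainable fix suggestion: propose the closest frequently occurring reference value for a suspect entry, together with a plain-English reason a human reviewer can accept or reject. The reference list and similarity threshold are assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher

def suggest_fix(value, known_values, min_similarity=0.8):
    """Return (suggestion, explanation); suggestion is None when nothing qualifies."""
    frequencies = Counter(known_values)
    best, best_score, best_count = None, 0.0, 0
    for candidate, count in frequencies.items():
        score = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score, best_count = candidate, score, count
    if best_score >= min_similarity and best != value:
        return best, (f"'{value}' is {best_score:.0%} similar to '{best}', "
                      f"which appears {best_count} times in the reference data")
    return None, "no sufficiently similar reference value found"

print(suggest_fix("Untied Kingdom", ["United Kingdom"] * 40 + ["Ireland"] * 12))
```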

Measuring data quality in an ML context therefore also covers how well an ML model is monitored. In practice, this means data quality measurement strays into the emerging field of Data Observability: knowing, at any point in time or location, that the data – and its associated algorithms – is fit for purpose.

Data Observability, as a theme, has been explored further by Gartner and others. This article from Forbes provides deeper insights into the overlap between these two subjects.

What Self-Service Data Quality from Datactics provides

The Datactics Self-Service Data Quality tool measures the six dimensions of data quality and more, including: Completeness, Referential Integrity, Correctness, Consistency, Currency and Timeliness.

Completeness – The DQ tool profiles data on ingestion and gives the user a report on percentage populated, along with data and character profiles of each column, so that missing attributes can be spotted quickly. Profiling operations to identify non-conforming code fields can easily be configured by the user in the GUI.
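
As a rough illustration of the profiling idea described above (not the DQ tool's actual output), percentage populated and a character-mask profile per column might be computed along these lines; the sample postcode values are invented.

```python
from collections import Counter

def mask(value: str) -> str:
    """Map letters to 'A' and digits to '9', keeping punctuation and spaces."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in value)

def profile_column(values):
    populated = [v for v in values if v not in (None, "")]
    return {
        "percent_populated": 100.0 * len(populated) / len(values) if values else 0.0,
        "character_profiles": Counter(mask(v) for v in populated).most_common(3),
    }

print(profile_column(["BT1 5GS", "BT7 1NN", "SW1A 1AA", None, "unknown"]))
```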

Referential Integrity – The DQ tool can identify links/relationships across sources with sophisticated exact/fuzzy/phonetic/numeric matching against any number of criteria and check the integrity of fields as required. 
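
For illustration, a much-simplified fuzzy match between two sources can be sketched with the standard library. The entity names, threshold and scoring below are assumptions, and the DQ tool's matching (exact, fuzzy, phonetic and numeric, across multiple criteria) goes well beyond this.

```python
from difflib import SequenceMatcher

counterparties = ["Acme Holdings Ltd", "Northbridge Capital LLP"]
trade_records = ["ACME Holdings Limited", "Nortbridge Capital LLP", "Unknown Entity"]

def best_match(name, candidates, threshold=0.85):
    """Return the closest candidate and its score, or (None, score) below the threshold."""
    scored = [(c, SequenceMatcher(None, name.lower(), c.lower()).ratio()) for c in candidates]
    candidate, score = max(scored, key=lambda pair: pair[1])
    return (candidate, round(score, 2)) if score >= threshold else (None, round(score, 2))

for trade_name in trade_records:
    print(trade_name, "->", best_match(trade_name, counterparties))
```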

Correctness – The DQ tool has a full suite of pre-built validation rules to measure against reference libraries or defined format/checksum combinations. New validation rules can easily be built and re-used.

Consistency – The DQ tool can measure data inconsistencies via many different built-in operations such as validation, matching, filtering/searching. The rule outcome metadata can be analysed inside the tool to display the consistency of the data measured over time. 
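
As a simple illustration of analysing rule outcome metadata over time (the outcome figures below are invented), a consistency trend might be derived from each run's pass rate:

```python
# Each record summarises one scheduled run of a single rule (invented figures).
rule_outcomes = [
    {"run_date": "2024-01-01", "passed": 940, "failed": 60},
    {"run_date": "2024-02-01", "passed": 955, "failed": 45},
    {"run_date": "2024-03-01", "passed": 980, "failed": 20},
]
for run in rule_outcomes:
    pass_rate = run["passed"] / (run["passed"] + run["failed"])
    print(run["run_date"], f"pass rate {pass_rate:.1%}")
```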

Currency – Measuring the difference between dates and finding inconsistencies is fully supported in the DQ tool. Dates in any format can be matched against each other or converted to POSIX time and compared against historical dates.
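
A minimal sketch of that currency check, assuming a few illustrative accepted date formats, would parse dates, convert them to POSIX time and compare them against a reference date:

```python
from datetime import datetime, timezone

FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y")  # illustrative accepted formats

def to_posix(text):
    """Parse a date string in any accepted format and return POSIX seconds (UTC)."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc).timestamp()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")

last_updated = to_posix("03/02/2024")
reference = to_posix("2024-01-01")
print("days since reference:", (last_updated - reference) / 86400)
```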

Timeliness – The DQ tool can measure timeliness by utilizing the highly customisable reference library to insert SLA reference points and comparing any action recorded against these SLAs with the powerful matching options available. 

Our Self-Service Data Quality solution empowers business users to self-serve for high-quality data, saving time, reducing costs, and increasing profitability. Our Data Quality solution can help ensure accurate, consistent, compliant and complete data which will help businesses to make better informed decisions. 

And for more from Datactics, find us on LinkedIn, Twitter or Facebook.

What is a Data Quality Firewall?

A data quality firewall is a key component of data management. It is a form of data quality monitoring, using software to prevent the ingestion of messy or bad data.


It’s a set of measures or processes to ensure the integrity, accuracy, and reliability of data within an organisation, and helps support data governance strategies. This could involve controls and checks to prevent the entry of inaccurate or incomplete data from data sources into data stores, as well as mechanisms to identify and rectify any data quality issues that arise. 

In its simplest form, a data quality firewall could be data stewards manually checking the data. However, this isn't recommended: it is highly inefficient and prone to introducing inaccuracies. A more effective approach is automation.

An automated approach

Data quality metrics (e.g. completeness, duplication, validity) can be generated automatically and are useful for identifying data quality issues. At Datactics, with our expertise in AI-augmented data quality, we understand that the most value is derived from data quality rules that are highly specific to an organisation's context, including rules focusing on Accuracy, Consistency, Duplication and Validity. The ability to execute all of the above rules should be part of any data quality firewall.

The above is perfectly suited to an API giving an on-demand view of the data’s health before ingestion into the warehouse. This real-time assessment ensures that only clean, high-quality data is stored, significantly reducing downstream errors and inefficiencies.
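
As a hedged sketch of what that on-demand check might look like (the function name, required fields and threshold are assumptions, not a real Datactics API), a warehouse loader could call something like this before accepting a batch:

```python
def pre_ingestion_check(records, required_fields, min_completeness=0.95):
    """Score completeness of required fields and decide whether to accept the batch."""
    field_counts = {field: 0 for field in required_fields}
    for record in records:
        for field in required_fields:
            if record.get(field) not in (None, ""):
                field_counts[field] += 1
    total = len(records) or 1
    completeness = {field: count / total for field, count in field_counts.items()}
    return {
        "accept": all(score >= min_completeness for score in completeness.values()),
        "completeness": completeness,
    }

batch = [{"id": 1, "name": "Acme"}, {"id": 2, "name": ""}]
print(pre_ingestion_check(batch, required_fields=["id", "name"]))
```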

What Features are Required for a Data Quality Firewall? 

 

The ability to define Data Quality Requirements 

The ability to specify what data quality means for your organisation is key. For example, you may want to consider whether data should be processed in situ or passed through an API, depending on data volumes and other factors. Here are a couple of other questions worth considering when defining data quality requirements:

  • Which rules should be applied to the data?  It goes without saying that not all data is the same. Rules which are highly applicable to the specific business context will be more useful than a generic completeness rule, for example. This may involve checking data types, ranges, and formats, or validation against sources of truth. Reject data that doesn’t meet the specified criteria.
  • What should be done with broken data? Strategies for dealing with broken data should be flexible. Options might include quarantining the entire dataset, isolating only the problematic records, passing all data through with issues flagged, or immediately correcting issues, such as removing duplicates or standardising formats. All of the above should be options for the user of the API; not every use case is the same, and a one-size-fits-all solution won't be sufficient. A minimal sketch of these options follows below.
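
To illustrate the kind of flexibility described above, here is a hedged sketch in which the caller chooses how failing records are handled. The strategy names, the rule and the fixer are assumptions for illustration, not a real firewall API.

```python
from enum import Enum

class Strategy(Enum):
    QUARANTINE_BATCH = "quarantine_batch"      # reject the whole dataset
    QUARANTINE_RECORDS = "quarantine_records"  # isolate failing records only
    FLAG = "flag"                              # pass everything, annotate issues
    FIX = "fix"                                # apply an automatic correction

def apply_firewall(records, rule, fixer, strategy):
    """Return (records to load, records quarantined) under the chosen strategy."""
    failing = [r for r in records if not rule(r)]
    if strategy is Strategy.QUARANTINE_BATCH and failing:
        return [], records
    if strategy is Strategy.QUARANTINE_RECORDS:
        return [r for r in records if rule(r)], failing
    if strategy is Strategy.FLAG:
        return [{**r, "dq_issue": not rule(r)} for r in records], []
    if strategy is Strategy.FIX:
        return [r if rule(r) else fixer(r) for r in records], []
    return records, []

def has_country(record):       # the rule: pass when country is populated
    return bool(record.get("country"))

def default_country(record):   # the fixer: apply a standard correction
    return {**record, "country": "UNKNOWN"}

batch = [{"id": 1, "country": "GB"}, {"id": 2, "country": ""}]
clean, quarantined = apply_firewall(batch, has_country, default_country,
                                    Strategy.QUARANTINE_RECORDS)
print(clean, quarantined)
```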

Key DQ Firewall Features:

Data Enrichment 

Data enrichment may involve adding identifiers and codes to the data entering the warehouse. This can help with usability and traceability. 

Logging and Auditing 

Robust logging and auditing mechanisms should be provided. Log all incoming and outgoing data, errors, and any data quality-related issues. This information can be valuable for troubleshooting and monitoring data quality over time. 

Error Handling 

A comprehensive error-handling strategy should be provided, with clearly defined error codes and messages to communicate issues to consumers of the API. Guidance should also be provided on how to resolve or address data quality errors.

Reporting 

Regular reporting on data quality metrics and issues, including trend analysis, helps in keeping track of the data quality over time.

Documentation 

The API documentation should include information about data quality expectations, supported endpoints, request and response formats, and any specific data quality-related considerations. 

 

How Datactics can help 

 

You might have noticed that the concept of a Data Quality Firewall is not just limited to data entering an organisation. It’s equally valuable at any point in the data migration process, ensuring quality as data travels within an organisation. Wouldn’t it be nice to know the quality of your data is assured as it flows through your organisation?

Datactics can help with this. Our Augmented Data Quality (ADQ) solution uses AI and machine learning to streamline the process, providing advanced data profiling, outlier detection, and automated rule suggestions. Find out more about our ADQ platform here.

What Is Augmented Data Quality And How Do You Use It?


 

Year after year, the volume of data being generated is increasing at an unparalleled pace. For businesses, data is critical to inform business strategy, facilitate decision-making, and create opportunities for competitive advantage.

However, the value of this data is only as good as its quality, and traditional methods for measuring and improving data quality are struggling to scale.

This is where Augmented Data Quality comes in. The term describes an approach that leverages automation to enable systems to learn from data and continually improve processes. Augmented data quality has led to the recent emergence of automated tools for monitoring and improving data quality. In this post, we'll explain exactly what augmented data quality is, where it can be applied, and the positive impact it has on data management.

 

Why Are Traditional Approaches Struggling? 

First, let’s set the scene. With an ever-growing reliance on data-driven decision-making, businesses are looking for ways to gain accurate insights, deep business intelligence, and maintain data integrity in an increasingly complex business environment.

However, measuring data quality is challenging for enterprises, due to the high volume, variety, and velocity of data. Enterprises grapple with ensuring the reliability of data that has originated from multiple sources in different formats, which can often lead to inconsistencies and duplication within the data.

The complexity of data quality management procedures, which involve data cleansing, integration, validation and remediation, further increases the challenge. Traditionally, these have been manual tasks carried out by data stewards and/or performed with a deterministic, rules-based approach, neither of which scales as the volume and variety of data grow. Enterprises are now turning to highly automated solutions to handle vast amounts of data effectively and to accelerate their data management journey and overall data management strategy.

 

What Is Augmented Data Quality? 

Augmented Data Quality is an approach that implements advanced algorithms, machine learning (ML), and artificial intelligence (AI) to automate data quality management. The goal is to correct data, learn from this, and automatically adapt and improve its quality over time, making data assets more valuable. 

Augmented data quality promotes self-service data quality management, empowering business users to execute tasks without requiring deep technical expertise. Moreover, it offers many benefits, from improved data accuracy to increased efficiency, and reduced costs, making it a valuable asset for enterprises dealing with large volumes of data. 

Although AI/ML solutions can speed up routine DQ tasks, they cannot fully automate the whole process. In other words, augmented data quality does not eliminate the need for human oversight, decision-making, and intervention; instead, it complements it by leveraging human-in-the-loop technology, where human expertise is combined with advanced algorithms to ensure the highest levels of data accuracy and quality.

“Modern data quality solutions offer augmented data quality capabilities to disrupt how we solve data quality issues. This disruption – fueled by metadata, artificial intelligence/machine learning (Al/ML) and knowledge graphs – is progressing and bringing new practices through automation to simplify data quality processes.”

-Gartner®, ‘The State of Data Quality Solutions: Augment, Automate and Simplify; By Melody Chien, Ankush Jain, 15 March 2022.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.

 

How Can Augmented Data Quality Help A Data Quality Process?

Routine data quality tasks, such as profiling and rule building, can be time-consuming and error-prone. Fortunately, the emergence of augmented data quality has revolutionized the way routine data quality tasks are performed, reducing manual effort and saving time for users. Below are some examples of where automation can add value as part of a data quality process:

Data profiling and monitoring

ML algorithms excel at recognizing patterns. For example, ML can enhance a system’s capability to manage data quality proactively, by identifying and learning patterns in data errors and corrections. Using these learnings, ML can be applied to automate routine tasks like data cleaning, validation, and deduplication.

Data Deduplication

ML can be used to identify and remove duplicate entities. Rather than simply looking for exact matches, ML techniques such as natural language processing can identify duplicates even with minor variations, such as spelling mistakes or different formats.
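
A deliberately simple sketch of inexact duplicate detection, using character-level similarity from the Python standard library rather than any Datactics ML pipeline, shows the idea of tolerating minor variations; the names and threshold are invented.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

def normalise(name: str) -> str:
    """Lower-case and strip punctuation so trivial differences don't mask duplicates."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()

def likely_duplicates(names, threshold=0.85):
    return [(a, b) for a, b in combinations(names, 2)
            if SequenceMatcher(None, normalise(a), normalise(b)).ratio() >= threshold]

print(likely_duplicates(["John A. Smith", "john a smith", "Jon Smith", "Jane Doe"]))
```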

Automated Validation

ML can be used to automate the data validation process. For a feature such as automated rule suggestion, the system applies ML to understand the underlying data and match relevant rules to it. The process can be further enhanced by automatically deploying suggested rules using a human-in-the-loop approach, making the process faster and more efficient.
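
As a hypothetical sketch of rule suggestion with a human in the loop (the heuristics, column name and workflow below are assumptions, not the actual Datactics feature), candidate rules could be inferred from observed values and held until a reviewer approves them:

```python
def suggest_rules(column_name, values):
    """Infer simple candidate rules from observed values (illustrative heuristics only)."""
    rules, non_null = [], [v for v in values if v is not None]
    if len(non_null) == len(values):
        rules.append(f"{column_name} must not be null")
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        rules.append(f"{column_name} should be between {min(non_null)} and {max(non_null)}")
    elif len(set(non_null)) <= 5:
        rules.append(f"{column_name} should be one of {sorted(set(non_null))}")
    return rules

for rule in suggest_rules("settlement_days", [1, 2, 2, 3, 1]):
    # In a human-in-the-loop workflow each suggestion is queued for review
    # and only deployed once a person approves it.
    print("suggested (awaiting human approval):", rule)
```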

 

Why Enterprises Are Embracing Augmented Data Quality

Augmented data quality is useful for any organization wanting to streamline its data quality management. Whether it’s for digital transformation or risk management, augmented data quality holds immense value. Here are a few examples of where our clients are seeing the value of augmented data quality:

 

Regulation and Compliance: Industries like healthcare and financial services are confronted with increasing regulatory changes. Yet, organizations often struggle to meet the demands of these regulations and must adapt quickly. By leveraging AI/ML methods to help identify data errors and ensure compliance with regulatory requirements, enterprises can efficiently minimize the potential risks associated with poor data quality. 
Use Cases: Single Customer View, Sanctions matching.

Business analytics: With complete, and consistent data, organizations can leverage analytics to generate accurate insights and gain a competitive edge in the market. Through AI/ML, data quality processes can be automated to quickly produce analytics and predict future trends within the data.
 Use Cases: Data preparation & Enrichment, Data & Analytics Governance.

Modern Data Strategy: Data quality is a foundational component of any modern data strategy, as data sources and business use cases expand. By leveraging augmented data quality within a modern data strategy, organizations can experience greater automation of manual processes, such as rule building and data profiling. 
Use Cases: Data Quality Monitoring & Remediation, Data Observability

Digital Transformation: Enterprise-wide digital transformation is taking place across all industries to generate more value from data assets. Automation plays a crucial role in enabling scalability, reducing costs, and optimizing efficiencies. 
Use Cases: Data Harmonization, Data Quality Firewall

Adopting augmented data quality within an organization represents a transformative step towards establishing a data-driven culture, where data becomes a trusted asset that drives innovation, growth, and success. The automation of process workflows reduces dependence on manual intervention, saving time and resources while enhancing efficiency and productivity. Moreover, augmented data quality increases accuracy, reliability, and compliance, enhancing customer experiences and improving an organization’s competitive advantage.

In conclusion, the seamless integration of augmented data quality within essential business areas offers significant benefits to organizations seeking to maximize the value of their data.

 

Find out more about the Datactics Augmented Data Quality platform in the latest news from A-Team Data Management Insight.
