Self-Service Data Quality Archives – Datactics
https://www.datactics.com/tag/regmetrics/

The Importance of Data Quality in Machine Learning
https://www.datactics.com/blog/the-importance-of-data-quality-in-machine-learning/ | 18 December 2023


We are currently in an exciting era, where Machine Learning (ML) is applied across sectors from self-driving cars to personalised medicine. Although ML models have been around for a while – algorithmic trading models since the 1980s, Bayesian methods since the 1700s – we are still in the nascent stages of productionising ML.

From a technical viewpoint, this is 'Machine Learning Ops', or MLOps. MLOps involves working out how to build models, deploy them via continuous integration and deployment, and track and monitor models and data in production.

From a human, risk, and regulatory viewpoint we are grappling with big questions about ethical AI (Artificial Intelligence) systems and where and how they should be used. Risk, privacy and security of data, accountability, fairness, and adversarial AI all come into play here. Additionally, the choice between supervised, semi-supervised, and unsupervised machine learning brings further complexity to the mix.

Much of the focus is on the models themselves, such as OpenAI's GPT-4. Everyone can get their hands on pre-trained models or licensed APIs; what differentiates a good deployment is the data quality.

However, the one common theme that underpins all this work is the rigour required in developing production-level systems, and especially the data needed to make them reliable, accurate, and trustworthy. This is particularly important for ML systems: the role that data and processes play, and the impact of poor-quality data on ML algorithms and learning models in the real world.

Data as a common theme 

If we shift our gaze from the model side to the data side, asking questions such as:

  • Data management – what processes do I have to manage data end to end, especially generating accurate training data?
  • Data integrity – how am I ensuring I have high-quality data throughout?
  • Data cleansing and improvement – what am I doing to prevent bad data from reaching data scientists?
  • Dataset labeling – how am I avoiding the risk of unlabeled data?
  • Data preparation – what steps am I taking to ensure my data is data science-ready?

a far greater understanding of performance and model impact (consequences) could be achieved. However, this is often viewed as less glamorous or exciting work and, as such, is often undervalued. For example, what is the impetus for companies or individuals to invest at this level – regulatory (e.g. BCBS), financial, reputational, or legal? The sketch below shows what a first pass at these data-side checks might look like.
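
As a minimal, illustrative sketch of those data-side questions – assuming a pandas DataFrame and entirely hypothetical column names, not any particular platform's implementation – a first profiling pass might simply quantify completeness, duplicate keys, invalid values and missing labels:

```python
import pandas as pd

# Illustrative only: a hypothetical customer extract with typical quality issues.
df = pd.DataFrame({
    "customer_id": [101, 102, 102, None],
    "email": ["a@example.com", "not-an-email", None, "d@example.com"],
    "label": ["churn", "retain", None, "churn"],
})

report = {
    # Data integrity: how complete is each column?
    "completeness": df.notna().mean().round(2).to_dict(),
    # Data management: are supposedly unique keys actually unique?
    "duplicate_customer_ids": int(df["customer_id"].duplicated(keep=False).sum()),
    # Data cleansing: do values conform to an expected pattern?
    "invalid_emails": int((~df["email"].str.contains(r"@", na=False)).sum()),
    # Dataset labelling: how much unlabelled data would reach the ML team?
    "unlabelled_rows": int(df["label"].isna().sum()),
}
print(report)
```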

Yet, as research by Google puts it well,

“Data largely determines performance, fairness, robustness, safety, and scalability of AI systems…[yet] In practice, most organizations fail to create or meet any data quality standards, from under-valuing data work vis-a-vis model development.” 

This has a direct impact on people’s lives and society, where “…data quality carries an elevated significance in high-stakes AI due to its heightened downstream impact, impacting predictions like cancer detection, wildlife poaching, and loan allocations”.

What this looks like in practice

We have seen this in the past, with the exam predictions in the UK during Covid. Teachers predicted the grades of their students, and the Office of Qualifications and Examinations Regulation then applied an algorithm to these predictions to counter potential grade inflation. The algorithm was quite complex and, at first, far from transparent. When the results were released, 39% of grades had been downgraded. The algorithm took into account the distribution of grades from previous years, the predicted distribution of grades for past students, and then the current year's predictions.

In practice, this meant that if you were a candidate who had performed well at GCSE but attended a historically poor-performing school, it was challenging to achieve a top grade. Teachers had to rank their students in the class, resulting in a relative ranking system that could not equate to absolute performance. Even if you were predicted a B but were ranked fifteenth out of 30 in your class, and the pupil ranked fifteenth in the last three years received a C, you would likely get a C.

The application of this algorithm caused an uproar, not least because schools with small class sizes – usually private, fee-paying schools – were exempt from the algorithm, so their teacher-predicted grades stood. It also baked in past socioeconomic biases, benefitting underperforming students in affluent (and previously high-scoring) areas while suppressing the results of high-performing students in lower-income regions.

A major lesson to learn from this, therefore, was the need for transparency in both the process and the data that was used.

An example from healthcare

Within healthcare, poor-quality training data undermined ML cancer prediction in IBM's 'Watson for Oncology', which partnered with The University of Texas MD Anderson Cancer Center in 2013 to "uncover valuable insights from the cancer center's rich patient and research databases". The system was trained on a small number of hypothetical cancer patients rather than real patient data, which resulted in erroneous and dangerous cancer treatment advice.

Significant questions that must be asked include:

  • Where did it go wrong here – certainly in the data, but also in the wider AI system?
  • Where was the risk assessment?
  • What testing was performed?
  • Where did responsibility and accountability reside?

Machine Learning practitioners know well the statistic that 80% of ML work is data preparation. Why then don’t we focus on this 80% effort and deploy a more systematic approach to ensure data quality is embedded in our systems, and considered important work to be performed by an ML team?

This is a view recently articulated by Andrew Ng, who urges the ML community to be more data-centric and less model-centric. Andrew demonstrated this using a steel-sheet defect detection use case in which a deep learning computer vision model achieved a baseline performance of 76.2% accuracy. By addressing inconsistencies in the training dataset and correcting noisy or conflicting labels, the classification performance reached 93.1%. Interestingly and compellingly from the perspective of this blog post, minimal performance gains were achieved by addressing the model side alone.
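
The effect is easy to reproduce in miniature. The toy experiment below – synthetic data and a deliberately injected 15% label-noise rate, not Andrew Ng's actual setup – trains the same model on noisy labels and on cleaned labels, and the accuracy gap comes entirely from the data side:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic classification task standing in for the defect-detection data.
X, y_clean = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train_clean, y_test = train_test_split(X, y_clean, random_state=0)

# Simulate inconsistent labelling: flip 15% of the training labels.
rng = np.random.default_rng(0)
noisy_idx = rng.choice(len(y_train_clean), size=int(0.15 * len(y_train_clean)), replace=False)
y_train_noisy = y_train_clean.copy()
y_train_noisy[noisy_idx] = 1 - y_train_noisy[noisy_idx]

# Same model, same hyperparameters; only the label quality changes.
model = LogisticRegression(max_iter=1000)
print("noisy labels  :", model.fit(X_train, y_train_noisy).score(X_test, y_test))
print("cleaned labels:", model.fit(X_train, y_train_clean).score(X_test, y_test))
```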

Our view is that if data quality is a key limiting factor in ML performance, then let's focus our efforts on improving data quality – and ask whether ML itself can be deployed to address it. This is the central theme of the work the ML team at Datactics undertakes. Our focus is automating the manual, repetitive (often referred to as boring!) business processes of DQ and matching tasks, while embedding subject matter expertise into the process. To do this, most of our solutions employ a human-in-the-loop approach where we capture human decisions and expertise and use them to inform and re-train our models. This human expertise is essential in guiding the process and providing context that improves both the data and the data quality process. We are keen to free up clients from manual, mundane tasks and instead use their expertise on tricky cases with simpler agree/disagree options.
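
As a rough sketch of that human-in-the-loop pattern – a simplified illustration, not the Datactics implementation – agree/disagree verdicts on candidate matches can be captured and folded back into a match-scoring model:

```python
from dataclasses import dataclass, field
from sklearn.linear_model import LogisticRegression

@dataclass
class ReviewQueue:
    """Collects SME agree/disagree decisions on proposed record matches."""
    decisions: list = field(default_factory=list)  # (features, verdict) pairs

    def record(self, features: list, agreed: bool) -> None:
        # Capture one reviewer decision alongside the match features shown to them.
        self.decisions.append((features, int(agreed)))

    def retrain(self) -> LogisticRegression:
        # Fold the captured expertise back into the match-scoring model.
        X = [f for f, _ in self.decisions]
        y = [v for _, v in self.decisions]
        return LogisticRegression().fit(X, y)

queue = ReviewQueue()
queue.record([0.92, 1.0], agreed=True)    # near-identical name, same postcode
queue.record([0.40, 0.0], agreed=False)   # weak name similarity, different postcode
queue.record([0.85, 0.0], agreed=True)
queue.record([0.30, 1.0], agreed=False)

model = queue.retrain()
print(model.predict_proba([[0.88, 1.0]]))  # score a new candidate pair
```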

To learn more about an AI-driven approach to Data Quality, read our press release about our Augmented Data Quality platform here. 

Data Quality fundamentals driving valuable Data Insights in Insurance
https://www.datactics.com/blog/self-service-data-quality/data-quality-fundamentals-driving-valuable-data-insights-in-insurance/ | 19 January 2022


Data in a Changing World

The Insurance industry traditionally uses data to inform decision-making and manage growth and profitability across marketing, underwriting, pricing and policy servicing processes. However, like most established financial institutions, insurance companies have many data repositories and different teams managing analytics functions. These teams often struggle to share information or communicate with one another, with many parts of the organisation having their own processes for capturing data. These factors combine to cause poor-quality, inconsistent data, creating barriers to seamless integration.

The Insurance industry recognises the importance of maintaining a competitive edge, with many companies looking to adopt a 'single platform' approach using Cloud Services from AWS, Azure or Google in the short to medium term. Such a platform needs to be flexible enough to support different skill sets, react to changing market conditions and integrate alternative sources of data. Fundamental to this is the quality of data across different data sources, ensuring it is trusted, of high integrity, and complete for business decisioning purposes.

Challenges

Customer insights are isolated in silos and scattered across lines of business, functional areas and even channels. As a result, much of the work that surrounds the handling of data becomes manual and time-consuming, with no common keys or even agreed definitions of key terms such as 'customer'. It is estimated that as much as 70% of a highly qualified analyst's time is spent locating and fixing the data.

The challenge for Insurance companies is being able to recognize the same customer across product lines and/or at different stages of the policy lifecycle. Direct and agency channels may compete for the same customer or attract a high-risk prospect that was turned down previously by underwriting. Since the claims department data is not available to pricing and marketing to inform their decisions, the result is often extra expenditures and a larger than necessary marketing budget that could easily be streamlined should these inefficiencies be addressed. It also causes poor customer experiences, which harm the brand.

There is, however, a significant demand for customer-centric solutions which allow insurance companies to link different pieces of data about a customer. These solutions use Data Quality tools to match, merge and link records, creating a holistic view across product lines and throughout the policy lifecycle.

Customer-centric solutions help insurance companies realise important business goals, including more accurate targeting, longer retention, and better profitability.

Opportunity

Generating valuable insights from expanding data sets is becoming significantly harder. On top of this, leveraging the right technology, people and process to analyse data remains a key challenge for Executives. Prepping the data is often where the real heavy lifting is done and using Data Quality automation and a Self-Service approach can really benefit a company in terms of significantly reducing costs and accelerating decision making.

While the Insurance industry faces a plethora of challenges with data and analytics, it’s imperative that executives recognize that the quality of the data is fundamental to capitalising on market opportunities. By overcoming these barriers, the industry will be better prepared to embark on the next frontier of Data and Analytics (D&A).

About Datactics

Datactics helps Insurance companies drive valuable Data Insights and supports Operational Data needs and processes, including Data Governance and Compliance & Regulation, by removing roadblocks common in data management. We specialise in class-leading, self-service data quality and fuzzy matching software solutions, designed to empower the business users who know the data to visualise and fix the data.

To have further conversations about the drivers and benefits of a Self-Service Data Quality platform in Insurance, book a quick call with Kieran Seaward.    

And for more from Datactics, find us on LinkedIn, Twitter, or Facebook.

Key Features a Self-Service DQ Platform Should Have
https://www.datactics.com/blog/self-service-data-quality/key-features-a-self-service-dq-platform-should-have/ | 14 January 2022


The drivers and benefits of a holistic, self-service data quality platform | Part 2

To enable the evolution towards actionable insight from data, D&A platforms and processes must evolve too. At the core of this evolution is the establishment of 'self-service' data quality – whereby data owners and SMEs have ready access to robust tools and processes, to measure and maintain data quality themselves, in accordance with data governance policies. From a business perspective such a self-service data quality platform must be:

  • Powerful enough to enable business users and SMEs to perform complex data operations without highly skilled technical assistance from IT
  • Transparent, accountable and consistent enough to comply with firm-wide data governance policies
  • Agile enough to quickly onboard new data sets and the changing data quality demands of end consumers such as AI and Machine Learning algorithms
  • Flexible and open so it integrates easily with existing data infrastructure investment without requiring changes to architecture or strategy
  • Advanced enough to make pragmatic use of AI and machine learning to minimize manual intervention

This goes way beyond the scope of most stand-alone data prep tools and ‘home grown’ solutions that are often used as a tactical one-off measure for a particular data problem. Furthermore, for the self-service data quality platform to truly enable actionable data across the enterprise, it will need to provide some key technical functionality built-in:


• Transparent & Continuous Data Quality Measurement
Not only should it be easy for business users and SMEs to implement large numbers of domain-specific data quality rules, but those rules should also be simple to audit and easily explainable, so that 'DQ breaks' can be easily explored and the root cause of the break established.

In addition to data around the actual breaks, a DQ platform should be able to produce DQ dashboards enabling drill-down from high level statistics down to actual failing data points and publish high level statistics into data governance systems.
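
As a hedged sketch of what rule-level measurement with drill-down can look like – the rule IDs, fields and pandas usage below are illustrative, not how any particular platform implements it – each rule is an explainable predicate, and the failing rows are retained so a dashboard can move from headline pass rates to the actual breaks:

```python
import pandas as pd

# Each rule has an ID and a simple, explainable predicate over the data set.
rules = {
    "DQ001_lei_present": lambda df: df["lei"].notna(),
    "DQ002_isin_format": lambda df: df["isin"].str.fullmatch(r"[A-Z]{2}[A-Z0-9]{9}\d", na=False),
}

data = pd.DataFrame({
    "lei":  ["5493001KJTIIGC8Y1R12", None],
    "isin": ["US0378331005", "BAD-ISIN"],
})

summary, breaks = [], {}
for rule_id, predicate in rules.items():
    passed = predicate(data)
    summary.append({"rule": rule_id, "pass_rate": float(passed.mean())})
    breaks[rule_id] = data[~passed]  # failing rows, kept for root-cause drill-down

print(pd.DataFrame(summary))          # high-level statistics for the dashboard
print(breaks["DQ002_isin_format"])    # drill down to the actual failing data points
```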

• Powerful Data Matching – Entity Resolution for Single View and Data Enrichment
Finding hidden value in data or complying with regulation very often involves joining together several disparate data sets: for example, enhancing a Legal Entity Master database with an LEI, screening customer accounts against sanctions and PEP lists for KYC, or creating a single view of a client from multiple data silos for GDPR or FSCS compliance. This goes further than simple deduplication of records or SQL joins – most data sets are messy and don't have unique identifiers, so fuzzy matching of numerous string fields must be implemented to join one data set with another. Furthermore, efficient clustering algorithms are required to sniff out similar records from other disparate data sets in order to provide a single consolidated view across all silos.
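
In miniature, and using only Python's standard library rather than a production matching engine, fuzzy matching across two silos might look like the sketch below; real entity resolution adds blocking, multi-field scoring and clustering on top:

```python
from difflib import SequenceMatcher

# Two hypothetical silos holding the same entities under slightly different names.
silo_a = ["Acme Holdings Ltd", "Datactics Limited", "Northwind Traders"]
silo_b = ["ACME Holdings Limited", "Datactics Ltd.", "Contoso Pharmaceuticals"]

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Link each record in silo A to its best candidate in silo B, if above a threshold.
for name_a in silo_a:
    best = max(silo_b, key=lambda name_b: similarity(name_a, name_b))
    score = similarity(name_a, best)
    if score >= 0.7:
        print(f"{name_a!r} <-> {best!r}  (score {score:.2f})")
    else:
        print(f"{name_a!r} has no confident match")
```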

• Integrated Data Remediation Incorporating Machine Learning 
It's not enough just to flag up broken data; you also need a process and technology for fixing the breaks. Data quality platforms should have this built in so that, after data quality measurement, broken data can be quarantined, data owners alerted and breaks automatically assigned to the relevant SMEs for remediation. Interestingly, the manual remediation process lends itself very well to machine learning. The process of manually remediating data captures domain-specific knowledge about the data – information that can be readily used by machine learning algorithms to streamline the resolution of similar breaks in the future and thus greatly reduce the overall time and effort spent on manual remediation.

“The process of manually remediating data captures domain specific knowledge about the data – information that can be readily used by machine learning algorithms to streamline the resolution of similar breaks in the future”   

• Data Access Controls Across Teams and Datasets 
Almost any medium-to-large organization will have various forms of sensitive data, and policies for sharing that data within the organization, e.g. 'Chinese walls' between one department and another. In order to enable integration across teams and disparate silos of data, granular access controls are required – especially inside the data remediation technology, where sensitive data may be displayed to users. Data access permissions should be set automatically where possible (e.g. inheriting Active Directory attributes) and enforced when displaying data, for example by row- and field-level access control, and using data masking or obfuscation where appropriate.
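
As a minimal sketch of those two controls – row-level filtering plus field-level masking – with hypothetical users, permissions and data (in practice the permission attributes might be inherited from Active Directory, as noted above):

```python
import pandas as pd

records = pd.DataFrame({
    "department": ["wealth", "retail", "wealth"],
    "customer":   ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "iban":       ["GB33BUKB20201555555555", "GB94BARC10201530093459", "GB29NWBK60161331926819"],
})

# Hypothetical permission set; in practice this could come from Active Directory.
permissions = {"analyst_wealth": {"departments": {"wealth"}, "unmasked_fields": {"customer"}}}

def view_for(user: str, df: pd.DataFrame) -> pd.DataFrame:
    perms = permissions[user]
    # Row-level control: only rows in the user's permitted departments are visible.
    visible = df[df["department"].isin(perms["departments"])].copy()
    # Field-level control: mask any sensitive column the user may not see in clear.
    for column in visible.columns:
        if column not in perms["unmasked_fields"] and column != "department":
            visible[column] = visible[column].str[:4] + "****"
    return visible

print(view_for("analyst_wealth", records))
```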

  • Audit Trails, Assigning and Tracking Performance
    Providing business users with tools to fix data could cause additional headaches when it comes to understanding who did what, when, why and whether or not it was the right thing to do. It stands to reason, therefore, that any remediation tool should have built-in capability to do just that, with the associated performance of data break remediation measured, tracked and managed.
  • AI Ready
    There's no doubt that one of the biggest drivers of data quality is AI. AI data scientists can spend up to 80% of their time just preparing input data for machine learning algorithms, which is a huge waste of their expertise. A self-service data quality platform can address many of the data quality issues by providing ready access to tools and processes that can ensure a base level of quality and identify anomalies in data that may skew machine learning models. Furthermore, the same self-service data quality tools can assist data scientists in generating metadata that can be used to inform machine learning models – such 'Feature Engineering' can be of real value when the data set is largely textual, as it can generate numerical indicators which are more readily consumed by ML algorithms (sketched after the quote below).

“AI data scientists can spend up to 80% of their time just preparing input data for machine learning algorithms, which is a huge waste of their expertise”
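
A small, illustrative example of that feature-engineering point – the field and the derived indicators are assumptions, not a prescribed set – turning a free-text value into numerical signals an ML algorithm can consume:

```python
import re

def text_features(value: str) -> dict:
    """Derive simple numeric signals from a raw text field."""
    tokens = value.split()
    return {
        "length": len(value),
        "token_count": len(tokens),
        "digit_ratio": sum(c.isdigit() for c in value) / max(len(value), 1),
        # Rough UK-postcode shape check, used here purely as an example indicator.
        "has_postcode_shape": bool(re.search(r"\b[A-Z]{1,2}\d{1,2}[A-Z]?\s*\d[A-Z]{2}\b", value)),
    }

print(text_features("12 Cromac Street, Belfast BT2 8LA"))
# -> length, token count, digit ratio and a postcode indicator for this address string
```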

To have further conversations about the drivers and benefits of a Self-Service Data Quality platform, please book a quick call with Kieran Seaward.    

And for more from Datactics, find us on LinkedIn, Twitter, or Facebook.

The Changing Landscape of Data Quality
https://www.datactics.com/blog/self-service-data-quality/the-changing-landscape-of-data-quality/ | 13 January 2022


The drivers and benefits of a holistic, self-service data quality platform | Part 1

Change

There has been increasing demand for higher and higher data quality in recent years – highly regulated sectors such as banking have faced a tsunami of financial regulations, including BCBS239, MiFID, FATCA and many more, stipulating or implying exacting standards for data and data processes. Meanwhile, there is a growing trend for more and more firms to become Data and Analytics (D&A) driven, taking inspiration from Google and Facebook to monetize their data assets.

This increased focus on D&A has been accelerated by easier and lower-cost access to artificial intelligence (AI), machine learning (ML), and business intelligence (BI) visualization technologies. However, in the now-waning hype of these technologies comes the pragmatic realization that unless there is a foundation of good quality reliable data, insights derived from AI and analytics may not be actionable. With AI and ML becoming more of a commodity, and a level playing field, the differentiator is in the data and the quality of the data.

“Unless there is a foundation of good quality reliable data, insights derived from AI and analytics may not be actionable”

Problems 

As the urgency for regulatory compliance or competitive advantage escalates, so too does the urgency for high data quality. A significant obstacle to quickly achieving high data quality is the variety of disciplines required to measure data quality, enrich data and fix data. By its nature, digital data – especially big data – can require significant technical skills to manipulate and, for this reason, was once the sole responsibility of IT functions within an organization. However, maintaining data also requires significant domain knowledge about the content of the data, and this domain knowledge resides with the subject matter experts (SMEs) who use the data, rather than a central IT function. Furthermore, each data set will have its own SMEs with the special domain knowledge required to maintain it, and the number of data sets is rapidly growing and changing. If a central IT department is to maintain the quality of data correctly, it must therefore liaise with an increasingly large number of data owners and SMEs in order to correctly implement the DQ controls and remediation required. These demands create a huge drain on IT resources and a slow-moving backlog of data quality change requests within IT that simply can't keep up.

Given the explosion in data volumes, this model clearly won’t scale and so there is now a growing trend to move data quality operations away from central IT and back into the hands of data owners. While this move can greatly accelerate data quality and data onboarding processes, it can be difficult and expensive for data owners and SMEs to meet the technical challenges of maintaining and onboarding data. Furthermore, unless there is common governance around data quality across all data domains there stands the risk of a ‘wild west’ scenario, where every department manages data quality differently with different processes and technology. 

Opportunity

The application of data governance policies and the creation of an accountable Chief Data Officer (CDO) go a long way to mitigate against the 'wild west' scenario. Data quality standards such as the Enterprise Data Management Council's (EDMC) Data Capability Assessment Model (DCAM) provide opportunities to establish consistency in data quality measurement across the board.

The drive to capitalize on data assets for competitive advantage has meant that the CDO function is quickly moving from an operational cost centre towards a product-centric profit centre. A publication by Gartner (30th July 2019) describes three generations of CDO: "CDO 1.0" focused on data management; "CDO 2.0" embraced analytics; "CDO 3.0" assisted digital transformation; and Gartner now predicts a fourth, "CDO 4.0", focused on monetizing data-oriented products. Gartner's research suggests that to enable this evolution, companies should strive to develop data and analytics platforms that scale across the entire company – which implies data quality platforms that scale too.

To have further conversations about the drivers and benefits of a Self-Service Data Quality platform, book a quick call with Kieran Seaward.    

And for more from Datactics, find us on LinkedIn, Twitter, or Facebook.

Using Self-Service Data Quality to Gain an Edge
https://www.datactics.com/blog/ceo-vision/using-self-service-data-quality-to-gain-an-edge/ | 29 November 2021


Amidst ever-changing regulatory requirements and hype around the potential of data-driven technologies, demand for better quality data in the financial industry has never been higher. Stuart Harvey, CEO of Datactics, writes that a self-service approach could be the key that unlocks significant competitive advantage for financial firms.

Demand for higher data quality in the financial industry has exploded in recent years. A tsunami of regulations such as BCBS239, MiFID and FATCA stipulates exacting standards for data and data processes, causing headaches for compliance teams.

At the same time, financial firms are trying to grasp the fruitful benefits of becoming more data and analytics driven. They are embracing technologies such as artificial intelligence (AI) and machine learning (ML) to get ahead of their competitors.

Through their attempts at meeting regulatory requirements and gaining meaningful insights from these technologies, they are coming to the realisation that high quality, reliable data is absolutely critical – and extremely difficult to achieve.

But there is an evolution underway. At the core of this evolution is the establishment of ‘self-service’ data quality whereby data owners have ready access to robust tools and processes to measure and maintain data quality themselves, in accordance with data governance policies. This not only simplifies the measuring and maintenance of data quality; it can help turn it into a competitive advantage.

High quality data in demand

As the urgency for regulatory compliance and competitive advantage escalates, so too does the urgency for high data quality. But it’s not plain sailing and there is a variety of disciplines required to measure, enrich, and fix data.

Legacy data quality tools were traditionally owned by IT teams because, by its very nature, digital data can require significant technical skills to manipulate. However, this created a bottleneck, as maintaining data also requires significant knowledge about its content – what good data and bad data look like, and what its context is – and this resides with those who use the data, rather than a central IT function.

Each data set will have its own users within a business who have the special domain knowledge required to maintain the data. If a central IT department is to maintain quality of data correctly, it must liaise with many of these business users to correctly implement the controls and remediation required. This creates a huge drain on IT resources and a slow-moving backlog of data quality change requirements within IT that simply can’t keep up.

Due to the lack of scalability of this method, many have come to the realisation that this isn’t the answer and so have started moving data quality operations away from central IT back into the hands of business users. 

This move can accelerate information measurement, improvement and onboarding processes, but it isn’t without flaws. It can be difficult and expensive for business users to meet the technical challenges of this task and unless there is common governance around data quality there is the risk of a ‘wild west’ scenario where every department manages data quality differently across the business.

Utilising self-service data quality platforms

Financial firms are maturing their data governance processes and shifting responsibility away from IT to centralised data management functions. These Chief Data Officer functions are seeking to centralise data quality controls while empowering data stewards who know and understand the business context of the data best. 

As part of this shift, they require tooling that matches the skills and capabilities of different profiles of user at each stage of the data quality process. And this is where self-service data quality platforms come into a league of their own. However, not all are made equal, and there are a few key attributes to look for.

For analysts and engineers, a self-service data quality platform needs to be able to provide a profiling and rules studio that enables the rapid profiling of data and configuring/editing of rules in a GUI. It must also offer a connectivity and automation GUI to enable DataOps to automate the process.

For business users, it needs to offer easy-to-understand visualisations or dashboards so they can view the quality of data within their domain, and an interface so they can remediate records which have failed a rule or check.

Agility is key to quickly onboard new datasets and the changing data quality demands of end consumers such as AI and ML algorithms. 

It should be flexible and open, so it integrates easily with existing data infrastructure investment without requiring changes to architecture or strategy and advanced enough to make pragmatic use of AI and machine learning to minimise manual intervention.

This goes way beyond the scope of most stand-alone data prep tools and ‘home grown’ solutions that are often used as a tactical one-off measure for a particular data problem.

Gaining an edge

The move towards a self-service oriented model for data quality is a logical way to keep up with the expanding volumes and varieties of information being discovered, accessed, stored or made available. However, data platforms need to be architected carefully to support the self-service model in accordance with data governance to avoid the ‘wild west’ scenario. 

Organisations that successfully embrace and implement such a platform are more likely to benefit from actionable data, resulting in deeper insight and in more efficient compliance, in turn unlocking significant competitive advantage.

And for more from Datactics, find us on LinkedIn, Twitter or Facebook.

Introducing SSDQ: Centralise and standardise data quality processes without programming or coding
https://www.datactics.com/blog/marketing-insights/introducing-ssdq/ | 23 October 2020


What does the self-service data quality (SSDQ) platform do?  

It empowers data owners and SMEs to measure and maintain data quality themselves in line with governance policies.  SSDQ is already adding value at multiple investment and retail banks, wealth managers, and data vendors helping them to: 

  • Achieve end-to-end holistic data quality management  
  • Measure data against industry-standard dimensions and regulations  
  • Empower data stewards / subject matter experts to remediate data  

How can SSDQ help?  

SSDQ connects to source systems and data repositories, measures the quality of the information, and automatically reports on the health of underlying data to data owners via interactive data quality dashboards. The platform’s Data Quality Clinic function offers business users the opportunity to explore the root cause of data quality breaks, make decisions, or fix records themselves. The system is powerful enough to enable business users and SMEs to perform complex data operations without highly skilled technical assistance from IT. It is also flexible and open, designed to integrate easily with existing data infrastructure, and agile enough to quickly onboard new data sets and meet changing data quality demands.  

Is the platform hard to learn how to use? 

SSDQ has been built with the non-technical business user in mind. We provide full training alongside our consultancy services so that, within a matter of weeks, we are confident that we can help you to become fully self-sufficient in rule building, creating workflows and automations and fixing your broken data.  

What are the features?  

  • Out of the box rules & configuration logic  
  • Built-in connectivity to multiple data sources 
  • Interactive data quality dashboards built on off-the-shelf tools such as PowerBI, Tableau, and Qlik 
  • Integrated data remediation 
  • Machine learning augmented recommendations on data matches  
  • Data access controls so that the right people see the data they are supposed to see  
  • Audit trails, assigning and tracking performance of rule break remediation over time 

What are the benefits of the SSDQ platform?  

SSDQ provides users with large numbers of data domain-specific rules that have been proven at many firms.  

  • Plug & Play options (e.g. REST API, ODBC/JDBC, File system, Cloud…) mean you can rapidly add new data sources  
  • Easily drill-down from high-level statistics to actual failing data points in off-the-shelf visualisation tools, and publish statistics to governance systems  
  • Empower business users to fix identified data quality breaks themselves – so that those who know the data can fix the data 
  • Reduce and streamline the process of manually remediating data 
  • Audit trail of changes to data provides the ability to understand who did what, when, and why.  
  • Data owners no longer have to wait for IT to prioritise their data quality rule request alongside countless other IT tickets.  
  • Chief Data Officers, Heads of Data and other senior data leaders can gain rapid insights into the underlying health of their data, as well as a track record over time, improving the quality of information they report and can use for business purposes 

Who is the platform for?  

The self-service data quality platform is particularly helpful for Chief Data Officers, Head of Data roles, Data Governance leads, and Data Stewards. The platform is designed to empower those who know the data to manage and fix the data.  

To have a conversation about how the self-service data quality platform can help you to manage your data, contact our Head of Sales, Kieran Seaward today.   

Connect with Kieran on LinkedIn 

Part 2: Self-service data improvement is the route to better data quality
https://www.datactics.com/blog/marketing-insights/new-self-service-data-improvement-is-the-route-to-better-data-quality/ | 8 October 2020

The route to better data quality – It’s easy to say that planning a journey has been made far simpler since the introduction of live traffic information to navigation apps. You can now either get there faster, or at the very least phone ahead to explain how long you’ll be delayed.


It's just as easy to say that we wouldn't think of ignoring this kind of data. Last week's blog looked at the reasons why measuring data is important for retail banks, but unless there is a strategy to react to the results, it's arguably pretty much meaningless.

Internal product owners, risk and compliance teams all need to use specific and robust data measurements for analytics and innovation; to identify and serve customers; and to comply with the reams of rules and regulations handed down by regulatory bodies. Having identified a way of scoring the data, it would be equally as bizarre to ignore the results.

However, navigating a smooth path in data management is hampered by the landscape being vast, uncharted and increasingly archaic. Many executives of incumbent banks are rightly worried about the stability of their ageing systems and are finding themselves ill-equipped for a digital marketplace that is evolving with ever-increasing speed.

Key business aims of using data to achieve necessary cost-savings, and grow revenues through intelligent analytics, snarl up against the sheer volume of human and financial resources needing to be ploughed into these systems, in an effort to meet stringent regulatory requirements and to reduce the customer impact, regulatory pressure and painful bad press caused by an IT outage.

Meanwhile, for those who have them, data metrics are revealing quality problems, and fixing these issues tends to find its way into a one-off project that relies heavily on manual rules and even more manual re-keying into core systems. Very often, such projects have no capacity to continue that analysis and remediation or augmentation into the future, and over time data that has been fixed at huge cost starts to decay again, and the same cycle emerges.

But if your subject matter experts (SMEs) –  your regulatory compliance specialists, product owners, marketing analytics professionals – could have cost-effective access to their data, it could put perfecting data in the hands of those who know what the data should look like and how it can be fixed.

If you install a targeted solution that can access external reference data sources, internal standards such as your data dictionary, and user and department-level information to identify the data owner, you can self-serve to fix the problems as they arise.

This can be done via a combination of SME review and machine learning technology that evolves to apply remedial activities automatically, because the corrections made to broken records contain the information required to fix other records that fail the same rules.

It might sound like futuristic hype – because AI is so hot right now – but this is a very practical example of how new technology can address a real and immediate problem, and in doing so complement the bank’s overarching data governance framework.

It means that the constant push towards optimised customer journeys and propositions, increased regulatory compliance, and IT transformation can rely on regularly-perfected data at a granular, departmental level, rather than lifting and dropping compromised or out-of-date datasets.

Then the current frustration at delays in simply getting to use data can be avoided, and cost-effective, meaningful results for the business can be delivered in days or weeks rather than months or years.

Head over to the next part: 'Build vs Buy – Off-the-shelf or do-it-yourself?', or click here for part 1 of this blog, covering the need for data quality metrics in retail banking.


Matt Flenley is currently plying his trade as chief analogy provider at Datactics. If your data quality is keeping you awake at night, check out Self-Service Data Quality™, our award-winning interactive data quality analysis and reporting tool built to be used by business teams who aren't necessarily programmers.

Datactics launches new website: Democratising Data Quality
https://www.datactics.com/press-releases/datactics-launches-new-website/ | 28 September 2020

Belfast, London, New York, 28th September 2020

We are delighted to announce the launch of our brand new website!

Our Marketing team has spent the summer working on a major revamp of the site, designed to make it easy for people to find information on our self-service data quality solutions, and the award-winning technology they’re built on.

From the homepage you can now easily access information on our work in improving data quality in financial services, and in government and policing. You can also find lots of the answers to questions we are most frequently asked when it comes to specifics about our technology.

Deeper into the site, in the Datablog, you can find cutting-edge thought leadership material. Featuring blogs from our technologists and practitioners, introductions to the team, and thoughts on the market, there’s plenty here to get stuck into! It’s worth bookmarking as we regularly update this content with articles, company announcements and client successes, and more about the fast-growing team here at Datactics.

CEO Stuart Harvey said,

“We are very pleased to present our new website with its theme – “Democratising Data Quality”. At Datactics we are passionate about this topic. It resonates with many recent client projects where we see that the best person to monitor and fix broken data is the owner of that data. Many of our clients are front-office business people who would love to do this, but don’t have the necessary internal IT resource to make it happen. We aim to share user stories of how to empower data owners who are not programmers to practically address the issue of improving data quality and matching in their businesses.”

For any questions, suggestions, feedback or comments, please contact us.

Tackling Practical Challenges of a Data Management Programme
https://www.datactics.com/blog/good-data-culture/good-data-culture-facing-down-practical-challenges/ | 3 August 2020

“Nobody said it was easy” sang Chris Martin, in Coldplay’s love song from a scientist to the girl he was neglecting. The same could be said of data scientists embarking on a data management programme!


In his previous blog on Good Data Culture, our Head of Client Services, Luca Rovesti, discussed taking first steps on the road to data maturity and how to build a data culture. This time he’s taking a look at some of the biggest challenges of Data Management that arise once those first steps have been made – and how to overcome them. Want to see more on this topic? Head here.

One benefit of being part of a fast-growing company is the sheer volume and type of projects that we get to be involved in, and the wide range of experiences – successful and less so – that we can witness in a short amount of time.

Without a doubt, the most important challenge that rears its head on the data management journey is around complexity. There are so many systems, business processes and requirements of enterprise data that it can be hard to make sense of it all.

Those who get out of the woods fastest are the ones who recognise that there is no magic way around the things that must be done.

A good example would be the creation of data quality rule dictionaries to play a part in your data governance journey.


Firstly, there is no way that you will know what you need to do as part of your data driven culture efforts unless you go through what you have got.

Although technology can give us a helpful hand in the heavy lifting of raw data, from discovery to categorisation of data sets (data catalogues), the definition of domain-specific rules always requires a degree of human expertise and understanding of the exception management framework.

Subsequently, getting data owners and technical people to contribute to a shared plan that takes account of the uses of the data and how the technology will fit in is a crucial step in detailing the tasks, problems and activities that will deliver the programme.

Clients we have been talking to are experts in their subject areas. However, they don’t know what “best of breed” software and data management systems can deliver. Sometimes, clients find it hard to express what they want to achieve beyond a light-touch digitalisation of a human or semi-automated machine learning process.


The most important thing that we’ve learned along the way is that the best chance of success in delivering a data management programme involves using a technology framework that is both proven in its resilience and flexible in how it can fit into a complex deployment.

From the early days of ‘RegMetrics’ – a version of our data quality software that was configured for regulatory rules and pushing breaks into a regulatory reporting platform – we could see how a repeatable, modularised framework provided huge advantages in speed of deployment and positive outcomes in terms of making business decisions.

Using our clients’ experiences and demands of technology, we’ve developed a deployment framework that enables rapid delivery of data quality measurement and remediation processes, providing results to senior management that can answer the most significant question in data quality management: what is the return on investing in my big data?

This framework has enabled us to be perfectly equipped to provide expertise on the technology that marries our clients’ business knowledge:

  • Business user-focused low-code tooling connecting data subject matter experts with powerful tooling to build rules and deploy projects
  • Customisable automation that integrates with any type of data source, internal or external
  • Remediation clinic so that those who know the data can fix the data efficiently
  • “Chief Data Officer” dashboards provided by integration into off-the-shelf visualisation tools such as Qlik, Tableau, and PowerBI.

Being so close to our clients also means that they have a great deal of exposure and involvement in our development journey.

We have them ‘at the table’ when it comes to feature enhancements, partnering with them rather than selling and moving on, and involving them in our regular Guest Summit events to foster a sense of the wider Datactics community.

It’s a good point to leave this blog, actually, as next time I’ll go into some of those developments and integrations of our “self-service data quality” platform with our data discovery and matching capabilities.

Click here for the latest news from Datactics, or find us on LinkedIn, Twitter or Facebook.

Data Governance or Data Quality: not always a ‘chicken & egg’ problem
https://www.datactics.com/blog/sales-insider/market-insights-data-governance-or-data-quality-not-always-a-chicken-egg-problem/ | 18 June 2020

In this  blog with Datactics’ Head of Sales, Kieran Seaward, we dive into market insights and the sometimes-thorny issue of where to start.


Data Governance or Data Quality is a problem data managers and users will fully understand, and Kieran’s approach to this is influenced by thousands of hours of conversation with people at all stages of the process, all unified in the desire to get the data right and build a data culture around quality and efficiency. 

Following hot on the heels of banks, we are seeing a lot of buy-side and insurance firms on the road to data maturity and taking a more strategic approach to data quality and data governance, which is great. Undertaking a data maturity assessment internally can reveal some much-needed areas of improvement in an organization's data, from establishing a data governance framework to updating existing data quality initiatives and improving data integrity.

From what I hear, the “data quality or governance first?” conundrum is commonly debated by most firms, regardless of what stage they are at in a data programme rollout.

Business decisions are typically influenced by the need to either prioritise ‘top-down’ data governance activities such as creating a data dictionary and business glossary, or ‘bottom-up’ data quality activities such as measurement and remediation of company data assets as they exist today from data sources.  However, achieving a data driven culture relies on both these initiatives existing concurrently. 

In my opinion, these data strategies are not in conflict but complementary and can be tackled in any order, so long as the ultimate goal is a fully unified approach.  

I could be biased and say that the market insights derived from data quality activities can help form the basis of definitions and terms typically stored in governance systems:


Figure 1 – Data Quality first

However, the same can be said inversely: data quality systems can benefit from having critical data elements and metadata definitions in place to help shape the measurement rules that need to be applied:


Figure 2 – Data Governance first

The ideal complementary state is that of Data Governance + Data Quality working in perfect unison, i.e. :

  • Data Governance system that contains all identified critical data elements as well as definitions to help determine which Data Quality validation rules are applied to ensure they meet the definitions;
  • Data Quality platform that validates data elements and connects to the governance catalogue to understand who the responsible data scientist or data steward is, in order to push data to them for review and/or remediation of data quality issues.
    The quality platform can then push data quality metrics back into the governance front-end, which acts as the central hub/visualisation layer. This front-end either renders the data itself or connects to third parties such as Microsoft PowerBI, Tableau, or Qlik (a minimal sketch of this hand-off follows Figure 3 below).


Figure 3 – The ideal, balanced state
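
As a rough sketch of that metrics hand-off – the endpoint, authentication and payload shape are placeholders, not a real governance API – the quality platform might publish rule-level pass rates over REST for the governance front-end to display:

```python
import json
from urllib import request

# Rule-level data quality metrics to publish; the elements and values are illustrative.
metrics = [
    {"element": "customer_email", "rule": "format_check", "pass_rate": 0.97},
    {"element": "lei",            "rule": "completeness", "pass_rate": 0.88},
]

req = request.Request(
    "https://governance.example.com/api/dq-metrics",   # placeholder URL, not a real endpoint
    data=json.dumps(metrics).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req)  # uncomment in a real integration; the placeholder URL will not resolve
```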

In the real world, this decision can’t be made in isolation of what the business is doing right now with the information they rely on:

  • Regulatory reporting teams have to build, update and reconfigure reports in increasingly tighter timeframes.
  • Data analytics teams are relying on smarter models for prediction and intelligence in order to perform accurate data analysis.
  • Risk committees are seeking access to data for client, investor, and board reporting.  

If the quality of this information can’t be guaranteed, or breaks can’t be easily identified and fixed, all of these teams will keep coming back to IT asking for custom rules, sucking up much-needed programming resources.

Then when an under-pressure IT can’t deliver in time, or the requests are conflicting with one another, the teams will resort to building in SQL or trying to do it via everyone’s favourite DIY tool, Excel. 

Wherever firms are on their data maturity model or data governance programme, data quality is of paramount importance and can easily run first, last or in parallel. This is something we are used to helping clients and prospects with at various points along that journey, whether it’s using our self-service data quality & matching platform to drive better data into a regulatory reporting requirement, or facilitating a broad vision to equip an internal “data quality as-a-service” function.

My colleague Luca Rovesti, who heads up our Client Services team, goes more into this in Good Data Culture

I’ll be back soon to talk about probably the number one question thrown in at the end of every demo of our software:

What are you doing about AI?

Click here for the latest news from Datactics, or find us on LinkedIn, Twitter or Facebook.

No-code & Lo-code: A Lighter Way To Enjoy Tech?
https://www.datactics.com/blog/cto-vision/no-code-lo-code/ | 12 June 2020


In this article with Datactics CTO Alex Brown, Matt Flenley asks about the nature of no-code and lo-code platforms like Datactics’ Self-Service Data Quality, and whether they really are a lighter way to enjoy technology? 

The lo-code/no-code paradigm can be a bit like Marmite. Some people say that it's great and it gets the job done – these are usually the business subject matter experts who are used to Excel, especially in banks and large government organisations where that's the standard data handling tool in use. Technical people, such as software developers who are fluent in programming languages and disciplines, look on aghast at these blocks of functionality being chained together in macro-enabled workbooks, because they quickly evolve into monsters. These monsters become very expensive, if not impossible, to maintain when, inevitably, changes are required to support a change in the development environment and data formats.

The perfect combination for these technical people is something that fits in with the IT rigour around release schedules, documentation and testing – and just good practice in how you build things, making them robust and reusable.

Applications that are well-tested and reusable are quicker and easier to apply to new projects, and the end product is more stable. This modular approach is how the Datactics self-service platform has been built: reusable components that can be recycled and customised for rapid, low-risk development and deployment within a user-friendly lo-code interface.  

From a business point of view, the driving force behind the lo-code/no-code approach is a tactical way to address specific problems, where the existing infrastructure isn’t delivering what the business needs but the business users aren’t technical coders. For example, a bank or financial firm might need to capture an additional piece of information to meet a regulatory requirement. They might design and provide a webform or something similar that captures and relays the data into a datastore, and then into the firm’s regulatory reporting framework. This all plays a part in developing efficient business process management. 

This is where no/lo-code comes in as it allows you to do this kind of thing very quickly – those kinds of ad-hoc changes you might need to do to meet a specific deadline or requirement. 

The demand for this will only increase in a post-COVID-19 environment. For instance, one of our clients mentioned that at the start of the UK lockdown they needed to rapidly understand the state of the email addresses held for all the customers they would usually write to by post. Their data team of professional developers had rules built in under two hours, and a fully operational interactive dashboard a day later, which their Risk committee could use to review and track data quality issues and how quickly they were being fixed.
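As a rough idea of what such a rule involves, here is a generic Python sketch with made-up customer data; it is not the rule syntax the client actually used, just an illustration of an email completeness and validity check.

```python
# Minimal sketch of an email-address data quality rule: completeness + basic validity.
# The customer records and column names are invented for illustration.
import re
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1001, 1002, 1003, 1004],
    "email": ["jane@example.com", "", "bob@@example", "ana.silva@example.co.uk"],
})

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def email_is_valid(value: str) -> bool:
    """Basic syntactic check; a production rule might add domain or MX validation."""
    return bool(value) and EMAIL_PATTERN.match(value) is not None

customers["email_valid"] = customers["email"].apply(email_is_valid)

completeness = customers["email"].astype(bool).mean()   # share of non-empty values
validity = customers["email_valid"].mean()              # share passing the format rule
print(f"completeness: {completeness:.0%}, validity: {validity:.0%}")

# Records failing the rule would be routed to a remediation queue or dashboard.
failures = customers[~customers["email_valid"]]
```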

Our Self-Service Data Quality platform, for example, is easily used to address a tactical need for data quality or matching without writing any code, or waiting for central IT to run queries. You have all the drag & drop capability to build rules, data pipelines, matching algorithms and so on, allowing you to do a specific job really quite quickly. Platforms like this are extremely good at tactical use cases where you don’t want to rip out and rewrite your existing infrastructure; you just need a little add-on job to meet a regulatory reporting requirement or specific business requirement. 

Because our platform doesn’t force you to use a particular persistence layer – it’s all API-driven and sits on whatever Master Data Management platform you have – it’s a really flexible tool that is well-suited to these tactical use cases.

This means that the total cost of ownership for firms is far lower, because lo-code platforms offer extensibility to multiple downstream use cases. Regulatory compliance, emerging risks, custom data matching or even migration projects are all situations where one self-service platform can be leveraged without causing huge delays in IT ticketing processes, or multiple conflicting requests hitting the central IT team all at once. 

Ultimately, lo- or no-code solutions are likely to thrive as business teams discover that they can use the firm’s data assets themselves for faster results, without tying their IT teams up in knots.

Explainable AI with Dr. Fiona Browne – https://www.datactics.com/blog/ai-ml/blog-ai-explainability/ – Tue, 26 May 2020
Dr Fiona Browne, Datactics, discusses Explainable AI

The AI team at Datactics is building explainability from the ground up and demonstrating the “why and how” behind predictive models for client projects.

Matt Flenley prepared to open his brains to a rapid education session from Dr Fiona Browne and Kaixi Yang.

One of the most hotly debated tech topics of 2020 concerns model interpretability – that is to say, the rationale behind how an ML algorithm has made a decision or prediction. Nobody doubts that AI can deliver astonishing advances in capability and corresponding efficiencies in effort, but as HSBC’s Chief Data Officer Lorraine Waters shared at a recent A-Team event, “is it creepy to do this?” Numerous conference agendas are filled with differing rationales for interpretability and explainability of models, whether business-driven, consumer-driven, or built on regulatory frameworks to enforce good behaviour, but these are typically ethical conversations first rather than technological ones. It’s clear we need to ensure technology is “in the room” on all of these drivers.

We need to be informed and guided by technology to see what tools are already available to help with understanding AI decision-making, how tech can help shed light on ‘black boxes’ just as much as we’re dreaming up possibilities for the use of those black boxes.

As Head of Datactics’ AI team, Dr Fiona Browne has a strong desire for what she calls ‘baked-in explainability’. Her colleague Kaixi Yang explains more about explainable models: 

Some algorithms, such as neural networks (deep learning), are complex: functions are calculated through approximation, and from the network’s structure it is unclear how this approximation is determined. We need to understand the rationale behind the model’s prediction so that we can decide when, or even whether, to trust it – turning black boxes into glass boxes within data science.

The team put their ‘explain first‘ approach to work on a specific client project to build explainable Artificial Intelligence (XAI) from the ground up, using explainability techniques including LIME – a local, interpretable, model-agnostic way of explaining individual predictions.

“Model-agnostic explanations are important because they can be applied to a wide range of ML classifiers, such as neural networks, random forests, or support vector machines” continued Ms Yang, who has recently joined Datactics after completing an MSc in Data Analytics with Queen’s University in Belfast. “They help to explain the predictions of any machine learning classifier and evaluate its usefulness in various tasks related to trust”.

For the work the team has been conducting, this range of explainability measures gives them the ability to choose the most appropriate Machine Learning model and AI system, not just the one that makes the most accurate predictions based on evaluation scores. This has had a significant impact on their work on Entity Resolution for Know Your Customer (KYC) processes – a classic problem of large, messy datasets that are hard to match, with painful penalties for the humans involved if it goes wrong. The project, which is detailed in a recent webinar hosted with the Enterprise Data Management Council, matched entities from the Refinitiv PermID and Global LEI Foundation datasets and relied on human validation of rule-based matches to train a machine learning algorithm.

Dr Browne again: “We applied different explainability metrics to three different classifiers that could predict whether a legal entity would match or not. We trained, validated and tested the models using an entity resolution dataset. For this analysis we selected two ‘black-box’ classifiers and one interpretable classifier to illustrate how the explainability metrics were entirely agnostic and applicable regardless of the classifier that was chosen.”
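To make that setup concrete, here is a minimal sketch of the same kind of experiment using the open-source lime package and scikit-learn. The synthetic features, data and thresholds are illustrative only; they are not the Datactics entity-resolution dataset or models.

```python
# Sketch: model-agnostic explanations with LIME over three classifiers
# (two "black boxes" and one interpretable model), on synthetic matching data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
feature_names = ["name_similarity", "country_match", "address_similarity"]
X = rng.random((500, 3))
# Synthetic ground truth: 1 = match, 0 = not a match.
y = (0.6 * X[:, 0] + 0.4 * X[:, 2] > 0.55).astype(int)

classifiers = {
    "random_forest": RandomForestClassifier(random_state=0).fit(X, y),
    "mlp": MLPClassifier(max_iter=1000, random_state=0).fit(X, y),
    "naive_bayes": GaussianNB().fit(X, y),
}

explainer = LimeTabularExplainer(
    X, mode="classification",
    feature_names=feature_names, class_names=["not a match", "match"],
)

instance = X[42]
for name, clf in classifiers.items():
    # The same explainer is applied to every classifier via its predict_proba function.
    explanation = explainer.explain_instance(instance, clf.predict_proba, num_features=3)
    print(name, explanation.as_list())  # feature contributions driving the prediction
```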

The results are shown here:

Figure – explainability metrics in AI and ML

“In a regular ML conversation, these results indicate two reliably accurate models that could be deployed in production,” continued Dr Browne, “but in an XAI world we want to shed light on how appropriate those models are.”

By applying LIME, for example, to a randomly selected instance in the dataset, the team can uncover the rationale behind the predictions made. Datactics’ FlowDesigner rules studio had automatically labelled this record as “not a match” through its configurable fuzzy matching engines.

Dr Browne continued, “explainability methods build an interpretable classifier based on instances similar to the selected instance from the different classifiers, and summarise the features which are driving the prediction. They select those instances that are quite close to the predicted instance, depending on the model that’s been built, and use the predictions from the black-box model to build a glass-box model, where you can then describe what’s happening.”

Figure – prediction probabilities

In this case, for the Random Forest model (fig.), the label has been correctly predicted as 0 (not a match) and LIME exposes the features driving this decision. The prediction is supported by two key features, but not by a feature based on entity name, which we know is important.

Using LIME on the multilayer perceptron model (fig.), which had the same accuracy as the Random Forest, the team found that it also correctly predicted the “0” label of “not a match”, but with a lower support score, and it was supported by slightly different features compared to the random forest model.

Figure – prediction probabilities

The Naïve Bayesian model was different altogether. “It fully predicted the correct label of zero with a prediction confidence of one, the highest confidence possible,” said Dr Browne, “however, it made this prediction supported by only one feature, a match on the entity country, disregarding all other features. This would lead you to doubt whether it’s reliable as a prediction model.”

This has significant implications for something as riddled with inconsistent data fields as KYC data. People and businesses move, directors and beneficial owners resign and new ones are appointed, and that’s without considering ‘bad actors’ who are trying to hoodwink Anti-Money Laundering (AML) systems. 

Figure – explainability and interpretability in AI

The process of ‘phoenixing’, where a new entity rises from the ashes of a failed one, intentionally dodging the liabilities of the previous incarnation, frequently relies on truncations or mis-spellings of directors’ names to avoid linking the new entity with the previous one. 
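A toy sketch of why that works as an evasion tactic, and how even a simple similarity score can still surface candidate links for review: this uses Python's standard-library difflib, with invented names and an arbitrary threshold, and is nothing like a production matching engine.

```python
# Minimal sketch: flag truncated or mis-spelled director names as possible links.
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity score in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

pairs = [
    ("Jonathan A. Smythe", "Jon Smythe"),           # truncation
    ("Siobhan McCallister", "Siobhan McAllister"),  # mis-spelling
    ("Jonathan A. Smythe", "Priya Ramachandran"),   # genuinely different person
]

for a, b in pairs:
    score = name_similarity(a, b)
    flag = "review as possible link" if score >= 0.7 else "no link"
    print(f"{a!r} vs {b!r}: {score:.2f} -> {flag}")
```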

Any ML model being used on such a dataset would need this explainability baked in, so that the reliability of the predictions the data is informing can be understood.

Using only one explainability metric is not good practice. Dr Browne explains Datactics’ approach: “Just as with classifiers, there’s no single best evaluation approach or explainer to pick; the best way is to choose a number of different models and metrics to try to describe what’s happening. There are always pros and cons, ranging from the scope of the explainer to the stability of the code to the complexity of the model and how and when it’s configured.”

These technological disciplines of testing, evaluating and trying to understand a problem are a crucial part of the entire conversation that businesses are having at an ethical or “risk appetite” level.

Click here for more from Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

AI Bias – The Future is Accidentally Biased? – https://www.datactics.com/blog/marketing-insights/ai-bias-the-future-is-accidentally-biased/ – Fri, 15 May 2020

Every now and then a run-of-the-mill activity makes you sit up and take notice of something bigger than the task you’re working on, a sort of out-of-body experience where you see the macro instead of the micro.


Yesterday was one such day. I’d had a pretty normal one of keeping across all the usual priorities and Teams calls, figuring out our editorial calendar and the upcoming webinars, all the while refreshing some buyer and user personas for our Self-Service Data Quality platform.

Buyer personas themselves are hardly a new thing, and they’re typically represented by an icon or avatar of the buyer or user themselves. This time, rather than pile all our hopes, dreams and expectations into a bunch of cartoons, I figured I’d experiment a little. Back in January I’d been to an AI conference run by AIBE, where I’d heard about Generative Adversarial Networks (GANs) and the ability to use AI to create images of pretty much anything.

Being someone who likes to use tech first and ask questions later, I headed over to the always entertaining thispersondoesnotexist.com, where GANs do a pretty stellar job of creating highly plausible-looking people who don’t exist (with some amusing, if mildly perturbing, glitches at the limits of its capability!). I clicked away, refreshing the page and copying people into my persona template, assigning our typical roles of Chief Data Officer, Data Steward, Chief Risk Officer and so on; it wasn’t until I found myself pasting them in that I realised how hard it was to generate images of people who were not white – or indeed how impossible it was to generate anyone with a disability or a degenerative condition.

Buyer personas are supposed to reflect all aspects of likely users of the technology, yet this example of AI would unintentionally bias our product and market research activities to overlook people who did not conform to the AI’s model. My colleague Raghad Al-Shabandar wrote about this recently (published today, incidentally), and I think probably the most impactful part of this, for me, anyway, was the following quote:

The question, then, is developing models for the society we wish to inhabit, not merely replicating the society we have.

In the website’s case, it’s even worse: it obliterates the society we currently have, by creating images that don’t reflect the diversity of reality, instead layering on an expected or predicted society that is over 50% white and 0% otherwise-abled.

I should make it clear that I’m a big fan of this tech, not least for the bafflement my kids have at the non-existence of a person who looks very much like a person! But at the same time, I think it perhaps exposes the risk all AI projects have – did we really think of every angle about what society looks like today, and did we consider how society ought to look?

These are subjective points that vary wildly from culture to culture and country to country, but we must ensure that every minority and element of diversity is in the room when we’re making such decisions or we risk baking-in bias before we’ve even begun.

Click here for the latest news from Datactics, or find us on LinkedIn, Twitter or Facebook.

Self-Service Data Quality for DataOps – https://www.datactics.com/blog/ceo-vision/ceo-vision-self-service-data-quality-for-dataops/ – Tue, 05 May 2020
At the recent A-Team Data Management Summit Virtual, Datactics CEO Stuart Harvey delivered a keynote on “Self-Service Data Quality for DataOps – Why it’s the next big thing in financial services.” The keynote (available here) can be read below, with slides from the keynote included for reference. Should you wish to discuss the subject with us, please don’t hesitate to contact Stuart or Kieran Seaward, Head of Sales.  

I started work in banking in the 90s as a programmer, developing real-time software systems written in C++. In those good old days, I’d be given a specification; I’d write some code, test it and document it. After a few weeks it would be deployed on the trading floor. If my software broke or the requirements changed, it would come back to me and I’d start the process all over again. This ‘waterfall’ approach was slow and, if I’m honest, apart from the professional pride of not wanting to create buggy code, I didn’t feel a lot of ownership for what I’d created. 

In the last five years a new methodology in software engineering has changed all that – it’s called DevOps, and brings a very strategic and agile approach to building new software.

More recently DevOps had a baby sister called DataOps, and it’s this subject that I’d like to talk about today.

Many Chief Data Officers (CDO) and analysts have been impressed by the increased productivity and agility their Chief Technology Officer (CTO) colleagues are seeing through the use of DevOps. Now they’d like to get in on the act. In the last few months at Datactics we’ve been talking a lot to CDO clients about their desire to have a more agile approach to data governance and how DataOps fits into this picture.  

In these conversations we’ve talked a great deal about the ownership of data. A key question is how to associate the measurement and fixing of a piece of broken data with the person most closely responsible for it. In our experience the owner of a piece of data usually makes the best data steward. These are the people who can positively affect business outcomes through accurate measuring and monitoring of data, and coordinating them is typically a CDO’s role. 

We have seen a strong desire to push data science processes, including data governance and the measurement of actual data quality (at a record level) into the processes and automation that exist in a bank. 

I’d like to share with you some simple examples of what we are doing with our investment bank and wealth management clients. I hope this shows that a self-service approach to data quality (with appropriate tooling) can empower highly agile data quality measurement for any company wishing to implement the standard DataOps processes of validation, sorting, aggregation, reporting and reconciliation. 

Roles in DataOps and Data Quality 

We work closely with the people who use the Datactics platform – the people responsible for the governance of data and reporting on its quality. They have titles like Chief Data Officer, Data Quality Manager, Chief Digital Officer and Head of Regulation. These data consumers are responsible for large volumes of often messy data relating to entities, counterparties, financial reference data and transactions. This data does not reside in just one place; it transitions through multiple bank processes. It is sometimes “at rest” in a data store and sometimes “in motion” as it passes via Extract, Transform, Load (ETL) processes to other systems that live downstream of the point at which it was sourced.  

For example, a bank might download counterparty information from Companies House to populate its Legal Entity Master. This data is then published out to multiple consuming applications for Know Your Customer (KYC), Anti-Money Laundering (AML) and Life Cycle Management. In these systems the counterparty records are augmented with information such as a Legal Entity Identifier (LEI), a Bank Identifier Code (BIC) or a ticker symbol.  

This ability to empower subject matter experts and business users who are not programmers to measure data at rest and in motion has led to the following trends: 

  • Ownership: Data quality management moves from being the sole responsibility of a potentially remote data steward to all of those who are producing and changing data, encouraging a data driven culture. 
  • Federation: Data quality becomes everyone’s job. Let’s think about end-of-day pricing at a bank. The team that owns the securities master will want to test the accuracy and completeness of data arriving from a vendor. The analyst working downstream, who takes an end-of-day price from the securities master to calculate a volume-weighted average price (VWAP), will have different checks relating to the timeliness of information. Finally, the data scientist further downstream, who uses the VWAP to create predictive analytics, will want to build their own rules to validate data quality (see the sketch just after this list). 
  • Governance: A final trend that we are seeing is the tighter integration with standard governance tools. To be effective, self-service data quality and DataOps require tight integration with the existing systems that hold data dictionaries, metadata, and lineage information.
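Here is a minimal sketch of that federated idea in Python: three different checks attached to the same end-of-day price feed by three different teams. The column names, data values and thresholds are invented for illustration.

```python
# Sketch: different teams attach different data quality checks to one price feed.
from datetime import datetime, timedelta, timezone
import pandas as pd

prices = pd.DataFrame({
    "isin": ["XS0000000001"] * 3,
    "price": [101.2, 101.5, 101.4],
    "volume": [10_000, 25_000, 5_000],
    "received_at": [datetime.now(timezone.utc) - timedelta(minutes=m) for m in (5, 12, 240)],
})

# Securities-master team: accuracy/completeness checks on the vendor feed.
assert prices["price"].notna().all() and (prices["price"] > 0).all()

# Analyst team: volume-weighted average price from the validated feed.
vwap = (prices["price"] * prices["volume"]).sum() / prices["volume"].sum()

# Data-science team: their own timeliness rule, e.g. nothing older than one hour.
stale = prices[prices["received_at"] < datetime.now(timezone.utc) - timedelta(hours=1)]
print(f"VWAP = {vwap:.3f}, stale records = {len(stale)}")
```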

Here’s an illustration of how we see the Datactics Self-Service Data Quality (SSDQ) platform integrating with DataOps in a high-impact way that you might want to consider in your own data strategy. 

1. Data Governance Team 

First off, we offer a set of pre-built dashboards for PowerBI, Tableau and Qlik that allow your data stewards to have rapid access to the data quality measurements which relate just to them. A user in the London office might be enabled to see data for Europe or, perhaps, just data in their department. Within just a few clicks, a data steward for the Legal Entity Master system could identify all records that are in breach of an accuracy check where an LEI is incorrect, or a timeliness check where the LEI has not been revalidated in the Global LEI Foundation’s (GLEIF) database inside 12 months. 
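For illustration, here is a minimal sketch of what those two checks amount to. ISO 17442 LEIs are 20 alphanumeric characters whose last two digits are MOD 97-10 check digits (similar to an IBAN); the record layout, field name and LEI value below are synthetic placeholders, not Datactics rule definitions or a real identifier.

```python
# Minimal sketch of two LEI rules: a format/check-digit test and a 12-month staleness test.
import re
from datetime import date, timedelta

def lei_is_valid(lei: str) -> bool:
    """ISO 17442: 20 alphanumeric characters, last two are MOD 97-10 check digits."""
    if not re.fullmatch(r"[A-Z0-9]{18}[0-9]{2}", lei or ""):
        return False
    digits = "".join(str(int(c, 36)) for c in lei)  # expand letters A=10 .. Z=35, like IBAN checks
    return int(digits) % 97 == 1

def lei_is_stale(last_gleif_validation: date) -> bool:
    """Timeliness rule: flag LEIs not revalidated against GLEIF within 12 months."""
    return last_gleif_validation < date.today() - timedelta(days=365)

# Synthetic record; the LEI string is a placeholder, not a real identifier.
record = {"lei": "5299001ERS9A2TEST038", "last_gleif_validation": date(2023, 1, 15)}
print("format/check digit OK:", lei_is_valid(record["lei"]))
print("stale (needs revalidation):", lei_is_stale(record["last_gleif_validation"]))
```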


2. Data Quality Clinic: Data Remediation 

Data Quality Clinic extends the management dashboard by allowing a bank to return broken data to its owner for fixing. It effectively quarantines broken records and passes them to the data engineer in a queue, improving data pipelines and overall data governance and data quality. Clinic runs in a web browser and is tightly integrated with information relating to data dictionaries, lineage and third-party sources for validation. Extending our LEI example, I might be the owner of a bunch of entities which have failed an LEI check. Clinic would show me the records in question and highlight the fields in error. It would connect to GLEIF as the source of truth for LEIs and provide me with hints on what to correct. As you’d expect, this process can be enhanced by Machine Learning to automate the entity resolution process under human supervision.  


3. FlowDesigner Studio: Rule creation, documentation, sharing 

FlowDesigner is the rules studio in which the data governance team of super users build, manage, document and source-control rules for the profiling, cleansing and matching of enterprise data. We like to share these rules across our clients, so FlowDesigner comes pre-loaded with rules for everything from name and address checking to CUSIP or ISIN validation. 
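As a flavour of what one of those reference-data rules does under the hood, here is a generic sketch of ISIN validation (12 characters, letters expanded to numbers A=10..Z=35, then a Luhn check over the resulting digits). It is not FlowDesigner's actual rule definition, just a plain-Python illustration.

```python
# Minimal sketch of an ISIN validation rule: format check plus Luhn check digit.
import re

def isin_is_valid(isin: str) -> bool:
    if not re.fullmatch(r"[A-Z]{2}[A-Z0-9]{9}[0-9]", isin or ""):
        return False
    digits = "".join(str(int(c, 36)) for c in isin)  # expand letters: A=10 .. Z=35
    # Luhn: double every second digit from the right and sum the digit sums.
    total = 0
    for i, d in enumerate(reversed(digits)):
        n = int(d)
        if i % 2 == 1:
            n *= 2
        total += n // 10 + n % 10
    return total % 10 == 0

print(isin_is_valid("US0378331005"))   # well-known valid ISIN (Apple Inc.) -> True
print(isin_is_valid("US0378331004"))   # corrupted check digit -> False
```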


4. Data Quality Manager: Connecting to data sources; scheduling, automating solutions 

This part of the Datactics platform allows your technology team to connect to data flowing from multiple sources and to schedule how rules are applied to data at rest and in motion. It allows for the sharing and re-use of rules across all parts of your business. We have many clients solving big data problems involving hundreds of millions of records using Data Quality Manager across multiple environments and data sources, on-premises or in public (or, more typically, private) cloud. 


Summary: Self-Service Data Quality for DataOps 

Thanks for joining me today as I’ve outlined how self-service data quality is a key part of successful DataOps. CDOs need real-time data quality insights to keep up with business needs while technical architects require a platform that doesn’t need a huge programming team to support it. If you have any questions about this topic, or how we’ve approached it, then we’d be glad to talk with you. Please get in touch below. 

Click here for the latest news from Datactics, or find us on LinkedIn, Twitter or Facebook.

Unlocking Data Quality and Value to Power Innovation & Insight – https://www.datactics.com/blog/sales-insider/data-management-summit-virtual-unlocking-data-value-with-kieran-seaward/ – Wed, 29 Apr 2020
At last week’s Virtual Data Management Summit, the A-Team’s CEO Angela Wilbraham sat down for a Q&A on all things data quality and unlocking data value with our Head of Sales Kieran Seaward.

Topics covered in this interview are:

– The biggest issues in data management right now (0:31)

– Why data quality is such a significant issue in the current climate (3:00)

– The impact of AI on the landscape of data quality, especially in AML & KYC (5:13)

– Being optimistic on the outlook for 2020 (8:25)

The Data Management Summit explores how financial institutions are shifting from defensive to offensive data management strategies, to improve operational efficiency and revenue-enhancing opportunities. 

It puts the business lens on data and deep-dives into the data management capabilities needed to deliver on business outcomes.

Topics include: 

  • Shifting from defensive to offensive – aligning data with business strategy for revenue optimisation and operational efficiency
  • Establishing trust – Embedding data ethics into your data strategy
  • How to turn data lineage from a regulatory response into a business advantage
  • Reviewing the regulatory landscape and the future of regulatory reporting in Europe
  • Migrating to the cloud to create new capabilities for the business
  • The promise and potential of AI
  • Unlocking Data Value
  • Client onboarding – how developments in tech and automation can help optimise entity data management and improve KYC
  • DataOps methodology – why it’s the next big thing and significant for financial services

A natural communicator, Kieran Seaward has over 15 years’ experience in technical sales, including at First Derivatives, across a wide range of ERP, data and automation solutions. Kieran is rightly known for forming strong and lasting customer relationships. He began his career by undertaking a BSc (Hons) degree in Technology and Design from Ulster University. 

Having won numerous awards in an earlier sales role in jewellery, it didn’t take Kieran long to realise he had a skill for sales; as he grew in expertise and patter, he ended up at Datactics. One of the most exciting things about his role is the amount of enlightenment it brings to clients. He enjoys working towards solving a real issue and offering the client a solution that will transform the way they view data. 

Click here for the latest news from Datactics, or find us on LinkedIn, Twitter or Facebook.

Kieran Seaward, Sales


Beyond data prep – Whitepaper SSDQ – https://www.datactics.com/blog/cto-vision/cto-vision-beyond-data-prep-whitepaper-ssdq/ – Thu, 23 Apr 2020

As featured in the recent A-Team webinar, we’ve been strong advocates of a self-service approach to data quality (SSDQ), especially when it comes to regulated data types and wide-ranging demands on a firm’s data assets.

This whitepaper SSDQ, authored by our CTO Alex Brown, goes deeper into the reasons why this approach is so much in demand and explores the functionalities that a fully self-service environment needs to equip business users with rapid access to high-quality data.

In this Self-Service Data Quality whitepaper, we describe trends and technologies bringing data quality functions closer to the data. Self Service Data Quality democratizes data, moving responsibility and control from central IT functions to data teams and SMEs. As a result, greater operational efficiency and higher value data assets can be achieved. 

Download our SSDQ whitepaper here. For more information on our user-friendly Self-Service Data Quality platform, take a look at our page here.

The Changing Landscape of Data Quality

There has been increasing demand for higher-quality data and fewer data quality issues in recent years. Highly regulated sectors dealing with personal data, such as banking, have faced a tsunami of financial regulations such as BCBS239, MiFID, FATCA and many more, stipulating or implying exacting standards for data and data processes.

Meanwhile, there is a growing trend for firms to become more Data and Analytics (D&A) driven, taking inspiration from Google and Facebook to monetize their data assets. This increased focus on D&A has been accelerated by easier and lower-cost access to artificial intelligence (AI), machine learning (ML) and business intelligence (BI) visualization technologies.

However, as the hype around these many tools and technologies wanes, there comes the pragmatic realization that unless there is a foundation of good-quality, reliable data and efficient data preparation, insights derived from AI and analytics may not be actionable. This is where having a modern data management framework is crucial, giving organisations a chance to look at how they are approaching data governance and data quality.

With AI and ML becoming more of a commodity, and a level playing field, the differentiator is the data and the quality of that data… To read more, see the whitepaper above.

Click here for more thought leadership pieces from our industry experts at Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

Best Practices For Creating a Data Quality Framework – https://www.datactics.com/blog/cto-vision/webinar-best-practices-for-creating-data-quality-framework/ – Wed, 08 Apr 2020
Chief Technology Officer Alex Brown featured as a panellist in Data Management Insight’s webinar discussing the best practices for creating a data quality framework within your organisation. 


What is the problem?

A-Team Insight outlines that ‘bad data affects time, cost and customer service, cripples decision making, and reduces firms’ ability to manage data and comply with regulations’.

With so much at stake, how can financial services organisations improve the accuracy, completeness and timeliness of their data in order to improve business processes?

What approaches and technologies are available to ensure data quality meets regulatory requirements as well as their own data quality objectives?

This webinar discusses how to establish a data quality framework and how to develop metrics to measure data quality. It also explores experiences of rolling out data quality enterprise-wide and resolving data quality issues, and examines fixing data quality problems in real time and how dashboards and data quality remediation tools can help. Lastly, it explores new approaches to improving data quality using AI, Machine Learning, NLP and text analytics tools and techniques.

The topics focused on:

  • Limitations associated with an ad-hoc approach
  • Where to start, the lessons learned and how to roll out a comprehensive data quality solution
  • How to establish a business focus on data quality and developing effective data quality metrics (aligning with data quality dimensions) 
  • Using new and emerging technologies to improve data quality and automate data quality processes
  • Best practices for creating a Data Quality Framework 

We caught up with Alex to ask him a few questions on how he thought the webinar had gone, whether it had changed or backed up his views, and where we can hear from him next…

Firstly, I thought the webinar was extremely well-run, with an audience of well over 300 tuning in on the day.

The biggest takeaway for me was that it confirmed a lot of the narrative we’re hearing about the middle way between two models of data quality management – a centralised, highly-controlled but slow model of IT owning and running all data processes, and the “Wild West” where everyone does their own thing in an agile but disconnected way. Both sides have benefits and pitfalls, and the webinar really brought out a lot of those themes in a set of useful practical examples. It was well worth a listen as the session took a deep dive into establishing a data quality framework, looking at things like data profiling, data cleansing and data quality rules. 

Next up from me will be a whitepaper on this subject, which we’ll be releasing really soon; there’ll be more blogs from me over at Datactics.com; and finally, I’m also looking forward to the Virtual Data Management Summit, as CEO Stuart Harvey’s got some interesting insight into DataOps to share.

Missed the webinar? Not to worry – you can listen to the full recording here.

Click here for more from Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

Why you should read the European Banking Authority report on AI and Big Data – https://www.datactics.com/blog/cto-vision/why-you-should-read-the-european-banking-authority-report-on-ai-and-big-data/ – Thu, 13 Feb 2020

You might have missed this highly informative report from the European Banking Authority (EBA), because the title didn’t contain the popular buzzwords Artificial Intelligence (AI) or Machine Learning (ML) – nor does the front cover have a picture of a robot!

But for anyone who is trying to understand the challenges ahead for AI and broader data management in banking, I think this report provides a rare unbiased, concise and highly educational deep dive into pretty much all of the key topics involved. I won’t give a synopsis here, just some reasons why I think you should read it:

It’s really all about AI in Banking!

‘Advanced Analytics’ is the term the authors use for AI and ML tech.

BS Free

Provides most of the background you need to see through the smoke, mirrors and hype surrounding AI or Advanced Analytics.

It’s a great introduction

But not dumbed down – great for business people who need a better understanding of the challenges their data scientists and AI professionals face, and great for data scientists who need to understand the broader applications and implications of this rapidly emerging technology in banking. If you don’t know what kind of algorithm might be used for a particular business case, this is for you. If you are trying to understand what a data scientist means by accuracy and a confusion matrix, this is for you too.
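(If those last two terms are new to you: accuracy is simply the share of predictions a model gets right, and a confusion matrix breaks correct and incorrect predictions down by class. A quick sketch with scikit-learn on made-up labels, purely for illustration:)

```python
# Quick illustration of accuracy and a confusion matrix on invented labels.
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual outcomes, e.g. 1 = loan defaulted
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # what the model predicted

print(confusion_matrix(y_true, y_pred))  # rows = actual class, columns = predicted class
print(accuracy_score(y_true, y_pred))    # fraction of predictions that were correct
```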

Technologically Neutral

The report maintains technological neutrality. With so much information these days coming from vendors of proprietary tech, in a world where there are few common open standards, it’s hard to find information that doesn’t in some way imply vendor lock-in.

Holistic

This report covers pretty much everything, including data quality, different types of ML, explainability and interpretability, and ethics… So many reports are very narrow, focusing on one use case or technology, but this one takes the whole horizon into account.

Pragmatic

It describes practical use cases for AI and the technology involved – I was particularly impressed with the technical content: accurate, concise and easy to understand. More importantly, it also describes the potential problems – things like how automated credit scoring could be ‘gamed’ by an institution’s sales staff, who could coach uncreditworthy customers on how to be granted a loan!

Forward-thinking

The European Banking Authority covers the topics of ethics in AI and even security in AI. Ethics has obviously been talked about a lot in recent months (sometimes with slightly fanciful references to Asimov’s laws of robotics!), but this report lays out some really good practical steps that need to be implemented to ensure ML solutions are fair. It’s also refreshing to see serious consideration given to security (data poisoning, adversarial attacks, model stealing), something I blogged about a couple of years ago. It’s a bit like the old days of software development when people didn’t really take things like SQL injection or cross-site scripting seriously, resulting in security breaches in many applications and websites. If AI solutions aren’t built with security from the ground up, the next few years could see echoes of these past security breaches played out in the AI domain.

You can get the report here.

Click here for more from Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.
