What Connects James Cameron, Data Catalogs and Data Science 101? CTO Insights with Alex Brown
https://www.datactics.com/blog/cto-vision/cto-insights-2023-data-science-101/ | 16 February 2023


Matt Flenley recently took some time with Alex Brown, CTO at Datactics, to find out what he’s looking forward to in the year ahead. 

You mentioned the other day in our team meeting the market trends towards Master Data Management. Is there anything that you had noted here that you wanted to expand on when it comes to the market’s direction, what our competitors are up to, and the implications for a firm like Datactics? 

I think the key things I’ve gleaned are that when it comes to Master Data Management and the big data management tools, like cataloguing and governance, it is really going cloud first. I think that’s why the likes of Microsoft see their Azure Purview as being an absolute essential. Even though people are critiquing its shortcomings, and saying it’s not too functional, there is still faith in Microsoft to pull it off. A lot of early adopters are working with them to improve scope and functionality.

Despite the doubts that are out there, a lot of people seem to be backing it, and that’s really quite interesting. I’ve been mulling over and exploring why this might be, and I think it has to be that if anyone’s going to do it, a cloud provider probably has the best opportunity ahead of them. They can get into all of those data sources and provide the comprehensive coverage that could take a third party a lot longer (with more hurdles to overcome).

In this case, they’re starting with infrastructure and then doing something with it, rather than being solution first and trying to optimise it for an infrastructure type?

Exactly. On the commercial side, everyone knows it shouldn’t be a massively expensive addition for them if they’re already Microsoft houses, as many businesses are. So, when it does come to maturity, it’s going to be an extremely competitive product commercially. If you’re an Azure house, it’s going to be especially persuasive.

I guess a bit like PowerBI on the data visualisation side to a certain extent? I suppose a lot of people are backing it like they would a film by James Cameron. They’re commercial successes, they’ll be worth seeing, it’ll be an epic regardless of whether you like it.  

Yeah, they’ll love it or hate it!

Yes – you’ll love it or it won’t be your cup of tea, but it’ll make a big splash! Solution providers are going to have to leverage Azure and infrastructure like it, but when firms can access Purview, there’s a chance that they’ll think they won’t necessarily need to extend their vendor licences for data governance. It makes for an interesting tech landscape.

That’s true. On the licence thing, it’s worth saying that best-of-breed firms in governance and cataloguing have many years of development behind them and they are the domain specialists. There’s a lot of catching up to do when you think of firms in that space, like Collibra, Alation or Talend. A space to watch, for sure.

On Microsoft, it’s worth noting how they’re currently working. I think the former view about Microsoft being a very closed book, like back in the early 2000s when people saw them as a ‘root of all evil’ closed shop, isn’t the case anymore. For example, I learned recently that SQL Server, for both on-prem and cloud, now works with S3 storage and provides really good support for it. I would have anticipated them prioritising Azure storage, but this shows a lot of openness to accepting the best tech, even if it’s from an infrastructure competitor like Amazon.

That’s really interesting. The idea of best of breed is something I picked up on as a trend over the last year, that interoperability across multiple tools. We’ve partnered with other firms in data management too. It’s fascinating to see Microsoft working with what’s best, rather than just building it themselves. In our tech teams, we use our own platform but then leverage Python for machine learning, for example. An interesting dynamic of leveraging do-it-yourself and off-the-shelf.

That’s a brilliant phrase! I like that. You’re right, the partnership angle is a big one for best-of-breed tooling, whether data quality, lineage, governance or cataloguing.

Another thing I’d like your view on is the declining focus on regulation as a driver for innovation. RegTech was seen as the standard bearer to plug a lot of holes, but in the last few years it’s shifted to getting more out of data – outflank, outrank the competition. Are you seeing the same thing on the tech side? Is it more regulation or business outcome driven? 

I’d agree with you. It’s less around compliance. One of the big things is about doing more with less; not keeping large tech operational teams, but using fewer human resources and more automation. Whether this is to satisfy a regulatory requirement or a key business proposition is sort of moot; I think people having to cope with fewer staff to do things is the key point.

I still see that the real advanced analytics thing doesn’t really seem to have got any easier in the last few years! And in a way it’s kind of surprising; it’s still the domain of specialist data scientists, which I think is quite interesting. All the same problems are there: sourcing the data, data quality, and I don’t think they’re going away – especially if there are fewer bodies around to wrangle it all.

If they’re trying to get the most bang for their buck, they face a lot of challenges. They might be tempted to cut corners on data quality, and extract what they can from what they can manage. 

I agree. You’re not going to be able to compete with one hand tied behind your back, and even worse if you don’t realise you’ve one hand tied behind your back! 

This will sound salesy, but this is where the interplay of data cataloguing, data quality and governance comes into play. If you can get the grassroots data quality published into your catalogue and governance tooling, it means you can answer questions like: for this kind of analysis, is the quality good enough? For example, it may be good enough for me to make an informed decision, but not regulatory-reporting ready. But in order to be able to make those decisions, you need the tools, the rules, the weights, the DQ fundamentals.

Thanks for your time and your insights Alex!


Alex Brown is the Chief Technology Officer at Datactics. For more insights from Datactics, find us on LinkedIn, Twitter or Facebook.

Key Features a Self-Service DQ Platform Should Have
https://www.datactics.com/blog/self-service-data-quality/key-features-a-self-service-dq-platform-should-have/ | 14 January 2022


The drivers and benefits of a holistic, self-service data quality platform | Part 2

To enable the evolution towards actionable insight from data, D&A platforms and processes must evolve too. At the core of this evolution is the establishment of ‘self-service’ data quality – whereby data owners and SMEs have ready access to robust tools and processes to measure and maintain data quality themselves, in accordance with data governance policies. From a business perspective, such a self-service data quality platform must be:

❖ Powerful enough to enable business users and SMEs to perform complex data operations without highly skilled technical assistance from IT
❖ Transparent, accountable and consistent enough to comply with firm-wide data governance policies
❖ Agile enough to quickly onboard new data sets and the changing data quality demands of end consumers such as AI and machine learning algorithms
❖ Flexible and open enough to integrate easily with existing data infrastructure investment without requiring changes to architecture or strategy
❖ Advanced enough to make pragmatic use of AI and machine learning to minimize manual intervention

This goes way beyond the scope of most stand-alone data prep tools and ‘home grown’ solutions that are often used as a tactical one-off measure for a particular data problem. Furthermore, for the self-service data quality platform to truly enable actionable data across the enterprise, it will need to provide some key technical functionality built-in:


• Transparent & Continuous Data Quality Measurement
Not only should it be easy for business users and SMEs to implement large numbers of domain-specific data quality rules, but those rules should also be simple to audit and easy to explain, so that ‘DQ breaks’ can be explored and the root cause of each break established.

In addition to data around the actual breaks, a DQ platform should be able to produce DQ dashboards enabling drill-down from high-level statistics to the actual failing data points, and publish those high-level statistics into data governance systems.
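As a rough illustration of rule-level measurement with drill-down, the toy Python sketch below applies a handful of hypothetical rules to in-memory records and produces both per-rule pass rates (dashboard material) and the failing rows themselves. The rule names, fields and sample data are invented for the example and are not the Datactics rule engine.

```python
# Illustrative only: a toy rule engine in plain Python, not the Datactics
# rule engine. Rule names, fields and sample records are hypothetical.
import re

rules = {
    "lei_is_20_chars": lambda row: len(row.get("lei", "")) == 20,
    "country_not_blank": lambda row: bool(row.get("country", "").strip()),
    "email_looks_valid": lambda row: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", row.get("email", "")) is not None,
}

records = [
    {"lei": "5493001KJTIIGC8Y1R12", "country": "GB", "email": "a.brown@example.com"},
    {"lei": "BAD", "country": "", "email": "not-an-email"},
]

summary, breaks = {}, []
for name, rule in rules.items():
    failures = [r for r in records if not rule(r)]
    summary[name] = 1 - len(failures) / len(records)   # pass rate for the dashboard
    breaks.extend({"rule": name, "record": r} for r in failures)

print(summary)   # high-level statistics to publish into governance tooling
print(breaks)    # the actual failing data points, for root-cause drill-down
```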

• Powerful Data Matching – Entity Resolution for Single View and Data Enrichment
Finding hidden value in data or complying with regulation very often involves joining together several disparate data sets. For example, enhancing a Legal Entity Master Database with an LEI, screening customer accounts against sanctions and PEP lists for KYC, creating a single view of client from multiple data silos for GDPR or FSCS compliance. This goes further than simple deduplication of records or SQL joins – most data sets are messy and don’t have unique identifiers and so fuzzy matching of numerous string fields must be implemented to join one data set with another. Furthermore, efficient clustering algorithms are required to sniff out similar records from other disparate data sets in order to provide a single consolidated view across all silos.
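To make the fuzzy, multi-field idea concrete, here is a minimal sketch using only Python’s standard library difflib; the field weights, the 0.8 threshold and the sample records are assumptions for illustration, and a real entity-resolution pipeline would add blocking, clustering and tuned per-field comparators.

```python
# A toy illustration of multi-field fuzzy matching without unique identifiers,
# using only the standard library. Weights, threshold and records are assumed.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

entity_master = [{"id": 1, "name": "Acme Holdings Limited", "city": "London"}]
incoming = [{"name": "ACME Holdings Ltd", "city": "Londn"}]

for candidate in incoming:
    for entity in entity_master:
        # Weighted fuzzy score across several messy string fields.
        score = 0.7 * similarity(candidate["name"], entity["name"]) \
              + 0.3 * similarity(candidate["city"], entity["city"])
        if score > 0.8:
            print(f"{candidate['name']} -> entity {entity['id']} (score {score:.2f})")
```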

• Integrated Data Remediation Incorporating Machine Learning 
It’s not enough just to flag up broken data; you also need a process and technology for fixing the breaks. Data quality platforms should have this built in so that, after data quality measurement, broken data can be quarantined, data owners alerted, and breaks automatically assigned to the relevant SMEs for remediation. Interestingly, the manual remediation process lends itself very well to machine learning. The process of manually remediating data captures domain-specific knowledge about the data – information that can be readily used by machine learning algorithms to streamline the resolution of similar breaks in the future and thus greatly reduce the overall time and effort spent on manual remediation.

“The process of manually remediating data captures domain specific knowledge about the data – information that can be readily used by machine learning algorithms to streamline the resolution of similar breaks in the future”   
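A hypothetical sketch of how captured remediation decisions could streamline future fixes: it simply proposes the fix applied to the most similar past break. The nearest-neighbour approach, the field encoding and the cutoff are illustrative assumptions, not a description of the Datactics remediation engine.

```python
# Hypothetical: suggest the fix applied to the most similar previously fixed break.
from difflib import get_close_matches

# Decisions captured while SMEs manually remediated earlier breaks.
remediation_history = {
    "CNTRY=UK": "CNTRY=GB",
    "CNTRY=United Kingdm": "CNTRY=GB",
    "CCY=STERLING": "CCY=GBP",
}

def suggest_fix(broken_value: str) -> str | None:
    """Propose a resolution based on the most similar past break, if any."""
    similar = get_close_matches(broken_value, remediation_history, n=1, cutoff=0.6)
    return remediation_history[similar[0]] if similar else None

print(suggest_fix("CNTRY=Untied Kingdom"))  # -> "CNTRY=GB", learned from history
```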

• Data Access Controls Across Teams and Datasets 
Almost any medium to large sized organization will have various forms of sensitive data, and policies for sharing that data within the organization e.g. ‘Chinese walls’ between one department and another. In order to enable integration across teams and disparate silos of data, granular access controls are required – especially inside the data remediation technology where sensitive data may be displayed to users. Data access permissions should be set automatically where possible (e.g. inheriting Active Directory attributes) and enforced when displaying data, for example by row- and field-level access control, and using data masking or obfuscation where appropriate. 
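The sketch below shows the row- and field-level pattern in miniature: a role determines both which records are visible and which fields are masked before display. The roles, attributes and masking policy are invented for the example; in practice these would be inherited from something like Active Directory.

```python
# Sketch of row- and field-level access control with masking; roles, the
# region attribute and the masking policy are invented for illustration.
def mask(value: str) -> str:
    return value[:2] + "*" * max(len(value) - 2, 0)

PERMISSIONS = {
    # role -> (row-level filter, fields that must be masked on display)
    "kyc_analyst": (lambda row: row["region"] == "EMEA", set()),
    "dq_steward":  (lambda row: True, {"national_id", "date_of_birth"}),
}

def view(records, role):
    row_filter, masked_fields = PERMISSIONS[role]
    for row in records:
        if not row_filter(row):
            continue  # row-level control: the record is not shown at all
        yield {k: (mask(str(v)) if k in masked_fields else v) for k, v in row.items()}

records = [{"name": "Jo Bloggs", "region": "APAC", "national_id": "AB123456C"}]
print(list(view(records, "dq_steward")))   # national_id displayed as "AB*******"
print(list(view(records, "kyc_analyst")))  # [] - APAC rows hidden from this role
```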

• Audit Trails, Assigning and Tracking Performance
Providing business users with tools to fix data could cause additional headaches when it comes to understanding who did what, when, why and whether or not it was the right thing to do. It stands to reason, therefore, that any remediation tool should have built-in capability to do just that, with the associated performance of data break remediation measured, tracked and managed.
• AI Ready
There’s no doubt that one of the biggest drivers of data quality is AI. AI data scientists can spend up to 80% of their time just preparing input data for machine learning algorithms, which is a huge waste of their expertise. A self-service data quality platform can address many of these data quality issues by providing ready access to tools and processes that ensure a base level of quality and identify anomalies in data that may skew machine learning models. Furthermore, the same self-service data quality tools can assist data scientists in generating metadata that can be used to inform machine learning models – such ‘Feature Engineering’ can be of real value when the data set is largely textual, as it can generate numerical indicators which are more readily consumed by ML algorithms.

“AI data scientists can spend up to 80% of their time just preparing input data for machine learning algorithms, which is a huge waste of their expertise”
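As a small, hypothetical example of the ‘Feature Engineering’ point, the function below turns a free-text field into a handful of numerical indicators an ML model can consume directly; the chosen features and the UK-style postcode pattern are arbitrary illustrations.

```python
# Hypothetical feature engineering on a largely textual field: derive numerical
# indicators that are more readily consumed by ML algorithms.
import re

def text_features(value: str) -> dict:
    tokens = value.split()
    return {
        "length": len(value),
        "token_count": len(tokens),
        "digit_ratio": sum(c.isdigit() for c in value) / max(len(value), 1),
        "is_all_caps": value.isupper(),
        "has_postcode_shape": bool(re.search(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b", value)),
    }

print(text_features("12 Acme House, BT1 2AB, Belfast"))
```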

To have further conversations about the drivers and benefits of a Self-Service Data Quality platform, please book a quick call with Kieran Seaward.    

And for more from Datactics, find us on LinkedIn, Twitter, or Facebook.

The Changing Landscape of Data Quality
https://www.datactics.com/blog/self-service-data-quality/the-changing-landscape-of-data-quality/ | 13 January 2022


The drivers and benefits of a holistic, self-service data quality platform | Part 1

Change

There has been increasing demand for higher and higher data quality in recent years – highly regulated sectors, such as banking, have had a tsunami of financial regulations such as BCBS239, MiFID, FATCA and many more, stipulating or implying exacting standards for data and data processes. Meanwhile, there is a growing trend for firms to become more Data and Analytics (D&A) driven, taking inspiration from Google and Facebook to monetize their data assets.

This increased focus on D&A has been accelerated by easier and lower-cost access to artificial intelligence (AI), machine learning (ML), and business intelligence (BI) visualization technologies. However, in the now-waning hype of these technologies comes the pragmatic realization that unless there is a foundation of good quality reliable data, insights derived from AI and analytics may not be actionable. With AI and ML becoming more of a commodity, and a level playing field, the differentiator is in the data and the quality of the data.

“Unless there is a foundation of good quality reliable data, insights derived from AI and analytics may not be actionable”

Problems 

As the urgency for regulatory compliance or competitive advantage escalates, so too does the urgency for high data quality. A significant obstacle to quickly achieving high data quality is the variety of disciplines required to measure data quality, enrich data and fix data. By its nature, digital data, especially big data, can require significant technical skills to manipulate, and for this reason was once the sole responsibility of IT functions within an organization. However, maintaining data also requires significant domain knowledge about the content of the data, and this domain knowledge resides with the subject matter experts (SMEs) who use the data, rather than a central IT function. Furthermore, each data set has its own SMEs with the special domain knowledge required to maintain it, and the number of data sets is growing and changing rapidly. If a central IT department is to maintain the quality of data correctly, it must therefore liaise with an increasingly large number of data owners and SMEs in order to implement the DQ controls and remediation required. These demands create a huge drain on IT resources and a slow-moving backlog of data quality change requests within IT that simply can’t keep up.

Given the explosion in data volumes, this model clearly won’t scale, and so there is now a growing trend to move data quality operations away from central IT and back into the hands of data owners. While this move can greatly accelerate data quality and data onboarding processes, it can be difficult and expensive for data owners and SMEs to meet the technical challenges of maintaining and onboarding data. Furthermore, unless there is common governance around data quality across all data domains, there is a risk of a ‘wild west’ scenario, where every department manages data quality differently, with different processes and technology.

Opportunity

The application of data governance policies and the creation of an accountable Chief Data Officer (CDO) go a long way towards mitigating the ‘wild west’ scenario. Data quality standards such as the Enterprise Data Management Council’s (EDMC) Data Capability Assessment Model (DCAM) provide opportunities to establish consistency in data quality measurement across the board.

The drive to capitalize on data assets for competitive advantage has meant that the CDO function is quickly moving from an operational cost centre towards a product-centric profit centre. A Gartner publication (30th July 2019) describes three generations of CDO: “CDO 1.0” focused on data management; “CDO 2.0” embraced analytics; “CDO 3.0” assisted digital transformation; and Gartner now predicts a fourth, “CDO 4.0”, focused on monetizing data-oriented products. Gartner’s research suggests that to enable this evolution, companies should strive to develop data and analytics platforms that scale across the entire company, and this implies data quality platforms that scale too.

To have further conversations about the drivers and benefits of a Self-Service Data Quality platform, book a quick call with Kieran Seaward.    

And for more from Datactics, find us on LinkedIn, Twitter, or Facebook.

The Three Pillars of AI
https://www.datactics.com/blog/cto-vision/cto-vision-the-3-pillars-of-successful-production-ai/ | 4 September 2020

Recent incidents involving AI algorithms have hit the headlines, leading many to question their worth.

In this article, CTO Alex Brown outlines the three pillars of AI and looks at how they each play a part in implementing AI in production.


As many who work within computer science will know, many Artificial Intelligence (AI) projects fail to make the crucial transition from experiment to production, for a wide range of reasons. In many cases, the triple investment of money, training and time is deemed too big a risk to take; in others, there are fears that initial AI and machine learning models might not scale, or that they will be viewed as too experimental to be used by internal or external customers.

Pillars

In many cases, it can also be due to a lack of data, the suitability of data, and the quality of data. But even if your data’s of the right quality and your experimental model is good, your digital transformation journey is far from over – you still have a long way to go before you can use that AI in production! 

From all the work we at Datactics have been undertaking in AI development, it’s clear to us that there are 3 critical features your AI system will need:  

Explainability


Two or three years ago, when more AI technologies and intelligent systems were emerging, no one talked about explainability – the ability to explain why an algorithm or model made a decision or set of decisions.

Today it’s a hot topic in data science and discussions around deep learning. The use of opaque ‘black box’ solutions has been widely criticised, both for a lack of transparency and also for the possible biases inherited by the algorithms that are subject to human prejudices in the training data. 

Many recent cases have shown how this can lead to fragmented and unfair decisions being made.  

Explainable AI, or “XAI”, is fast becoming a prerequisite for many AI projects, especially in government, policing, and regulated industries such as healthcare and banking that handle huge amounts of data.

In these business areas, the demand for explainability is understandably high. Explainability is vital for decision-making, predictions, risk management, and policymaking.

Predictions are a delicate topic of discussion, as any mistakes made can have major implications.

AI models in healthcare

As an example in healthcare, if an AI algorithm isn’t trained adequately with the correct data, we can’t be sure that it will be able to diagnose a patient properly.

Therefore, curating the training data set and ensuring that the data entering it is bias-free has never been more important.

Furthermore, XAI is not just for data scientists, but also for non-technical business specialists.

It stands to reason that a business user should be able to obtain and understand, from a business perspective, why a predictive model made a particular prediction, and that a data scientist should be able to understand the behaviour of the model in as much detail as possible.

Monitoring  

Closely related to XAI is the need to closely monitor AI model performance. Just as children may be periodically tested at school to ensure their learning is progressing, so too do AI models need to be monitored to detect “model drift” – where predictions become increasingly incorrect over time in unforeseen ways. Various concept drift and data drift detection and handling schemes may be helpful, depending on the situation.

Often, if longer-term patterns are understood as being systemic, they can be identified and managed.



Concept drift is often prominent in supervised learning problems where predictions are developed and collated over time. Like many things, drift isn’t something to be feared but rather measured and monitored, to ensure firstly that we have confidence in the model and the predictions it is making, and secondly that we can report to senior executives on the level of risk associated with using the model.
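A minimal sketch of the monitoring idea, assuming confirmed outcomes trickle in after predictions are made: compare recent accuracy against a baseline window and raise a flag when the gap exceeds a tolerance. The window sizes, the tolerance and the accuracy-only signal are simplifying assumptions; real drift detection would also watch the input data distribution.

```python
# Minimal drift check over a stream of confirmed prediction outcomes.
def drift_alert(outcomes: list[bool], baseline=500, recent=100, tolerance=0.05) -> bool:
    """outcomes[i] is True when prediction i was later confirmed correct."""
    if len(outcomes) < baseline + recent:
        return False                              # not enough history yet
    baseline_acc = sum(outcomes[:baseline]) / baseline
    recent_acc = sum(outcomes[-recent:]) / recent
    return (baseline_acc - recent_acc) > tolerance

history = [True] * 550 + [False] * 50             # accuracy has recently dropped
print(drift_alert(history))                       # -> True: investigate and report the risk
```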

Retraining  

Many AI solutions come with ‘out of the box’ pre-trained models, which can theoretically make it quicker to deploy into production.


However, it is important to understand that there isn’t a “one-size fits all” when it comes to AI, and that some customisation is going to be necessary to ensure that predictions being made fit your business purposes. 

In many cases, though, these models may not be well suited to your data. The vendor will have trained the models on data sets that may look quite different to your particular data, and so they may behave differently.

Again, this highlights the importance of monitoring and explainability, but also the importance of being able to adapt a pre-trained model to your specific data in order to achieve strong results.

To this end, vendors supplying pre-trained models should provide facilities for the customer to collect new training data and retrain the off-the-shelf model.

An important consequence of this is that such AI frameworks need the ability to roll back to previous versions of a model in case of problems, and to version-control both models and training data so that a weaker model can be caught and reverted.
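A toy sketch of that consequence: a registry that version-controls the model artefact together with the training data it was built from, and can step back a version when a retrain misbehaves. The file paths and in-memory storage are purely illustrative assumptions, not a specific product feature.

```python
# Toy model registry: version control and rollback of model plus training data.
class ModelRegistry:
    def __init__(self):
        self.versions = []            # list of (model_path, training_data_path)
        self.active = None

    def register(self, model_path: str, training_data_path: str) -> None:
        self.versions.append((model_path, training_data_path))
        self.active = len(self.versions) - 1

    def rollback(self):
        """Step back one version, e.g. after drift or a bad retrain."""
        if self.active is not None and self.active > 0:
            self.active -= 1
        return self.versions[self.active]

registry = ModelRegistry()
registry.register("models/matcher_v1.bin", "data/train_v1.csv")
registry.register("models/matcher_v2.bin", "data/train_v2.csv")
print(registry.rollback())            # -> ('models/matcher_v1.bin', 'data/train_v1.csv')
```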

To conclude our three pillars of AI, the route to getting AI into production is built on being able to explain it, including: 

  • The decisions baked-into the model, including why certain data was selected or omitted
  • How much the model is deviating from expectations, and why
  • How often, how and why the model has been retrained, and whether or not it should be rolled back to a previous version

For more on this subject, read up on my colleague Fiona Browne’s work, including a recent piece on Explainable AI, which can be found here.

No-code & Lo-code: A Lighter Way To Enjoy Tech?
https://www.datactics.com/blog/cto-vision/no-code-lo-code/ | 12 June 2020


In this article with Datactics CTO Alex Brown, Matt Flenley asks about the nature of no-code and lo-code platforms like Datactics’ Self-Service Data Quality, and whether they really are a lighter way to enjoy technology? 

The lo-code no-code paradigm can be a bit like Marmite. Some people say that it’s great and it gets the job done – these are usually the business subject matter experts who are used to Excel, especially in banks and large government organisations where that’s the standard data handling tool in use. Technical people, such as software developers who are fluent in programming languages and disciplines, look on aghast at these blocks of functionality being chained together in macro-enabled workbooks, because they quickly evolve to become monsters. These monsters become very expensive, if not impossible, to maintain when, inevitably, changes are required to support a change in the development environment and data formats.

The perfect combination for these technical people is something that fits in with the IT rigour around release schedules, documentation, and testing – and just good practices in how you build stuff, making them robust and reusable.

Creating applications that are well tested and can be reused in other projects makes new projects quicker and easier, with a more stable product at the end. This modular approach is how the Datactics self-service platform has been built: reusable components that can be recycled and customised for rapid, low-risk development and deployment within a user-friendly lo-code interface.

From a business point of view, the driving force behind the lo-code/no-code approach is a tactical way to address specific problems, where the existing infrastructure isn’t delivering what the business needs but the business users aren’t technical coders. For example, a bank or financial firm might need to capture an additional piece of information to meet a regulatory requirement. They might design and provide a webform or something similar that captures and relays the data into a datastore, and then into the firm’s regulatory reporting framework. This all plays a part in developing efficient business process management.

This is where no/lo-code comes in as it allows you to do this kind of thing very quickly – those kinds of ad-hoc changes you might need to do to meet a specific deadline or requirement. 

The demand for this will only increase in a post-COVID-19 environment. For instance, one of our clients mentioned that at the start of the UK lockdown phase they needed to rapidly understand the state of the email addresses for all the customers to whom they’d usually write by post. Their data team of professional developers had rules built in under two hours, and a fully operational interactive dashboard a day later that their Risk committee could use to review and track data quality issues and how quickly they were being fixed.

Our Self-Service Data Quality platform, for example, is easily used to address the tactical need for data quality or matching without writing any code, or waiting for central IT to run queries. You’ve all the drag & drop capability to build rules, data pipelines, matching algorithms and so on without the need for writing any code, allowing you to do a specific job really quite quickly. Platforms like this are extremely good at these tactical use cases where you don’t want to rip out and rewrite your existing infrastructure, you just need to do this little add-on job to make it complete to meet a regulatory reporting requirement or specific business requirement. 

Because our platform doesn’t force you to use a particular persistence layer or anything like that, it’s all API-driven and sits on whatever Master Data Management platform that you have, it makes it a really flexible tool that is well-suited to these tactical use cases.

This means that the total cost of ownership for firms is far lower because lo-code platforms offer a wide range of extensibility to multiple downstream use cases. Things like regulatory compliance, emerging risks, custom data matching or even migration projects are the perfect situations where one self-service platform can be leveraged for all these things without causing huge delays in IT ticketing processes, or multiple conflicting requests hitting the central IT team all at once. 

Ultimately, lo- or no-code solutions are likely to thrive as business teams discover that they can use the firm’s data assets themselves for faster results, without tying their IT teams up in knots.

Beyond data prep – Whitepaper SSDQ
https://www.datactics.com/blog/cto-vision/cto-vision-beyond-data-prep-whitepaper-ssdq/ | 23 April 2020


As featured in the recent A-Team webinar, we’ve been strong advocates of a self-service approach to data quality (SSDQ), especially when it comes to regulated data types and wide-ranging demands on a firm’s data assets.

This SSDQ whitepaper, authored by our CTO Alex Brown, goes deeper into the reasons why this approach is so much in demand and explores the functionality that a fully self-service environment needs in order to equip business users with rapid access to high-quality data.

In this Self-Service Data Quality whitepaper, we describe trends and technologies bringing data quality functions closer to the data. Self Service Data Quality democratizes data, moving responsibility and control from central IT functions to data teams and SMEs. As a result, greater operational efficiency and higher value data assets can be achieved. 

Download our SSDQ whitepaper here. For more information on our user-friendly Self-Service Data Quality platform, take a look at our page here.

 

The Changing Landscape of Data Quality – There has been increasing demand for higher data quality in recent years. Highly regulated sectors dealing with personal data, such as banking, have had a tsunami of financial regulations such as BCBS239, MiFID, FATCA and many more stipulating or implying exacting standards for data and data processes.

Meanwhile, there is a growing trend for more and more firms to become more Data and Analytics (D&A) driven, taking inspiration from Google & Facebook, to monetize their data assets. This increased focus on D&A has been accelerated by easier and lower-cost access to artificial intelligence (AI), machine learning (ML) and business intelligence (BI) visualization technologies.

However, in the now-waning hype of these tools and technologies comes the pragmatic realization that unless there is a foundation of good-quality, reliable data and efficient data preparation, insights derived from AI and analytics may not be actionable. This is where having a modern data management framework is crucial, prompting organisations to look at how they are approaching data governance and data quality.

With AI and ML becoming more of a commodity, and a level playing field, the differentiator is in the data and the quality of the data… To read more see the whitepaper above.

Click here for more thought leadership pieces from our industry experts at Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

Best Practices For Creating a Data Quality Framework
https://www.datactics.com/blog/cto-vision/webinar-best-practices-for-creating-data-quality-framework/ | 8 April 2020

Chief Technology Officer Alex Brown featured as a panellist in Data Management Insight’s webinar discussing the best practices for creating a data quality framework within your organisation.


What is the problem?

A-Team Insight outlines that ‘bad data affects time, cost and customer service, cripples decision making and reduces firms’ ability to manage data and comply with regulations’.

With so much at stake, how can financial services organisations improve the accuracy, completeness and timeliness of their data in order to improve business processes?

What approaches and technologies are available to ensure data quality meets regulatory requirements as well as their own data quality objectives?

This webinar discusses how to establish a data framework and how to develop metrics to measure data quality. It also explores experiences of rolling out data quality enterprise-wide and resolving data quality issues. It will examine fixing data quality problems in real-time and how dashboards and data quality remediation tools can help. Lastly, it will explore new approaches to improving data quality using AI, Machine Learning, NLP and text analytics tools and techniques.

The topics focused on:

  • Limitations associated with an ad-hoc approach
  • Where to start, the lessons learned and how to roll out a comprehensive data quality solution
  • How to establish a business focus on data quality and developing effective data quality metrics (aligning with data quality dimensions) 
  • Using new and emerging technologies to improve data quality and automate data quality processes
  • Best practices for creating a Data Quality Framework 

We caught up with Alex to ask him a few questions on how he thought the webinar had gone, whether it had changed or backed up his views, and where we can hear from him next…

Firstly, I thought the webinar was extremely well run, with an audience of well over 300 tuning in on the day.

The biggest takeaway for me was that it confirmed a lot of the narrative we’re hearing about the middle way between two models of data quality management – a centralised, highly-controlled but slow model of IT owning and running all data processes, and the “Wild West” where everyone does their own thing in an agile but disconnected way. Both sides have benefits and pitfalls, and the webinar really brought out a lot of those themes in a set of useful practical examples. It was well worth a listen as the session took a deep dive into establishing a data quality framework, looking at things like data profiling, data cleansing and data quality rules. 

Next up from me will be a whitepaper on this subject which we’ll be releasing really soon; there’ll be more blogs from me over at Datactics.com; and finally, I’m also looking forward to the Virtual Data Management Summit, as CEO Stuart Harvey’s got some interesting insight into DataOps to share.

Missed the webinar? Not to worry, you can listen to the full recording here

Click here for more from Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

Why you should read the European Banking Authority report on AI and Big Data
https://www.datactics.com/blog/cto-vision/why-you-should-read-the-european-banking-authority-report-on-ai-and-big-data/ | 13 February 2020


You might have missed this highly informative report from the European Banking Authority (EBA) – because the title didn’t contain the popular buzzwords Artificial Intelligence (AI) or Machine Learning (ML), nor does the front cover have a picture of a robot!

But for anyone who is trying to understand the challenges ahead for AI and broader data management in banking I think this report provides a rare unbiased, concise and highly educational deep dive into pretty much all of the key topics involved. I won’t give a synopsis here, just some reasons why I think you should read it:

It’s really all about AI in Banking!

‘Advanced Analytics’ is the term the authors use for AI and ML technology.

BS Free

Provides most of the background you need to see through the smoke, mirrors and hype surrounding AI or Advanced Analytics.

It’s a great introduction

But not dumbed down – it’s great for business people who need a better understanding of the challenges their data scientists and AI professionals face, and great for data scientists who need to understand the broader applications and implications of this rapidly emerging technology in banking. If you don’t know what kind of algorithm might be used for a particular business case, this is for you. If you are trying to understand what a data scientist means by accuracy or a confusion matrix, this is for you too.

Technologically Neutral

The report maintains technological neutrality. With so much information these days coming from vendors of proprietary tech, in a world where there are few common open standards, it’s hard to find information that doesn’t in some way imply vendor lock-in.

Holistic

This report covers pretty much everything, including data quality, different types of ML, explainability and interpretability, and ethics. So many reports are very narrow, focusing on one use case or technology, but this takes the whole horizon into account.

Pragmatic

It describes practical use cases for AI and the technology involved – I was particularly impressed with the technical content: accurate, concise and easy to understand. More importantly, it also describes all the potential problems – things like how automated credit scoring could be ‘gamed’ by an institution’s sales staff, who could coach uncreditworthy customers on how to be granted a loan!

Forward-thinking

The European Banking Authority covers the topics of ethics in AI and even security in AI. Ethics has obviously been talked about a lot in recent months (sometimes with slightly fanciful references to Asimov’s laws of robotics!), but this report lays out some really good practical steps that need to be implemented to ensure ML solutions are fair. It’s also refreshing to see serious consideration given to security (data poisoning, adversarial attacks, model stealing), something I blogged about a couple of years ago. It’s a bit like the old days of software development, when people didn’t really take things like SQL injection or cross-site scripting seriously, resulting in security breaches in many applications and websites. If AI solutions aren’t built with security from the ground up, the next few years could see echoes of these past security breaches played out in the AI domain.

You can get the report here

Click here for more from Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

How Datactics helps Santa with his Data Quality issues
https://www.datactics.com/blog/cto-vision/how-datactics-helps-santa-with-his-data-quality-issues/ | 19 December 2019

Yes of course Santa has data quality issues! Everyone has data quality issues.

In this article, we outline how Datactics software can help Santa improve the efficiency of his pre-Christmas operations and have a stress free Christmas Eve delivery and a relaxing Christmas Day.

Data Quality Firewall, REST API, Data Quality Remediation

Datactics provides a REST API interface and “Data Quality Firewall” to allow the import of data from Optical Character Recognition (OCR) software that has scanned the children’s letters, and to guarantee the quality of data entering the data store. Records passing DQ criteria are automatically allowed through to Santa, while records failing DQ checks are quarantined, where they can be reviewed interactively by Santa’s Elves in the Data Quality Clinic.

Oh dear! Did Ellie ask for a Barbie House or a Barbie Horse? Not to worry – the record is in quarantine and will be reviewed by an Elf who perhaps knows Ellie and can find out what she wanted, or can check against additional data sources like the latest online toy catalogues to discover what the possible matches might be. This saves the elves significant time, as they only have to review a smaller set of records, making the busiest time of the year far less stressful for all at the North Pole!
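For illustration only, here is the pass-or-quarantine pattern in a few lines of Python; the toy catalogue, field names and rules are invented for the Santa example and stand in for the REST API and Data Quality Clinic described above.

```python
# Illustration of the pass-or-quarantine pattern behind a "Data Quality Firewall".
TOY_CATALOGUE = {"Barbie House", "Barbie Horse", "Train Set"}

def dq_check(record: dict) -> list[str]:
    problems = []
    if record.get("gift") not in TOY_CATALOGUE:
        problems.append("gift not found in catalogue (possible OCR error)")
    if not record.get("address"):
        problems.append("missing delivery address")
    return problems

delivery_list, quarantine = [], []
for record in [{"child": "Ellie", "gift": "Barbie Hou5e", "address": "12 Elm St"}]:
    issues = dq_check(record)
    (quarantine if issues else delivery_list).append({**record, "issues": issues})

print(quarantine)       # records awaiting an elf in the Data Quality Clinic
print(delivery_list)    # records that flow straight through to Santa
```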

SVC – Single View of Child

Managing vast quantities of historical Personally Identifiable Information (PII) on his data servers in Lapland is a difficult task, but Datactics can help create a Single View of Child from the disparate data silos, normalising the data and creating a golden record for each child. This ensures that presents aren’t duplicated and more importantly keeps him compliant with GDPR.

Address Validation

The last thing Santa wants on Christmas Eve when he’s delivering to a few billion houses is to go to the wrong address, it wastes time and risks a potential present mix up. Fortunately, Datactics makes it easy to validate the children’s addresses against databases such as the Post Code Address File (PAF) and Capscan so Santa knows he’s going to the right place before he sets out.

Screening Against the Naughty List

This is not as simple as it may sound, because you have to get it right or someone is going to be very upset. But using the established techniques Datactics has developed for KYC & AML screening against Politically Exposed Persons and the like, Santa can screen against The Naughty List with confidence.

Excitingly, it’s not only Santa who can screen against this list: everyone can try the naughty list screening for free at https://aml-screening.datactics.com/

Merry Christmas from everyone at Datactics!

Click here for more from Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

Transliteration matching in Japanese, Chinese, Russian, Arabic and all non-Latin data sets
https://www.datactics.com/blog/cto-vision/transliteration-matching/ | 25 October 2019


Here at Datactics, we’ve recently done a number of Transliteration matching tasks helping people with Japanese, Russian Cyrillic and Arabic data sets.

Transliteration matching can seem challenging, especially when presented with text that you don’t understand, but with the right techniques a lot can be achieved – the key is to really understand the problems and have some proven techniques for dealing with them:

  • Transliteration – Matching data within a single character set

We have a long-standing Chinese customer who routinely matches data sets of hundreds of millions of customer records, all in Chinese. Even with messy data this is a relatively straightforward task: as long as your matching algorithms can handle Unicode properly, fuzzy matching within a single character set (e.g. a Chinese customer database to a Chinese marketing database) is very similar to the same task in a Roman character set, albeit with some tweaks to fuzzy match tolerances.
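A minimal sketch of the point, using Python’s standard library: with strings handled as Unicode, fuzzy matching two slightly different Chinese renderings of the same company works just like the Latin-script case. The sample names and the 0.75 tolerance are made up for illustration.

```python
# Fuzzy matching within a single (non-Latin) character set; threshold is illustrative.
from difflib import SequenceMatcher

def ratio(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

customer_db = ["北京创新科技有限公司"]
marketing_db = ["北京创新科技公司"]        # same company, shorter legal form

for a in customer_db:
    for b in marketing_db:
        score = ratio(a, b)
        if score > 0.75:                   # tolerance tweaked for CJK strings
            print(f"Probable match: {a} <-> {b} ({score:.2f})")
```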

  • Frequency Analysis

Another very useful technique is to perform frequency analysis on the input text to help identify ‘noise text’, such as company legal forms within company names, that can either be eliminated from the match or be matched with lower importance than the rest of a company name. For example, frequency analysis on a Japanese entity master database may reveal a large number of company names containing the Kanji “株式会社” – the Japanese equivalent of ‘Limited’ (or ‘Ltd.’ in abbreviated form). The beauty of this technique is that it can be applied to any language or character set.
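Here is one way the frequency analysis could be sketched in Python, counting recurring substrings across a name column to surface noise-text candidates such as legal forms; the sample names, the 4-character window and the 75% threshold are all assumptions for the example.

```python
# Frequency analysis over a name column to surface 'noise text' candidates.
from collections import Counter

names = ["トヨタ自動車株式会社", "ソニーグループ株式会社", "任天堂株式会社", "楽天グループ株式会社"]

counts = Counter(name[i:i + 4] for name in names for i in range(len(name) - 3))
noise_candidates = [s for s, c in counts.items() if c / len(names) >= 0.75]
print(noise_candidates)   # ['株式会社'] -> strip it, or match it with lower weight
```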

  • Matching between character sets using Transliteration, fuzzy and phonetic matching

A common requirement in the AML/KYC space is matching account names in Chinese, Japanese, Cyrillic and so on against sanctions and PEP lists, which are usually published in Latin script. In order to do this, a process called ‘transliteration’ is required. Transliteration converts text in one character set to another, but the results from raw transliteration are not always usable, since the transliterated text is often more of a ‘pronunciation guide’ than how a native speaker would write the text in Latin script. However, by using a combination of fuzzy and phonetic matching on the transliterated string, it is possible to obtain very accurate matching.
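A hedged sketch of transliterate-then-match: it assumes the third-party unidecode package (pip install unidecode) as a stand-in for a full transliteration engine, and a plain string ratio in place of combined fuzzy and phonetic scoring; the names and the 0.85 threshold are illustrative.

```python
# Transliterate Cyrillic source data to rough Latin, then fuzzy-match against
# a Latin-script watchlist entry. unidecode is an assumed stand-in library.
from difflib import SequenceMatcher
from unidecode import unidecode

def screen(account_name: str, watchlist_entry: str) -> float:
    transliterated = unidecode(account_name).lower()   # e.g. Cyrillic -> rough Latin
    return SequenceMatcher(None, transliterated, watchlist_entry.lower()).ratio()

account = "Сергей Иванов"          # Cyrillic source data
listed = "Sergey Ivanov"           # Latin-script list entry, different romanisation

score = screen(account, listed)
if score > 0.85:
    print(f"Potential hit: {unidecode(account)} ~ {listed} ({score:.2f})")
```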

If you’d like to try this out for yourself, Cyrillic transliteration is built into our free AML screening demonstration app. You can register and find out more here or click below to see it in action:
