Artificial Intelligence Archives - Datactics
https://www.datactics.com/tag/artificial-intelligence/

Datactics demonstrates rapid matching capabilities on open datasets
https://www.datactics.com/blog/ai-ml/datactics-demonstrates-rapid-matching-capabilities-on-open-datasets/
Fri, 17 Dec 2021

The post Datactics demonstrates rapid matching capabilities on open datasets appeared first on Datactics.


This blog from Fiona Browne, Head of Software Development & AI at Datactics, covers the subject of matching data across open datasets, a project for which the firm secured Innovate UK funding.  

The Rapid Match project addresses the complexity of integrating and matching data at scale, providing a platform for reproducible data pipelines for current and post-COVID analysis.

The project provides a generalised framework for data quality, preparation, and matching that is easy to use and reproducible, supporting the integration and merging of diverse datasets at scale.

We highlighted this capability through a use case: identifying financial risk across regions in the UK. Using the Datactics platform, we undertook data quality, preparation, and matching tasks to integrate diverse UK Office for National Statistics (ONS) and UK Companies House (CH) datasets, providing a view of regional funding, sectors, and the impact of COVID.


COVID-19-related datasets are being generated at speed and volume, from governmental sources such as the ONS and local authorities, through open data, to third-party datasets. Value comes from integrating these data to provide a view of a particular problem area, such as fraud detection. British banks are estimated to have lent about £68 billion through a trio of loan programmes, with repayments backstopped by the Government. Concerns have been raised about the risk of fraud: one estimate found that defaults and fraud in the Bounce Back programme for small businesses could reach 80% in the worst case.

Why?  

Institutions and governments need rapid access to high-quality data to inform decision-making. That means data must be accurate, complete, up to date, and obtained in a timely fashion. These data are generated at speed and volume, and value comes from integrating them, which is often a tricky and time-consuming process. Furthermore, the processes used to do so are often fragmented, ad hoc, non-systematic, brittle, and difficult to reproduce and maintain.

What?  

The Rapid Match project addressed the challenges around data quality and matching at scale through a systematic process that joins large amounts of messy, incomplete data in varying formats from multiple sources. We provide a reliable ‘match engine’ allowing governments and organisations to integrate diverse sources of data accurately and securely.
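To illustrate what a match engine does, the sketch below joins company records on fuzzy name similarity. It is a minimal toy using only Python's standard library, not the Datactics engine itself; the normalisation steps and the 0.85 threshold are assumptions for the example.

```python
# Toy fuzzy "match engine" sketch using only the standard library.
# Normalisation rules and the 0.85 threshold are illustrative assumptions.
from difflib import SequenceMatcher

def normalise(name):
    """Lower-case a company name and strip common legal suffixes."""
    name = name.lower().strip()
    for suffix in (" limited", " ltd", " plc", " llp"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name.strip()

def match_score(a, b):
    """Similarity in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def best_match(name, candidates, threshold=0.85):
    """Return the best-scoring candidate above the threshold, else None."""
    score, candidate = max((match_score(name, c), c) for c in candidates)
    return candidate if score >= threshold else None
```

A real engine would add blocking, multiple match keys, and scoring across several fields; the point here is that normalisation plus a similarity threshold already absorbs minor naming differences between sources.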

A key outcome of the project has been the data quality work applied to the UK Companies House datasets. These datasets serve a wide range of applications, from providing a register of incorporated UK companies to the KYC onboarding and AML checks performed by institutions. It is estimated that “millions of professionals use Companies House data daily”, for example in due diligence to verify ultimate beneficial ownership, or in matching against financial crime and terrorism lists.

What to do next 

If you are considering how to approach your data matching strategies and would like to view the work we carried out, please get in touch with Fiona Browne on LinkedIn.

And for more from Datactics, find us on LinkedIn, Twitter, or Facebook.

Artificial Intelligence can help businesses thrive
https://www.datactics.com/blog/ai-ml/artificial-intelligence-can-help-businesses-thrive/
Thu, 02 Dec 2021

The post Artificial Intelligence can help businesses thrive appeared first on Datactics.


The coronavirus pandemic produced challenges none of us could have expected. While some sense of normality is returning, many businesses still face an uphill battle to recover. Artificial intelligence, however, presents a solution for firms hoping to thrive once again.

Artificial Intelligence (AI) is being used for predictive tasks ranging from fraud detection to medical analytics. A key component of AI is the underlying data, which affects the predictions, scalability, and fairness of AI systems. As we move towards data-centric AI, having good-quality, fair, representative, reliable, and complete data gives firms a strong foundation for decision-making and the knowledge to strengthen their competitive position. In fact, AI can itself be used to improve data quality when applied to tasks such as data labelling and to the accuracy, consistency, and completeness of data.
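Data-quality dimensions such as completeness can be measured very directly. As a minimal sketch (the set of "missing" markers below is an assumption for the example, not a standard definition), completeness of a column might be computed like this:

```python
# Sketch: measuring column completeness. The set of "missing" markers
# is an assumption for the example, not a standard definition.
def completeness(values):
    """Fraction of entries in a column that are not missing."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v in (None, "", "N/A"))
    return 1 - missing / len(values)
```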

AI can help businesses not only improve and integrate data; it can also help them grow through cost reduction and profit enhancement by reducing manual tasks. Gartner has predicted that the business value created by AI will reach $3.9 trillion in 2022.

Businesses thrive with AI. It can automate financial forecasting, giving firms greater visibility of their future finances and in turn empowering business owners to make better decisions and take action to achieve their goals.

A key challenge for organisations is understanding the business objectives of deploying AI solutions: moving away from using AI for technology's sake towards an awareness of what is feasible and of how AI can be harnessed to address those objectives. Without this, businesses struggle to understand the benefits AI can bring to their organisation.

The perceived lack of access to technology, and the need for copious amounts of data to train machine learning models, are other stumbling blocks. We must bust the myth that AI is hard to access: open-source projects such as TensorFlow, along with services like Microsoft Azure ML and Amazon SageMaker, are simplifying the process of building, deploying, and monitoring machine learning models in production. Most companies are unaware of this, or of how to take advantage of AI's cost-effective nature.

Even though accessing the technology is easy, using it is less so. Vendors are investing heavily in making the technology more accessible to non-expert users and have made great strides in doing so.

That is why the upcoming AI Con Conference on 3 December at Titanic Belfast is so important. It gives us the perfect opportunity to discuss the benefits of AI for local firms.

Bringing together business leaders with world-leading technology professionals, AI Con will examine how artificial intelligence is changing our world and the opportunities and challenges it presents.

The themes for this year’s conference, which hosted 450 attendees in its first year and 800 in a virtual format last year, include Applied AI, AI Next and the Business of AI. These are designed for a general audience, tech audience and business audience respectively, and encompass everything from how AI can add value to organisations to what start-ups in the space should know.

The importance of AI cannot be disputed. AI Con will provide us with an opportunity to showcase the very best of AI. With Belfast now being a recognised tech hub, AI Con provides the perfect opportunity to foster debate and discussion around the benefits AI provides for business. Engagement with key business leaders and organisations is an essential part of that.

To find out more about this year’s AI Con, visit the conference website.

And for more from Datactics, find us on LinkedIn, Twitter, or Facebook.

Rules Suggestion – What is it and how can it help in the pursuit of improving data quality?
https://www.datactics.com/blog/ai-ml/rules-suggestion-what-is-it-and-how-can-it-help-improve-data-quality/
Wed, 15 Sep 2021

The post Rules Suggestion – What is it and how can it help in the pursuit of improving data quality?   appeared first on Datactics.

Written by Daniel Browne, Machine Learning Engineer

Defining data quality rules and collections of rules for data quality projects is often a manual, time-consuming process. It typically involves a subject matter expert reviewing data sources and designing quality rules to ensure the data complies with integrity, accuracy, and/or regulatory standards. As data sources increase in volume and variety, with potential functional dependencies among them, the task of defining data quality rules becomes more difficult. Machine learning can aid with this task by identifying dependencies between datasets, uncovering patterns related to data quality, and suggesting previously applied rules for similar data.

At Datactics, we recently undertook a Rule Suggestion project to automate the process of defining data quality rules for datasets. We use natural language processing techniques to analyse the contents of a dataset and suggest the rules in our rule library that best fit each column.

Problem Area and ML Solution  

Generally, there are several data quality and data cleansing rules that you would typically want to apply to certain fields in a dataset. An example is a consistency check on a phone number column, such as checking that each number is valid and formatted correctly. Unfortunately, it is not usually as simple as searching for the phrase “phone number” in a column header. A phone number column could be labelled “mobile”, or “contact”, or “tel”, for example, so a string match on the header may not uncover accurate rule suggestions. We need context embedded into this process, and this is where machine learning comes in. We’ve been experimenting with building and training machine learning models to categorise data and then return suggestions for useful data quality and data cleansing rules to apply to datasets.
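To make the idea concrete, here is a toy stand-in for that classification step. Instead of a trained NLP model, it scores a column's sample values against simple patterns; the rule names, regular expressions, and threshold are all hypothetical.

```python
# Toy stand-in for the rule-suggestion classifier: score a column's
# sample values against simple patterns. Rule names, patterns, and the
# 0.8 threshold are hypothetical, not the Datactics rule library.
import re

PATTERNS = {
    "phone_format_check": re.compile(r"^\+?[\d\s\-()]{7,15}$"),
    "email_format_check": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def suggest_rules(values, min_fraction=0.8):
    """Suggest rules whose pattern matches most non-empty sample values."""
    values = [v for v in values if v]
    suggestions = []
    for rule, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if values and hits / len(values) >= min_fraction:
            suggestions.append(rule)
    return suggestions
```

An actual model would also use the header text and learned context, which is what lets it treat "mobile", "contact", and "tel" alike, but the interface, sample values in and rule suggestions out, has the same shape.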

Human in the Loop  

The goal here is not to take control away from the user: the machine learning model isn’t going to run off with your dataset and do whatever it determines to be right on its own. The aim is to assist the user and streamline the selection of rules to apply. Users have full control to accept or reject some or all of the suggestions from the Rule Suggestion model, and can add new rules the model did not suggest; this information is captured to improve the model’s future suggestions. We hope this will make the process of setting up data quality and data cleansing rules quicker and easier.
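A minimal sketch of that review step might look like the following; the structure and names are illustrative, not the product's API.

```python
# Illustrative human-in-the-loop review step: every suggestion is held
# for a user decision, and decisions are recorded so they can later
# feed back into the model. Class and field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ReviewSession:
    suggested: list                      # rules proposed by the model
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def decide(self, rule, accept):
        """Record the user's accept/reject decision for one suggestion."""
        (self.accepted if accept else self.rejected).append(rule)

    def add_manual(self, rule):
        """User adds a rule the model did not suggest."""
        self.accepted.append(rule)
```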

Developer’s View

I’ve been involved in the development of this project from the early stages, and it’s been exciting to see it come together and take shape. A lot of my involvement has been around building the systems and infrastructure that help users interact with the model, and formatting the model’s outputs into easily understandable, useful pieces of information. This work involves allowing the software to take a dataset and process it so that the model can make predictions on it, and then mapping from the model’s output to the individual rules presented to the user.

One of the major focuses throughout the project’s development has been control. Users can decide how cautious the model should be in making suggestions by setting confidence thresholds, meaning the model will only return suggestions that meet or surpass the chosen threshold. We’ve also included the ability to add specific word-to-rule mappings, which help maintain a higher level of consistency and accuracy for very specific or rare categories that the model may have little or no prior knowledge of. For example, if proprietary fields have their own unique labels, formatting, patterns, or structures, and their own unique rules related to them, it’s possible to define a direct mapping from those fields to rules so that the Rule Suggestion system can produce accurate suggestions for any future instances of that information in a dataset.
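Those two controls, a confidence threshold and explicit word-to-rule overrides, can be sketched as follows; all names and values here are hypothetical, not the product's configuration:

```python
# Sketch of two user controls: a confidence threshold on model output,
# and word-to-rule overrides for proprietary fields. Names are hypothetical.
def filter_suggestions(scored_rules, column_name, overrides, threshold=0.7):
    """scored_rules maps rule name -> model confidence in [0, 1]."""
    if column_name in overrides:
        # A direct mapping bypasses the model entirely.
        return overrides[column_name]
    return [rule for rule, conf in scored_rules.items() if conf >= threshold]
```

Raising the threshold makes the model more cautious (fewer, higher-confidence suggestions); the override table guarantees consistent results for fields the model has never seen.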

Another focus we hope to develop further is consistently improving results as the project matures. In the future we’re looking to build a system in which the model continues to adapt based on how the suggested rules are used. Ideally, if the model tends to incorrectly predict that a specific rule will be useful for a given dataset column, it will learn to stop suggesting that rule for that column, because users tend to disagree with the suggestion. Similarly, if there are rules the model tends not to suggest for a certain column but that users then manually select, the model will learn to suggest those rules in similar cases in the future.
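One simple way to realise such a feedback loop is to keep a per-(column, rule) score that is nudged up on acceptance and down on rejection; the neutral prior and step size below are illustrative assumptions, not the planned implementation.

```python
# Sketch of the feedback loop: a per-(column, rule) score is nudged up
# when a suggestion is accepted (or added manually) and down when it is
# rejected. The 0.5 neutral prior and 0.1 step size are illustrative.
def update_score(scores, column, rule, accepted, step=0.1):
    key = (column, rule)
    current = scores.get(key, 0.5)  # neutral prior for unseen pairs
    delta = step if accepted else -step
    scores[key] = min(1.0, max(0.0, current + delta))
    return scores[key]
```

Over time, rules users keep rejecting for a column sink below the suggestion threshold, while rules users keep adding manually rise above it.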

In the same vein, one recent development I’ve found really interesting is a system that lets us analyse the performance of different machine learning models on a suite of sample data. It gives us detailed insight into what makes an efficient, powerful rule prediction model and how we can expect models to perform in real-world scenarios. It also provides a sandbox for experimenting with new ways of creating and updating models, and for estimating baseline performance, so we can be confident in the level of performance of our system. It’s been rewarding to analyse the results so far, compare the different methods of processing data and building models, and see where one model outperforms another.

Thanks to Daniel for talking to us about rules suggestion. If you would like to discuss further or find out more about rules suggestion at Datactics, reach out to Daniel Browne directly, or to our Head of AI, Fiona Browne.

Get in touch or find us on LinkedIn, Twitter, or Facebook.
