AI Archives - Datactics https://www.datactics.com/tag/ai/ Battling Bias in AI: Models for a Better World https://www.datactics.com/blog/ai-ml/battling-bias-in-ai-models-for-a-better-world/ Mon, 27 Jun 2022 14:37:52 +0000

The post Battling Bias in AI: Models for a Better World appeared first on Datactics.

Battling Bias in AI Models for a Better World

The role of synthetic data

At Datactics, we develop and maintain a number of internal tools for use within the AI and Software Development teams. One of these is a synthetic data generation tool that can be used to create large datasets of placeholder information. It was initially built to generate benchmarking datasets as another method of evaluating software performance, but has also been used to generate sample datasets for software demonstrations and for building proof-of-concept solutions. The project has been hugely beneficial, providing tailor-made, customisable datasets for each specific use case, with control over dataset size, column datatype, duplicate entries, and even the insertion of simulated errors to mimic the uncleanliness of some real-world datasets. As this tool has seen increased usage, we've discussed and considered additional areas within data science that can benefit from the application of synthetic data.
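To make the idea concrete, here is a minimal, hypothetical sketch of that kind of generator (the function, field names and error choices are purely illustrative, not the Datactics tool itself), showing control over dataset size, an error rate, and the insertion of simulated errors:

```python
import random
import string

def make_synthetic_rows(n_rows, error_rate=0.05, seed=42):
    """Generate placeholder records, corrupting a small fraction of them
    to mimic the uncleanliness of some real-world datasets."""
    rng = random.Random(seed)
    names = ["Ada", "Grace", "Alan", "Mary", "Edsger"]
    rows = []
    for _ in range(n_rows):
        row = {
            "name": rng.choice(names),
            "age": rng.randint(18, 80),
            "phone": "07" + "".join(rng.choice(string.digits) for _ in range(9)),
        }
        if rng.random() < error_rate:
            # Simulated errors: a semantically impossible age, a truncated phone
            row["age"] = rng.choice([-1, 500])
            row["phone"] = row["phone"][:5]
        rows.append(row)
    return rows

sample = make_synthetic_rows(1000)
```

A fixed seed makes the output reproducible, which matters when the same dataset is reused for benchmarking runs.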

One such area is in the training of machine learning models. Synthetic data and synthetic data generation tools such as the Synthetic Data Vault have already seen widespread usage in the AI/ML space. A report from Gartner has gone as far as to predict that synthetic data will comprise the majority of data used in ML model training by 2030 – and understandably so.

Sourcing data for Artificial Intelligence models

Creating implementations of technologies such as deep learning can require massive datasets. Sourcing large, comprehensive, clean, well-structured datasets for training models is a lengthy, expensive process, and one of the main barriers to entry in the space today. Generating synthetic data for use in place of real-world data opens the door to the world of AI/ML for many teams and researchers that would otherwise have been unable to explore it. This can lead to accelerated innovation in the space, and faster implementation of AI/ML technologies in the real world.

The use of synthetic data can clearly reduce the impact of many of the struggles faced during ML model building. However, some potential flaws in ML models, such as bias, cannot simply be solved by replacing real-world data with synthetic data.

The risk of bias: a real-world example

There’s no doubt that using raw real-world data in certain use cases can create heavily biased models, as this training data can reflect existing biases in our world. For example, a number of years ago, Amazon began an internal project to build a natural language processing model to parse the CVs of job applicants and suggest which candidates to hire. Thousands of real CVs submitted by prior applicants were used as training data, labelled by whether or not the applicant was hired.

The model trained using this data began to reflect inherent biases within our world, within the tech industry, and within Amazon as a company, resulting in the model favouring male candidates over others, and failing to close the gender gap in recruitment at the company. A candidate would be less likely to be recommended for hiring by the model if their CV contained the word “women’s”, or mention of having studied at either of two specific all-women’s colleges. This model’s training data was not fit for purpose, as the model it produced reflected the failures of our society, and would have only perpetuated these failures had it been integrated into the company’s hiring process.

It’s important to note where the issue lies here: Natural Language Processing as a technology was not at fault in this scenario – it simply generated a model that reflected the patterns in the data it was provided. A mirror isn’t broken just because we don’t like what we see in it.

For a case such as this, generating synthetic training data initially seems like an obvious improvement over using real data, eliminating the concern over bias in the model entirely. However, synthetic training data must still be defined, generated, analysed, and ultimately signed off for use by someone, or some group of people. The people that make these decisions are humans, born and raised in a biased world, as we all are. We unfortunately all have unconscious biases, formed by a lifetime of conditioning by the world we live in. If we’re not careful, synthetic data can reflect the biases of the engineer(s) and decision maker(s) specifically, rather than the world at large. This raises the question – which is more problematic?

Will bias always be present?

As a simple starting point for analysis, let’s look at a common learning example used for teaching the basics of building an ML model – creating a salary estimator. In a standard exercise, we can use features like qualification level, years of experience, location, etc. This doesn’t include names, gender, religion, or any other protected characteristic, and with the features we use, you can’t directly determine any of this information. Can this data still reflect biases in our world?

A synthetic training dataset can still reflect the imperfect world we live in, because the presumptions and beliefs of those that ultimately sign off on the data can be embedded into it. Take, for instance, the beliefs of the executive team at a company like Airbnb. They’ve recently abolished location-based pay grades within the company, as they believe that an employee’s work shouldn’t be valued any differently based on their location – if they’re willing to pay an employee based in San Francisco or New York a certain wage for their given contribution to the team, an employee with a similar or greater level of output based in Iowa, Ireland or India shouldn’t be paid less, simply because average income or cost of living where they live happens to be lower.

If synthetic training data for a salary estimation model were to be analysed and approved by someone that had never considered or disagreed with this point of view, the resulting model could be biased against those that don’t live in areas with high average income and cost of living, as their predicted salary would likely be lower than someone with identical details that lived in a different area.

Similarly, returning to the example of Amazon’s biased CV-scanning model: if we were to generate a diverse and robust synthetic dataset to eliminate gender bias in a model, there’s still a danger of ML algorithms favouring candidates based on the “prestige” of universities, for example. As seen with the news of wealthy families paying Ivy League universities to admit their children, this could be biased in favour of people from affluent backgrounds – people more likely to benefit from generational wealth – which can continue to enforce many of the socioeconomic biases that exist within our world.

Additionally, industries such as tech have a noteworthy proportion of the workforce that, despite having a high level of experience and expertise in their respective field, may not have an official qualification from a university or college, having learned from real-world industry experience. A model that fails to take this into account is one with an inherent bias against such workers.

How do we eliminate bias?

As these examples show, eliminating bias isn’t as simple as removing protected characteristics, or ensuring an equal balance of instances of possible values for these features. Trends and systems in our world can reflect its imperfections and biases without showing them explicitly, and beliefs about how these systems should fundamentally operate can vary wildly from person to person. This presents us with an interesting issue moving forward: if, instead of using real-world data for models to mirror the world we live in, we use synthetic data representative of a world in which we wish to live, how do we ensure that the hypothetical future world this data represents is one that works for all of us?

Centuries ago, the rules and boundaries of society were decided on and codified by the literate, those fortunate enough to have the resources and access to education that allowed them to learn to read and write. The rules that governed the masses and defined our way of life were written into law, and, intentionally or otherwise, these rules tended to benefit those that had the resources and power to be in the position to write them. As technological advancement saw literacy rates increase, “legalese” – technical jargon used to obfuscate the meaning of legal documents – was used to construct a linguistic barrier once again, this time against those without the resources to attain a qualification in law.

We’re now firmly in the technological age. As computers and software become ever more deeply ingrained into the fabric of society, it’s important that we as developers are aware of the fact that, if we’re not careful with where and how we develop and integrate our technological solutions, we could be complicit in allowing existing systems of inequality and exploitation to be solidified into the building blocks of our society for the future. Technologies like AI and ML have the ability to allow us to tackle systemic issues in our world to benefit us all, not just those fortunate enough to sit behind the keyboard or their CEOs.

However, to achieve this, we must move forward with care, with caution, and with consideration for those outside the tech space. We’re not the only ones influenced by what we create. At a time when the boot of oppression can be destroyed, it’s important that it doesn’t just end up on a different foot.

The importance of well-designed AI

This is absolutely not to say that AI and ML should be abandoned because of the theoretical dangers that could be faced as a result of careless usage – it means these tools should be utilised and explored in the right places, and in the right way. The potential benefits that well-implemented AI/ML can provide, and the fundamental improvements to our way of life and our collective human prosperity that this technology can bring could change the world for the better, forever.

Technologies such as active learning and deep learning have the capabilities to help automate, streamline and simplify tasks that would otherwise rely on vast amounts of manual human effort.

The reduction in manual human effort and attention required for tasks that can safely and reliably be operated by AI/ML, and the insights that can be gained from its implementation can lead to further advancements in science, exploration and innovation in art, and greater work-life balance, giving us back time for leisure and opportunities for shared community experiences, creating a more connected, understanding society.

That being said, there’s just as much opportunity for misuse of these tools to create a more imbalanced, divided, exploited world, and it’s our job as developers and decision-makers to steer clear of this, pushing this technology and its implementations in the right direction.

In conclusion

I believe that if synthetic data is going to comprise a large majority of the data in use in the near future, it is vitally important that we stay aware of the potential pitfalls of using such data, and make sure to utilise it only where it makes the most sense. The difficulty for each individual ML project is in determining whether synthetic data or real-world data is the ideal choice for that specific use case. The Alan Turing Institute’s FAST Track Principles for AI Ethics (Fairness, Accountability, Sustainability/Safety and Transparency) provide a strong framework for ethical decision-making and implementation of AI and ML technology – the spirit of these principles must be applied to all forms of development in the AI/ML space, including the use of synthetic data.

There’s no room for complacency. With great power comes great responsibility.

To learn more about an AI-Driven Approach to Data Quality, download our AI whitepaper by Dr. Browne.
And for more from Datactics, find us on LinkedIn or Twitter.

Outlier Detection – What Is It And How Can It Help In The Improvements Of Data Quality? https://www.datactics.com/blog/ai-ml/outlier-detection-what-is-it-and-how-can-it-help-in-the-improvements-of-data-quality/ Fri, 27 May 2022 11:05:50 +0000

The post Outlier Detection – What Is It And How Can It Help In The Improvements Of Data Quality?  appeared first on Datactics.

Outlier Detection

Identifying outliers and errors in data is an important but time-consuming task. Depending on the context and domain, errors can be impactful in a variety of ways, some very severe. One of the issues with detecting outliers and errors is that they come in many different forms. There are syntactic errors, where a value like a date or time is in the wrong format, and semantic errors, where a value is in the correct format but doesn’t make sense in the context of the data, like an age of 500. The biggest problem with creating a method for detecting outliers in a dataset is how to identify a vast range of different errors with one tool.
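The distinction between the two error types can be shown with a tiny, hypothetical check on an age-in-years field (the 0–130 bounds are illustrative assumptions, not part of any Datactics rule set):

```python
def classify_age_value(value):
    """Distinguish a syntactic error (wrong format) from a semantic one
    (correct format, but implausible meaning) for an age-in-years field."""
    try:
        age = int(value)
    except ValueError:
        return "syntactic"   # e.g. "twenty-five": not parseable as a number
    if not 0 <= age <= 130:
        return "semantic"    # e.g. 500: a valid number, but not a valid age
    return "ok"
```

For example, `classify_age_value("500")` returns `"semantic"`, while `classify_age_value("twenty-five")` returns `"syntactic"`.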

At Datactics, we’ve been working on a tool to solve some of these problems and enable errors and outliers to be quickly identified with minimal user input. With this project, our goal is to assign a number to each value in a dataset which represents the likelihood that the value is an outlier. To do this we use a number of different features of the data, which range from quite simple methods like looking at the frequency of a value or its length compared to others in its column, to more complex methods using n-grams and co-occurrence statistics. Once we have used these features to get a numerical representation of each value, we can then use some simple statistical tests to find the outliers. 

When profiling a dataset, there are a few simple things you can do to find errors and outliers in the data. A good place to start could be to look at the least frequent values in a column or the shortest and longest values. These will highlight some of the most obvious errors but what then? If you are profiling numeric or time data, you could rank the data and look at both ends of the spectrum to see if there are any other obvious outliers. But what about text data or unique values that can’t be profiled using frequency analysis? If you want to identify semantic errors, this profiling would need to be done by a domain expert. Another factor to consider is the fact that this must all be done manually. It is evident that there are a number of aspects of the outlier detection process that limit both its convenience and practicality. These are some of the things we have tried to address with this project. 
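That manual first pass – rarest values plus shortest and longest values – can be sketched in a few lines (an illustrative sketch, not the Datactics tool):

```python
from collections import Counter

def profile_column(values, k=3):
    """First-pass profiling: the k rarest values and the k shortest and
    longest ones, where the most obvious errors tend to surface."""
    counts = Counter(values)
    # most_common() is sorted by descending frequency; take the tail, reversed
    rarest = [v for v, _ in counts.most_common()[: -k - 1 : -1]]
    by_length = sorted(set(values), key=lambda v: (len(v), v))
    return {"rarest": rarest, "shortest": by_length[:k], "longest": by_length[-k:]}

report = profile_column(["GBP", "GBP", "USD", "USD", "EUR", "Pound Sterling", "G"])
```

On this currency column, the singleton `"G"` and the over-long `"Pound Sterling"` surface immediately, but a unique free-text column would defeat this kind of frequency analysis, which is exactly the limitation described above.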

Outlier Detection

When designing this tool, our objective was to create a simple, effective, universal approach to outlier detection. There are a large number of statistical methods for outlier detection that, in some cases, have existed for hundreds of years. These are all based on identifying numerical outliers, which is useful in some of the cases listed above but has obvious limitations. Our solution is to create a numerical representation of every value in the dataset that can be used with a straightforward statistical method. We do this using features of the data. The features currently implemented and available for use are:

  • Character N-Grams 
  • Co-Occurrence Statistics 
  • Date Value 
  • Length 
  • Numeric Value 
  • Symbolic N-Grams 
  • Text Similarities 
  • Time Value 

We are also working on creating a feature of the data to enable us to identify outliers in time series data. Some of these features, such as date and numeric value, are only applicable to certain types of data. Some incorporate the very simple steps discussed above, like occurrence and length analysis. Others are more complicated and could not be done manually, like co-occurrence statistics. Then there are some, like the natural language processing text similarities, which make use of machine learning algorithms. While there will be some overlap in the outliers identified by these features, for the most part they will all single out different errors and outliers, acting as an antidote to the heterogeneous nature of errors discussed above.

One of the benefits of this method of outlier detection is its simplicity, which leads to very explainable results. Once the features of our dataset have been generated, we have a number of options for next steps. In theory, all of these features could be fed into a machine learning model which could then be used to label data as outlier or non-outlier. However, there are a number of disadvantages to this approach. Firstly, it would require a labelled dataset to train the model with, which would be time-consuming to create. Moreover, the features will differ from dataset to dataset, so it would not be a case of “one model fits all”. Finally, if you are using a “black box” machine learning method, when a value is labelled as an outlier you have no way of explaining this decision, or of providing evidence as to why this value has been labelled as opposed to others in the dataset.

All three of these problems are avoided by the Datactics approach. The outliers are generated using only the features of the original dataset and, because of the statistical methods being used, can be identified with nothing but the data itself and a confidence level (a numerical value representing the likelihood that a value is an outlier). There is no need for any labelling or parameter-tuning with this approach. The other big advantage is that, because we assign a number to every value, we have evidence to back up every outlier identified, and can demonstrate how it differs from the non-outliers in the data.
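As an illustration of the statistical idea (a sketch under assumed choices – the modified z-score and the 3.5 cut-off are common textbook defaults, not the Datactics implementation), a single length feature is already enough to surface a format outlier, and the score itself is the evidence:

```python
import statistics

def outlier_flags(feature, confidence=3.5):
    """Score each value with a modified z-score (robust distance from the
    column median) and flag those exceeding the confidence level."""
    med = statistics.median(feature)
    mad = statistics.median([abs(x - med) for x in feature]) or 1.0
    scores = [0.6745 * (x - med) / mad for x in feature]
    return [abs(s) > confidence for s in scores]

# Length as the feature: one date is in the wrong format for this column
column = ["2022-05-27", "2021-09-15", "2020-01-01",
          "27 May 2022 11:05:50 GMT", "2019-12-31"]
flags = outlier_flags([len(v) for v in column])
flagged = [v for v, f in zip(column, flags) if f]
```

Because the flag traces back to a number (here, an extreme length score), every identification can be explained rather than emerging from a black box.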

Another benefit of this approach is that it is modular and therefore completely expandable. The features the outliers are based on can be selected according to the data being profiled, which increases accuracy. This architecture also gives us the ability to seamlessly expand the number of features available, and if trends or common errors are encountered that aren’t identified by the current features, it is very straightforward to create another feature to rectify this.

And for more from Datactics, find us on LinkedIn, Twitter, or Facebook.

Rules Suggestion – What is it and how can it help in the pursuit of improving data quality? https://www.datactics.com/blog/ai-ml/rules-suggestion-what-is-it-and-how-can-it-help-improve-data-quality/ Wed, 15 Sep 2021 09:06:21 +0000

The post Rules Suggestion – What is it and how can it help in the pursuit of improving data quality?   appeared first on Datactics.

Written by Daniel Browne, Machine Learning Engineer

Defining data quality rules and collections of rules for data quality projects is often a manual, time-consuming process. It typically involves a subject matter expert reviewing data sources and designing quality rules to ensure the data complies with integrity, accuracy and/or regulatory standards. As data sources increase in volume and variety, with potential functional dependencies, the task of defining data quality rules becomes more difficult. The application of machine learning can aid with this task by identifying dependencies between datasets, uncovering patterns related to data quality, and suggesting previously applied rules for similar data.

At Datactics, we recently undertook a Rule Suggestion Project to automate the process of defining data quality rules for datasets through rule suggestions. We use natural language processing techniques to analyse the contents of a dataset and suggest rules in our rule library that best fit each column.  

Problem Area and ML Solution  

Generally, there are several data quality and data cleansing rules that you would typically want to apply to certain fields in a dataset. An example is a consistency check on a phone number column in a dataset such as checking that the number provided is valid and formatted correctly. Unfortunately, it is not usually as simple as searching for the phrase “phone number” in a column header and going from there. A phone number column could be labelled “mobile”, or “contact”, or “tel”, for example. Doing a string match in these cases may not uncover accurate rule suggestions. We need context embedded into this process and this is where machine learning comes in. We’ve been experimenting with building and training machine learning models to be able to categorise data, then return suggestions for useful data quality and data cleansing rules to consider applying to datasets.  
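A toy sketch of the idea – classifying a column by its contents rather than its header – is shown below. The rule names and regex patterns are invented for illustration, and a regex stands in for the trained NLP model the post describes:

```python
import re

# Toy rule library: rule name -> pattern its target values should match
RULES = {
    "phone_format_check": re.compile(r"^\+?\d[\d\s-]{7,14}$"),
    "email_format_check": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "iso_date_check": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

def suggest_rules(column_values, threshold=0.6):
    """Suggest rules whose pattern matches at least `threshold` of the
    column's values - whatever the column header happens to be called."""
    suggestions = {}
    for name, pattern in RULES.items():
        hits = sum(1 for v in column_values if pattern.match(v))
        score = hits / len(column_values)
        if score >= threshold:
            suggestions[name] = score
    return suggestions

# A phone column labelled "contact" still gets the right suggestion
suggested = suggest_rules(["07700 900123", "07700 900456", "not a number"])
```

Even this crude version suggests the phone-format rule for a column labelled "contact", which a header string match would miss; the learned model generalises the same content-driven idea.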

Human in the Loop  

The goal here is not to take away control from the user – the machine learning model isn’t going to run off with your dataset and do what it determines to be right on its own. The aim is to assist the user and to streamline the selection of rules to apply. A user has full control to accept or reject some or all suggestions that come from the Rule Suggestion model. Users can add new rules not suggested by the model, and this information is captured to improve the model’s suggestions. We hope that this will be a useful tool for making the process of setting up data quality and data cleansing rules quicker and easier.

Developers View  

I’ve been involved in the development of this project from the early stages, and it’s been exciting to see it come together and take shape over the course of its development. A lot of my involvement has been around building out the systems and infrastructure that help users interact with the model and that format the model’s outputs into easily understandable and useful pieces of information. This work involves allowing the software to take a dataset and process it such that the model can make its predictions on it, and then mapping from the model’s output to the individual rules that will be presented to the user.

One of the major focuses we’ve had throughout the development of the project is control. We’ve built out the project with this in mind. For example, users can control how cautious the model should be in making suggestions by setting confidence thresholds, meaning the model will only return suggestions that meet or surpass the chosen threshold. We’ve also included the ability to add specific word-to-rule mappings, which can help maintain a higher level of consistency and accuracy in results for very specific or rare categories that the model may have little or no prior knowledge of. For example, if there are proprietary fields with their own unique label, formatting, patterns or structures, and their own unique rules related to those, it’s possible to define a direct mapping from these to rules so that the Rule Suggestion system can produce accurate suggestions for any instances of that information in a dataset in the future.
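The mechanics of those two controls, a confidence threshold plus direct word-to-rule overrides, could be sketched as follows (all function, column and rule names here are hypothetical, not the product’s API):

```python
def filter_suggestions(model_scores, threshold=0.7, word_to_rule=None):
    """Keep only suggestions at or above the user's confidence threshold,
    then apply direct word-to-rule mappings, which always take effect."""
    word_to_rule = word_to_rule or {}
    accepted = {
        column: [rule for rule, score in scores.items() if score >= threshold]
        for column, scores in model_scores.items()
    }
    for column, rules in accepted.items():
        for word, rule in word_to_rule.items():
            if word in column.lower() and rule not in rules:
                rules.append(rule)
    return accepted

# The model is unsure about a proprietary "isin_code" column, but a
# user-defined mapping guarantees the right rule is still suggested
scores = {"isin_code": {"length_check": 0.4, "not_null_check": 0.9}}
result = filter_suggestions(scores, threshold=0.7,
                            word_to_rule={"isin": "isin_format_check"})
```

The low-confidence `length_check` is filtered out, the confident `not_null_check` survives, and the override injects `isin_format_check` regardless of what the model predicted.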

Another focus of the project we hope to develop further upon is the idea of consistently improving results as the project matures. In the future we’re looking to develop a system where the model can continue to adapt based on how the suggested rules are used. Ideally, this will mean that if the model tends to incorrectly predict that a specific rule or rules will be useful for a given dataset column, it will begin to learn to avoid suggesting that rule for that column based on the fact that users tend to disagree with that suggestion. Similarly, if there are rules that the model tends to avoid suggesting for a certain column that users then manually select, the model will learn to suggest these rules in similar cases in the future.  

In the same vein, one of the recent developments that I’ve found really interesting and exciting is a system that allows us to analyse the performance of various machine learning models on a suite of sample data. This gives us detailed insights into what makes an efficient and powerful rule prediction model, and into how we can expect models to perform in real-world scenarios. It provides us with a sandbox to experiment with new ways of creating and updating machine learning models, and to estimate baseline standards for performance, so we can be confident of the level of performance of our system. It’s been really rewarding to analyse the results from this process so far, to compare the different methods of processing the data and building machine learning models, and to see in which areas one model may outperform another.

Thanks to Daniel for talking to us about rules suggestion. If you would like to discuss further or find out more about rules suggestion at Datactics, reach out to Daniel Browne directly, or to our Head of AI, Fiona Browne.

Get in touch or find us on LinkedIn, Twitter, or Facebook.

KTN: AI for Services on Tour | 23/02 https://www.datactics.com/events/ktn-ai-for-services-on-tour/ Tue, 16 Feb 2021 09:25:27 +0000

The post KTN: AI for Services on Tour | 23/02 appeared first on Datactics.


We are delighted to be one of a few Northern Irish businesses to be part of the KTN: AI for Services on Tour.

This is a brilliant opportunity for those in attendance to hear what Datactics is doing in the space.

Kainos, Adoreboard, and Analytics Engines are amongst the few other companies also representing Northern Ireland. Dr Fiona Browne will be speaking at the roadshow, which will be happening on 23rd February.

For more details and registration, click here.

The Open University talk: Business Ethics | 17/02 https://www.datactics.com/events/ou-business-ethics-17-02/ Tue, 16 Feb 2021 09:06:10 +0000

The post The Open University talk: Business Ethics | 17/02 appeared first on Datactics.

Business Ethics

Matt Flenley, Marketing and Partnerships Manager at Datactics will be speaking this week at The Open University, delivering a talk on Business Ethics.

The talk is going to cover four things:

  1. The impact of unintended and cultural bias in machine learning 
  2. What to do if your business loses or has no soul
  3. Corporate Social Responsibility – Looking after people when the world is upside down
  4. The benefits and pitfalls of big corporate machines and rapid growth start-ups when it comes to doing charitable work and being a force for good. 

You can also read Matt’s blogs here, such as a piece he has written on AI Ethics, or find out about our people here and explore our open vacancies. If you’re curious about working at Datactics, please drop Matt a line on LinkedIn for a chat.

The Open University Business Ethics talk & Datactics https://www.datactics.com/blog/marketing-insights/ou-business-ethics-talk/ Mon, 15 Feb 2021 13:00:00 +0000

The post The Open University Business Ethics talk & Datactics appeared first on Datactics.

Business Ethics

Matt Flenley, Marketing and Partnerships Manager at Datactics will be speaking this week at The Open University, delivering a talk on Business Ethics.

Prior to The Open University, we thought it would be a good idea to have a chat and find out why this topic, what other views he hopes to talk about, and the importance of business ethics, especially from a data perspective.

 Hi Matt, what can you tell us about the talk you are giving at The Open University?

I am really excited to give this talk as this is an area I am passionate about. The talk is going to cover four things:

  1. The impact of unintended and cultural bias in machine learning 
  2. What to do if your business loses or has no soul
  3. Corporate Social Responsibility – Looking after people when the world is upside down
  4. The benefits and pitfalls of big corporate machines and rapid growth start-ups when it comes to doing charitable work and being a force for good. 

How important do you think ethics is within the data industry? 

I think ethics are important. People very often think about algorithms and automated rules as being the critical part to measure, but before all of that, there’s data. You must involve data in the process to be able to understand whether the sample you are measuring is right. The quality of the information you use depends on whether the information is complete and whether you sought out the correct data to begin with.

Do you think that an understanding of ethics and data has increased in importance in recent years? 

I do, due to the increased understanding of the importance of AI. For example, there are images on the internet that certain algorithms can learn from in order to generate people who don’t actually exist. As a result, images are created that are recognisable to you or me, but these people don’t exist – it’s a clever piece of AI. A problem that has been increasingly recognised with the source material is that it doesn’t contain enough images of older women. This has meant that as the algorithm generated people, the AI’s conclusion was that as they age, everyone becomes an old man! Because of this absence of images of older women, an inaccurate representation of society becomes prevalent. If you don’t have the right data going into an algorithm, you won’t have accurate data coming out of it. People are increasingly understanding the importance of data, and examples like this shine a light on bias and how damaging it can be to society.

How important is it to share this knowledge with the leaders of tomorrow at The Open University? 

It is absolutely critical! I believe it's vital for business people as well as technologists to be ethicists. The more ethicists there are in the discussion, the less bias you will end up with in the room, which will fundamentally lead to fairer outcomes. 

How important is The Open University and Datactics partnership? When did it begin? 

The relationship has been longstanding. We have a number of staff members that are studying at The Open University alongside working and indeed one working as a lecturer at the institution. One of the best parts of working with The Open University is the access to talent in unexpected places. There are a number of students that are pursuing careers in technology, who have not gone about it in a conventional way, like immediately heading to a red-brick university for a computer science degree. Some of them are further down the line in different careers and have decided to make a career change, and some have decided to retrain while working. It’s a real mix and a really encouraging, affirming environment for people to pursue their education and career. 

Thank you Matt! We will be sharing soundbites from this talk, so make sure to keep an eye out for those. 

You can also read Matt’s blogs here such as a piece on AI Ethics he has written about. Or find out about our people here, explore our open vacancies, or if you’re curious about working at Datactics please drop Matt a line on LinkedIn for a chat. 

The post The Open University Business Ethics talk & Datactics appeared first on Datactics.

]]>
AI Con 2020 Interview with Dr. Fiona Browne and Matt Flenley https://www.datactics.com/blog/marketing-insights/ai-con-2020-interview-with-dr-fiona-browne-and-matt-flenley/ Wed, 02 Dec 2020 12:00:36 +0000 https://www.datactics.com/?p=13102 Dr. Fiona Browne, Head of AI, and Matt Flenley, Marketing and Partnerships Manager at Datactics are contributing to AI Con 2020 this year.    After a successful first year, AI Con is back! This year it’s said to be bigger and better than ever with a range of talks across AI, including AI/ML in Fintech; AI in the public sector; the impact of arts; the impact of […]

The post AI Con 2020 Interview with Dr. Fiona Browne and Matt Flenley appeared first on Datactics.

]]>
Dr. Fiona Browne, Head of AI, and Matt Flenley, Marketing and Partnerships Manager at Datactics are contributing to AI Con 2020 this year.   

After a successful first year, AI Con is back!

This year it’s said to be bigger and better than ever with a range of talks across AI, including AI/ML in Fintech; AI in the public sector; the impact of arts; the impact of AI on research and innovation; and how AI has caused a change in the screening industry. All these topics will be tackled by world-leading technology professionals and business leaders to unpack how AI is changing our world.  

Ahead of AI Con 2020 taking place virtually on the 3rd and 4th December, we thought it would be a good idea to sit down with two of those industry experts, Fiona and Matt, and ask them a few things. I wanted to understand what their involvement with AI Con is this year, any previous involvement they've had with the conference, what they envisage the key takeaways to be, and of course, which talks they are most looking forward to themselves.    

Hi, Fiona and Matt. Perhaps to kick off, you could talk a bit about why you both wanted to be involved with AI Con?  

Fiona: Hello! Well, we were involved with it last year and it was a great experience. We were involved in the session that focused on business and the applications of AI. We were asked then to pull a session together for this year, and we’ve been able to focus on the area that Datactics specialises in, which is Financial Services. 

This has given us the chance to unpack how machine learning can be used in Financial Services; we’ve tried to cover three broad areas within this session:  firstly, understanding those people who work in the financial institutions. Secondly, we will then delve into our bread-and-butter data quality & matching, and lastly the importance of data governance.  

Matt: Hi! Last year I worked with Fiona to arrange our involvement. This year we had more time to prepare, which meant Fiona and I could collaborate even more closely.

I particularly enjoyed approaching speakers such as Peggy and Sarah (to name but a few!). What interests me most is the application of AI and we are delighted to have contributed towards pulling together such a strong line-up.

The variety of talks too will bring a wide range of attendees!  

This is the second year. Perhaps you both could talk to me about your previous involvement with AI Con, if any, and how it has evolved?  

Fiona: Last year we discovered there was a significant appetite for this content. We have been able to expand this year's conference over more streams by being more strategic with the messaging. We have also been able to create a session for ourselves, on a subject we know well and are deeply passionate about and experienced in. This year the conference is not just local; it's much more international. Even if you look at the line-up of speakers for our session, they come from New York and Switzerland.

The international flavour offers greater perspective, knowledge, and insight.   

Matt: I agree. I’ve been blown away by how engaged people have been. We have Andrew Jenkins, the Fintech Envoy for Northern Ireland and Gary Davidson of Tech Nation, who are keen to contribute to where they think the market is going.

The panel I am chairing is focusing on FinTechs that are scaling and exporting with a focus on why people should invest in NI technology. The event is well-prepared and timely, and I am looking forward to chairing on Thursday.  

So, Matt what will the panel you are chairing be discussing, who is on the panel?  

Matt: We are joined by Pauline Timoney, COO of Automated Intelligence; Chris Gregg, CEO and Founder of Light Year; and as I mentioned before, Andrew Jenkins, and Gary Davidson. We are going to look at the opportunities to collaborate with incubators like TechNation, the impact of COVID-19, Brexit, and FinTech investments for last year.

FinTech is a hugely growing sector, and we are excited to delve into why and explore where the sector is going next!  

Fiona, you have been one of the curators of AI Con, how has that process been?  

Fiona: It has been great! We were given the remit of FinTech and we could pick and choose what topics and who we wanted to add to the line-up. We have a very clear message. The talks are practical application-centred with a focus on trends and experience.

One of the largest Wealth Management Companies in the world is coming to speak to discuss their usage of technology, future projections, and more!  

What do you both envisage the biggest takeaways of AI Con being?  

Matt: One of the biggest takeaways is going to be the incredible, thriving NI FinTech sector.

When you look around the ecosystem, for example of the FinTech ecosystem report you can see the sheer explosion of firms and the problems being solved.    

Fiona: There will be maturity across the board, with more companies implementing these technologies.

People are increasingly thinking about Machine Learning and AI… how can we use it?

I believe there will be a skillset gap which will be a challenge; it will be a challenge for many firms to attract the talent that can implement these processes and technologies.  

To wrap up! On a personal, note, what talk(s) are you both most looking forward to?  

Matt: I am excited to hear from Sarah Gadd, Credit Suisse. Her wealth of experience will offer great insight into how they apply AI into reality. Not only are they on the cutting edge of technology but they have taken it off the ground. I am also looking forward to Peggy Tsai’s contribution.  

Fiona: From our side, Sarah and Peggy will be interesting. It’s an honour to have a speaker like Sarah Gadd. It’s brilliant to hear how they are applying this technology now in a regulated area. What are their challenges, solutions? Also, Peggy is giving time to the complexity of data, which is more important than ever before. Austin too will be unpacking AI in the arts and music sector. I am looking forward to the overall variety, calibre, and diversity of point of view that will be offered.  

Thank you both, for taking the time out of your schedules! If you haven't got your place for AI Con 2020 reserved, there is no time like the present! You can secure your place for free here. It will be a brilliant conference. Who's ready to learn more about AI? 

The post AI Con 2020 Interview with Dr. Fiona Browne and Matt Flenley appeared first on Datactics.

]]>
How can banks arm themselves against increasing regulatory and technological complexity? – FinTech Finance https://www.datactics.com/blog/ai-ml/2020-the-year-of-aml-crisis/ Tue, 03 Nov 2020 10:00:22 +0000 https://www.datactics.com/?p=12885 Datactics Head of Artificial Intelligence, Dr. Fiona Browne, recently contributed to the episode of FinTech Finance: Virtual Arena. Steered by Douglas MacKenzie, the interview covered the extent of the Anti-Money Laundering (AML) fines currently faced by banks over the last number of years and start to unpack what we do at Datactics in relation to […]

The post How can banks arm themselves against increasing regulatory and technological complexity? – FinTech Finance appeared first on Datactics.

]]>

Datactics Head of Artificial Intelligence, Dr. Fiona Browne, recently contributed to an episode of FinTech Finance: Virtual Arena. Steered by Douglas MacKenzie, the interview covered the extent of the Anti-Money Laundering (AML) fines faced by banks over the last number of years, and started to unpack what we do at Datactics in relation to this topic: helping banks address their data quality, with solutions designed to combat fraudsters and money launderers.  

How can banks arm themselves against increasing regulatory and technological complexity?

Fiona began by highlighting how financial institutions face significant challenges when managing their data. With the increase in financial regulations since the financial crisis of 2008/2009, ensuring data quality has grown in importance, obliging institutions to have a handle on their data and make sure it is up to date. Modern data quality platforms mean that the timeliness of data can now be checked via a 'pulse check', ensuring it can be used in further downstream processes and meets regulations.
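The 'pulse check' idea can be illustrated with a minimal sketch. Note this is an illustration only, not the Datactics implementation: the field names and the 30-day freshness window are assumptions.

```python
# Minimal timeliness "pulse check": flag records whose last-updated
# timestamp falls outside an acceptable freshness window.
from datetime import date, timedelta

def stale_records(records, as_of, max_age_days=30):
    """Return the records older than the freshness window."""
    cutoff = as_of - timedelta(days=max_age_days)
    return [r for r in records if r["last_updated"] < cutoff]

records = [
    {"id": "A", "last_updated": date(2020, 10, 1)},  # fresh
    {"id": "B", "last_updated": date(2020, 7, 1)},   # stale
]
stale = stale_records(records, as_of=date(2020, 10, 15))  # flags record "B"
```

In practice a check like this would feed a data quality dashboard, or hold back a downstream process until the stale records are refreshed.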

Where does Datactics fit in to the AML arena? 

A financial institution needs to be able to verify the client that they are working with when going through the AML checks. The AML process itself is vast but at Datactics, we focus on the area of profiling data quality and matching – it is our bread and butter. Fiona stressed the importance of internal checks as well as public entity data, such as sanction and watch lists.

In a nutshell, there is a significant amount of data to check and compare, and with a lack of quality data it becomes a difficult and costly task to perform. That is why we at Datactics focus on data quality cleansing and matching at scale.

Why should banks look to partner, rather than building it in house? 

One of the key issues of doing this in house is not having the necessary resources to perform the required checks and adhere to the different processes in the AML pipeline. According to the Financial Conduct Authority (FCA), inadequate in-house checks and a lack of data are causing leading financial institutions to receive hefty fines. Fiona reiterated that when banks get back to the fundamentals, with their processes right and their data in order, they can then use a partner's technology to automate and streamline these processes, which in turn speeds up onboarding and ensures the legislation is being met.

Why did the period of 2018/2019 have such a high number of AML breaches?

Fiona explained that many of these transactions go back over a decade, and it takes time to identify them. AML compliance is difficult to achieve, and regulators know that it is challenging. The regulators are doing a better job of providing guidelines to financial institutions, enabling them to address these regulations. Fiona suggested that 2018/2019 was perhaps a much-needed wake-up call on this issue. 

And with AML fines already at $5.6 billion this year, more than the whole of 2019, what can banks do? 

Looking at the US, where a substantial number of fines are still being issued even though they are not as high as in 2019, Fiona said it is paramount that financial institutions have the right data and the right processes in place. Although it can be seen as an administrative burden, there is real criminal activity behind the scenes, which is why AML is so important. It is vital that financial institutions get a handle on this, enabling them to also improve the experience for their clients. 

The fines will continue to be issued. Why should firms look to clean data when they just want to get to the bottom line? 

It is essential to have the building blocks in place. Data quality is key for the onboarding process, but it is also essential downstream, particularly if you want to do more trend analysis. Getting the fundamentals right at the start will pay dividends.  

Are there any other influences that Artificial Intelligence (AI) and Machine Learning (ML) can have on banks' onboarding processes? 

According to Fiona, there is no silver bullet. One AI/ML technique will not solve all the AML issues. It is about deploying these techniques when approaching the issues in different ways. A large part of the onboarding process is gathering data and extracting relevant information from the data set. Fiona has seen a lot of Natural Language Processing (NLP) techniques employed to extract the data from documents. At Datactics, we use Machine Learning in the data matching process to reduce the manual review time. ML techniques are employed in supervised and unsupervised approaches geared to pinpoint fraudulent transactions. We think that graph databases and the network analysis side of machine learning are an interesting area, and we are currently exploring how they can be deployed in AML and fraud detection. 

Bonus content: In the US and Canada, one way to potentially identify fraud was to look at transactions over $10,000. The criminals, however, have become increasingly savvy, even utilising Machine Learning to muddy their tracks: by dividing transactions into randomised amounts, they can make them appear less pertinent. As Fiona put it, it's 'a cat and mouse game'. 
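The detection side of that cat-and-mouse game can be sketched as a simple aggregation check: no single transaction crosses the reporting line, but the per-sender total within a short window does. This is an illustrative sketch only, not a production fraud-detection rule; the $10,000 threshold is from the example above, while the 7-day window and data shape are assumptions.

```python
# Flag "structuring": sub-threshold transactions that aggregate past the
# reporting line within a sliding window, per sender.
from collections import defaultdict

REPORTING_LINE = 10_000

def flag_structuring(transactions, window_days=7):
    """transactions: iterable of (sender, day, amount) tuples.
    Returns the senders whose sub-threshold amounts sum past the line
    within the window."""
    by_sender = defaultdict(list)
    for sender, day, amount in transactions:
        if amount < REPORTING_LINE:   # each one looks innocuous on its own
            by_sender[sender].append((day, amount))
    flagged = set()
    for sender, items in by_sender.items():
        items.sort()                  # order by day
        total, start = 0, 0
        for day, amount in items:
            total += amount
            while day - items[start][0] > window_days:
                total -= items[start][1]   # drop payments outside the window
                start += 1
            if total > REPORTING_LINE:
                flagged.add(sender)
    return flagged

txns = [("acct1", 1, 4_000), ("acct1", 2, 3_500), ("acct1", 3, 3_200),
        ("acct2", 1, 9_000)]
flag_structuring(txns)   # flags acct1: 10,700 inside the window
```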

If you are employed in the banking sector, or if you deal with large and messy datasets, you will probably face challenges arising from poor data quality, standardisation, and siloed information. 

Datactics provides the tools to tackle these issues with minimum IT overhead, in a powerful and agile way. Get in touch with the self-service data quality experts today to find out how we can help.

The post How can banks arm themselves against increasing regulatory and technological complexity? – FinTech Finance appeared first on Datactics.

]]>
All things AML and FinTech Finance: Virtual Arena – weekly round-up https://www.datactics.com/blog/marketing-insights/weekly-round-up-aml-ff-arena/ Fri, 30 Oct 2020 14:00:15 +0000 https://www.datactics.com/?p=12865 We started by looking at why data matching is a key part of any AML & KYC process. It’s made more complex by the different standards, languages, and levels of quality in the different data sources on which firms typically rely on. It’s expensive too: a recent Refinitiv article states that some firms are spending up to […]

The post All things AML and FinTech Finance: Virtual Arena – weekly round-up appeared first on Datactics.

]]>

We started by looking at why data matching is a key part of any AML & KYC process. It's made more complex by the different standards, languages, and levels of quality in the different data sources on which firms typically rely. It's expensive too: a recent Refinitiv article states that some firms are spending up to $670m each year on KYC. 

As the week went on, we looked at some of the key areas where Datactics makes a real difference in helping firms to reduce manual effort, reduce risk, and bring down the extremely high cost of client onboarding. 

We then looked at the impact of the EU’s fifth AML directive and how firms are able to automate their sanctions screening with the sanctions match engine.  

We also explored how we support efforts to reduce risk and financial crime involving the clever tech we’ve used to transliterate between character sets and perform multi-language matching. 

Finishing up, we shared our talk with the EDM Council, which explored how AI can make a real difference to the story. Bringing more predictive capability to human effort means that finding those edge cases doesn't have to wait until all the obvious ones have been ruled out. We also composed a piece entitled 'Lifting the lid on the problems that Datactics solves'; if you missed it, you can check it out here. 


If you missed any of the pieces we shared this week, feel free to read them on our DataBlog or on our social media platforms.  

In other news this week, our very own Head of AI, Dr Fiona Browne contributed to the FinTech Finance: Virtual Arena. This session discussed the huge AML fines faced by the banks over the last number of years.


At Datactics we are a company that helps banks gain quality data – a tool that is equipped to fight fraudsters and money launderers. Fiona was able to share her experience as Head of AI at Datactics to shed light on how banks can arm themselves sufficiently to allow them to stand up to increasing regulatory and technological complexity. 

Datactics provides the tools to tackle these issues with minimum IT overhead, in a powerful and agile way.  If you missed the session, you can watch it back on LinkedIn by following this link.  

Have a great weekend! Hope you enjoyed this week’s round-up.    

Click here for more by the author, or find us on LinkedIn, Twitter or Facebook for the latest news. You can also read the last round-up here, or keep an eye out for our next one! 

The post All things AML and FinTech Finance: Virtual Arena – weekly round-up appeared first on Datactics.

]]>
EDM Talks: Lifting the lid on the problems that Datactics solves https://www.datactics.com/blog/marketing-insights/lifting-the-lid-edm/ Fri, 30 Oct 2020 09:00:00 +0000 https://www.datactics.com/?p=12630 Recently we partnered with the EDM Council on a video that investigates the application of AI to data quality and matching. In this EDM Talk, we lift the lid on how our AI team is developing solutions to help our clients, especially in the area of entity matching and resolution. This plays an important role in on-boarding, KYC and obtaining a single […]

The post EDM Talks: Lifting the lid on the problems that Datactics solves appeared first on Datactics.

]]>
Recently we partnered with the EDM Council on a video that investigates the application of AI to data quality and matching.

In this EDM Talk, we lift the lid on how our AI team is developing solutions to help our clients, especially in the area of entity matching and resolution. This plays an important role in on-boarding, KYC and obtaining a single customer view.


What is the data challenge? 

Institutions such as banks often have large sets of very messy data which may be siloed and subject to duplication. When onboarding a new client or building a legal entity master, institutions may need to match clients to both internal datasets and external sources. These include vendors such as Dun & Bradstreet and Bloomberg, or data from a local company registration authority, such as Companies House in the UK. This data needs to be cleaned, normalised and matched to create a single golden record, in order to verify the client's identity and adhere to regulatory compliance. For many institutions, this can be a heavily manual and time-consuming process.  

What needs to be done to improve entity matching? 

In entity resolution, there are two main challenges to address: the data matching side, and the manual remediation side, which is required to resolve those instances where we have low-confidence, mismatched or unmatched entities.  

Datactics undertook a recent use case exploring matching entities between two open global entity datasets, Refinitiv ID and Global LEI. We augmented our rule-based fuzzy matching approach with ML to improve efficiency around the manual remediation of low-confidence matches. We performed matching between these datasets using deterministic rules, as many firms do today, and followed the standard approach in place for many onboarding teams, whereby low-confidence entity matches go into manual review. Within Datactics, data engineers were timed to measure the average time taken to remediate a low-confidence match, which could be up to a minute and a half per entity pair. That might be fine if there are just a few entities to check, but when you have hundreds, thousands or many hundreds of thousands, it highlights how challenging the task becomes and how much time and resource it requires.  
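The routing step described above can be sketched as follows. This is an illustrative toy, not the Datactics engine: it uses Python's standard-library SequenceMatcher as a stand-in for a real fuzzy-matching rule set, and the two confidence thresholds are hypothetical.

```python
# Route candidate entity pairs by match confidence: confident matches and
# non-matches are decided automatically; the middle band goes to a human.
from difflib import SequenceMatcher

AUTO_ACCEPT = 0.90   # at or above this, treat as a confident match
AUTO_REJECT = 0.50   # below this, treat as a confident non-match

def route_pair(name_a, name_b):
    """Score a candidate entity pair and decide where it goes."""
    score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    if score >= AUTO_ACCEPT:
        return score, "matched"
    if score < AUTO_REJECT:
        return score, "non-match"
    return score, "manual review"   # the costly queue discussed above

score, decision = route_pair("Datactics Ltd", "Datactics Limited")
# a near-miss like this lands in the "manual review" queue
```

Shrinking that middle band without losing accuracy is exactly where the ML augmentation earns its keep.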

At Datactics we thought this was an interesting problem to explore. We were keen to fully understand whether AI-enabled data quality and matching would bring benefits in terms of efficiency and improved data quality to our clients who undertake such tasks. 

What did Datactics want to achieve? 

We were particularly interested to understand how we could reduce manual effort and increase the accuracy of data matching. We wanted to understand what benefits machine learning would bring to the process, using an approach that was transparent and which would make decision-making open and obvious to an auditor or regulator. 

What benefit is there from applying Machine Learning to this problem? 

Machine learning is a broad domain, covering application areas from speech recognition and language understanding to automating processes and decision-making. Machine learning approaches are built on mathematical algorithms and statistical models. The advantage of these approaches is the ability of the algorithms to learn from data, uncover patterns, and then use this learning to make predictions on new, unseen cases. We see machine learning deployed in everyday life, from email filters through to personal assistant devices such as Amazon Echo and Apple Siri. 

Within the financial sector, Machine Learning techniques are being applied to tasks including profiling behaviour for fraud detection; the use of natural language processing to extract information from unstructured text to enrich the Know Your Customer onboarding process; through to the use of chatbots to automatically address customer queries and customise product offerings.  

At Datactics we view machine learning both as a tool for automating manual tasks and as a decision-making aid, augmenting processes such as matching, error detection and data quality rule suggestion for our clients. This frees up time and resource, enabling clients to do more in their roles.  

How can machine learning be applied to the process of matching? 

Within Datactics we have augmented our rules-based matching process with machine learning. Our solution focuses on explainability and transparency, enabling us to trace why and how predictions have been made. This transparency is important to financial clients, both for adhering to regulations and for building trust in the system that provides these predictions. Using high-confidence predictions, we can automate a large volume of manual review: in the matching use case, we were able to reduce the manual review burden by 45%, freeing up clients' time so their expertise can be deployed on the difficult edge cases. 

At Datactics we train machine learning models using examples of matches and non-matches. Over time, patterns within that data are detected, and this learning can be used to make predictions on new, unseen cases. A reviewer can validate the predictions and feed the results back into the algorithm; this is known as human-in-the-loop machine learning. Eventually the algorithm becomes smarter, making more accurate predictions. High-quality predictions can mean less manual review, by reducing the volume of cases that need to be reviewed. 
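That human-in-the-loop cycle can be sketched roughly as below. This is not the actual Datactics model, which uses trained classifiers over engineered features; here the "model" is simply a decision threshold refit from labelled match scores, with uncertain cases near the boundary routed to a reviewer whose label is fed back into the next refit.

```python
def fit_threshold(labelled):
    """Refit the match/non-match boundary from (score, is_match) labels:
    the midpoint between the weakest match and the strongest non-match."""
    match_scores = [s for s, is_match in labelled if is_match]
    non_match_scores = [s for s, is_match in labelled if not is_match]
    return (min(match_scores) + max(non_match_scores)) / 2

def predict(score, threshold, band=0.05):
    """Scores close to the boundary are routed to a human reviewer."""
    if abs(score - threshold) < band:
        return "uncertain"
    return "match" if score >= threshold else "non-match"

labelled = [(0.95, True), (0.88, True), (0.40, False), (0.62, False)]
threshold = fit_threshold(labelled)       # midpoint of 0.88 and 0.62

# A reviewer resolves an uncertain case; feeding the new label back
# shifts the boundary for every prediction that follows.
labelled.append((0.73, False))
threshold = fit_threshold(labelled)       # now the midpoint of 0.88 and 0.73
```

Each reviewed case tightens the boundary, so over time fewer pairs fall into the uncertain band and the manual queue shrinks.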

The models we have built need good quality data. We used the Datactics self-service data quality platform to create good quality datasets and apply labels to that data. Moving forward, we are seeking to augment our AI with graph linkage analysis, as well as further enhancing our feature engineering and dataset capabilities.  

To learn more about the work we are doing with machine learning and how we are applying it in the Datactics platform, all content is available on the Datactics website. We also have a whitepaper on AI-enabled data quality. 


For a demo of the system in action please fill out the contact form. 

To find out more about what we do at Datactics, check out the full EDM Talks video below.

We will soon be publishing Part 2 of this blog series that will look at the application of AI and ML in the Fintech sector in more detail as well as an entity resolution use case.  

Click here for the latest news from Datactics, or find us on LinkedIn, Twitter or Facebook. 

The post EDM Talks: Lifting the lid on the problems that Datactics solves appeared first on Datactics.

]]>
Datactics contributes to Bank of England and FCA’s AI Public-Private Forum https://www.datactics.com/press-releases/datactics-contributes-to-bank-of-england-and-fcas-ai-public-private-forum/ Mon, 12 Oct 2020 07:27:00 +0000 https://www.datactics.com/?p=12644 Belfast, London, New York, 12th October 2020 Datactics is pleased to announce that its Head of AI, Dr Fiona Browne, has been invited to participate in the Artificial Intelligence Public-Private Forum, joining 20 other experts from across the financial technology sectors as well as academia, along with the observers from the Information Commissioner’s Office and […]

The post Datactics contributes to Bank of England and FCA’s AI Public-Private Forum appeared first on Datactics.

]]>
Belfast, London, New York, 12th October 2020

Datactics is pleased to announce that its Head of AI, Dr Fiona Browne, has been invited to participate in the Artificial Intelligence Public-Private Forum, joining 20 other experts from across the financial technology sectors as well as academia, along with the observers from the Information Commissioner’s Office and the Centre for Data Ethics and Innovation.

The purpose of the Forum, launched by the Bank of England and the Financial Conduct Authority, is to facilitate dialogue between the public and private sectors to better understand the use and impact of AI in financial services, which will help further the Bank’s objective of promoting the safe adoption of this technology.

The AI Public-Private Forum, with an intended duration of one year, will consist of a series of quarterly meetings and workshops structured around three topics: data, model risk management, and governance.

Commenting on the initiative’s launch, the deputy governor for markets and banking at the BofE, David Ramsden said:

The existing regulatory landscape is somewhat fragmented when it comes to AI, with different pieces of regulation applying to different aspects of the AI pipeline, from data through model risk to governance. The policy must strike a balance between high-level principles and a more rules-based approach. We also need to future-proof our policy initiatives in a fast-changing field.

The specific aims of the Forum are: firstly, to share information and understand the practical challenges of using AI in financial services, identify existing or potential barriers to deployment, and consider any potential risks or trade-offs; secondly, to gather views on areas where principles, guidance, or regulation could support safe adoption of these technologies; and finally, to consider whether once the forum has completed its work ongoing industry input could be useful and if so, what form this could take.

The knowledge, experience, and expertise of the Forum’s members and observers will be invaluable in helping us to contextualise and frame the Bank’s thinking on AI, its benefits, its risk and challenges, and any possible future policy initiatives.

Fiona Browne, Head of AI at Datactics, said:

I’m really excited and honoured to be part of such a timely forum. AI/ML services touch our everyday lives from recommending what we watch to groceries that we buy.

Within financial services, ML can offer efficiency benefits, from reducing manual, time-consuming tasks, to saving customers money by suggesting the best financial products, to bespoke customer service solutions and fraud detection. These solutions need to sit within the legal and regulatory environment of the financial sector, and they are not without their risks and challenges.

I hope to offer the forum insights and experience from the practical implementation of ML, from data quality and fairness, through transparency and explainability in processes and model predictions, to the monitoring of models in production. I'm excited to focus on and tease out potential guidance and best practice on how to safely adopt and deploy such solutions.

What is the AI Public-Private Forum?

The Bank of England, working with the FCA, has established the AIPPF (AI Public-Private Forum). The forum launched in October 2020 and consists of members reflecting a variety of views, who applied to join and bring with them their expertise in the area of AI/ML. The AIPPF will:

  • Share information and understand the practical challenges of using AI/ML within financial services, as well as the barriers to deployment and potential risks. 
  • Gather views on potential areas where principles, guidance or good practice examples could be useful in supporting safe adoption of these technologies. 
  • Consider whether ongoing industry input could be useful and what form this could take (e.g. considering an FMSB-type structure or industry codes of conduct). 

More information about the Forum can be found here.

The post Datactics contributes to Bank of England and FCA’s AI Public-Private Forum appeared first on Datactics.

]]>
Part 2: Self-service data improvement is the route to better data quality https://www.datactics.com/blog/marketing-insights/new-self-service-data-improvement-is-the-route-to-better-data-quality/ Thu, 08 Oct 2020 12:00:37 +0000 https://www.datactics.com/new-self-service-data-improvement-is-the-route-to-better-data-quality/ The route to better data quality – It’s easy to say that planning a journey has been made far simpler since the introduction of live traffic information to navigation apps. You can now either get there faster, or at the very least phone ahead to explain how long you’ll be delayed. It’s just as easy […]

The post Part 2: Self-service data improvement is the route to better data quality appeared first on Datactics.

]]>
The route to better data quality – It’s easy to say that planning a journey has been made far simpler since the introduction of live traffic information to navigation apps. You can now either get there faster, or at the very least phone ahead to explain how long you’ll be delayed.


It’s just as easy to say that we wouldn’t think of ignoring this kind of data. Last week’s blog looked at why measuring data is important for retail banks, but without a strategy for reacting to the results, measurement is arguably meaningless.

Internal product owners, risk and compliance teams all need to use specific and robust data measurements for analytics and innovation; to identify and serve customers; and to comply with the reams of rules and regulations handed down by regulatory bodies. Having identified a way of scoring the data, it would be equally bizarre to ignore the results.

However, navigating a smooth path in data management is hampered by the landscape being vast, uncharted and increasingly archaic. Many executives of incumbent banks are rightly worried about the stability of their ageing systems and are finding themselves ill-equipped for a digital marketplace that is evolving with ever-increasing speed.

Key business aims, such as using data to achieve necessary cost savings and to grow revenues through intelligent analytics, snarl up against the sheer volume of human and financial resources that must be ploughed into these systems to meet stringent regulatory requirements and to reduce the customer impact, regulatory pressure and painful bad press caused by an IT outage.

Meanwhile, for those who have them, data metrics are revealing quality problems, and fixing these issues tends to find its way into a one-off project that relies heavily on manual rules and even more manual re-keying into core systems. Very often, such projects have no capacity to continue that analysis and remediation or augmentation into the future, and over time data that has been fixed at huge cost starts to decay again and the same cycle emerges.

But if your subject matter experts (SMEs) –  your regulatory compliance specialists, product owners, marketing analytics professionals – could have cost-effective access to their data, it could put perfecting data in the hands of those who know what the data should look like and how it can be fixed.

If you install a targeted solution that can access external reference data sources, internal standards such as your data dictionary, and user and department-level information to identify the data owner, you can self-serve to fix the problems as they arise.

This can be done through a combination of SME review and machine learning technology that learns to apply remedial activities automatically, because the rules created when correcting broken records contain the information required to fix other records that fail in the same way.
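The idea of rules learned from corrections can be pictured with a tiny sketch: a fix an SME makes to one record is captured as a (field, bad value) to corrected value rule, and replayed on any other record that fails the same way. The field names and values below are invented for illustration only.

```python
# Illustrative sketch only: corrections made by an SME on one broken record
# are captured as rules and replayed automatically on other records that
# fail in the same way. All field names and values are invented.

# Rules harvested from an SME fixing individual records by hand:
# (field, bad_value) -> corrected_value
learned_rules = {}

def record_fix(field, bad_value, corrected_value):
    """Remember a correction an SME made to a single record."""
    learned_rules[(field, bad_value)] = corrected_value

def apply_rules(record):
    """Replay every learned correction that matches this record."""
    return {field: learned_rules.get((field, value), value)
            for field, value in record.items()}

# An SME corrects one record; the rule then fixes others automatically.
record_fix("country", "U.K.", "United Kingdom")
fixed = apply_rules({"name": "Acme Ltd", "country": "U.K."})
```

In a real platform the rules would of course be richer than exact-match lookups, but the principle is the same: each manual fix carries enough information to remediate every other record failing the same check.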

It might sound like futuristic hype – because AI is so hot right now – but this is a very practical example of how new technology can address a real and immediate problem, and in doing so complement the bank’s overarching data governance framework.

It means that the constant push towards optimised customer journeys and propositions, increased regulatory compliance, and IT transformation can rely on regularly-perfected data at a granular, departmental level, rather than lifting and dropping compromised or out-of-date datasets.

Then the current frustration at delays in simply getting to use data can be avoided, and cost-effective, meaningful results for the business can be delivered in days or weeks rather than months or years.

Head over to the next part: ‘Build vs Buy – Off-the-shelf or do-it-yourself?’, or click here for part 1 of this blog, covering the need for data quality metrics in retail banking.


Matt Flenley is currently plying his trade as chief analogy provider at Datactics. If your data quality is keeping you awake at night, check out Self-Service Data Quality™, our award-winning interactive data quality analysis and reporting tool built to be used by business teams who aren’t necessarily programmers.

The post Part 2: Self-service data improvement is the route to better data quality appeared first on Datactics.

]]>
IRMAC Reflections with Dr. Fiona Browne https://www.datactics.com/blog/ai-ml/irmac-reflections-with-dr-fiona-browne/ Mon, 07 Sep 2020 09:00:00 +0000 https://www.datactics.com/?p=11379 There is a lot of anticipation surrounding Artificial Intelligence (AI) and Machine Learning (ML) in the media. Alongside the anticipation is speculation – including many articles placing fear into people by inferring that AI and ML will replace our jobs and automate our entire lives! Dr Fiona Browne, Head of AI at Datactics recently spoke at an IRMAC (Information […]

The post IRMAC Reflections with Dr. Fiona Browne appeared first on Datactics.

]]>
There is a lot of anticipation surrounding Artificial Intelligence (AI) and Machine Learning (ML) in the media. Alongside the anticipation is speculation – including many articles stoking fear by implying that AI and ML will replace our jobs and automate our entire lives!

Dr Fiona Browne, Head of AI at Datactics recently spoke at an IRMAC (Information Resource Management Association of Canada) webinar, alongside Roger Vandomme, of Neos, to unpack what AI/ML is, some of the preconceptions, and the reasons why different approaches to ML are taken…  


What is AI/ ML? 

Dr. Browne clarified that while there is no officially agreed-upon definition of AI, it can be described as the ability of a computer to perform cognitive tasks, such as voice and speech recognition, decision making, or visual perception. ML is a subset of AI, comprising different algorithms that learn from input data.  

A point that Roger raised at IRMAC was that the algorithms learn to identify patterns within the data, and those patterns enable the model to distinguish between different outcomes, for example detecting whether a transaction is fraudulent or genuine. 

ML takes processes that are repetitive and automates them. At Datactics, we are exploring the usage of AI and ML in our platform capabilities – Dr Fiona Browne 

What are the different approaches to ML?  

Dr. Browne explained that, at a broad level, there are three approaches: supervised, unsupervised, and reinforcement machine learning.  

In supervised ML, the model learns from a labelled training data set. For example, financial transactions labelled as either fraudulent or genuine would be fed into the ML model. The model then learns from this input and can distinguish between the two.  

Where data is unlabelled, Dr. Browne explained, unsupervised ML is more appropriate. The key difference from supervised ML is that the model seeks to uncover clusters or patterns inherent in the data in order to separate the records out.  

Finally, reinforcement machine learning involves models that continually learn and update from performing a task. For example, a computer algorithm learning how to play the game ‘Go’. This is achieved by the outputs of the model being validated and that validation being provided back to the model.  

The difference between supervised learning and reinforcement learning is that in supervised learning the training data has the answer key with it, meaning the model is trained with the correct answer.

In contrast to this, in reinforcement learning, there is no answer, but the reinforcement agent selects what to do to perform the specific task.

It is important to remember that, with no training dataset present, a reinforcement agent must learn from its own experience. Often the biggest trial comes when a model is transferred out of the training environment and into the real world.
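The contrast between the first two approaches can be sketched in miniature. The snippet below is an illustrative pure-Python toy, not Datactics code: a nearest-centroid classifier stands in for supervised learning on labelled transactions, and a simple one-dimensional two-means clustering stands in for unsupervised learning (reinforcement learning is omitted for brevity). All data is invented.

```python
# Toy contrast of supervised vs unsupervised learning on invented
# fraud-detection data. Pure Python, for illustration only.

# --- Supervised: learn from labelled examples (amount, label) ---
labelled = [(12.0, "genuine"), (8.0, "genuine"), (950.0, "fraud"), (990.0, "fraud")]

def centroid(points):
    return sum(points) / len(points)

# One centroid per known label, learned from the labelled training data.
centroids = {}
for label in {"genuine", "fraud"}:
    centroids[label] = centroid([x for x, y in labelled if y == label])

def classify(amount):
    # Predict the label whose class centroid is nearest.
    return min(centroids, key=lambda label: abs(amount - centroids[label]))

# --- Unsupervised: no labels; discover two clusters by 1-D two-means ---
unlabelled = [11.0, 9.0, 960.0, 980.0, 10.5]

def two_means(xs, iters=10):
    c1, c2 = min(xs), max(xs)  # initialise centroids at the extremes
    for _ in range(iters):
        g1 = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in xs if abs(x - c1) > abs(x - c2)]
        c1, c2 = centroid(g1), centroid(g2)
    return g1, g2

low, high = two_means(unlabelled)
```

The supervised model can only predict labels it was shown; the unsupervised one finds the two groups on its own but does not know which, if either, represents fraud.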

Now that AI/ML and the different approaches have been unpacked… the next question is how does explainability fit into this?  The next mini IRMAC reflection will unravel what explainability is and what the different approaches are. Stay tuned! 

Fiona has written an extensive piece on AI-enabled data quality; feel free to check it out here. 

Click here for more by the author, or find us on LinkedIn, Twitter or Facebook for the latest news.

The post IRMAC Reflections with Dr. Fiona Browne appeared first on Datactics.

]]>
The Three Pillars of AI https://www.datactics.com/blog/cto-vision/cto-vision-the-3-pillars-of-successful-production-ai/ Fri, 04 Sep 2020 11:05:25 +0000 https://www.datactics.com/?p=7020 Recent incidents involving AI algorithms have hit the headlines, leading many to question their worth. In this article, CTO Alex Brown outlines the three pillars of AI and looks at how they each play a part in implementing AI in production. As many who work within computer science will know, many Artificial Intelligence (AI) projects fail to make the crucial […]

The post The Three Pillars of AI appeared first on Datactics.

]]>
Recent incidents involving AI algorithms have hit the headlines, leading many to question their worth.

In this article, CTO Alex Brown outlines the three pillars of AI and looks at how they each play a part in implementing AI in production.


As many who work within computer science will know, many Artificial Intelligence (AI) projects fail to make the crucial transition from experiment to production, for a wide range of reasons. In many cases, the triple investment of money, training, and time is deemed too big of a risk to take; additionally, it could be feared that initial AI and machine learning models might not scale or might be viewed as too experimental to be utilised by internal or external customers. 


In many cases, it can also be due to a lack of data, the suitability of data, and the quality of data. But even if your data’s of the right quality and your experimental model is good, your digital transformation journey is far from over – you still have a long way to go before you can use that AI in production! 

From all the work we at Datactics have been undertaking in AI development, it’s clear to us that there are 3 critical features your AI system will need:  

Explainability


Two or three years ago, when more AI technologies and intelligent systems were emerging, no one talked about explainability – the ability to explain why an algorithm or model made a decision or set of decisions.

Today it’s a hot topic in data science and discussions around deep learning. The use of opaque ‘black box’ solutions has been widely criticised, both for a lack of transparency and also for the possible biases inherited by the algorithms that are subject to human prejudices in the training data. 

Many recent cases have shown how this can lead to fragmented and unfair decisions being made.  

Explainable AI, or “XAI”, is fast becoming a prerequisite for many AI projects, especially in government, policing, and data-intensive regulated industries such as healthcare and banking.

In these business areas, the demand for explainability is understandably high. Explainability is vital for decision–making, predictions, risk management, and policymaking.

Predictions are a delicate topic of discussion, as any mistakes made can have major implications. 

AI models in healthcare

As an example in healthcare, if an AI algorithm isn’t trained adequately with the correct data, we can’t be sure that it will be able to diagnose a patient properly.

Therefore, curating the training data set and ensuring that the data entering it is bias-free has never been more important.  

Furthermore, XAI is not just for data scientists, but also for non-technical business specialists.

It stands to reason that it should be easy for a business user to obtain and understand, from a business perspective, why a predictive model made a particular prediction, and for a data scientist to understand the behaviour of the model in as much detail as possible.  
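As a concrete illustration of one common explainability technique, the sketch below implements permutation importance in plain Python: a feature is scored by how much model accuracy drops when that feature's values are shuffled. The model, features, and data are all invented for illustration; in practice a library implementation (for example scikit-learn's `permutation_importance`) would be used.

```python
# Illustrative sketch of permutation importance: shuffle one feature's
# column and measure the resulting drop in accuracy. All values invented.
import random

random.seed(0)

# Toy model: flags a transaction as fraud when the amount is high; it
# ignores the second feature entirely, which the importance score reveals.
def model(amount, hour_of_day):
    return "fraud" if amount > 500 else "genuine"

# Rows of (amount, hour_of_day, true_label).
rows = [(900, 3, "fraud"), (12, 14, "genuine"),
        (950, 23, "fraud"), (8, 9, "genuine")]

def accuracy(data):
    return sum(model(a, h) == y for a, h, y in data) / len(data)

def permutation_importance(data, feature_index):
    """Accuracy drop after shuffling one feature column."""
    base = accuracy(data)
    cols = list(zip(*data))
    shuffled = list(cols[feature_index])
    random.shuffle(shuffled)
    cols[feature_index] = shuffled
    return base - accuracy(list(zip(*cols)))
```

Shuffling the amount column can hurt accuracy, while shuffling `hour_of_day` never does, telling a business user in plain terms which input the model's decisions actually depend on.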

Monitoring  

Closely related to XAI is the need to closely monitor AI model performance. Just as children may be periodically tested at school to ensure their learning is progressing, so too do AI models need to be monitored to detect “model drift”, where predictions become increasingly incorrect over time in unforeseen ways. Various concept-drift and data-drift detection and handling schemes may be appropriate, depending on the situation.

Often, if longer-term patterns are understood as being systemic, they can be identified and managed.

Concept drift is often prominent in supervised learning problems where predictions are developed and collated over time.  Like many things, drift isn’t something to be feared but instead measured and monitored, to ensure firstly that we have confidence in the model and the predictions it is making, and secondly that we can report to senior executives on the level of risk associated with using the model. 
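At its simplest, the monitoring described above can be tracking rolling prediction accuracy against a threshold as labelled outcomes arrive. The sketch below is a minimal illustration, not a production scheme; the window size and threshold are arbitrary choices, and real drift detectors are considerably more sophisticated.

```python
# Minimal sketch of drift monitoring: keep a rolling window of prediction
# outcomes and alert when accuracy over the window falls below a threshold.
from collections import deque

class DriftMonitor:
    def __init__(self, window=100, threshold=0.9):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = incorrect
        self.threshold = threshold

    def record(self, prediction, actual):
        """Log one prediction once the true outcome is known."""
        self.outcomes.append(1 if prediction == actual else 0)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drifting(self):
        # Only judge once the window is full, so early noise is ignored.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.threshold
```

The rolling accuracy doubles as the figure that can be reported upwards: a number that quantifies how far the model has deviated from expectations, and therefore the risk of continuing to rely on it.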

Retraining  

Many AI solutions come with ‘out-of-the-box’ pre-trained models, which can theoretically make it quicker to deploy into production. 


However, it is important to understand that there isn’t a “one-size fits all” when it comes to AI, and that some customisation is going to be necessary to ensure that predictions being made fit your business purposes. 

In many cases, these models may not be well suited to your data: the vendor will have trained them on data sets that may look quite different from yours, and so they may behave differently.

Again, this highlights the importance of monitoring and explainability, and furthermore the importance of being able to adapt a pre-trained model to your specific data in order to achieve strong results.

To this end, vendors supplying pre-trained models should provide facilities for the customer to collect new training data and retrain an off-the-shelf model.

An important consequence of this is that such AI frameworks need the ability to roll back to previous versions of a model in case of problems, and to version-control both models and training data so that poor-performing models can be identified and reverted.
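One minimal way to picture such versioning and rollback is a small registry that stores each model version to disk and can load any earlier one back. The sketch below is an invented illustration using Python's `pickle`, not a description of any particular framework; real systems would also version the training data alongside each model.

```python
# Minimal sketch of model version control with rollback: each saved model
# gets a numbered file, and any earlier version can be reloaded.
import os
import pickle
import tempfile

class ModelRegistry:
    def __init__(self, directory):
        self.directory = directory
        self.versions = []  # paths, in order of creation

    def save(self, model):
        """Persist a new model version; returns its version number."""
        version = len(self.versions) + 1
        path = os.path.join(self.directory, f"model_v{version}.pkl")
        with open(path, "wb") as f:
            pickle.dump(model, f)
        self.versions.append(path)
        return version

    def load(self, version=None):
        # Default to the latest version; pass an older number to roll back.
        path = self.versions[(version or len(self.versions)) - 1]
        with open(path, "rb") as f:
            return pickle.load(f)

registry = ModelRegistry(tempfile.mkdtemp())
registry.save({"weights": [0.1, 0.2]})   # v1
registry.save({"weights": [0.3, 0.4]})   # v2, later found to misbehave
rolled_back = registry.load(version=1)   # roll back to v1
```

Here the "model" is just a dictionary of invented weights, but the same pattern applies to any serialisable model object.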

To conclude our three pillars of AI, the route to getting AI into production is built on being able to explain it, including: 

  • The decisions baked-into the model, including why certain data was selected or omitted
  • How much the model is deviating from expectations, and why
  • How often, how and why the model has been retrained, and whether or not it should be rolled back to a previous version

For more on this subject, read up on my colleague Fiona Browne’s work, including a recent piece on Explainable AI, which can be found here 

The post The Three Pillars of AI appeared first on Datactics.

]]>
“I want to be involved in research that helps people” – Meet Keaton, part of the “Our team” series https://www.datactics.com/blog/marketing-insights/meet-keaton/ Fri, 28 Aug 2020 09:00:00 +0000 https://www.datactics.com/?p=11366 Keaton Sullivan is a PhD student in the AI Team who joined us just over three weeks ago. He’s been contributing to projects and making a considerable mark on the AI Team already! We thought we would grab him for a chat to find out how he has found working in the team so far; […]

The post “I want to be involved in research that helps people” – Meet Keaton, part of the “Our team” series appeared first on Datactics.

]]>
Keaton Sullivan is a PhD student in the AI Team who joined us just over three weeks ago. He’s been contributing to projects and making a considerable mark on the AI Team already!

We thought we would grab him for a chat to find out how he has found working in the team so far; what he has enjoyed; and any tips he would give to interns seeking to build up experience for their CV.

Hi Keaton, great to (virtually) meet you. You’re studying for a PhD at Ulster University, aren’t you? What did you study before starting your doctorate?

I studied a Bachelor’s in Computer Science at Ulster University and was delighted to achieve marks that enabled me to go straight into my PhD. I loved my time at University, the teachers are second to none and I thoroughly enjoyed the practical focus that Ulster University offered within the course.

I am delighted to be able to work with Datactics in a funded position as it means that I can support my sisters when they attend University (something I am looking forward to being able to do).

Why did you choose to study Computer Science at University?  

Whenever I was younger my grandmother bought me a laptop which spurred a fixation to learn everything about it. Later, I did an Extended Diploma which then led me to do a Bachelor’s in Computer Science at Ulster University. When I was in school, I thoroughly enjoyed the STEM subjects, but computers appealed to me because of their practical, challenging, and intriguing nature. 

I have always thought my strength is research which served me well throughout my Bachelor’s and has positioned me well undertaking my PhD. 

Keaton, tell me a bit about Datactics. How did you get involved here, what does your role look like and what are you most looking forward to?

My academic advisor introduced me to the company and pointed me towards the role. I met Fiona (Browne, Head of AI) and Stuart (Harvey, CEO) and they helped me understand a bit more about the team I’d be working with and what my role would look like – split between research and practical, hands-on work. I started a month earlier than originally planned and I have been made to feel very welcome since starting. Working with the team has been great so far as it is very collaborative.

From a social perspective, I look forward to getting involved with the social activity within the company particularly with the ‘Runners and Riders’ club due to my love for being active, in particular, climbing.

How have you found Datactics so far? 

It’s been great as there is a lot of independence. I am trying to help out as much as can. Dr Browne is incredibly understanding, nurturing, and a great mentor to have! I have learned so much from her already.

What brought you to AI?  

In University, I selected the hardest subjects I possibly could; if I found something challenging, I considered this as an opportunity to learn even more. Pushing myself to study these challenging topic areas helped me understand my direction a lot more and that is ultimately how I got here today. There have been a lot of new advances in AI within the industry; to be at the front of this is exciting.

If you had to pick an ideal role for yourself, what would it be? 

I want to be involved in research that helps people. I would love to be a part of making discoveries that can touch and impact people’s lives. Being able to make a difference and be a part of that difference is very important to me and a huge incentive for doing what I do.

What motivates you, Keaton? 

I want to act as a role model for my sisters, to give them as many opportunities as I had. I have worked hard to achieve what I have, and I want them to be able to experience University and the ability to study whatever makes their heart content.

Doing my PhD and seeking to take my career to the next level is hugely motivated by wanting to make them proud and to be able to give back to the lives of my sisters.

What’s your advice for people starting at University or beginning a career? 

It’s better to take your time and not feel like you must rush into everything. Explore different opportunities; it’s better to give your all to something. I would say don’t restrict yourself and don’t feel limited by your capabilities; striving to achieve more and giving something your all will pay you back in the long run. My biggest advice would be to just go for it, but have a heart for everything you do.

Like many people today, you’re starting a job while working remotely. What would your tip be for people who are working from home?  

Dedicate an entire space, to create a distinction between work life and personal life! Working from home can be distracting at times but having that specific area helps me to focus. When I am in my workspace I focus on work and when I am not, I can relax! I am excited to get further involved in the team.

I am thoroughly enjoying working with every member of the AI team and am learning stacks from Fiona. I look forward to addressing that gap in knowledge for my PhD, to continually improving my research capabilities, and to seeking to drive change; creating and instilling purpose in all I do!

Thanks for the lovely chat, Keaton, it was great to hear about your Datactics journey thus far!  We enjoyed deep diving into what makes Keaton tick, and we are excited as he continues working with us over the next number of months and beyond. Welcome to the team Keaton!

Click here for more by the author, or find us on LinkedIn, Twitter or Facebook for the latest news.

The post “I want to be involved in research that helps people” – Meet Keaton, part of the “Our team” series appeared first on Datactics.

]]>
“I thoroughly enjoy working in an exciting space with cutting edge technology” – Meet Dr. Fiona Browne, part of the “Our Team” series https://www.datactics.com/blog/marketing-insights/meet-dr-fiona-browne/ Mon, 24 Aug 2020 09:00:00 +0000 https://www.datactics.com/?p=11360 You may be used to seeing Dr Fiona Browne (Head of AI) across our social channels, so we thought we would sit down with her and discuss, how she got into AI; what motivates her and her passion for technology. Let’s dive straight in…  Fiona, just to begin with, tell us a bit about your background…  I started my career by studying a BSc (Hons) degree in Computing […]

The post “I thoroughly enjoy working in an exciting space with cutting edge technology” – Meet Dr. Fiona Browne, part of the “Our Team” series appeared first on Datactics.

]]>
You may be used to seeing Dr Fiona Browne (Head of AI) across our social channels, so we thought we would sit down with her and discuss, how she got into AI; what motivates her and her passion for technology. Let’s dive straight in… 

Fiona, just to begin with, tell us a bit about your background… 

I started my career by studying a BSc (Hons) degree in Computing Science at Ulster University (UU) in Belfast back in 2004. I then worked towards my Ph.D. in Artificial Intelligence in Bioinformatics, also from UU, which involved developing integrative data analysis models and tools for the prediction of protein-protein interaction networks.  Academia has always been a natural step for me, as I was always academic. 

After I completed my undergraduate degree, I knew I wanted to take it a step further and put what I had learned into practice (we will be chatting to Fiona about what drove her as a young person and what drives her now in our next chat with her, so keep a lookout for that).

I am currently in the role of Head of Artificial Intelligence at Datactics. Prior to joining the team, I was a Lecturer of Computer Science at Ulster University teaching Data Analytics with a research focus on Applied Artificial Intelligence and Data Integration. Additionally, I was a research fellow at Queens University Belfast.  

How did you get into the technology space? 

It was not a straight road! In many ways, it was a leap of faith in the beginning. I undertook Computer Science at GCSE level, which kickstarted my love for the topic area. When selecting my subjects, I had the option of History or Computing. I knew at the time that Computing was a new subject, so, being someone who is always intrigued by the new possibilities of technology, that really encouraged me to select it.

When I opted to do my Ph.D., I knew that it was going to be a different way of thinking. When you get to Ph.D. level it becomes not only about having knowledge but filling gaps and contributing to knowledge. I found this to be a really exciting challenge but as I say it was a completely new way of thinking. It helped me to think of new ways to do things – suddenly I was expected to fill gaps in knowledge which allowed me to explore and develop my ideas.

Talk to us a bit about the work you do. What are your passions? 

I would say I am rigorous about data. I have always had a love for technology, it’s a strong passion for me. One of the elements of my job that I love is putting Machine Learning models into production. I thoroughly enjoy working in an exciting space with cutting edge technology. 

Scaffolding cutting edge technology is what I enjoy being involved with the most, being Head of AI has enabled me to do this. Recently we have taken on 2 new placement students and a PhD student. Seeing talent thrive and mentoring excites me hugely.

What is your proudest achievement in your career so far? 

I have to say I am very proud of the work we have done on Machine Learning within Datactics. The cross-disciplinary approach within rigorous software development has enabled the team to now have a stable model. I feel very proud of this breakthrough and I am excited about the progress that the team continues to make.

What is the biggest shift you’ve seen in the industry, Fiona? 

One of the biggest shifts I noticed is with ethics and explainability in the ML domain. As models move into production it is essential for us to ask questions around the data that they are trained on through to the evaluation of the model and how it performs on new real-world data. For these reasons, I am particularly keen to see a high standard of ethical and rigorous behaviours become ingrained into the development and support of AI/ML systems. I am a definite advocate for diversity in teams and interdisciplinary collaboration and I have recently written a piece on AI Ethics for the Datactics blog.  This blog explored what questions need to be answered to ensure the potentially negative ethical impacts of AI/ML do not outweigh the positives it can deliver across industry and academic sectors. 

I am particularly keen to see a high standard of ethical behaviour become ingrained into AI. I am also a strong advocate for Women Who Code and Women in Tech here in Belfast and internationally.

I’ve seen the need for this more and more to ensure that computer science reflects society, gives opportunities equally and reduces the risk of bias in programming. In my past roles in technological leadership and in teaching at university, I’ve definitely tried to utilise my love for academia to help inspire and prepare the next generation of technological stars. 

Thanks for the lovely chat, Fiona, we really enjoyed hearing about your Datactics journey thus far! 

With a PhD in Artificial Intelligence in Bioinformatics from Ulster University, Fiona has over 15 years of research and industrial experience in AI, data analytics, and software development, including specialisms in network link analysis and explainable AI.

Click here for more by Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

The post “I thoroughly enjoy working in an exciting space with cutting edge technology” – Meet Dr. Fiona Browne, part of the “Our Team” series appeared first on Datactics.

]]>
“I enjoy working on AI because there is a lot of advancements to be made, it’s not quite terminator level yet!” – Meet Matt, part of the “Our team” series https://www.datactics.com/blog/marketing-insights/meet-matt-neil/ Fri, 14 Aug 2020 09:34:00 +0000 https://www.datactics.com/?p=11295 Matt Neill is working within the AI team and is now on Week 10 of his internship. He has been actively using his Data Science education background to put his stamp on the AI team. We thought we would grab him for a chat to unpack what he has been working on; how he got involved […]

The post “I enjoy working on AI because there is a lot of advancements to be made, it’s not quite terminator level yet!” – Meet Matt, part of the “Our team” series appeared first on Datactics.

]]>
Matt Neill is working within the AI team and is now on Week 10 of his internship. He has been actively using his Data Science education background to put his stamp on the AI team.

We thought we would grab him for a chat to unpack what he has been working on; how he got involved with the company; what he has enjoyed, how he has balanced working from home, why he decided to undertake a placement and to uncover what his plans are for the future.   

Hi Matt, it’s nice to finally chat with you! We are excited to hear all about what you’ve been up to in the AI Team. Could you fill us in, what brought you to Datactics?  

I have just finished my second year of Data Science at the University of Nottingham. I decided that I would take the initiative and reach out to Datactics and ask if they had any opportunities going. They returned the email and said that they had. I met up with Stuart Harvey, CEO, before Christmas, then met with Dr. Browne, Head of AI, after Christmas to find out what I would be doing.   

You are in the AI team, is that what you want to get into? Can you give us an insight into how you landed in that particular team?  

I didn’t do any computer science at school, I started coding a couple of years ago when I started University. The computer science part of my course is more focused on AI, so I have had a few years of experience doing that. It felt natural to go into the AI team as I had a particular interest in this field.  

Since joining the team, what have you been working on?  

Since joining the team, we have been collaborating on the new ML-augmented data quality project, focusing on the current state of the art in the area. The first two to three weeks were spent researching the area and understanding current and new techniques. More recently we have been implementing the research we conducted, focusing our approach on ensuring data quality within datasets using different techniques. My day-to-day role revolves mostly around coding to implement the research. The research set the scene, and it’s now great to be practically contributing to putting things in place.   

Why did you choose to study Data Science at University?  

I began by researching the MORSE course (Mathematics, Operational Research, Statistics, and Economics) at Warwick University. I then became more interested in computer science, which led me to Data Science (Mathematics and Computer Science). Beforehand I had never studied Computer Science at school; I studied languages and mathematics at A-Level! I was always interested in Computer Science though, even from GCSE level. Having mathematics in my educational background was a good building block, as it underpins computer science. 

If you had to pick an ideal role for yourself, what would it be?  

What I am doing now. I enjoy having a mix of coding and research in a role. I also enjoy working on AI because there are a lot of advancements to be made; it’s not quite Terminator level yet! Working with Dr. Browne has been brilliant: when I started, she walked me through the different programs, which was useful. If I ever need to ask a question, I can reach Dr. Browne without hesitation. 

What would your tips be for individuals looking for internships?  

Start early! Approach people for opportunities in plenty of time. I would also recommend building up a knowledge of what you’re getting into. AI is much more modern and is often specific to the company. I would recommend researching the companies you are applying to – being a data scientist in one company would be very different from being a data scientist in another.  

We’ve all had to adjust to working from home. Do you have any tips for working from home?  

The most important thing for me is having a good work environment. I like to have a clear workspace: a reasonable space and two monitors! I am working in my bedroom (some people say they don’t like working and sleeping in the same place), but I don’t mind. I think discipline is important in terms of sticking to working hours. I even put my phone behind me during the working day to avoid distractions. I then try to turn off my laptop after 5:30 PM and relax. 

What was your motivation for wanting an internship?  

I didn't need to do a placement, but from a CV perspective I felt it would look good to have some sort of experience. Before Datactics, I had no experience in the workplace. It was also good to confirm that I do enjoy this sort of work, and when I am looking for a job come graduation, I will be looking for a role such as this!  

Matt finishes up his internship in mid-September. It's been great to have him with us, and hearing just how impactful his experience has been is motivating for other interns! We wish him well as he continues his studies in Nottingham!

Click here for more from Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

The post “I enjoy working on AI because there is a lot of advancements to be made, it’s not quite terminator level yet!” – Meet Matt, part of the “Our team” series appeared first on Datactics.

“I was looking around for internships to learn different skills and find out what direction I want to go in with my career” – Meet Mary-Clare, part of the “Our Team” series https://www.datactics.com/blog/marketing-insights/meet-mary-clare/ Thu, 13 Aug 2020 09:20:00 +0000 https://www.datactics.com/?p=11291


Mary Clare is working within the AI Team. She joined us in July and has already been contributing to projects and making her mark within the team.

We thought we would grab her for a chat to find out how she has found working in the team so far; what she has been working on; what she has enjoyed; her background; how she has found working remotely; and any tips she would give to interns seeking that golden internship for their CV.   

Hi Mary Clare, great to (virtually) meet you. Could you tell us a little bit about how you got involved with Datactics?  

While at college, I asked Dr Fiona Browne for help with my Python programming. She noticed the standard of my coding and suggested Datactics' internship programme, as she knew I wanted to build on my experience (and practice, practice, practice).

What is your background? What University course do you attend?  

I am going into my third year studying Chemistry with Molecular Physics at Imperial College London. It is a very tough course, but I wouldn't otherwise have much exposure to data engineering, so this placement has given me great insight into what it is like to work at a data company.  

So, not having directly studied Computer Science or Data, what got you into coding and the glorious world of data?  

We do computer programming (Python) for our various experiments at University, and I enjoy Python. At present I am using and experimenting with FlowDesigner within the team. I am still learning on the go all the time! I did some coding with Arduino kits when I was 16, but I found that very difficult to get into initially. As there is computer programming in our course, I developed as I continued to learn. I particularly enjoy looking at the different applications; I find it interesting.  

What motivated you to look for an internship? Was it to give you a taster of the workplace?  

I was looking around for internships to learn different skills and find out what direction I want to go in with my career. Dr. Browne had been helping me with my coding for my college work, and she suggested considering an internship from there. I have enjoyed the experience so far, being part of an experimental team. When I began, Dr. Browne gave me time to research machine learning and undertake an online course in FlowDesigner, which helped me find my feet initially.  

Have you enjoyed the experience?  

Working within the AI team has helped me enhance my communication skills. It also has enabled me to work with large data sets, focusing on recognising errors. Before joining Datactics I never would’ve thought about data quality and the size of data sets. It has broadened my horizons.  

Did you work on any particular projects that you enjoyed? 

I was able to contribute to the main team project, centred around data quality augmented with machine learning. I got great experience in cleansing data, and I am glad to help in any way I can. The team is very collaborative in its approach, which meant there were endless learning opportunities. Keaton, a Ph.D. student, has recently joined, so we have been working closely with him.  

The last number of months have felt different due to COVID-19 restrictions. How have you found working from home? Do you have any tips you could pass onto team members navigating working from home?  

For me, I like to have a dedicated room specifically for work. This helps separate work from my own free time. I also make sure I have a clear desk and a 'To Do' list prepared. One thing I really enjoy when working from home is the flexibility of being able to start earlier and finish earlier, particularly if I have plans in the evening. Remote working does require a certain level of discipline. I like to have my dedicated space, so I know when I need to focus, and it helps me maintain a greater distinction between work and downtime.   

Do you recommend internships to others?  

A lot of my friends went down the route of research placements; however, with COVID-19, several of them had the opportunities they had lined up cancelled. I have found it invaluable to work within the industry and build on my skills ahead of returning to university. The taster has been perfect for me, giving me real motivation to complete my studies.  

You mention you have built up a lot of skills at Datactics. What is the best skill you have learned? Do you have any areas that you have found challenging?  

I would say working with FlowDesigner, data cleansing, and a focus on data quality. Being able to work closely with the platform has been amazing. I have had my eyes opened to the level of data cleansing and data quality required. In terms of what I have found challenging, I would say keeping track of what I am doing with the data. Tracking is not only helpful for organisation but also helps when you need to find a piece of information. It is very important within the team, and it is an area that I have worked on over my internship.  

Have you enjoyed interacting with your team and beyond?  

The ‘Teams’ calls have been great, and Dr. Browne is always available if a chat is required or help is needed. Additionally, I enjoyed being an intern at the same time as Matt Neill. Being interns together has allowed us to support each other and learn from each other.  

What’s next for you? What are your goals for the future?  

I have a real interest in entrepreneurship, more specifically in creating solutions for environmental problems. I am currently working on a project with a friend from University with this subject at the forefront, so hopefully that could come to fruition. 

My goals would be first to graduate from University, and one day to start my own business. In terms of more short-term goals, I would love to run a half marathon at some stage. I have also entered a competition with the project I have been working on, and the awards ceremony is very soon, so I hope to do well in that!  

We wish Mary Clare all the best with her project and as she recommences her University journey. Thanks for taking the time to chat with us! 

Click here for more by the author, or find us on LinkedIn, Twitter or Facebook for the latest news.

The post “I was looking around for internships to learn different skills and find out what direction I want to go in with my career” – Meet Mary-Clare, part of the “Our Team” series appeared first on Datactics.
