AML Archives - Datactics
https://www.datactics.com/tag/aml/

Two new apprentices have joined our team and we are gearing up for FinTech Festival! – Datactics weekly round-up
https://www.datactics.com/blog/marketing-insights/two-new-apprentices-have-joined-our-team-and-we-are-gearing-up-for-fintech-festival-datactics-weekly-round-up/ – Fri, 06 Nov 2020


Welcoming our two new apprentices

We kicked off the week by announcing the exciting news that two new apprentices from the Belfast Met have joined our DevOps team. The firm has now grown by 25% since March 2020, and by approaching 130% over the past two years, in response to rapidly growing customer demand.

The new recruits, Natalia Walsh and Victoria Wallace, will balance their four days a week at Datactics with one day spent studying for the Level 3 “Networking Infrastructure with Cyber Security” course at Belfast Met.

If you want to read more about the new apprentices joining, check out our press release here. 

We want to take the opportunity again to welcome them both to our team! 


Awards and Events at Datactics! 

As the week continued, we highlighted the various awards we have received, such as the Women in Diversity Award and the Investors in People accreditation. These awards reflect our vision to further embed diversity within the team and to invest in our people in a diverse and inclusive manner.

We also released three look-back blogs which reflected on three recent notable speaking engagements for the company.  

The first of these was Fiona Browne’s contribution to FinTech Finance. The interview covered the scale of the Anti-Money Laundering (AML) fines faced by banks over recent years and began to unpack what we do at Datactics in relation to this topic.

In the blog we looked in detail at the following key questions that were put to Fiona in the FinTech Finance session:

  • How can banks arm themselves against increasing regulatory and technological complexity?
  • Where does Datactics fit in to the AML arena?
  • Why should banks look to partner, rather than build in house?


Data will power the next phase of our economy

We then reflected on Kieran Seaward’s DMS Virtual Keynote, ‘A Data Driven Restart’, unpacking the key themes and questions regarding the challenges presented by COVID-19, including a wide range of changes to the way business can be conducted.  

At Datactics we have been really encouraged that engagement with the market is still strong; since March, and the start of many lockdowns, we’ve conducted many hundreds of calls and meetings with clients and prospects to discuss their data management and business plans. The blog is based on our key findings from these calls and reflects the priorities many data-driven firms have. 

The key questions addressed in the blog are as follows:

  • What is the importance of a foundation of good data quality?
  • What comes first: Data Governance or Data Quality?
  • Is it necessary to get data quality under control?


We then rounded off the week by looking back at Stuart Harvey’s contribution to the Belfast International Homecoming 2020. The panel was chaired by Jayne Brady, Digital Innovation Commissioner, and aimed to discuss the question: Can Belfast’s Technology Companies Lead an Inclusive Recovery?

The key themes that were delved into by Stuart and the panellists included: 

  • The vast changes in the way organisations are approaching work 
  • Diversity being key to the development of technology
  • The vital importance of not simply the ‘right’ education, but education itself


FinTech Festival is coming soon! 

In other news, Matt Flenley and Jordan Wray are looking forward to the upcoming Singapore FinTech Festival as guests of Invest NI. This week there was a late-night networking event on Wednesday which made for some great virtual chats and introductions. It has got us ready for the FinTech Festival which is coming up soon! 

Have a great weekend! Hope you enjoyed this week’s round-up 

Click here for more by the author, or find us on LinkedIn, Twitter, or Facebook for the latest news. You can also read the last round up here or keep an eye out for our next one!

All things AML and FinTech Finance: Virtual Arena – weekly round-up
https://www.datactics.com/blog/marketing-insights/weekly-round-up-aml-ff-arena/ – Fri, 30 Oct 2020

We started by looking at why data matching is a key part of any AML & KYC process. It’s made more complex by the different standards, languages, and levels of quality in the different data sources on which firms typically rely. It’s expensive too: a recent Refinitiv article states that some firms are spending up to $670m each year on KYC.

As the week went on, we looked at some of the key areas where Datactics makes a real difference in helping firms to reduce manual effort, reduce risk, and bring down the extremely high cost of client onboarding. 

We then looked at the impact of the EU’s fifth AML directive and how firms are able to automate their sanctions screening with the sanctions match engine.  

We also explored how we support efforts to reduce risk and financial crime, looking at the clever tech we’ve used to transliterate between character sets and perform multi-language matching.

Finishing up, we shared our talk with the EDM Council that explored how AI can make a real difference to the story. Bringing even more predictive capabilities to human effort means that finding those edge cases doesn’t have to wait until all the obvious ones have been ruled out. We also composed a piece entitled ‘Lifting the lid on the problems that Datactics solves’; if you missed it, you can check it out here.


If you missed any of the pieces we shared this week, feel free to read them on our DataBlog or on our social media platforms.  

In other news this week, our very own Head of AI, Dr Fiona Browne, contributed to the FinTech Finance: Virtual Arena. This session discussed the huge AML fines faced by banks over recent years.


At Datactics we help banks achieve quality data – a tool equipped to fight fraudsters and money launderers. Fiona shared her experience as Head of AI at Datactics to shed light on how banks can arm themselves sufficiently to stand up to increasing regulatory and technological complexity.

Datactics provides the tools to tackle these issues with minimum IT overhead, in a powerful and agile way.  If you missed the session, you can watch it back on LinkedIn by following this link.  

Have a great weekend! Hope you enjoyed this week’s round-up.    

Click here for more by the author, or find us on LinkedIn, Twitter or Facebook for the latest news. You can also read the last round up here or keep an eye out for our next one!

IRMAC Reflections with Dr. Fiona Browne
https://www.datactics.com/blog/ai-ml/irmac-reflections-with-dr-fiona-browne/ – Mon, 07 Sep 2020

There is a lot of anticipation surrounding Artificial Intelligence (AI) and Machine Learning (ML) in the media. Alongside the anticipation is speculation – including many articles placing fear into people by implying that AI and ML will replace our jobs and automate our entire lives!

Dr Fiona Browne, Head of AI at Datactics recently spoke at an IRMAC (Information Resource Management Association of Canada) webinar, alongside Roger Vandomme, of Neos, to unpack what AI/ML is, some of the preconceptions, and the reasons why different approaches to ML are taken…  


What is AI/ ML? 

Dr. Browne clarified that whilst there is no officially agreed-upon definition of AI, it can be described as the ability of a computer to perform cognitive tasks, such as voice/speech recognition, decision making, or visual perception. ML is a subset of AI, entailing different algorithms that learn from input data.

A point that Roger brought up at IRMAC was that the algorithms learn to identify patterns within the data, and these patterns enable them to distinguish between different outcomes – for example, the detection of a fraudulent or non-fraudulent transaction.

ML takes processes that are repetitive and automates them. At Datactics, we are exploring the usage of AI and ML in our platform capabilities – Dr Fiona Browne 

What are the different approaches to ML?  

Dr. Browne explained that, at a broad level, there are three approaches: supervised, unsupervised, and reinforcement machine learning.

In supervised ML, the model learns from a labelled training data set. For example, financial transactions would be labelled as either fraudulent or genuine and fed into the ML model. The model then learns from this input and can distinguish the difference.

Where data is unlabelled, Dr. Browne explained, unsupervised ML is more appropriate. The key difference from supervised ML is that the model seeks to uncover clusters or patterns inherent in the data to enable it to separate them out.
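To make the distinction concrete, here is a minimal sketch in Python contrasting a supervised classifier trained on labelled examples with an unsupervised clustering of the same records. The toy transaction features, labels and use of scikit-learn are our own illustration rather than anything shown in the IRMAC session.

```python
# A toy contrast between supervised and unsupervised learning.
# Features are [amount, hour_of_day]; labels (1 = fraudulent) are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[12.0, 10], [8.5, 14], [950.0, 3], [15.0, 11], [1200.0, 2], [9.0, 16]])
y = np.array([0, 0, 1, 0, 1, 0])

# Supervised: learn from labelled examples, then predict a label for a new transaction.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([[1100.0, 4]]))          # e.g. [1] -> flagged as fraudulent

# Unsupervised: no labels are given; the model only groups similar records together.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(clusters)                            # cluster ids, with no notion of "fraud" attached
```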

Finally, reinforcement machine learning involves models that continually learn and update from performing a task. For example, a computer algorithm learning how to play the game ‘Go’. This is achieved by the outputs of the model being validated and that validation being provided back to the model.  

The difference between supervised learning and reinforcement learning is that in supervised learning the training data has the answer key with it, meaning the model is trained with the correct answer.

In contrast to this, in reinforcement learning, there is no answer, but the reinforcement agent selects what to do to perform the specific task.

It is important to remember that when no training dataset is present, the agent must learn from its own experience. Often the biggest trial comes when a model is transferred out of the training environment and into the real world.

Now that AI/ML and the different approaches have been unpacked… the next question is how does explainability fit into this?  The next mini IRMAC reflection will unravel what explainability is and what the different approaches are. Stay tuned! 

Fiona has written an extensive piece on AI-enabled data quality; feel free to check it out here.

Click here for more by the author, or find us on LinkedIn, Twitter or Facebook for the latest news.

IRMAC Detective Data Work: AML and Emergent AI practices | 12/07/20
https://www.datactics.com/events/irmac-webinar-aml-ai/ – Wed, 01 Jul 2020

Earlier this month, our Head of AI, Dr. Fiona Browne, took part in the IRMAC webinar ‘Detective Data Work’, exploring AML and emergent AI practices.

Missed it? Watch the recording below:

In this webinar, the expert panellists examined what anti-money laundering (AML) efforts look like, and the complexities of sifting through vast data volumes, data quality and identification in an effort to make their findings ‘explainable’.

Reducing the money flow in criminal activities had a major boost after the events of 9/11/2001.

Now Artificial Intelligence (AI) and Machine Learning (ML) techniques are beginning to revolutionize practices in this field. – IRMAC


About Fiona:

Fiona Browne is Head of Artificial Intelligence at Datactics with over 15 years’ research and industrial experience. Prior to joining Datactics, Fiona lectured in Computing Science at Ulster University teaching Data Analytics and undertaking research on applied artificial intelligence and data integration. She was a Research Fellow at Queen’s University Belfast and a Senior Software Developer at PathXL. Fiona received a BSc (Hons.) degree in Computing Science and a PhD on Artificial Intelligence in Bioinformatics from Ulster University.

About IRMAC:

The Information Resource Management Association of Canada is a non-profit, vendor-independent association of information management and business professionals.

Our primary objective is to provide a forum for members to exchange information and experiences, and to promote the understanding, development and practice of managing information and data as a key enterprise asset.

Dataset Labelling For Entity Resolution & Beyond with Dr Fiona Browne
https://www.datactics.com/blog/ai-ml/blog-ai-dataset-labeling/ – Fri, 05 Jun 2020

In late 2019 our Head of AI, Dr Fiona Browne, delivered a series of talks to the Enterprise Data Management Council on AI-Enabled Data Quality in the context of AML operations, specifically for resolving differences in dataset labelling for legal entity data.

In this blog post, Fiona goes under the hood to explain some of the techniques that underpin Datactics’ extensible AI Framework.

Across the financial sector, Artificial Intelligence (AI) and Machine Learning (ML) have been applied to a number of areas, from the profiling of behaviour for fraud detection and Anti-Money Laundering (AML) through to the use of natural language processing to enrich data in Know-Your-Customer (KYC) processes.

An important part of the KYC/AML process is entity resolution, which is the process of identifying and resolving entities from multiple data sources. This is traditionally the space in which high-performance matching engines have been deployed, with associated fuzzy-match capabilities used to account for trivial or significant differences (indeed, this is part of Datactics’ existing self-service platform).

In this arena, Machine Learning (ML) techniques have been applied to address the task of entity resolution using different approaches from graphs and network analysis to probabilistic matching.

Although ML is a sophisticated approach for democratizing entity resolution, a limitation of applying it is the requirement for large volumes of labelled data for the ML model to learn from when supervised ML is used.

What is Supervised ML? 

For supervised ML, a classifier is trained using a labelled dataset. This is a dataset that contains example inputs paired with their correct output label. In the case of entity resolution, this includes examples of input matches and non-matches which are correctly labelled. The machine learning algorithm learns from these examples and identifies patterns that link to specific outcomes. The trained classifier then uses this learning to make a prediction on new, unseen cases based on their input values.
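As a rough illustration of that training loop, here is a minimal sketch of a supervised match classifier built on simple pairwise similarity features. The toy records, the features and the use of scikit-learn are invented for this post; this is not the Datactics AI Framework itself.

```python
# Toy supervised entity-match classifier: labelled record pairs are turned into
# simple similarity features, and a model learns which feature patterns mean "match".
from difflib import SequenceMatcher
from sklearn.ensemble import RandomForestClassifier

def pair_features(a: dict, b: dict) -> list:
    """Numeric similarity features for a pair of entity records."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_country = 1.0 if a["country"] == b["country"] else 0.0
    return [name_sim, same_country]

# Labelled training pairs: 1 = same entity (match), 0 = different entities.
training_pairs = [
    ({"name": "Acme Holdings Ltd", "country": "GB"}, {"name": "ACME Holdings Limited", "country": "GB"}, 1),
    ({"name": "Acme Holdings Ltd", "country": "GB"}, {"name": "Apex Capital LLC", "country": "US"}, 0),
    ({"name": "Globex GmbH", "country": "DE"}, {"name": "Globex Gmbh", "country": "DE"}, 1),
    ({"name": "Globex GmbH", "country": "DE"}, {"name": "Global Exports SA", "country": "FR"}, 0),
]
X = [pair_features(a, b) for a, b, _ in training_pairs]
y = [label for _, _, label in training_pairs]
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# The trained classifier makes a prediction on an unseen pair.
unseen = ({"name": "ACME Holdings", "country": "GB"}, {"name": "Acme Holdings Ltd.", "country": "GB"})
print(clf.predict([pair_features(*unseen)]))   # e.g. [1] -> predicted match
```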

Dataset Labelling

As we see from above, for supervised ML we need high quality labelled examples for the classifier to learn from. Unlabelled or poorly labelled data will only make it harder for data labelling tools to work. The process of labelling raw data from scratch can be time-consuming and labour intensive, especially if experts are required to provide labels for, in this example, entity resolution outputs. The data labelling process is repetitive in nature, and there is a need for consistency in the labelling process to ensure high quality, correct labels are applied. It is also costly in monetary terms, as those involved in processing the entity data require a high level of understanding of the nature of entities and ultimate beneficial owners, and because failure can result in regulatory sanctions and fines.

Approaches for Dataset Labelling

As AI/ML progresses across all sectors, we have seen the rise of industrial-scale dataset labelling, where companies and individuals are able to outsource their labelling tasks to annotation tools and labelling services. One example is the Amazon Mechanical Turk service, which enables the crowdsourcing of data labelling. This can reduce data labelling work from months to hours. Machine Learning models can also be harnessed for data annotation tasks using approaches such as weak and semi-supervised learning, along with Human-In-The-Loop Learning (HITL). HITL enables the improvement of ML models through the incorporation of human feedback at stages such as training, testing and evaluation.

ML approaches for Budgeted Learning

We can think of budgeted learning as a balancing act between the expense (in terms of cost, effort and time) of acquiring training data against the predictive performance of the model that you are building. For example, can we label a few hundred types of data instead of hundreds of thousands? There are a number of ML approaches that can help with this question and reduce the burden of manually labelling large volumes of training data. These include transfer learning, where you reuse previously gained knowledge. For instance, leveraging existing labelled data from a related sector or similar task. The recent open-source system Snorkel uses a form of weak supervision to label datasets via programmable labelling functions.
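The labelling-function idea behind weak supervision can be sketched in a few lines of plain Python. Note that this is only an illustration of the concept with invented rules and records; Snorkel’s actual API and probabilistic label model are richer than the simple majority vote used here.

```python
# The labelling-function idea in plain Python: several noisy, rule-based "voters"
# assign weak labels to record pairs, and their votes are combined.
MATCH, NON_MATCH, ABSTAIN = 1, 0, -1

def lf_exact_name(a, b):
    return MATCH if a["name"].lower() == b["name"].lower() else ABSTAIN

def lf_different_country(a, b):
    return NON_MATCH if a["country"] != b["country"] else ABSTAIN

def lf_shared_identifier(a, b):
    # A shared registry identifier (hypothetical field) is strong evidence of a match.
    return MATCH if a.get("reg_id") and a.get("reg_id") == b.get("reg_id") else ABSTAIN

LABELLING_FUNCTIONS = [lf_exact_name, lf_different_country, lf_shared_identifier]

def weak_label(a, b):
    """Majority vote over the functions that fire; abstain on no votes or a tie."""
    votes = [v for v in (lf(a, b) for lf in LABELLING_FUNCTIONS) if v != ABSTAIN]
    if not votes or votes.count(MATCH) == votes.count(NON_MATCH):
        return ABSTAIN
    return MATCH if votes.count(MATCH) > votes.count(NON_MATCH) else NON_MATCH

pair = ({"name": "Acme Holdings Ltd", "country": "GB", "reg_id": "NI012345"},
        {"name": "ACME HOLDINGS LTD", "country": "GB", "reg_id": "NI012345"})
print(weak_label(*pair))   # 1 -> weakly labelled as a match, with no human effort
```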

Active learning is a semi-supervised ML approach which can be used to reduce the burden of manually labelling datasets. The ‘active learner’ proactively selects the training dataset it needs to learn from. This is based on the concept that an ML model can achieve good predictive performance with fewer training sample instances by prioritising the examples to learn from. During the training process, an active learner poses queries, which can be a selection of unlabelled instances from a dataset. These ML-selected instances are then presented to an expert to manually label.
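A minimal sketch of pool-based active learning with uncertainty sampling is shown below; the synthetic pool of pairwise features and the “expert” answers are stand-ins for a real labelling workflow, not a specific product implementation.

```python
# Pool-based active learning with uncertainty sampling: train on a tiny labelled
# seed set, then repeatedly ask the "expert" to label only the example the model
# is least sure about. Features and expert answers here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pool = rng.random((200, 2))                    # unlabelled pool of pairwise feature vectors
expert_label = (pool[:, 0] > 0.5).astype(int)  # stands in for a human expert's answers

# Seed with one example of each class so the first model can be trained.
labelled = [int(np.argmax(expert_label == 0)), int(np.argmax(expert_label == 1))]

for round_no in range(5):
    clf = LogisticRegression().fit(pool[labelled], expert_label[labelled])
    probs = clf.predict_proba(pool)[:, 1]
    uncertainty = np.abs(probs - 0.5)          # 0 means the model is completely unsure
    query = next(i for i in np.argsort(uncertainty) if i not in labelled)
    labelled.append(int(query))                # the "expert" supplies expert_label[query]
    print(f"round {round_no}: queried example {query}, {len(labelled)} labels so far")
```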

As seen above, there are wide and varied approaches to tackling the task of dataset labelling. Which approach to select depends on a number of factors, from the prediction task through to expense and budgeted learning. The connecting tenet is ensuring high quality labelled datasets for classifiers to learn from.

Click here for more from Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

Self-Service Data Quality for DataOps
https://www.datactics.com/blog/ceo-vision/ceo-vision-self-service-data-quality-for-dataops/ – Tue, 05 May 2020

At the recent A-Team Data Management Summit Virtual, Datactics CEO Stuart Harvey delivered a keynote on “Self-Service Data Quality for DataOps – Why it’s the next big thing in financial services.” The keynote (available here) can be read below, with slides from the keynote included for reference. Should you wish to discuss the subject with us, please don’t hesitate to contact Stuart, or Kieran Seaward, Head of Sales.

I started work in banking in the 90’s as a programmer, developing real-time software systems written in C++. In these good old days, I’d be given a specification, I’d write some code, test and document it. After a few weeks it would be deployed on the trading floor. If my software broke or the requirements changed it would come back to me and I’d start this process all over again. This ‘waterfall’ approach was slow and, if I’m honest, apart from the professional pride of not wanting to create buggy code, I didn’t feel a lot of ownership for what I’d created. 

In the last five years a new methodology in software engineering has changed all that – it’s called DevOps, and brings a very strategic and agile approach to building new software.

More recently DevOps had a baby sister called DataOps, and it’s this subject that I’d like to talk about today.

Many Chief Data Officers (CDO) and analysts have been impressed by the increased productivity and agility their Chief Technology Officer (CTO) colleagues are seeing through the use of DevOps. Now they’d like to get in on the act. In the last few months at Datactics we’ve been talking a lot to CDO clients about their desire to have a more agile approach to data governance and how DataOps fits into this picture.  

In these conversations we’ve talked a great deal about the ownership of data. A key question is how to associate the measurement and fixing of a piece of broken data with the person most closely responsible for it. In our experience the owner of a piece of data usually makes the best data steward. These are the people who can positively affect business outcomes through accurate measuring and monitoring of data, and empowering them is typically a CDO’s role.

We have seen a strong desire to push data science processes, including data governance and the measurement of actual data quality (at a record level) into the processes and automation that exist in a bank. 

I’d like to share with you, through some simple examples, what we are doing with our investment bank and wealth management clients. I hope that this shows that a self-service approach to data quality (with appropriate tooling) can empower highly agile data quality measurement for any company wishing to implement the standard DataOps processes of validation, sorting, aggregation, reporting and reconciliation.

Roles in DataOps and Data Quality 

We work closely with the people who use the Datactics platform, the people that are responsible for the governance of data and reporting on its quality. They have titles like Chief Data Officer, Data Quality Manager, Chief Digital Officer and Head of Regulation. These data consumers are responsible for large volumes of often messy data relating to entities, counterparties, financial reference data and transactions. This data does not reside in just one place; it transitions through multiple bank processes. It is sometimes “at rest” in a data store and sometimes “in motion” as it passes via Extract, Transform, Load (ETL) processes to other systems that live upstream of the point at which it was sourced.  

For example, a bank might download counterparty information from Companies House to populate its Legal Entity Master. This data is then published out to multiple consuming applications for Know Your Customer (KYC), Anti-Money Laundering (AML) and Life Cycle Management. In these systems the counterparty records are augmented with information such as a Legal Entity Identifier (LEI), a Bank Identifier Code (BIC) or a ticker symbol.  

This ability to empower subject matter experts and business users who are not programmers to measure data at rest and in motion has led to the following trends: 

  • Ownership: Data quality management moves from being the sole responsibility of a potentially remote data steward to all of those who are producing and changing data, encouraging a data driven culture. 
  • Federation: Data quality becomes everyone’s job. Let’s think about end of day pricing at a bank. The team that owns the securities master will want to test the accuracy and completeness of data arriving from a vendor. The analyst working upstream, who takes an end of day price from the securities master to calculate a volume-weighted average price (VWAP), will have different checks relating to the timeliness of information. Finally, the data scientist upstream of this, who uses the VWAP to create predictive analytics, will want to build their own rules to validate data quality.
  • Governance: A final trend that we are seeing is the tighter integration with standard governance tools. To be effective, self-service data quality and DataOps require tight integration with the existing systems that hold data dictionaries, metadata, and lineage information.

Here’s an illustration of how we see the Datactics Self-Service Data Quality (SSDQ) Platform integrating with DataOps in a high-impact way that you might want to consider in your own data strategy.

1. Data Governance Team 

First off, we offer a set of pre-built dashboards for PowerBI, Tableau and Qlik that allow your data stewards to have rapid access to data quality measurements which relate just to them. A user in the London office might be enabled to see data for Europe or, perhaps, just data in their department. Within just a few clicks a data steward for the Legal Entity Master system could identify all records that are in breach of an accuracy check where an LEI is incorrect, or a timeliness check where the LEI has not been revalidated in the Global LEI Foundation’s (GLEIF) database inside 12 months.
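For a flavour of what a simple accuracy rule like that can look like, here is a stand-alone sketch that checks the ISO 17442 LEI format and its ISO 7064 MOD 97-10 check digits. It is a generic illustration, not the Datactics rule engine, and a real timeliness rule would also query GLEIF for the record’s last renewal date.

```python
# Stand-alone sketch of an LEI accuracy rule: 20 alphanumeric characters
# (ISO 17442) whose last two digits are ISO 7064 MOD 97-10 check digits.
import re

def lei_is_valid(lei: str) -> bool:
    lei = lei.strip().upper()
    if not re.fullmatch(r"[A-Z0-9]{18}[0-9]{2}", lei):
        return False
    # Map letters to two-digit numbers (A=10 ... Z=35); the result mod 97 must equal 1.
    as_digits = "".join(str(int(ch, 36)) for ch in lei)
    return int(as_digits) % 97 == 1

print(lei_is_valid("5493001KJTIIGC8Y1R12"))   # True: format and check digits verify
print(lei_is_valid("5493001KJTIIGC8Y1R13"))   # False: check digits do not verify
```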


2. Data Quality Clinic: Data Remediation 

Data Quality Clinic extends the management dashboard by allowing a bank to return broken data to its owner for fixing. It effectively quarantines broken records and passes them to the data engineer in a queue, improving data pipelines and overall data governance & data quality. Clinic runs in a web browser and is tightly integrated with information relating to data dictionaries, lineage and third-party sources for validation. Extending our LEI example just now, I might be the owner of a bunch of entities which have failed an LEI check. Clinic would show me the records in question and highlight the fields in error. It would connect to GLEIF as the source of truth for LEIs and provide me with hints on what to correct. As you’d expect, this process can be enhanced by Machine Learning to automate this entity resolution process under human supervision.


3. FlowDesigner Studio: Rule creation, documentation, sharing 

FlowDesigner is the rules studio in which the data governance team of super users build, manage, document and source-control rules for the profiling, cleansing and matching of enterprise data. We like to share these rules across our clients so FlowDesigner comes pre-loaded with rules for everything from name and address checking to CUSIP or ISIN validation. 
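As an example of the kind of pre-built validation rule described here, the sketch below checks an ISIN’s format and its Luhn check digit. This is a generic, stand-alone illustration rather than a FlowDesigner rule; a CUSIP check would follow the same pattern with a different check-digit scheme.

```python
# Generic sketch of an ISIN validity rule: two-letter country code, nine
# alphanumeric characters and a final check digit verified with the Luhn algorithm
# after letters are converted to numbers (A=10 ... Z=35).
import re

def isin_is_valid(isin: str) -> bool:
    isin = isin.strip().upper()
    if not re.fullmatch(r"[A-Z]{2}[A-Z0-9]{9}[0-9]", isin):
        return False
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for pos, ch in enumerate(reversed(digits)):
        d = int(ch)
        if pos % 2 == 1:                 # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

print(isin_is_valid("US0378331005"))   # True: a well-formed ISIN
print(isin_is_valid("US0378331006"))   # False: check digit does not verify
```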


4. Data Quality Manager: Connecting to data sources; scheduling, automating solutions 

This part of the Datactics platform allows your technology team to connect to data flowing from multiple sources and schedule how rules are applied to data at rest and in motion. It allows for the sharing and re-use of rules across all parts of your business. We have many clients solving big data problems involving hundreds of millions of records using Data Quality Manager across multiple different environments and data sources, on-premise or in public (or more typically private) cloud.


Summary: Self-Service Data Quality for DataOps 

Thanks for joining me today as I’ve outlined how self-service data quality is a key part of successful DataOps. CDOs need real-time data quality insights to keep up with business needs while technical architects require a platform that doesn’t need a huge programming team to support it. If you have any questions about this topic, or how we’ve approached it, then we’d be glad to talk with you. Please get in touch below. 

Click here for the latest news from Datactics, or find us on LinkedIn, Twitter or Facebook.

5AMLD & Data Quality: new regulation, same problems?
https://www.datactics.com/blog/marketing-insights/5amld-data-quality-new-regulation-same-problems/ – Mon, 27 Jan 2020

With the EU’s fifth Anti-Money Laundering directive (5AMLD) having gone live on the 10th January, Matt Flenley took some time with Alex Brown, CTO at Datactics, to find out what implications there are when it comes to data quality.

Firstly, what do you think are the biggest impacts for firms?

Naturally we’re going to focus on where data quality is concerned, and from that perspective the biggest challenge is about ultimate beneficial owners, or UBOs. Being able to stand over the accuracy of information a firm holds on those who have significant control of a company, trust or other legal entity is a massive challenge in itself; you’ve any one of input error, out of date information, or intentional misleading by bad actors – or a combination of all three – that could lead to significant variances between what a firm thinks is accurate, and what the truth really is. It really undermines a firm’s capacity to combat money laundering and comply with all associated regulations.

I read that member states must have beneficial ownership registers. Can’t regulated firms just check their records against what’s held there?

Yes, member states will be expected to have beneficial ownership registers that are publicly searchable, and that they’ll hold adequate, accurate and current information on corporate and other legal entities such as trusts and so on. However, while some countries already have these in place, they can’t be seen as the golden source of accuracy and truth, as the UK Companies House explains here, “The fact that the information has been placed on the public record should not be taken to indicate that Companies House has verified or validated it in any way.” Clearly, there’s a significant imperative for data quality validation and verification in central records, and it won’t be enough just to compare what you have against what the Companies House record says.

What sort of approaches are you seeing firms taking to meet the data quality requirements of 5AMLD, and fight money laundering?

The options usually taken are to outsource, build or buy. Outsourcing due diligence activities to third parties definitely feels like the quickest fix, especially when a new regulation comes along; then it’s just down to managing SLAs between parties, but ultimately there’s a risk that on its own it can be a sticking plaster that doesn’t do anything about the quality of the underlying data held by the firm. Lots of the activities that outsource partners will need to do will be manual lookups of entity information and cross-referencing against multiple sources of data to determine the truth; it can be accurate, but it’s extremely time-consuming and costly as a result.

Building the technology stack is favoured by tech-heavy leaders who have invested significantly in their own IT capabilities. That approach can yield the data quality improvement needed but often the timescales needed to deliver all high-priority infrastructure projects simply won’t align with regulatory demands. Often this leaves teams relying on overtime to complete audit work manually via spreadsheets, and even with the best robotic processes to update data it can lead to a spiralling cost of compliance.

In the “Regtech” era, many providers offer parts of the compliance journey that can be bought off-the-shelf, though in reality this isn’t the normal pathway firms are taking. Whether that’s a cultural thing of simply needing to “get it done” or a reluctance to onboard more solutions, it can mean firms miss out on game-changing capabilities offered by Regtech startups and scaleups.

That’s true. At Fintech Connect I saw a demo of how ING Bank has developed a platform to “orchestrate” together a number of Regtech solution providers to help it with compliance. Do you see this as the way forward for 5AMLD?

It’s certainly one way of approaching it, though clearly ING has invested significantly in this platform. In the meantime, when it comes to getting the data right, we’ve already been asked to help firms resolve entity data duplication in their core systems and in those they have access to, including Companies House. Fuzzy matching is key to resolving these sorts of discrepancies and reducing manual workloads, and was central to the winning entry at last year’s FCA TechSprint. It’s something we’ve been working on in some pretty massive regulated datasets for well over fifteen years, so for us of course it’s good to see the industry being switched on to the possibilities. Elsewhere, the FCA’s focus on preventing “phoenixing” is something that scalable, fuzzy match technology can really help with.

Where can people go to find out more about what Datactics does in this space?

Well, of course we’d be delighted to provide a demo, for which people can simply contact our sales team to set one up.

If you are looking at 5AMLD, then there are a number of areas where we can particularly help:

  • Entity data quality – both measurement and remediation – ensuring your entity data is up to scratch;
  • Matching entities in disparate data silos with AI-powered human-in-the-loop entity resolution (for which we recently hosted a webinar).

We’ve also developed some publicly-available showcases of our software around matching for sanctions screening; it’s not 5AMLD reporting but clearly demonstrates how multiple records for sanctioned individuals can be mistyped, out of date or intentionally obscured – but can still be fuzzy-matched on metadata, with an accompanying confidence score.

Additionally, our LEI Match Engine does a similar job for entities, fuzzy-matching to the Global Legal Entity Identifier Foundation’s list of Legal Entity Identifier information. Both are free to use.

How Datactics helps Santa with his Data Quality issues
https://www.datactics.com/blog/cto-vision/how-datactics-helps-santa-with-his-data-quality-issues/ – Thu, 19 Dec 2019

Yes of course Santa has data quality issues! Everyone has data quality issues.

In this article, we outline how Datactics software can help Santa improve the efficiency of his pre-Christmas operations and have a stress free Christmas Eve delivery and a relaxing Christmas Day.

Data Quality Firewall, REST API, Data Quality Remediation

Datactics provides a REST API interface and “Data Quality Firewall” to allow the import of data from Optical Character Recognition (OCR) software that has scanned the children’s letters, and to guarantee the quality of data entering the data store. Records passing DQ criteria are automatically allowed through to Santa, while records failing DQ checks are quarantined where they can be reviewed interactively by Santa’s Elves in the Data Quality Clinic.
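The pass-or-quarantine pattern behind that firewall can be sketched in a few lines; the record shape and the two checks below are invented for the Santa example, and the real product exposes this behaviour through its REST API and the Data Quality Clinic rather than a helper function like this.

```python
# The pass-or-quarantine pattern behind a data quality firewall, in miniature.
KNOWN_TOYS = {"barbie house", "barbie horse", "train set", "football"}

def dq_failures(record: dict) -> list:
    """Return the names of any failed checks (an empty list means the record passes)."""
    failures = []
    if not record.get("child_name"):
        failures.append("completeness: child_name missing")
    if record.get("toy", "").lower() not in KNOWN_TOYS:
        failures.append("validity: toy not found in the catalogue")
    return failures

def firewall(records: list) -> tuple:
    passed, quarantined = [], []
    for rec in records:
        failures = dq_failures(rec)
        target = passed if not failures else quarantined
        target.append({**rec, "dq_failures": failures})
    return passed, quarantined

ok, for_review = firewall([
    {"child_name": "Ellie", "toy": "Barbie Hoose"},   # OCR slip -> quarantined for an Elf
    {"child_name": "Sam", "toy": "Train Set"},        # passes straight through to Santa
])
print(len(ok), "passed;", len(for_review), "quarantined")
```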

Oh dear! Did Ellie ask for a Barbie House or a Barbie Horse? Not to worry – the Record is in quarantine and will be reviewed by an Elf who perhaps knows Ellie and can find out what she wanted, and can check against additional data sources like the latest online toy catalogues to discover what the possible matches might be. This saves the elves significant time in only having to review a smaller set of records, making the busiest time of the year far less stressful for all at the North Pole!

SVC – Single View of Child

Managing vast quantities of historical Personally Identifiable Information (PII) on his data servers in Lapland is a difficult task, but Datactics can help create a Single View of Child from the disparate data silos, normalising the data and creating a golden record for each child. This ensures that presents aren’t duplicated and more importantly keeps him compliant with GDPR.

Address Validation

The last thing Santa wants on Christmas Eve, when he’s delivering to a few billion houses, is to go to the wrong address; it wastes time and risks a potential present mix-up. Fortunately, Datactics makes it easy to validate the children’s addresses against databases such as the Postcode Address File (PAF) and Capscan, so Santa knows he’s going to the right place before he sets out.

Screening Against the Naughty List

This is not as simple as it may sound, because you have to get it right or someone is going to be very upset. But using the established techniques Datactics has developed for KYC & AML screening against Politically Exposed Persons and the like, Santa can screen against The Naughty List with confidence.

Excitingly, it’s not only Santa who can screen against this list: everyone can try the naughty list screening for free at https://aml-screening.datactics.com/

Merry Christmas from everyone at Datactics!

Click here for more from Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

Datactics demonstrates AI-Enabled Data Quality in EDM webinar
https://www.datactics.com/blog/marketing-insights/datactics-demonstrates-ai-enabled-data-quality-in-edm-webinar/ – Tue, 26 Nov 2019

Datactics is pleased to demonstrate AI-Enabled Data Quality in our EDM Webinar. Featuring Datactics CTO, Alex Brown; Dr. Fiona Browne, Head of AI; and CEO, Stuart Harvey covering a practical use case for AI in entity resolution.

Co-hosted by EDM Council and Datactics.

Watch it online here: https://register.gotowebinar.com/recording/8207973451637805825

This webinar provides an overview of AI and the application of AI in the FinTech sector. We highlight how fundamental data quality is in the sector, underpinning key tasks such as AML/KYC and fraud detection. Clean data is critical in order to apply AI/ML solutions.

Banks face a data management challenge in relation to on-boarding and KYC. They have very large sets of messy counter-party data and this data is often subject to duplication. Datactics explores the use of AI to enhance data quality and matching by way of a deep dive into the entity resolution process. This process is currently very manual and the webinar examines the improvement in match accuracy and reduction in human effort through the intelligent application of machine learning. The webinar discusses the results of a large scale study using open entity data sources from GLEIF and Refinitiv.

The EDM webinar explores:

  • How AI/ML technologies can improve the consistency and accuracy of data
  • The best approaches to implementing AI in day-to-day processes
  • Methods of calculating ROI and efficiencies when evaluating AI
  • How AI is helping deliver faster and better results in entity onboarding in particular
  • How human-in-the-loop AI can facilitate early adoption with full traceability

For more information, to discuss how the Datactics AI Engine can help your business, or to set up a demo please contact Kieran Seaward at Kieran.Seaward@Datactics.com or call directly on 02890 233 900.

Click here for the latest news from Datactics, or find us on Linkedin, Twitter or Facebook

Transliteration matching in Japanese, Chinese, Russian, Arabic and all non-Latin data sets
https://www.datactics.com/blog/cto-vision/transliteration-matching/ – Fri, 25 Oct 2019

Here at Datactics, we’ve recently done a number of Transliteration matching tasks helping people with Japanese, Russian Cyrillic and Arabic data sets.

Transliteration matching can seem challenging, especially when presented with text that you don’t understand, but with the right techniques a lot can be achieved – the key is to really understand the problem and have some proven techniques for dealing with it:

  • Transliteration – Matching data within a single character set

We have a long-standing Chinese customer who routinely matches data sets of hundreds of millions of customer records, all in Chinese. Even with messy data this is a relatively straightforward task: as long as your matching algorithms can handle Unicode properly, fuzzy matching within a single character set (e.g. a Chinese customer database to a Chinese marketing database) is very similar to the same task in a Roman character set, albeit with some tweaks to fuzzy match tolerances.

  • Frequency Analysis

Another very useful technique is to perform frequency analysis on the input text to help identify ‘noise text’, such as company legal forms within company names, that can either be eliminated from the match or matched with lower importance than the rest of a company name. For example, frequency analysis on a Japanese entity master database may reveal a large number of company names containing the Kanji “株式会社” – the Japanese equivalent of ‘Limited’ (or ‘Ltd.’ in abbreviated form). The beauty of this technique is that it can be applied to any language or character set – see the sketch at the end of this list for a simple illustration.

  • Matching between character sets using Transliteration, fuzzy and phonetic matching

A common requirement in the AML/KYC space is matching account names in Chinese, Japanese, Cyrillic and other scripts to sanctions and PEP lists, which are usually published in Latin script. To do this, a process called ‘transliteration’ is required. Transliteration converts text in one character set to another, but the results from raw transliteration are not always usable, since the resulting transliterated text is often more of a ‘pronunciation guide’ than how a native speaker would write the text in Latin script. However, by using a combination of fuzzy and phonetic matching on the transliterated string, it is possible to obtain very accurate matching.
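Here is a short sketch of two of the techniques above: token frequency analysis to spot ‘noise’ legal forms, and fuzzy matching of a Cyrillic account name against a Latin-script list entry after transliteration. It uses the open-source `unidecode` package as a simple stand-in transliterator and invented names throughout; it is not the engine described in this post.

```python
# Two of the techniques above in miniature: (1) token frequency analysis to spot
# "noise" legal forms, and (2) transliterate-then-fuzzy-match for screening.
# `unidecode` is used as a simple stand-in transliterator (pip install unidecode).
from collections import Counter
from difflib import SequenceMatcher
from unidecode import unidecode

# 1. Frequency analysis: tokens that appear in a large share of entity names are
#    usually legal forms ("Limited", "GmbH", "株式会社") and deserve lower match weight.
company_names = ["Acme Trading Limited", "Borealis Shipping Limited",
                 "Acme Holdings Limited", "Zenith Foods GmbH"]
token_counts = Counter(token for name in company_names for token in name.split())
print(token_counts.most_common(3))    # "Limited" dominates -> treat as noise text

# 2. Transliterate a Cyrillic account name, then fuzzy match it against a Latin-script entry.
account_name = "Владимир Иванов"
list_entry = "Vladimir Ivanov"
latinised = unidecode(account_name)   # approximately "Vladimir Ivanov"
score = SequenceMatcher(None, latinised.lower(), list_entry.lower()).ratio()
print(latinised, round(score, 2))     # a high score suggests a potential screening hit
```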

If you’d like to try this out for yourself, Cyrillic transliteration is built into our free AML screening demonstration app. You can register and find out more here or click below to see it in action:

Who are you really doing business with?
https://www.datactics.com/blog/good-data-culture/who-are-you-really-doing-business-with/ – Fri, 04 Jan 2019

Entity Match Engine for Customer Onboarding. Challenges and facts from EU banking implementations. 

Customer onboarding is one of the most critical entry points of new data for a bank’s counterparty information. References and interactions with internal systems, externally sourced information, KYC processes, regulatory reporting and risk aggregation are all impacted by the quality of information that is fed into the organisation from the onboarding stage.

In times of automation and development of FinTech applications, the onboarding process remains largely manual. Organisations typically aim at minimising errors by means of human validation, often tasking an off-shore team to manually check and sign-off on the information for the new candidate counterparty. Professionals with experience in data management can relate to this equation: “manual data entry = mistakes”.

There are a variety of things that can go wrong when trying to add a new counterparty to an internal system. The first step is typically to ensure that a counterparty is not already present: onboarding officers rely on a mix of name & address information, vendor codes and open industry codes (e.g. the Legal Entity Identifier) to verify this. However, inaccurate search criteria, outdated or missing information in the internal systems and the lack of advanced search tools create the potential for problems in the process – an existing counterparty can easily get duplicated, when it should have been updated.

Datactics’ Entity Match Engine provides onboarding officers with the tools to avoid this scenario, on both legal entity and individual data. With advanced fuzzy logic and clustering of data from multiple internal and external sources, Match Engine avoids the build-up of duplication caused by mistakes, mismatches or constraints of existing search technology in the onboarding process.
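To illustrate the kind of check this implies at the point of onboarding, here is a minimal sketch that blocks on a crude normalised name key and fuzzy-scores a candidate against existing counterparties. The names, threshold and scoring are invented for illustration; the Entity Match Engine itself works across many more fields and sources.

```python
# Minimal pre-onboarding duplicate check: normalise the candidate name into a crude
# blocking key (dropping common legal forms), then fuzzy-score it against existing
# counterparties.
import re
from difflib import SequenceMatcher

LEGAL_FORMS = {"ltd", "limited", "plc", "llc", "gmbh", "sa", "inc"}

def name_key(name: str) -> str:
    tokens = [t for t in re.findall(r"[a-z0-9]+", name.lower()) if t not in LEGAL_FORMS]
    return " ".join(tokens)

def possible_duplicates(candidate: str, existing: list, threshold: float = 0.85) -> list:
    key = name_key(candidate)
    hits = []
    for record in existing:
        score = SequenceMatcher(None, key, name_key(record)).ratio()
        if score >= threshold:
            hits.append((record, round(score, 2)))
    return hits

counterparties = ["Acme Holdings Ltd", "Apex Capital LLC", "Globex GmbH"]
print(possible_duplicates("ACME Holdings Limited", counterparties))
# -> [('Acme Holdings Ltd', 1.0)]: warn the onboarding officer before creating a new record
```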

Another common issue caused by manual onboarding processes is the lack of standardisation in the entry data. This creates problems downstream, reducing the value that the data can bring to core banking activities, decision making and the capacity to aggregate data for regulatory reporting in a cost-effective way.

Entity Match Engine has pre-built connectivity into the most comprehensive open and proprietary sources of counterparty information, such as Bloomberg, Thomson Reuters, GLEIF, Open Corporates, Companies House, etc. These sources are pre-consolidated by the engine and are used to provide the onboarding officer with a standardised suggestion of what the counterparty information should look like, complete with the most up-to-date industry and vendor identifiers.

“Measure and ensure data quality at source” is a best practice and increasingly a data management mantra. The use of additional technology in the onboarding phase is precisely intended as a control mechanism for one of the most error-prone sources of information for financial institutions.

Luca Rovesti is Presales R&D Manager at Datactics

Ok, data: it’s couch to 5K time
https://www.datactics.com/blog/marketing-insights/ok-data-its-couch-to-5k-time/ – Wed, 24 Jan 2018

As promised in the first blog on the topic, we’re taking a look at New Year’s resolutions that are within reach, whether you’re a Chief Data Officer overseeing broad strategy or a subject matter expert using data day-to-day – all while regulations (GDPR compliance, anyone?) are looming.

This part takes “I need to exercise more” and turns it into “I need my data ready to deliver business improvements by the summer.”

Summer’s always used as a target until we get there and find we’re still sitting on the couch watching The Crown and the biscuits are either all gone or very soon will be.

In banking, there’s nothing more refreshing than finding out that someone else is responsible for something, and it’s the same when trying to coax yourself into new health habits. Making someone else responsible for your fitness is one way of saying “speak to a personal trainer”; good personal trainers (PTs) are usually an excellent way of understanding what needs to be targeted and how to achieve it because they possess the experience of having done it all before, even with certified biscuit addicts.

Typically you might weigh up a PT based on reputation or skillset, or whatever’s available through a company plan. Taking time to get the right one is time well spent; likewise, a bank’s data needs specific qualities to ensure it can react to looming regulations (GDPR compliance, anyone?) and pressures from teams demanding faster customer acquisition and better customer retention.

Whilst it’s far broader than data quality, it’s fair to say that GDPR compliance will be helped significantly by a robust understanding of the bank’s Single Customer View. This relies on data on core systems, so running health-checks aligned to the Enterprise Data Management Council’s DCAM standard for completeness, accuracy, timeliness and so forth will provide deltas and data requiring remediation to meet quality levels.

These might already be in place at an enterprise level, but what if the evidence of your eyes – mailing files with outdated or incomplete addresses, or mismatched customer information – proves that not all cases are being caught? This is where a targeted, tactical strike can pay dividends: just as you might invest in a personal trainer to tell you what needs to be done and when, so a niche set of insights into a specific dataset can yield specific actions to be taken. At its best it can augment any existing plan – for example, you might play sport once a week already, so add in a few days at the gym and there’s no need to displace your existing activity.

So it is with data quality solutions: sometimes it can prove beneficial to find something that will give a fast sense-check into the evidence of your own eyes, to enable better downstream decision-making. Ultimately, it’s the bank’s customers who will benefit, and this approach to doing what’s within reach will deliver tangible and timely improvements when it comes to compliance with bigger regulations.

Next time, we’ll take a look at Resolution Two on our list, and it’s all about losing weight; but remember, this is about data, so…now, where did I put those biscuits?
