KYC Archives - Datactics https://www.datactics.com/tag/kyc/

All things AML and FinTech Finance: Virtual Arena – weekly round-up https://www.datactics.com/blog/marketing-insights/weekly-round-up-aml-ff-arena/ Fri, 30 Oct 2020 14:00:15 +0000

We started by looking at why data matching is a key part of any AML & KYC process. It’s made more complex by the different standards, languages, and levels of quality in the different data sources on which firms typically rely. It’s expensive too: a recent Refinitiv article states that some firms are spending up to $670m each year on KYC. 
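To make that concrete, here’s a hypothetical sketch of the kind of fuzzy screening involved, using the open-source rapidfuzz library for illustration – the names, threshold and logic are invented, not the Datactics engine itself:

```python
# Illustrative only: fuzzy name screening against a (made-up) sanctions list.
from rapidfuzz import fuzz

sanctions_list = ["ACME Trading Limited", "Ivan Petrovich Sidorov"]

def screen(name: str, threshold: float = 85.0) -> list[str]:
    """Return list entries whose fuzzy similarity to `name` meets the threshold."""
    return [entry for entry in sanctions_list
            if fuzz.token_sort_ratio(name.lower(), entry.lower()) >= threshold]

# Abbreviated legal forms and differing word order can still match:
print(screen("Acme Trading Ltd"))  # -> ['ACME Trading Limited']
```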

As the week went on, we looked at some of the key areas where Datactics makes a real difference in helping firms to reduce manual effort, reduce risk, and bring down the extremely high cost of client onboarding. 

We then looked at the impact of the EU’s fifth AML directive and how firms are able to automate their sanctions screening with the sanctions match engine.  

We also explored how we support efforts to reduce risk and financial crime, looking at the clever tech we’ve used to transliterate between character sets and perform multi-language matching. 

Finishing up, we shared our talk with the EDM Council that explored how AI can make a real difference to the story. Bringing even more predictive capability to human effort means that finding those edge cases doesn’t have to wait until all the obvious ones have been ruled out. We also composed a piece entitled ‘Lifting the lid on the problems that Datactics solves’; if you missed it, you can check it out here.

If you missed any of the pieces we shared this week, feel free to read them on our DataBlog or on our social media platforms.  

In other news this week, our very own Head of AI, Dr Fiona Browne, contributed to the FinTech Finance: Virtual Arena. This session discussed the huge AML fines faced by banks in recent years.

At Datactics we help banks gain quality data – a weapon against fraudsters and money launderers. Fiona shared her experience as Head of AI at Datactics to shed light on how banks can arm themselves to stand up to increasing regulatory and technological complexity. 

Datactics provides the tools to tackle these issues with minimum IT overhead, in a powerful and agile way.  If you missed the session, you can watch it back on LinkedIn by following this link.  

Have a great weekend! Hope you enjoyed this week’s round-up.    

Click here for more by the author, or find us on LinkedIn, Twitter or Facebook for the latest news. You can also read the last round-up here or keep an eye out for our next one! 

IRMAC Reflections with Dr. Fiona Browne https://www.datactics.com/blog/ai-ml/irmac-reflections-with-dr-fiona-browne/ Mon, 07 Sep 2020 09:00:00 +0000

There is a lot of anticipation surrounding Artificial Intelligence (AI) and Machine Learning (ML) in the media. Alongside the anticipation is speculation – including many articles stoking fear by implying that AI and ML will replace our jobs and automate our entire lives!

Dr Fiona Browne, Head of AI at Datactics, recently spoke at an IRMAC (Information Resource Management Association of Canada) webinar alongside Roger Vandomme of Neos to unpack what AI/ML is, some of the preconceptions around it, and the reasons why different approaches to ML are taken…  

What is AI/ML? 

Dr. Browne clarified that whilst there is no official agreed-upon definition of AI, it can be depicted as the ability of a computer to perform cognitive tasks, such as voice/speech recognition, decision making, or visual perception. ML is a subset of AI, entailing different algorithms that learn from input data.  

A point that Roger brought up at IRMAC was that the algorithms learn to identify patterns within the data, and these learned patterns enable the model to distinguish between different outcomes – for example, detecting whether a transaction is fraudulent or not. 

ML takes processes that are repetitive and automates them. At Datactics, we are exploring the usage of AI and ML in our platform capabilities – Dr Fiona Browne 

What are the different approaches to ML?  

Dr. Browne explained that, at a broad level, there are three approaches: supervised, unsupervised, and reinforcement machine learning.  

In supervised ML, the model learns from a labelled training data set. For example, financial transactions would be labelled as either fraudulent or genuine and fed into the ML model. The model then learns from this input and can distinguish between the two.  
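As a minimal illustration (scikit-learn and the data are our own invention, not from the webinar), a supervised model trained on labelled transactions might look like this:

```python
# A minimal supervised-learning sketch: transactions described by
# [amount, payee_known] features, with fraud/genuine labels as the 'answer key'.
from sklearn.linear_model import LogisticRegression

X = [[250.0, 1], [9800.0, 0], [40.0, 1], [12500.0, 0], [60.0, 1], [11000.0, 0]]
y = [0, 1, 0, 1, 0, 1]  # 0 = genuine, 1 = fraudulent

model = LogisticRegression().fit(X, y)   # learn from the labelled examples
print(model.predict([[10500.0, 0]]))     # classify an unseen transaction
```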

Where data is unlabelled, Dr. Browne explained, unsupervised ML is more appropriate. The key difference from supervised ML is that the model seeks to uncover clusters or patterns inherent in the data in order to separate records out.  
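The unsupervised counterpart of the same toy example: strip away the labels and ask a clustering algorithm to find the groups itself.

```python
# Unsupervised sketch: same invented transactions, no labels.
# k-means groups records purely from structure in the data.
from sklearn.cluster import KMeans

X = [[250.0, 1], [9800.0, 0], [40.0, 1], [12500.0, 0], [60.0, 1], [11000.0, 0]]
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters)  # cluster assignments discovered from the data alone
```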

Finally, reinforcement machine learning involves models that continually learn and update from performing a task. For example, a computer algorithm learning how to play the game ‘Go’. This is achieved by the outputs of the model being validated and that validation being provided back to the model.  

The difference between supervised learning and reinforcement learning is that in supervised learning the training data has the answer key with it, meaning the model is trained with the correct answer.

In contrast to this, in reinforcement learning, there is no answer, but the reinforcement agent selects what to do to perform the specific task.

It is important to remember that with no training dataset present, the model is bound to learn from its own experience. Often the biggest trial comes when a model is transferred out of the training environment and into the real world.

Now that AI/ML and the different approaches have been unpacked… the next question is how does explainability fit into this?  The next mini IRMAC reflection will unravel what explainability is and what the different approaches are. Stay tuned! 

Fiona has written an extensive piece on AI-enabled data quality; feel free to check it out here. 

Click here for more by the author, or find us on LinkedIn, Twitter or Facebook for the latest news.

Dataset Labelling For Entity Resolution & Beyond with Dr Fiona Browne https://www.datactics.com/blog/ai-ml/blog-ai-dataset-labeling/ Fri, 05 Jun 2020 10:40:43 +0000

In late 2019 our Head of AI, Dr Fiona Browne, delivered a series of talks to the Enterprise Data Management Council on AI-Enabled Data Quality in the context of AML operations, specifically for resolving differences in dataset labelling for legal entity data.

In this blog post, Fiona goes under the hood to explain some of the techniques that underpin Datactics’ extensible AI Framework.

Across the financial sector, Artificial Intelligence (AI) and Machine Learning (ML) have been applied to a number of areas, from the profiling of behaviour for fraud detection and Anti-Money Laundering (AML) through to the use of natural language processing to enrich data in Know-Your-Customer (KYC) processes.

An important part of the KYC/AML process is entity resolution: identifying and resolving entities from multiple data sources. This is traditionally the space in which high-performance matching engines have been deployed, with associated fuzzy-match capabilities used to account for trivial or significant differences (indeed, this is part of Datactics’ existing self-service platform).

In this arena, Machine Learning (ML) techniques have been applied to address the task of entity resolution using different approaches from graphs and network analysis to probabilistic matching.

Although ML is a sophisticated approach to entity resolution, a limitation of supervised ML in particular is that it requires large volumes of labelled data for the model to learn from.

What is Supervised ML? 

For supervised ML, a classifier is trained using a labelled dataset – a dataset that contains example inputs paired with their correct output label. In the case of entity resolution, this means examples of input matches and non-matches which are correctly labelled. The machine learning algorithm learns from these examples and identifies patterns that link to specific outcomes. The trained classifier then uses this learning to make a prediction on new, unseen cases based on their input values.
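A hypothetical sketch of that idea applied to entity resolution (libraries, fields and data are all invented for illustration): each record pair is reduced to similarity features, and a classifier learns which feature patterns mean ‘match’.

```python
# Entity resolution as supervised classification over record pairs.
from difflib import SequenceMatcher
from sklearn.ensemble import RandomForestClassifier

def sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def features(a: dict, b: dict) -> list[float]:
    return [sim(a["name"], b["name"]), sim(a["city"], b["city"])]

labelled_pairs = [  # 1 = match, 0 = non-match
    ({"name": "Acme Ltd", "city": "Belfast"}, {"name": "ACME Limited", "city": "Belfast"}, 1),
    ({"name": "Acme Ltd", "city": "Belfast"}, {"name": "Zenith plc", "city": "Leeds"}, 0),
]
X = [features(a, b) for a, b, _ in labelled_pairs]
y = [label for _, _, label in labelled_pairs]
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Prediction on an unseen pair:
print(clf.predict([features({"name": "Acme Ltd.", "city": "Belfast"},
                            {"name": "Acme Limited", "city": "Belfast"})]))
```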

Dataset Labelling

As we can see from the above, supervised ML needs high-quality labelled examples for the classifier to learn from, and unlabelled or poorly labelled data only makes this harder. Labelling raw data from scratch can be time-consuming and labour-intensive, especially if experts are required to provide labels for, in this example, entity resolution outputs. The labelling process is repetitive in nature, and consistency is needed to ensure high-quality, correct labels are applied. It is also costly in monetary terms: the people processing the entity data need a deep understanding of entities and ultimate beneficial owners, and failure can result in regulatory sanctions and fines.

Approaches for Dataset Labelling

As AI/ML progresses across all sectors, we have seen the rise of industrial-scale dataset labelling, where companies and individuals can outsource their labelling tasks to annotation tools and labelling services. For example, the Amazon Mechanical Turk service enables the crowdsourcing of data labelling, which can reduce labelling work from months to hours. Machine Learning models can also be harnessed for data annotation using approaches such as weak and semi-supervised learning, along with Human-In-The-Loop (HITL) learning. HITL improves ML models by incorporating human feedback at stages such as training, testing and evaluation.

ML approaches for Budgeted Learning

We can think of budgeted learning as a balancing act between the expense (in terms of cost, effort and time) of acquiring training data and the predictive performance of the model that you are building. For example, can we label a few hundred examples instead of hundreds of thousands? There are a number of ML approaches that can help with this question and reduce the burden of manually labelling large volumes of training data. These include transfer learning, where you reuse previously gained knowledge – for instance, leveraging existing labelled data from a related sector or similar task. The recent open-source system Snorkel uses a form of weak supervision to label datasets via programmable labelling functions, as sketched below.
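To give a flavour of programmable labelling functions, here is a toy sketch only – Snorkel’s actual API differs, and its label model combines votes far more cleverly than the simple majority vote used here:

```python
# Several noisy labelling heuristics vote on each record pair.
MATCH, NON_MATCH, ABSTAIN = 1, 0, -1

def lf_exact_name(pair):            # high precision, low coverage
    return MATCH if pair["name_a"].lower() == pair["name_b"].lower() else ABSTAIN

def lf_country_mismatch(pair):      # strong negative signal
    return NON_MATCH if pair["country_a"] != pair["country_b"] else ABSTAIN

def weak_label(pair, lfs):
    votes = [v for v in (lf(pair) for lf in lfs) if v != ABSTAIN]
    return max(set(votes), key=votes.count) if votes else ABSTAIN

pair = {"name_a": "Acme Ltd", "name_b": "ACME LTD", "country_a": "GB", "country_b": "GB"}
print(weak_label(pair, [lf_exact_name, lf_country_mismatch]))  # -> 1 (MATCH)
```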

Active learning is a semi-supervised ML approach which can be used to reduce the burden of manually labelling datasets. The ‘active learner’ proactively selects the training data it needs to learn from, based on the idea that an ML model can achieve good predictive performance from fewer training instances by prioritising the examples it learns from. During training, the active learner poses queries – typically a selection of unlabelled instances from the dataset – and these ML-selected instances are then presented to an expert to label manually.
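A minimal uncertainty-sampling sketch (scikit-learn assumed, data invented) shows the core loop: train, score the unlabelled pool, and send the most ambiguous items to the expert.

```python
# Uncertainty sampling: query the pool items the model is least sure about.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_labelled = np.array([[0.1], [0.2], [0.9], [0.95]])
y_labelled = np.array([0, 0, 1, 1])
X_pool = np.array([[0.15], [0.48], [0.52], [0.9]])   # unlabelled candidates

model = LogisticRegression().fit(X_labelled, y_labelled)
confidence = np.abs(model.predict_proba(X_pool)[:, 1] - 0.5)  # 0 = maximally unsure
query_idx = np.argsort(confidence)[:2]               # two most ambiguous items
print(X_pool[query_idx].ravel())                     # send these to the expert
```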

As seen above, the approaches to tackling dataset labelling are wide and varied. Which one to select depends on a number of factors, from the prediction task through to expense and budgeted learning. The connecting tenet is ensuring high-quality labelled datasets for classifiers to learn from.

Click here for more from Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

Self-Service Data Quality for DataOps https://www.datactics.com/blog/ceo-vision/ceo-vision-self-service-data-quality-for-dataops/ Tue, 05 May 2020 11:12:48 +0000

At the recent A-Team Data Management Summit Virtual, Datactics CEO Stuart Harvey delivered a keynote on “Self-Service Data Quality for DataOps – Why it’s the next big thing in financial services.” The keynote (available here) can be read below, with slides from the keynote included for reference. Should you wish to discuss the subject with us, please don’t hesitate to contact Stuart, or Kieran Seaward, Head of Sales.  

I started work in banking in the 90’s as a programmer, developing real-time software systems written in C++. In these good old days, I’d be given a specification, I’d write some code, test and document it. After a few weeks it would be deployed on the trading floor. If my software broke or the requirements changed it would come back to me and I’d start this process all over again. This ‘waterfall’ approach was slow and, if I’m honest, apart from the professional pride of not wanting to create buggy code, I didn’t feel a lot of ownership for what I’d created. 

In the last five years a new methodology in software engineering has changed all that. It’s called DevOps, and it brings a strategic, agile approach to building new software.

More recently DevOps had a baby sister called DataOps, and it’s this subject that I’d like to talk about today.

Many Chief Data Officers (CDO) and analysts have been impressed by the increased productivity and agility their Chief Technology Officer (CTO) colleagues are seeing through the use of DevOps. Now they’d like to get in on the act. In the last few months at Datactics we’ve been talking a lot to CDO clients about their desire to have a more agile approach to data governance and how DataOps fits into this picture.  

In these conversations we’ve talked a great deal about the ownership of data. A key question is how to associate the measurement and fixing of a piece of broken data with the person most closely responsible for it. In our experience the owner of a piece of data usually makes the best data steward: these are the people who can positively affect business outcomes through accurate measuring and monitoring of data, a responsibility that typically sits with the CDO. 

We have seen a strong desire to push data science processes, including data governance and the measurement of actual data quality (at a record level) into the processes and automation that exist in a bank. 

I’d like to share some simple examples of what we are doing with our investment bank and wealth management clients. I hope this shows that a self-service approach to data quality (with appropriate tooling) can empower highly agile data quality measurement for any company wishing to implement the standard DataOps processes of validation, sorting, aggregation, reporting and reconciliation. 

Roles in DataOps and Data Quality 

We work closely with the people who use the Datactics platform, the people that are responsible for the governance of data and reporting on its quality. They have titles like Chief Data Officer, Data Quality Manager, Chief Digital Officer and Head of Regulation. These data consumers are responsible for large volumes of often messy data relating to entities, counterparties, financial reference data and transactions. This data does not reside in just one place; it transitions through multiple bank processes. It is sometimes “at rest” in a data store and sometimes “in motion” as it passes via Extract, Transform, Load (ETL) processes to other systems that live upstream of the point at which it was sourced.  

For example, a bank might download counterparty information from Companies House to populate its Legal Entity Master. This data is then published out to multiple consuming applications for Know Your Customer (KYC), Anti-Money Laundering (AML) and Life Cycle Management. In these systems the counterparty records are augmented with information such as a Legal Entity Identifier (LEI), a Bank Identifier Code (BIC) or a ticker symbol.  

This ability to empower subject matter experts and business users who are not programmers to measure data at rest and in motion has led to the following trends: 

  • Ownership: Data quality management moves from being the sole responsibility of a potentially remote data steward to all of those who are producing and changing data, encouraging a data driven culture. 
  • Federation: Data quality becomes everyone’s job. Let’s think about end-of-day pricing at a bank. The team that owns the securities master will want to test the accuracy and completeness of data arriving from a vendor. The analyst working upstream, who takes an end-of-day price from the securities master to calculate a volume-weighted average price (VWAP), will have different checks relating to the timeliness of information. Finally, the data scientist upstream of this, who uses the VWAP to create predictive analytics, will want to build their own rules to validate data quality. 
  • Governance: A final trend that we are seeing is the tighter integration with standard governance tools. To be effective, self-service data quality and DataOps require tight integration with the existing systems that hold data dictionaries, metadata, and lineage information.

Here’s an illustration of how we see the Datactics Self-Service Data Quality (SSDQ) Platform integrating with DataOps in a high-impact way that you might want to consider in your own data strategy. 

1. Data Governance Team 

First off, we offer a set of pre-built dashboards for PowerBI, Tableau and Qlik that allow your data stewards to have rapid access to data quality measurements which relate just to them. A user in the London office might be enabled to see data for Europe or, perhaps, just data in their department. Within just a few clicks, a data steward for the Legal Entity Master system could identify all records that are in breach of an accuracy check where an LEI is incorrect, or a timeliness check where the LEI has not been revalidated in the Global LEI Foundation’s (GLEIF) database within 12 months. 
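As a hedged illustration of what those two rules might look like in code (field names and rule logic are ours for the example, not the Datactics rule library):

```python
# An accuracy check on the LEI's ISO 17442 shape, plus a 12-month
# GLEIF-revalidation timeliness check.
import re
from datetime import datetime, timedelta

LEI_SHAPE = re.compile(r"^[A-Z0-9]{18}[0-9]{2}$")   # 20 chars, 2 check digits

def lei_rule_failures(record: dict) -> list[str]:
    failures = []
    if not LEI_SHAPE.match(record.get("lei", "")):
        failures.append("accuracy: LEI is malformed")
    last_validated = datetime.fromisoformat(record["last_gleif_validation"])
    if datetime.now() - last_validated > timedelta(days=365):
        failures.append("timeliness: not revalidated within 12 months")
    return failures

print(lei_rule_failures({"lei": "5493001KJTIIGC8Y1R12",
                         "last_gleif_validation": "2019-01-15"}))
```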


2. Data Quality Clinic: Data Remediation 

Data Quality Clinic extends the management dashboard by allowing a bank to return broken data to its owner for fixing. It effectively quarantines broken records and passes them to the data engineer in a queue, improving data pipelines and overall data governance and data quality. Clinic runs in a web browser and is tightly integrated with information relating to data dictionaries, lineage and third-party sources for validation. Extending our LEI example from just now: I might be the owner of a bunch of entities which have failed an LEI check. Clinic would show me the records in question and highlight the fields in error. It would connect to GLEIF as the source of truth for LEIs and provide me with hints on what to correct. As you’d expect, this process can be enhanced by Machine Learning to automate the entity resolution process under human supervision.  


3. FlowDesigner Studio: Rule creation, documentation, sharing 

FlowDesigner is the rules studio in which the data governance team of super users build, manage, document and source-control rules for the profiling, cleansing and matching of enterprise data. We like to share these rules across our clients so FlowDesigner comes pre-loaded with rules for everything from name and address checking to CUSIP or ISIN validation. 


4. Data Quality Manager: Connecting to data sources; scheduling, automating solutions 

This part of the Datactics platform allows your technology team to connect to data flowing from multiple sources and schedule how rules are applied to data at rest and in motion. It allows for the sharing and re-use of rules across all parts of your business. We have many clients solving big data problems involving hundreds of millions of records using Data Quality Manager across multiple different environments and data sources, on-premise or in public (or more typically private) cloud. 


Summary: Self-Service Data Quality for DataOps 

Thanks for joining me today as I’ve outlined how self-service data quality is a key part of successful DataOps. CDOs need real-time data quality insights to keep up with business needs while technical architects require a platform that doesn’t need a huge programming team to support it. If you have any questions about this topic, or how we’ve approached it, then we’d be glad to talk with you. Please get in touch below. 

Click here for the latest news from Datactics, or find us on LinkedIn, Twitter or Facebook. 

How Datactics helps Santa with his Data Quality issues https://www.datactics.com/blog/cto-vision/how-datactics-helps-santa-with-his-data-quality-issues/ Thu, 19 Dec 2019 16:00:45 +0000

Yes of course Santa has data quality issues! Everyone has data quality issues.

In this article, we outline how Datactics software can help Santa improve the efficiency of his pre-Christmas operations and have a stress-free Christmas Eve delivery and a relaxing Christmas Day.

Data Quality Firewall, REST API, Data Quality Remediation

Datactics provides a REST API interface and “Data Quality Firewall” to allow the import of data from Optical Character Recognition (OCR) software that has scanned the children’s letters, guaranteeing the quality of data entering the data store. Records passing DQ criteria are automatically allowed through to Santa, while records failing DQ checks are quarantined, where they can be reviewed interactively by Santa’s Elves in the Data Quality Clinic.
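A toy sketch of the firewall pattern (the rules and fields here are invented for the example):

```python
# Records passing the checks flow straight through; failures are quarantined.
KNOWN_TOYS = {"barbie house", "barbie horse", "train set"}

def passes_dq(record: dict) -> bool:
    return bool(record.get("child")) and record.get("toy", "").lower() in KNOWN_TOYS

accepted, quarantined = [], []
for record in [{"child": "Sam", "toy": "Train Set"},
               {"child": "Ellie", "toy": "Barbie Hosue"}]:   # OCR typo
    (accepted if passes_dq(record) else quarantined).append(record)

print(f"{len(accepted)} record(s) through to Santa, "
      f"{len(quarantined)} held for the Elves in the Clinic")
```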

Oh dear! Did Ellie ask for a Barbie House or a Barbie Horse? Not to worry – the record is in quarantine and will be reviewed by an Elf who perhaps knows Ellie and can find out what she wanted, and who can check against additional data sources like the latest online toy catalogues to discover what the possible matches might be. This saves the elves significant time, as they only have to review a smaller set of records, making the busiest time of the year far less stressful for all at the North Pole!

SVC – Single View of Child

Managing vast quantities of historical Personally Identifiable Information (PII) on his data servers in Lapland is a difficult task, but Datactics can help create a Single View of Child from the disparate data silos, normalising the data and creating a golden record for each child. This ensures that presents aren’t duplicated and more importantly keeps him compliant with GDPR.
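In miniature, golden-record creation looks something like this (the normalisation and survivorship rules are illustrative only):

```python
# Normalise names, group duplicates, keep the most complete record per child.
from itertools import groupby

records = [
    {"name": "Ellie McBride ", "address": "1 Sleigh Road", "wish": None},
    {"name": "ellie mcbride", "address": "1 Sleigh Road", "wish": "Barbie House"},
]

def norm(r: dict) -> str:          # normalisation: trim and lowercase the name
    return r["name"].strip().lower()

golden = [
    # survivorship rule: prefer the record with the fewest missing fields
    max(dupes, key=lambda r: sum(v is not None for v in r.values()))
    for _, dupes in groupby(sorted(records, key=norm), key=norm)
]
print(golden)  # one record per child, duplicates collapsed
```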

Address Validation

The last thing Santa wants on Christmas Eve, when he’s delivering to a few billion houses, is to go to the wrong address: it wastes time and risks a potential present mix-up. Fortunately, Datactics makes it easy to validate the children’s addresses against databases such as the Postcode Address File (PAF) and Capscan, so Santa knows he’s going to the right place before he sets out.

Screening Against the Naughty List

This is not as simple as it may sound, because you have to get it right or someone is going to be very upset. But using the established techniques Datactics has developed for KYC & AML screening against Politically Exposed Persons and the like, Santa can screen against the Naughty List with confidence.

Excitingly, it’s not only Santa who can screen against this list: everyone can try the naughty list screening for free at https://aml-screening.datactics.com/

Merry Christmas from everyone at Datactics!

Click here for more from Datactics, or find us on LinkedIn, Twitter or Facebook for the latest news.

Datactics demonstrates AI-Enabled Data Quality in EDM webinar https://www.datactics.com/blog/marketing-insights/datactics-demonstrates-ai-enabled-data-quality-in-edm-webinar/ Tue, 26 Nov 2019 12:45:23 +0000

Datactics is pleased to demonstrate AI-Enabled Data Quality in our EDM Webinar, featuring Datactics CTO Alex Brown, Head of AI Dr. Fiona Browne, and CEO Stuart Harvey, covering a practical use case for AI in entity resolution.

Co-hosted by EDM Council and Datactics.

Watch it online here: https://register.gotowebinar.com/recording/8207973451637805825

This webinar provides an overview of AI and the application of AI in the FinTech sector. We highlight how fundamental data quality is in the sector, underpinning key tasks such as AML/KYC and fraud detection. Clean data is critical in order to apply AI/ML solutions.

Banks face a data management challenge in relation to onboarding and KYC. They have very large sets of messy counterparty data, and this data is often subject to duplication. Datactics explores the use of AI to enhance data quality and matching by way of a deep dive into the entity resolution process. This process is currently very manual, and the webinar examines the improvement in match accuracy and reduction in human effort through the intelligent application of machine learning. The webinar discusses the results of a large-scale study using open entity data sources from GLEIF and Refinitiv.

The EDM webinar explores:

  • How AI/ML technologies can improve the consistency and accuracy of data
  • The best approaches to implementing AI in day-to-day processes
  • Methods of calculating ROI and efficiencies when evaluating AI
  • How AI is helping deliver faster and better results in entity onboarding in particular
  • How human-in-the-loop AI can facilitate early adoption with full traceability

For more information, to discuss how the Datactics AI Engine can help your business, or to set up a demo please contact Kieran Seaward at Kieran.Seaward@Datactics.com or call directly on 02890 233 900.

Click here for the latest news from Datactics, or find us on LinkedIn, Twitter or Facebook.

Transliteration matching in Japanese, Chinese, Russian, Arabic and all non-Latin data sets https://www.datactics.com/blog/cto-vision/transliteration-matching/ Fri, 25 Oct 2019 15:15:53 +0000

Here at Datactics, we’ve recently done a number of transliteration matching tasks, helping people with Japanese, Russian Cyrillic and Arabic data sets.

Transliteration matching can seem challenging, especially when presented with text that you don’t understand, but with the right techniques a lot can be achieved – the key is to really understand the problem and have some proven techniques for dealing with it:

  • Matching data within a single character set

We have a long-standing Chinese customer who routinely matches data sets of hundreds of millions of customer records, all in Chinese. Even with messy data this is a relatively straightforward task, as long as your matching algorithms can handle Unicode properly: fuzzy matching within a single character set (e.g. a Chinese customer database to a Chinese marketing database) is very similar to the same task in a Roman character set, albeit with some tweaks to fuzzy match tolerances.

  • Frequency Analysis

Another very useful technique is to perform frequency analysis on the input text to help identify ‘noise text’, such as company legal forms within company names, that can either be eliminated from the match or matched with lower importance than the rest of a company name. For example, frequency analysis on a Japanese entity master database may reveal a large number of company names containing the Kanji “株式会社” – the Japanese equivalent of ‘Limited’ (or ‘Ltd.’ in abbreviated form). The beauty of this technique is that it can be applied to any language or character set.
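A quick sketch of the idea (names invented; real tokenisation would be more careful):

```python
# High-frequency tokens in an entity-name column are usually legal forms
# ('Ltd', '株式会社', ...) — candidates for down-weighting in the match.
from collections import Counter

latin_names = ["Acme Trading Ltd", "Zenith Holdings Ltd", "Foyle Fisheries Limited"]
print(Counter(t for n in latin_names for t in n.lower().split()).most_common(2))

# CJK names have no word boundaries, so count character n-grams instead:
cjk_names = ["トヨタ自動車株式会社", "ソニー株式会社"]
ngrams = Counter(n[i:i + 4] for n in cjk_names for i in range(len(n) - 3))
print(ngrams.most_common(1))  # '株式会社' emerges as the common suffix
```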

  • Matching between character sets using Transliteration, fuzzy and phonetic matching

A common requirement in the AML/KYC space is matching account names in Chinese, Japanese, Cyrillic and so on to sanctions and PEP lists, which are usually published in Latin script. In order to do this, a process called ‘transliteration’ is required. Transliteration converts text in one character set to another, but the results from raw transliteration are not always usable, since the transliterated text is often more of a ‘pronunciation guide’ than how a native speaker would write the text in Latin script. However, by using a combination of fuzzy and phonetic matching on the transliterated string, it is possible to obtain very accurate matching.
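As a rough illustration of the two-step approach – transliterate, then fuzzy-match – using the open-source unidecode and rapidfuzz libraries (an illustration, not the Datactics implementation):

```python
# Transliterate to Latin, then let the fuzzy score absorb the
# 'pronunciation guide' spelling drift.
from unidecode import unidecode
from rapidfuzz import fuzz

account_name = "Михаил Ходорковский"           # Cyrillic account holder
list_entry = "Mikhail Khodorkovsky"             # Latin-script sanctions entry

latin = unidecode(account_name)                 # roughly 'Mikhail Khodorkovskii'
print(latin, fuzz.ratio(latin.lower(), list_entry.lower()))  # high score despite drift
```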

If you’d like to try this out for yourself, Cyrillic transliteration is built into our free AML screening demonstration app. You can register and find out more here or click below to see it in action:

Who are you really doing business with? https://www.datactics.com/blog/good-data-culture/who-are-you-really-doing-business-with/ Fri, 04 Jan 2019 15:51:45 +0000

Entity Match Engine for Customer Onboarding. Challenges and facts from EU banking implementations. 

Customer onboarding is one of the most critical entry points of new data for a bank’s counterparty information. References and interactions with internal systems, externally sourced information, KYC processes, regulatory reporting and risk aggregation are all impacted by the quality of information that is fed into the organisation from the onboarding stage.

In times of automation and development of FinTech applications, the onboarding process remains largely manual. Organisations typically aim at minimising errors by means of human validation, often tasking an off-shore team to manually check and sign-off on the information for the new candidate counterparty. Professionals with experience in data management can relate to this equation: “manual data entry = mistakes”.

There are a variety of things that can go wrong when trying to add a new counterparty to an internal system. The first step is typically to ensure that the counterparty is not already present: onboarding officers rely on a mix of name & address information, vendor codes and open industry codes (e.g. the Legal Entity Identifier) to verify this. However, inaccurate search criteria, outdated or missing information in the internal systems, and the lack of advanced search tools create the potential for problems in the process – an existing counterparty can easily get duplicated when it should have been updated.
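A ‘search before create’ check, sketched in miniature (the threshold and matching logic are illustrative, and far simpler than a production match engine):

```python
# Fuzzy-compare the candidate against the existing counterparty master
# and only create a new record when no close match exists.
from difflib import SequenceMatcher

master = [{"id": 17, "name": "Acme Trading Ltd", "lei": "5493001KJTIIGC8Y1R12"}]

def find_existing(candidate: str, threshold: float = 0.85):
    scored = [(SequenceMatcher(None, candidate.lower(), row["name"].lower()).ratio(), row)
              for row in master]
    score, row = max(scored, key=lambda s: s[0])
    return row if score >= threshold else None

hit = find_existing("ACME Trading Limited")
print("update existing record" if hit else "create new counterparty", hit)
```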

Datactics’ Entity Match Engine provides onboarding officers with the tools to avoid this scenario, for both legal entity and individual data. With advanced fuzzy logic and clustering of data from multiple internal and external sources, Match Engine avoids the build-up of duplication caused by mistakes, mismatches or constraints of existing search technology in the onboarding process.

Another common issue caused by manual onboarding processes is the lack of standardisation in the entry data. This creates problems downstream, reducing the value that the data can bring to core banking activities, decision making and the capacity to aggregate data for regulatory reporting in a cost-effective way.

Entity Match Engine has pre-built connectivity into the most comprehensive open and proprietary sources of counterparty information, such as Bloomberg, Thomson Reuters, GLEIF, Open Corporates, Companies House, etc. These sources are pre-consolidated by the engine and are used to provide the onboarding officer with a standardised suggestion of what the counterparty information should look like, complete with the most up-to-date industry and vendor identifiers.

“Measure and ensure data quality at source” is a best practice and increasingly a data management mantra. The use of additional technology in the onboarding phase is precisely intended as a control mechanism for one of the most error-prone sources of information for financial institutions.

Luca Rovesti is Presales R&D Manager at Datactics

Ok, data: it’s couch to 5K time https://www.datactics.com/blog/marketing-insights/ok-data-its-couch-to-5k-time/ Wed, 24 Jan 2018 09:40:19 +0000

As promised in the first blog on the topic, we’re taking a look at achievable New Year’s resolutions that are within reach, whether you’re a Chief Data Officer overseeing broad strategy or a subject matter expert using data day-to-day.

This part takes “I need to exercise more” and turns it into “I need my data ready to deliver business improvements by the summer.”

Summer’s always used as a target until we get there and find we’re still sitting on the couch watching The Crown and the biscuits are either all gone or very soon will be.

In banking, there’s nothing more refreshing than finding out that someone else is responsible for something, and it’s the same when trying to coax yourself into new health habits. Making someone else responsible for your fitness is one way of saying “speak to a personal trainer”; good personal trainers (PTs) are usually an excellent way of understanding what needs to be targeted and how to achieve it because they possess the experience of having done it all before, even with certified biscuit addicts.

Typically you might weigh up a PT based on reputation or skillset, or whatever’s available through a company plan. Taking time to get the right one is time well spent; likewise, a bank’s data needs specific qualities to ensure it can react to looming regulations (GDPR compliance, anyone?) and pressures from teams demanding faster customer acquisition and better customer retention.

Whilst it’s far broader than data quality, it’s fair to say that GDPR compliance will be helped significantly by a robust understanding of the bank’s Single Customer View. This relies on data in core systems, so running health-checks aligned to the Enterprise Data Management Council’s DCAM standard – for completeness, accuracy, timeliness and so forth – will provide deltas and data requiring remediation to meet quality levels.
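As a hedged sketch of what such a health-check might look like (pandas assumed; the columns and rules are invented):

```python
# Score one customer extract for completeness, accuracy and timeliness.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "postcode": ["BT1 5GS", None, "??"],
    "last_updated": pd.to_datetime(["2017-12-01", "2016-03-10", "2018-01-05"]),
})

completeness = df["postcode"].notna().mean()                              # any value at all
accuracy = df["postcode"].str.match(r"^[A-Z]{1,2}\d", na=False).mean()    # crude UK shape check
timeliness = (df["last_updated"] > "2017-01-01").mean()                   # updated recently
print(f"completeness {completeness:.0%}, accuracy {accuracy:.0%}, timeliness {timeliness:.0%}")
```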

These might already be in place at an enterprise level, but what if the evidence of your eyes – mailing files with outdated or incomplete addresses, or mismatched customer information – proves that not all cases are being caught? This is where a targeted, tactical strike can pay dividends: just as you invest in a personal trainer to tell you what needs to be done and when, so a niche set of insights into a specific dataset can yield specific actions to be taken. At its best it can augment any existing plan – for example, you might play sport once a week already, so add in a few days at the gym and there’s no need to displace your existing activity.

So it is with data quality solutions: sometimes it can prove beneficial to find something that will give a fast sense-check into the evidence of your own eyes, to enable better downstream decision-making. Ultimately, it’s the bank’s customers who will benefit, and this approach to doing what’s within reach will deliver tangible and timely improvements when it comes to compliance with bigger regulations.

Next time, we’ll take a look at Resolution Two on our list, and it’s all about losing weight; but remember, this is about data, so…now, where did I put those biscuits?
