Streamlining compliance: How generative AI revolutionizes KYC processes


Santosh Bhagat, Sandeep Reddy and Muralikrishnan Puthanveedu

Published: July 02, 2025

Know Your Customer (KYC) is a crucial process that financial institutions and other regulated businesses use to verify the identity of their clients.

In today's data-driven business environment, verifiable, trustworthy information is crucial for effective decision-making. With data often considered ‘the new oil’ powering the fifth industrial revolution, trust in that data becomes paramount. As businesses increasingly enable real-time customer transactions, they depend on data-intensive processes for critical functions, including compliance, KYC, anti-money laundering (AML) and supplier risk management.

Reports from firms such as Fenergo, a major global player in risk and compliance technology, highlight the substantial increase in regulatory fines related to AML and KYC non-compliance. These fines are in the billions of dollars globally. For example, Fenergo reports that penalties for failing to comply with AML, KYC, environmental, social and governance (ESG), sanctions and customer due diligence (CDD) regulations totalled $6.6bn in 2023.

Here, we discuss the transformative impact of generative AI (genAI) on KYC processes in financial institutions, the rising importance of KYC in a data-driven environment and the increasing costs of non-compliance. We also delve into the current state of KYC, associated challenges and risks and genAI’s potential to enhance KYC workflows.

The current state of KYC in banking

To begin, let's establish a basic understanding of KYC and related banking terminology. We frequently encounter terms such as KYC, VKYC, eKYC, ReKYC and CKYC, outlined in more detail in the following table.

Examples of global standard regulatory requirements

Term	Full form	Mode	Key feature	Purpose	Regulatory basis
KYC	Know your customer	Physical / Digital	Initial ID verification	Fraud/AML prevention	BSA/USA PATRIOT Act, 2001 (FinCEN/OCC)
ReKYC	Re-know your customer	Physical / Digital	Periodic re-verification	Update customer data	FinCEN CDD Rule, 2016 (amended 2024)
eKYC	Electronic KYC	Digital	Digital ID verification	Fast onboarding	E-SIGN Act, 2000 / FinCEN Guidance, 2020
VKYC	Video KYC	Video (remote)	Live video verification	Remote KYC compliance	FinCEN Guidance, 2020
CKYC	Centralized KYC	Centralized	Unified KYC registry	Cross-institutional efficiency


The processes for validating KYC documents are jurisdiction-specific, with regulatory bodies in each country establishing distinct requirements and procedures that reflect their individual goals and approaches to combating financial crime and ensuring regulatory compliance.

Most banks now offer digital KYC processes. For new customers or account activation processes, video KYC enables identity verification through video calls with agents. During this, documents are captured and validated via a maker-checker workflow, as demonstrated in this flow diagram.

[Figure: Maker-checker workflow for capturing and validating KYC documents]
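The diagram's individual steps are not reproduced in the text, so the following is only a minimal sketch of the general maker-checker (two-person) pattern: one agent captures a document and a different agent approves or rejects it. The class, status and function names are hypothetical and do not describe any particular bank's system.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"  # captured by the maker, awaiting review
    APPROVED = "approved"    # accepted by a checker
    REJECTED = "rejected"    # sent back by a checker


@dataclass
class KycDocument:
    doc_type: str
    maker: str  # agent who captured the document
    status: Status = Status.SUBMITTED
    history: list = field(default_factory=list)  # (checker, decision) audit trail


def review(doc: KycDocument, checker: str, approve: bool) -> KycDocument:
    """Apply a checker decision, enforcing that maker and checker differ."""
    if checker == doc.maker:
        raise PermissionError("checker must be a different person from the maker")
    if doc.status is not Status.SUBMITTED:
        raise ValueError(f"document already {doc.status.value}")
    doc.status = Status.APPROVED if approve else Status.REJECTED
    doc.history.append((checker, doc.status.value))
    return doc


doc = KycDocument(doc_type="passport", maker="agent_a")
review(doc, checker="agent_b", approve=True)
print(doc.status.value, doc.history)
```

The key design point is the identity check: a document can never be approved by the agent who captured it, and every decision is appended to an audit trail.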

Today, most banks have implemented digital processes that allow existing customers to complete their ReKYC in real time. This ReKYC process is initiated based on the customer's risk profile.

Banks set ReKYC intervals using a risk-based approach (RBA), guided by FinCEN's RBA, other regulatory guidance and internal policies. Customer risk ratings are based on factors such as customer type, geography, product, transaction behavior and behavioral anomalies.

Beyond scheduled intervals, banks also perform ReKYC when specific events occur, such as a change in customer profile, suspicious activity, regulatory update or other account level activity.
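As a rough illustration of how scheduled intervals and event triggers might combine, here is a minimal sketch. The interval lengths, risk labels and event names are hypothetical placeholders; real values come from regulatory guidance and each bank's internal policy.

```python
from datetime import date, timedelta

# Hypothetical review intervals per risk rating, in days (assumed values).
REKYC_INTERVAL_DAYS = {"high": 365, "medium": 3 * 365, "low": 5 * 365}

# Hypothetical identifiers for the event triggers listed in the text.
TRIGGER_EVENTS = {
    "profile_change",
    "suspicious_activity",
    "regulatory_update",
    "account_activity",
}


def next_rekyc_due(last_kyc, risk_rating, events=(), today=None):
    """Return the date on which the next ReKYC falls due.

    A trigger event makes ReKYC due immediately; otherwise the date follows
    the risk-based schedule (and may already be in the past, i.e. overdue).
    """
    today = today or date.today()
    if TRIGGER_EVENTS.intersection(events):
        return today
    return last_kyc + timedelta(days=REKYC_INTERVAL_DAYS[risk_rating])


print(next_rekyc_due(date(2024, 1, 1), "medium"))
print(next_rekyc_due(date(2024, 1, 1), "low", events=["suspicious_activity"]))
```

A continuously monitored system would re-run a check like this whenever a customer's risk rating or event list changes.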

Automated systems enable banks to continuously monitor customer activity and dynamically adjust ReKYC frequency. The performance of these KYC services is evaluated against the metrics outlined below.

Metrics that KYC services are evaluated against

Key performance indicators and requirements for KYC services:

Uptime: Expected to be 99.9%.
Performance:
- Support up to 100 TPS in parallel.
- Response time under 50 ms.
Scalability: Designed for 1-3 years of growth.
Integration and automation:
- Requires channel integration.
- Needs automated compliance reporting.
Data management:
- Demands real-time updates, including for customer demographics.
- Prioritizes lower risk of data leakage.
- Must act as a single source of truth for the bank.
Compliance and operations:
- Must meet audit and compliance requirements.
- Should allow for quick rollout of changes.
- Aims for maximum reliability.
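To put the uptime and throughput targets in concrete terms, the arithmetic can be sketched as follows. This assumes a 365-day year and uses Little's law (average concurrency = arrival rate x time in system), which holds only for a system in steady state; the 50 ms figure is treated as an upper bound on average response time.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(uptime_pct):
    """Hours per year a service may be unavailable at a given uptime target."""
    return HOURS_PER_YEAR * (1 - uptime_pct / 100)


def in_flight_requests(tps, response_time_s):
    """Little's law: average number of requests being processed at once."""
    return tps * response_time_s


print(round(downtime_budget_hours(99.9), 2))     # 8.76 hours per year
print(round(in_flight_requests(100, 0.050), 2))  # 5.0 concurrent requests
```

In other words, 99.9% uptime leaves under nine hours of downtime a year, and 100 TPS at 50 ms implies only about five requests in flight on average, so peak load rather than average load is likely to drive capacity planning.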

Unified KYC workflow:

Unified KYC seeks to provide a holistic customer view, where KYC status is accessed from a single source and demographic updates are seamlessly synchronized across all connected CRM systems.

Challenges and risks in the KYC process

While crucial for financial security, KYC processes can present many challenges, including data privacy, lengthy onboarding and high costs, while also posing risks of fraud, false positives and regulatory non-compliance.

Here's a more detailed breakdown of the challenges and risks associated with KYC processes:

Challenges and risks in the KYC process

Category	Challenge	Risk	Example impact
Data quality	Inaccurate/incomplete data	Synthetic ID fraud	Mules onboard undetected
Cost	High operational expenses	Financial strain	$25-50B industry cost
Regulation	Complex, evolving norms	Fines ($6.6B in 2023)	Reputational damage
Customer UX	Process friction	Customer churn	Weak ReKYC compliance
Technology	Legacy system integration	Detection delays	Mule networks untracked
Scalability	Diverse population	Vulnerable group exploitation	Rural mules missed
Privacy	Data breaches	KYC data sold to fraudsters	₹21,367 Cr cyber losses
Error	False positives/negatives	Missed mules	Fraud team overload


Traditional machine learning vs. generative AI in KYC: A comparative analysis

Traditional machine learning (ML) identifies “what is” (e.g., risky or not), while genAI creates “what could be” (e.g., realistic synthetic data), shifting from analysis to synthesis.

Traditional ML	Generative AI
Relies on discriminative models (e.g., logistic regression, random forests, CNNs) to classify or predict based on labeled data.	Uses generative models (e.g., GANs, VAEs, transformers) to create new data or simulate realistic scenarios from learned distributions.
In KYC, analyzes historical customer data (e.g., IDs, transaction patterns) to detect patterns or flag anomalies.	Generates synthetic customer profiles, documents, or behavioral data to augment training sets or test systems.
Heavily reliant on large, labeled datasets (e.g., verified customer records, fraud labels). Privacy concerns arise as it requires real customer data, risking breaches or regulatory violations (e.g., RBI guidelines, GDPR).	Reduces dependency on real data by generating synthetic alternatives that mimic real distributions. Mitigates privacy risks since synthetic data can replace sensitive PII (Personally Identifiable Information) for training.
Easier to implement with mature tools (e.g., Scikit-learn, XGBoost) and clear workflows (train, test, deploy).	Harder to implement — requires advanced frameworks (e.g., PyTorch, Hugging Face), significant compute power (GPUs) and expertise in tuning generative models.
High accuracy in well-defined tasks (e.g., spotting known fraud patterns) when trained on quality data.	Accuracy depends on the quality of generated data — may introduce noise if not tuned well (e.g., unrealistic synthetic IDs).


Several companies are now focusing on applying genAI and ML to the KYC space:

Company	Technology and approach	Business problem addressed
Fenergo	AI, NLP, ML for automated onboarding, high-risk client identification.	Simplifying KYC/AML compliance and client lifecycle management.
Trulioo	AI and ML for electronic KYC (eKYC), risk profiling.	Automating customer due diligence and global KYC compliance.
Moody’s	AI, ML, genAI for automated risk assessment, intelligent screening.	Reducing manual KYC efforts and enhancing due diligence.
H2O.ai	AI and ML for data integration, predictive modeling, anomaly detection.	Enhancing KYC compliance by detecting suspicious behaviors and building customer profiles.
KFin Technologies	AI-driven, blockchain-backed KYC with DigiLocker, eAadhaar integration.	Streamlining KYC compliance and onboarding efficiency.


Leveraging genAI for KYC

GenAI solutions are enhancing KYC workflows with a combination of automation, adaptability and efficiency. These solutions equip financial institutions with advanced tools for identity verification, risk profiling, case management, fraud detection and transaction monitoring — all crucial for navigating complex AML compliance requirements.

Stage	Activities	How genAI can be useful
Customer identification	Collect basic customer information such as name, address, date of birth (DoB), contact details and officially valid documents (OVDs).	LLMs, such as GPT or open-source models like LLaMA, can be used to create chatbots and virtual assistants that help customers with identity submission. This technology enables real-time, conversational data capture, replacing manual forms or branch visits.
Document verification and due diligence	Cross-check documents against issuing authority databases. Gather additional details such as occupation, income source and purpose of account, and screen against politically exposed person (PEP) and sanctions lists.	Diffusion models or generative adversarial networks (GANs) simulate edge-case fraud scenarios (forged passports, synthetic laundering patterns or unusual onboarding behaviors) by generating high-fidelity data that challenges existing systems.
Account onboarding	Finalize customer registration and activate banking services.	LLMs trained on regulatory documents automate the interpretation of complex compliance rules, enabling real-time detection of potential non-compliance and reducing risk. These models can also generate personalized alerts and messages, using customer data to provide clear, tailored compliance information. It is advisable for institutions to use custom on-premises LLMs to safeguard sensitive data effectively, maintain complete data control and comply with regulatory requirements.
Ongoing monitoring and reporting	Continuously track customer activity to detect suspicious behavior or updates in risk profile.	GenAI generates diverse customer behavior scenarios (e.g., synthetic transaction histories) to enrich risk assessment models, blending seamlessly with traditional ML classifiers.


The future: A hybrid path

The real power lies in synergy. Traditional ML can anchor KYC with its precision, while genAI augments it with creativity. Picture this: an ML model flags risky accounts, while a GAN generates synthetic test cases to sharpen its edge. Or an LLM summarizes findings for auditors, paired with ML’s risk scores.

LLMs and multimodal genAI (text, image, voice) will power intelligent KYC agents, handling end-to-end onboarding, queries and reporting with human-like finesse. These agents will integrate with core banking systems, learning from customer interactions in real time.

To better understand this approach, consider a real-life scenario: identifying a ‘mule account’, one of the most pressing issues for banks globally. First, a definition. A mule account is typically one of the following:

A newly opened account using stolen/synthetic identities.
An existing legitimate account co-opted by fraudsters (e.g., via social engineering).
Operated by ‘money mules’ — individuals recruited, sometimes unwittingly, to move funds for a fee.

Identifying mule accounts in a banking system is a critical task for combating money laundering, fraud and other illicit activities. Mule accounts are bank accounts used — knowingly or unknowingly — to transfer illegally obtained funds, often obscuring the trail back to criminals.

How to identify mule accounts in a banking system:

By monitoring transaction patterns.
By assessing behavioral anomalies.
By analysing the behaviour or data pattern during the account opening process.
By using network link analysis.
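Of the four techniques above, network link analysis is the easiest to show in a few lines. The sketch below groups accounts that share any attribute value (for example a device or phone number) using a union-find structure. The account IDs and attribute strings are made-up illustrative data, and a production system would weigh many more link types.

```python
from collections import defaultdict


def linked_clusters(accounts):
    """Group accounts that share any attribute value (e.g. device, phone).

    `accounts` maps account id -> set of attribute strings. Returns the
    clusters of two or more linked accounts, each as a sorted list.
    """
    parent = {acc: acc for acc in accounts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_seen = {}  # attribute value -> first account seen with it
    for acc, attrs in accounts.items():
        for attr in attrs:
            if attr in first_seen:
                parent[find(acc)] = find(first_seen[attr])  # union the two sets
            else:
                first_seen[attr] = acc

    groups = defaultdict(list)
    for acc in accounts:
        groups[find(acc)].append(acc)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


accounts = {
    "A1": {"device:d-17", "phone:555-0101"},
    "A2": {"device:d-17"},
    "A3": {"phone:555-0101", "device:d-42"},
    "A4": {"device:d-99"},
}
clusters = linked_clusters(accounts)
print(clusters)  # [['A1', 'A2', 'A3']]
```

Here A2 and A3 never share anything directly, but both link to A1, so all three land in one cluster for an investigator to review.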

How can a hybrid approach improve mule account identification?

XGBoost (Extreme Gradient Boosting) is a powerful machine learning algorithm well-suited for applications like bank mule account identification. As an advanced implementation of the gradient boosting framework, XGBoost is optimized for speed, scalability and high performance. It operates by sequentially building an ensemble of decision trees, with each tree correcting the errors of previous ones by minimizing a loss function, such as log loss for classification problems.
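The sequential error-correcting idea can be illustrated with a deliberately simplified sketch. This is plain gradient boosting with one-split trees (stumps) on log loss, written in standard-library Python; it is not XGBoost itself, which adds second-order gradients, regularization and many engineering optimizations. The two features (inbound transfers per day, hours funds are held) and the data are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the one-split regression tree that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_boosted(X, y, rounds=20, lr=0.5):
    scores = [0.0] * len(X)  # raw log-odds per training row
    stumps = []
    for _ in range(rounds):
        # Negative gradient of log loss: label minus predicted probability.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)  # each new tree targets remaining error
        stumps.append(stump)
        scores = [s + lr * stump_predict(stump, row) for s, row in zip(scores, X)]
    return stumps


def predict_proba(stumps, row, lr=0.5):
    return sigmoid(sum(lr * stump_predict(s, row) for s in stumps))


# Toy data: [inbound transfers per day, hours funds are held]; 1 = mule.
X = [[1, 72], [2, 48], [1, 96], [3, 60], [12, 2], [15, 1], [11, 3], [2, 55]]
y = [0, 0, 0, 0, 1, 1, 1, 0]
model = fit_boosted(X, y)
print(predict_proba(model, [13, 2]), predict_proba(model, [2, 70]))
```

Each round fits a stump to the residuals left by the previous rounds, so the predicted probability for mule-like rows rises and for ordinary rows falls as trees are added.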

XGBoost is well-suited for banking applications due to its ability to handle imbalanced and high-dimensional data and to flag accounts in real time. It can adapt to complex patterns, such as sudden spikes in activity on previously dormant accounts, and learn to identify new mule tactics through retraining.

However, XGBoost does have limitations that can affect model precision and speed in mule account identification:

Mule accounts are rare, limiting labeled data for model training.
XGBoost-based models need labeled data, but many mules lack prior flags, especially new synthetic accounts.
It can be ineffective with unstructured behavioral sequence data, such as typing speed or mouse path.
Fraudster tactics are constantly evolving, so the XGBoost model requires retraining.

How does a hybrid strategy optimize outcomes here?

GANs (generative adversarial networks): A generator creates realistic mule samples, and a discriminator, trained to tell them apart from real data, pushes the generator to refine them. The resulting samples can augment the training data for XGBoost.
Diffusion models can generate synthetic behavioral sequences (e.g., hesitant logins, rushed applications) from which features can be extracted.
Graph GANs can generate synthetic mule networks to train network detection.
A self-supervised LLM, fine-tuned on fraud alerts, can detect emerging patterns and help XGBoost recognize novel fraud patterns.
This combination pairs the LLM's ability to process unstructured data and extract rich features with XGBoost's efficiency, interpretability and performance on structured data. It offsets XGBoost's limitations and makes the approach more efficient.
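Training a GAN is beyond a short example, but the augmentation step it feeds can be sketched with a much simpler stand-in. The code below is not a GAN: it creates synthetic minority-class rows by interpolating between random pairs of real ones (in the spirit of SMOTE, without its nearest-neighbour step). The feature values are invented, and in the hybrid approach described above a trained generator would take the place of `interpolate_minority`.

```python
import random


def interpolate_minority(samples, n_new, seed=0):
    """Create synthetic rows on line segments between random pairs of real rows."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)  # two distinct real rows
        w = rng.random()               # position along the segment from a to b
        synthetic.append([x + w * (y - x) for x, y in zip(a, b)])
    return synthetic


# Toy mule rows: [inbound transfers per day, hours funds are held].
real_mules = [[12.0, 2.0], [15.0, 1.0], [11.0, 3.0]]
synthetic_mules = interpolate_minority(real_mules, n_new=5)
augmented = real_mules + synthetic_mules
print(len(augmented))  # 8
```

Interpolation can only produce rows between known mules, which is precisely the limitation a generative model aims to overcome; either way, synthetic rows should be validated for realism before they are added to a training set.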

GenAI will shift KYC from reactive to proactive by simulating emerging fraud tactics (deepfake IDs, synthetic identities or AI-generated laundering schemes) before they hit the real world.

As KYC evolves, banks must weigh the trade-offs between ML’s stability and genAI’s potential. Those that succeed will do so by mastering both, turning compliance from a burden into a strategic advantage.

Challenges in building a hybrid KYC solution

Limited data: Mule accounts are rare, so labeled examples are scarce. This hinders the development of accurate models, as traditional ML relies heavily on labeled datasets. Generating synthetic data to augment training sets is a potential solution, but ensuring the quality and realism of this data is technically challenging.
Evolving fraud: Constantly changing fraud tactics require frequent model retraining and proactive genAI simulations. A hybrid KYC solution must incorporate mechanisms for continuous learning to keep pace with evolving fraud techniques. This requires ongoing investment in model retraining, monitoring, and updating to ensure the system remains effective.
Unstructured data: Processing behavioral data (e.g., typing speed) demands complex genAI pipelines. Traditional ML models like XGBoost struggle with unstructured behavioral data, such as typing speed or mouse paths, which are increasingly relevant for fraud detection.
Integration complexity: Combining ML and genAI with banking systems requires seamless interoperability. Legacy systems often have rigid architectures, making it difficult to incorporate advanced AI technologies. Hybrid solutions require robust data pipelines and interoperability standards to ensure that traditional ML and genAI components work cohesively within existing workflows.
Regulatory compliance: Adhering to diverse global and local regulations adds complexity. Financial institutions operate in a complex regulatory landscape, particularly when serving global markets. Hybrid KYC solutions must ensure that both traditional ML and genAI components adhere to these regulations without compromising data security. KYC processes involve handling sensitive customer data, such as personal identification and financial records. Ensuring compliance with stringent data privacy regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), is critical when integrating advanced AI technologies.
High implementation costs: Developing a hybrid KYC solution involves substantial upfront costs for technology infrastructure, data management, model development and ongoing maintenance. Integrating genAI with existing systems, such as core banking systems (CBS) and customer data platforms (CDP), demands skilled personnel and robust IT infrastructure. The challenge lies in ensuring that the efficiency gains from automation justify these costs while maintaining or reducing overall expenses.

Building a hybrid KYC solution that synergizes traditional ML and genAI is a complex but transformative venture. Thoughtworks is uniquely positioned to guide financial institutions through this journey, leveraging our expertise in AI, data engineering and regulatory compliance to deliver innovative, value-driven solutions.

Get in touch to find out how our AI solutions can streamline your KYC processes, improve fraud detection and transform compliance into a strategic asset.
