Know Your Customer (KYC) is a crucial process that financial institutions and other regulated businesses use to verify the identity of their clients.
In today's data-driven business environment, verifiable, trustworthy information is crucial for effective decision-making. With data often considered ‘the new oil’ powering the fifth industrial revolution, trust in that data becomes paramount. As businesses increasingly enable real-time customer transactions, they depend on data-intensive processes for critical functions, including compliance, KYC, anti-money laundering (AML) and supplier risk management.
Reports from firms such as Fenergo, a major global player in risk and compliance technology, highlight the substantial increase in regulatory fines related to AML and KYC non-compliance. These fines are in the billions of dollars globally. For example, Fenergo reports that penalties for failing to comply with AML, KYC, environmental, social and governance (ESG), sanctions and customer due diligence (CDD) regulations totalled $6.6bn in 2023.
Here, we discuss the transformative impact of generative AI (genAI) on KYC processes in financial institutions, the rising importance of KYC in a data-driven environment and the increasing costs of non-compliance. We also delve into the current state of KYC, associated challenges and risks and genAI’s potential to enhance KYC workflows.
The current state of KYC in banking
To begin, let's establish a basic understanding of KYC and related banking terminology. We frequently encounter terms such as KYC, VKYC, eKYC, ReKYC and CKYC, outlined in more detail in the following table.
Examples of global standard regulatory requirements
Term | Full form | Mode | Key feature | Purpose | Regulatory basis |
---|---|---|---|---|---|
KYC | Know your customer | | Initial ID verification | Fraud/AML prevention | BSA/USA PATRIOT Act, 2001 (FinCEN/OCC) |
ReKYC | | Physical / Digital | Periodic re-verification | Update customer data | FinCEN CDD Rule, 2016 (amended 2024) |
eKYC | | Digital | Digital ID verification | | E-SIGN Act, 2000 / FinCEN Guidance, 2020 |
VKYC | Video KYC | Video (remote) | Live video verification | Remote KYC compliance | |
CKYC | Centralized KYC | Centralized | Unified KYC registry | Cross-institutional efficiency | |
The processes for validating KYC documents are jurisdiction-specific, with regulatory bodies in each country establishing distinct requirements and procedures that reflect their individual goals and approaches to combating financial crime and ensuring regulatory compliance.
Most banks now offer digital KYC processes. For new customers or account activation, video KYC enables identity verification through a video call with an agent. During the call, documents are captured and validated via a maker-checker workflow, as demonstrated in this flow diagram.
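The maker-checker principle behind this workflow can be sketched in a few lines of Python. This is only an illustration of the general idea (one person captures a document, a different person approves or rejects it); the `KycDocument` class, the `review` function, the status names and the agent IDs are all hypothetical and do not reproduce the steps in the diagram.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KycDocument:
    doc_type: str
    captured_by: str                  # the "maker", e.g. the video-call agent
    status: str = "pending_review"
    reviewed_by: Optional[str] = None


def review(document: KycDocument, checker: str, approve: bool) -> KycDocument:
    """Apply the checker's decision; the checker must differ from the maker."""
    if checker == document.captured_by:
        raise PermissionError("checker must be a different person from the maker")
    if document.status != "pending_review":
        raise ValueError("document has already been reviewed")
    document.status = "approved" if approve else "rejected"
    document.reviewed_by = checker
    return document


# Hypothetical usage: an agent captures a passport, an officer approves it.
passport = KycDocument(doc_type="passport", captured_by="agent_17")
review(passport, checker="officer_04", approve=True)
print(passport.status)
```

The key design choice is that the separation of duties is enforced in code rather than left to process discipline: a maker cannot approve their own capture.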


Today, most banks have implemented digital processes that allow existing customers to complete their ReKYC in real time. This ReKYC process is initiated based on the customer's risk profile.
Banks set ReKYC intervals using a risk-based approach (RBA), guided by FinCEN’s RBA expectations, other regulatory guidance and internal policies. Each customer's risk rating is based on factors such as customer type, geography, product, transaction behaviour and behavioural anomalies.
Beyond scheduled intervals, banks also perform ReKYC when specific events occur, such as a change in customer profile, suspicious activity, a regulatory update or other account-level activity.
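As a rough illustration, the scheduling logic described above might look like the following Python sketch. The interval lengths, the event names and the `next_rekyc_due` helper are assumptions made for the example; actual values come from each bank's internal policy and the applicable regulatory guidance.

```python
from datetime import date, timedelta

# Hypothetical review intervals per risk rating, in days.
REKYC_INTERVAL_DAYS = {"high": 365, "medium": 2 * 365, "low": 5 * 365}

# Hypothetical events that trigger an immediate review.
TRIGGER_EVENTS = {"profile_change", "suspicious_activity", "regulatory_update"}


def next_rekyc_due(last_kyc: date, risk_rating: str, events: set, today: date) -> date:
    """Return the date on which the next ReKYC is due."""
    if events & TRIGGER_EVENTS:
        # Event-driven ReKYC: due immediately, regardless of the schedule.
        return today
    # Scheduled ReKYC: the interval depends on the customer's risk rating.
    return last_kyc + timedelta(days=REKYC_INTERVAL_DAYS[risk_rating])


scheduled = next_rekyc_due(date(2024, 1, 1), "high", set(), date(2024, 6, 1))
triggered = next_rekyc_due(date(2024, 1, 1), "low", {"suspicious_activity"}, date(2024, 6, 1))
print(scheduled, triggered)
```

Keeping the interval table and the trigger list as plain data makes it straightforward to adjust them when a risk policy or regulation changes.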
Automated systems enable banks to continuously monitor customer activity and dynamically adjust ReKYC frequency. The performance of these KYC services is evaluated against the metrics outlined below.


Key performance indicators and requirements for KYC services:
- Uptime: Expected to be 99.9%.
- Performance:
  - Support up to 100 TPS in parallel.
  - Response time under 50 ms.
- Scalability: Designed for 1-3 years of growth.
- Integration and automation:
  - Requires channel integration.
  - Needs automated compliance reporting.
- Data management:
  - Demands real-time updates, including for customer demographics.
  - Prioritizes lower risk of data leakage.
  - Must act as a single source of truth for the bank.
- Compliance and operations:
  - Must meet audit and compliance requirements.
  - Should allow for quick rollout of changes.
  - Aims for maximum reliability.
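To make the uptime and performance targets more tangible, the short calculation below converts them into a yearly downtime budget and an approximate number of requests in flight. It assumes a 365-day year and treats the 50 ms ceiling as the average response time, so the concurrency figure (derived from Little's law: concurrency = throughput × latency) is an upper bound rather than a measurement.

```python
UPTIME_TARGET = 0.999      # 99.9% availability
PEAK_TPS = 100             # transactions per second
RESPONSE_TIME_S = 0.050    # 50 ms, used here as an assumed average

HOURS_PER_YEAR = 365 * 24

# Downtime budget implied by the uptime target.
allowed_downtime_hours = (1 - UPTIME_TARGET) * HOURS_PER_YEAR

# Little's law: average requests in flight = arrival rate x time in system.
in_flight_requests = PEAK_TPS * RESPONSE_TIME_S

print(f"Allowed downtime: {allowed_downtime_hours:.2f} hours/year")
print(f"Requests in flight at peak: {in_flight_requests:.0f}")
```

Under these assumptions, 99.9% uptime leaves roughly 8.76 hours of downtime per year, and 100 TPS at 50 ms corresponds to about five requests being processed at any moment.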
Unified KYC workflow:
Unified KYC seeks to provide a holistic customer view, where KYC status is accessed from a single source and demographic updates are seamlessly synchronized across all connected CRM systems.


Challenges and risks in the KYC process
While crucial for financial security, KYC processes can present many challenges, including data privacy, lengthy onboarding and high costs, while also posing risks of fraud, false positives and regulatory non-compliance.
Here's a more detailed breakdown of the challenges and risks associated with KYC processes:
Category | Challenge | Risk | Example impact |
---|---|---|---|
Data quality | Inaccurate/incomplete data | Synthetic ID fraud | Mules onboard undetected |
Cost | High operational expenses | Financial strain | $25-50B industry cost |
Regulation | Complex, evolving norms | Fines ($6.6B in 2023) | Reputational damage |
Customer UX | Process friction | Customer churn | Weak ReKYC compliance |
Technology | Legacy system integration | Detection delays | Mule networks untracked |
Scalability | Diverse population | Vulnerable group exploitation | Rural mules missed |
Privacy | Data breaches | KYC data sold to fraudsters | ₹21,367 Cr cyber losses |
Error | False positives/negatives | Missed mules | Fraud team overload |
Traditional machine learning vs. generative AI in KYC: A comparative analysis
Traditional machine learning (ML) identifies “what is” (e.g., risky or not), while genAI creates “what could be” (e.g., realistic synthetic data), shifting from analysis to synthesis.
Traditional ML | Generative AI |
---|---|
Relies on discriminative models (e.g., logistic regression, random forests, CNNs) to classify or predict based on labeled data. | Uses generative models (e.g., GANs, VAEs, transformers) to create new data or simulate realistic scenarios from learned distributions. |
In KYC, analyzes historical customer data (e.g., IDs, transaction patterns) to detect patterns or flag anomalies. | Generates synthetic customer profiles, documents, or behavioral data to augment training sets or test systems. |
Heavily reliant on large, labeled datasets (e.g., verified customer records, fraud labels). Privacy concerns arise as it requires real customer data, risking breaches or regulatory violations (e.g., RBI guidelines, GDPR). | Reduces dependency on real data by generating synthetic alternatives that mimic real distributions. Mitigates privacy risks since synthetic data can replace sensitive PII (Personally Identifiable Information) for training. |
Easier to implement with mature tools (e.g., Scikit-learn, XGBoost) and clear workflows (train, test, deploy). | Harder to implement: requires advanced frameworks (e.g., PyTorch, Hugging Face), significant compute power (GPUs) and expertise in tuning generative models. |
High accuracy in well-defined tasks (e.g., spotting known fraud patterns) when trained on quality data. | Accuracy depends on the quality of generated data; may introduce noise if not tuned well (e.g., unrealistic synthetic IDs). |
Several companies are now focusing on applying genAI and ML to the KYC space:
Company | Technology and approach | Business problem addressed |
---|---|---|
Fenergo | AI, NLP, ML for automated onboarding, high-risk client identification. | Simplifying KYC/AML compliance and client lifecycle management. |
Trulioo | AI and ML for electronic KYC (eKYC), risk profiling. | Automating customer due diligence and global KYC compliance. |
Moody’s | AI, ML, genAI for automated risk assessment, intelligent screening. | Reducing manual KYC efforts and enhancing due diligence. |
H2O.ai | AI and ML for data integration, predictive modeling, anomaly detection. | Enhancing KYC compliance by detecting suspicious behaviors and building customer profiles. |
KFin Technologies | AI-driven, blockchain-backed KYC with DigiLocker, eAadhaar integration. | Streamlining KYC compliance and onboarding efficiency. |
Leveraging genAI for KYC
GenAI solutions are enhancing KYC workflows with a combination of automation, adaptability and efficiency. These solutions equip financial institutions with advanced tools for identity verification, risk profiling, case management, fraud detection and transaction monitoring — all crucial for navigating complex AML compliance requirements.


Stage | Activities | How genAI can be useful |
---|---|---|
Customer identification | Collect basic customer information such as name, address, date of birth, contact details and officially valid documents (OVDs). | LLMs, such as GPT or open-source models like LLaMA, can be used to create chatbots and virtual assistants that help customers with identity submission. |
Document verification and due diligence | Cross-check documents against issuing authority databases. Gather additional details: occupation, income source, purpose of account, PEP and sanctions list checks, etc. | Diffusion models or generative adversarial networks (GANs) simulate edge-case fraud scenarios (forged passports, synthetic laundering patterns or unusual onboarding behaviors) by generating high-fidelity data that challenges existing systems. |
Account onboarding | Finalize customer registration and activate banking services. | LLMs trained on regulatory documents automate the interpretation of complex compliance rules, enabling real-time detection of potential non-compliance and reducing risk. These models can also generate personalized alerts and messages, using customer data to provide clear, tailored compliance information. It is advisable for institutions to use custom on-premises LLMs to safeguard sensitive data effectively, maintain complete data control and comply with regulatory requirements. |
Ongoing monitoring and reporting | Continuously track customer activity to detect suspicious behavior or updates in risk profile. | GenAI generates diverse customer behavior scenarios (e.g., synthetic transaction histories) to enrich risk assessment models, blending seamlessly with traditional ML classifiers. |
The future: A hybrid path
The real power lies in synergy. Traditional ML can anchor KYC with its precision, while genAI augments it with creativity. Picture this: an ML model flags risky accounts, while a GAN generates synthetic test cases to sharpen its edge. Or an LLM summarizes findings for auditors, paired with ML’s risk scores.
LLMs and multimodal genAI (text, image, voice) will power intelligent KYC agents that handle end-to-end onboarding, queries and reporting with human-like finesse. These agents will integrate with core banking systems and learn from customer interactions in real time.
To better understand this approach, consider a real-life scenario: identifying a ‘mule account’, one of the most pressing issues for banks globally. A mule account typically takes one of the following forms:
- A newly opened account using stolen or synthetic identities.
- An existing legitimate account co-opted by fraudsters (e.g., via social engineering).
- An account operated by ‘money mules’: individuals recruited, sometimes unwittingly, to move funds for a fee.
Identifying mule accounts in a banking system is a critical task for combating money laundering, fraud and other illicit activities. Mule accounts are bank accounts used — knowingly or unknowingly — to transfer illegally obtained funds, often obscuring the trail back to criminals.
How to identify mule accounts in a banking system:
By monitoring transaction patterns.
By assessing behavioral anomalies.
By analysing the behaviour or data pattern during the account opening process.
By using network link analysis.
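Two of these signals, transaction patterns and network links, can be combined in a simple rule-based check. The sketch below flags accounts that collect money from several distinct senders and forward most of it onwards. The transfer records, the thresholds and the `mule_candidates` function are fabricated for illustration; a production system would use far richer features and tuned thresholds.

```python
from collections import defaultdict

# Toy transfers as (sender, receiver, amount); fabricated for illustration.
TRANSFERS = [
    ("A1", "M1", 900.0),
    ("A2", "M1", 950.0),
    ("A3", "M1", 980.0),
    ("M1", "X9", 2700.0),
    ("B1", "B2", 120.0),
    ("B2", "B3", 40.0),
]


def mule_candidates(transfers, min_senders=3, min_pass_through=0.9):
    """Flag accounts that receive from many senders and forward most of it."""
    senders = defaultdict(set)      # receiver -> distinct senders (fan-in)
    received = defaultdict(float)   # account -> total inbound amount
    sent = defaultdict(float)       # account -> total outbound amount
    for src, dst, amount in transfers:
        senders[dst].add(src)
        received[dst] += amount
        sent[src] += amount

    flagged = []
    for account, total_in in received.items():
        pass_through = sent[account] / total_in
        if len(senders[account]) >= min_senders and pass_through >= min_pass_through:
            flagged.append(account)
    return sorted(flagged)


print(mule_candidates(TRANSFERS))  # ['M1']
```

Rules like these are easy to explain to auditors, but they only catch patterns someone has already thought of, which is where learned models come in.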
How can a hybrid approach improve mule account identification?
XGBoost (Extreme Gradient Boosting) is a powerful machine learning algorithm well-suited for applications like bank mule account identification. As an advanced implementation of the gradient boosting framework, XGBoost is optimized for speed, scalability and high performance. It operates by sequentially building an ensemble of decision trees, with each tree correcting the errors of previous ones by minimizing a loss function, such as log loss for classification problems.
XGBoost is well-suited for banking applications because it handles imbalanced and high-dimensional data and is fast enough to flag accounts in real time. It can adapt to complex patterns, such as a sudden spike in activity on a previously dormant account, and learn to identify new mule tactics through retraining.
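The boosting idea itself (each new tree corrects the errors of the ensemble so far by following the gradient of the log loss) can be shown with a deliberately small, dependency-free sketch. This is plain gradient boosting with one-split trees, not XGBoost: it omits XGBoost's second-order updates, regularization and performance optimizations. The two features and the toy data are fabricated for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return the one-split tree (feature, threshold, left value, right value)
    that best fits the residuals in the least-squares sense."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    return best[1:]


def stump_value(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_boosted(X, y, n_rounds=30, learning_rate=0.5):
    prior = sum(y) / len(y)
    base_score = math.log(prior / (1 - prior))  # log-odds of the mule rate
    scores = [base_score] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log loss w.r.t. the raw score: label - probability.
        residuals = [label - sigmoid(s) for label, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, row)
                  for s, row in zip(scores, X)]
    return base_score, learning_rate, stumps


def predict_proba(model, row):
    base_score, learning_rate, stumps = model
    raw = base_score + sum(learning_rate * stump_value(s, row) for s in stumps)
    return sigmoid(raw)


# Toy data: [inbound transfers in first week, share forwarded within 24 hours]
X = [[1, 0.05], [2, 0.10], [1, 0.20], [3, 0.15], [2, 0.30], [4, 0.10],
     [9, 0.95], [12, 0.90]]
y = [0, 0, 0, 0, 0, 0, 1, 1]  # 1 = known mule account

model = fit_boosted(X, y)
p_mule_like = predict_proba(model, [10, 0.92])
p_ordinary = predict_proba(model, [2, 0.10])
print(round(p_mule_like, 2), round(p_ordinary, 2))
```

In practice a team would use the `xgboost` library rather than hand-rolled code; its `XGBClassifier` exposes a `scale_pos_weight` parameter intended for imbalanced classes such as rare mule accounts.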
However, XGBoost has limitations that can affect model precision and speed in mule account identification:
- Mule accounts are rare, which limits the labeled data available for model training.
- XGBoost-based models need labeled data, but many mules carry no prior flags, especially newly created synthetic accounts.
- It can be ineffective on unstructured, sequential behavioural data, such as typing speed or mouse paths.
- Fraudster tactics evolve constantly, so the XGBoost model requires regular retraining.
How does a hybrid strategy optimize outcomes here?
- GANs (generative adversarial networks): A generator creates realistic mule samples while a discriminator learns to tell them apart from real data, pushing the generator to improve. The resulting samples can augment XGBoost's training data.
- Diffusion models can generate synthetic behavioral sequences (e.g., hesitant logins, rushed applications) from which features can be extracted.
- Graph GANs can generate synthetic mule networks to train network detection.
- A self-supervised LLM fine-tuned on fraud alerts can detect emerging patterns and help XGBoost recognize novel fraud.
This combination pairs an LLM’s ability to process unstructured data and extract rich features with XGBoost’s efficiency, interpretability and performance on structured data. It offsets XGBoost's limitations and makes the overall approach more effective.
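The augmentation step can be illustrated without training a generative model. In the sketch below, a simple per-feature Gaussian sampler stands in for a trained GAN generator: it is not a GAN, only a placeholder that shows where synthetic mule samples would enter the training set before a classifier such as XGBoost is fitted. The data, the `target_ratio` parameter and the function names are assumptions made for the example.

```python
import random
import statistics


def fit_gaussian_sampler(rows):
    """Fit an independent Gaussian per feature.
    Stand-in for a trained generator; a real GAN would learn joint structure."""
    columns = list(zip(*rows))
    params = [(statistics.mean(col), statistics.stdev(col)) for col in columns]

    def sample(rng):
        return [rng.gauss(mu, sigma) for mu, sigma in params]

    return sample


def augment_minority(X, y, target_ratio=0.5, seed=0):
    """Add synthetic mule rows until mules number target_ratio x normal rows."""
    rng = random.Random(seed)
    mule_rows = [row for row, label in zip(X, y) if label == 1]
    n_normal = len(y) - len(mule_rows)
    n_needed = max(0, int(target_ratio * n_normal) - len(mule_rows))
    sample = fit_gaussian_sampler(mule_rows)
    synthetic = [sample(rng) for _ in range(n_needed)]
    return X + synthetic, y + [1] * n_needed


# Toy data: [inbound transfers in first week, share forwarded within 24 hours]
normal = [[1, 0.05], [2, 0.10], [1, 0.20], [3, 0.15], [2, 0.30],
          [4, 0.10], [2, 0.25], [1, 0.15], [3, 0.05], [2, 0.20]]
mules = [[9, 0.95], [12, 0.90], [10, 0.85]]
X = normal + mules
y = [0] * len(normal) + [1] * len(mules)

X_aug, y_aug = augment_minority(X, y)
print(len(X), "->", len(X_aug), "rows;", sum(y), "->", sum(y_aug), "mule rows")
```

Whatever generator is used, synthetic rows should be validated against real data before training, since unrealistic samples can add noise rather than signal.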
GenAI will shift KYC from reactive to proactive by simulating emerging fraud tactics, such as deepfake IDs, synthetic identities or AI-generated laundering schemes, before they hit the real world.
As KYC evolves, banks must weigh the trade-offs in ML’s stability versus genAI’s potential. Those that succeed will do so by mastering both, turning compliance from a burden into a strategic advantage.
Challenges in building a hybrid KYC solution
- Limited data: Mule accounts are rare, so labeled examples are scarce, and traditional ML relies heavily on labeled datasets to produce accurate models. Generating synthetic data to augment training sets is a potential solution, but ensuring the quality and realism of this data is technically challenging.
- Evolving fraud: Constantly changing fraud tactics require frequent model retraining and proactive genAI simulations. A hybrid KYC solution must incorporate mechanisms for continuous learning to keep pace with evolving fraud techniques. This requires ongoing investment in model retraining, monitoring and updating to ensure the system remains effective.
- Unstructured data: Traditional ML models like XGBoost struggle with unstructured behavioral data, such as typing speed or mouse paths, which is increasingly relevant for fraud detection. Processing it demands complex genAI pipelines.
- Integration complexity: Combining ML and genAI with banking systems requires seamless interoperability. Legacy systems often have rigid architectures, making it difficult to incorporate advanced AI technologies. Hybrid solutions require robust data pipelines and interoperability standards to ensure that traditional ML and genAI components work cohesively within existing workflows.
- Regulatory compliance: Adhering to diverse global and local regulations adds complexity. Financial institutions operate in a complex regulatory landscape, particularly when serving global markets. Hybrid KYC solutions must ensure that both traditional ML and genAI components adhere to these regulations without compromising data security. KYC processes also involve handling sensitive customer data, such as personal identification and financial records, so compliance with stringent data privacy regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), is critical when integrating advanced AI technologies.
- High implementation costs: Developing a hybrid KYC solution involves substantial upfront costs for technology infrastructure, data management, model development and ongoing maintenance. Integrating genAI with existing systems, such as core banking systems (CBS) and customer data platforms (CDP), demands skilled personnel and robust IT infrastructure. The challenge lies in ensuring that the efficiency gains from automation justify these costs while maintaining or reducing overall expenses.
Building a hybrid KYC solution that synergizes traditional ML and genAI is a complex but transformative venture. Thoughtworks is uniquely positioned to guide financial institutions through this journey, leveraging our expertise in AI, data engineering and regulatory compliance to deliver innovative, value-driven solutions.
Get in touch to find out how our AI solutions can streamline your KYC processes, improve fraud detection and transform compliance into a strategic asset.