Date: 2025-03-07

Introduction

Generative AI (GenAI) applications are advancing rapidly across many domains of human life. At the core of GenAI systems are large language models (LLMs), which power conversational AI, chatbots, and other language-driven applications. LLMs are deep neural networks (DNNs) designed to process input and generate human-like responses. These models consist of multiple hidden layers, each containing thousands of neurons, positioned between the input and output layers. They contain billions of parameters and are trained on massive datasets. LLMs are stochastic models that excel at contextual understanding, coherent and fluent text generation, question answering, and language translation. Their development has revolutionized natural language processing (NLP) and its wide-ranging applications.

New models are being developed almost daily by organizations, research institutes, and corporations. Leaderboards are inundated with comparative analyses of these models' performance across diverse benchmarks. According to a recent survey by Boston Consulting Group, corporations investing in Generative AI anticipate a threefold return on investment (ROI) by 2027. Additionally, a report by the International Data Corporation (IDC) projects that global spending on AI will reach approximately $632 billion by 2028, with the integration of Generative AI products driving a compound annual growth rate (CAGR) of 29%.

Reality, however, paints a different picture. The performance of GenAI technologies often falls short of their promises, and ROI fails to match the hype. Many GenAI applications and products struggle to move from proof-of-concept to full-scale production. Daron Acemoglu, a professor at MIT, predicts that AI may impact less than 5% of all tasks and contribute only a 0.9% increase to U.S. GDP over the next decade. Such contrasting perspectives create uncertainty around the adoption of GenAI technologies in everyday business operations.

What explains these paradoxical viewpoints?

The primary reason is that GenAI applications often underperform, produce unreliable results, or behave unpredictably. In addition, the lack of quantitative evaluation methods makes it difficult to build trust among stakeholders.