Published : Sep 27, 2023
This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.
Sep 2023
Assess

GGML is a C library for machine learning that enables inference on CPUs. It also defines a binary format for distributing large language models (LLMs). To keep those models small enough to run on consumer hardware, GGML relies on quantization, a technique that stores model weights at reduced numeric precision. It supports several quantization strategies (e.g., 4-bit, 5-bit and 8-bit quantization), each offering a different trade-off between efficiency and model quality. A quick way to test, run and build apps with these quantized models is C Transformers, a Python binding for GGML that removes the inference boilerplate by providing a high-level API. We've used these libraries to build proofs of concept and experiments. If you're considering self-hosted LLMs, carefully assess these community-supported libraries for your organization.
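To make the bit-width trade-off concrete, here is a minimal sketch of block-wise quantization in plain Python. It is an illustration only and does not reproduce GGML's actual on-disk formats or kernels: the function names, the block size of 32 and the symmetric scaling scheme are all assumptions chosen for readability.

```python
import random


def quantize_block(values, bits=4):
    """Quantize one block of floats to signed integers of the given bit width.

    Hypothetical helper: uses one shared scale per block (symmetric scheme).
    """
    qmax = 2 ** (bits - 1) - 1                 # largest integer level, e.g. 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quants = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, quants


def quantize(weights, bits=4, block_size=32):
    """Split the weights into fixed-size blocks and quantize each one."""
    return [
        quantize_block(weights[i:i + block_size], bits)
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Rebuild approximate float weights from (scale, integers) blocks."""
    restored = []
    for scale, quants in blocks:
        restored.extend(q * scale for q in quants)
    return restored


def max_error(weights, bits, block_size=32):
    """Largest absolute difference after a quantize/dequantize round trip."""
    restored = dequantize(quantize(weights, bits, block_size))
    return max(abs(a - b) for a, b in zip(weights, restored))


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(256)]  # stand-in for model weights
    for bits in (4, 5, 8):
        print(f"{bits}-bit: max round-trip error = {max_error(weights, bits):.5f}")
```

In this sketch, fewer bits mean fewer integer levels per block, so each weight needs less storage but is reconstructed less precisely. That is the kind of efficiency versus quality trade-off the different quantization strategies navigate.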
