Technology Radar

GGML

Published: Sep 27, 2023

NOT ON THE CURRENT EDITION

This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.

Sep 2023

Assess

GGML is a C library for machine learning that enables CPU inference. It defines a binary format for distributing large language models (LLMs). To do so, it uses quantization, a technique that allows LLMs to run effective CPU inference on consumer hardware. GGML supports several quantization strategies (e.g., 4-bit, 5-bit and 8-bit quantization), each of which offers a different trade-off between efficiency and performance. A quick way to test, run and build applications with these quantized models is a Python binding called C Transformers. It is a Python wrapper on top of GGML that removes the boilerplate code needed to run inference by providing a high-level API. We have used these libraries to build proofs of concept and experiments. If you are considering self-hosted LLMs, carefully assess these libraries for your organization.
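To make the trade-off between bit width and accuracy concrete, here is a minimal sketch of symmetric block quantization in plain Python. It is not GGML's actual binary format or one of its real quantization schemes; the function names (`quantize_block`, `dequantize_block`, `max_abs_error`) and the sample weights are hypothetical, chosen only for illustration. The idea it shows is the general one: a block of floating-point weights is stored as small integers plus one shared scale, and fewer bits mean a coarser grid and a larger reconstruction error.

```python
from typing import List, Tuple


def quantize_block(values: List[float], bits: int) -> Tuple[float, List[int]]:
    """Quantize one block of weights to signed integers of the given bit width.

    Illustrative only: a simplified symmetric scheme, not GGML's on-disk layout.
    """
    qmax = 2 ** (bits - 1) - 1  # largest magnitude, e.g. 7 for 4 bits, 127 for 8 bits
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 1.0  # one shared scale per block
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, quantized


def dequantize_block(scale: float, quantized: List[int]) -> List[float]:
    """Recover approximate floating-point weights from the integers and the scale."""
    return [scale * q for q in quantized]


def max_abs_error(values: List[float], bits: int) -> float:
    """Largest absolute difference between the original and restored weights."""
    scale, quantized = quantize_block(values, bits)
    restored = dequantize_block(scale, quantized)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical weights, made up for the example.
weights = [0.12, -0.53, 0.98, -0.07, 0.33, -0.91, 0.45, 0.02]

for bits in (4, 5, 8):
    print(f"{bits}-bit: max abs error = {max_abs_error(weights, bits):.5f}")
```

Running the loop shows the error shrinking as the bit width grows, which is the efficiency-versus-quality trade-off described above: a 4-bit block takes roughly half the storage of an 8-bit one, at the cost of a coarser approximation of each weight.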
