Llama.cpp

Run Meta's Llama and other LLMs locally with CPU inference

llama.cpp is an open-source C/C++ library that runs large language models such as Llama and Mistral efficiently on consumer CPUs, with no GPU required. It relies on quantization, storing model weights at reduced precision (for example 4-bit instead of 16-bit), to cut memory requirements dramatically. Privacy-focused users, developers, and researchers use llama.cpp to run powerful AI models entirely on their own hardware.
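llama.cpp itself is a C/C++ project, but the same engine is commonly driven from Python through the community llama-cpp-python bindings. The sketch below assumes those bindings are installed and that a quantized GGUF model file is already available locally; the file path and generation parameters are placeholders, not values from this page:

```python
# Minimal local-inference sketch using the llama-cpp-python bindings
# (pip install llama-cpp-python). Assumes a quantized GGUF model file
# has already been downloaded; the path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_ctx=4096,       # context window, in tokens
    n_threads=8,      # CPU threads used for inference
    n_gpu_layers=0,   # 0 = pure CPU; raise to offload layers to a GPU
)

out = llm("Q: Why does quantization save memory? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```

The n_gpu_layers knob is how the GPU-offloading feature surfaces in these bindings: layers are loaded onto the GPU up to that count, and everything else stays on the CPU.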

Key Features

  • CPU-optimized LLM inference
  • GGUF model format support
  • GPU offloading support
  • OpenAI-compatible API server (see the sketch after the tag list below)
  • Quantization for memory efficiency
#local LLM #open source #privacy #CPU inference #Llama
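The bundled llama-server binary exposes an HTTP endpoint that follows the OpenAI chat-completions wire format, so existing OpenAI client code can be pointed at a local instance. A minimal sketch, assuming a server was already started along the lines of `llama-server -m model.gguf --port 8080` (the port and model name here are assumptions):

```python
# Queries a local llama-server instance through its OpenAI-compatible
# /v1/chat/completions endpoint (pip install openai).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local server address
    api_key="sk-no-key-required",         # ignored unless the server sets --api-key
)

resp = client.chat.completions.create(
    model="local-model",  # llama-server answers with whatever model it was started on
    messages=[{"role": "user", "content": "In one sentence, what is llama.cpp?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```

Because only the base URL changes, applications written against hosted OpenAI endpoints can switch to a fully local backend without restructuring their code.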

Quick Info

Category: Code & Development
Pricing: Free (completely free to use)
