llama.cpp
A widely used open-source C/C++ inference engine for high-performance, local LLM execution on a broad range of hardware, from CPUs to GPUs.
Has API
Pricing: Free
Quantized LLM Inference
Model Fine-tuning (LoRA)
Text Embeddings