📄️ llama.cpp
From llama.cpp: The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.
📄️ vLLM
vLLM is a fast and easy-to-use library for LLM inference and serving.
📄️ HuggingFace Transformers
Hugging Face's transformers is an open-source library that makes it easy to work with natural language processing (NLP) models. You can run Rubra's LLMs directly with the transformers library, supported by Rubra's inference helper package, rubra_tools. This guide walks you through the steps to integrate and use Rubra's models with the transformers library.