#inference

1 post

Posts tagged inference

#llmops #inference #self-hosting

Serving open models with vLLM

Hands-on guide to self-hosting open-weights LLMs with vLLM: install, serve an OpenAI-compatible API, quantize, benchmark, and manage VRAM.

6 min read