llmops, inference, self-hosting
Serving open models with vLLM
Hands-on guide to self-hosting open-weights LLMs with vLLM: install, serve an OpenAI-compatible API, quantize, benchmark, and manage VRAM.
3 posts
Hands-on guide to self-hosting open-weights LLMs with vLLM: install, serve an OpenAI-compatible API, quantize, benchmark, and manage VRAM.
Building an end-to-end multi-agent system — support, recommendation, and pricing — with LangChain, AutoGen, FastAPI, Kafka, and Qdrant. What held up and what I'd change.
A hands-on QLoRA fine-tuning walkthrough: dataset prep, 4-bit training with peft and trl, merging, and vLLM serving behind an OpenAI-compatible API.