#llmops

3 posts

llmops, inference, self-hosting

Serving open models with vLLM

Hands-on guide to self-hosting open-weights LLMs with vLLM: install, serve an OpenAI-compatible API, quantize, benchmark, and manage VRAM.

Jun 20, 2026 · 6 min read

agents, rag, llmops

Lessons from shipping a multi-agent e-commerce assistant

Building an end-to-end multi-agent system — support, recommendation, and pricing — with LangChain, AutoGen, FastAPI, Kafka, and Qdrant. What held up and what I'd change.

Jun 12, 2026 · 3 min read

fine-tuning, training, llmops

Fine-tuning an open model with QLoRA

A hands-on QLoRA fine-tuning walkthrough: dataset prep, 4-bit training with peft and trl, merging, and vLLM serving behind an OpenAI-compatible API.

Jun 11, 2026 · 7 min read
