
Blog

9 posts on building with AI.


llmops · inference · self-hosting

Serving open models with vLLM

Hands-on guide to self-hosting open-weights LLMs with vLLM: install, serve an OpenAI-compatible API, quantize, benchmark, and manage VRAM.

6 min read

mcp · agents · tools

Building an MCP server and wiring it to an agent

Build a typed MCP server in Python, run it over stdio and HTTP, wire it into Cursor, and drive it from a custom client and an OpenAI-compatible model.

8 min read

rag · agents · retrieval

Agentic RAG: retrieval that decides for itself

Build agentic RAG in Python: hybrid retrieval as a tool, a bounded agent loop, sufficiency and grounding checks, against any OpenAI-compatible endpoint.

8 min read

fine-tuning · training · llmops

Fine-tuning an open model with QLoRA

A hands-on QLoRA fine-tuning walkthrough: dataset prep, 4-bit training with peft and trl, merging, and vLLM serving behind an OpenAI-compatible API.

7 min read

structured-output · tool-use · json

Structured outputs you can trust

A layered, runnable approach to reliable structured LLM outputs: pydantic schemas, json_schema enforcement, bounded validate-and-retry, and constrained decoding.

7 min read

computer-vision · research

Hybrid CNN + Vision Transformer for deepfake detection

Why combining CNNs, InceptionNeXt, and a Vision Transformer beat either alone for video deepfake detection — and why cross-dataset generalization is the metric that matters.

3 min read