Writing

Blog

9 posts on building with AI.

llmops, inference, self-hosting

Serving open models with vLLM

Hands-on guide to self-hosting open-weights LLMs with vLLM: install, serve an OpenAI-compatible API, quantize, benchmark, and manage VRAM.

Jun 20, 2026 6 min read

mcp, agents, tools

Building an MCP server and wiring it to an agent

Build a typed MCP server in Python, run it over stdio and HTTP, wire it into Cursor, and drive it from a custom client and an OpenAI-compatible model.

Jun 19, 2026 8 min read

asr, nlp, research

From 34% to 28% WER: lessons from code-switching ASR

What I learned building a Whisper + LLaMA speech recognizer for Malay–English code-switching — where the WER actually came from, and what didn't help.

Jun 18, 2026 4 min read

edge-ai, deployment, sdv

Shipping a model to the edge: PyTorch → ONNX → TensorRT

A hands-on, current (2026) path for taking a PyTorch CV model to ONNX and a TensorRT engine: export, parity, FP16/INT8 build, and latency gating.

Jun 17, 2026 8 min read

rag, agents, retrieval

Agentic RAG: retrieval that decides for itself

Build agentic RAG in Python: hybrid retrieval as a tool, a bounded agent loop, sufficiency and grounding checks, against any OpenAI-compatible endpoint.

Jun 15, 2026 8 min read

agents, rag, llmops

Lessons from shipping a multi-agent e-commerce assistant

Building an end-to-end multi-agent system — support, recommendation, and pricing — with LangChain, AutoGen, FastAPI, Kafka, and Qdrant. What held up and what I'd change.

Jun 12, 2026 3 min read

fine-tuning, training, llmops

Fine-tuning an open model with QLoRA

A hands-on QLoRA fine-tuning walkthrough: dataset prep, 4-bit training with peft and trl, merging, and vLLM serving behind an OpenAI-compatible API.

Jun 11, 2026 7 min read

structured-output, tool-use, json

Structured outputs you can trust

A layered, runnable approach to reliable structured LLM outputs: pydantic schemas, json_schema enforcement, bounded validate-and-retry, and constrained decoding.

Jun 9, 2026 7 min read

computer-vision, research

Hybrid CNN + Vision Transformer for deepfake detection

Why combining CNNs, InceptionNeXt, and a Vision Transformer beat either alone for video deepfake detection — and why cross-dataset generalization is the metric that matters.

Jun 6, 2026 3 min read
