Bac Nguyen

Bac Nguyen, AI Engineer building production AI for software-defined vehicles: computer vision, edge AI, agents, and ML systems. Projects, research, and lessons from the build.
https://bacnguyenne.github.io/

Serving open models with vLLM
https://bacnguyenne.github.io/blog/serving-open-models-with-vllm/
Hands-on guide to self-hosting open-weights LLMs with vLLM: install, serve an OpenAI-compatible API, quantize, benchmark, and manage VRAM.
Sat, 20 Jun 2026 00:00:00 GMT
Tags: llmops, inference, self-hosting

Building an MCP server and wiring it to an agent
https://bacnguyenne.github.io/blog/build-an-mcp-server/
Build a typed MCP server in Python, run it over stdio and HTTP, wire it into Cursor, and drive it from a custom client and an OpenAI-compatible model.
Fri, 19 Jun 2026 00:00:00 GMT
Tags: mcp, agents, tools

From 34% to 28% WER: lessons from code-switching ASR
https://bacnguyenne.github.io/blog/lessons-from-code-switching-asr/
What I learned building a Whisper + LLaMA speech recognizer for Malay–English code-switching: where the WER actually came from, and what didn't help.
Thu, 18 Jun 2026 00:00:00 GMT
Tags: asr, nlp, research

Shipping a model to the edge: PyTorch → ONNX → TensorRT
https://bacnguyenne.github.io/blog/deploy-a-model-to-the-edge/
A hands-on, current (2026) path for taking a PyTorch CV model to ONNX and a TensorRT engine: export, parity, FP16/INT8 build, and latency gating.
Wed, 17 Jun 2026 00:00:00 GMT
Tags: edge-ai, deployment, sdv

Agentic RAG: retrieval that decides for itself
https://bacnguyenne.github.io/blog/agentic-rag/
Build agentic RAG in Python: hybrid retrieval as a tool, a bounded agent loop, sufficiency and grounding checks, against any OpenAI-compatible endpoint.
Mon, 15 Jun 2026 00:00:00 GMT
Tags: rag, agents, retrieval

Lessons from shipping a multi-agent e-commerce assistant
https://bacnguyenne.github.io/blog/lessons-from-a-multi-agent-system/
Building an end-to-end multi-agent system (support, recommendation, and pricing) with LangChain, AutoGen, FastAPI, Kafka, and Qdrant. What held up and what I'd change.
Fri, 12 Jun 2026 00:00:00 GMT
Tags: agents, rag, llmops

Fine-tuning an open model with QLoRA
https://bacnguyenne.github.io/blog/fine-tuning-with-qlora/
A hands-on QLoRA fine-tuning walkthrough: dataset prep, 4-bit training with peft and trl, merging, and vLLM serving behind an OpenAI-compatible API.
Thu, 11 Jun 2026 00:00:00 GMT
Tags: fine-tuning, training, llmops

Structured outputs you can trust
https://bacnguyenne.github.io/blog/structured-outputs-you-can-trust/
A layered, runnable approach to reliable structured LLM outputs: pydantic schemas, json_schema enforcement, bounded validate-and-retry, and constrained decoding.
Tue, 09 Jun 2026 00:00:00 GMT
Tags: structured-output, tool-use, json

Hybrid CNN + Vision Transformer for deepfake detection
https://bacnguyenne.github.io/blog/hybrid-cnn-vit-deepfake-detection/
Why combining CNNs, InceptionNeXt, and a Vision Transformer beat either alone for video deepfake detection, and why cross-dataset generalization is the metric that matters.
Sat, 06 Jun 2026 00:00:00 GMT
Tags: computer-vision, research