<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Bac Nguyen</title><description>Bac Nguyen — AI Engineer building production AI for software-defined vehicles: computer vision, edge AI, agents, and ML systems. Projects, research, and lessons from the build.</description><link>https://bacnguyenne.github.io/</link><item><title>Serving open models with vLLM</title><link>https://bacnguyenne.github.io/blog/serving-open-models-with-vllm/</link><guid isPermaLink="true">https://bacnguyenne.github.io/blog/serving-open-models-with-vllm/</guid><description>Hands-on guide to self-hosting open-weights LLMs with vLLM: install, serve an OpenAI-compatible API, quantize, benchmark, and manage VRAM.</description><pubDate>Sat, 20 Jun 2026 00:00:00 GMT</pubDate><category>llmops</category><category>inference</category><category>self-hosting</category></item><item><title>Building an MCP server and wiring it to an agent</title><link>https://bacnguyenne.github.io/blog/build-an-mcp-server/</link><guid isPermaLink="true">https://bacnguyenne.github.io/blog/build-an-mcp-server/</guid><description>Build a typed MCP server in Python, run it over stdio and HTTP, wire it into Cursor, and drive it from a custom client and an OpenAI-compatible model.</description><pubDate>Fri, 19 Jun 2026 00:00:00 GMT</pubDate><category>mcp</category><category>agents</category><category>tools</category></item><item><title>From 34% to 28% WER: lessons from code-switching ASR</title><link>https://bacnguyenne.github.io/blog/lessons-from-code-switching-asr/</link><guid isPermaLink="true">https://bacnguyenne.github.io/blog/lessons-from-code-switching-asr/</guid><description>What I learned building a Whisper + LLaMA speech recognizer for Malay–English code-switching — where the WER actually came from, and what didn&apos;t help.</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>asr</category><category>nlp</category><category>research</category></item><item><title>Shipping a model to the edge: PyTorch → ONNX → TensorRT</title><link>https://bacnguyenne.github.io/blog/deploy-a-model-to-the-edge/</link><guid isPermaLink="true">https://bacnguyenne.github.io/blog/deploy-a-model-to-the-edge/</guid><description>A hands-on, current (2026) path for taking a PyTorch CV model to ONNX and a TensorRT engine: export, parity, FP16/INT8 build, and latency gating.</description><pubDate>Wed, 17 Jun 2026 00:00:00 GMT</pubDate><category>edge-ai</category><category>deployment</category><category>sdv</category></item><item><title>Agentic RAG: retrieval that decides for itself</title><link>https://bacnguyenne.github.io/blog/agentic-rag/</link><guid isPermaLink="true">https://bacnguyenne.github.io/blog/agentic-rag/</guid><description>Build agentic RAG in Python: hybrid retrieval as a tool, a bounded agent loop, sufficiency and grounding checks, against any OpenAI-compatible endpoint.</description><pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate><category>rag</category><category>agents</category><category>retrieval</category></item><item><title>Lessons from shipping a multi-agent e-commerce assistant</title><link>https://bacnguyenne.github.io/blog/lessons-from-a-multi-agent-system/</link><guid isPermaLink="true">https://bacnguyenne.github.io/blog/lessons-from-a-multi-agent-system/</guid><description>Building an end-to-end multi-agent system — support, recommendation, and pricing — with LangChain, AutoGen, FastAPI, Kafka, and Qdrant. What held up and what I&apos;d change.</description><pubDate>Fri, 12 Jun 2026 00:00:00 GMT</pubDate><category>agents</category><category>rag</category><category>llmops</category></item><item><title>Fine-tuning an open model with QLoRA</title><link>https://bacnguyenne.github.io/blog/fine-tuning-with-qlora/</link><guid isPermaLink="true">https://bacnguyenne.github.io/blog/fine-tuning-with-qlora/</guid><description>A hands-on QLoRA fine-tuning walkthrough: dataset prep, 4-bit training with peft and trl, merging, and vLLM serving behind an OpenAI-compatible API.</description><pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate><category>fine-tuning</category><category>training</category><category>llmops</category></item><item><title>Structured outputs you can trust</title><link>https://bacnguyenne.github.io/blog/structured-outputs-you-can-trust/</link><guid isPermaLink="true">https://bacnguyenne.github.io/blog/structured-outputs-you-can-trust/</guid><description>A layered, runnable approach to reliable structured LLM outputs: pydantic schemas, json_schema enforcement, bounded validate-and-retry, and constrained decoding.</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate><category>structured-output</category><category>tool-use</category><category>json</category></item><item><title>Hybrid CNN + Vision Transformer for deepfake detection</title><link>https://bacnguyenne.github.io/blog/hybrid-cnn-vit-deepfake-detection/</link><guid isPermaLink="true">https://bacnguyenne.github.io/blog/hybrid-cnn-vit-deepfake-detection/</guid><description>Why combining CNNs, InceptionNeXt, and a Vision Transformer beat either alone for video deepfake detection — and why cross-dataset generalization is the metric that matters.</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><category>computer-vision</category><category>research</category></item></channel></rss>