Deep-dives into AI, machine learning systems, production engineering, and the architecture of intelligent machines.
3 articles published
An exhaustive systems-level analysis of KV cache memory pathologies — internal fragmentation, reservation waste, and external fragmentation — and the OS-inspired virtual memory paradigms that drive 3–4× concurrency gains on consumer GPUs.
A complete engineering guide to running three heterogeneous LLMs sequentially on a 12 GB RTX 4070 — exploiting NVMe bandwidth and zero-retention VRAM eviction to build a self-correcting cognitive trinity with Ollama and CrewAI.