writing · engineering · ideas

Articles

Deep dives into AI, machine learning systems, production engineering, and the architecture of intelligent machines.

4 articles published

Systems · AI / ML

Mar 2026 · 25 min read

Defeating KV Cache Memory Fragmentation: Implementing PagedAttention for High-Throughput Local Inference

An exhaustive systems-level analysis of KV cache memory pathologies — internal fragmentation, reservation waste, and external fragmentation — and the OS-inspired virtual memory paradigms that drive 3–4× concurrency gains on consumer GPUs.

AI / ML · Systems

Mar 2026 · 15 min read

Architecting a Local Multi-Agent AI Council: Overcoming Consumer Hardware Constraints through Sequential Orchestration

A complete engineering guide to running three heterogeneous LLMs sequentially on a 12 GB RTX 4070 — exploiting NVMe bandwidth and zero-retention VRAM eviction to build a self-correcting cognitive trinity with Ollama and CrewAI.

AI / ML · Systems

Apr 2026 · 20 min read

The Agentic Microkernel: Architecting the LLM-as-an-OS Runtime

How treating the local LLM as a CPU — with virtual context paging, WebAssembly-sandboxed syscalls, and hardware-style interrupt handling — resolves the deepest bottlenecks of autonomous AI computing.