Floating Astronaut
writing · engineering · ideas

Articles

Deep-dives into AI, machine learning systems, production engineering and the architecture of intelligent machines.

3 articles published

Systems · AI / ML
Mar 2026 25 min read

Defeating KV Cache Memory Fragmentation: Implementing PagedAttention for High-Throughput Local Inference

An exhaustive systems-level analysis of KV cache memory pathologies — internal fragmentation, reservation waste, and external fragmentation — and the OS-inspired virtual memory paradigms that drive 3–4× concurrency gains on consumer GPUs.

AI / ML · Systems
Mar 2026 15 min read

Architecting a Local Multi-Agent AI Council: Overcoming Consumer Hardware Constraints through Sequential Orchestration

A complete engineering guide to running three heterogeneous LLMs sequentially on a 12 GB RTX 4070 — exploiting NVMe bandwidth and zero-retention VRAM eviction to build a self-correcting cognitive trinity with Ollama and CrewAI.