Thinking out loud on hard problems
- 97% Fewer Tokens, Zero Model Changes: RTK at the Harness Boundary
The harness made the agent reliable — and expensive. RTK cuts the cost without touching the model, the prompt, or the pipeline logic. And it covers a lot more than git.
- Your Agent Doesn't Need a Better Model. It Needs a Harness.
Most agent failures aren't capability problems — they're enforcement problems. A controlled six-rung experiment shows what actually moves the needle.
- Three Layers Between Your Code and Your Database
Drivers, ORMs, and wire protocol adapters do three different jobs. Telling them apart is the line between a developer who uses databases and an engineer who designs systems around them.
- Voicebox.sh — The Local-First Voice Stack That Just Made MCP Voice Agents Practical
An engineer's tour of voicebox.sh — what it actually is, how the local API and MCP server work, and what developers are already building on top of it.
- The AI Stack Nobody Drew You a Map For: Agents, RAG, MCP, Skills, and the LLM Beneath Them All
Everyone says they're building AI. Almost nobody agrees on what that means. Here's the five-layer stack that actually explains what enterprise AI is made of — and how to stop picking the wrong piece for the wrong problem.
- AWS Just Made Its Platform Agent-Native: What the AWS MCP Server GA Actually Means
The AWS MCP Server is generally available — and it's not just a developer convenience. It's AWS making a bet that agents, not humans, will be the primary interface to cloud infrastructure.
- AI Concepts Glossary: A Principal Engineer's Reference
A living reference for AI terminology organized into five conceptual domains — model access & licensing, training & customization, architecture & deployment, evaluation & use, and representations, search & agents.
- The Infrastructure That Enforces Itself: Compliance-Grade Multi-Tenant SaaS on Amazon EKS
A deep dive into building compliance-grade multi-tenant SaaS on Amazon EKS — using Flux GitOps, Terraform Enterprise workspace versioning, Vault 2.0 workload identity, and Argo Workflows for fully automated, auditable tenant onboarding.
- Deploying Production-Grade LLM Inference on AWS EKS — A Hands-On Deep Dive
An architectural walkthrough of the GenAI on EKS workshop — vLLM, Ray Serve, Karpenter, DCGM + AMP observability, and AWS Strands Agents — with the design decisions behind each layer.
- PageIndex and Vectorless RAG — A Structural Alternative for Professional Documents
Reasoning-based retrieval as an alternative to vector similarity search for structured professional documents: how PageIndex achieves 98.7% on FinanceBench, with full domain use cases across healthcare, wealth management, banking, and travel, plus an implementation pathway.