The DNS Mystery: Five Factors and a Ruby Gem

“Connection reset by peer.” If you’ve run applications in Kubernetes long enough, you’ve probably seen this error. Usually it’s a service that went away, a network hiccup, something transient. You retry, it works, you move on. But what if it keeps happening? What if it only happens with DNS lookups, and only sometimes, and only in production? That’s where I found myself a few weeks ago.

The Event
Our Ruby application started throwing intermittent Errno::ECONNRESET errors. Not on HTTP requests to external APIs, but on DNS lookups. The stack trace pointed to getaddrinfo, the standard libc function for resolving hostnames. ...

March 21, 2026 · awbuana

The Resource Request You Think Is Saving Money Is Actually Breaking Your App

I thought I was being clever. When we migrated our services to Google Kubernetes Engine with the optimized autoscaling profile, I looked at our resource specs and saw an opportunity. Our pods were requesting 100m CPU but had limits set to 1000m. Ten times the headroom! Surely we could tighten that up and save some money. So I did what seemed logical: I kept the limits high (just in case of traffic spikes) but dropped the requests even lower. 50m here, 25m there. The cluster was happy. Our costs went down. I patted myself on the back for being such a savvy engineer. ...

March 19, 2026 · awbuana

The DNS Query That Wouldn't Stop: Debugging GKE's Hidden Ndots Problem

“Why is our DNS resolution so slow?” I remember staring at that Slack message, coffee going cold, wondering if I’d missed something obvious. We’d been running on Google Kubernetes Engine for months without issues. Then suddenly, DNS lookups were timing out. Services couldn’t reach each other. External APIs were failing.

The Discovery
A teammate noticed intermittent 5xx errors from one of our microservices. “Network issues,” they said. “Probably transient.” I wish it had been transient. ...

March 19, 2025 · awbuana