The DNS Mystery: Five Factors and a Ruby Gem
“Connection reset by peer.” If you’ve run applications in Kubernetes long enough, you’ve probably seen this error. Usually it’s a service that went away, a network hiccup, something transient. You retry, it works, you move on.

But what if it keeps happening? What if it only happens with DNS lookups, and only sometimes, and only in production? That’s where I found myself a few weeks ago.

The Event

Our Ruby application started throwing intermittent Errno::ECONNRESET errors. Not on HTTP requests to external APIs, but on DNS lookups. The stack trace pointed to getaddrinfo, the standard libc function for resolving hostnames. ...
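An intermittent failure like this is easier to reason about once you can count it. The sketch below is not the tooling used in this investigation. It is a minimal, hypothetical probe (the name probe_lookups and its parameters are made up for illustration) that resolves one hostname repeatedly and tallies the outcomes by exception class. It assumes the default resolver is Ruby's Addrinfo.getaddrinfo, and it lets you swap the resolver out so the probe itself can be checked without touching the network.

```ruby
require "socket"

# Hypothetical helper: resolve `host` several times and tally the outcomes.
# `resolver` is injectable; by default it calls Addrinfo.getaddrinfo.
def probe_lookups(host, attempts: 100, resolver: ->(h) { Addrinfo.getaddrinfo(h, nil) })
  tally = Hash.new(0)
  attempts.times do
    begin
      resolver.call(host)
      tally[:ok] += 1
    rescue SocketError, SystemCallError => e
      # SocketError covers ordinary resolution failures. SystemCallError is the
      # parent of Errno::ECONNRESET and the other Errno::* classes.
      tally[e.class.name] += 1
    end
  end
  tally
end
```

Calling probe_lookups with a hostname your application actually resolves, from inside the affected pod, returns a Hash of counts keyed by :ok and by error class name. Comparing that tally between production and an environment where the error never appears is one way to confirm that the resets are tied to lookups rather than to the requests that follow them.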