Early March 2023 downtime report

Ever since rankett.net was first set up, a few operations would intermittently fail with an EAI_AGAIN error, mainly operations involving file uploads to object storage. At the time, I did not know what was causing these errors.

However, a few weeks ago, while I was performing routine upgrades to server software, the site became completely unresponsive, even after a clean reboot. I determined that this was due to NGINX encountering an error during startup. The logs in journalctl showed that NGINX was failing to resolve a hostname, that of the object storage provider. At this point I learned that EAI_AGAIN indicates a temporary failure in name resolution, in other words a DNS query that did not get an answer.
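To illustrate what EAI_AGAIN means in practice, here is a minimal Python sketch that separates a temporary resolution failure from other lookup errors. It is not the code that runs on rankett.net; the function names classify_resolution_error and try_resolve are made up for this example, and it only uses the standard socket module.

```python
import socket

def classify_resolution_error(exc):
    """Map a getaddrinfo failure to a rough category."""
    if exc.errno == socket.EAI_AGAIN:
        # The resolver gave no usable answer in time; retrying later may work.
        return "temporary"
    if exc.errno == socket.EAI_NONAME:
        # The name is unknown to the resolver.
        return "not-found"
    return "other"

def try_resolve(hostname, port=443):
    """Return (status, addresses) for a hostname lookup."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return classify_resolution_error(exc), []
    # The fifth field of each result is the socket address; its first item is the IP.
    return "ok", sorted({info[4][0] for info in infos})
```

With a dead resolver, a call such as try_resolve("example.com") would be expected to come back as "temporary", which matches the intermittent failures described above.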

By manually making DNS requests with dig, I discovered that the server was not successfully resolving any DNS queries at all. The virtual machine had been configured by Contabo to use Contabo’s own DNS servers, which had become overloaded and unresponsive.
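The same kind of check can be scripted. The sketch below is a rough stand-in for pointing dig at one specific nameserver: it sends a single A-record query over UDP and reports whether any reply arrives before a timeout. The function names and the two-second timeout are assumptions made for this example, and it does not attempt to parse the answer.

```python
import socket
import struct

def build_query(hostname, query_id=0x1234):
    """Encode a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (only "recursion desired" set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label is prefixed with its length; a zero byte ends the name.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

def nameserver_answers(server, hostname, timeout=2.0):
    """Return True if `server` sends back a matching reply within `timeout` seconds."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable-network errors
            return False
    # A valid reply has at least a 12-byte header and echoes the query ID.
    return len(reply) >= 12 and reply[:2] == query[:2]
```

For example, nameserver_answers("1.1.1.1", "example.com") queries Cloudflare's public resolver. Passing the address of the provider's own DNS server instead would show whether that server is the one failing to respond.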

To fix this, I added Cloudflare’s DNS server as a fallback in the operating system’s DNS configuration. This resolved the problem, and no EAI_AGAIN errors have occurred since this downtime.
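As a sketch of how such a fallback can be verified, the snippet below reads the nameserver entries from resolv.conf-style text in the order the system resolver would try them. This assumes the DNS configuration ends up in a file shaped like /etc/resolv.conf, which may not match how this server is actually configured. The first address in the sample is a placeholder from the documentation range, not Contabo's real server; 1.1.1.1 is Cloudflare's public resolver.

```python
SAMPLE_RESOLV_CONF = """\
# provider's resolver (placeholder address, hypothetical)
nameserver 192.0.2.53
# fallback: Cloudflare public DNS
nameserver 1.1.1.1
"""

def parse_nameservers(resolv_conf_text):
    """Return nameserver addresses in the order they are listed."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments, which start with '#' or ';'.
        if not stripped or stripped.startswith(("#", ";")):
            continue
        parts = stripped.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers

if __name__ == "__main__":
    print(parse_nameservers(SAMPLE_RESOLV_CONF))
```

The order matters because the resolver tries the listed servers in sequence, so a fallback entry is only consulted once the earlier one fails to answer.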