When it Doesn’t Work – Sometimes it’s Painful

As you well know, the theme of this blog is: “If it works out of the box – what fun is that?” Ol’ Sopwith loves it when things don’t work! That means you have to fix it. Fun!

Today was one of those days when “fixing it” was not fun.

I run a backup server that takes care of all my backup chores. Had it running for years. On all my computers, I run an rsync backup script that backs up everything to this server. The server has a pair of hot-swap SATA drives that get rotated to a fire safe on a regular basis. This setup has served me well.

Today when I logged on to the backup server, Ubuntu’s updater advised (again) that an Ubuntu OS upgrade was available from 14.04 to 16.04. I have avoided this task long enough. Now that I have experience with 16.04 – I trust it.

I performed the OS upgrade. This took several hours, as warned by the script. When the upgrade completed, I rebooted the server and it booted to 16.04. One of the first things I noticed was that the Cinnamon window manager was not available at login. That was weird. I logged in and reinstalled Cinnamon and all was well. Excellent.

I soon realized, however, that my systems that are not on the same network as the backup server could no longer connect to it via ssh. This meant my backup scripts would not work on remote machines. The hosts on the same network could ssh just fine.

For the next three hours, I tried everything I knew to get this to work. Everything. Googling did not help. I finally resorted to Wireshark to see what was going on at the packet level. I soon discovered that remote hosts were sending the ceremonial SYN to the server when attempting an ssh connection, but the server did not respond with a SYN/ACK. This ruled out network issues and meant there was something broken on the backup server.
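For anyone who wants a quick check before reaching for a packet capture, here is a minimal sketch of a TCP probe in Python. It is not what I ran that day, just an illustration of the same idea: attempt the handshake and see how it fails. The function name `probe_tcp`, the default port 22, and the three-second timeout are my own assumptions. A "timeout" result is consistent with a SYN that never gets a SYN/ACK back, while "refused" means the host answered but nothing is listening on that port.

```python
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 3.0) -> str:
    """Attempt a TCP handshake and classify the outcome.

    Returns "open" if the handshake completes, "timeout" if no reply
    arrives in time (e.g. the SYN is never answered), "refused" if the
    host actively rejects the connection, or "error: ..." otherwise.
    """
    try:
        # create_connection performs the full SYN / SYN-ACK / ACK handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar problems land here.
        return f"error: {exc}"


# Example call (hypothetical hostname, substitute your own server):
# print(probe_tcp("backup.example.com"))
```

Running it from both a local host and a remote host gives a rough picture of where the handshake stops, though it cannot tell you why. For that you still need the packet capture.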

Remember the magic question when something stops working – “What changed?”

An OS upgrade. In all of the efforts to fix this problem, I must have rebooted the backup server more than a dozen times. Nothing worked. Local hosts worked fine – remote hosts didn’t. Strange. Sopwith has to admit, this one inflicted maximum frustration.

Finally, in desperation, I decided to do a ‘hard’ reboot of the backup server. This means shutting the server down to a full power-off state. Doing so forces a hard reset of all of the hardware. When I turned the server back on, it booted and everything worked!

Whoa. I thought this trick only worked on systems built in the 1980s. Apparently it still works today. I suspect the hard reset of the NIC in the server solved this problem, as hard as it was to discover.

The lesson here is two-fold. First, never, ever give up on a problem. And second, when your frustration level is at a record high, go back to trying basic things that you think won’t help.


