The Case of the Missing VM

Last year I started a new job as a Database Engineer. In this job, I see a lot of support tickets asking for help restoring customers’ databases. It amazes me how many people do not have backups of their databases. Then, late last year, I realized that my team does not have backups of our own databases. Follow me on this one. I’m not saying that my company does not back up its company-wide applications; I know it does. I’m saying that the databases my team uses to perform work for customers are not backed up. These databases are not managed by our IT department; therefore, it is the team’s responsibility to maintain backups, especially since most of the databases are offsite.

This realization must have struck many members of my team simultaneously, because a lot of conversations started revolving around the void in our database backups. Due to the nature of our work, our team only uses a database for about six to nine months at a time. Then we destroy the database and start over. A point was made that everything we do can be recreated, given enough time. However, what if we are down to the last day of that database’s life and it crashes, pushing the whole project back weeks while we recreate everything? If there is any possibility of a database having issues, wouldn’t it make sense to have a solution in place for the “what if” part of life?

In light of that “what if,” our team will start adding maintenance plans to our databases. I am very glad that we were able to implement this new backup policy; however, something else occurred to me: we have a virtual layer to deal with. Now, I’m not trying to sound like a doomsayer, but I am practical. What if the VM that is hosting both SQL Server and the backups becomes corrupt? When I brought this up, people looked at me cross-eyed, like I was crazy. One person even said, “VMs don’t go bad.”

Well, it happened! Last week something went wrong with our VM host, and just like that one of our VMs was gone, marked as invalid according to VMware. Thankfully, SQL Server was not hosted on that machine, but my screenshots gave me the proof I needed to make my case.

What happened? We are still figuring that part out, but from the screenshot below you can tell that a VM is missing from the mix.


Last time I counted from 1 to 10, 6 came between 5 and 7. That day it did not; 6 had turned into ‘Unknown (invalid).’

This situation does have a happy ending: we were able to reboot the host and everything came back online, but that was five days after VM6 disappeared.

Lesson learned: When dealing with backups, move them off the server where they were created, especially if that server is a VM.
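To make that lesson concrete, here is a minimal sketch in Python of copying backup files to a second location and checking that each copy matches the original. It is only an illustration, not what my team actually runs: the function name `copy_backups_off_host`, the `*.bak` pattern, and the paths in the usage comment are all assumptions, and it presumes the backup files already exist on disk and that the destination is storage on a different physical host.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_backups_off_host(source_dir, dest_dir, pattern="*.bak"):
    """Copy backup files matching `pattern` from source_dir to dest_dir.

    Hypothetical helper: dest_dir is assumed to be a share or mount that
    lives on a different physical host than the VM that made the backups.
    Returns the list of files copied on this run.
    """
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    for backup in sorted(source.glob(pattern)):
        if not backup.is_file():
            continue
        target = dest / backup.name
        source_hash = sha256_of(backup)
        # Skip files that are already present and identical at the destination.
        if target.exists() and sha256_of(target) == source_hash:
            continue
        shutil.copy2(backup, target)
        # Verify the copy so a silent transfer error does not go unnoticed.
        if sha256_of(target) != source_hash:
            raise IOError(f"Checksum mismatch after copying {backup.name}")
        copied.append(target)
    return copied


# Example call (both paths are placeholders, not real locations):
# copy_backups_off_host(r"D:\SQLBackups", r"\\other-host\sql-backups")
```

Something like this could run on a schedule right after the backup job finishes. The checksum comparison is there because a copy that cannot be trusted is not much better than no copy at all.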

I write this lesson to myself as well as others. This could have been one of my production SQL servers that went down without a backup. That day had a happy ending, but what about all of the tomorrows? I don’t know about you, but I’m not willing to take that chance.
