I’ve been working on a script to speed up our failover-to-the-cloud testing, which I wrote about in a previous post. Unfortunately, I haven’t been able to dedicate much time to it, so I’ve been working on it here and there. I’m pretty close to completing it, at least to the point where it does what I need it to do at my current skill level.

To recap what I’m trying to accomplish, here’s a quick rundown of the problem. We deploy data recovery solutions that take images of the servers we’re protecting. These images are then replicated offsite to our partner. When we need to recover offsite, we can virtualize any of the images that were transferred. To make sure everything is working, we run regular tests.

When these tests are performed, we choose the instances to virtualize and create a network to virtualize them on. Unfortunately, the way this failover works (really, the way virtualization works) is that a new NIC is created. When a new NIC is created, the IP configuration you had on each server is lost. Instead, the servers get IPs via DHCP from the network you set up, which unfortunately doesn’t give you any options other than network address, subnet mask, and gateway. As a result, none of the servers can contact Active Directory, and when Windows servers can’t contact AD, they can take a long time to boot and an even longer time to log in.

Another problem is that we have agents running on the protected servers, because we monitor them as well. When the test failover servers boot up, we start getting calls about servers hard booting. This is because the live server and the test server have the same unique ID in the agent, so they both report back as the same server.

To solve these problems, I wanted to write a script that stops and disables the services that we don’t want running during test failover. I also wanted the script to assign a designated IP configuration so the servers could find the domain controllers.
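As a rough illustration of those two steps, here is a minimal Python sketch that builds the Windows commands for stopping and disabling a service (`sc.exe`) and for assigning a static address and DNS server (`netsh`). This is not the script itself. The service names, NIC name, and addresses are hypothetical placeholders, and the function names are made up for this example.

```python
import subprocess

# Hypothetical service names for illustration only; substitute your own.
SERVICES_TO_DISABLE = ["ExampleMonitoringAgent", "ExampleBackupAgent"]


def service_commands(service_name):
    """Return the sc.exe commands that stop a service and disable its startup."""
    return [
        ["sc.exe", "stop", service_name],
        # sc.exe expects "start=" and its value as separate tokens.
        ["sc.exe", "config", service_name, "start=", "disabled"],
    ]


def static_ip_commands(nic_name, ip, mask, gateway, dns_server):
    """Return the netsh commands that set a static IPv4 address and DNS server."""
    return [
        ["netsh", "interface", "ipv4", "set", "address",
         nic_name, "static", ip, mask, gateway],
        ["netsh", "interface", "ipv4", "set", "dnsservers",
         nic_name, "static", dns_server, "primary"],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order and return the ones that exited non-zero."""
    failed = []
    for command in commands:
        result = runner(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(command)
    return failed
```

On a failed-over server you would build the full list, for example `service_commands(name)` for each entry in `SERVICES_TO_DISABLE` plus `static_ip_commands("Ethernet", "10.0.0.50", "255.255.255.0", "10.0.0.1", "10.0.0.10")` (all placeholder values), and pass it to `run_all` from an elevated session. Note that `sc.exe stop` also exits non-zero when the service is already stopped, so a command showing up in the returned list is a prompt to look closer rather than proof of a problem.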

Here’s what I came up with so far. I have it running in Scheduled Tasks at Windows startup. Because a new NIC has to be installed during boot, I built in a delay to give the installation enough time to complete.

This isn’t working flawlessly yet, but I wanted to put it out there and see if anyone had some feedback or better ideas. Two of the problems I’m having are as follows:

1. The script isn’t working consistently. This may be related to execution time. I’m considering changing it to a service, so that I can use some type of pause and loop to confirm the NIC has fully loaded. Some servers work like I expect, and some only work after I reboot them a second time.

2. Not all the services are stopping and being disabled. I can’t understand why, since it works for almost all the services. Sometimes the service is disabled but running, which is why I put a reboot in as the last action. Sometimes, a service will be stopped but not set to disabled, which means it will be running after the reboot.
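Both problems look like timing issues, so one option is to replace the fixed delay with polling and to re-check each change after making it. The sketch below shows that pattern in Python under my own assumptions: `wait_until` and `apply_until_verified` are hypothetical helper names, and the `check`, `apply`, and `verify` callables (for example, "is the NIC present?" or "is this service stopped and disabled?") are left for you to supply.

```python
import time


def wait_until(check, timeout=120.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or `timeout` seconds have passed.

    Returns True if check() succeeded, False if the timeout was reached.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def apply_until_verified(apply, verify, attempts=3, pause=5.0, sleep=time.sleep):
    """Run apply(), then verify(); repeat until verify() passes.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        apply()
        if verify():
            return attempt
        if attempt < attempts:
            sleep(pause)
    return None
```

The idea is to call `wait_until` with a NIC check before touching the IP configuration, and to wrap each stop-and-disable step in `apply_until_verified` so a service that is still stopping, or that did not take the disabled setting, gets another attempt. Logging the cases where `None` comes back would also show which services are misbehaving.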

Testing this is a pain. Everything seems to work when I have it on a test machine and run it manually. It even seems to work when I schedule it. The problem is that startup on a development server doesn’t go through the same process as startup on a failed-over virtual server. To test it in the correct scenario, I have to update the script, copy it to the live server, and then wait for the live server to back up and replicate offsite. That can take a decent amount of time.

Anyway, let me know if you see any major amateur mistakes or better ways to do something.


Author
Jason Vanzin is the CEO at Vanzin Consulting Corp. He has over 15 years of IT experience and lives in Pittsburgh, PA. He blogs on topics related to Business Continuity, Python programming, and technology in general.
