Disaster Recovery and Disaster Avoidance Planning for a Small Manufacturing Firm
-
Posting over here on behalf of @garak0410
I am entering my 5th year as the solo IT Director/Administrator for a small but multi-million dollar company. Before my arrival, downtime was the norm. Since then, it has been rare, and when it does happen it is usually power related.
Server and Workstation support is only a fraction of my job. I am literally the jack of all IT trades here from VBA programming, to tablets, phones, gadgets, to infrastructure. And it is tough to feel like an expert in any of it.
We are in a Hyper-V environment. We have a Dell T420 that runs three VMs:
- Domain Controller DNS/DHCP
- File Server
- Backup Server (Veeam and writes to a Windows Storage Server 2012 NAS)
We are growing, so a second virtual host is needed due to lack of storage and overall "power" on the T420. However, I wasn't able to spend money on a new server. So I repurposed an aging PowerEdge 2900 tower server as another virtual host to run our SQL production and Applications Server, and it will now also run DocuWare and a software suite known as ABIS/Adjutant.
We still have an old and under-powered SQL server that we will retire and move to the production SQL server VM once I get the time to re-code the VBA spreadsheets that use it. After that, we will be all virtual except for our Spiceworks and Internet Monitor (Work Examiner), which run on a workstation.
I am now facing the possibility of running 4 production VM servers on an aging but still powerful physical server.
We are a fast-paced, quick-turnaround metal building manufacturer. We cannot afford downtime.
I've been in contact with Dell (my preferred vendor) about what it will take to increase our redundancy. They presented a solution that included a beefy server, SAN, SAN switch and some additional switches to help with latency. It was over 60K.
I presented it to management and the price scared them off, even though I stressed how it helps us with our uptime and redundancy.
At bare minimum, I am looking at upgrading the aging PowerEdge 2900 to at least another T420 with a little more memory and storage than the current one. That may be all I get to spend since we will be investing a lot in the ABIS software.
And I think I'll need another NAS for DocuWare since that will be for document management and basically keeping the files forever. I will also be backing that up to external drives for offsite storage.
Back to backups... I've been very happy with the performance and tech support from Veeam and I plan on keeping them for a while.
OK, worst case scenario... one of these physical servers goes down. We don't have 24 hour turnaround service. We are down 48-72 hours. Management knows that. Being virtualized, I feel pretty comfortable that we can restore to a replacement server pretty seamlessly even if it does take time.
I can get suggestions on how we need to spend this or that but right now, I am looking for the best solution on a budget as I know I can't justify the 68K. I might be surprised to hear some tell me I can stick with the 2900 for now.
So, knowing the above is my situation, what would you suggest? I do know I need to get off that 2900 soon, even if it is still chugging along and actually seems to perform better than my T420 oddly. I do know that at the moment, the most I think I could get would be an under $10,000 server to replace the 2900. Beyond that, it is going to be tough but I do want to update our Disaster Recovery Plan to give them realistic estimates on downtime.
-
I saw that @garak0410 mentioned that they were limited to 6Mb/s on their WAN link and that this had ruled out moving things to hosted. Mostly I would agree, but I need to point out that something like AD hosted on Azure, Rackspace or Amazon would easily work over 6Mb/s even without a local node. You would assume a local node anyway, in which case 6Mb/s is way more than enough for the AD DC. It won't help with the file server, but that is likely less critical.
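As a rough sanity check on what 6Mb/s can carry, here is a minimal Python sketch. The workload sizes and the 80% link efficiency are made-up assumptions for illustration, not figures from this thread, so swap in real numbers before drawing conclusions.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to move size_gb (decimal gigabytes) over a link_mbps (megabits/s) link.

    efficiency is an assumed fraction of the link left over after protocol
    overhead and other traffic; 0.8 is a guess, not a measurement.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link > 0, efficiency in (0, 1]")
    bits = size_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits / usable_bits_per_second / 3600


# Hypothetical workload sizes, for illustration only.
EXAMPLE_WORKLOADS_GB = {
    "AD replication (small daily delta)": 0.05,
    "File server initial copy": 500.0,
}

if __name__ == "__main__":
    for name, size_gb in EXAMPLE_WORKLOADS_GB.items():
        print(f"{name}: {transfer_hours(size_gb, 6):.2f} hours at 6 Mb/s")
```

Under those assumptions the small AD delta moves in a minute or two, while the file server copy runs well over a week. That lines up with the point that the link is fine for AD but not for bulk file data.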
-
I'd have to look at a lot of the workloads specifically, but the recommendations for AWS and Azure above can easily make sense. As a manufacturing firm you need to be aware of your ISP dependency for manufacturing and what impact that might have. Likely you can only look at hybrid at this time. But doing Active Directory as a hybrid with, just as an example, an AD DC "master" with the FSMO roles on Azure and then your local DC only being the "secondary" can save a lot of headaches. You don't even need to back up the local one, only the one on Azure, and you can have Microsoft do that. The more workloads you move to the cloud or split between the two, the less need for a second [expensive] on site server and the less need for complicated HA, which will take a lot of your time.
-
I think the basics have been covered pretty thoroughly. The highlights are summed up as:
- Never go to a salesperson (anyone you talk to at Dell, for example, is paid to sell to you and is a salesperson) in order to get advice as to what to do. Not only is this generally bad, this is the literal textbook example that we use, and they gave the textbook scam response: the infamous 3-2-1 / Inverted Pyramid of Doom SAN scam that Dell is famous for. Dell is a great vendor, but they will run this scam every single time. The profits are just too good to pass it up. Never give them that chance.
- No SAN. No way. There is no way a SAN plays into your business needs here. You lack the scale to talk SAN, AND your needs are around reliability, where the SAN will actually cripple you.
- You need one server for your technical needs and likely a second for your failover. That's two servers, tops. No other equipment. And that's IF you really need two. One might do it. Or one might do it for now and you could do two later. Remember that even a single new server to handle all of the load is a huge improvement in time to manage, effort AND reliability over where you are now. It's an improvement in every way, so consider that "better" is always "better". Is it enough? Maybe, maybe not. But it is half the cost or less of two servers and still better than you have ever been before. Consider if that might be enough (and consider that some workloads like AD could keep running on the old server for the secondary failover.) So this could be VERY cheap.
- Don't buy new, that's a waste of money. Get your Dells from xByte. Will save you literally a fortune here. A small one, but a fortune.
- Hyper-V with StarWind or XenServer with HA-Lizard are your platform choices. Both will meet your needs.
-
I think that it is pretty much guaranteed that some amount of workload must be on premises at this point. That is a given. But maybe not the failover components. I wonder if getting AD into a hybrid with Azure, AWS or Rackspace would solve the fears around reliability, coupled with a new server with good support? Well worth asking. Going to a single server system would save a ton of money and make things very simple.
-
Something else worth considering is a backup appliance. Something like a Unitrends that can do the backups AND spin up those images in case the primary hardware has failed. That would reduce the need for a secondary server.
Likewise using Xen Orchestra with XenServer you would get backups for free and a second server (that also houses the XO system and holds the backups) could be used to spin up downed VMs.
In both cases, this would be a collapsing of several nodes into two: one production and one backup.
-
Yeah, considering their monetary constraints, a single server was my first consideration for that thread.
As for Azure, AWS, Rackspace, etc - do they really need that?
As you already said, if they get one good server that can handle all of their current workloads, the other current server could be used as backups and redundancies.
-
@scottalanmiller said:
Something else worth considering is a backup appliance. Something like a Unitrends that can do the backups AND spin up those images in case the primary hardware has failed. That would reduce the need for a secondary server.
Likewise using Xen Orchestra with XenServer you would get backups for free and a second server (that also houses the XO system and holds the backups) could be used to spin up downed VMs.
In both cases, this would be a collapsing of several nodes into two: one production and one backup.
The Unitrends seems like a spend for nearly no reason in this case considering he already has the hardware for the second XenServer/XO option.
-
@Dashrender said:
As for Azure, AWS, Rackspace, etc - do they really need that?
A single VM can handle being an AD DC very well. It makes for a really nice, easy way to not only get your AD to be very resilient, it also moves the workload off premises along with the backups (if you put the FSMO roles there.) So you get the benefits of a hosted AD DC with the performance of a local - no need for local backups and you can keep using AD even if the local controller fails. Because it provides HA for AD and reduces backup needs, it can be a big win in some cases. Those cases primarily being if it tips the scales so that a second local server is not needed.
-
@Dashrender said:
The Unitrends seems like a spend for nearly no reason in this case considering he already has the hardware for the second XenServer/XO option.
I agree. XO is so powerful (and free) that it is really, really difficult to justify doing much else.
-
@scottalanmiller said:
@Dashrender said:
The Unitrends seems like a spend for nearly no reason in this case considering he already has the hardware for the second XenServer/XO option.
I agree. XO is so powerful (and free) that it is really, really difficult to justify doing much else.
Even without XO, if you go Hyper-V and Veeam, you'd save a bundle since he already has the hardware, and he'd be able to do everything you mentioned.
-
@scottalanmiller said:
@Dashrender said:
As for Azure, AWS, Rackspace, etc - do they really need that?
A single VM can handle being an AD DC very well. It makes for a really nice, easy way to not only get your AD to be very resilient, it also moves the workload off premises along with the backups (if you put the FSMO roles there.) So you get the benefits of a hosted AD DC with the performance of a local - no need for local backups and you can keep using AD even if the local controller fails. Because it provides HA for AD and reduces backup needs, it can be a big win in some cases. Those cases primarily being if it tips the scales so that a second local server is not needed.
I suppose - how do you connect the Azure based AD back to the home base? Can you buy firewall based VPN, or are you thinking something like Pertino or ZT?
-
@Dashrender said:
I suppose - how do you connect the Azure based AD back to the home base? Can you buy firewall based VPN, or are you thinking something like Pertino or ZT?
Same ways as any hosted, or off premises solution. Whether it is Azure, in a colo, down the street at the boss' house, at a second site... whatever solution you use for that you can probably use for the one on Azure. You can use an IPSec VPN, OpenVPN, Clientless SSL VPN, Pertino, ZeroTier or even (don't actually do this) just open the ports.
-
I've got a ghost writer...nice...
I am just crazy busy at work but this post reflects a fraction of what's on my plate right now. So let me take a piece at a time.
We are running production VMs on that aging (but licensed) PowerEdge 2900. Do I need to replace that as my priority, or perhaps look at the StarWind solution first? We are going to need more storage since we are adopting DocuWare.
-
@garak0410 said:
We are running production VMs on that aging (but licensed) PowerEdge 2900. Do I need to replace that as my priority, or perhaps look at the StarWind solution first? We are going to need more storage since we are adopting DocuWare.
Aging servers are bad things. That's when the risk goes way, way up. Nothing wrong with utilizing old stuff, but from the sounds of it there is one thing that makes tons and tons of sense (someone jump in if I'm missing something big here) and that is....
Get one "nice" "new" server that will handle your entire workload without a problem and migrate everything to that. That's job one. Everything else is secondary and we can figure out the details after that. Getting to one new server will dramatically lower your risk and make your job easier, and is probably necessary no matter what else you decide to do. So getting that done and out of the way is a discrete and very important first step. Once you have that and good backups, you can breathe easily and move from being in a critical disaster avoidance mode to casually tweaking the environment for the best long term strategy.
-
I put "nice" and "new" in quotes because you definitely should not get new. Check out xByte (see their ad on the right over there -----> ) and see how awesome refurb can be. That's all that we buy. Something like a nicer R510 might be all that you need. That will be very cheap. Get a warranty from xByte. So "new to you" and far better than what you have, but nothing crazy.
-
Going to a single server, you likely want RAID 10, though RAID 6 can do fine. But you likely won't need more than four drives if you go NL-SAS or better, and at four drives RAID 6 gives no more usable capacity than RAID 10, so RAID 6 wouldn't be worth considering yet. Go high on memory; it's cheap and almost always the bottleneck.
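To put numbers on the RAID 10 versus RAID 6 trade-off, here is a minimal Python sketch using the standard capacity rules (RAID 10 keeps half the drives, RAID 6 gives up two drives to parity). The 4 TB drive size is a hypothetical figure, not something from this thread.

```python
def usable_drives(level: str, drive_count: int) -> int:
    """Number of drives' worth of usable capacity for a RAID level."""
    if level == "raid10":
        # Mirrored pairs: half of the drives hold copies.
        if drive_count < 4 or drive_count % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drive_count // 2
    if level == "raid6":
        # Double parity: two drives' worth of space goes to parity.
        if drive_count < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return drive_count - 2
    raise ValueError(f"unsupported RAID level: {level}")


DRIVE_TB = 4  # hypothetical drive size, for illustration only

if __name__ == "__main__":
    for count in (4, 6, 8):
        r10 = usable_drives("raid10", count) * DRIVE_TB
        r6 = usable_drives("raid6", count) * DRIVE_TB
        print(f"{count} drives: RAID 10 = {r10} TB, RAID 6 = {r6} TB")
```

At four drives both levels come out to two drives of usable space, so RAID 6 only starts to pay off in capacity at six drives and up.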
-
We will need the DPACK to know for sure, but an R510 is a monster compared to what you have and so much cheaper than the R700 series. The R510 can hold more storage than you could possibly need and is a very low cost chassis. Hard to go wrong with it. It's my favourite entry level Dell on the market. (The R720xd is my favourite mainline Dell.)
-
@garak0410 said:
We are going to need more storage since we are adopting DocuWare.
Got a ballpark number on that? How much storage does your fileserver use today? How much are you anticipating from DocuWare?
-
@scottalanmiller said:
I put "nice" and "new" in quotes because you definitely should not get new. Check out xByte (see their ad on the right over there -----> ) and see how awesome refurb can be. That's all that we buy. Something like a nicer R510 might be all that you need. That will be very cheap. Get a warranty from xByte. So "new to you" and far better than what you have, but nothing crazy.
Checking out xByte now... good prices... would want something strong enough to run all of our VMs... then perhaps re-purpose the current T420 for the redundancy project.