The Inverted Pyramid of Doom Challenge
-
At the time we went the HUS route, the HCI solutions were limited.
We looked at a couple of the big names; pricing was insane, though I'd happily have a rack of either if money weren't a factor anywhere in the lifecycle beyond the initial purchase price.
That left a whole bunch of "roll your own" solutions. Scale at the time didn't seem to scale, plus we're comfortable on ESXi. Could we change? Yes, of course, but then you're migrating to a hypervisor with a more limited support ecosystem, however rock solid that hypervisor may be, so I'm not sure what we'd actually gain there.
Starwind and other solutions are a bit too roll-your-own, as was, I felt, StoreVirtual: if it all goes wrong you're possibly stuck between the hardware vendor, the switch vendor, the OS vendor, the HCI stack vendor and so on, which is great when you've saved a few $$$ but not so good when your business is down and nobody wants to own the issue.
Scale, SimpliVity, Nutanix etc. are (or were when we purchased; maybe it's changed) all designed on the principle that you scale compute and storage linearly, and we don't. That means if/when I needed to add another TB of capacity, I'd either be bolting NAS onto an HCI solution, or buying additional nodes and paying for compute, storage and licensing when all I really needed was a disk tray and a couple of drives, which I can do with the HDS.
In short, it's simple and it's reliable, and from the numbers at the time I know we didn't pay massively more than anything I could have rolled myself, which would have come with a lot more complexity for very little gain.
That said, when replacement time comes I suspect the HCI and VSA options will have matured a lot from where they were two years ago, so there will be more interesting options on the table - but I see that as being more about things like stretched clusters than chasing 9's from a pile of kit all sitting in the same rack.
-
@scottalanmiller said in The Inverted Pyramid of Doom Challenge:
to hit checkboxes while resting on fragile, single point of failure storage (SAN, NAS, DAS, etc.)
As I've heard many times, even once from one of my own employees who argued with me about this "but how often do SANs/NASes/etc fail? Basically never, so it's not a single point of failure, and it makes it easier to move from one VM host to another."
I'm still baffled by this, and I've actually seen them fail and take all customer data with them. What in God's name makes people believe that you're more likely to have a motherboard or power supply fail in a server (since they don't use the disks) than the actual hard disks? For well over a year we've had a no-hire policy: if someone describes the inverted pyramid as a proper way to set something up during an interview, we pass. Yes, they can learn to do otherwise, but:
- I don't want to teach them nor do I want my employees to waste their time teaching them something they should have figured out themselves.
- If they just thought about it logically for even a second, they'd see how it's probably a stupid idea to put all your eggs in one basket.
I have no regrets. At no point was "needing the space" a good enough reason, unless the SAN was for one VM host or something like that; but these days that kind of thing is rare anyway, at least in our customers' environments, and in our own.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
@scottalanmiller said in The Inverted Pyramid of Doom Challenge:
We have this now and we use the same capacity with replicated local disks as you would with a SAN with RAID 10. Are you using RAID 6 or something else to get more capacity from the SAN than you can with RLS? We aren't wasting any capacity having the extra redundancy and reliability of the local disks with RAIN.
With HDT pools you could have Tier 0 be RAID 5 SSD, RAID 10 10K's in the middle tier, and RAID 6 NL-SAS at the bottom, with sub-LUN block tiering across all of that. With replicated local storage you generally can't do this (or you can't dynamically expand individual tiers). Now, as 10K drives make less sense (hell, magnetic drives make less sense), the cost benefits of a fancy tiering system might make less sense too. Then again, I see HDS doing tiering now between their custom FMDs, regular SSDs, and NLs in G series deployments, so there's still value in having a big-ass array that can do HSM.
We have only two tiers, but they can be dynamically expanded. Any given node can be any mix of all slow tier, all fast tier or a blend. There is a standard just because it's pre-balanced for typical use, but nothing ties it to that.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
@scottalanmiller His previous RAIN system (HP StoreVirtual) requires local RAID, so you're stuck with nested RAID 5 (awful IO performance) or nested RAID 10 (awful capacity, but great resiliency). Considering the HDS has five nines already, though, it's kind of moot.
Oh, I know that some systems can be limited and lack functionality. But that doesn't imply that it is a limitation of the concept, or a limitation of all available implementations. It may be (or have been), but it is not currently. HP VSA isn't bad, but I rarely think of it as the "go to" implementation of a RAIN system.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
There are other advantages to his design over an HCI design. If he has incredibly data-heavy growth in his environment, he doesn't have to add hosts. As licensing for Microsoft application stacks (Datacenter, SQL, etc.) is being tied to CPU cores in the near future, adding hosts just to add storage can become rather expensive if you don't account for it properly. Now, you could mount external storage to the cluster to put the growing VMs on, but I'm not sure if Scale supports that?
I don't know about every vendor obviously, but I know that with our Scale we can swap out existing drives for larger ones or faster ones (as long as we don't lose more capacity than we need, obviously) if we need to rebalance our capacity. So some of our nodes are 1.8TB disks, for example. If our needs changed and we needed to go to 6TB disks, we could just swap them out and grow that way (in place.)
If we needed more growth than "bigger" drives would account for and/or we needed to increase performance or fault tolerance at the same time (bigger drives means more capacity bounded by the same IOPS which can be a problem) we can add "storage only" nodes. They can't run VMs but will grow the storage pool. These are rare to use and very few people talk about them because you can also just buy a normal compute node with storage and tell your workloads not to go there to solve the licensing issues 99% of the time. That way you get more performance and "fewer eggs in the same baskets" out of the investment. But you can balance things as needed.
I don't know how many players do that. I'm pretty sure that Starwind does. I know that rolling your own with Gluster or Ceph (far from ideal) will. It's at least a semi-standard feature in the RLS space these days.
DRBD does not have this yet. I've not tried HAST yet, on my long list of things to put into the lab, but I'm pretty sure that it does not either. But you should really think of those only as two node solution tools, not scale out ones.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
The other issue with Scale (and this is no offense to them) or anyone in the roll-your-own-hypervisor game right now is that there are a number of vertical applications the business cannot avoid that REQUIRE specific hypervisors or storage to be certified. To get this certification you have to spend months working with their support staff to validate capabilities (performance, availability, predictability - the last being one that can strike a lot of hybrid or HCI players out) as well as commit to cross-engineering support with them.
This is always tough and is certainly a challenge for any product. It would be interesting to see a survey of just how often this becomes an issue and how it is addressed in different scenarios. From my perspective, and few companies can do this, it's a good way to vet potential products. Any software vendor that needs to know what is "under the hood" isn't ready for production at all. They might need to specify IOPS or resiliency or whatever, sure. But caring about the RAID level used, whether it is RAID or RAIN, or what hypervisor is underneath the OS that they are given - those are immediate show stoppers, and any vendor with those kinds of artificial excuses not to provide support is shown the door. Management should never even know that the company exists, as they are not a viable option and not prepared to support their products. Whether it is because they are incompetent, looking for kickbacks or just making any excuse not to provide support does not matter; it's not something a business should be relying on for production.
-
@scottalanmiller said in The Inverted Pyramid of Doom Challenge:
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
@scottalanmiller said in The Inverted Pyramid of Doom Challenge:
We have this now and we use the same capacity with replicated local disks as you would with a SAN with RAID 10. Are you using RAID 6 or something else to get more capacity from the SAN than you can with RLS? We aren't wasting any capacity having the extra redundancy and reliability of the local disks with RAIN.
With HDT pools you could have Tier 0 be RAID 5 SSD, RAID 10 10K's in the middle tier, and RAID 6 NL-SAS at the bottom, with sub-LUN block tiering across all of that. With replicated local storage you generally can't do this (or you can't dynamically expand individual tiers). Now, as 10K drives make less sense (hell, magnetic drives make less sense), the cost benefits of a fancy tiering system might make less sense too. Then again, I see HDS doing tiering now between their custom FMDs, regular SSDs, and NLs in G series deployments, so there's still value in having a big-ass array that can do HSM.
We have only two tiers, but they can be dynamically expanded. Any given node can be any mix of all slow tier, all fast tier or a blend. There is a standard just because it's pre-balanced for typical use, but nothing ties it to that.
The other advantage of having tiers with different RAID levels etc. is that he can use RAID 6 NL-SAS for ice-cold data, and RAID 5/10 in the higher tiers for better performance. Only a few HCI solutions today do true always-on erasure codes in a way that isn't murderous to performance during rebuilds. (GridStore, VSAN, ?)
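To put numbers on that tiering trade-off, here's a rough sketch of the usable capacity each RAID level gives up to redundancy (the drive counts and sizes are invented for illustration, not anyone's actual config):

```python
# Rough usable-capacity math for a tier layout like the one above:
# RAID 5 SSD / RAID 10 10K / RAID 6 NL-SAS. All drive counts and
# sizes are made-up illustration values.

def raid_usable(drives, size_tb, level):
    """Usable TB for a single RAID group, ignoring spares and formatting."""
    if level == "raid5":
        return (drives - 1) * size_tb    # one drive's worth lost to parity
    if level == "raid6":
        return (drives - 2) * size_tb    # two drives' worth lost to parity
    if level == "raid10":
        return drives / 2 * size_tb      # everything mirrored
    raise ValueError(f"unknown RAID level: {level}")

tiers = [
    ("Tier 0: RAID 5 SSD",      8, 0.4, "raid5"),
    ("Tier 1: RAID 10 10K",    16, 1.2, "raid10"),
    ("Tier 2: RAID 6 NL-SAS",  12, 4.0, "raid6"),
]

for name, drives, size, level in tiers:
    raw = drives * size
    usable = raid_usable(drives, size, level)
    print(f"{name}: {raw:.1f} TB raw -> {usable:.1f} TB usable "
          f"({usable / raw:.0%} efficiency)")
```

The point of the mix: the mirrored middle tier burns half its raw capacity for performance, while the RAID 6 cold tier keeps most of its raw capacity at the cost of rebuild-time IO.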
-
Cost. Mirroring has a 2x/3x overhead for FTT=1/2, while erasure codes can get that much, much lower (i.e. half as many drives for FTT=2, or potentially less depending on stripe width). As we move to all-flash in HCI (it's coming), the IO/latency overhead for erasure codes and dedupe/compression becomes negligible. This is a competitive gap between several different solutions in that space right now.
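A quick back-of-the-envelope sketch of that overhead gap (the 4-fragment stripe width for the erasure code is an assumption for illustration; real stripe widths vary by product):

```python
# Raw capacity needed for a given usable capacity, comparing N-way
# mirroring (FTT+1 full copies) with a RAID-5/6-style erasure code
# (FTT parity fragments per stripe). The 4-fragment stripe width is
# an assumed illustration value.

def mirror_raw_needed(usable_tb, ftt):
    # FTT=1 -> 2 copies (2x overhead), FTT=2 -> 3 copies (3x overhead)
    return usable_tb * (ftt + 1)

def ec_raw_needed(usable_tb, ftt, data_fragments=4):
    # Each stripe of data_fragments carries ftt parity fragments
    return usable_tb * (data_fragments + ftt) / data_fragments

for ftt in (1, 2):
    m = mirror_raw_needed(100, ftt)
    e = ec_raw_needed(100, ftt)
    print(f"FTT={ftt}: 100 TB usable needs {m:.0f} TB raw mirrored "
          f"vs {e:.0f} TB raw erasure coded")
```

For FTT=2 that works out to 300 TB raw mirrored against 150 TB raw erasure coded - the "half as many drives" figure above.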
-
When you're adding nodes purely for capacity, this carries other non-visible costs. Power draw, for one (a shelf on that HUS draws a lot less than a server). Scale-out systems also consume more ports, and while this benefits throughput, and network ports are a LOT cheaper, it means more structured cabling, more ports to monitor, more switch monitoring licensing, etc.
At small scale none of this matters much (the OPEX benefits in labor and support footprint trump these other costs). At mid/large scale this stuff adds up...
-
-
@scottalanmiller 1.8TB drives, if they are 10K, are 2.5''; 6TB drives are 3.5''. If they can stuff a 3.5'' drive in a 2.5'' bay, I'd be impressed.
The reality of 10K drives is that the roadmap is dead. I don't expect to see anything over 1.8TB, and because those are 512e/4Kn block drives, anyone with legacy OSes ends up stuck with 1.2TB drives more often than not if they don't want weird performance issues.
(Fun fact: enterprise flash drives are ALL 512e on 4Kn back ends, but it doesn't matter because they have their own write buffers that absorb and re-order the writes to prevent any amplification.)
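For anyone curious where those weird performance issues come from, here's a minimal sketch of the 512e read-modify-write penalty: any write that isn't aligned to the physical 4K sector forces the drive to read the sector, merge the change, and rewrite it (real firmware behavior is more complex than this):

```python
# Why unaligned 512-byte IO hurts on 4Kn media presented as 512e:
# a write that only partially covers a physical 4K sector forces a
# read-modify-write. Pure illustration of the alignment math.

PHYS = 4096     # physical sector size on the media
LOGICAL = 512   # logical sector size presented to a legacy OS

def is_rmw(offset_bytes, length_bytes):
    """True if a write partially covers a physical sector, forcing the
    drive to read the sector, merge the new data, and rewrite it."""
    return offset_bytes % PHYS != 0 or length_bytes % PHYS != 0

def physical_sectors_touched(offset_bytes, length_bytes):
    """How many physical 4K sectors a logical write spans."""
    first = offset_bytes // PHYS
    last = (offset_bytes + length_bytes - 1) // PHYS
    return last - first + 1

print(is_rmw(0, 4096))           # aligned 4K write: False, no penalty
print(is_rmw(LOGICAL, LOGICAL))  # 512B write at logical sector 1: True
```

On flash the drive's write buffer can coalesce those partial writes before they hit the media, which is why the same geometry doesn't hurt SSDs.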
Storage nodes are not commonly used, largely because the vendors effectively charge you the same amount for them (at least the pricing on Nutanix storage-only nodes wasn't much of a discount). Outside of licensing situations no one would ever buy them up front (they would have right-sized the cluster design from the start). In reality they're something you kinda get forced into buying (you can't add anyone else's storage to the cluster).
I get the opex benefits of CI and HCI appliances, but the fact that you've completely frozen any flexibility on servers and storage comes at a cost, and that's lack of control by the institution on expansion costs.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
@scottalanmiller 1.8TB drives, if they are 10K, are 2.5''; 6TB drives are 3.5''. If they can stuff a 3.5'' drive in a 2.5'' bay, I'd be impressed.
It's all 3.5" bays, universally. Scale has no 2.5" bay offerings.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
I get the opex benefits of CI and HCI appliances, but the fact that you've completely frozen any flexibility on servers and storage comes at a cost, and that's lack of control by the institution on expansion costs.
I wouldn't call it "completely frozen." You can, at least on Scale and some other vendors' systems, mix and match storage-only and compute nodes; you can get different nodes with different capacity or performance drive options; you can get different processor sizes (core counts) and speeds and even different processor counts, different memory configurations, etc. Sure, if you want to mix and match AMD64 with Power8 or SPARC64 systems you are out of luck. That's a true limitation. But are you running into many customers that want to do that? It exists, but it is very, very rare - especially in companies small enough to fit into the scale limitations of most HCI solutions. Outside of leaving the AMD64 world, which few SMBs do, there are very few limitations.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
Storage nodes are not commonly used, but largely because the vendors effectively charge you the same amount for them (At least the pricing on Nutanix storage only nodes wasn't that much of a discount).
Because those nodes need CPU and RAM as well, there is only so much room for discounts. However, you are often doing this because, for example, you want the storage and failover of three nodes but only want Windows DC licensing on two nodes. So you pay for two nodes with dual high-end Intel Xeons with lots of cores and high clock speeds, gobs of RAM, etc. Then the storage node is just a single low-end proc and a very small amount of RAM. So while the storage-only "discount" might be very small, what you pay for that node compared to the compute nodes can be massively less.
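A back-of-the-envelope illustration of that point, with entirely invented prices: the list "discount" on the storage-only node barely matters, because the savings come from specifying a bottom-end CPU and minimal RAM.

```python
# Entirely invented prices, just to show the shape of the argument:
# the storage-only node saves little on the node SKU itself, but a
# lot on the CPU/RAM configuration you no longer need.

compute_node = {"cpus": 2 * 4000, "ram": 3000, "disks": 5000, "chassis": 3000}
storage_node = {"cpus": 1 * 800,  "ram": 500,  "disks": 5000, "chassis": 3000}

compute_cost = sum(compute_node.values())   # dual high-end Xeons, gobs of RAM
storage_cost = sum(storage_node.values())   # single low-end proc, minimal RAM

print(f"compute node: ${compute_cost}, storage node: ${storage_cost}, "
      f"{1 - storage_cost / compute_cost:.0%} less per node")
```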
-
@scottalanmiller said in The Inverted Pyramid of Doom Challenge:
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
The other issue with Scale (and this is no offense to them) or anyone in the roll-your-own-hypervisor game right now is that there are a number of vertical applications the business cannot avoid that REQUIRE specific hypervisors or storage to be certified. To get this certification you have to spend months working with their support staff to validate capabilities (performance, availability, predictability - the last being one that can strike a lot of hybrid or HCI players out) as well as commit to cross-engineering support with them.
This is always tough and is certainly a challenge for any product. It would be interesting to see a survey of just how often this becomes an issue and how it is addressed in different scenarios. From my perspective, and few companies can do this, it's a good way to vet potential products. Any software vendor that needs to know what is "under the hood" isn't ready for production at all. They might need to specify IOPS or resiliency or whatever, sure. But caring about the RAID level used, whether it is RAID or RAIN, or what hypervisor is underneath the OS that they are given - those are immediate show stoppers, and any vendor with those kinds of artificial excuses not to provide support is shown the door. Management should never even know that the company exists, as they are not a viable option and not prepared to support their products. Whether it is because they are incompetent, looking for kickbacks or just making any excuse not to provide support does not matter; it's not something a business should be relying on for production.
This right here makes no sense to me. You are OK with recommending infrastructure that can ONLY be procured from a single vendor for all expansions, with zero cost control over support renewal spikes, hardware purchasing and software purchasing (a proprietary hypervisor only sold with hardware), but you can't buy a piece of software that can run on THOUSANDS of different hardware configurations and more than one bare metal platform?
In medicine, for EMRs, Caché effectively controls the database market for anyone with multiple branches, 300+ beds and one of every service. (Yes, there is Allscripts, which runs on MS SQL, and no, it doesn't scale; it's only used for clinics and the smallest hospitals, as Philip will tell you.) If you tell the chief of medicine you will only offer him tools that will not scale to his needs, you will (and should) get fired. There are people who try to break out from the stronghold they have (MD Anderson, whose system is nothing more than a collection of PDFs), but it's awful, and doctors actively choose not to work in these hospitals because the IT systems are so painful to use (you can't actually look up what medicines someone is on; you have to click through random PDFs attached to them or find a nurse). The retraining costs that would be required to fulfill an IT mandate of "can run on any OS or hypervisor" to migrate this platform (or many major EMRs) are staggering. IT doesn't have this much power even in the enterprise. Sometimes the business driver for a platform outweighs the loss of stack control, or conformity of infrastructure (I'm sure the HFT IT guys had this drilled into their heads a long time ago). This is partly the reason many people still quietly have an HP-UX or AS/400 in the corner, still churning their ERP.
I agree that you should strongly avoid lock-in where possible, but when 99% of the community is running it on X and you're the 1%, that makes support calls a lot more difficult, and not just because they are not familiar with your toolset. A lot of these products have cross-engineering escalation directly into the platforms they have certified. We have lock-in on databases for most application stacks (and live with it, no matter how many damn yachts we buy Larry). The key things are:
-
Know the costs going in. Don't act surprised when you buy software for $1 million and discover you need $500K worth of complementary products and hardware to deploy it.
-
Know what parts you can swap out if they fail to deliver (hardware, support, OS, database, hypervisor), and be comfortable with reduced choice, or no choice, for any of them. Different people may need different levels of support for each.
-
Also know what your options are for hosted or OPEX non-hardware offerings on the platform (i.e. can I replicate to a multi-tenant DR platform to make DR a lower OPEX?).
-
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
I agree that you should strongly avoid lock-in where possible, but when 99% of the community is running it on X and you're the 1%, that makes support calls a lot more difficult, and not just because they are not familiar with your toolset.
If your application isn't working, why are you looking at my hardware? I've never once, ever, seen a company that needed to call their EMR vendor to get their storage working, or their hypervisor. What scenario do you picture this happening in? What's the use case where your application vendor is being asked to support your infrastructure stack? And, where does it end?
There are only four enterprise hypervisors in any case, so if you are a vendor that demands this level of integration, you need only support the four. Sure, someone new might come along, but this is a really silly limitation to my thinking. It's no business of an application maker's what platform is delivering the system, only that it is delivered. If that makes their job more complicated, you have other issues. If they even ask to see your underlying system, you have issues.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
The retraining costs that would be required to fulfill an IT mandate of "can run on any OS or hypervisor" to migrate this platform (or many major EMRs) are staggering.
Well, no. The cost is literally zero. In fact, it takes cost and effort to not support it. The OS and the hypervisor are totally different here. Writing for an OS takes work, because that's your application deployment target; that's where you need to target the OS in question. But the hypervisor is no business of an application writer's. That's below the OS, on the other side of your interface. There is zero effort, literally zero, for the application team.
So what I see isn't a lack of effort; it's throwing in additional effort to try to place blame elsewhere for problems that aren't there. I've run application development teams, and if your team is this clueless, I'm terrified to tell the business that I let you in the door, let alone deployed your products.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
IT doesn't have this much power even in the enterprise.
It does in finance, that's for sure. Any business that takes supportability and viability into account would never have IT anywhere but in a veto position here. IT may not pick the products, but it's a minimal level of business competence for IT to be able to veto things that are not supportable (likely by the vendor), viable or secure.
In healthcare, where cost-effective, stable, supportable and secure don't matter, sure. But that's not the business world, either. That field doesn't run quality IT or good business practices. It's its own thing, and decisions are often made for reasons very, very different from "what's good for making money or supporting healthcare." I've been told flat out by healthcare management that "lowering cost, making money or providing better healthcare" were of zero interest to them because they were non-profit and the patients were not their customers.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
This is partly the reason many people still quietly have an HP-UX or AS/400 in the corner, still churning their ERP.
That "can" happen, but I never see those companies. What I find, always, are companies that lack the skills, resources or business acumen to do an application migration or to plan for one, and that get stuck, cycle after cycle, deploying something far too expensive because they did not develop the skills, acquire the skills or prioritize the planning to protect themselves - all failings of bad management, not a "reason to strategize around this process."
-
@scottalanmiller said in The Inverted Pyramid of Doom Challenge:
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
I agree that you should strongly avoid lock in where possible, but when 99% of the other community is running it on X and your the 1% that makes support calls a lot more difficult, not just because they are not familiar with your toolset.
If your application isn't working, why are you looking at my hardware? I've never once, ever, seen a company that needed to call their EMR vendor to get their storage working, or their hypervisor. What scenario do you picture this happening in? What's the use case where your application vendor is being asked to support your infrastructure stack? And, where does it end?
Because performance and availability problems come from the bottom up not the top down. SQL has storage as a dependency, storage doesn't have SQL as a dependency, and everything rolls downhill...
If I'm running EPIC and want to understand why a KPI was missed, and whether there was a correlation with something in the infrastructure stack, I can overlay the syslog of the entire stack, the SNMP/SMI-S, API and hypervisor performance stats, the application stats, and the EUC environment stats (Hyperspace on either Citrix or View), and see EXACTLY what caused that query to run slow. There are tools for this. These tools, though, are not simple to build, and if the vendor doesn't have full API access to the storage or hypervisor, with full documentation and all of this built out (including log clarification), it's expensive to migrate this to a new stack - so it's something they want to be restrictive about.
SAP HANA is an incredible pain in the ass to tune and set up, and depending on the underlying disk engine it may have different block sizing or other best practices. This is one of the times when things like NUMA affinity can actually make or break the experience. Getting their support to understand a platform well enough to help customers tune it, their PSO to assist in deployments, and their support to identify known problems with the partner ecosystem means they are incredibly restrictive (hence they are still defining HCI requirements).
It costs the software vendors money in support to deal with unknown platforms (even if those platforms don't suck). The difference between two vendors arguing and pointing fingers and two vendors collaborating at the engineering level to understand and solve problems together is massive in customer experience (and in the cost required).
The amount of effort that goes into things like this, and into reference architectures, is honestly staggering and humbling (these people are a lot smarter than me). I used to assume it was just vendors being cranky, but that was before seeing the effort required even of a multi-billion dollar vendor.
At the end of the day, EPIC, SAP and the other ERP platforms will take the blame (not Scale, or NetApp, or EMC) if the platform delivers an awful experience (doctors or CFOs will just remember that stuff was slow and broke a lot), so being fussy about what they choose to support is STRONGLY in their business interests, balanced against offering enough choice that they don't inflate their costs too high. It's a balancing act.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
Because performance and availability problems come from the bottom up not the top down. SQL has storage as a dependency, storage doesn't have SQL as a dependency, and everything rolls downhill...
That doesn't make sense, though. Applications care that they have enough CPU, memory, IOPS, bandwidth, etc. That's it. They don't care how it is delivered, only that it is available when needed. It would be, again, a failing of both the application team and the IT team to look to the application for issues caused by not providing enough resources for performance.
If your point here is that incompetent IT departments tend to buy unsupportable, crappy software... sure. No one denies that plenty of people don't do their jobs well. But that doesn't mean we should recommend doing things poorly just because lots of people aren't good at their jobs.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
SAP HANA is a incredible pain in the ass to tune and setup, and depending on the underlying disk engine may have different block sizing or other best practices. This is one of the times where things like NUMA affinity can actually make or break an experience. Getting their support to understand a platform enough to help customers tune this, their PSO assist in deployments, and their support identify known problems with the partner ecosystem means they are incredibly restrictive (Hence they ares still defining HCI requirements).
Right, so at this point you are talking about outsourcing your IT department, as this isn't application work, it's infrastructure work. So this is turning into a completely different discussion. Now you are hiring an external IT consulting group that doesn't know the platform(s) you might be running. That's a totally different conversation.
But what we are talking about here is needing an application vendor to do the underlying IT work for you. It's a different animal. It does happen, and there is nothing wrong with outsourcing IT work, obviously; I'm a huge proponent of that. But there is no need to get it from the application vendor. That some do might make sense in some cases, but application teams demanding that they also be your IT department is a problem, unless you are committed to them delivering their platform as an appliance, in which case it should be treated that way. Nothing wrong with that per se, and a lot of places do just that.
-
@John-Nicholson said in The Inverted Pyramid of Doom Challenge:
At the end of the day, EPIC, SAP and the other ERP platforms will take the blame (not Scale, or NetApp, or EMC) if the platform delivers an awful experience (doctors or CFOs will just remember that stuff was slow and broke a lot), so being fussy about what they choose to support is STRONGLY in their business interests, balanced against offering enough choice that they don't inflate their costs too high. It's a balancing act.
Yes, I totally understand: vendors that target irrational, emotional, incompetent businesses have an interest in doing things that are not in the interest of those customers. As Scott Adams defines it, the stupid rich. You make your best money by overcharging for bad products and marketing to those that aren't smart enough to figure out how they are getting screwed. I don't blame the vendors for making money; I blame the customers for buying into it.
If our goal is to make money off of the businesses, we do one thing. If we are IT and our job is to make good decisions and protect the business from predatory vendors, we do another.