How Many HCI Nodes for the SMB

1337

@scottalanmiller said in How Many HCI Nodes for the SMB:

That's pretty rare. Each core today is worth many cores a few years ago. A modern 16 core machine is like a 64 core machine from just a decade ago or less.

That's not the case. The per core performance increase has been very modest. Instead of increasing per core performance, Intel and AMD have lowered the frequency and power requirement per core and, as a result, have been able to increase the number of cores per chip.

For instance a server CPU that was a monster 10 years ago would be something like the X5690 with 6 cores @ 3.5GHz. Today you might have something like an 8280 CPU with 28 cores @ 2.7GHz. It's only about 25% faster per core. But you have 4-5 times as many cores.

Or in AMD's case today you might have the 7702 with 64 cores @ 2.0GHz. That's more than 10 times as many cores compared to 10 years ago, but they are still only about 25% faster.
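To put the two factors side by side, here is a rough sketch of the arithmetic. It assumes the "about 25% faster per core" estimate from this post and perfect scaling across cores, which real workloads rarely achieve. The function name is made up for illustration.

```python
def relative_throughput(cores, per_core_speedup, baseline_cores=6):
    """Whole-chip throughput relative to a 6 core X5690, assuming perfect scaling."""
    return cores * per_core_speedup / baseline_cores

# Core counts from the post; 1.25 is the post's per core estimate, not a measurement.
for name, cores in [("8280", 28), ("7702", 64)]:
    print(f"{name}: {relative_throughput(cores, 1.25):.1f}x the X5690")
```

Under those assumptions most of the gain comes from the core count, not from the individual core.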

scottalanmiller

@Pete-S said in How Many HCI Nodes for the SMB:

For instance a server CPU that was a monster 10 years ago would be something like the X5690 with 6 cores @ 3.5GHz. Today you might have something like an 8280 CPU with 28 cores @ 2.7GHz. It's only about 25% faster per core. But you have 4-5 times as many cores.

Are you thinking that clock frequency determines core performance? That's only true with the same chip, not between chip designs. Per clock cycle performance is much, much higher today than it used to be.

Remember a 1.7GHz Pentium III would crush a 3.0GHz Pentium 4, for example. Clock speed has always been super misleading and has never been a valid way to measure core performance.

scottalanmiller

@Pete-S said in How Many HCI Nodes for the SMB:

Or in AMD's case today you might have the 7702 with 64 cores @ 2.0GHz. That's more than 10 times as many cores compared to 10 years ago, but they are still only about 25% faster.

Let's use this example. IPC is what matters, clock speed is totally irrelevant and tells us nothing about system performance.

On a per core basis, AMD went from the FX-8150 at 3.15IPC/s in 2011 to the Ryzen 9 3950X at 10.18IPC/s in 2019. That puts the newer core at roughly 323% of the older core's performance, about 3.2 times as fast. (Then it increased the core count, by a lot, as well.)

The increase per core in performance over time is normally staggering, it always has been. And this is what Moore's Law references - performance, not timing clock. The frequency is just the crystal timing circuit, it's an important part of a process under the hood, but not relevant to someone in IT, only to chip designers and electrical engineers.

That's why a four core AMD system today is roughly a 13 core system from just eight years ago. That was a LOT of horsepower eight years ago.
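The "four cores is roughly thirteen old cores" figure falls straight out of the two numbers quoted above. A quick sketch, taking the 3.15 and 10.18 values as given and assuming throughput scales linearly with core count:

```python
OLD_IPCS = 3.15   # FX-8150 (2011), figure quoted in the post
NEW_IPCS = 10.18  # Ryzen 9 3950X (2019), figure quoted in the post

ratio = NEW_IPCS / OLD_IPCS            # how many old cores one new core is worth
percent_increase = (ratio - 1) * 100   # the same ratio expressed as a gain
old_core_equivalent = 4 * ratio        # a four core system in old-core terms

print(f"ratio: {ratio:.2f}x, gain: {percent_increase:.0f}%")
print(f"4 new cores are worth about {old_core_equivalent:.1f} old cores")
```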

scottalanmiller

In the ARM world, which has come along a lot faster, the increase from 2008 to 2018 in per core performance rose by over 1,000%, it's pretty crazy. They went from way behind AMD and Intel, to a bit in front. Like 20% above them when both are at peak (but Intel and AMD have more cores, so per die performance remains far ahead.)

1337

@scottalanmiller said in How Many HCI Nodes for the SMB:

@Pete-S said in How Many HCI Nodes for the SMB:

Or in AMD's case today you might have the 7702 with 64 cores @ 2.0GHz. That's more than 10 times as many cores compared to 10 years ago, but they are still only about 25% faster.

Let's use this example. IPC is what matters, clock speed is totally irrelevant and tells us nothing about system performance.

On a per core basis, AMD went from the FX-8150 at 3.15IPC/s in 2011 to the Ryzen 9 3950X at 10.18IPC/s in 2019. That puts the newer core at roughly 323% of the older core's performance, about 3.2 times as fast. (Then it increased the core count, by a lot, as well.)

The increase per core in performance over time is normally staggering, it always has been. And this is what Moore's Law references - performance, not timing clock. The frequency is just the crystal timing circuit, it's an important part of a process under the hood, but not relevant to someone in IT, only to chip designers and electrical engineers.

That's why a four core AMD system today is roughly a 13 core system from just eight years ago. That was a LOT of horsepower eight years ago.

You read too many game sites. What you're talking about is not relevant.

scottalanmiller

@Pete-S said in How Many HCI Nodes for the SMB:

You read too many game sites. What you're talking about is not relevant.

I'm talking about the topic at hand... processor throughput. It's the only relevant thing to what we are discussing. This was standard industry knowledge back in the 70s and 80s. Not sure what gaming has to do with this; it predates PC gaming in any real fashion and is standard stuff IT should know.

scottalanmiller

@Pete-S You state the common lay person myth that clock frequency is how CPU performance is measured. You then state that modern processors are barely faster than old ones that had dramatically less advanced technology. You gave literally zero basis for this statement, you just pulled it out of thin air without even a hint of reasoning for it.

I provided actual MIPS calculations pulled from chip measurements (IPC/s are derived from MIPS on the chips.) You can argue that there are better measurements. But your argument seems to be solely something you made up based on nothing, claiming that the math and measurements are wrong because "gamers". That is weird, because gamers have always been the non-technical people who tend to use the clock cycle numbers because they are "easy", even though those numbers have nothing to do with CPU performance.

Do you have a reason you believe this is wrong? Or are you sticking to "because I said so" and trying to ignore the math and measurements, common sense, and long term industry knowledge (and simple observation)?

It takes extremely little effort to see modern CPUs at similar clock speeds and core counts running circles around old processors. I'm not sure how anyone could even suggest otherwise given how visible and obvious this is, and how obvious it is that billions of dollars and decades of CPU research go into nothing else.

Measuring a CPU by clock cycle and expecting to know how fast the resulting computer is is like trying to guess the performance of a car by nothing other than its red line RPM limit. It's not a useless number, if you know everything else about the car, but the gearing, horsepower, and torque are what matter, not the engine RPMs. If you know enough of the engine design, gearing, weight, etc. you can use engine RPMs to estimate certain things, but that's it. On its own, it's meaningless.

Recent example: 11th gen Intel i7 vs. 10th gen Intel i7. The top speed drops from 4.9GHz to 4.7GHz, while Intel claims an 11% improvement in web performance. Both CPUs are quad core, HT. i7-10610U vs i7-1165G7. How could the performance leap ahead while clock cycles drop so much, if there isn't something else that matters?
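As a rough illustration of what that implies, treat performance as work per clock multiplied by clock speed. That is a simplification: it ignores memory, turbo behavior, and everything else, and it takes Intel's 11% claim at face value. The function name is hypothetical.

```python
def implied_per_clock_gain(perf_ratio, old_clock_ghz, new_clock_ghz):
    """Per clock improvement needed to explain perf_ratio, if perf = work per clock * clock."""
    return perf_ratio * old_clock_ghz / new_clock_ghz

# 1.11 is the claimed 11% gain; 4.9 and 4.7 are the top clocks quoted above.
gain = implied_per_clock_gain(1.11, 4.9, 4.7)
print(f"implied per clock gain: {(gain - 1) * 100:.1f}%")
```

Under that simple model, the newer chip has to do noticeably more per clock than the older one just to come out 11% ahead at a lower frequency.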

Now while we can claim that Intel is lying, and claim that CPUs are literally moving backwards, this is not realistic. The only significant move backwards that processors ever have made was the high clock cycle Pentium 4 era with clock cycles dramatically over those of the faster Pentium III processors that predated them and, ultimately, ended up replacing them. Until the early 2000s, clock cycles were often "good enough" indicators because all mainline CPUs were single core, single thread, and similar enough in architecture. But the P4 broke all that with a huge step backwards and a massive focus on the "easy for the masses" marketing numbers of clock cycles. That's when even pretty casual computer users learned that simple clock cycles didn't tell them anything.

notverypunny

So to sort-of come back around to the initial question, it's also going to depend on your position on N+1. Is it a nice to have or a must have? We're technically running our main VDI workload on HCI but, if I'm not mistaken, we are committed beyond the N+1 threshold. For our case it's not the end of the world, it just means that if we lose a node there are some people who won't be able to work until we can either get the node back up or move on to some older gear. If you're running mission critical servers on HCI then I'd say of course that 2 is a minimum, but that could depend on how the solution is engineered. I haven't looked into the various options but I could see a 3 node minimum requirement to satisfy quorum needs or avoid a split-brain scenario where the solution tries to spin up the VMs on the spare node when they haven't really gone down, just a communication glitch....

So like most things, the real answer is "It depends"
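For anyone wanting to check where they stand on N+1, the test is just whether the remaining nodes can carry the whole load after the biggest node is lost. A minimal sketch, with made-up capacities measured in "desktops per node" purely for illustration:

```python
def spare_capacity_after_failure(node_capacities, load):
    """Capacity left after losing the largest node; negative means a shortfall."""
    surviving = sum(node_capacities) - max(node_capacities)
    return surviving - load

def is_n_plus_1(node_capacities, load):
    """True if the cluster still carries the full load with one node down."""
    return spare_capacity_after_failure(node_capacities, load) >= 0

# Hypothetical numbers: three nodes of 100 desktops each, 250 desktops in use.
print(is_n_plus_1([100, 100, 100], 250))                   # committed beyond N+1
print(spare_capacity_after_failure([100, 100, 100], 250))  # desktops left without a home
```

A negative result is the "some people won't be able to work" case described above.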

scottalanmiller

@notverypunny said in How Many HCI Nodes for the SMB:

So to sort-of come back around to the initial question, it's also going to depend on your position on N+1. Is it a nice to have or a must have? We're technically running our main VDI workload on HCI but, if I'm not mistaken, we are committed beyond the N+1 threshold.

The implication of two nodes is that it is still N+1. You just buy bigger nodes if necessary to keep it to two nodes.

scottalanmiller

@notverypunny said in How Many HCI Nodes for the SMB:

I haven't looked into the various options but I could see a 3 node minimum requirement to satisfy quorum needs or avoid a split-brain scenario where the solution tries to spin up the VMs on the spare node when they haven't really gone down, just a communication glitch....

You don't need three nodes for that. You can use a witness and there is technology to make it pretty much unnecessary even so.
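A generic majority vote sketch shows why a witness is enough. This is not how any particular HCI product implements it, just the basic idea: each node and the witness hold one vote, and a node may only run the VMs if it can reach more than half of all votes. The function names are hypothetical.

```python
def has_quorum(votes_reachable, total_votes):
    """Strict majority of all configured votes."""
    return 2 * votes_reachable > total_votes

def may_run_vms(sees_peer, sees_witness, witness_present):
    """Decide for one node of a two node cluster, counting its own vote."""
    total = 3 if witness_present else 2
    reachable = 1 + int(sees_peer) + int(witness_present and sees_witness)
    return has_quorum(reachable, total)

# Link between the two nodes is down; only node A can still reach the witness.
print(may_run_vms(sees_peer=False, sees_witness=True, witness_present=True))   # node A
print(may_run_vms(sees_peer=False, sees_witness=False, witness_present=True))  # node B
# Same failure with no witness: neither side has a majority.
print(may_run_vms(sees_peer=False, sees_witness=False, witness_present=False))
```

The sketch does not model the case where both nodes still reach the witness but not each other; a real witness has to pick one side there.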

1337

@scottalanmiller said in How Many HCI Nodes for the SMB:

@Pete-S You state the common lay person myth that clock frequency is how CPU performance is measured. You then state that modern processors are barely faster than old ones that had dramatically less advanced technology. You gave literally zero basis for this statement, you just pulled it out of thin air without even a hint of reasoning for it.
I provided actual MIPS calculations pulled from chip measurements (IPC/s are derived from MIPS on the chips.) You can argue that there are better measurements. But your argument seems to be solely something you made up based on nothing, claiming that the math and measurements are wrong because "gamers". That is weird, because gamers have always been the non-technical people who tend to use the clock cycle numbers because they are "easy", even though those numbers have nothing to do with CPU performance.
Do you have a reason you believe this is wrong? Or are you sticking to "because I said so" and trying to ignore the math and measurements, common sense, and long term industry knowledge (and simple observation)?

No, clock frequency isn't a measure of performance. I've never said that anywhere.

No, the CPUs you mentioned earlier are desktop CPUs and not something you'll see in a server. That's why it's irrelevant.
IPC is also as irrelevant as clock frequency in itself and I'll explain why below.

Also, when I talk about server CPUs from the last 10 years, I mean the Xeon 5500/5600 series, E5-2600 V1/V2/V3/V4, Scalable gen1/gen2, and AMD Epyc gen1/gen2. These are the CPUs you see in 1U or 2U servers such as the Dell R710 and newer.

Looking at the maximum number of cores per CPU over the last 10 years, you'll see:

5600 - 6c
E5-2600 V1 - 8c
E5-2600 V2 - 10c (12c in special SKU)
E5-2600 V3 - 18c
E5-2600 V4 - 22c
Scalable Gen 1 - 28c
Epyc gen 1 - 32c
Scalable Gen 2 - 56c (but not readily available)
Epyc Rome gen 2 - 64c

If we assume for a moment that the CPUs had exactly the same cores at the same clock frequency, the increase in core count would be more than 10 times. So a server today could have more than 10 times the processing power.
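The core count growth can be read straight off the list above. A small sketch that divides each generation's maximum by the 6 cores of the 5600 series, under the same "identical cores, identical frequency" assumption:

```python
# Maximum cores per CPU, copied from the list above.
MAX_CORES = {
    "5600": 6,
    "E5-2600 V1": 8,
    "E5-2600 V2": 10,
    "E5-2600 V3": 18,
    "E5-2600 V4": 22,
    "Scalable Gen 1": 28,
    "Epyc gen 1": 32,
    "Scalable Gen 2": 56,
    "Epyc Rome gen 2": 64,
}

baseline = MAX_CORES["5600"]
growth = {gen: cores / baseline for gen, cores in MAX_CORES.items()}

for gen, factor in growth.items():
    print(f"{gen:16} {factor:5.1f}x")
```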

The good thing is that cores can do more work today at the same clock frequency. The bad thing is that due to the thermal design envelope you can't run a high core CPU at high frequency or it will burn up. So we are NOT running at the same frequency. And that's where the problem lies.

Looking at the sales info from Intel & AMD, you would think that each new CPU is a tremendous improvement. But like any sales info, it only tells part of the truth, and each measurement is made in very specific situations to show the largest possible improvement. As anyone should expect.

But if you run a generic benchmark that is not designed to give inflated numbers, the situation is different.

For instance, compare the X5690 I mentioned before as the pinnacle of CPU performance about 10 years ago:
https://browser.geekbench.com/processors/intel-xeon-x5690
to AMD's 7742 64 core monster CPU:
https://browser.geekbench.com/processors/amd-epyc-7742

Looking at the single core benchmark (which is a mix of different computations), we see that the new cores in this case only have 12% more performance than the 10 year old cores.

We can argue about what this benchmark is measuring all day long. If the newer CPU had executed the instructions faster, it would have had a better result. It's as simple as that.

This is not an outlier or freak benchmark, there are hundreds of these. Some will show that the newer cores are maybe 75% faster while others might even show that newer cores are not faster at all.

The largest improvement comes when the new CPU has some new instructions that can help in some cases, for instance AVX-512.

Now, you can get CPUs with cores that are significantly faster because they run at higher frequencies, but as I said those CPUs are not available with as many cores. Because they would burn up.

StorageNinja

@scottalanmiller said in How Many HCI Nodes for the SMB:

The implication of two nodes is that it is still N+1. You just buy bigger nodes if necessary to keep it to two nodes.

If you're licensing Oracle RAC for 40K per core (list, I know you'll pay less but still) or SAP HANA (where you pay per TB of RAM) then scaling out to a larger cluster has some advantages on N+1 math, where reserving 50% of capacity on 2 nodes vs. 25% on 4 smaller nodes for HA protection comes into play.
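Roughly, the math looks like the sketch below. It assumes equal sized nodes, that every core in the cluster has to be licensed, and the 40K list price per core mentioned above. Real licensing terms are more complicated, so treat the function and the 24 core workload as hypothetical.

```python
import math

def n_plus_1_plan(usable_cores, nodes, price_per_core=40_000):
    """Size equal nodes so the workload still fits with one node down."""
    cores_per_node = math.ceil(usable_cores / (nodes - 1))
    total_cores = cores_per_node * nodes
    return {
        "cores_per_node": cores_per_node,
        "total_cores": total_cores,
        "reserve_fraction": 1 / nodes,  # share of the cluster held back for HA
        "license_cost": total_cores * price_per_core,
    }

# Hypothetical workload needing 24 cores, on 2 big nodes vs. 4 smaller ones.
for nodes in (2, 4):
    print(nodes, n_plus_1_plan(24, nodes))
```

With these made-up numbers the two node cluster licenses 48 cores and the four node cluster 32, which is where the 50% vs. 25% difference shows up on the invoice.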

travisdh1

@StorageNinja said in How Many HCI Nodes for the SMB:

@scottalanmiller said in How Many HCI Nodes for the SMB:

The implication of two nodes is that it is still N+1. You just buy bigger nodes if necessary to keep it to two nodes.

If you're licensing Oracle RAC for 40K per core (list, I know you'll pay less but still) or SAP HANA (where you pay per TB of RAM) then scaling out to a larger cluster has some advantages on N+1 math, where reserving 50% of capacity on 2 nodes vs. 25% on 4 smaller nodes for HA protection comes into play.

How many SMBs actually use Oracle RAC or SAP HANA? Can't be many.

StorageNinja

@travisdh1 said in How Many HCI Nodes for the SMB:

How many SMBs actually use Oracle RAC or SAP HANA? Can't be many.

I know people with 20 employees and 400 Oracle databases, FWIW. There are a lot of smaller application providers who do niche SaaS stuff.

SAP is pulling Oracle support, and making everyone move to HANA going forward for their apps.

StorageNinja

@Pete-S said in How Many HCI Nodes for the SMB:

But if you run a generic benchmark that is not designed to give inflated numbers, the situation is different.

I don't run generic benchmarks in production for a living, thankfully.
The reality is most CPU intensive stuff takes advantage of at least some of the new offload extensions and libraries. Also, memory throughput is often the limiting factor for databases and other IO intensive applications.