    Defining High Availability

    • Dashrender @Jimmy9008

      @Jimmy9008 said in Defining High Availability:

      So, how many 9's up-time would 192 seconds of downtime for a whole year be?

      365 (days) * 24 (hours) * 60 (min) * 60 (sec) = 31,536,000 total seconds in one year

      31,536,000 - 192 = 31,535,808 seconds of uptime

      31,535,808 (actual uptime) / 31,536,000 (max uptime) = 0.9999939117, or 99.99939117% uptime
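
For anyone who wants to rerun the arithmetic with their own downtime figure, here is a minimal Python sketch of the same calculation. The function name `availability` and the constant `SECONDS_PER_YEAR` are just illustrative names, and the year is assumed to be 365 days with no leap day.

```python
# Assumption: a non-leap year of 365 days, as in the calculation above.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def availability(downtime_seconds, period_seconds=SECONDS_PER_YEAR):
    """Fraction of the period during which the service was up."""
    return (period_seconds - downtime_seconds) / period_seconds


pct = availability(192) * 100
print(f"{pct:.8f}% uptime")  # 99.99939117% uptime
```

Passing a different `period_seconds` would give the figure for a month or a quarter instead of a year.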

      • Dashrender @Jimmy9008

        @Jimmy9008 said in Defining High Availability:

        I find this all very interesting. Anywhere to read more in depth on industry standards surrounding this?

        My team base availability on HTTP/S error codes. If a code comes back, say 404, then we consider that unavailable. If the page loads, but the site does not function because our development team messed a release up, as long as it is not an error such as 404, we consider we are available.

        Our development team probably calculate their up-time differently, but its all very interesting to me.

        Uptime can be many different things.

        The platform uptime, the application uptime, the internet connection uptime, etc.

        Assuming you're providing a service to someone else, the only thing they care about is the uptime they have connecting to that service. So 404s aren't the only thing they care about. If your app is dead, yet the page doesn't display a 404, it's still an outage to the end user.

        I'm guessing that, for the most part, that's the uptime that matters, so the fact that your team looks only at their bit doesn't make the customer any happier.

        • scottalanmiller @Jimmy9008

          @Jimmy9008 said in Defining High Availability:

          I find this all very interesting. Anywhere to read more in depth on industry standards surrounding this?

          Industry standards are very general, hence why I tackled it around server numbers specifically. HA is used 99% (I made that up) by marketing, and only 1% by actual IT. IT needs to not use such a generality and must work with real numbers (X Nines) as goals. There is really no time that working with "HA" as a general concept works for IT, because it's a process not a product, and because achieving proper availability at cost is a sliding scale that we have to work with for everything.

          So defining HA for a specific item (a server, wordpress, an ERP, etc.) is a case by case basis. Physical servers have a known industry standard, so an order of magnitude better (HA) or worse (LA) is easy to define. For anything software related, it is not so clear.

          Then there is more to it, as well. If a standard server gets around five nines of availability, and HA is six nines, what if we need "in between" or "far more"? You can't work with a term like HA; you must define the "nines" and work with that.

          • scottalanmiller @Jimmy9008

            @Jimmy9008 said in Defining High Availability:

            My team base availability on HTTP/S error codes. If a code comes back, say 404, then we consider that unavailable. If the page loads, but the site does not function because our development team messed a release up, as long as it is not an error such as 404, we consider we are available.

            A 404 would be tough in that case, because you might be calling something unavailable based on a bad request.

            Example: My store must be open 24/7.

            Problem (that a 404 represents): Customer went to the wrong address and didn't find my store.
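
To make the distinction concrete, here is a rough Python sketch of how the two situations might be classified. Everything in it is hypothetical: the function names, the idea of a synthetic probe against a known-good URL, and the `expected_marker` string are illustrative assumptions, not anything the teams in this thread actually run.

```python
from typing import Optional


def counts_against_availability(status: Optional[int]) -> bool:
    """Classify a response to an arbitrary user request.

    Assumption: 4xx means the client asked for the wrong thing (the customer
    went to the wrong address), so only 5xx or no response counts as downtime.
    """
    if status is None:  # no response at all (timeout, connection refused)
        return True
    return status >= 500


def probe_is_available(status: Optional[int], body: str, expected_marker: str) -> bool:
    """Classify a synthetic probe against a URL that is known to be valid.

    Here even a 404 signals trouble, and a page that loads without the
    expected content (a broken release) is treated as unavailable too.
    """
    if status is None or status >= 400:
        return False
    return expected_marker in body


print(counts_against_availability(404))                         # False
print(probe_is_available(200, "Something broke", "Add to cart"))  # False
```

The second function reflects the end user's view: a page that loads but doesn't work is still an outage.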

            • scottalanmiller @Dashrender

              @Dashrender said in Defining High Availability:

              0.9999939117, or 99.99939117% uptime

              AKA: Five Nines

              Or more accurately, 5 Nines+

              That extra "39" after your five nines is a significant improvement over five nines, but not close to six nines. I'd call it "really good" availability 🙂

              • Dashrender @scottalanmiller

                @scottalanmiller said in Defining High Availability:

                AKA: Five Nines

                Or more accurately, 5 Nines+

                That extra "39" after your five nines is a significant improvement over five nines, but not close to six nines. I'd call it "really good" availability 🙂

                And significant means the difference between the 315.36 seconds of downtime that five nines allows per year and your 192 seconds (5 min 15.36 seconds vs. 3 min 12 seconds).
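
The downtime allowed by a given number of nines can be worked out the same way. This is a small Python sketch under the same 365-day-year assumption; `downtime_budget` and `count_nines` are made-up helper names for illustration.

```python
import math

# Assumption: a non-leap year of 365 days.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_budget(nines, period_seconds=SECONDS_PER_YEAR):
    """Seconds of downtime allowed in the period at N nines of availability."""
    return period_seconds / 10 ** nines


def count_nines(downtime_seconds, period_seconds=SECONDS_PER_YEAR):
    """Whole nines achieved for a given amount of downtime in the period."""
    if downtime_seconds <= 0:
        return math.inf  # no downtime at all: unbounded nines
    return math.floor(-math.log10(downtime_seconds / period_seconds))


print(f"{downtime_budget(5):.2f}")  # 315.36 seconds per year at five nines
print(f"{downtime_budget(6):.3f}")  # 31.536 seconds per year at six nines
print(count_nines(192))             # 5
```

Note that `count_nines` only reports whole nines, so both 192 seconds and 48 seconds of yearly downtime come out as five.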

                • scottalanmiller @Dashrender

                  @Dashrender said in Defining High Availability:

                  And significant means the difference between 315.36 seconds of downtime vs your 192 seconds (5 min 15.36 seconds vs. 3 min 12 seconds).

                  Yeah. It's nearly half!

                  • Jimmy9008 @scottalanmiller

                    @scottalanmiller said in Defining High Availability:

                    Yeah. It's nearly half!

                    Hopefully it will remain at 48 seconds for the rest of the year. So, if that were to happen, we would be at 99.99984779%, correct?

                    • scottalanmiller @Jimmy9008

                      @Jimmy9008 said in Defining High Availability:

                      Hopefully it will remain at 48 seconds for the rest of the year. So, if that were to happen, we would be at 99.99984779%, correct?

                      Sounds about right. Six nines is just 31.5 seconds a year, or about 2.6 seconds a month!

                      • Jimmy9008 @scottalanmiller

                        @scottalanmiller said in Defining High Availability:

                        Sounds about right. Six nines is just 31.5 seconds a year, or about 2.6 seconds a month!

                        Yeah, not long. That is unplanned downtime only though. We have plenty of planned downtime for running updates and other projects. But still, good.

                        I'm off to a new job on 1st of April so won't know the end of year figure. I'd hope it is around 48 seconds though.

                        • scottalanmiller @Jimmy9008

                          @Jimmy9008 said in Defining High Availability:

                          Yeah, not long. That is unplanned downtime only though. We have plenty of planned downtime for running updates and other projects. But still, good.

                          Ah, we often describe those as "Planned Availability" and "Unplanned Availability". Most people talking HA want both at six nines.

                          • scottalanmiller

                            NTG had two servers in our early days that beat six nines over a decade. We just got lucky, but holy cow.

                            • scottalanmiller @Jimmy9008

                              @Jimmy9008 keep in mind that resulting availability and risk aren't the same thing. Any five nines system is expected to hit six nines nine out of ten years. It's the average over the operating lifespan, not over a set interval. Otherwise any normal interval that you select would have 100% uptime.

                              So there are two ways to look at it reasonably...

                              1. Resulting Availability Over Operational Lifetime
                              2. Expected Availability Over Operational Lifetime

                              The first is what an individual system actually provides. The second is the average of all systems configured identically, over all of their operational lifetimes.

                              The first you measure. The second you project with simulations.
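
As a toy illustration of the difference, here is a Python sketch that simulates a fleet of identical systems. The outage model is entirely an assumption made up for the example: each system has a 10% chance per year of one 3,153.6-second outage, which averages out to five nines, and all the names are hypothetical.

```python
import random

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def simulate_system(years, outage_prob_per_year, outage_seconds, rng):
    """Resulting availability of ONE system over its operational lifetime."""
    downtime = sum(
        outage_seconds
        for _ in range(years)
        if rng.random() < outage_prob_per_year  # at most one outage per year
    )
    return 1 - downtime / (years * SECONDS_PER_YEAR)


def simulate_fleet(n_systems, years, outage_prob_per_year, outage_seconds, seed=0):
    """Per-system results plus their mean, standing in for expected availability."""
    rng = random.Random(seed)
    results = [
        simulate_system(years, outage_prob_per_year, outage_seconds, rng)
        for _ in range(n_systems)
    ]
    return results, sum(results) / len(results)


# Hypothetical fleet: 10,000 identical systems, ten-year lifetime each.
results, expected = simulate_fleet(10_000, 10, 0.1, 3153.6)
print(f"expected availability:   {expected:.7f}")
print(f"best individual system:  {max(results):.7f}")
print(f"worst individual system: {min(results):.7f}")
```

Each entry in `results` is what you would measure on one box; the mean across the fleet is the number you can only project. Under this made-up model most individual years have no outage at all, which is why a short measurement window tells you little about risk.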

                              In extremely large systems, like Backblaze, they get close approximations to the latter through measurement, because they look only at small components (like hard drives) that they have in large enough numbers to create a reasonable approximation to a full number.

                              When I was on Wall St., we had 80,000 servers in our pool and so we had actual risk and availability numbers for the industry in datacenters like ours. But it still only told us about a handful of server models, and only under our exact conditions. And it still took a decade or more to produce meaningful numbers, and those numbers only applied to the servers of the past, not the ones being installed new.
