
    IT Survey: Preemptive Drive Replacement in RAID Arrays

    IT Discussion
    Tags: storage, raid, winchester drive, survey
    44 Posts 11 Posters 11.5k Views
    • DustinB3403 @coliver

      @coliver said:

      @DustinB3403 said:

      To follow up, I've never done it either, but I have heard people say that they replace their drives to avoid the urgent rush of a RAID array being degraded because of a failed drive.

      Wouldn't it be just as good to have a cold spare on a shelf waiting for a failure?

      I absolutely agree that having a spare drive on the shelf is more effective than replacing a drive that hasn't failed.

      Some people simply don't want to understand what has to be performed to rebuild the array when you replace drives just to replace them.

      • Drew

        I'm guessing this isn't exactly what you're referring to but I thought I'd add my experience anyway. I guess it depends on what you mean by "perfectly healthy". One manufacturer might consider a drive perfectly healthy while another might not.

        Certain arrays look at bad blocks and preemptively stop using a drive, switching to a hot spare, once the number of bad blocks reaches a certain percentage; the vendor then sends you a replacement drive. The number of bad blocks that separates a perfectly healthy drive from one facing impending failure varies.
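
        To make the threshold idea concrete, here is a minimal sketch of that kind of check. It is not any vendor's actual logic: the `Drive` record, the function names, the block counts, and the 1% and 2% thresholds are all hypothetical, chosen only to show how the same drive can count as healthy under one threshold and as due for replacement under another.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    # Hypothetical record; real arrays expose this through vendor tooling.
    name: str
    total_blocks: int
    bad_blocks: int


def should_fail_over(drive: Drive, threshold_pct: float) -> bool:
    """Return True when the bad-block percentage has reached the threshold."""
    bad_pct = 100.0 * drive.bad_blocks / drive.total_blocks
    return bad_pct >= threshold_pct


def drives_to_retire(drives, threshold_pct):
    """Names of drives the array would swap out for a hot spare."""
    return [d.name for d in drives if should_fail_over(d, threshold_pct)]


# Made-up numbers: disk0 has 0.012% bad blocks, disk1 has 1.5%.
drives = [
    Drive("disk0", 1_000_000, 120),
    Drive("disk1", 1_000_000, 15_000),
]

print(drives_to_retire(drives, threshold_pct=1.0))  # stricter vendor
print(drives_to_retire(drives, threshold_pct=2.0))  # more lenient vendor
```

        With these made-up numbers, disk1 is retired at the 1% threshold but still counts as healthy at 2%, which is the "depends on the manufacturer" point above.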

        I've contacted a vendor before and sent diagnostic logs for arrays that were about to fall off support, asking them to analyze drives that hadn't necessarily crossed that line but might raise a few flags, to see if I could get some drives replaced.

        As for replacing drives that show no signs of failing at all, just because they have reached a certain age: I've never done this.

        • Dashrender

          I mentioned this to an associate of mine and he came up with a possible situation where this could matter, but we both agreed it was pretty unlikely.

          His reasoning was that the labor pool for emergency repairs might be too small to handle all the emergencies happening at once. Of course there are tons of mitigations for this, but I thought the general idea had merit.

          • MattSpeller @Dashrender

            @Dashrender Also maintenance on sites that are exceptionally expensive to access (think weather station in Greenland or something)

            • coliver @MattSpeller

              @MattSpeller said:

              @Dashrender Also maintenance on sites that are exceptionally expensive to access (think weather station in Greenland or something)

              That still doesn't make sense because of the failure curve of hard drives. We have no idea whether the new drive will die immediately or soon after installation. If it does, they would need a second maintenance event to replace the failed drive. That can happen either way, but it makes more sense to wait until the drive actually fails than to preemptively replace it, especially if you can get months to years more out of the drive you would have replaced.
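
              A rough sketch of that trade-off for a single drive slot follows. Every number in it is invented for illustration (the lifetimes, the 120-month horizon, the 36-month replacement age) and `simulate_slot` is a hypothetical helper, not a model of real drive failure rates. It only counts how many maintenance events each policy produces when some new drives die early.

```python
def simulate_slot(lifetimes_months, horizon_months, replace_at=None):
    """Walk one drive slot through time and count maintenance events.

    lifetimes_months: how long each successive drive would last if left
        alone (hypothetical values, consumed in order).
    replace_at: preemptive replacement age in months, or None to run
        every drive until it fails.
    Returns (unplanned_failures, planned_swaps) within the horizon.
    """
    now = 0
    failures = 0
    swaps = 0
    for life in lifetimes_months:
        # A drive is swapped on schedule only if it survives that long.
        planned = replace_at is not None and life > replace_at
        end = now + (replace_at if planned else life)
        if end > horizon_months:
            break  # this drive is still in service when the horizon ends
        now = end
        if planned:
            swaps += 1
        else:
            failures += 1
    return failures, swaps


# Made-up lifetimes: two of the drives die within their first two months.
lifetimes = [60, 2, 70, 55, 1, 80]

print(simulate_slot(lifetimes, horizon_months=120))
print(simulate_slot(lifetimes, horizon_months=120, replace_at=36))
```

              With these invented lifetimes, running to failure gives 2 unplanned failures and no planned swaps, while replacing at 36 months gives the same 2 unplanned failures plus 3 planned swaps. The early deaths are not avoided and the total number of maintenance events goes up. Different made-up lifetimes would give different counts.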

              • MattSpeller @coliver

                @coliver It makes more sense in that scenario than it does in any other I can think of!

                I can think of much better ways to set up a remote station like that - I'm just trying to see if there's a scenario where his advice is actually... good.

                • Deleted74295 Banned

                  For the hard-to-access station, they should have spares on a shelf. But in theory, when you buy a drive and store it for 3 years, what happens with the warranty if you put it in and it dies after a month?

                  • MattSpeller @Deleted74295

                    @Breffni-Potter spares are a luxury unless you use them on a regular basis

                    • Deleted74295 Banned @MattSpeller

                      @MattSpeller said:

                      @Breffni-Potter spares are a luxury unless you use them on a regular basis

                      A la the weather station in Greenland.

                      Shipping cannot be easy, so what are they to do?

                      • Dashrender @Deleted74295

                        @Breffni-Potter said:

                        For the hard-to-access station, they should have spares on a shelf. But in theory, when you buy a drive and store it for 3 years, what happens with the warranty if you put it in and it dies after a month?

                        It would be out of warranty. But that isn't the situation @MattSpeller is describing. If they only visit the site, say, once every 3 months, presumably they would bring drives with them.

                        But really, you wouldn't set up a system that relied on this type of solution in this scenario; you'd choose something with more robustness built in, though I can't tell you exactly what that would look like. Perhaps two or even three equal-sized arrays kept in sync with redundant data paths, etc. If the data is that important but you can only visit the site once every three months, you can't just use the day-to-day setup in most cases.

                        • MattSpeller @Deleted74295

                          @Breffni-Potter Oh! Yeah I totally agree - that scenario leads to lots of unusual setups

                          • MattSpeller @Dashrender

                            @Dashrender exactly, there are much better ways to set that kinda thing up - I think we're still looking for a scenario where dude-buddy-guy from SW forums would be right. He may just be 100% wrong.

                            • MattSpeller

                              I confess to enjoying "devil's advocate" and thought experiments a lot

                              • Dashrender @MattSpeller

                                @MattSpeller said:

                                @Dashrender exactly, there are much better ways to set that kinda thing up - I think we're still looking for a scenario where dude-buddy-guy from SW forums would be right. He may just be 100% wrong.

                                Well, again, my friend's suggested reason, a lack of personnel in times of emergency, could be one.

                                • Deleted74295 Banned

                                  I will tell you 3 concrete facts.

                                  1. You must never reboot the servers. Constant up time is vital.

                                  2. Don't install updates, Microsoft will only break the server to force you to upgrade to the latest version.

                                  3. Linux is not safe for production. Too complicated and too buggy.

                                  Why are these facts true?

                                  Because my experience, training and mentors have fostered a closed-minded set of views in my mind, and because of this I need to ignore all propaganda. I am not here to listen and learn; I am only here to teach others the correct way of doing things.

                                  Yes my job security might be at risk because I am not open to new ideas or learning new concepts but I'm irreplaceable here.

                                  • Deleted74295 Banned

                                    Oh by the way.

                                    If you use mixed operating systems (a la 7/Vista/8.1/10), when you get Cryptolocker or other malware, the damage is limited to one group of operating systems.

                                    • MattSpeller @Deleted74295

                                      @Breffni-Potter lol 10/10

                                      • nadnerB

                                        Well played @Breffni-Potter 😉

                                        • scottalanmiller @DustinB3403

                                          @DustinB3403 said:

                                          I've heard of doing it every 2 - 3 years, but not as a part of routine maintenance.

                                          What is the routine maintenance schedule at the places where you heard this?

                                          2-3 years would definitely constitute routine maintenance. I think even for people doing this, 2-3 years seems extremely short.

                                          • scottalanmiller @DustinB3403

                                            @DustinB3403 said:

                                            To follow up, I've never done it either, but I have heard people say that they replace their drives to avoid the urgent rush of a RAID array being degraded because of a failed drive.

                                            But there is an urgent rush anyway; they didn't avoid one. And they create more of them. It's literally the same as preemptively crashing your car to avoid accidents.
