Synology one bad sector crashes whole volume RAID0
-
I plugged in an external USB drive to act as a backup target for the NAS. Using the Hyper Backup app, it was in the middle of its first full backup when suddenly the Synology DS216+ crashed: beeping, blinking orange lights.
I log in and see a flood of "Bad sector was found" error notifications.
The NAS has 2 x 4TB WD Red drives. I'm running them striped since I wanted more space and perhaps speed.
Anyway, when I go into Storage Manager, the HDD/SSD section says disk 2 has crashed and now shows a bad sector count of 6.
The status light is blinking orange and the Disk 2 light is solid orange. About every 5 minutes a fresh batch of "Bad sector was found" messages pops up.
Neither drive has had a bad sector up until now. SMART has always been ok.
Of all the space available, over 7TB, I've only got about 1.2TB on it so far. No problems until doing this backup to the USB drive.
What I don't get is why something as simple as a bad sector causes the entire volume to crash and everything to halt. Bad sectors happen; why doesn't it just mark the sector and move on? If the bad sector falls where a file is stored, why can't it report that file as corrupt, mark the sector, and move on? It makes no sense that one little sector crashes the entire volume!
Lastly, it doesn't tell me what to do. It's just like: hmm, back up everything and buy new drives. Over a bad sector!! Back up everything how, when it's crashed and won't let me open any shares? I can't tell if it's doing something in the background, like trying to recover, or if it's just stopped, because it keeps popping up messages every 5 minutes, so maybe it's doing something?
What the heck do I do then? Turn it off and on and hope it recovers? I don't know.
-
@guyinpv The big question is how many bad sectors? I'd be afraid the errors mean one of the drives ran out of spare sectors to swap out.
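If you can get the drive onto a Linux machine (I believe DSM's SSH shell ships smartctl as well, but a plain Linux box definitely has it via smartmontools), SMART attributes will show whether the drive is burning through its spares. A quick sketch, assuming the suspect disk shows up as /dev/sdb - adjust to your system:

    # How many sectors has the drive remapped, and how many are waiting to be?
    sudo smartctl -A /dev/sdb | grep -iE 'reallocated|pending|uncorrect'

A climbing Reallocated_Sector_Ct, or a nonzero Current_Pending_Sector, is the drive telling you it's running out of spares.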
-
@guyinpv said in Synology one bad sector crashes whole volume RAID0:
I'm running them striped since I wanted more space and perhaps speed.
Disks fail, that is what they do. You're asking for it when running RAID0.
But now you just buy new drives and restore your backup.
WD has a test program that can verify the disk is broken; then just send it in for warranty replacement - if it's still under warranty. WD Red had a 3-year warranty I believe, which can be extended to 5 years for a small fee.
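If you'd rather not boot into WD's tool, a SMART extended self-test gives similar evidence of a failed disk and is usually enough for an RMA. Again assuming the disk is /dev/sdb:

    sudo smartctl -t long /dev/sdb       # start the extended self-test (takes hours)
    sudo smartctl -l selftest /dev/sdb   # check the result once it finishes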
-
If you need to save what's on the disks you need to:
- Insert the 1st drive in a Linux computer (don't mount it) and make a dd image copy of the entire disk. Use the options conv=noerror,sync so dd keeps reading even after errors. Expect the cloning to take a long time if you have many bad blocks.
- Do the same with the second disk.
- Mount the cloned disks/images and run fsck on them, or use recovery software.
- Recover or copy what is possible and copy the data to where you want it.
Don't do anything else with the failed disks other than clone them. That's data recovery 101. Rough commands are sketched below.
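A minimal sketch of the cloning step, assuming the two NAS disks show up as /dev/sdb and /dev/sdc on the Linux box and /mnt/recovery has roughly 8TB free (every device name and path here is an assumption - check lsblk first):

    lsblk   # identify the two NAS disks before touching anything

    # Image each disk. conv=noerror,sync makes dd continue past read errors
    # and pad the unreadable blocks with zeros so offsets stay aligned.
    sudo dd if=/dev/sdb of=/mnt/recovery/disk1.img bs=64K conv=noerror,sync status=progress
    sudo dd if=/dev/sdc of=/mnt/recovery/disk2.img bs=64K conv=noerror,sync status=progress

    # If you can install it, GNU ddrescue handles failing disks better:
    # it keeps a map file and retries the bad areas last.
    sudo ddrescue -d /dev/sdb /mnt/recovery/disk1.img /mnt/recovery/disk1.map

Synology volumes are Linux md RAID under the hood, so for the fsck/recovery step you would loop-attach the images and let mdadm try to assemble the array from the copies, leaving the original disks untouched:

    sudo losetup -fP --show /mnt/recovery/disk1.img   # prints e.g. /dev/loop0
    sudo losetup -fP --show /mnt/recovery/disk2.img
    sudo mdadm --assemble --scan                      # then fsck the assembled md device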
-
IMO one of the reasons one would have a Synology is for the support. Have you called them?
-
Failed RAID0 = no data. Simple as that.
Always possible to get super lucky, but don't count on it.
-
After making the images as @Pete-S suggests, you can try running SpinRite on one or both drives and see if the array comes back enough to finish your backup.
-
Interestingly, I force turned off the Synology, pulled the drives and did a quick canned air cleanup.
Turned back on and it came to life. Looking at the drive screen, the count of bad sectors is now at 38.
This makes no sense, jumping from 0 bad sectors to 38 out of the blue. I know enough about drives to know they can recover from bad sectors and avoid those areas of the disk. It's weird to me that it would bring down the entire volume and crash the whole thing over a bad sector.
I was reading this thread and it seems some people think it could be a DSM bug: https://forum.synology.com/enu/viewtopic.php?f=19&t=93339&start=30
I did happen to do a DSM upgrade this morning, but all was well until I ran that full backup onto the USB drive. It crashed somewhere in the middle of that backup.
-
Because it attempted to read every sector. This is not a surprise.
-
If a sector really can't be read, even after many tries, that will cause a cascade of issues on a RAID 0 volume. Because those sectors have to work together to recreate the data.
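For anyone who wants to see the striping concretely, here's a throwaway sandbox sketch using loop devices (Linux with mdadm; every file and device name is illustrative, and nothing here touches real disks):

    truncate -s 256M a.img b.img
    LOOP_A=$(sudo losetup -f --show a.img)
    LOOP_B=$(sudo losetup -f --show b.img)
    sudo mdadm --create /dev/md/demo --level=0 --raid-devices=2 "$LOOP_A" "$LOOP_B"
    sudo mdadm --detail /dev/md/demo   # note the "Chunk Size" line

Consecutive chunks alternate between the two members, so a single unreadable chunk on either disk punches a hole in the middle of the combined device, and there's no redundant copy to fall back on.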
-
Is there some sort of jbod mode or something that is common for wanting a larger drive, giving up the performance of R0? Then, when a drive does fail, it only takes out that drive and not the whole shebang? Is that actually a thing in production use?
-
@donahue said in Synology one bad sector crashes whole volume RAID0:
Is there some sort of jbod mode or something that is common for wanting a larger drive, giving up the performance of R0? Then, when a drive does fail, it only takes out that drive and not the whole shebang? Is that actually a thing in production use?
RAID 0 should, like JBOD setups, really just be for ephemeral data, like caches.
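For what it's worth, plain Linux does have a concatenation mode (and Synology's "Basic" volumes are simply one volume per disk). A hedged sketch with mdadm's linear level, which appends disks instead of striping them (device names assumed):

    sudo mdadm --create /dev/md/concat --level=linear --raid-devices=2 /dev/sdb /dev/sdc

With concatenation most of a file's blocks sit on a single disk, so a surviving member often yields recoverable data - but the filesystem still spans both disks, which is why it isn't treated as real protection either.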
-
@scottalanmiller said in Synology one bad sector crashes whole volume RAID0:
@donahue said in Synology one bad sector crashes whole volume RAID0:
Is there some sort of jbod mode or something that is common for wanting a larger drive, giving up the performance of R0? Then, when a drive does fail, it only takes out that drive and not the whole shebang? Is that actually a thing in production use?
RAID 0 should, like JBOD setups, really just be for ephemeral data, like caches.
that's not an answer to his question.
-
@dashrender said in Synology one bad sector crashes whole volume RAID0:
@scottalanmiller said in Synology one bad sector crashes whole volume RAID0:
@donahue said in Synology one bad sector crashes whole volume RAID0:
Is there some sort of jbod mode or something that is common for wanting a larger drive, giving up the performance of R0? Then, when a drive does fail, it only takes out that drive and not the whole shebang? Is that actually a thing in production use?
RAID 0 should, like JBOD setups, really just be for ephemeral data, like caches.
that's not an answer to his question.
It's the answer he needs, not the answer he wants.
Individual drives are just "smaller RAID 0s"; if you have to worry about the size of the failure domain, it means you can't implement the solution in production.
-
@guyinpv said in Synology one bad sector crashes whole volume RAID0:
Interestingly, I force turned off the Synology, pulled the drives and did a quick canned air cleanup.
Turned back on and it came to life. Looking at the drive screen, the count of bad sectors is now at 38.
This makes no sense, jumping from 0 bad sectors to 38 out of the blue. I know enough about drives to know they can recover from bad sectors and avoid those areas of the disk. It's weird to me that it would bring down the entire volume and crash the whole thing over a bad sector.
It's gonna fail again. And no, it's not weird: it's RAID 0, it needs EVERY sector. Think of it like a password: lose one character and you've lost the entire password.
-
@guyinpv said in Synology one bad sector crashes whole volume RAID0:
The NAS has 2 x 4TB WD Red drives. I'm running them striped since I wanted more space and perhaps speed.
...
A single SATA drive already reads faster than your 1 Gbps network connection can carry, so there was no speed advantage. Since you didn't need the space, all you did was add risk by running RAID 0.
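Rough numbers, assuming typical WD Red sequential throughput (my estimate, not a measured figure): gigabit Ethernet tops out at 1 Gbps / 8 = 125 MB/s before protocol overhead, while a single 4TB Red manages roughly 150 MB/s on sequential reads. The network is the bottleneck either way, so striping buys nothing over the wire.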
-
@harry-lui said in Synology one bad sector crashes whole volume RAID0:
@guyinpv said in Synology one bad sector crashes whole volume RAID0:
The NAS has 2 x 4TB WD Red drives. I'm running them striped since I wanted more space and perhaps speed.
...
A single SATA drive already reads faster than your 1 Gbps network connection can carry, so there was no speed advantage. Since you didn't need the space, all you did was add risk by running RAID 0.
Yes, it was a risk. The NAS was originally just going to be an external backup for the server. I only used RAID 0 for the combined space, which is close to what my server has; the server uses RAID 10 with 4 drives.
Frankly I just thought it would be more robust. I mean, I know it "can" fail, just didn't think it would be within a year. I also know my car tires can get blowouts, but I don't expect one every month or two either.
I'll probably replace this WD Red now that it's at 39 bad sectors, and redo the RAID as a mirror instead. I'll lose the space, but I don't expect to use up 4TB any time soon anyway.
-
Aren't we discussing something like 7TB usable space? Why is this even a question? Two 8TB drives would give you 8TB usable in a RAID 1.
There is zero benefit to RAID 0 in anything that's been described so far.
-
@guyinpv said in Synology one bad sector crashes whole volume RAID0:
Frankly I just thought it would be more robust. I mean, I know it "can" fail, just didn't think it would be within a year. I also know my car tires can get blowouts, but I don't expect one every month or two either.
Statistically, you'd expect it to be around a year. RAID 0 is incredibly unstable because it takes the risk of a single drive and magnifies it dramatically. So if the average failure of a single drive is, maybe, once every six years, RAID 0 with four drives would make that every 1.5 years on average. And that's just an average. So well inside the bell curve are failures at six months and three years.
And RAID 0 has failure modes that cause total data loss where a single drive would lose only some data. The RAIDing process makes RAID 0 astronomically more dangerous than just 4x the risk of a lone drive.
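Back-of-the-envelope version of that, assuming independent failures: MTBF(array) ≈ MTBF(drive) / N. With a 6-year average per drive, 6 / 4 = 1.5 years for a four-drive stripe, and 6 / 2 = 3 years for the two-drive stripe in this NAS.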
-
Ah so you actually have two disks with 4TB each and went with the "I need more space" RAID0.
The fix here is bigger disks and RAID 1 in that case. It's going to be slow (being 5400 RPM) but at least you have the protection you were looking for.
Granted this is backup only.