md RAID reconstruction: best practice?
User #12855 154 posts
Forum Regular
|
this may sound like quite a trivial question, but could someone please confirm whether I need to unmount the hard drives before performing RAID reconstruction using the mdadm tool? should i be booting into single user mode 1st? or should i boot up using a live cd, such as knoppix? which is the accepted or safest method? how would i accomplish this?
Here's the deal: I've been experimenting with Linux RAID (md) and wanted to improve the availability of my server. So when everyone's asleep, I can replace the faulty drive with a new one.
I have /dev/sda and /dev/sdb mirrored (RAID1). Both sda and sdb have a "/" [md0] and "swap" [md1] partition. Also, both sda & sdb's MBR have GRUB installed. Now, when sdb gets unplugged (after powering off of course!), the system still boots into Linux :) Reconnecting sdb and disconnecting sda reveals the same results - flawless booting!
All is good so far. Now, lets plug sda & sdb back and boot into Linux. Running "cat /proc/mdstat" and "mdadm --detail /dev/md0" reveals a degraded RAID array (with sdb flagged as faulty/foreign). So using mdadm again, we can perform a "hot insert" of sdb. After a few moments, we can confirm (through /proc/mdstat and mdadm) that rebuilding was completed. OK, onto rebooting the system. First reboot after reconstruction seems flawless. So we shutdown the system again, and unplug sda again.
Bad news this time around. Powering on the system, we immediately notice GRUB's failed attempt to load the linux kernel, citing crc errors. And I thought this RAID1 mirror was perfect - even after reconstruction. So what went wrong? Here's GRUB's error message FYI:
Booting 'Debian GNU/Linux, kernel 2.6.8-2-386'
[..snip..]
Uncompressing Linux...
crc error
-- System halted
Am I correct in suspecting that mounted drives can't be mirrored (especially the boot blocks)? Or am I way off the mark? Let me know. Looking forward to hearing your responses!!
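For anyone following along, the degraded-array check described in this post can be scripted. This is only a minimal sketch, assuming the usual /proc/mdstat layout in which each array's status line ends in a member map such as [UU] or [U_]; the function name md_degraded is made up for illustration.

```shell
# Print the name of every md array whose member map contains "_",
# i.e. at least one member is missing or faulty.
# Usage: md_degraded [file]   (file defaults to /proc/mdstat)
md_degraded() {
    awk '
        /^md[0-9]+ :/      { name = $1 }    # remember the current array
        /\[[U_]*_[U_]*\]/  { print name }   # [U_], [_U], ... means degraded
    ' "${1:-/proc/mdstat}"
}
```

An array that is still rebuilding keeps its "_" until the resync finishes, so an empty result is a reasonable sign that every mirror is complete again.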
|
posted 2006-Feb-8, 10pm AEST
|
|
User #5201 3662 posts
Whirlpool Forums Addict
|
Try plugging SATA disk two into SATA disk one's controller. Whilst they're mirrored, I think Grub will only boot off a physical device until you've loaded the kernel proper.
|
posted 2006-Feb-8, 10pm AEST
|
|
User #32109 6420 posts
Whirlpool Forums Addict
|
Yup flip the disks....
You'll also find that you may need to recreate the boot loader. :-/
I have rebuilt RAID-1 mirrors online many times and it works fine... though boot loaders I traditionally re-do manually....
They seem to prefer them done manually.. *shrugs* I think the crc is due to an internal checksum it creates at the time of creation for a sanity check (happy to be corrected). This is why the failure after mirroring.
Lilo seems more reliable in this case (reconstruction after mirror sync), though I do prefer grub.
I normally:
1. Remove old disk from md.
2. Insert new disk into md.
3. Disks commence syncing.. Wait, wait, wait... done!
4. Re-create a grub entry on both disks.
5. Boot from Disk 1....
6. Shutdown ...
7. Boot Disk 2
Hope that helps.
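Steps 1 to 3 above map onto mdadm's --fail, --remove and --add options. The following is a rough sketch rather than the poster's actual commands: the helper names run and swap_member, the APPLY switch and the example partition /dev/sdb1 are all assumptions made for illustration. By default it only prints what it would do.

```shell
# Print each command; only execute it when APPLY=1 is set (hypothetical switch).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# $1 = array, $2 = member being replaced, $3 = new member (may equal $2).
swap_member() {
    run mdadm "$1" --fail "$2"      # step 1a: mark the old member faulty
    run mdadm "$1" --remove "$2"    # step 1b: detach it from the array
    run mdadm "$1" --add "$3"       # step 2: add the new member, resync starts
}
```

Calling swap_member /dev/md0 /dev/sdb1 /dev/sdb1 prints the three commands for review. If the member already shows as removed, the first two commands are likely to fail and only the --add is needed.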
|
posted 2006-Feb-8, 10pm AEST
|
|
User #12855 154 posts
Forum Regular
|
ok, ill try swapping the disks around when i get the chance and see what happens. i didn't know grub (or bootloaders) were so fussy.
§tr!deя, i have performed Step 2, 3, 5, 6, and 7. is it necessary to perform step 1? or is it safe to jump right into step 2 (using mdadm /dev/md0 -a /dev/sdb)? also with regards to step 4, could you show me how you recreate your grub entry of both disks? i have been experimenting with many different ways, including "grub-install" - but the most failsafe method seems to be running grub, then doing device, root, setup. is there an easier way?
|
posted 2006-Feb-8, 11pm AEST
|
|
User #12855 154 posts
Forum Regular
|
looks like i have a bigger problem now. after further testing, it seems i can no longer boot from md0.
after md0 was resync'd with sda & sdb plugged in together, i rebooted and started getting errors after loading the kernel such as missing files and segmentation faults. specifically, the errors occurred right after it said 2 of 2 drives now ready for md0 (or words to that effect)...
so i unplugged sdb again, and guess what?? it booted fine! so does this mean that raid arrays cannot be reconstructed when one of the drives is already mounted? or is there something more sinister going on here? definitely appears as though the "mirror" copy on sdb is somewhat "shattered"! hopefully there's no such thing as cyber-superstition ;-)
ps - i did wait until md0 completed resyncing before i hit reboot...
|
posted 2006-Feb-9, 10pm AEST
|
|
User #32109 6420 posts
Whirlpool Forums Addict
|
henry writes... is it necessary to perform step 1?
Generally I do.... mainly so the md doesn't get in a weird state (it's only one extra command... and ensures things go smoothly).
After reading your latest post... sounds like your grub 'install' could be failing.
What are you using for the device and root ?
|
posted 2006-Feb-9, 10pm AEST
|
|
User #98433 750 posts
Whirlpool Enthusiast
|
If your /boot is RAID-1 software mirrored with md, things are not so simple.
I only have experience with /dev/hd* (IDE/PATA) devices.
To answer your burning question: yes, md is designed to allow you to resync live mounted volumes, they don't even have to be reduced to read-only - write performance (and performance overall) greatly suffers but it's all supposed to work "on the fly".
I've even pulled the IDE cable on a junk machine (no, it is not supposed to be hot-swapped :-) whilst doing a file copy, re-inserted the cable, did an IDE reset with hdparm and successfully re-synced the drive... the copy was paused for a few seconds whilst md waited for the "bad" drive to time out, but otherwise completed successfully.
Don't forget you have to partition the new drive. I have the partition table dumped somewhere with sfdisk and re-import it to create the partitions with one command.
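The dump-and-re-import trick can be sketched with sfdisk's -d (dump) option. Treat this as an illustration only: the function names and the SFDISK variable are hypothetical, and the device names in the usage note are examples.

```shell
# SFDISK is a hypothetical override hook so the commands can be rehearsed
# with a stand-in; left unset, the real sfdisk is used.

# Save the partition table of disk $1 into file $2.
save_layout() {
    ${SFDISK:-sfdisk} -d "$1" > "$2"
}

# Write the table stored in file $2 onto disk $1 (destroys its old table).
restore_layout() {
    ${SFDISK:-sfdisk} "$1" < "$2"
}
```

For example, save_layout /dev/sda sda.layout on the healthy disk, then restore_layout /dev/sdb sda.layout on the blank replacement. Double-check the target device first, since the restore overwrites whatever partition table is there.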
The tricky bit with the boot-loader is booting off the secondary drive... you see, if the primary disk is still detected by the BIOS, the BIOS will try to boot from the primary disk. If it has failed completely, the BIOS will try to boot from the secondary disk, but it will be treated as the primary... or something... it gets a bit hazy here: but when you install the boot-loader, you have to force the installer utility to install the boot-loader as if the secondary disk was actually the primary one.
Below is the GRUB session to install a boot-loader on hdc such that with the primary disk missing, the system can boot from this secondary disk: taken from TLDP: www.tldp.org/HOWTO/Softw...WTO-7.html#ss7.3
grub
grub>device (hd0) /dev/hdc
grub>root (hd0,0)
grub>setup (hd0)
I'm not sure if that will help on your (SATA?) /dev/sd* disks...
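The same GRUB session can be generated for any disk and fed to the GRUB shell non-interactively. A small sketch, assuming GRUB legacy (whose shell accepts --batch) and assuming, as in the session above, that the boot files live on the first partition; grub_commands_for is a made-up name.

```shell
# Emit the GRUB (legacy) shell commands that install the boot loader on
# disk $1 while telling GRUB to treat that disk as the first BIOS drive.
grub_commands_for() {
    printf 'device (hd0) %s\nroot (hd0,0)\nsetup (hd0)\nquit\n' "$1"
}

# Review the output first, then feed it to the GRUB shell, e.g.:
#   grub_commands_for /dev/sdb | grub --batch
```

Running it once per mirror member (for instance /dev/sda and then /dev/sdb) should leave each disk able to boot on its own, because each copy is set up as if it were hd0.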
|
posted 2006-Feb-9, 11pm AEST
|
|
User #12855 154 posts
Forum Regular
|
§tr!deя writes... Generally I do.... mainly so the md doesn't get in a weird state (it's only one extra command... and ensures things go smoothly).
After reading your latest post... sounds like your grub 'install' could be failing.
What are you using for the device and root ?
i've confirmed that once sdb has been disconnected, then reconnected, it already appears as "removed" when i do a "mdadm --detail /dev/md0". so i guess it's safe to skip the step 1 since it already seems to be done (correct me if im wrong!).
for grub, i let debian installer install into the mbr of /dev/sda. after installation has completed (including after base-config), i run grub from bash prompt, then do:
device (hd2) /dev/sdb
root (hd2,0)
setup (hd2)
funny thing is, grub thinks this is the order of my hard disks when already booted into linux:
hd0 = /dev/hda (ide hdd)
hd1 = /dev/sda (sata hdd)
hd2 = /dev/sdb (sata hdd)
but when i enter the command line mode within the grub bootloader, it thinks otherwise:
hd0 = /dev/sda (sata hdd)
hd1 = /dev/hda (ide hdd)
hd2 = /dev/sdb (sata hdd)
i've confirmed these drive mappings using "geometry" and "cat" commands in grub. it would be much appreciated if someone could shed some light on this discrepancy im seeing here...
csirac2 writes... If your /boot is RAID-1 software mirrored with md, things are not so simple.
I only have experience with /dev/hd* (IDE/PATA) devices.
To answer your burning question: yes, md is designed to allow you to resync live mounted volumes, they don't even have to be reduced to read-only - write performance (and performance overall) greatly suffers but it's all supposed to work "on the fly".
I've even pulled the IDE cable on a junk machine (no, it is not supposed to be hot-swapped :-) whilst doing a file copy, re-inserted the cable, did an IDE reset with hdparm and successfully re-synced the drive... the copy was paused for a few seconds whilst md waited for the "bad" drive to time out, but otherwise completed successfully.
Don't forget you have to partition the new drive. I have the partition table dumped somewhere with sfdisk and re-import it to create the partitions with one command.
The tricky bit with the boot-loader is booting off the secondary drive... you see, if the primary disk is still detected by the BIOS, the BIOS will try to boot from the primary disk. If it has failed completely, the BIOS will try to boot from the secondary disk, but it will be treated as the primary... or something... it gets a bit hazy here: but when you install the boot-loader, you have to force the installer utility to install the boot-loader as if the secondary disk was actually the primary one.
Below is the GRUB session to install a boot-loader on hdc such that with the primary disk missing, the system can boot from this secondary disk: taken from TLDP: www.tldp.org/HOWTO/Softw...WTO-7.html#ss7.3
grub
grub>device (hd0) /dev/hdc
grub>root (hd0,0)
grub>setup (hd0)
I'm not sure if that will help on your (SATA?) /dev/sd* disks...
thank you for getting to the point so quickly in your post!!! i really appreciate your direct answer to the "burning question"! :) your experience has obviously proven that md is very robust - even in worse-case scenarios (i personally consider physically yanking out ide cables while the computer is on a worse-case scenario!).
just wondering, when you reinserted the ide cable, did you need to repartition the drive again? did md gracefully recover from the reinsertion without repartitioning? so am i correct in thinking the definition of a "new" hdd (in this context) refers to a blank hdd freshly baked from the factory? i'm still a little confused here because i started getting missing files & segmentation faults during bootup after both sda & sdb were resync'd. but these errors strangely disappeared once sdb was removed. as explained in above posts, sdb was the drive i originally unplugged, then reinserted using mdadm.
the info about grub has been very useful. my bios did try and boot from the 2nd sata disk, once the 1st sata was unplugged. the bit about forcing the secondary into a primary seems tricky though. with sda (primary/1st sata) unplugged, if i issue the commands:
grub
grub>device (hd0) /dev/sdb
grub>root (hd0,0)
grub>setup (hd0)
as per your example, then shutdown, replugged sda again, what would happen? would i need to redo grub again on both sda and sdb?
rest assured, your help has been directly applicable to the sata world as well! =)
|
posted 2006-Feb-11, 11am AEST
edited 2006-Feb-11, 11am AEST
|
|
User #12533 1685 posts
Whirlpool Enthusiast
|
Regarding the bootloader issue... I haven't had much luck with grub so i tend to use lilo, for these sorts of things. If you were using lilo you would need to add the following to your lilo.conf. basically tells lilo to install bootloader on more than one disk.
raid-extra-boot = /dev/sda,/dev/sdb
|
posted 2006-Feb-11, 12pm AEST
|
|
User #98433 750 posts
Whirlpool Enthusiast
|
henry writes... just wondering, when you reinserted the ide cable, did you need to repartition the drive again?
No, I'm talking about adding a blank or foreign drive. It is actually quite important... drives that have been used in a RAID volume will be "fingerprinted". If you try to add a disk from a foreign volume into a different one, it takes a few extra steps because md tries to be smart and is reluctant to overwrite a member of a foreign volume.
As for why you were getting segfaults... I'd like to know at what stage. Did the system boot as far as getting into single-user? If so, then I do not know why this happened to you. It could be for a reason unrelated to software RAID - perhaps there's a setting you need to tweak for your SATA controller, a different kernel version, BIOS settings, etc but I had a hunch it might be something to do with the boot loader (if the errors were very early on in the boot process).
When you "unplugged", it was with the machine turned off. And you still managed to boot from the second disk in the mirror with the first missing.
Did the kernel finish loading? Did it segfault on services that were trying to be started? It could even be something as simple as an unclean shutdown forcing the system to mount your file systems as read-only on the next bootup. How many times did you try booting off sdb?
As for drive numbering: I'm not sure, but I imagine that under Linux, grub gets the numbering from the Linux kernel somehow. From the command-line, I suppose it gets the numbering from the BIOS perhaps... that's just a guess.
If you're concerned about making grub behave consistently - (I was a long-time LILO user and I get annoyed by GRUB's "intelligence" sometimes, and so Vinco has a point) - there's a useful config keyword "fallback", you can read about it here: www.gnu.org/software/gru...fallback-systems
I'm just waiting for the day GRUB can read my E-Mail..
The real reason that made me change from LILO to GRUB was when I started building systems which had to boot from LVM running on top of software RAID, which LILO at the time had issues with (or perhaps in combination with a jfs /boot volume, can't remember...)
Although LILO is a lot more "static", it's nice not having to second-guess GRUB's behaviour.
|
posted 2006-Feb-11, 1pm AEST
edited 2006-Feb-11, 1pm AEST
|
|
User #12855 154 posts
Forum Regular
|
Vinco writes... Regarding the bootloader issue... I haven't had much luck with grub so i tend to use lilo, for these sorts of things. If you were using lilo you would need to add the following to your lilo.conf. basically tells lilo to install bootloader on more than one disk.
raid-extra-boot = /dev/sda,/dev/sdb
i've almost had it with grub ;-) i will try a few more times, but if i still can't get it to work properly, i will undoubtedly follow your advice and use lilo!
csirac2 writes... No, I'm talking about adding a blank or foreign drive. It is actually quite important... drives that have been used in a RAID volume will be "fingerprinted". If you try to add a disk from a foreign volume into a different one, it takes a few extra steps because md tries to be smart and is reluctant to overwrite a member of a foreign volume.
thanks for clarifying that csirac2! when you mention fingerprinted, i assume you're referring to the raid superblocks created by raid. on a side note, how do you get rid of these superblocks? while reinstalling debian over and over again, partman strangely experienced bouts of déjà vu, remembering the raid filesystem of md0 (e.g. ext3) and md1 (e.g. swap). i thought a (dos) fdisk /mbr would have wiped them clean. ive noticed mdadm has the ability to wipe these superblocks, but i haven't been able to clear them properly as they still reappear in partman.
As for why you were getting segfaults... I'd like to know at what stage. Did the system boot as far as getting into single-user? If so, then I do not know why this happened to you. It could be for a reason unrelated to software RAID - perhaps there's a setting you need to tweak for your SATA controller, a different kernel version, BIOS settings, etc but I had a hunch it might be something to do with the boot loader (if the errors were very early on in the boot process).
Did the kernel finish loading? Did it segfault on services that were trying to be started? It could even be something as simple as an unclean shutdown forcing the system to mount your file systems as read-only on the next bootup. How many times did you try booting off sdb?
basically, the segfaults appear after md0 is ready with 2 of 2 devices online. i assume this is when the kernel has fully loaded. although i can't do a copy + paste of this (this would really help here i guess), it basically says can't find /etc/init.d/xxxx and/or it repeats some attempt to run a command 10 times in a row and then i get an error saying that it respawned too quickly; waiting 5 minutes before retrying. at other times, i get the maintenance (enter root password)/Ctrl+D option appearing, but if i press Ctrl+D, it just repeats the same option. typing in the (correct) root password does nothing, simply does the same thing as Ctrl+D. only way to get out of this is to do a hard reset (Ctrl+Alt+Del soft reset is not good enough) or power off using the power button. i think (but don't quote me on this) this also happened in single-user mode - there was an option in the grub menu (setup by Debian) to enter recovery mode, which I believe is single-user mode.
the system was cleanly shutdown after resyncing both sda & sdb. fsck on startup displayed that / (md0) was clean. ive only tried booting with sdb alone a few times.
u might be right saying its the SATA controller/BIOS. i have an onboard Silicon Image 3112 and according to some googling, there may be data corruption issues (which also affects Windows). i will immediately investigate this. if i find this to be the issue, then md is NOT to blame (as i should be). and this issue has obviously been overcomplicated by data corruption from the SATA controller/BIOS. nevertheless, grub bootloader issues will remain methinks. i shall find out soon enough...
When you "unplugged", it was with the machine turned off. And you still managed to boot from the second disk in the mirror with the first missing.
yes, the machine was turned off to the point of the motherboard's +5VSB being turned off (power cable physically disconnected from the wall's power point). and yes, i still managed to boot from the second disk in the mirror with the first missing (but that was before things corrupted... will have to reconfirm this).
As for drive numbering: I'm not sure, but I imagine that under Linux, grub gets the numbering from the Linux kernel somehow. From the command-line, I suppose it gets the numbering from the BIOS perhaps... that's just a guess.
If you're concerned about making grub behave consistently - (I was a long-time LILO user and I get annoyed by GRUB's "intelligence" sometimes, and so Vinco has a point) - there's a useful config keyword "fallback", you can read about it here: www.gnu.org/software/gru...fallback-systems
I'm just waiting for the day GRUB can read my E-Mail..
The real reason that made me change from LILO to GRUB was when I started building systems which had to boot from LVM running on top of software RAID, which LILO at the time had issues with (or perhaps in combination with a jfs /boot volume, can't remember...)
Although LILO is a lot more "static", it's nice not having to second-guess GRUB's behaviour.
i think grub may be reading from the BIOS's boot order when in the bootloader. but in linux, it may be reading from elsewhere. if this is the case, i would consider this a bug (unless, of course, the BIOS is dynamically changing the boot order depending on whether it was in protected mode or not (im assuming that the bootloader is running in 16bit unprotected mode - correct me if im wrong)).
ive also wanted to go the LVM+RAID route, but after Debian had been installed, it seems to throw up all these errors at boot (regardless of 2.4 or 2.6 series kernel) when loading the LVM stuff. ive confirmed this running vmware (you can't blame faulty/buggy hardware here!), so it will be a while (debian 3.2 anyone?) before i feel confident enough to use LVM with RAID.
thanks again for taking your time to help! im sure we will get to the bottom of this!! :)
|
posted 2006-Feb-11, 4pm AEST
|
|
User #98433 750 posts
Whirlpool Enthusiast
|
henry writes... thanks for clarifying that csirac2! when you mention fingerprinted, i assume you're referring to the raid superblocks created by raid. on a side note, how do you get rid of these superblocks?
It's a bit crude, but dd if=/dev/zero of=/dev/hda has always worked for me. You obviously don't have to do the whole disk, I just ctrl-c after I'm satisfied the superblocks are zeroed.
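One caveat worth sketching: with the version-0.90 md metadata (assumed here, since it was the mdadm default in that era) the superblock sits near the end of each member, not the start, so an interrupted wipe from the front may never reach it. The arithmetic below shows where it would be; sb090_offset is a made-up helper name.

```shell
# Sector at which a version-0.90 md superblock starts on a member of
# $1 sectors (512-byte units): round down to a 64 KiB boundary (128
# sectors), then step back one more 64 KiB block.
sb090_offset() {
    echo $(( ($1 & ~127) - 128 ))
}

# The targeted alternative to zero-filling, run per former member while
# it is not part of a running array (example device name):
#   mdadm --zero-superblock /dev/sdb1
```

For a member of 1000000 sectors this gives 999808, i.e. the superblock occupies the last 64 KiB-aligned block but one, which is why only a complete zero fill or --zero-superblock can be relied on to remove it.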
basically, the segfaults appear after md0 is ready with 2 of 2 devices online. i assume this is when the kernel has fully loaded. although i can't do a copy + paste of this (this would really help here i guess),
If you're feeling adventurous and have a second computer with a NULL modem cable, you could boot GRUB/LILO with the console configured on the serial port. Then you could capture the boot messages with the second computer. Details here: www.tldp.org/HOWTO/Remot...ADER-GRUB-SERIAL
it basically says can't find /etc/init.d/xxxx and/or it repeats some attempt to run a command 10 times in a row and then i get an error saying that it respawned too quickly; waiting 5 minutes before retrying.
Reminds me of a problem I had at one stage with the getty login prompts not successfully coming up... I recall it was a filesystem problem, /var/run/utmp had something wrong with it. IIRC I had to boot from a KNOPPIX CD, do fsck, mount read-write, and clean out /var/run manually (stale PID/lock files or something). I'm not saying this is necessarily the cause of your problems, even if the same fix worked for you it's really only treating a symptom and not the cause.
ive also wanted to go the LVM+RAID route, but after Debian had been installed, it seems to throw up all these errors at boot (regardless of 2.4 or 2.6 series kernel) when loading the LVM stuff.
It can be done (I've done it), but it requires messing with initrd images and stuff.
The simplest way to do this comfortably via the stock Debian installer (no post-install "cooking" of scripts/kernel/etc) is to create a separate partition of around 100MB which will be software RAID-1 only (no LVM) and use it for /boot.
Then, create your root / (and other, if applicable) mountpoints on the remaining partition space using LVM on top of software RAID.
I'm pretty sure the current Debian installer is quite good at preventing you from configuring the system into an unbootable state (it even knows what combinations of RAID/LVM/filesystem types GRUB and LILO will/won't work with!), as long as you stick to using the Debian installer interface and don't try to do anything sneaky behind its back...
It's been a while since I did all this, I could be forgetting something, but that should put you in a useful direction anyway... I'm very impressed with the installer for Sarge/3.1, it's lightyears ahead of what we put up with in Woody/3.0!
Good luck.
|
posted 2006-Feb-11, 6pm AEST
|
|
User #12855 154 posts
Forum Regular
|
csirac2 writes... It's a bit crude, but dd if=/dev/zero of=/dev/hda has always worked for me. You obviously don't have to do the whole disk, I just ctrl-c after I'm satisfied the superblocks are zeroed.
looks like ill be zero filling the disk before i put this system into any serious use!
If you're feeling adventurous and have a second computer with a NULL modem cable, you could boot GRUB/LILO with the console configured on the serial port. Then you could capture the boot messages with the second computer. Details here: www.tldp.org/HOWTO/Remot...ADER-GRUB-SERIAL
Reminds me of a problem I had at one stage with the getty login prompts not successfully coming up... I recall it was a filesystem problem, /var/run/utmp had something wrong with it. IIRC I had to boot from a KNOPPIX CD, do fsck, mount read-write, and clean out /var/run manually (stale PID/lock files or something). I'm not saying this is necessarily the cause of your problems, even if the same fix worked for you it's really only treating a symptom and not the cause.
fortunately, i wont need to go out and buy that serial cable. i have confirmed there is indeed data corruption occurring in my sata raid array! although this is bad news, its also good news in the sense that i finally know why things were breaking and why segfaults/missing file errors were appearing after resyncing.
to test this theory, i created a 10GB file on md0 filled with zeros-only. then i ran a test to spit out any non-zeros in that file. and spit out non-zeros it did! for the benefit of others, this data corruption bug affects systems with an onboard Silicon Image 3112 SATA controller AND nForce 2 chipset. you must update the motherboard BIOS to the latest revision and then enter the CMOS setup to set the "EXT-P2P's Discard Time" to 1ms. otherwise data corruption will occur if you use SATA drives.
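The post does not say which tools were used for the zero-file test, so the following is just one possible way to do it. It counts every byte that is not NUL; count_nonzero is a hypothetical name and the path in the usage note is an example.

```shell
# Count the bytes in file $1 that are not NUL. For a file written purely
# from /dev/zero the answer should be 0; anything else means the data
# changed somewhere between writing and reading back.
count_nonzero() {
    # tr deletes the NUL bytes, wc counts what is left; the arithmetic
    # expansion strips any padding some wc implementations add.
    echo $(( $(tr -d '\000' < "$1" | wc -c) ))
}
```

A test file can be made with something like dd if=/dev/zero of=/zeros.bin bs=1024 count=10485760 (about 10GB) and then checked with count_nonzero /zeros.bin. Reading it back straight away may be served partly from the page cache, so repeating the check after a reboot is probably more telling.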
it would be really nice if the linux module/kernel developers implement a workaround for this "bug" - basically detect if this combination of hardware is used, and then force EXT-P2P's Discard Time=1ms; just like how the Pentium FPU bug workaround is enabled at bootup if u have such a system.
It can be done (I've done it), but it requires messing with initrd images and stuff.
The simplest way to do this comfortably via the stock Debian installer (no post-install "cooking" of scripts/kernel/etc) is to create a separate partition of around 100MB which will be software RAID-1 only (no LVM) and use it for /boot.
Then, create your root / (and other, if applicable) mountpoints on the remaining partition space using LVM on top of software RAID.
I'm pretty sure the current Debian installer is quite good at preventing you from configuring the system into an unbootable state (it even knows what combinations of RAID/LVM/filesystem types GRUB and LILO will/won't work with!), as long as you stick to using the Debian installer interface and don't try to do anything sneaky behind its back...
that's exactly what i did. i left /boot on its own as md0, then md1 was dedicated to lvm. the system boots and seems to work fine, except there'd be all these lvm errors appearing during bootup. but if u run evms (ncurses version), it will warn u about some disparity. im sure all these errors are easily recreated - just need the time to reinstall it. anyone care to volunteer? don't forget that vmware gsx server is only US$0 now (it used to be US$2800)! ;-)
It's been a while since I did all this, I could be forgetting something, but that should put you in a useful direction anyway... I'm very impressed with the installer for Sarge/3.1, it's lightyears ahead of what we put up with in Woody/3.0!
Sarge installer is far cleaner than Woody's, but i do miss the amount of power (or was that verbosity?) in Woody's installer. some things in sarge installer annoy me (like the inability to specify http proxy for security updates in base-config - when using non-expert mode; i dont consider http proxy settings an expert feature). im confident that etch (3.2's codename iirc) will be far more refined and intuitive!
Good luck.
Thank you csirac2! I'll need it!! :-)
|
posted 2006-Feb-12, 9pm AEST
edited 2006-Feb-12, 9pm AEST
|
|