We are working on a set of upgrades to our environment, replacing a set of very stable but now old IBM x3650 servers ( currently running 5.7 ) with a set of new Dell R710 servers.
The new Dell servers are on Oracle Linux 6.2 ... using the Red Hat compatible kernel, aka:
The new Dell boxes have an internal RAID controller ( PERC H700 ? ) and are connected to EMC direct attached storage using Emulex HBAs. All of the operating system and Linux software is installed on internal disks ( mirrored ) ... all database stuff is going on the EMC storage.
Our new servers showed a very strange set of behaviors when booting from the internal disks. Most of the time they would boot up and see the first internal RAID drive as /dev/sda ( so the /boot partition is on /dev/sda1 ) ... but at other times they would see /boot on a different device ( for example /dev/sdi1 ).
The entries in /etc/fstab on 6.x systems now use UUIDs instead of device names ... ( for example ):
UUID=e6964e7e-62a9-450c-a66e-a411b40a4ed9 / ext4 defaults 1 1
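On a running system, blkid shows which UUID belongs to which device, so you can tell the entries apart even when the drive letters move around. Below is a minimal sketch of pulling the UUID out of an fstab root entry ... the sample line is just the one above, and the blkid call is shown as a comment since its output depends on the box:

```shell
# On a live system: blkid /dev/sda1
# prints something like: /dev/sda1: UUID="..." TYPE="ext4"
# Sample fstab line ( same style as the entry above ):
fstab_line='UUID=e6964e7e-62a9-450c-a66e-a411b40a4ed9 / ext4 defaults 1 1'
# Pull the UUID out of the root filesystem entry:
root_uuid=$(echo "$fstab_line" | awk '$2 == "/" && $1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }')
echo "$root_uuid"
```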
So when the servers came up on a different boot drive they would still run ok ... it just looked strange ... but we ran into a real problem using ( still trying to use ... don't get me started ) a Linux backup imaging product ( Acronis ) that simply did not understand backing up or restoring a system that was not running from /dev/sda.
Logically it seemed pretty straightforward: somehow force the first internal drive to always come up as /dev/sda.
We pay Oracle for Linux support so we opened a ticket with them. We now have a solution, but it took a very very long time for Oracle Linux support to come up with it. Might be a byproduct of working with a junior level person ... might be a strange new problem. We initially tried all sorts of stuff with udev rules ... nope, none of that worked at all.
Eventually the solution that is now deployed and working involved removing the lpfc modules ( the Emulex HBA driver ) from the initramfs image that is invoked on first boot. Of course we run stuff on EMC storage, and yes, after booting our HBAs are working just fine.
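Worth noting: this only keeps lpfc out of the initramfs ... the driver is not blacklisted, so it still loads later from the installed kernel, which is why the EMC storage comes up fine after boot. A quick post-boot sanity check looks something like this ( the lsmod / modprobe lines are the real check and are left as comments; the grep runs against canned lsmod-style output so the sketch runs anywhere ):

```shell
# Real check after boot ( commented out here ):
#   lsmod | grep lpfc     # should show the Emulex driver loaded
#   modprobe lpfc         # load it by hand if the EMC paths are not up yet
# Same grep against canned lsmod-style output, so this sketch is self-contained:
lsmod_output='lpfc 655360 4
scsi_transport_fc 49152 1 lpfc'
if echo "$lsmod_output" | grep -q '^lpfc '; then
    echo "lpfc loaded"
fi
```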
Anyway, here is what we had to do to get this working in our 6.2 Red Hat compatible kernel environment. It is some low level, pretty esoteric Linux stuff and well beyond what I wanted to have to deal with ... but it is working nicely.
Step 1: Get the latest available dracut RPMs and put them into a directory for updating:
dracut]# ls -ltr | more
Step 2: Update to the latest RPMs ... ( not sure why the 100% ... 50% ... 100% progress output is gone from below ):
rpm -Uvh dracut*.rpm | more
Step 3: Verify installation of the new dracut RPMs:
# rpm -qa | grep dracut
Step 4: Now change to the /boot directory and create a new initramfs image file with the lpfc drivers left out.
Use this command ( --omit-drivers takes the list of kernel drivers to leave out; with no kernel version argument, dracut builds for the running kernel ):

dracut --omit-drivers lpfc initramfs-$(uname -r)-no-lpfc.img
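Since dracut defaults to the running kernel here, the $(uname -r) in the image name keeps the file name and the kernel in sync automatically. A tiny sketch of how that name gets assembled:

```shell
# Image name built from the running kernel version, matching the dracut command above:
kver=$(uname -r)
img="initramfs-${kver}-no-lpfc.img"
echo "$img"
```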
Step 5: Check that the img file was created ...
# ls -ltr *.img | more
-rw-r--r-- 1 root root 15875365 Jan 11 13:39 initramfs-2.6.32-220.el6.x86_64-no-lpfc.img
Step 6: Verify that no lpfc modules are in the new initramfs image file:
# zcat *no-lpfc.img | cpio -t | grep lpfc | more
The above ( empty ) output is correct ... no matches means lpfc is gone. If instead you see something like this ... lpfc is still in the img file:
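As a side note, grep's exit status ( non-zero when nothing matches ) makes this check easy to script. A minimal sketch, using a simulated module listing in place of the real zcat | cpio -t output:

```shell
# Simulated "cpio -t" listing standing in for the real image contents:
modlist='lib/modules/2.6.32-220.el6.x86_64/kernel/drivers/scsi/megaraid'
if echo "$modlist" | grep -q lpfc; then
    echo "lpfc still present"
else
    echo "lpfc removed OK"
fi
```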
Final step ... create an entry in /etc/grub.conf that points to the new initramfs img file.
First, copy the current /etc/grub.conf off to something else as a backup.
Change the default= value to point to the new lines at the end of the /etc/grub.conf file. My change was from default=1 to default=2.
Add in new lines at the end of grub.conf ... my entries looked like this ( this is just part of my grub.conf file ).
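For illustration only ( this is not my actual file ... the title, device, and UUID below are made-up placeholders ), a RHEL 6 style grub.conf stanza pointing at the new image looks along these lines:

```
title Oracle Linux Server (2.6.32-220.el6.x86_64) no-lpfc
        root (hd0,0)
        kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=UUID=e6964e7e-62a9-450c-a66e-a411b40a4ed9
        initrd /initramfs-2.6.32-220.el6.x86_64-no-lpfc.img
```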
At this point the change should be complete ... start rebooting and test ... do we always come up on /dev/sda?
For me yes this finally fixed the problem.
My guess is that I will have to revisit all of this on the next OL update. We will probably sit out 6.3 and eventually move from 6.2 up to 6.4 ... and probably will have to rebuild the initramfs image and of course test again.
I hope this saves some other poor geek time ... it sure took us and oracle support a long time to get this working correctly!