Thursday, May 28, 2009

Smoke and Mirrors - Veritas Disaster Recovery: Part 1: An unbootable root disk

I know my Data is there! I just mirrored it! At least it said it did?

Sometimes, for whatever reason, Veritas Volume Manager will mirror successfully but the mirror disk won't boot.
Perhaps you forgot to run vxbootsetup, or it just didn't work correctly.

What if your primary root mirror has now died and you have to get the data off the alternate disk, but it won't boot?
Using some spare hardware, here's what I went through.

I moved the un-bootable (Solaris 8) SCSI disk (mirror) from my Sunfire 4800 (D240 tray) into a spare Sunfire V240 system in my lab. It has 4 SCSI disk slots so it's a perfect recovery platform in this case.
We installed Veritas Volume Manager 4.1 on Solaris 10 and encapsulated our root drive.
Here's the gotcha: to keep the new rootdg (the Veritas root disk group) separate from the rootdg on the disk we are recovering, we changed the default name from rootdg to spamdg.

Once our Solaris 10 system disk was encapsulated and booted, we could plug our bad mirror into a spare disk slot, plug a new disk into another spare slot, and ufsdump the data to the new disk.
Here are the main steps.
  • Veritas VxVM 4.1 was installed on the V240 running Solaris 10
  • The root disk (disk0 slot) was encapsulated with rootdg renamed to spamdg (call it what you want, just NOT rootdg or bootdg)
  • The recovery source disk was installed in slot 3
  • A new target disk, the same size or larger, was put in another spare slot
  • The system was booted and the source disk's rootdg was imported
  • The root (/), var and opt filesystems were mounted
  • New slices were created on the target disk
  • The target disk was newfs'ed and mounted
  • ufsdump was used to copy the data to the new disk
  • The new disk was manually un-encapsulated
  • The new disk was returned to the original Sunfire 4800 system, where it is bootable
  • The new disk was remirrored for redundancy
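The copy step in that list can be sketched as a ufsdump piped into ufsrestore. This is only a rough sketch, not the exact commands I ran: the function name copy_fs, the DRY_RUN switch and the /target mount point are made up for illustration, and it assumes the target slice has already been newfs'ed and mounted.

```shell
# Sketch: copy one mounted UFS filesystem onto a freshly newfs'ed, mounted target.
# copy_fs, DRY_RUN and the paths in the example are hypothetical names.
copy_fs() {   # copy_fs <source-mountpoint> <target-mountpoint>
    src=$1
    dst=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be run
        echo "ufsdump 0f - $src | (cd $dst && ufsrestore rf -)"
    else
        # Level 0 dump to stdout, restored into the target directory
        ufsdump 0f - "$src" | (cd "$dst" && ufsrestore rf -)
    fi
}

# Example (hypothetical mount points):
#   copy_fs /mnt /target
```

ufsrestore leaves a restoresymtable file in the target directory, which can be removed once the copy is finished.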
Recovery platform:
  • I used a spare Sunfire V240 running Solaris 10 (it could have been Solaris 8 or 9).
  • It has 4 disk slots, numbered disk 0, 1, 2 and 3. I am using only disk0 as the boot disk.
  • I have the install CD-ROM for Veritas Storage Foundation 4.1.

Detailed steps

1. Remove the un-bootable Solaris 8 disk from the F4800 system's D240 media tray.

2. Install Veritas Volume Manager (VxVM) on the V240 using the Veritas Storage Foundation install CD.

You will not be able to read the data on the disk unless you are on a system running Veritas Volume Manager.

3. Encapsulate the root disk using the vxdiskadm menu. When asked to create the rootdg (root disk group), change the default name to prevent confusion with the rootdg on your recovery source disk.

4. Plug the Solaris 8 disk (the recovery source disk) into slot 3 of the V240.

5. If you have hot-plugged the disk, run this command to get Solaris and VxVM to see it.

# vxdiskconfig

VxVM INFO V-5-2-1401 This command may take a few minutes to complete execution
Executing Solaris command: devfsadm (part 1 of 2) at 11:14:15 EDT
Executing VxVM command: vxdctl enable (part 2 of 2) at 11:14:29 EDT
Command completed at 11:15:08 EDT

6. Import the rootdg of this disk
# vxdg -Cf import rootdg
VxVM vxdg WARNING V-5-1-560 Disk rootdisk1: Not found, last known location: c0t0d0s2

The warning appears because this disk was formerly one half of a mirrored pair, and the other half (rootdisk1) is not present.

7. Do vxdisk list and vxprint to see the disks and volumes

# vxdisk list

DEVICE       TYPE            DISK         GROUP        STATUS
c1t0d0s2     auto:sliced     spamdg01     spamdg       online
c1t3d0s2     auto:sliced     rootdisk3    rootdg       online
-            -               rootdisk1    rootdg       failed was:c0t0d0s2
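If the box has many disks, the listing can be filtered down to one disk group. The helper below is just a sketch: disks_in_group is a name I made up, and it assumes the five-column layout shown above. On Solaris the stock /usr/bin/awk does not accept -v, so nawk or /usr/xpg4/bin/awk would be needed in place of awk.

```shell
# Sketch: show disk name, device and status for one disk group.
# Reads `vxdisk list` output on stdin; disks_in_group is a hypothetical helper.
disks_in_group() {   # disks_in_group <diskgroup>
    # Column 4 is GROUP, column 3 is DISK, column 1 is DEVICE, column 5 is STATUS
    awk -v dg="$1" '$4 == dg { print $3, $1, $5 }'
}

# Example:
#   vxdisk list | disks_in_group rootdg
```

With the listing above, this prints one line for rootdisk3 (online) and one for the missing rootdisk1 (failed).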


# vxprint -g rootdg

Disk group: rootdg <<<<<<<~~~~~~~~~~~~here it is ~~~~~~~~

TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0
dg rootdg rootdg - - - - - -
dm rootdisk1 - - - - NODEVICE - -
dm rootdisk3 c1t3d0s2 - 35363560 - - - -
sd rootdisk1Priv - DISABLED 4711 - NODEVICE - PRIVATE
v opt gen DISABLED 8528720 - ACTIVE - -
pl opt-01 opt DISABLED 8528720 - NODEVICE - -
sd rootdisk1-03 opt-01 DISABLED 8528720 0 NODEVICE - -
pl opt-02 opt DISABLED 8528720 - ACTIVE - -
sd rootdisk3-01 opt-02 ENABLED 8528720 0 - - -
v rootvol root DISABLED 14338616 - ACTIVE - -
pl rootvol-01 rootvol DISABLED 14338616 - NODEVICE - -
sd rootdisk1-B0 rootvol-01 DISABLED 1 0 NODEVICE - Block0
sd rootdisk1-02 rootvol-01 DISABLED 14338615 1 NODEVICE - -
pl rootvol-02 rootvol DISABLED 14338616 - ACTIVE - -
sd rootdisk3-03 rootvol-02 ENABLED 14338616 0 - - -
v swapvol swap DISABLED 4094728 - ACTIVE - -
pl swapvol-01 swapvol DISABLED 4094728 - NODEVICE - -
sd rootdisk1-01 swapvol-01 DISABLED 4094728 0 NODEVICE - -
pl swapvol-02 swapvol DISABLED 4094728 - ACTIVE - -
sd rootdisk3-04 swapvol-02 ENABLED 4094728 0 - - -
v var gen DISABLED 8194168 - ACTIVE - -
pl var-01 var DISABLED 8194168 - NODEVICE - -
sd rootdisk1-04 var-01 DISABLED 8194168 0 NODEVICE - -
pl var-02 var DISABLED 8194168 - ACTIVE - -
sd rootdisk3-05 var-02 ENABLED 8194168 0 - - -


8. Start the volumes so you can mount them (or reboot, after which the volumes are enabled).

# vxvol -g rootdg start rootvol
# vxvol -g rootdg start swapvol
# vxvol -g rootdg start var
# vxvol -g rootdg start opt
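The four commands can also be wrapped in a small loop. A minimal sketch, assuming the volume names from the vxprint output above; start_volumes and the DRY_RUN switch are names I made up, not part of VxVM.

```shell
# Sketch: start a list of volumes in one disk group.
# start_volumes and DRY_RUN are hypothetical; vxvol is the real VxVM command.
start_volumes() {   # start_volumes <diskgroup> <volume>...
    dg=$1
    shift
    for vol in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only show what would be run
            echo "vxvol -g $dg start $vol"
        else
            vxvol -g "$dg" start "$vol" || echo "failed to start $vol" >&2
        fi
    done
}

# Example:
#   start_volumes rootdg rootvol swapvol var opt
```

Setting DRY_RUN=1 first prints the vxvol commands without touching the volumes, which is handy for checking names before running it for real.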

9. Mount the volumes so you can now get at your data

# mount /dev/vx/dsk/rootdg/rootvol /mnt

# cd /mnt

# ls -l
You should now see your root disk data.
You can do the same for var and opt. Now back it up or begin rebuilding a new boot drive.
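For var and opt, something like the following would work. It is a sketch rather than what I typed at the time: mount_ro and DRY_RUN are made-up names, the mount points under /mnt are an assumption, and mounting read-only is my own extra precaution so nothing on the recovery source gets changed.

```shell
# Sketch: mount a VxVM volume read-only using Solaris mount syntax.
# mount_ro and DRY_RUN are hypothetical names.
mount_ro() {   # mount_ro <diskgroup> <volume> <mountpoint>
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be run
        echo "mount -F ufs -o ro /dev/vx/dsk/$1/$2 $3"
    else
        mount -F ufs -o ro "/dev/vx/dsk/$1/$2" "$3"
    fi
}

# Example (assumes rootvol is already mounted on /mnt, so /mnt/var and /mnt/opt exist):
#   mount_ro rootdg var /mnt/var
#   mount_ro rootdg opt /mnt/opt
```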


NEXT:
Veritas Disaster Recovery: Part 2: rebuilding the boot disk .... Stay tuned

Install Solaris 8 bootblk from cdrom while still booted


I have been experimenting for some time with Veritas Volume Manager and root disk encapsulation on our aging Solaris 8 systems.
Since I am using a copy of a disk from a hefty SunFire 4800 on a smaller V240, I am encountering some strange issues. Perhaps I will blog about all the tricks soon.
Here's one.
I want to reinstall a boot block on a newly mirrored root disk.
However, since I am running a disk built on the wrong architecture, some device paths are abnormal or broken.

One odd result is that Veritas vxbootsetup refuses to install the boot block.
I have had to do this several times, and booting from CD-ROM is very slow, so I wanted to install the boot block without rebooting first.

Issue:
After mirroring the boot drive you want to make the new disk bootable.

# /etc/vx/bin/vxbootsetup -g rootdg -n

/usr/lib/fs/ufs/bootblk: File not found
eeprom(1M) not implemented on SUNW,Sun-Fire-V240
/usr/lib/fs/ufs/bootblk: File not found
eeprom(1M) not implemented on SUNW,Sun-Fire-V240

Solution:
Don't rerun vxbootsetup, or it will try to create even more partitions on the new disk.

There is no need to reboot from CD-ROM.
Normally you would boot from CD-ROM and install the boot block like this.

ok} boot cdrom -sw
...

# /usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t3d0s0

Instead, put the CD-ROM in the drive without rebooting and install the boot block as follows (all on one line).

# /usr/sbin/installboot /cdrom/sol_8_204_sparc/s0/Solaris_8/Tools/Boot/usr/platform/SUNW,Sun-Fire-V240/lib/fs/ufs/bootblk /dev/rdsk/c1t3d0s0
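Since the path is long and easy to mistype, a small wrapper can check that the bootblk file really exists on the CD before calling installboot. This is only a sketch under my own assumptions: install_bootblk_from_cd and DRY_RUN are made-up names, and the CD path and platform name are the ones from the command above.

```shell
# Sketch: install a boot block taken from the mounted Solaris CD.
# install_bootblk_from_cd and DRY_RUN are hypothetical names.
install_bootblk_from_cd() {   # <cd-boot-root> <platform> <raw-slice>
    blk="$1/usr/platform/$2/lib/fs/ufs/bootblk"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be run
        echo "/usr/sbin/installboot $blk $3"
        return 0
    fi
    if [ ! -f "$blk" ]; then
        echo "bootblk not found: $blk" >&2
        return 1
    fi
    /usr/sbin/installboot "$blk" "$3"
}

# Example:
#   install_bootblk_from_cd /cdrom/sol_8_204_sparc/s0/Solaris_8/Tools/Boot \
#       SUNW,Sun-Fire-V240 /dev/rdsk/c1t3d0s0
```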

I hope I just saved you an extra CD reboot.
But you had better test the disk to be sure it boots; I have encountered some strange booting issues.

Friday, May 8, 2009

Furnace Troubleshooting and the "Know Where Man"


(This picture is not my panel thankfully)

I was working on finishing my basement last weekend. I was improving the wall insulation around the electrical panel, pushing insulation in behind the wires on the panel board.
Shortly afterwards my wife called downstairs and told me the furnace fan would not come on.
I never connected the two, since I had not tripped any breakers.

Well, then began the furnace troubleshooting. I called a young friend with some training on HVAC and we looked it over.
- No lights on the furnace.
- Opened the furnace panels.
- Checked the power: 110V coming in (so we thought).
- Checked the power going to the furnace panel safety switch: 110V.
- Checked from the switch to the 24VAC transformer: 110V coming in, zero VAC coming out. AH HA! Bad transformer, we thought.
- Wrote down the part number.
- My wife called a local electrical supplier who could get the part.
- Picked it up the next day for $50.
- Put it in .... SAME PROBLEM! What? That's not it? Where did we go wrong?

Dumbfounded, and having given up on self-repair, we called in a professional furnace man.
The first thing he checked (of course) was the power coming in ...
"Only 96V coming in," he declared. "REALLY?" I sceptically replied, wondering, "Is he checking this right?"
Immediately he went back to the on/off switch and opened it up. "Still only 96V."
He went over to the electrical panel and removed the panel cover. Here he found a loose (black) wire on the breaker. He tightened it and rechecked: still only 96V. He poked around in the rat's nest of wires the builder had installed.
Here he found an even looser (white) neutral wire, slightly discolored from heat.
He tightened the wire and presto! The furnace lit up.

We wrote out a check for $125 to the furnace man while my Wife scolded me for not finding it.

Our error was in how we checked the power coming in. We tested the AC line from the black wire to ground! (For ground we used the furnace body.)
We should have checked across the black and the white wires! The loose connection was on the neutral, so only a black-to-white reading would have shown the lower voltage.
Loose wires on a panel are a bad thing. I'm glad we found this!
---

As for my wife complaining about paying $125 for a guy to tighten 2 screws it reminds me of this story about Henry Ford's "Know Where man" ...

http://answers.google.com/answers/threadview?id=183998
http://www.snopes.com/business/genius/where.asp

"Nikola Tesla visited Henry Ford at his factory, which was having some
kind of difficulty. Ford asked Tesla if he could help identify the
problem area. Tesla walked up to a wall of boilerplate and made a
small X in chalk on one of the plates. Ford was thrilled, and told him
to send an invoice.
The bill arrived, for $10,000. Ford asked for a breakdown. Tesla sent
another invoice, indicating a $1 charge for marking the wall with an
X, and $9,999 for knowing where to put it."

"Know Where Man" Urban Legend, hosted by snopes.com