Friday, June 23, 2017


MegaRaid Problems - Can't recreate a Logical disk containing a single physical disk after failure.


While working on a disk replacement on an older Oracle (SUN) X4170 M2, we ran into trouble.
In this configuration a RAID controller presented 4 physical hard disks as 4 logical disks (one single-disk RAID 0 logical disk per physical disk).
No redundancy was configured on the controller. Redundancy was configured at the Solaris 10 ZFS layer, in the zpools.

Here is where it got ugly. When a disk died, its MegaRaid logical disk disappeared (no redundancy, so that's expected).

After the disk is replaced you need to recreate the logical disk (again, expected). MegaRaid, however, refused to recreate it. Try as we might, it only spat out a generic failure message and exit code:

# ./MegaCli -CfgLdAdd -r0  [252:1] -a0
                                   
Adapter 0: Configure Adapter Failed
Exit Code: 0x54

The solution is below, but first a little background on this configuration.

Normal MegaRaid config:

Logical disks:


 # ./MegaCli -LDInfo -Lall -aALL |egrep 'Virtual|size|State'
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
State               : Optimal
Virtual Drive: 1 (Target Id: 1)
State               : Optimal
Virtual Drive: 2 (Target Id: 2)
State               : Optimal
Virtual Drive: 3 (Target Id: 3)
State               : Optimal

Physical disks:


# ./MegaCli -PDList -aALL |egrep 'Slot|state|Inq'                
Slot Number: 0
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST930005SSUN300G06061201Q1ABC        
Slot Number: 1
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST930003SSUN300G0E71101471DEF        
Slot Number: 2
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST930005SSUN300G06061201Q1LGHI        
Slot Number: 3
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST930005SSUN300G06061201Q1LJKL  


So in this case we saw:

Logical disks (Virtual Drive 2 is gone):

 # ./MegaCli -LDInfo -Lall -aALL |egrep 'Virtual|State'
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
State               : Optimal
Virtual Drive: 1 (Target Id: 1)
State               : Optimal

Virtual Drive: 3 (Target Id: 3)
State               : Optimal

# ./MegaCli -LDInfo -L2 -aALL
                                     
Adapter 0 -- Virtual Drive Information:
Adapter 0: Virtual Drive 2 Does not Exist.
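With four single-disk logical disks, spotting the gap by eye is easy, but on a bigger box a small filter helps. The function below is a hypothetical helper (the name `missing_vds` is ours, not part of MegaCli) that reads `MegaCli -LDInfo -Lall -aALL` output on stdin and prints every virtual drive number that should exist but is not listed. It assumes the "Virtual Drive: N (Target Id: N)" line format shown above and that drive numbers run contiguously from 0; other MegaCli versions may format the output differently.

```shell
# Hypothetical helper: print virtual drive numbers 0..(expected-1) that are
# missing from `MegaCli -LDInfo -Lall -aALL` output read on stdin.
# Assumes drives are numbered contiguously from 0, as in the 4-disk layout here.
missing_vds() {
  local expected="$1"
  awk -v n="$expected" '
    # "Virtual Drive: 2 (Target Id: 2)" -> third field is the drive number
    /^Virtual Drive:/ { seen[$3] = 1 }
    END {
      for (i = 0; i < n; i++)
        if (!(i in seen)) print i
    }'
}
```

Usage on the failed system would look like `./MegaCli -LDInfo -Lall -aALL | missing_vds 4`, which for the listing above should print `2`.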

The replacement disk now showed up in the physical disk list (slot 1, state Unconfigured(good)):

# ./MegaCli -PDList -aALL |egrep 'Slot|state|Inq|Enc'
Enclosure Device ID: 252
Slot Number: 0
Enclosure position: N/A
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST930005SSUN300G06061201Q1ABC          
Enclosure Device ID: 252
Slot Number: 1
Enclosure position: N/A
Firmware state: Unconfigured(good), Spun Up
Inquiry Data: SEAGATE ST930003SSUN300G0E71101471ZDEF          
Enclosure Device ID: 252
Slot Number: 2
Enclosure position: N/A
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST930005SSUN300G06061201Q1LGHI         
Enclosure Device ID: 252
Slot Number: 3
Enclosure position: N/A
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST930005SSUN300G06061201Q1LJKL   
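The `[enclosure:slot]` pair that `-CfgLdAdd` needs comes straight from the "Enclosure Device ID" and "Slot Number" lines of this listing. As a sketch, assuming the field order shown above (enclosure, then slot, then firmware state for each drive), a hypothetical `unconfigured_good` filter can pull out every drive that is ready to be configured:

```shell
# Hypothetical helper: read `MegaCli -PDList -aALL` output on stdin and print
# [enclosure:slot] for each drive whose firmware state is Unconfigured(good).
# Assumes each drive block lists Enclosure Device ID and Slot Number before
# its Firmware state line, as in the output above.
unconfigured_good() {
  awk '
    /^Enclosure Device ID:/ { enc = $NF }    # remember the current enclosure
    /^Slot Number:/         { slot = $NF }   # remember the current slot
    /^Firmware state: Unconfigured\(good\)/ { printf "[%s:%s]\n", enc, slot }
  '
}
```

For the listing above, `./MegaCli -PDList -aALL | unconfigured_good` should print `[252:1]`, the same argument used with `-CfgLdAdd`.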


So we should be able to create a logical disk on the new disk again, but it fails:

# ./MegaCli -CfgLdAdd -r0  [252:1] -a0
                                   
Adapter 0: Configure Adapter Failed
Exit Code: 0x54

After trying a few things with the field engineer who replaced the disk (reinserting it, trying different slots, etc.) came my new favorite quote:

                "Friends don't let friends MegaRaid"


But here is the solution:


When a logical disk with write caching enabled fails, the controller preserves the cached data that was being written at the time of the failure. So the cache data for logical disk 2 was retained.
While that cache is preserved, logical disk 2 (LD2), and in fact any new LD, cannot be created.


Confirm:


# ./MegaCli -GetPreservedCacheList -a0
                                   
Adapter #0
Virtual Drive(Target ID 02): Missing.
Exit Code: 0x00
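The preserved cache list reports target IDs zero-padded ("02"), while `-DiscardPreservedCache` takes a plain `-L` number. A minimal sketch of a hypothetical `preserved_targets` filter, assuming the "Virtual Drive(Target ID NN): ..." line format shown above, turns that output into LD numbers:

```shell
# Hypothetical helper: read `MegaCli -GetPreservedCacheList -a0` output on stdin
# and print each target ID that still has preserved cache, as a plain number.
preserved_targets() {
  awk '/^Virtual Drive\(Target ID [0-9]+\)/ {
         id = $0
         sub(/^Virtual Drive\(Target ID /, "", id)   # drop the prefix
         sub(/\).*/, "", id)                          # drop ")" and the status text
         print id + 0                                 # "02" -> 2
       }'
}
```

For the output above, `./MegaCli -GetPreservedCacheList -a0 | preserved_targets` should print `2`. Review that list before discarding anything, since discarding throws the cached data away.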

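Discard the preserved cache (this throws away the cached data that never reached the failed disk; here the redundancy lived in ZFS):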


# ./MegaCli -DiscardPreservedCache -L2 -a0                                

Adapter #0
Virtual Drive(Target ID 02): Preserved Cache Data Cleared.

Exit Code: 0x00

Try again to recreate the LD:



# ./MegaCli -CfgLdAdd -r0[252:1] -a0
                                     
Adapter 0: Created VD 2
Adapter 0: Configured the Adapter!!
Exit Code: 0x00

SUCCESS.
Now you can move on to any software-configured mirroring, etc.

In this case we were able to run a zpool replace command.
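The whole recovery boils down to three MegaCli calls. As a sketch only, here is a hypothetical `recover_ld` wrapper (the function names, the `MEGACLI` path variable and the `RUN` switch are our assumptions, not MegaCli features) that prints the commands by default and only executes them when `RUN=1` is set, so you can eyeball the LD number and `[enclosure:slot]` before anything is discarded:

```shell
# Hypothetical wrapper around the three MegaCli steps used in this post.
# Dry run by default (commands are only printed); set RUN=1 to execute them.
MEGACLI="${MEGACLI:-./MegaCli}"

# Print a command, and run it only when RUN=1.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Usage: recover_ld LD ENCLOSURE SLOT [ADAPTER]
# Stops at the first failing step, assuming MegaCli's process exit status
# reflects the "Exit Code" it prints.
recover_ld() {
  local ld="$1" enc="$2" slot="$3" adapter="${4:-0}"
  run "$MEGACLI" -GetPreservedCacheList "-a$adapter" || return
  run "$MEGACLI" -DiscardPreservedCache "-L$ld" "-a$adapter" || return
  # Quoted so the shell never treats [enc:slot] as a filename glob.
  run "$MEGACLI" -CfgLdAdd "-r0[$enc:$slot]" "-a$adapter"
}
```

For this case, `recover_ld 2 252 1 0` prints the same three commands used above; `RUN=1 recover_ld 2 252 1 0` would run them. Keeping the dry run as the default is deliberate, since `-DiscardPreservedCache` cannot be undone.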


A little more background on LSI Logic RAID controllers using MegaRaid, MegaCli, StorCLI

For comparison's sake, we have seen similar configuration issues on both IBM and Cisco UCS servers. With non-redundant (RAID 0) logical disks, this preserved cache issue can also arise. On Cisco UCS we have seen a more meaningful error appear, which leads to the solution much more quickly:

No doubt the vendor or OS package ships a more up-to-date version of the MegaRaid tools, i.e. MegaCli64 or StorCLI.

# ./MegaCli64 -CfgLdAdd -r0[8:13] -a0
                                     

Adapter 0: Configure Adapter Failed

FW error description: 
 The current operation is not allowed because the controller has data in cache for offline 
 or missing virtual disks.  

Exit Code: 0x54

The solution is the same: use the -DiscardPreservedCache option.
