UCS: Warning: there are pending SEEPROM errors on one or more devices, failover may not complete

03 Dec

Andrius | December 3rd, 2012 | Cisco, UCS | 3 comments

In the UCS CLI, after issuing the command ‘show cluster state’, a warning is reported for one of the chassis.

UCS-B # show cluster state
Cluster Id: 0xf122a7f83dba11e0-0x9a4c123573c4f1c4

B: UP, PRIMARY
A: UP, SUBORDINATE

HA READY
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1234567A, state: active
Chassis 2, serial: FOX1234567B, state: active
Chassis 5, serial: FOX1234567C, state: active with errors

Fabric B, chassis-seeprom local IO failure:
FOX1234567C READ_FAILED, error: TIMEOUT, error code: 10, error count: 7
Warning: there are pending SEEPROM errors on one or more devices, failover may not complete
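Because this state shows up in monitoring rather than as an outage, it can help to check for it automatically. Below is a minimal Python sketch that parses the text of ‘show cluster state’ and flags any chassis whose state is not plain "active", plus the SEEPROM warning line. It assumes the output has already been captured as text (for example over an SSH session, not shown) and that the lines follow the formats in the sample above; `parse_cluster_state` and `ClusterState` are hypothetical names, not part of any Cisco tool.

```python
import re
from dataclasses import dataclass, field

# Assumption: line formats copied from the sample 'show cluster state' output above.
CHASSIS_RE = re.compile(r"^Chassis (\d+), serial: (\S+), state: (.+)$")
FAILURE_RE = re.compile(
    r"^(\S+) (\w+), error: (\w+), error code: (\d+), error count: (\d+)$"
)
WARNING_TEXT = "there are pending SEEPROM errors"


@dataclass
class ClusterState:
    # chassis number -> (serial, state string)
    chassis: dict = field(default_factory=dict)
    # (serial, operation, error, error code, error count) per SEEPROM failure line
    failures: list = field(default_factory=list)
    seeprom_warning: bool = False

    def degraded_chassis(self):
        """Return chassis numbers whose state is anything other than 'active'."""
        return sorted(n for n, (_, state) in self.chassis.items() if state != "active")


def parse_cluster_state(text: str) -> ClusterState:
    """Hypothetical helper: extract chassis states and SEEPROM errors from CLI text."""
    result = ClusterState()
    for raw in text.splitlines():
        line = raw.strip()
        if m := CHASSIS_RE.match(line):
            result.chassis[int(m.group(1))] = (m.group(2), m.group(3))
        elif m := FAILURE_RE.match(line):
            serial, op, err, code, count = m.groups()
            result.failures.append((serial, op, err, int(code), int(count)))
        elif WARNING_TEXT in line:
            result.seeprom_warning = True
    return result
```

A check script could raise an alert when `seeprom_warning` is true or `degraded_chassis()` is non-empty; for the output above that would point at chassis 5 and serial FOX1234567C.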

In the sam_techsupportinfo log you’ll see the following fault:
Creation Time: 2012-10-12T01:12:21.217
ID: 2712562
Description: device FOX1234567C, error accessing shared-storage
Affected Object: sys/mgmt-entity-B
Trigger: Oper
User: internal
Cause: Device Shared Storage Io Error
Code: E4196537
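If you are searching tech support output for this fault, a minimal Python sketch like the one below can turn a single record of "Key: Value" lines into a dictionary and match it on the code or cause shown above. It assumes one record is passed in at a time and that each field sits on its own line in the format shown; `parse_fault`, `is_shared_storage_fault` and `affected_device` are hypothetical helpers, not part of any Cisco tool.

```python
import re


def parse_fault(text: str) -> dict:
    """Hypothetical helper: split one fault record of 'Key: Value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only; timestamps like 01:12:21 have no space after ':'.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_shared_storage_fault(fault: dict) -> bool:
    """Match on the code or cause shown in the sample fault above."""
    return (
        fault.get("Code") == "E4196537"
        or fault.get("Cause") == "Device Shared Storage Io Error"
    )


def affected_device(fault: dict):
    """Pull the device serial out of a description like 'device FOX..., error ...'."""
    m = re.match(r"device (\S+),", fault.get("Description", ""))
    return m.group(1) if m else None
```

The serial returned by `affected_device` can then be compared with the serial of the chassis reported as "active with errors" in ‘show cluster state’.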

This is a known Cisco bug, CSCtu17144, and here is what needs to be done:

If the fault condition stays on or keeps being cleared and re-raised, try the following workarounds:
1. Reboot the IO module.
2. Remove and re-seat the IO module. Make sure the module is in contact with the backplane firmly.

I’ve had this problem a couple of times, and resetting the IO module was enough in both cases.

Tags: Cisco, CSCtu17144, Device Shared Storage Io Error, error accessing shared-storage, state: active with errors, UCS

This entry was posted on Monday, December 3rd, 2012 at 22:32 and is filed under Cisco, UCS.

3 Responses

mark hreha says:

19/02/2013 at 23:39

I am experiencing the same issue on one of my clusters, which runs firmware 2.02q. This may be the cause of the multipath issues that crippled the Oracle RAC cluster. In an attempt to rule out this particular blade/chassis, I have provisioned another server on a different chassis.

Host details
Oracle RAC
OS: rhel6x64
kernel: 2.6.32-279.14.1.el6.x86_64
Blade: B200 M3

UCS4sjc-A# show cluster state
Cluster Id: 0x84dbbf20c37911e0-0x89d2547fee05f1c4

A: UP, PRIMARY
B: UP, SUBORDINATE

HA READY
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1502GV2E, state: active with errors
Chassis 2, serial: FOX1527GQY7, state: active
Chassis 3, serial: FOX1618GE3L, state: active

Fabric A, chassis-seeprom local IO failure:
FOX1502GV2E READ_FAILED, error: TIMEOUT, error code: 10, error count: 8

Warning: there are pending SEEPROM errors on one or more devices, failover may not complete
UCS4sjc-A#

I am planning on upgrading infrastructure firmware to 2.11 tonight.

Mark Hreha

- Andrius says:
  
  20/02/2013 at 14:09
  
  Reseat the IOMs on chassis 1 one at a time. This should clear the errors you are seeing when running the ‘show cluster state’ command.
  
  For Oracle RAC there is a known issue with dropped storage. Upgrade to UCS 2.1(1) and fnic driver 1.5.0.20.
  
Tim H. says:

29/07/2014 at 18:46

This is most frustrating… A bug that puts the environment into a degraded state in everyone’s monitoring application has gone unfixed for three years. Tens of thousands of units have shipped, and the only thing customers are told is to live with it or reseat the hardware periodically. Cisco makes Microsoft look reliable. Going back to c7000 and Flex-10 is looking more likely every day.



IT Blog

Just another blog on Kozeniauskas.com Network
