HP 3PAR CRC errors in correlation with Brocade SAN

HP 3PAR CRC errors and Invalid transmission word in correlation with Brocade SAN switches

One advantage of HP 3PAR lies in its built-in monitoring mechanisms. One of them is especially useful when we have no dedicated monitoring in our SAN: it gives us the opportunity to detect a possible issue before it substantially affects our environment.

3par showhost -lesb

Intermittent CRC Errors Detected

Let’s take a look at how HP 3PAR presents this information to us. The first example comes from the HP 3PAR Service Processor Onsite Customer Care (SPOCC).

Event type: evt_host_port_crc_errors			     
ID: 30003
Component: Port 3:1:3
Short Dsc: Host Port 3:1:3 experienced over 50 CRC errors (50) in 24 hours
Event String: Host Port 3:1:3 experienced over 50 CRC errors (50) in 24 hours
...
Event String: Port 3:1:3 Degraded (Intermittent CRC Errors Detected {0x2})

The same thing can be checked on the HP 3PAR system itself. Just look at the event log.

3PAR-cluster cli% showeventlog -oneline -startt 1/1/16
2016-01-01 12:33:18 GMT        2 Minor           FC LESB Error   sw_port:2:3:2 FC LESB Error Port ID [2:3:2]-Counters: (Invalid transmission word) (Invalid CRC) -ALPAs:  140700
2016-01-01 12:44:19 GMT        2 Minor           FC LESB Error   sw_port:3:1:3 FC LESB Error Port ID [3:1:3]-Counters: (Invalid transmission word)-ALPAs:  140700
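
When the event log contains many such entries, it can help to count them per port and per counter type. The following is only a minimal sketch in Python, assuming the one-line format looks exactly like the two sample lines above; the names `summarize_lesb`, `LESB_RE`, `COUNTER_RE` and `SAMPLE` are hypothetical helpers made up for this illustration, and real output may be formatted differently on other 3PAR OS versions.

```python
import re
from collections import Counter

# Assumed layout, taken from the sample lines above:
#   ... Port ID [N:S:P]-Counters: (name) (name) -ALPAs: ...
LESB_RE = re.compile(r"Port ID \[(\d+:\d+:\d+)\]-Counters:\s*(.*?)\s*-ALPAs:")
COUNTER_RE = re.compile(r"\(([^)]+)\)")


def summarize_lesb(lines):
    """Count LESB error events per (port, counter name) pair."""
    summary = Counter()
    for line in lines:
        match = LESB_RE.search(line)
        if match is None:
            continue  # not an FC LESB Error line in the assumed format
        port, counters = match.groups()
        for name in COUNTER_RE.findall(counters):
            summary[(port, name)] += 1
    return summary


# The two event log lines shown above, used as sample input.
SAMPLE = [
    "2016-01-01 12:33:18 GMT        2 Minor           FC LESB Error   "
    "sw_port:2:3:2 FC LESB Error Port ID [2:3:2]-Counters: "
    "(Invalid transmission word) (Invalid CRC) -ALPAs:  140700",
    "2016-01-01 12:44:19 GMT        2 Minor           FC LESB Error   "
    "sw_port:3:1:3 FC LESB Error Port ID [3:1:3]-Counters: "
    "(Invalid transmission word)-ALPAs:  140700",
]

if __name__ == "__main__":
    for (port, name), count in sorted(summarize_lesb(SAMPLE).items()):
        print(f"{port}  {name}: {count}")
```

A port that keeps showing up in such a summary is a good candidate for checking the corresponding switch port, SFP and cable on the Brocade side.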

(more…)


[MetroCluster] How to troubleshoot interconnect link down and FC-VI

Recently I faced an interconnect failure in one customer's environment. Everything ran smoothly until Monday morning, when I received an event notification about an interconnect link being down.

Cluster Interconnect link is DOWN

The environment I have the pleasure of working with uses a fabric-attached MetroCluster configuration. Between the filers there are two HA interconnect cables (attached to the FC switch), and the heartbeat communication is carried over the MetroCluster FC-VI card. If this single dual-port card goes down, we have a small disaster on our hands, because without heartbeat messages between the nodes a takeover is guaranteed.
(more…)
