Cisco Systems issued a field notice regarding line cards in its 12000 series routers. The notice warned of line card resets resulting from single event upsets (SEUs): soft errors that are spontaneous, non-recurring or transient, and non-reproducible. Affected cards show memory parity errors or application-specific integrated circuit (ASIC) errors, which may result in a card reload with a two- to three-minute recovery.
Cisco 12000 line cards may reset after single event upset (SEU) failures. This field notice highlights some of those failures, why they occur, and what workarounds are available.
Unlike hard errors, soft errors are spontaneous, non-recurring or transient, and non-reproducible. The error is called “soft” because:
1) The device functions normally after data is restored. 2) The transient error is present in data stored in memory devices on line cards. 3) The error is caused by system noise or by ionizing radiation.
SEU failures are often caused by the following:
1) Alpha particles emitted by radioactive packaging and wafer processing materials in static random-access memory (SRAM) and dynamic random-access memory (DRAM) products. 2) Thermal neutrons from cosmic radiation, with energy less than 15 eV. 3) Terrestrial high-energy cosmic particles, including neutrons, protons, pions, and muons.
The chance of single event upset (SEU) failures in memory devices increases as densities rise and core voltages drop. IOS performs error recovery: it detects soft errors and ensures they do not adversely affect product performance. The methods used by IOS on the Cisco 12000 include:
1) ECC (Error Correction Code). 2) Replacement from backup data sources. 3) Hitless switchover to redundant line cards.
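To illustrate the first method, the sketch below shows how an error correction code can repair a single flipped bit without any reload. It uses a textbook Hamming(7,4) code as a stand-in; the notice does not say which code the line cards use, and the function names here are hypothetical, not part of IOS.

```python
# Illustrative only: a textbook Hamming(7,4) code, NOT the ECC scheme used
# on Cisco 12000 line cards (the notice does not specify one).

def hamming74_encode(nibble):
    """Encode 4 data bits (int 0..15) as a 7-bit codeword (list of 0/1)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                             # index 0 unused; positions 1..7
    code[3], code[5], code[6], code[7] = d     # data bits sit at non-power-of-two positions
    code[1] = code[3] ^ code[5] ^ code[7]      # parity over positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]      # parity over positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]      # parity over positions with bit 2 set
    return code[1:]


def hamming74_decode(bits):
    """Return (nibble, syndrome); syndrome is the corrected position, or 0 if clean."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                    # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1                    # flip the single faulty bit back
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, syndrome


# Simulate a soft error: one stored bit flips, and decoding restores the data.
word = hamming74_encode(0b1011)
word[4] ^= 1                                   # upset the bit at position 5
value, syndrome = hamming74_decode(word)       # value is 0b1011 again
```

A code of this kind corrects only one flipped bit per codeword, which is why the other two methods (restoring from backup data and switching to a redundant card) remain useful for errors ECC cannot repair.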
Once an affected card has reloaded, data passes normally.
Cisco IOS® Software Release 12.0(25)S and later releases include several SEU error recovery improvements for the Cisco 12000 series.
IOS releases 12.0(21)S6, 12.0(22)S4, 12.0(23)S2, 12.0(21)S1 and later include SEU failure fixes for Cisco 12000 Engine 3 based line cards. These improvements reduce the chance of a card reload due to SEU failures, reduce the reload time if one occurs, and provide better text messages for the failure types.
For customers using Engine 3, 4, or 4+ based line cards, these IOS improvements have significantly reduced error recovery time to under three seconds.
Note: Customers should not replace hardware after a single SEU failure. The line card should be monitored for further instances. If additional failures occur, contact Cisco Technical Support.
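The monitoring guidance in the note can be expressed as a simple tally. The sketch below is a hypothetical helper, not a Cisco tool: it assumes an operator has already collected the slot number of each observed SEU-related reset, and it flags only the slots with repeat failures.

```python
from collections import Counter


def slots_needing_escalation(seu_resets, threshold=2):
    """Return the slots whose SEU reset count reaches the threshold.

    seu_resets: iterable of slot numbers, one entry per observed reset
                (a hypothetical input format, gathered by the operator).
    threshold:  2 by default, so a single failure is never flagged.
    """
    counts = Counter(seu_resets)
    return sorted(slot for slot, n in counts.items() if n >= threshold)


# Slot 3 reset twice, slots 5 and 7 once each: only slot 3 is flagged.
flagged = slots_needing_escalation([3, 5, 3, 7])
```

Slots that appear in the result are the ones to raise with Cisco Technical Support; the others stay under observation.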
Source: Cisco – www.cisco.com/en/US/ts/fn/200/fn25994.html | August 15, 2003