Data collection for analysis of IOS-XE router crash

This post covers the basic data collection that TAC needs to identify the cause of a router crash. The scope is limited to router platforms running IOS-XE software. For the list of products that support IOS-XE, refer to the following link:

https://www.cisco.com/c/en/us/products/ios-nx-os-software/ios-xe/index.html#~stickynav=2

Before going into the data collection needed for a crash, we briefly discuss the types of crashes. IOS XE Software is a modular operating system built on a Linux kernel that runs on a Route Processor (RP), Embedded Services Processor (ESP), or SPA Interface Processor (SIP). The IOS daemon (IOSD) and other IOS XE processes run on the Linux kernel, so there are several types of crashes, as listed below:

1- IOSD crash (on RP module)
2- SPA driver crash (on SIP module)
3- IOS-XE process crash (on RP, ESP, SIP)
4- QFP microcode crash (on ESP module)
5- Linux Kernel crash (on RP, ESP, SIP)

Crashinfo file names:

Type of Crash | Crashinfo File Name (Syntax) | Example
IOSD Crash | crashinfo_RP_SlotNumber_00_Date-Time-Zone | crashinfo_RP_00_00_20080807-063430-UTC
SPA Driver Crash | crashinfo_SIP_SlotNumber_00_Date-Time-Zone | crashinfo_SIP_00_00_20080828-084907-UTC

Core dump file names:

Type of Crash | Core Dump File Name (Syntax) | Example
IOSD Crash | hostname_RP_SlotNumber_ppc_linux_iosd-_ProcessID.core.gz | Router_RP_0_ppc_linux_iosd-_17407.core.gz
SPA Driver Crash | hostname_SIP_SlotNumber_mcpcc-lc-ms_ProcessID.core.gz | Router_SIP_1_mcpcc-lc-ms_6098.core.gz
IOS XE Process Crash | hostname_FRU_SlotNumber_ProcessName_ProcessID.core.gz | Router_RP_0_fman_rp_28778.core.gz, Router_ESP_1_cpp_cp_svr_4497.core.gz
Cisco QFP Crash | hostname_ESP_SlotNumber_cpp-mcplo-ucode_ID.core.gz | Router_ESP_0_cpp-mcplo-ucode_042308082102.core.gz
Linux Kernel Crash | hostname_FRU_SlotNumber_kernel.core | Router_ESP_0_kernel.core

Note: FRU stands for Field Replaceable Unit and can be RP, ESP, or SIP. For IOS-XE process crashes and Linux kernel crashes, the file name varies with the module on which the crash occurs. The other three crash types have a fixed syntax in which the module name does not vary.
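
When a router has many files in bootflash: or harddisk:, it can help to sort them by crash type before uploading them to TAC. The following is a minimal Python sketch that does this offline, on a list of file names copied from the router. The regular expressions are assumptions derived only from the file name syntax listed above (for example, the "ppc_linux_iosd-" string may differ on other platforms), and the names PATTERNS and classify are hypothetical helpers, not part of any Cisco tool.

```python
import re

# Patterns are assumptions based on the naming syntax in this post.
# Order matters: the generic IOS XE process pattern would also match
# the more specific IOSD, SPA driver, and QFP core names, so it goes last.
PATTERNS = [
    ("IOSD crash (crashinfo)",
     re.compile(r"^crashinfo_RP_\d+_\d+_\d{8}-\d{6}-[A-Za-z]+$")),
    ("SPA driver crash (crashinfo)",
     re.compile(r"^crashinfo_SIP_\d+_\d+_\d{8}-\d{6}-[A-Za-z]+$")),
    ("IOSD crash (core)",
     re.compile(r"^.+_RP_\d+_ppc_linux_iosd-_\d+\.core\.gz$")),
    ("SPA driver crash (core)",
     re.compile(r"^.+_SIP_\d+_mcpcc-lc-ms_\d+\.core\.gz$")),
    ("QFP microcode crash (core)",
     re.compile(r"^.+_ESP_\d+_cpp-mcplo-ucode_\d+\.core\.gz$")),
    ("Linux kernel crash (core)",
     re.compile(r"^.+_(RP|ESP|SIP)_\d+_kernel\.core$")),
    ("IOS XE process crash (core)",
     re.compile(r"^.+_(RP|ESP|SIP)_\d+_.+_\d+\.core\.gz$")),
]


def classify(file_name):
    """Return the crash type label for a file name, or 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.match(file_name):
            return label
    return "unknown"


if __name__ == "__main__":
    # Example file names taken from the tables in this post.
    for name in [
        "crashinfo_RP_00_00_20080807-063430-UTC",
        "Router_RP_0_fman_rp_28778.core.gz",
        "Router_ESP_0_kernel.core",
    ]:
        print(name, "->", classify(name))
```

A file that comes back as "unknown" is not necessarily irrelevant; it only means the name does not fit the syntax described in this post.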

 

Data collection:

1- Crashinfo file – Available in bootflash: or harddisk:. It can be exported via TFTP/FTP using the command “copy bootflash: tftp:” or “copy bootflash: ftp:”, provided that the TFTP/FTP server IP address is reachable from the router.

2- Core file – Available in bootflash: or harddisk:. It can be exported via TFTP/FTP using the command “copy bootflash: tftp:” or “copy bootflash: ftp:”, provided that the TFTP/FTP server IP address is reachable from the router.

3- Tracelogs – There may be too many tracelog files in bootflash: to extract one at a time. Use the command syntax “archive tar /create bootflash:/<filename>.tar bootflash:/tracelogs/” to generate a “.tar” file in bootflash: that contains all the tracelogs and can be exported over TFTP/FTP at once.

Example: archive tar /create bootflash:/TACTRACELOGS.tar bootflash:/tracelogs/

4- show tech – This helps TAC correlate the available data and configuration with known crashes previously reported on that platform. For example, certain crashes occur only when a certain configuration is present on the router, or only when certain statistics show increasing counter values.

5- Syslogs/console logs from the time of the issue

 
