Issue Description
The customer noticed that while passing 9 Gbps of traffic, arriving on a single port, through one S6700 switch (part of an RRPP ring), the CPU usage rises to 60%. When the customer stops the traffic, the CPU usage drops back to its normal values.
Big problem! Data is supposed to be forwarded by the ASIC without involving the CPU. So why is the CPU at 60%? Is this normal for this traffic load coming from just one port? Of course not, so what is happening?
Handling Process
I analyzed the tasks handled by the CPU using display cpu-usage (only the most representative tasks are shown):
bcmL2MOD.0   17%   0/8285f38d   tS16   //Task that handles mac-flapping.
bcmCNTR.0     8%   0/40c54910   tS17
PPI           7%   0/36d7ee7e   PPI Product Process Interface   //PPI: This is a task at the adaptation layer. Maintain chip interface status
bmLINK.0      2%   0/160654ff   tS1a   //bmLI: Scan port status and notify the application modules of status changes

So the switch is seeing MAC flapping from the RRPP ring? A great discovery, but there is no traffic in the ring other than the test traffic. Are the two related?

The customer had an unusual way of testing high traffic load: the traffic generator / measuring device was plugged into one port (e.g. xg0/0/13), and loopback internal was configured on the egress port (e.g. xg0/0/14), so all frames sent to the egress port come back in from that port.

How does this explain the high-CPU issue? It is quite simple. With a traffic generator, all frames carry the same source MAC, and they arrive on both xg0/0/13 and, when looped back, xg0/0/14. The switch treats this as MAC flapping, which is what causes the high CPU usage. The output of display mac-address flapping record confirms it:

["TES1-EQI1.192"]display mac-address flapping record
 S : start time    E : end time    (Q) : quit vlan    (D) : error down
-------------------------------------------------------------------------------
Move-Time              VLAN  MAC-Address     Original-Port  Move-Ports  MoveNum
-------------------------------------------------------------------------------
S:2014-10-03 00:09:20  1818  cafe-beef-cafe  XGE0/0/13      XGE0/0/14   35186
E:2014-10-03 00:09:53
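To get a feel for how intense the flapping was, the record can be turned into a rough rate. The sketch below uses only the values printed in the record; it assumes that MoveNum counts the moves seen between the S and E timestamps, which the output suggests but does not state explicitly.

```python
from datetime import datetime

# Values copied from the flapping record shown above.
FMT = "%Y-%m-%d %H:%M:%S"
start = datetime.strptime("2014-10-03 00:09:20", FMT)
end = datetime.strptime("2014-10-03 00:09:53", FMT)
move_num = 35186

# Assumption: MoveNum is the number of moves between start and end.
duration_s = (end - start).total_seconds()
moves_per_second = move_num / duration_s

print(f"{duration_s:.0f} s, about {moves_per_second:.0f} moves per second")
```

Under that assumption, the single MAC address moved between the two ports roughly a thousand times per second for the duration of the test.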
Root Cause
The way the customer chose to pass high-load traffic through the switch: enabling the loopback internal command on the egress port.
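The mechanism can be illustrated with a toy model of MAC learning. This is only a sketch, not the switch's actual implementation: the MacTable class and run_test function are hypothetical names, and the model simply counts a move whenever a known (VLAN, MAC) pair is seen on a different port. The VLAN, MAC address and port names are taken from the flapping record.

```python
from collections import defaultdict


class MacTable:
    """Toy MAC learning table that counts port moves per (VLAN, MAC)."""

    def __init__(self):
        self.port_of = {}              # (vlan, mac) -> last port seen
        self.moves = defaultdict(int)  # (vlan, mac) -> number of moves

    def learn(self, vlan, src_mac, in_port):
        key = (vlan, src_mac)
        previous = self.port_of.get(key)
        if previous is not None and previous != in_port:
            self.moves[key] += 1       # same MAC seen on another port
        self.port_of[key] = in_port


def run_test(frames, loopback):
    """Send `frames` generator frames; optionally loop each one back."""
    table = MacTable()
    key = (1818, "cafe-beef-cafe")
    for _ in range(frames):
        table.learn(*key, "XGE0/0/13")      # frame from the generator
        if loopback:
            table.learn(*key, "XGE0/0/14")  # same frame returning via loopback
    return table.moves[key]


print(run_test(1000, loopback=False))  # no moves without the loopback
print(run_test(1000, loopback=True))   # nearly two moves per frame
```

In this model, every looped-back frame makes the source MAC appear to jump between the two ports, so the move count grows with the traffic rate, while the same traffic without the loopback produces no moves at all.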
Suggestions
Do not use loopbacks when testing traffic performance with traffic sourced from a measuring device / traffic generator.