Multiple Servers Connected to an S7700 Cannot Communicate

Issue Description

 

As shown in the figure, an S7706 running V200R001C00SPC300 with patch V200R001SPH007 functions as the aggregation switch in the server area and serves as the gateway for all servers. Multiple VLANs are configured on the S7700, multiple servers are deployed in each VLAN, and servers in different VLANs need to communicate.
Figure  Networking
The administrator finds that communication between servers fails intermittently. For example, the server at 10.1.2.6 can sometimes communicate with the server at 10.1.4.11 in VLAN 500, but at other times services are interrupted and ping packets are discarded.
The configuration is as follows (the interfaces are access interfaces, and their configuration is not provided here):
vlan 100
description ==hongruan==
vlan 101
description ==hongruan-sub==
vlan 200
description ==tianyu==
vlan 300
description ==xiweier==
vlan 400
description ==UT==
vlan 500
description ==xike==
vlan 600
description ==dongfangwangxin==
vlan 700
description ==guanyong==
vlan 900
description ==shiboyun==
vlan 1000
description ==wangguan==
#
interface Vlanif100
description ==hongruan==
ip address 10.1.2.3 255.255.255.192
#
interface Vlanif101
ip address 10.1.2.67 255.255.255.192
#
interface Vlanif200
description ==tianyu==
ip address 10.1.2.131 255.255.255.128
#
interface Vlanif300
description ==xiweier==
ip address 10.1.3.3 255.255.255.128
#
interface Vlanif400
ip address 10.1.3.131 255.255.255.128
vrrp vrid 7 virtual-ip 10.1.3.129
#
interface Vlanif500
ip address 10.1.4.3 255.255.255.128
#
interface Vlanif600
ip address 10.1.4.131 255.255.255.128
#
interface Vlanif700
ip address 10.1.5.3 255.255.255.128
#
interface Vlanif900
ip address 10.1.6.3 255.255.255.128
#
interface Vlanif1000
ip address 10.1.254.2 255.255.255.128
#
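
Because each Vlanif interface is the gateway for its own subnet, traffic between servers in different subnets is routed by the S7700 and depends on the ARP entries that the S7700 holds for those servers. As a minimal sketch (Python standard library only; the VLANIF table is transcribed from the configuration above, and the helper name vlanif_for is hypothetical), the following maps a server address to the Vlanif interface that serves it:

```python
import ipaddress

# Vlanif interface addresses transcribed from the configuration above.
VLANIF = {
    100: "10.1.2.3/255.255.255.192",
    101: "10.1.2.67/255.255.255.192",
    200: "10.1.2.131/255.255.255.128",
    300: "10.1.3.3/255.255.255.128",
    400: "10.1.3.131/255.255.255.128",
    500: "10.1.4.3/255.255.255.128",
    600: "10.1.4.131/255.255.255.128",
    700: "10.1.5.3/255.255.255.128",
    900: "10.1.6.3/255.255.255.128",
    1000: "10.1.254.2/255.255.255.128",
}


def vlanif_for(server_ip):
    """Return the VLAN ID whose Vlanif subnet contains server_ip, or None."""
    addr = ipaddress.ip_address(server_ip)
    for vlan_id, iface in VLANIF.items():
        if addr in ipaddress.ip_interface(iface).network:
            return vlan_id
    return None


print(vlanif_for("10.1.4.11"))   # server address from the issue description
print(vlanif_for("10.1.3.180"))  # address that appears in the ARP track output
```

Two servers whose addresses map to different VLAN IDs can reach each other only through the S7700, so a missing ARP entry on the switch interrupts their traffic.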

Handling Process

1. Check the ARP table. When service forwarding fails, the ARP entry for the affected IP address does not exist. Run the display arp track command on the S7700. The command output contains records of ARP entry deletion, and the deletion times match the times at which the server loses packets.

[S7700] display arp track
Operate Flags: M - Modify, D - Delete
--------------------------------------------------------------------------------
Op IP-Address      MAC-Address    VLAN Old-Port     New-Port     System-Time   
--------------------------------------------------------------------------------
M  10.1.3.180      xxxx-xxxx-0710 400  GE1/0/39     GE1/0/40     09-05 12:34:35
D  10.1.2.6        xxxx-xxxx-9cd6 300  GE2/0/30                  09-05 12:34:59
D  10.1.4.11       xxxx-xxxx-f9d8 500  GE2/0/10                  09-05 12:35:33

According to the preceding information, ping packets are lost because the ARP entries of the servers are deleted on the S7700. The S7700 receives more ARP Request packets than it can process at the same time, so it does not send ARP Reply packets to the servers in a timely manner. As a result, the ARP entry of a server is not refreshed within the aging time and is deleted.
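
To correlate deletion times with the packet loss times seen on the servers, the Delete records can be extracted from the command output. The following is a minimal sketch in Python; the sample text is copied from the output above, the function name deleted_entries is hypothetical, and the parsing assumes the whitespace-separated column layout shown here.

```python
# Sample records copied from the "display arp track" output above.
ARP_TRACK = """\
M  10.1.3.180      xxxx-xxxx-0710 400  GE1/0/39     GE1/0/40     09-05 12:34:35
D  10.1.2.6        xxxx-xxxx-9cd6 300  GE2/0/30                  09-05 12:34:59
D  10.1.4.11       xxxx-xxxx-f9d8 500  GE2/0/10                  09-05 12:35:33
"""


def deleted_entries(text):
    """Return (ip, vlan, port, time) for every 'D' (Delete) record."""
    result = []
    for line in text.splitlines():
        fields = line.split()
        # Delete records have no New-Port column, so they split into 7 fields.
        if len(fields) == 7 and fields[0] == "D":
            _op, ip, _mac, vlan, port, date, clock = fields
            result.append((ip, int(vlan), port, f"{date} {clock}"))
    return result


for entry in deleted_entries(ARP_TRACK):
    print(entry)
```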
2. Run the display cpu-defend statistics packet-type arp-request all command. The command output shows that ARP Request packets are dropped on the mainboard and in slot 2:

[S7700] display cpu-defend statistics packet-type arp-request all 
Statistics on mainboard: 
------------------------------------------------------------------------------- 
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets) 
------------------------------------------------------------------------------- 
arp-request            79785920     13193856         1246655          206154 
------------------------------------------------------------------------------- 
Statistics on slot 1: 
------------------------------------------------------------------------------- 
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets) 
------------------------------------------------------------------------------- 
arp-request             3730112            0           58283               0 
------------------------------------------------------------------------------- 
Statistics on slot 2: 
------------------------------------------------------------------------------- 
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets) 
------------------------------------------------------------------------------- 
arp-request            73818304     20585792         1153411          321653 
------------------------------------------------------------------------------- 
Statistics on slot 3: 
------------------------------------------------------------------------------- 
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets) 
------------------------------------------------------------------------------- 
arp-request              531264            0            8301               0 
------------------------------------------------------------------------------- 
Statistics on slot 5: 
------------------------------------------------------------------------------- 
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets) 
------------------------------------------------------------------------------- 
arp-request                 N/A          N/A               0               0 
------------------------------------------------------------------------------- 
Statistics on slot 6: 
------------------------------------------------------------------------------- 
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets) 
------------------------------------------------------------------------------- 
arp-request            15580920            0          232981               0 
-------------------------------------------------------------------------------
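
The counters are easier to compare as drop ratios. The following Python sketch uses the packet counters copied from the output above; the names ARP_REQUEST_STATS and drop_ratio are illustrative only.

```python
# (passed packets, dropped packets) per row, copied from the output above.
ARP_REQUEST_STATS = {
    "mainboard": (1246655, 206154),
    "slot 1": (58283, 0),
    "slot 2": (1153411, 321653),
    "slot 3": (8301, 0),
    "slot 5": (0, 0),
    "slot 6": (232981, 0),
}


def drop_ratio(passed, dropped):
    """Fraction of received ARP Request packets that were dropped."""
    total = passed + dropped
    return dropped / total if total else 0.0


for name, (passed, dropped) in ARP_REQUEST_STATS.items():
    print(f"{name:<10} {drop_ratio(passed, dropped):6.1%}")
```

Only the mainboard and slot 2 report drops, and slot 2 discards roughly one in five ARP Request packets, which is why the later attack source check targets slot 2.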

3. Configure a static ARP entry for the server on the S7706 so that the ARP entry of the server remains unchanged during testing, and then perform the ping operation. No packet is discarded.
This confirms that the S7706 receives too many ARP Request packets from downstream devices and therefore randomly discards ARP Request packets from the servers. The ARP entry of a server is not refreshed within the aging time and is deleted. Consequently, ping packets of the downstream server are discarded.
4. Configure a CPU defense policy on the S7700 to identify the MAC address of the server that sends too many ARP Request packets.
cpu-defend policy test
auto-defend enable
auto-defend attack-packet sample 5  //The switch samples one out of every five packets for attack source tracing. A smaller sampling value consumes more CPU resources.
auto-defend threshold 30  //Checking threshold for attack source tracing, in pps
auto-defend trace-type source-mac  //Attack source tracing based on source MAC addresses
auto-defend protocol arp  //ARP packets that the device monitors in attack source tracing
cpu-defend-policy test global  //Apply the CPU defense policy globally.
5. Run the display auto-defend attack-source slot 2 command to check the MAC address of the server that sends excessive ARP Request packets.
Attack Source User Table (MPU):
-------------------------------------------------------------------------
MacAddress       InterfaceName           Vlan:Outer/Inner      TOTAL
-------------------------------------------------------------------------
0000-0000-00db   GigabitEthernet2/0/22   193                   416
You can also run the display logbuffer command to check the MAC address of the server whose ARP Request packets are discarded.
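
The idea behind sampling-based attack source tracing in steps 4 and 5 can be illustrated with a simplified model. The following Python sketch is not the switch's actual algorithm: the function name trace_sources, the one-second traffic window, and the strict greater-than comparison against the threshold are all assumptions made for illustration. Only the sampling value 5, the threshold 30, and the MAC addresses come from the text above.

```python
from collections import Counter


def trace_sources(source_macs, sample=5, threshold=30):
    """Estimate per-MAC packet counts from sampled traffic (simplified model).

    source_macs: source MAC of each ARP packet seen in one second, in order.
    Returns {mac: estimated_count} for MACs whose estimate exceeds threshold.
    """
    sampled = Counter()
    for index, mac in enumerate(source_macs):
        if index % sample == 0:  # inspect one packet out of every `sample`
            sampled[mac] += 1
    # Scale each sampled count back up to an estimated packet rate.
    estimates = {mac: count * sample for mac, count in sampled.items()}
    return {mac: est for mac, est in estimates.items() if est > threshold}


# Hypothetical one-second window: one server floods, two others send one packet each.
window = ["0000-0000-00db"] * 80 + ["xxxx-xxxx-9cd6", "xxxx-xxxx-f9d8"]
print(trace_sources(window))
```

In this model, only the flooding source exceeds the threshold. A larger sampling value inspects fewer packets, which saves CPU resources at the cost of a coarser estimate.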

Root Cause

The downstream server sends a large number of ARP Request packets to the S7700, but the S7700 can process only a limited number of ARP Request packets. As a result, normal ARP Request packets are discarded by CPCAR and cannot be sent to the CPU of the S7700 for processing.
The ARP entry of a server is therefore not refreshed within the aging time and ages out on the S7700. Consequently, servers in different VLANs cannot communicate.

Solution

The server was automatically sending dozens of ARP packets every second. After the sending rate is reduced to one ARP packet per second, services are restored.