How to troubleshoot BRAS Hot-Standby issues on NE40E

Issue Description

The customer deployed BRAS hot standby for the PPPoE service. The topology is shown below. After a CPE comes online, its online information is present on BRAS_Master but is not synchronized to BRAS_Backup.
[Topology figure: NE40E BRAS_Master and NE40E BRAS_Backup]
On BRAS_Master:
<NE40E-Master>disp aaa statistics

---------------------------------------------------------------------
  Total  online users             4
  PPPoE  online users             3  PPPoA  online users             0
  VLAN   online users             0  FTP    online users             0
  SSH    online users             1  Telnet online users             0
  LEASE  online users             0  TUNNEL online users             0
  PPP    online users             0  LNS    online users             0
On BRAS_Backup:
<NE40E-Backup>disp aaa statistics
  ---------------------------------------------------------------------
  Total  online users             1 
  PPPoE  online users             0  PPPoA  online users             0
  VLAN   online users             0  FTP    online users             0
  SSH    online users             1  Telnet online users             0
  LEASE  online users             0  TUNNEL online users             0
  PPP    online users             0  LNS    online users             0

 

Alarm Information

None

Handling Process

1. BRAS hot standby requires the Cross-chassis HAG license, but the customer had activated only the PPPoE/IPoE license.

<NE40E-Master>display license
Item name          Item type  Value    Description
-------------------------------------------------------------
LCR5BASUPG00       Function   YES      NetEngine40E&80E PPPoE/IPoE Function Upgrade License
LCR5QS0100         Resource   6        Concurrent Users(1k)

2. The correct license should contain the item LCR5MHAG00, as shown below.

<NE40E-Master>display license
Item name          Item type  Value    Description
-------------------------------------------------------------
LCR5MHAG00         Function   YES      Cross-chassis HAG
LCR5BASUPG00       Function   YES      NetEngine40E&80E PPPoE/IPoE Function Upgrade License
LCR5QS0100         Resource   6        Concurrent Users(1k)
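
The license check in steps 1 and 2 can be scripted. The following is a minimal Python sketch, assuming the `display license` output has been captured as text in the column layout shown above. The names `missing_license_items`, `REQUIRED_ITEMS`, and `STEP1_OUTPUT` are hypothetical and introduced only for illustration.

```python
# Hypothetical helper: report which required license items are absent from
# captured "display license" output (layout assumed to match the tables above).
REQUIRED_ITEMS = ["LCR5MHAG00", "LCR5BASUPG00"]

# Output captured in step 1, before the Cross-chassis HAG license was activated.
STEP1_OUTPUT = """\
Item name          Item type  Value    Description
-------------------------------------------------------------
LCR5BASUPG00       Function   YES      NetEngine40E&80E PPPoE/IPoE Function Upgrade License
LCR5QS0100         Resource   6        Concurrent Users(1k)
"""


def missing_license_items(display_output, required=REQUIRED_ITEMS):
    """Return the required item names that do not appear as data rows."""
    present = set()
    for line in display_output.splitlines():
        fields = line.split()
        # Data rows carry "Function" or "Resource" in the second column;
        # the header and separator lines do not.
        if len(fields) >= 3 and fields[1] in ("Function", "Resource"):
            present.add(fields[0])
    return [item for item in required if item not in present]


print(missing_license_items(STEP1_OUTPUT))
```

Run against the step 1 output, the sketch reports LCR5MHAG00 as missing; against the step 2 output it returns an empty list.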

3. After the correct license was activated, the problem persisted. The BRAS configuration on both Master and Backup is as follows.

#
interface GigabitEthernet1/0/0.1
pppoe-server bind Virtual-Template 1
8021p 5
user-vlan 7
remote-backup-profile ne1                     //Bind the remote-backup-profile to the BAS interface.
bas
#
  access-type layer2-subscriber default-domain authentication default2
  roam-domain default2
  client-option82
  dhcp reply trust broadcast-flag
  ip-trigger
  arp-trigger
#
#
remote-backup-service rbs-bras2
peer x.x.x.2 source x.x.x.1 port 7001             //The BRASs synchronize information via this IP address and TCP port.
track interface GigabitEthernet x/x/4           //Track the uplink interface.
protect tnl-policy bras1-bras2 peer-ip x.x.x.2   
ip-pool private1-red metric 10                       
switchover uplink failure-ratio 30 duration 0
#
remote-backup-profile ne1
service-type bras
backup-id 11 remote-backup-service rbs-bras2
peer-backup hot
vrrp-id 4 interface GigabitEthernet1/0/0.2                  //This VRRP group is used for BRAS hot-standby detection.
#

4. The configuration above shows that VRRP group 4 (vrrp-id 4) is used for BRAS hot-standby detection. Check whether it is working properly.

<NE40E-Master>dis vrrp brief
Type:VRRP      Total:1     Master:1     Backup:0     Non-active:0     
VRID  State        Interface                Type     Virtual IP    
----------------------------------------------------------------
4     Master       GE1/0/0.2                Admin    172.168.0.1
<NE40E-Backup>dis vrrp brief
Type:VRRP      Total:1     Master:0     Backup:1     Non-active:0     
VRID  State        Interface                Type     Virtual IP    
----------------------------------------------------------------
4     Backup       GE1/0/0.2                Admin    172.168.0.2

5. Checking the remote-backup-service on the backup BRAS showed that TCP-State stays at Connecting. The correct state is Connected.
[NE40E-Backup-rm-backup-srv-rbs-bras2]disp remote-backup-service rbs-bras2

----------------------------------------------------------
Service-Index    : 0
Service-Name     : rbs-bras2
TCP-State        : Connecting
Peer-ip          : 185.105.40.2
Source-ip        : 185.105.40.1
TCP-Port         : 7001
Track-BFD        : --
Track-interface0 : GigabitEthernet1/1/4
                    Weight : 10
Track-interface1 : GigabitEthernet1/1/6
                    Weight : 10
Uplink state     : 2 (1:DOWN 2:UP)
Last up time     : 2016-03-16 16:10:46
Last down time   : 2016-03-16 16:11:00

Last down reason : TCP closed for peer closed
Domain-map-list  : --
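
The check in step 5 can be automated when many remote-backup-services have to be inspected. The sketch below is one possible approach, assuming the output keeps the `Key : Value` layout shown above and that the healthy state is reported as "Connected" (compared case-insensitively here). `parse_fields`, `tcp_connected`, and `SAMPLE` are hypothetical names, and `SAMPLE` is a shortened copy of the output above.

```python
# Shortened copy of the "display remote-backup-service" output from step 5.
SAMPLE = """\
Service-Name     : rbs-bras2
TCP-State        : Connecting
Peer-ip          : 185.105.40.2
Source-ip        : 185.105.40.1
TCP-Port         : 7001
Last down reason : TCP closed for peer closed
"""


def parse_fields(output):
    """Split each 'Key : Value' line at the first colon into a dict entry."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def tcp_connected(output):
    """True only when TCP-State reads 'Connected' (assumed healthy value)."""
    return parse_fields(output).get("TCP-State", "").lower() == "connected"


info = parse_fields(SAMPLE)
print(info["TCP-State"], "|", info["Last down reason"])
print("synchronization channel up:", tcp_connected(SAMPLE))
```

Splitting at the first colon keeps values that contain colons themselves, such as the timestamps in "Last up time", intact.
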
6. Because the TCP connection is not established, check whether each BRAS has a route to its peer. Both BRASs have a correct static route to the peer and can ping each other over the directly connected interface.
Master:

#
ip route-static x.x.x.2 255.255.255.255 10.0.0.2
#
Backup:
#
ip route-static x.x.x.1 255.255.255.255 10.0.0.1
#
[NE40E-Backup]ping -a x.x.40.2 x.x.40.1
PING x.x.40.2: 56  data bytes, press CTRL_C to break
    Reply from x.x.40.2: bytes=56 Sequence=1 ttl=255 time=1 ms
    Reply from x.x.40.2: bytes=56 Sequence=2 ttl=255 time=1 ms

7. Step 6 shows that ICMP is reachable, but we still need to confirm that TCP packets are sent and received properly. To verify this, we enabled TCP packet debugging on the backup BRAS as shown below, but no output appeared.

<NE40E-Backup>debugging tcp packet src-ip x.x.40.1
Info: Filter added!
<NE40E-Backup>t d
Info: Current terminal debugging is on.
<NE40E-Backup>t m

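8. Evidently, the backup BRAS does not receive TCP packets from the master. Checking the CPU defend policy revealed that rule 20 of ACL 3999 matches every TCP packet with a source port greater than 0. Because this ACL feeds a user-defined flow whose rate limit is set to cir 0, the TCP traffic between the master and backup BRAS is blocked.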

#
acl number 3999
rule 3 permit ip source 172.168.0.0 0.0.0.255 destination 224.0.0.18 0
rule 4 permit ip source 10.0.0.3 0 destination 224.0.0.18 0
rule 5 permit ip source 10.0.0.2 0 destination 224.0.0.18 0
rule 7 permit ip source 192.168.254.0 0.0.0.7
rule 8 permit ip source 10.90.0.0 0.0.7.255
rule 20 permit tcp source-port gt 0
#
This ACL is bound to user-defined flow 1 in CPU defend policy 1, which is applied to slot 1 as shown below:
#
cpu-defend policy 1
user-defined-flow 1 acl 3999
car user-defined-flow 1 cir 0
#
slot 1
cpu-defend-policy 1
#
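
To see why rule 20 catches the synchronization traffic while ping still works, the ACL can be modeled offline. The Python sketch below is a simplified first-match model of ACL 3999, not the device's actual implementation. It assumes rules are evaluated in ascending rule-ID order and that `ip` rules match any protocol. The source port 50000 is a hypothetical ephemeral port, and `wildcard_match`, `first_match`, and `RULES` are illustrative names.

```python
import ipaddress

# ACL 3999 as (rule-id, protocol, src, src-wildcard, dst, dst-wildcard,
# source-port-greater-than). None means "not specified in the rule".
RULES = [
    (3, "ip", "172.168.0.0", "0.0.0.255", "224.0.0.18", "0.0.0.0", None),
    (4, "ip", "10.0.0.3", "0.0.0.0", "224.0.0.18", "0.0.0.0", None),
    (5, "ip", "10.0.0.2", "0.0.0.0", "224.0.0.18", "0.0.0.0", None),
    (7, "ip", "192.168.254.0", "0.0.0.7", None, None, None),
    (8, "ip", "10.90.0.0", "0.0.7.255", None, None, None),
    (20, "tcp", None, None, None, None, 0),
]


def wildcard_match(addr, base, wildcard):
    """Wildcard-mask comparison: bits set in the wildcard are ignored."""
    a = int(ipaddress.IPv4Address(addr))
    b = int(ipaddress.IPv4Address(base))
    w = int(ipaddress.IPv4Address(wildcard))
    return ((a ^ b) & ~w & 0xFFFFFFFF) == 0


def first_match(proto, src, dst, sport=None):
    """Return the ID of the first rule the packet matches, or None."""
    for rule_id, r_proto, r_src, r_swc, r_dst, r_dwc, port_gt in RULES:
        if r_proto != "ip" and r_proto != proto:
            continue
        if r_src and not wildcard_match(src, r_src, r_swc):
            continue
        if r_dst and not wildcard_match(dst, r_dst, r_dwc):
            continue
        if port_gt is not None and not (sport is not None and sport > port_gt):
            continue
        return rule_id
    return None


# TCP synchronization traffic between the BRAS peers (hypothetical source port).
print("TCP sync packet  ->", first_match("tcp", "185.105.40.1", "185.105.40.2", 50000))
# ICMP echo between the same addresses.
print("ICMP echo packet ->", first_match("icmp", "185.105.40.1", "185.105.40.2"))
```

In this model the TCP packet lands on rule 20 and therefore in the user-defined flow limited to cir 0, whereas the ICMP packet matches no rule. That is consistent with the ping in step 6 succeeding while the TCP session on port 7001 never came up.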

9. After rule 20 was removed from ACL 3999, the TCP connection was established and user information was synchronized to the backup BRAS successfully.

<NE40E-Backup>disp aaa statistics
  ---------------------------------------------------------------------
  Total  online users             4
  PPPoE  online users             3  PPPoA  online users             0
  VLAN   online users             0  FTP    online users             0
  SSH    online users             1  Telnet online users             0
  LEASE  online users             0  TUNNEL online users             0
  PPP    online users             0  LNS    online users             0
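
The comparison of `disp aaa statistics` between the two devices can also be scripted. The following is a rough Python sketch, assuming both outputs have been captured as text in the layout shown above. `parse_online_users`, `pppoe_synchronized`, and the sample constants are hypothetical names, and the samples are shortened copies of the outputs in this case.

```python
import re

# Shortened copies of the outputs shown in this case.
MASTER = """\
  Total  online users             4
  PPPoE  online users             3  PPPoA  online users             0
  SSH    online users             1  Telnet online users             0
"""
BACKUP_BEFORE_FIX = """\
  Total  online users             1
  PPPoE  online users             0  PPPoA  online users             0
  SSH    online users             1  Telnet online users             0
"""
BACKUP_AFTER_FIX = """\
  Total  online users             4
  PPPoE  online users             3  PPPoA  online users             0
  SSH    online users             1  Telnet online users             0
"""


def parse_online_users(output):
    """Map each access type (Total, PPPoE, ...) to its online user count."""
    pairs = re.findall(r"(\w+)\s+online users\s+(\d+)", output)
    return {name: int(count) for name, count in pairs}


def pppoe_synchronized(master_output, backup_output):
    """Compare only PPPoE users; Total also counts local SSH/Telnet sessions."""
    master = parse_online_users(master_output).get("PPPoE")
    backup = parse_online_users(backup_output).get("PPPoE")
    return master is not None and master == backup


print("before fix:", pppoe_synchronized(MASTER, BACKUP_BEFORE_FIX))
print("after fix: ", pppoe_synchronized(MASTER, BACKUP_AFTER_FIX))
```

Only the PPPoE counter is compared because management sessions such as SSH are local to each device and are not expected to match.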

 

Root Cause

1. The license required for BRAS hot standby (Cross-chassis HAG) was not activated.
2. An incorrect CPU defend policy blocked the TCP connection used for synchronization between the BRASs.

Solution

1. Apply for and activate a license that contains the Cross-chassis HAG function.
2. Ensure that the TCP connection between the master and backup BRAS is established. In this case, that meant removing rule 20 from ACL 3999.