Saturday 23 June 2018

Link Aggregation(LAG) and LACP Protocol:


LAG (Link Aggregation):

  • It is a virtual concept

Thus, the LAG makes two links appear as one in STP

  • The basic concept of LAG is that multiple physical links are combined into one logical bundle. This provides two major benefits, depending on the LAG configuration:
    • Increased capacity – traffic may be balanced across the member links to provide aggregated throughput
    • Redundancy – the LAG bundle can survive the loss of one or more member link
  • LAG makes two connections appear as one to STP, so the second connection won’t be blocked.
  • Load sharing among the LAG links: Traffic from each flow of packets goes through a single link. So, even if we get a flow of 20G, it will go via a single link only. So, the throughput is limited to the bandwidth of the single link only (here: 10G).

Note: The physical type (SFP,...) and speeds(10G,...) of all links in a port-channel should be the same.

#int et 17-18

#channel-group mode on //static configuration of link aggregation

Where channelgroupnumber = 1

#show span //we can see the port channel 1 in show spanning-tree

#show int port-Channel 1 (OR) #show port-channel

  • Now, if we make the port-channel of et17 and 18 in vlan 51. And now, if we make individual et17 and et 18 in van 30. Then, even then the port-channel only will take effect as it will act as a single interface. Thus, ALWAYS port-channel configuration overrides any interface specific configuration.
  • Even after link aggregation, show lldp nei will show the individual interfaces only, since lldp considers only the physical topology.
  • If we tcpdump on int et 17, we can only see the lldp protocols. If we do tcpdump on port-channel 1, we can see all other protocols like STP, etc

#tcpdump int po1

  • If we use link aggregation on one side (MT701) and stop link aggregation on other side (MT703), then:In EOS,
  • On MT701, the STP will not even consider the individual ports. It only considers the port-channel as an interface and it only has the port channel’s state.
  • On MT703, the interfaces are considered different. So, if any ARP request broadcast comes from the port-channel on MT701, then, MT703 receives it on both the interfaces and floods it again via the other link.
  • RTAG 7:
    • RTAG7 is a hashing algorithm that load balances the traffic.
    • Hash engine comes out with a number for each flow.
    • Using the number, the flow is sent through that port.
    • This ensures that the flows are distributed among all the links
    • #show port-channel load-balance trident fields //we can see which all fields of a packet participate in the hashing.
    • #port-channel load-balance trident fields ? //we can configure which all fields we can disable or enable to influence the hashing.
  • Why is the spanning tree cost of port channel comes down? (et 18, 19 have cost of 2000 each but if we use port-channel the cost of port-channel becomes 1999)
    • Because of bandwidth since cost depends on bandwidth
    • 10G links have cost of 2000 by default. Higher the bandwidth, lower is the cost.
    • But, if another 10G link is there, cost will be 2000, so STP may consider that link. So, we use cost of 1999 on the port-channel link.
    • If any speed mismatch is there among the links in a port channel, then, only the higher bandwidth link is active.

DYNAMIC LINK AGGREGATION:

  • LACP (Link Aggregation Control Protocol)
  • #int et 17-18

#channel-group mode active //dynamic lacp active

(OR)

#channel-group mode passive //dynamic lacp passive

  • If both the sides are configured as active, then, both sides can start LACP transmission. If one side is configured as passive, then, it can only receive lacp, not send. (both sides cannot be passive, since both sides will only be waiting for lacp)
  • Passive LACP: the port prefers not transmitting LACPDUs. The port will only transmit LACPDUs when its counterpart uses active LACP (preference not to speak unless spoken to).
  • Active LACP: the port prefers to transmit LACPDUs and thereby to speak the protocol, regardless of whether its counterpart uses passive LACP or not (preference to speak regardless).
  • In L2 header, the type will be a Slow Protocol and the subtype in slow protocols will be LACP. It is a slow protocol since if one side LACP is configured, then, it will keep sending LACP PDUs every 30 seconds.
  • The LACP rate fast feature is used to set the rate (once every second) at which the LACP control packets are sent to an LACP-supported interface. The normal rate at which LACP packets are sent is 30 seconds.

#lacp rate fast //1 seconds

(OR)

#lacp rate normal //30 seconds

  • The above timeout is used to configure the timeout for the partner. If we use rate fast, then, within 3 seconds if it doesn’t get a reply, it knows that the link is not active. It tears down the port-channel.

(OR)

In the rate fast timeout configuration, an LACPDU is sent every second. If no response comes from its partner after 3 LACPDUs are sent, a timeout event occurs and the port channel is removed.

    • Key: Tells the port-channel number (there are two separate fields for both the actor and partner)
    • Port: tells the port number .eg. et 17 (there are two separate fields for both the actor and partner)
    • Actor State:
      • LACP Activity: tells whether the actor is in active (1) or passive state (0)
      • LACP Timeout: tells whether the timeout is normal (1) or fast (0)
      • Aggregation: 1 tells whether the port is a part of port channel. 0 tells that it is operating as a single link.
      • Synchronization: the flag is set when we get back a LACPDU from the partner.

(1 Means it has been allocated to the correct link aggregation group, the group has been associated with a compatible aggregator, and the identity of the link aggregation group is consistent with the system ID and operational key information transmitted. If the value is 0, the link is not synchronized .ie. it is currently not in the right aggregation.)


      • Collecting: after sync, if we receiver data from partner, accept it.
      • Distributing: after sync, even send data.
      • Default: 1 indicates that the actor’s receive machine is using the default operational partner information, administratively configured for the partner. 0 indicates the operational partner information in use has been received in an LACP PDU.
      • Expired: 1 indicates the actor or partner is in an expired state. 0 indicates the actor or partner is not in an expired state .ie. no lacpdu received within timeout
  • Now, a case when we aggregate many links into a port channel but the links are connected to different switches. Then, when the actor sends out a LACPDU, whichever partner sends the first reply, that link will become active. The other links connected to different partners will get inactive state.
  • #show lacp internal detailed //shows the current switch’s details. Also we can see all the flags which are set and not set. We can also see the port-priorities, so that we choose which link should come up in case of mismatched-aggregate
  • #show lacp neighbor detailed  //shows the system id, port number and key of the neighbor
  • #show port-channel //in this itself we can see the reason for the port channel not active
  • #show lacp neighbor all-port //shows details about the neighbour for each of the physical port. We can then use the oper-key value to know if the port-channel link is connected to same port-channel on other side also.
    • If oper-key is different and admin-key is same, then, this port channel is connected to 2 different port-channels on the other side. But, they are in same switch
    • If oper-key and admin-key are both different, then, this port-channel is connected to 2 different switches.
  • #show lacp interface all-port //shows details for lacp not working...If the partner’s sys-id is same and oper-key is different, then, it means same port channel on our side is connected to two port-channels to same partner on other side. (mismatched-aggregate)
  • NOTE: Oper-key is the number assigned to port-channel. For eg, if I give my channel-group 100, then, port channel number is 100 and oper-key will be shown as 64 (since: 100 in decimal is 64 in hex)
  • #show lacp interface detailed all-ports and #show etherchannel detailed all-ports are two ultimate troubleshooting commands if we can’t find out issue