One Response

  1. Doug Youd (@cnidus)

    Gday Michael,

    Disclaimer: I work for Cumulus Networks.

    Generally, when I’m doing MLAG-based implementations, I suggest LACP everywhere if possible (though there are a number of situations where this is not realistic).

    My justification is that it gives the switches a messaging mechanism to the hosts for a variety of topology changes and allows for more intelligent failover. It can also simplify the network configuration (i.e. avoid having to use ifplugd etc).

    For example, there are a few failure scenarios in an MLAG:
    1) Uplink failure
    2) Peerlink (ISL) failure
    3) Switch failure
    4) MLAG daemon failure
    5) Control-plane failure
    6) Planned maintenance

    These scenarios can have different desired actions. In a lot of cases, if the host is given the appropriate topology change info, it can make the best decision on which links to use and which to drop from the bundle.

    We use the “chassis ID/MAC” in the LACP messages to notify the host of topology changes.

    For example, in a planned maintenance window, clagd (the MLAG process) will notify its peer to assume the ‘primary’ role, shut down the local daemon gracefully and revert the local chassis MAC to the default (i.e. non-cluster MAC). The host will then see an LACP bundle with two different chassis MACs and drop the link to the maintenance switch (as defined in the LACP standard).

    When maintenance is complete, clagd will start again, form a relationship with the peer, then set the local chassis MAC for the LACP links back to the cluster ID (the same one the peer has active). The host will then see the chassis MACs matching again, gracefully add the link back into the bundle and seamlessly start using it.
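
As a rough illustration of the host-side behaviour described above, here is a minimal Python sketch. It is a simplified model, not the actual logic of any bonding driver or of clagd: the function name `select_active_links`, the interface names and both MAC addresses are hypothetical, and the "stick with the current partner while any link still reports it" rule is an assumption made to keep the example small.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple


def select_active_links(
    partner_ids: Dict[str, str],
    current_id: Optional[str] = None,
) -> Tuple[Optional[str], Set[str]]:
    """Return (partner chassis MAC of the bundle, links kept in the bundle).

    partner_ids maps each host link to the chassis MAC its switch
    advertises in LACP. Only links that report the same chassis MAC
    may share one bundle.
    """
    if not partner_ids:
        return None, set()
    # Assumption: stay with the current partner while a link still reports it.
    if current_id not in partner_ids.values():
        # Otherwise reselect: take the MAC seen on the most links
        # (ties broken by sort order, only to keep the sketch deterministic).
        counts = Counter(partner_ids.values())
        current_id = min(counts, key=lambda mac: (-counts[mac], mac))
    active = {link for link, mac in partner_ids.items() if mac == current_id}
    return current_id, active


CLUSTER_MAC = "44:38:39:ff:00:01"  # hypothetical shared MLAG chassis MAC
SWITCH2_MAC = "44:38:39:00:00:22"  # hypothetical default MAC of switch 2

# Normal operation: both switches advertise the cluster MAC.
bundle_id, active = select_active_links(
    {"eth0": CLUSTER_MAC, "eth1": CLUSTER_MAC}
)
# Maintenance: switch 2 reverts to its default MAC, so eth1 is dropped.
bundle_id, during = select_active_links(
    {"eth0": CLUSTER_MAC, "eth1": SWITCH2_MAC}, bundle_id
)
# Maintenance complete: the MACs match again and eth1 rejoins the bundle.
bundle_id, after = select_active_links(
    {"eth0": CLUSTER_MAC, "eth1": CLUSTER_MAC}, bundle_id
)
```

In this model the host never needs to be told why the switch changed its chassis MAC; the mismatch alone is enough for it to pull the link, and the match is enough to restore it.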

    tl;dr – My 2c is that (if it’s available) LACP config on the host is worth the effort for MLAG topologies.
