16 Responses

  1. David Pasek

    Disclaimer: I work for DELL.

    Hi Michael. Very nice post. I think it is generally better to use an L3 leaf-spine fabric with OSPF and/or BGP because of its better scalability and reliability. I’m sure you are working towards an L3 network architecture, and I’m looking forward to future posts because this is exactly what I’m testing in my lab to become familiar with it. I’m very interested in what your final conclusion will be.

  2. David Pasek

    Excellent. Sounds very promising.

    Absolutely agree with you. L2 is much simpler, and that’s the reason I have only done the same L2 (VLT/mVLT/rVLT) deployments for our customers so far.

    However, an L2 network is a single failure domain, which can have a negative impact on availability and scalability.

    Right now I have one customer who is planning a new datacenter for 3,000-plus servers next year. He is interested in a leaf-spine L3 network PoC, so I’ll most probably (if time allows) try to build a small lab this month.

    I have some ideas on how to implement L3 leaf-spine with OSPF, but I have never tested it because I’m more of a virtualization guy than a networking guy. As you correctly stated, the L2 implementation is simpler for us, but that’s most probably only because we are a little less familiar with L3 dynamic routing protocols.

    That’s the reason I’m very interested in your thoughts, and thanks for sharing your experience with the community.

  3. 5 Step Process for Getting the Best Out of Your VMware Support Experience | Long White Virtual Clouds

    […] Force10 or PowerConnect switches. I’ve previously written about their switches in my article Configuring Scalable Low Latency L2 Leaf-Spine Network Fabrics with Dell Networking Switches. Other OEM partners and solutions providers may have different support offerings and you should […]

  4. Shan

    Hi Michael,

    Great post. Is LACP the recommended load balancing option on the ESXi vDS?

    1. David Pasek

      Hi Shan,

      LACP on the ESXi VDS is definitely highly recommended. One reason is that LACP solves a potential black hole scenario when the VLTi (peer-link) fails. VLTi failure is rare, but it can happen.

      I have described this in detail at


      1. David Pasek

        Hi Mike. Are you sure? Did you test it? I did 🙂

        I tested it as part of the validation tests of my already implemented VLT network design.

        I had the same assumption as you, but the validation tests told me the truth.

        The backup link will not help you. It is not a bug, it is a feature. If the VLTi is down, MAC addresses cannot be synchronized between the VLT nodes, so the VLT domain cannot work. The backup link is used only to know who is up and who is down. When the VLTi is down and the secondary VLT node sees the primary node up and running over the backup link, all ports participating in VLT port-channels on the secondary VLT node are switched to link down. However, orphan ports (non-VLT ports) stay up, and that leads to a black hole scenario.

        When you use switch independent teaming on an ESXi host, you are effectively using orphan ports.

        When you use LACP, the host uplinks are part of a VLT port-channel, so the teaming is VLT aware.

        The backup link is beneficial when the primary node is down: if the backup link is up, the secondary VLT node will keep all VLT port-channel ports up.

        When the backup link is not configured correctly, there is no visibility between the VLT nodes during a VLTi failure. This is considered a split-brain scenario, in which only the primary VLT node will switch the traffic.
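
The behaviour described in this comment can be sketched as a small toy model in Python. It is only an illustration of the reasoning, not Force10 code: the function names are hypothetical, the backup link is assumed to be healthy (so the split-brain case is not modelled), and the secondary node's uplinks are assumed to be VLT port-channels, so it cannot forward traffic once they are shut.

```python
def secondary_vlt_ports_up(vlti_up: bool, primary_alive: bool) -> bool:
    """State of the VLT port-channel member ports on the secondary VLT node.

    Assumes the backup link works, so the secondary knows whether the
    primary node is alive.
    """
    if vlti_up:
        return True
    # VLTi down: the secondary shuts its VLT ports only if the primary is alive.
    return not primary_alive


def host_link_to_secondary_up(uses_lacp: bool, vlti_up: bool, primary_alive: bool) -> bool:
    """Link state the ESXi host sees on its uplink to the secondary node."""
    if uses_lacp:
        # The uplink is a VLT port-channel member, so it follows the VLT port state.
        return secondary_vlt_ports_up(vlti_up, primary_alive)
    # Switch independent teaming: the uplink lands on an orphan port, which stays up.
    return True


def is_black_hole(uses_lacp: bool, vlti_up: bool, primary_alive: bool) -> bool:
    """True when the host link is up but the secondary node cannot forward traffic."""
    return (host_link_to_secondary_up(uses_lacp, vlti_up, primary_alive)
            and not secondary_vlt_ports_up(vlti_up, primary_alive))


if __name__ == "__main__":
    for uses_lacp in (False, True):
        teaming = "LACP" if uses_lacp else "switch independent"
        print(teaming, "-> black hole on VLTi failure:",
              is_black_hole(uses_lacp, vlti_up=False, primary_alive=True))
```

In this model only the combination of switch independent teaming, a failed VLTi and a live primary node produces a black hole. With LACP the host sees the link go down and fails over to the primary node.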


      2. David Pasek

        I fully agree that the probability of a VLTi failure is very low because of redundancy. But the typical failure scenario is human error.

        I also agree that this particular scenario can be documented as a risk and accepted by the customer.

        However, if it is not accepted, I also believe that link state tracking (in Force10 language, UFD, Uplink Failure Detection) is a potential solution, or rather a workaround.

        BUT REMEMBER: orphan ports must be configured as dependent on some VLT (for example, the VLT to the upstream router) and NOT on the VLTi (peer-link) port-channel, because the VLTi port-channel can legitimately be down during primary VLT node maintenance such as a firmware upgrade or reload.
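
The reason for tracking an upstream VLT rather than the VLTi can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not UFD configuration: the function names and the `"vlti"` / `"upstream_vlt"` keys are hypothetical, the backup link is assumed to be healthy, and the UFD rule is reduced to "orphan ports follow the tracked link".

```python
def secondary_links(vlti_up: bool, primary_alive: bool) -> dict:
    """Link states on the secondary VLT node (toy model, backup link assumed healthy)."""
    return {
        "vlti": vlti_up,
        # The upstream VLT stays up unless the VLTi fails while the primary is alive.
        "upstream_vlt": vlti_up or not primary_alive,
    }


def orphan_ports_enabled(tracked_link: str, links: dict) -> bool:
    """UFD-style rule: host-facing orphan ports follow the state of the tracked link."""
    return links[tracked_link]


SCENARIOS = {
    "VLTi failure, primary alive": secondary_links(vlti_up=False, primary_alive=True),
    "primary reloading for maintenance": secondary_links(vlti_up=False, primary_alive=False),
}

if __name__ == "__main__":
    for name, links in SCENARIOS.items():
        for tracked in ("upstream_vlt", "vlti"):
            print(f"{name}: track {tracked} -> orphan ports enabled = "
                  f"{orphan_ports_enabled(tracked, links)}")
```

In this model both choices disable the orphan ports during a real VLTi failure, but tracking the VLTi also disables them while the primary node is being reloaded, which would cut hosts off from the only node still forwarding.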

        I’m planning to test and validate the UFD workaround in my lab because neither LACP nor static Ether-channel can be used when NPAR is leveraged on the NICs. And that’s exactly what is used in my particular design because of vSphere licensing (Standard edition) and iSCSI with DCB constraints.

        I call this solution a workaround because it can introduce other unwanted side effects (a dependency on something irrelevant).

        Another workaround (maybe more reliable than UFD) would be a Force10 smart script (perl, python, zsh) testing the VLTi status and also the status of the primary VLT node. However, who likes custom scripts, right? It would have a negative impact on the long-term manageability of the solution.

        I still believe LACP (or static Ether-channel) from the host is the purest solution where applicable, and it will give you the best result. I don’t think the VDS LACP configuration is too complex.

        I also agree that VMware’s LACP implementation is relatively new, and I have seen some interoperability issues with HP IRF, but that’s another topic.

        P.S. I’m also a proponent of simple solutions and I really like switch independent teaming, especially VMware’s LBT, but it is good to know there is at least one risky scenario.

  5. Performance Testing Oracle Databases with Swingbench Order Entry Schema | Long White Virtual Clouds

    […] Especially when deploying Oracle RAC a low latency high throughput network environment is preferred due to the cluster interconnect that coordinates between the database cluster nodes. Overall a scalable network that can scale out with your applications and servers is preferred and something that offers predictable consistent latency and throughput between endpoints. A popular network topology that provides these characteristics is a leaf spine network architecture. All of the testing in this article was based on systems deployed and connected to a leaf spine network with 40GbE Spine switches connected to 10GbE leaf switches, which are connected to the hosts. I gave an overview of my test lab network topology in Configuring Scalable Low Latency L2 Leaf-Spine Network Fabrics with Dell Networking Switches. […]

  6. LACP Configuration for VMware Distributed Switch and Dell Force10 OS | Long White Virtual Clouds

    […] the article Configuring Scalable Low Latency L2 Leaf-Spine Network Fabrics with Dell Networking Switches I wrote about the general set up of the leaf spine architecture in my Nutanix performance lab with […]

  7. Rajeev Srikant

    Hi Mike

    I have a very basic question.

    In my environment we are planning to use NSX with a leaf-spine architecture. (Currently we have a traditional three-tier network design.)

    When we go for NSX with a leaf-spine architecture, I would like to understand whether the leaf switches should be L2 or L3.

    My understanding is that they should be L2, since NSX will have the Edge Gateways, which will form the L3 adjacency with the spine switches.

    Please clarify.

    1. @vcdxnz001

      Yes, the leaf switches should be configured for L2. L3 is usually in the spine.

