HP has issued a critical customer advisory regarding some of their high performance server adapters. I’ve had a number of customers impacted by the NC522 and NC523 10Gb/s server adapters losing connectivity. It’s good to see there is a firmware update now that may solve this.
Customer Advisory c02964542, ProLiant and StorageWorks Systems: Certain NC-Series and CN1000Q Network Adapters – FIRMWARE UPGRADE REQUIRED to Avoid the Loss and Automatic Recovery of Ethernet Connectivity or Adapter Unresponsiveness Requiring a Server Reboot to Recover, has been released/revised.
[Updated 22/03/2021] This article is now quite dated, and HPE should have long since fixed this firmware issue in their NICs. It should be viewed as historic only.
Affected Item(s):
* HP NC375i Integrated Quad Port Multifunction Gigabit Server Adapter
* HP ProLiant DL370 G6 Server series
* HP NC522SFP Dual Port 10GbE Gigabit Server Adapter
* HP NC522m Dual Port 10GbE Multifunction BL-c Adapter
* HP NC375T PCI Express Quad Port Gigabit Server Adapter
* HP ProLiant DL580 G7 Server series
* HP NC523SFP 10Gb 2-port Server Adapter
* HP ProLiant DL980 G7 Server series
* HP ProLiant DL585 G7 Server series
* HP CN1000Q Dual Port Converged Network Adapter
* HP Business Data Warehouse Appliance
* HP D2D4312 Backup System
* HP D2D4324 Backup System
Description:
IMPORTANT: The network adapter firmware and driver upgrades provided in the Resolution are required to prevent the loss and recovery of Ethernet connectivity, or adapter unresponsiveness requiring a reboot to recover, from occurring. HP recommends performing these upgrades at the customer’s earliest possible convenience. Neglecting to perform the recommended action and not performing the recommended resolution could result in the potential for subsequent errors to occur.
The network adapters listed in the Scope section (below) may encounter either of the following:
* The adapter may temporarily lose Ethernet connectivity, and then automatically recover.
OR
* The adapter may stop responding, requiring a server reboot to recover the operation of the adapter.
Note: There is a low probability of this occurring when operating under a normal network workload.
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster.
What firmware and driver versions does HP recommend for both NICs? Thanks
Hi Manish,
The firmware and driver locations are listed in the advisory instructions. Please take a look at http://h20000.www2.hp.com/bizsupport/TechSupport/….
[…] recently had a situation at a customer site where a physical NIC failure (Read the Critical Advisory I blogged about) caused an All Paths Down (APD) and management network failure. This required a […]
This "Fix" was a no go for us. We have 20 servers, each with two 10Gb NC523SFPs, and we continue to have an issue with random NIC flapping. One of the two NICs will randomly lose its connection to the switch for about 2 seconds, long enough to trigger the alarm that network redundancy was lost. We are running DL380 G7s and we have two separate datacenters with 10 ESX hosts in each one. We even tried putting two NC522SFPs in a server, and when it "flapped" the only way I could get the network connection to stop bouncing was to reboot the server.
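For anyone trying to quantify flapping like this before raising a support case, here is a minimal sketch that pairs each link-down event with the next link-up on the same NIC and reports how long the link was gone. The event tuples, NIC names and timestamps are all hypothetical; how you extract link-state events from your own host or switch logs is left as an assumption.

```python
from datetime import datetime, timedelta


def link_outages(events):
    """Pair each 'down' event with the next 'up' event on the same NIC.

    events: iterable of (timestamp, nic, state) tuples, sorted by time,
    where state is 'down' or 'up' (a hypothetical, simplified format).
    Returns a list of (nic, down_time, duration) tuples.
    """
    down_since = {}
    outages = []
    for ts, nic, state in events:
        if state == "down":
            # Keep the earliest 'down' if several arrive in a row.
            down_since.setdefault(nic, ts)
        elif state == "up" and nic in down_since:
            start = down_since.pop(nic)
            outages.append((nic, start, ts - start))
    return outages


# Hypothetical sample data: two short flaps on different NICs.
t0 = datetime(2012, 1, 1, 12, 0, 0)
events = [
    (t0, "vmnic4", "down"),
    (t0 + timedelta(seconds=2), "vmnic4", "up"),
    (t0 + timedelta(minutes=30), "vmnic5", "down"),
    (t0 + timedelta(minutes=30, seconds=2), "vmnic5", "up"),
]

for nic, start, duration in link_outages(events):
    print(nic, start.isoformat(), duration.total_seconds())
```

A table of outage times and durations per NIC is usually easier for a support engineer to work with than a description of "random" drops.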
Hi Matt,
I can believe that. I know a number of customers still have ongoing problems with their NC522 and NC523 NICs and are still experiencing some disconnections, although much less frequently and with less impact on the whole. The real solution would be to change to NICs that don't have these problems. I use the Intel X520 dual port cards and have not had any issues. But in a lot of cases it may not be possible to change. I was made aware recently of another firmware revision for these cards, though I'm not sure about the driver. So it's even more important with these cards to keep up to date with firmware and drivers, but really these problems should not happen with equipment that has been tested and certified to work.
Hi, I followed the steps, but they only upgraded the driver version; the firmware is the same. How can I upgrade the firmware?
I tried going to support.hp.com, but the drivers are not accessible that way. Instead, go to http://www.hp.com and enter the card you are looking for in the search box at the top right of the page. A list where you can select your operating system should come up. Click "vmware esx/esxi 4.1". Download the HP Qlogic P2P Flash Update Kit. You have to create a CD/DVD to boot from, and it will patch the firmware from that.
I don't know if this works, but it is the direct link for the NC523SFP card.
http://h20000.www2.hp.com/bizsupport/TechSupport/…
We're having similar problems with our storage NICs, which are HP NC522SFP. In one case it caused disk corruption on 30+ VMs.
It looks like HP don't provide the firmware for ESX hosts and instead point you to QLogic -> http://bizsupport1.austin.hp.com/bizsupport/TechS…
And the QLogic firmware update process just hangs for us.
Do we take from this that HP don't support ESX on their kit?
Hi Simon, The customers I've helped with issues related to this advisory got the firmware and drivers directly from HP support (and had HP Techs update them). It is a bit of a process to go through when doing the update though as you have to boot from CD or USB stick. HP definitely do support the servers if it is one they have certified that is on the VMware HCL. Your colleagues in NZ IT experienced exactly the same issue and I helped them resolve it.
Hi Simon, Just so you are aware there are a bunch of other BIOS settings that are recommended in addition to the drivers and firmware. It's recommended that you configure static high performance for power management and enhanced cooling. One of the main reasons for this is that the cards are overheating under load, which is one of the problems. With these settings and the new drivers / firmware the symptoms should be greatly reduced, although I'm not 100% sure it's a complete fix.
HP techs have confirmed to me that for some customers their firmware and drivers fixed the flapping, but for others it continues. Our solution was to replace our NC523SFP cards with Intel X520-DA2 cards. We have had no more issues and are now able to move on with our migration plans.
HP does have some techs configuring firmware dumps for the network cards to try to capture what is causing the problem, but we had already replaced the NICs when HP sent this to me. When calling them, make sure you ask about this.
Hi Simon, Thanks for your update today and for the link to the download for the service pack (http://h18004.www1.hp.com/products/servers/service_packs/en/index.html), which I have included in the post now. I hope this fixes the problems. Please post a comment back here if it is successful.
Many thanks for all your help. We have run the SPP against all our hosts now (and updated the 'VMware' NIC driver) and so far… have not experienced the same issue. We reviewed the BIOS settings previously when we had PSOD crashes when the hosts first went in, but that was a separate issue that was resolved by a BIOS update.
Anyway hopefully the latest issue is resolved (fingers crossed!)…
I notice HP haven't updated the advisory I linked to above, which still suggests people go to QLogic for the firmware. That is strange considering that the QLogic firmware update hangs on these servers (our 6 core CPU builds anyway), while the HP SPP is bootable and does update the firmware successfully.
Great to hear Simon. Weird that the advisory hasn't been updated. I do hope that your problems are gone.
We spent 2 long months trying to get these adapters to work on our DL580 G7 servers with VMware ESXi 4.1 U1 and had constant issues, many of them the same issues Simon was reporting. We use NFS and iSCSI datastores on a NetApp using jumbo frames and Nexus 5Ks, and we had many SQL DB and VMDK corruptions due to the NC522SFPs experiencing "firmware hangs" or network entropy states. Each week we had a new driver to update from HP and QLogic, but they never fixed the issues. We had to pull them and replace them with NC552SFP cards (Emulex OEM). We have not had one single issue since we installed the 552s. We swapped them out in late December, so I don't know if recent firmware has fixed the issues. Good riddance to QLogic. Their high-performance Ethernet cards should never have passed QA. They put my organization through the wringer.
BTW, we also had similar issues with their QLE8242 CNAs, and the NC522SFP was supposed to bring stability to our environment…double whammy.
Hi J, thanks for the feedback. It's unfortunate how many customers have been impacted by these problems. The first release of vSphere 5 from HP had issues with the OneConnect Emulex NICs (mainly in blade infrastructures). This has been fixed in the latest patch releases. For a while the OneConnect wasn't on the HCL and wasn't supported by HP. So when the time is right to upgrade to ESXi 5, it would pay to run a test for a bit with the OneConnect cards and make sure you use the latest patch release.
We have 5 x HP DL585G7 servers with the onboard Quad Port HP NC375i, a separate Quad Port 1G NC375T and the Dual Port 10G NC522SFP (yes, that is 10 NICs per server).
We have never had reliable networking with these servers (ESX 4.1, ESXi 4.1, ESXi 5.0) and it is constantly a battle between ESXi versions and updates, driver updates, firmware updates and hope to keep things running. The only reliable way to ensure network connectivity is to reboot the servers every 4 weeks or so, before the network cards poop themselves.
I have calls logged with HP. If I ever hear back with a resolution, I'll let you know.
Jeff Smith @jeffsmithtbp
Hi All
As an update to this issue, I have been in contact with HP support for the last few weeks about this problem and it has done the rounds of their various support people. The short answer is I will be receiving new main boards and possibly NICs for my 5 x DL585G7 servers in late May. I can't give any details unfortunately because the new boards aren't available yet, but they are an update, not a direct replacement, of the existing boards. If you are still having this issue with your HP servers and these (IMO) dodgy QLogic NICs, hassle HP support: log a case, follow it up and don't take "No" for an answer until they rectify the problem.
Apparently the firmware and driver updates do fix the problems that some people are seeing so they are definitely worth a shot, but ours needed re-engineering to get past the issue.
Jeff Smith @jeffsmithtbp
Hi Jeff,
I just wanted to let you know that you aren't alone on this one. After a good few calls with HP on this issue, they agreed to send us new system boards and NICs for our DL580 G7 servers, which we have started switching out. The first server went well, but the second box now can't load the onboard NIC drivers even though we have the latest firmware (4.0.585) and the latest drivers (4.0.614). I have an open ticket with VMware and with HP on this issue, so hopefully we will have a resolution soon. I will post the resolution here (if we ever get one)….
Rod Hope
Hi All
HP came onsite almost two weeks ago to replace all 10 of the NICs on two of my servers. They had to replace the riser board that houses the 4 x LOM NC375i NICs, and they also replaced the 4 Port NC375T and the 2 Port NC522SFP.
When I restarted the server after the replacement the only issue was the ILOM had been reset (it resides on the riser board), apart from that no issues.
ESX started fine and all networking settings were the same, with no issues with drivers or hardware. I migrated some test VMs to these servers and have not had any outages yet.
Everything seems good now and there have been no problems since. I can't guarantee that this will fix the issue, but it's a start and HP appear confident for the first time. I have HP coming onsite later this week to do the same thing on the last of my servers.
I also have another client experiencing the same problem on a number of servers. They have vSphere 4.1, but are seeing the same problems I have with vSphere 5.0 U1. Same escalation with HP and same fix.
I'll keep you updated on the progress
Jeff Smith @jeffsmithtbp
Did you see that a bunch of updates for some of these NICs just came out, dated September 4, 2012? HP must still be working on this problem.
I agree that the way to get to the correct link is to go to http://www.hp.com, go to drivers and search on your card number.
Hi all,
There are again a couple of advisories from HP. I'm very frustrated because for two years now there have been recurring issues with the BIOS firmware and the Emulex firmware and drivers. I would be really interested to know how many people have the same problems with the G7 servers, so I created a Facebook group:
https://www.facebook.com/groups/449590495114322/
Newest HP advisories:
NIC FW 2013-01-24 -> http://h20000.www2.hp.com/bizsupport/TechSupport/…
BIOS FW 2013-01-08 -> http://h20566.www2.hp.com/portal/site/hpsc/templa…
J, what errors did your applications report when corruption occurred and can you describe the corruption? Did they coincide with "Smux Provider Data is Invalid" errors?
Hi, which cables do you use to connect the NC552SFP to the Nexus 5K switches?
Hi all, I have the same problem even after updating the firmware and driver.
I boot my ESXi host and the links to my storage are up, but after a variable amount of time the links go down.
Rebooting the server restores the correct situation, but within minutes here we are again.
Reading the ESXi logs I found:
2014-11-11T15:38:47.412Z cpu9:33367)<3>qlcnic 0000:0d:00.1: vmnic5:qlcnic_check_temp:5969:Device temperature 108 degrees C exceeds maximum allowed.
So it seems to be a temperature problem. I am opening a ticket with HP support and I will let you know the response.
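If you want to check whether your own hosts are logging the same warning, a minimal sketch like the following pulls the timestamp, NIC and temperature out of lines shaped like the one quoted above. The pattern is modelled on that single log line only, so treat it as an assumption: other qlcnic driver versions may word the message differently.

```python
import re

# Pattern modelled on the one log line quoted above (assumption: other
# driver versions may format this message differently).
TEMP_RE = re.compile(
    r"^(?P<ts>\S+)\s.*?(?P<nic>vmnic\d+):qlcnic_check_temp:\d+:"
    r"Device temperature (?P<temp>\d+) degrees C"
)


def find_temperature_events(lines):
    """Return (timestamp, nic, temperature_c) for each matching log line."""
    events = []
    for line in lines:
        match = TEMP_RE.search(line)
        if match:
            events.append(
                (match.group("ts"), match.group("nic"), int(match.group("temp")))
            )
    return events


# The log line from the comment above, used as sample input.
sample = [
    "2014-11-11T15:38:47.412Z cpu9:33367)<3>qlcnic 0000:0d:00.1: "
    "vmnic5:qlcnic_check_temp:5969:Device temperature 108 degrees C "
    "exceeds maximum allowed.",
]

for ts, nic, temp in find_temperature_events(sample):
    print(f"{ts} {nic} {temp}C")
```

In practice you would pass in the lines of the host's kernel log rather than a hard-coded sample; where that log lives depends on the ESXi version.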
Hi Stefano, my customers have seen an improvement when using the BIOS settings of Static High Performance and Enhanced Cooling. However, there is additional power consumption with these settings. I would encourage you to work with support and get a resolution. Many customers have had their NICs replaced, and others have decided to change hardware platforms completely as a result of the experience.
I chatted with HP Support and they told me to:
– install the NC523SFP in PCI-E riser slot 2 instead of slot 1 (slot 1 is the one nearest the mainboard)
– change the thermal configuration in RBSU from "Optimal Cooling" to "Increased Cooling"
This solved the problem.
Bye
Where the ‘F’ are the current firmware files for this ‘D-M’ thing? HP seems to have “LOST” all info on this card…(none of the above links work anymore).
It’s a seriously old problem and article, so maybe HP thought they’d got it well and truly fixed years ago. But thanks for the heads up. I’ll take a look.