If you upgrade your 14th gen DELL server to a 2nd Gen Intel Xeon Scalable CPU, you may receive this error.
Common second gen CPUs:
Xeon Gold 5218 (16 cores)
Xeon Gold 5220 (18 cores)
Xeon Gold 6240 (18 cores)
Xeon Gold 6252 (24 cores)
Xeon Platinum 8253 (16 cores)
Xeon Platinum 8276 (28 cores)
Chances are your BIOS is outdated and doesn’t support second gen processors. DELL did not introduce support for these processors until BIOS version 2.2.10. If you have a version below that, you will receive this error without a doubt.
What you’ll have to do is install a first gen CPU, update the BIOS, and then reinstall the second gen processors.
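If you track BIOS versions for a lot of servers, the check above is easy to get wrong with a plain string comparison, since "2.10.0" sorts before "2.2.10" as text. Here is a minimal sketch that compares the numeric parts instead. The function names and the sample version strings are my own for illustration; only the 2.2.10 threshold comes from this article.

```python
from typing import Tuple

# Minimum BIOS version quoted in this article for second gen CPU support.
MIN_BIOS_FOR_SECOND_GEN = "2.2.10"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.2.10' into the tuple (2, 2, 10)."""
    return tuple(int(part) for part in version.strip().split("."))


def supports_second_gen(installed: str, minimum: str = MIN_BIOS_FOR_SECOND_GEN) -> bool:
    """Return True if the installed BIOS is at or above the minimum version."""
    # Tuples compare element by element, so (2, 10, 0) > (2, 2, 10) as expected.
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    for version in ("1.6.13", "2.2.9", "2.2.10", "2.10.0"):
        verdict = "OK" if supports_second_gen(version) else "update the BIOS first"
        print(f"BIOS {version}: {verdict}")
```

You would still have to read the installed version yourself, from the BIOS setup screen or iDRAC, and type it in.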
I’ve probably had hundreds of servers freeze at this particular screen over my time dealing with servers. It gets to the loading BIOS drivers screen. You count the dots one by one and watch closely. One dot. Two dots. Five dots. And then nothing. The dots stop coming.
In almost every case I’ve found bad memory to be the culprit. In fact, I am documenting this while dealing with the exact problem. An R640 with 16 sticks of memory. It’s freezing. I know there’s a bad stick somewhere.
The only question is what’s the fastest way to locate the bad stick so you can move on with your life?
With a lot of memory, I like to remove all the DIMMs from the black slots first, but it’s up to you. The whole idea is to keep removing DIMMs until the darn thing boots up. If you remove sticks and it still freezes, that means the bad stick is still in the server and the sticks you pulled out were good. In a 2 CPU configuration, remove them evenly from both sides to prevent unbalanced memory errors.
Now, if you remove a bunch of sticks and it boots up, that means the bad stick is in the group of DIMMs you just pulled out. To find the bad one, put them back in the server one by one. Once it freezes up again, the DIMM you just added is the bad one. Replace it.
Sometimes you’ll do this process only to find the DIMMs just needed to be reseated. That’s fine. In the end it’s just about fixing the problem.
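If it helps to see the elimination process written down, here is a rough sketch of it as a simulation. Everything in it is an assumption for illustration: the slot names (A1 to A8 and B1 to B8), the group size of 4, and the `boots` callback, which in real life is you powering the server on and watching the dots. It also assumes there is only one bad stick.

```python
from typing import Callable, List, Optional, Sequence


def find_bad_dimm(
    slots: Sequence[str],
    boots: Callable[[List[str]], bool],
    group_size: int = 4,
) -> Optional[str]:
    """Pull DIMMs in groups until the server boots, then add them back one by one."""
    installed = list(slots)
    if boots(installed):
        return None  # nothing is wrong with the full set

    while len(installed) > group_size:
        # Pull the last group; with interleaved A/B slots this stays balanced.
        pulled = installed[-group_size:]
        installed = installed[:-group_size]
        if boots(installed):
            # The bad stick is in the group we just pulled.
            for dimm in pulled:
                installed.append(dimm)
                if not boots(installed):
                    return dimm
            return None  # everything boots again: a reseat fixed it
        # Still freezing, so the pulled sticks were good. Keep going.

    # Only one small group is left and it still freezes: try each stick alone.
    for dimm in installed:
        if not boots([dimm]):
            return dimm
    return None


def make_fake_server(bad_dimm: str) -> Callable[[List[str]], bool]:
    """Hypothetical stand-in for a real boot attempt."""
    return lambda installed: bad_dimm not in installed


if __name__ == "__main__":
    # Interleave CPU1 (A) and CPU2 (B) slots so groups come out evenly.
    slots = [f"{cpu}{n}" for n in range(1, 9) for cpu in "AB"]
    print(find_bad_dimm(slots, make_fake_server("B3")))
```

The slot list is interleaved on purpose so that every group of 4 takes 2 sticks from each CPU, which matches the advice about removing them evenly from both sides.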
You may run into a case where a G9 server fails to boot into the HP Smart Storage Administrator. In my case the server would freeze here after hitting enter.
It’s not actually frozen, but hitting enter does nothing and your only recourse is to reboot the server.
Beyond that you may also notice Intelligent Provisioning appears to be completely borked. And most of the time this is the case.
The resolution is to simply reinstall Intelligent Provisioning. Start by downloading a copy here.
The file will come in an ISO format. Turn this into a bootable USB or alternatively just boot from it using the remote console in iLO. If you have difficulties making the USB, I find booting it from the virtual KVM almost always works.
This is the screen you want to see. You’ll get a progress bar and the whole process takes about 10 – 15 minutes. The UID light will flash blue indicating a firmware update is taking place.
Once the system restarts you can attempt to boot into the Smart Storage Administrator once again.
In my case I was now able to successfully boot into the software and configure my drives.
One of my R640s presented a PCIe error with the following iDRAC log.
A fatal error was detected on a component at bus 25 device 0 function 0.
At boot time I received a generic PCIe error message, which prompted me to dig deeper into the logs. Once I went into the BIOS and looked at the devices, it was immediately apparent what the problem was.
The server was not seeing my NDC (network daughter card).
Replacing the NDC resolved the issue. However, if you’re unable to determine the problematic device you can also do the following.
In iDRAC, navigate to System > Inventory > Hardware Inventory.
Here you will see hardware info related to devices and their associated bus numbers. Finding the bus number referenced in the error will tell you which device is causing the problem.
Here you can see bus number 25 is associated with the NDC:
Sometimes it’s not this easy though. If you can’t locate the bus number, the best troubleshooting step is simply removing PCIe devices one by one until the error goes away. Reseat the component that was causing the error, and replace it if the problem remains.
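The bus number lookup described above can be sketched in a few lines if you have a lot of these logs to go through. This assumes the log message follows the same wording as the one I got, and that you have copied the bus numbers out of the hardware inventory page by hand into a dictionary. Only the bus 25 entry comes from my server; the other entry and all function names are made up for illustration.

```python
import re
from typing import Dict, Optional, Tuple

# Matches the "bus 25 device 0 function 0" part of the log message.
LOCATION_PATTERN = re.compile(r"bus (\d+) device (\d+) function (\d+)")


def parse_pcie_location(message: str) -> Optional[Tuple[int, int, int]]:
    """Pull (bus, device, function) out of a log message, or None if absent."""
    match = LOCATION_PATTERN.search(message)
    if match is None:
        return None
    bus, device, function = (int(group) for group in match.groups())
    return bus, device, function


def find_device(message: str, inventory: Dict[int, str]) -> Optional[str]:
    """Look up the device name for the bus number named in the message."""
    location = parse_pcie_location(message)
    if location is None:
        return None
    return inventory.get(location[0])


if __name__ == "__main__":
    log_line = "A fatal error was detected on a component at bus 25 device 0 function 0."
    # Hand-copied from the hardware inventory page. Bus 24 is a hypothetical entry.
    inventory = {24: "RAID controller (hypothetical)", 25: "NDC"}
    print(find_device(log_line, inventory))
```

If the lookup comes back empty, you are in the "remove devices one by one" situation.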
The DELL PowerEdge R640 does support NVME with a bit of extra cabling, specifically the 10 bay chassis. In this tutorial I’ll show you exactly what cables you need to set up a 2, 4, 8, or complete 10 bay NVME system.
The first thing to know is there are 3 DELL cables used to make this happen. The first cable we’ll discuss supplies the first 2 ports with NVME:
DELL P/N 0M026C
It’s essentially a slim SAS cable that connects to an NVME controller/expander card. The other cables we’ll discuss plug straight into the motherboard, but this one requires a separate card.
If you’re facing the server from the front, the connector is on the bottom left of the backplane. One end of the cable is labeled BP which is short for backplane. You’ll plug the BP side into the backplane.
There will be some other cables in your way. You don’t have to unplug them but it does help. The other end of the cable is labeled CTRL. The cable will route alongside the backplane and then up the entire length of the right side of the chassis.
Route this cable alongside the fans and then up the right side of the chassis
Now that the cable is routed appropriately it’s time to install the NVME controller card.
DELL P/N 0CDC7W
Install the card into riser 1 and then attach the cable to the first port.
Now you’ve officially supplied the first 2 drive bays with NVME. If all you need is support for 2 NVME drives, you can stop here. No further work is required. Just plug in your drives and fire up the server. The remaining 8 bays require their own cabling setup.
Let’s install the other 2 sets of cables. Each of these sets supplies 4 bays with NVME support. The set of cables labeled A0 and B0 supplies bays 6 – 9. The set of cables labeled A1 and B1 supplies bays 2 – 5. The cable we just installed supplies bays 0 – 1.
Don’t feel bad if you struggle installing these. They’re an absolute pain unless you’ve cabled hundreds of them.
Here’s what the next cable looks like:
DELL P/N 0684MR
We will now move all the way to the right side of the backplane to the other Slim SAS connectors.
The cables labeled A0 and B0 plug into their corresponding ports on the backplane (also labeled A0 and B0).
These cables route all the way to the left side of the chassis, up the cable channel, and then plug into the Slim SAS connectors in the left rear of the motherboard.
Now onto the other set of cables (A1 and B1)
DELL P/N 0TXC4H
Tuck them under as best you can. Alongside the fans you’ll see hooks which keep the cables restrained and stop them from popping out.
When it comes to routing them along the left side channel, I recommend pulling out the cables already installed. You don’t have to unplug them but it’s easier to route the NVME cables without them in the way. It’s much easier to tuck them in next to the NVME cables later.
Pull out the cables from the side channel to make installing the NVME cables easier.
Once you have them tucked in nicely you’ll see where they plug in at the rear.
Find the ports labeled M1/M2/M3/M4 and just match them up.
And that’s all there is to it. If you’ve installed all 3 sets of cables you now have a server that supports 10 NVME drives. Keep in mind you don’t have to install all 3 sets. Once again, maybe you just want 4 bays with NVME connectivity. In that case, just install one set of the cables.
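If you’re planning a partial install, the bay layout from this tutorial can be written as a small lookup. This is just a sketch: the class and function names are my own, and the pairing of part numbers to the A0/B0 and A1/B1 sets follows the order the cables are shown in this article, so double check the labels on the cables you actually have.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CableSet:
    labels: str                            # labels printed on the cable ends
    part_number: str                       # DELL P/N as listed in this article
    first_bay: int
    last_bay: int
    controller_card: Optional[str] = None  # extra card needed, if any


# Bay ranges as described in this tutorial for the 10 bay R640 chassis.
CABLE_SETS = [
    CableSet("BP/CTRL", "0M026C", 0, 1, controller_card="0CDC7W"),
    CableSet("A1/B1", "0TXC4H", 2, 5),
    CableSet("A0/B0", "0684MR", 6, 9),
]


def cables_for_bays(bays: Iterable[int]) -> List[CableSet]:
    """Return the cable sets needed to give the requested bays NVMe support."""
    wanted = set(bays)
    invalid = sorted(bay for bay in wanted if not 0 <= bay <= 9)
    if invalid:
        raise ValueError(f"bays must be 0-9, got {invalid}")
    return [
        cable
        for cable in CABLE_SETS
        if any(cable.first_bay <= bay <= cable.last_bay for bay in wanted)
    ]


if __name__ == "__main__":
    for cable in cables_for_bays([6, 7, 8, 9]):
        print(cable.labels, cable.part_number)
```

Passing in `range(10)` gives you all 3 sets, plus a reminder that the first cable needs the controller card in riser 1.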