Blog

  • DELL R640 freezes loading BIOS drivers


    I’ve probably had hundreds of servers freeze at this particular screen over my time dealing with servers. It gets to the "loading BIOS drivers" screen. You count the dots one by one and watch closely. One dot. Two dots. Five dots. And then nothing. The dots stop coming.

    In almost every case I’ve found bad memory to be the culprit. In fact, I am documenting this while dealing with the exact problem. An R640 with 16 sticks of memory. It’s freezing. I know there’s a bad stick somewhere.

    The only question is what’s the fastest way to locate the bad stick so you can move on with your life?

    With a lot of memory I like to remove all the DIMMs from the black slots first. It’s up to you. The whole idea is to keep removing DIMMs until the darn thing boots up. If you remove sticks and it still freezes, that means the sticks you pulled out were good. In a 2 CPU configuration remove them evenly from both sides to prevent unbalanced memory errors.

    Now, if you remove a bunch of sticks and it boots up, that means the bad stick is in the group of DIMMs you just pulled out. To find the bad one you now put them back in the server one by one. Once it freezes up you have found the bad DIMM. Replace it.

    Sometimes you’ll do this process only to find the DIMMs just needed to be reseated. That’s fine, in the end it’s just about fixing the problem.
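
    The elimination procedure above can be sketched as a small simulation. This is only an illustration: the slot labels, the group size of 4, and the boots() callback (which stands in for "power on and see whether it gets past the dots") are all assumptions, and the sketch assumes exactly one bad stick. It does not model the advice about keeping both CPUs balanced.

```python
from typing import Callable, List, Optional, Sequence


def find_bad_dimm(
    dimms: Sequence[str],
    boots: Callable[[List[str]], bool],
    group_size: int = 4,
) -> Optional[str]:
    """Locate a single bad DIMM by pulling groups, then re-adding one by one.

    `boots(installed)` is a hypothetical stand-in for a real boot attempt:
    it returns True when the server boots with the given DIMMs installed.
    """
    installed = list(dimms)
    if boots(installed):
        return None  # nothing to find, the server already boots

    # Phase 1: keep pulling groups until the server boots.
    while len(installed) > group_size:
        removed = installed[-group_size:]
        installed = installed[:-group_size]
        if boots(installed):
            suspects = removed  # the bad stick is in the group just pulled
            break
        # Still frozen: the sticks just pulled were good, keep going.
    else:
        # Only one group is left and it still freezes, so the culprit
        # is among the DIMMs that remain installed.
        suspects = installed

    # Phase 2: start from the known-good sticks and put the suspects
    # back one at a time until the freeze returns.
    base = [d for d in dimms if d not in suspects]
    for dimm in suspects:
        if not boots(base + [dimm]):
            return dimm
        base.append(dimm)

    # Everything boots after being put back: probably just a reseat.
    return None


# Simulated 16-DIMM server with one bad stick (labels are made up).
slots = [f"A{i}" for i in range(1, 9)] + [f"B{i}" for i in range(1, 9)]
bad = "B3"
print(find_bad_dimm(slots, lambda installed: bad not in installed))
```

    In the simulated run, two groups come out before the server boots, and the third stick put back brings the freeze back. Returning None in the last step matches the reseat case: every stick went back in and the server still booted.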

  • HP G10 Update Process


    This post describes the update process for the HP G10 platform. All G10 servers can be updated using this process. Keep in mind that even though the process is the same, the BIOS and update packages differ depending on the server. For example, the HP DL360 and the HP DL380 use different BIOS files.

    There are 2 ways you can update an HP G10 server.

    1. The traditional method is to use what HP calls a service pack. It contains fully updated firmware not only for the BIOS and iLO but also for other devices such as RAID controllers and network cards. The service pack comes as an ISO, which can be booted from a USB drive or mounted through the virtual KVM inside the iLO.
    2. The other method is to log in to the iLO and update individual components directly. Try to find a current service pack if you can. If your service pack is outdated, or you have none at all, update the BIOS and iLO as individual components instead. HP requires a service account to download service packs, so you don’t always have immediate access. No such restriction applies to individual updates.

    Mounting the service pack ISO over virtual KVM

    To mount the ISO you must first log in to the iLO. To do that, note the IP address displayed at boot time:

    If no IP address is displayed, make sure your iLO interface has an Ethernet cable attached. If an address outside of your network is displayed, this usually means a static address has been configured, and the iLO should be reset to default settings.

    Take this IP address and type it into your browser. You will be greeted with a login prompt:

    Login: Administrator

    Password: (See sticker located on top of server)

    If you are unable to log in with the default credentials, this is another sign that the iLO needs to be reset. Once logged in you will notice a thumbnail of the server screen in the lower left corner of the window. This is the virtual console:

    Click the thumbnail and then select HTML5 console to launch the virtual KVM.
    You will now see the active video output of your server. From here click the little icon that looks like a CD, then click CD/DVD, then click Local *.iso file.

    Now simply browse to the service pack to mount and begin the boot process.

    Once attached, the server will attempt to boot from the ISO during the boot cycle. Wait a few moments for the server to cycle through the boot order. This is the screen you want to see:

    There is nothing more to do at this point other than wait for the update process to complete. The system may restart several times. Once completed you should verify the installation by making note of the current BIOS version. The current BIOS version is displayed at boot time and both the BIOS and iLO versions can be found in the iLO as well.
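
    If you would rather check the versions from a script than read them off the boot screen, something like the following may work. It is a sketch, not a tested tool: it assumes the iLO exposes the standard Redfish API, that the system and manager live at /redfish/v1/Systems/1/ and /redfish/v1/Managers/1/, and that HTTP Basic auth is accepted. The function names are my own, and certificate checks are turned off because a factory iLO normally presents a self-signed certificate.

```python
import base64
import json
import ssl
import urllib.request
from typing import Any, Dict


def fetch_json(host: str, path: str, user: str, password: str) -> Dict[str, Any]:
    """GET one Redfish resource using HTTP Basic auth (assumed to be enabled)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"https://{host}{path}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    # Assumption: the iLO uses a self-signed certificate, so skip verification.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return json.load(response)


def summarize_versions(system: Dict[str, Any], manager: Dict[str, Any]) -> Dict[str, str]:
    """Pull the BIOS and iLO firmware versions out of the two payloads."""
    return {
        "bios": system.get("BiosVersion", "unknown"),
        "ilo": manager.get("FirmwareVersion", "unknown"),
    }


def read_versions(host: str, user: str, password: str) -> Dict[str, str]:
    # The resource paths below are assumptions; adjust if your iLO differs.
    system = fetch_json(host, "/redfish/v1/Systems/1/", user, password)
    manager = fetch_json(host, "/redfish/v1/Managers/1/", user, password)
    return summarize_versions(system, manager)
```

    Calling read_versions() with the iLO address and the credentials from the sticker before and after the update gives you two dictionaries to compare.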

    Updating individual components

    As mentioned, in the absence of a service pack, components can be updated individually. The 2 most important items to update are the BIOS and the iLO firmware.

    Let’s start with the BIOS. First, log in to the iLO as described previously in this document and then navigate to Firmware & OS Software. Then click Update Firmware:

    The following window appears:

    Click Choose File and then attach the firmware package for the BIOS. The latest firmware can be obtained directly from HP. The correct file will have a .FLASH extension or a .FWPKG extension depending on the server.
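
    If you keep a folder full of downloads, a quick filter can save you from uploading the wrong file. This is a minimal sketch based only on the two extensions named above. The function name and the file names are made up, and it says nothing about whether a package matches your server model.

```python
from pathlib import Path

# Extensions mentioned in this post; compared case-insensitively.
ACCEPTED_EXTENSIONS = {".flash", ".fwpkg"}


def looks_like_firmware_package(filename: str) -> bool:
    """Return True if the file name ends in .FLASH or .FWPKG."""
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS


# Hypothetical file names, for illustration only.
for name in ["bios_update.FLASH", "ilo_update.fwpkg", "release_notes.pdf"]:
    print(name, looks_like_firmware_package(name))
```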

    You will see a progress bar indicating that the file is being uploaded. If accepted, the firmware will install.

    A status of "Flashing firmware" means the file was accepted and the update process has started.

    After each update it may be necessary to power cycle the server. You can do this by manually turning the server off and back on, or by rebooting the system from inside the iLO.

    Repeat this process for the other individual components.

  • Mini Controller means sacrificing a x16 PCIe slot on the R740xd


    On the DELL R740xd there are 3 variations of riser 1.

    2 of these variations are incompatible with a mini controller because of their length, and the fact that they use up the slot on the motherboard required by the mini controller. To install the mini controller, an interposer must be installed first.

    The interposer is essentially an interface between the controller and the motherboard.

    There is only 1 riser compatible with the mini controller and unfortunately it only has 3 x8 PCIe slots. No x16.

    The variations that do have x16 slots are too long and plug into the same motherboard slot required by the interposer, as shown below:

    This riser has 2 x16 slots but is incompatible with a mini controller.

    So if you need a lot of x16 slots consider that you’ll need to use a PCIe RAID controller and not the mini.

  • HP DL380 G9 Not Booting Into Smart Storage Administrator


    You may run into a case where a G9 server fails to boot into the HP Smart Storage Administrator. In my case the server would freeze here after hitting enter.

    It’s not actually frozen, but hitting Enter does nothing and your only recourse is to reboot the server.

    Beyond that you may also notice Intelligent Provisioning appears to be completely borked. And most of the time this is the case.

    The resolution is to simply reinstall Intelligent Provisioning. Start by downloading a copy here.

    The file will come in an ISO format. Turn this into a bootable USB or alternatively just boot from it using the remote console in iLO. If you have difficulties making the USB I find booting it from the virtual KVM almost always works.

    This is the screen you want to see. You’ll get a progress bar and the whole process takes about 10 – 15 minutes. The UID light will flash blue indicating a firmware update is taking place.

    Once the system restarts you can attempt to boot into the Smart Storage Administrator once again.

    In my case I was now able to successfully boot into the software and configure my drives.

  • A fatal error was detected on a component at bus 25 device 0 function 0.


    One of my R640s presented a PCIe error with the following iDRAC log.

    A fatal error was detected on a component at bus 25 device 0 function 0.

    At boot time I received a generic PCIe error message prompting me to dig deeper into the logs. Once I went into the BIOS and looked at the devices, the problem was immediately apparent.

    The server was not seeing my NDC (network daughter card).

    Replacing the NDC resolved the issue. However, if you’re unable to determine the problematic device you can also do the following.

    In iDRAC, navigate to System > Inventory > Hardware Inventory.

    Here you will see hardware info related to devices and their associated bus numbers. Finding the bus number referenced in the error will tell you which device is causing the problem.

    Here you can see bus number 25 is associated with the NDC:

    Sometimes it’s not this easy though. If you can’t locate the bus number the best troubleshooting step is simply removing PCIe devices one by one until the error goes away. Reseat the component and replace it if the problem remains.
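
    If you have a lot of these errors to work through, the lookup can be sketched in a few lines. This is an illustration only: the inventory dictionary below is made up (you would fill it in by hand from the hardware inventory page), and the hex conversion assumes the log prints the bus, device, and function as decimal numbers.

```python
import re
from typing import Dict, NamedTuple, Optional


class PciAddress(NamedTuple):
    bus: int
    device: int
    function: int

    def bdf(self) -> str:
        """Format as hex bus:device.function, the style tools like lspci use."""
        return f"{self.bus:02x}:{self.device:02x}.{self.function:x}"


_PATTERN = re.compile(r"bus (\d+) device (\d+) function (\d+)", re.IGNORECASE)


def parse_fatal_error(message: str) -> Optional[PciAddress]:
    """Extract the bus/device/function numbers from the log message."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return PciAddress(*(int(group) for group in match.groups()))


def find_device(address: PciAddress, inventory: Dict[int, str]) -> Optional[str]:
    """Look up the bus number in a hand-built {bus: device name} table."""
    return inventory.get(address.bus)


# Hypothetical inventory, copied by hand from the hardware inventory page.
inventory = {24: "RAID controller", 25: "NDC"}

log_line = "A fatal error was detected on a component at bus 25 device 0 function 0."
address = parse_fatal_error(log_line)
if address is not None:
    print(address.bdf(), find_device(address, inventory))
```

    The hex form is handy if you later boot an operating system and want to match the device against its PCI listing, where bus 25 would appear as 19.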