Blog

  • HP DL380 G9 Not Booting Into Smart Storage Administrator

    You may run into a case where a G9 server fails to boot into the HP Smart Storage Administrator. In my case the server would freeze at this point after I pressed Enter.

    It isn’t actually frozen, but pressing Enter does nothing and your only recourse is to reboot the server.

    You may also notice that Intelligent Provisioning appears to be completely borked, and most of the time it is.

    The fix is to reinstall Intelligent Provisioning. Start by downloading a copy of the Intelligent Provisioning ISO from HPE.

    The download is an ISO image. Either write it to a bootable USB drive or boot from it as virtual media through the iLO remote console. If you have trouble making the USB drive, I find booting it from the iLO virtual KVM almost always works.

    This is the screen you want to see. You’ll get a progress bar, and the whole process takes about 10–15 minutes. The UID light flashes blue to indicate that a firmware update is in progress.

    Once the system restarts you can attempt to boot into the Smart Storage Administrator once again.

    In my case I was now able to successfully boot into the software and configure my drives.

  • A fatal error was detected on a component at bus 25 device 0 function 0.

    One of my R640s presented a PCIe error with the following iDRAC log.

    A fatal error was detected on a component at bus 25 device 0 function 0.

    At boot time I received a generic PCIe error message, which prompted me to dig deeper into the logs. When I went into the BIOS and looked at the device list, the problem was immediately apparent.

    The server was not detecting my NDC (network daughter card).

    Replacing the NDC resolved the issue. However, if you’re unable to determine the problematic device you can also do the following.

    In iDRAC, navigate to System → Inventory → Hardware Inventory.

    This page lists each device along with its associated bus number. Find the bus number referenced in the error and you’ll know which device is causing the problem.

    Here you can see bus number 25 is associated with the NDC:

    Sometimes it’s not this easy, though. If you can’t locate the bus number, the best troubleshooting step is to remove PCIe devices one at a time until the error goes away. Reseat that component, and replace it if the problem persists.
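    If you hit these errors often, the bus lookup described above is easy to script. The Python sketch below is a hypothetical helper, not a Dell tool. It pulls the bus number out of the iDRAC message and looks it up in an inventory map that you copy by hand from the Hardware Inventory page. Only the bus 25 / NDC entry comes from this post. The bus number is kept as a string so the sketch doesn’t have to guess whether your inventory shows it in decimal or hex. Just make sure both sides use the same format.

```python
import re
from typing import Optional, Tuple

# Hypothetical bus-number -> device map, copied by hand from
# iDRAC's System → Inventory → Hardware Inventory page.
# Only the bus 25 entry comes from this post; add your own.
INVENTORY = {
    "25": "Network daughter card (NDC)",
}

# Matches the wording of the iDRAC message quoted in this post.
ERROR_RE = re.compile(
    r"bus (?P<bus>\w+) device (?P<dev>\w+) function (?P<fn>\w+)", re.IGNORECASE
)


def locate_device(message: str, inventory: dict) -> Optional[Tuple[str, Optional[str]]]:
    """Return (bus, device description) for the bus named in the message.

    The description is None if the bus isn't in the inventory; the whole
    result is None if the message doesn't mention a bus at all.
    """
    match = ERROR_RE.search(message)
    if match is None:
        return None
    bus = match.group("bus")
    return bus, inventory.get(bus)


msg = "A fatal error was detected on a component at bus 25 device 0 function 0."
print(locate_device(msg, INVENTORY))
# ('25', 'Network daughter card (NDC)')
```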

  • R640 10 Bay NVME Tutorial

    The DELL PowerEdge R640, specifically the 10-bay chassis, supports NVME with a bit of extra cabling. In this tutorial I’ll show you exactly which cables you need to set up 2, 4, 8, or all 10 bays with NVME.

    The first thing to know is that three sets of DELL cables make this happen. The first cable we’ll discuss supplies NVME to the first 2 bays:

    DELL P/N 0M026C

    It’s essentially a slim SAS cable that connects to an NVME controller/expander card. The other cables we’ll discuss plug straight into the motherboard, but this one requires a separate card.

    If you’re facing the server from the front, the connector is on the bottom left of the backplane. One end of the cable is labeled BP, which is short for backplane. Plug the BP end into the backplane.

    There will be some other cables in your way. You don’t have to unplug them, but it does help. The other end of the cable is labeled CTRL. The cable routes alongside the backplane and then up the entire length of the right side of the chassis.

    Route this cable alongside the fans and then up the right side of the chassis.

    Now that the cable is routed appropriately it’s time to install the NVME controller card.

    DELL P/N 0CDC7W

    Install the card in riser 1 and attach the CTRL end of the cable to the card’s first port.

    You’ve now supplied the first 2 drive bays with NVME. The remaining 8 bays require their own cabling. If all you need is support for 2 NVME drives, you can stop here. No further work is required. Just plug in your drives and fire up the server.

    Let’s install the other 2 sets of cables. Each set supplies 4 bays with NVME. The set labeled A0 and B0 supplies bays 6–9, the set labeled A1 and B1 supplies bays 2–5, and the cable we just installed supplies bays 0–1.
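    To keep the bay mapping straight, here is a small Python sketch that encodes the cable-set-to-bay assignments for the R640 10-bay chassis. The set names ("controller", "A1/B1", "A0/B0") are labels chosen for illustration, not Dell identifiers. The bay ranges are the ones this tutorial gives for each set.

```python
# Bays supplied with NVME by each cable set (R640 10-bay chassis).
# The set names are illustrative labels, not Dell identifiers.
CABLE_SETS = {
    "controller": range(0, 2),  # 0M026C cable + 0CDC7W card in riser 1
    "A1/B1": range(2, 6),
    "A0/B0": range(6, 10),
}


def nvme_bays(installed) -> list:
    """Return the sorted bays that get NVME from the installed cable sets."""
    bays = set()
    for name in installed:
        if name not in CABLE_SETS:
            raise ValueError(f"unknown cable set: {name!r}")
        bays.update(CABLE_SETS[name])
    return sorted(bays)


def cable_set_for_bay(bay: int) -> str:
    """Return the cable set that supplies NVME to a given bay (0-9)."""
    for name, bays in CABLE_SETS.items():
        if bay in bays:
            return name
    raise ValueError(f"bay {bay} is outside 0-9")


print(nvme_bays(["controller", "A1/B1"]))  # [0, 1, 2, 3, 4, 5]
print(cable_set_for_bay(7))  # A0/B0
```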

    Don’t feel bad if you struggle to install these. They’re an absolute pain unless you’ve cabled hundreds of them.

    Here’s what the next cable looks like:

    DELL P/N 0684MR

    We will now move all the way to the right side of the backplane to the other Slim SAS connectors.

    The cables labeled A0 and B0 plug into their corresponding ports on the backplane (also labeled A0 and B0).

    These cables route all the way to the left side of the chassis, up the cable channel, and plug into the Slim SAS connectors at the left rear of the motherboard.

    Now on to the other set of cables (A1 and B1):

    DELL P/N 0TXC4H

    Tuck them under as best you can. Alongside the fans you’ll see hooks that hold the cables in place and keep them from popping out.

    When it comes to routing them along the left side channel, I recommend pulling out the cables already installed. You don’t have to unplug them but it’s easier to route the NVME cables without them in the way. It’s much easier to tuck them in next to the NVME cables later.

    Pull out the cables from the side channel to make installing the NVME cables easier.

    Once they’re tucked in nicely, you’ll see where they plug in at the rear.

    Find the ports labeled M1/M2/M3/M4 and just match them up.

    And that’s all there is to it. If you’ve installed all 3 sets of cables, you now have a server that supports 10 NVME drives. Keep in mind that you don’t have to install all 3 sets. If you only want 4 bays with NVME connectivity, for example, just install one of the 4-bay sets.

    Thank you for reading.

  • The storage BP2 SAS A0 cable is not connected, or is improperly connected.

    The R740xd uses specific numbers to refer to the different backplanes in your server, and it can support up to three of them. The primary backplane is the one you install drives into from the front of the server. The other two belong to the mid bay and the rear flex bay, if you have those installed.

    So quite simply:

    • BP0 – Rear backplane for flex bay
    • BP1 – Primary backplane for the front drive slots
    • BP2 – Backplane for the mid bay

    So if you’re getting backplane errors on BP2 this means the server is detecting a problem with your mid bay. (See the end of this post if you don’t actually have a mid bay installed!)
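    Because the error message names both the backplane and the port, a short Python sketch can translate it using the BP numbering above. It assumes the message always follows the wording in this post’s title. Other iDRAC messages may be phrased differently.

```python
import re
from typing import Optional, Tuple

# R740xd backplane numbering as described in this post.
BACKPLANES = {
    "BP0": "rear flex bay backplane",
    "BP1": "primary backplane (front drive slots)",
    "BP2": "mid bay backplane",
}

# Assumes the wording of the iDRAC message in this post's title.
CABLE_ERROR_RE = re.compile(r"storage (BP\d+) SAS (\w+) cable", re.IGNORECASE)


def describe_cable_error(message: str) -> Optional[Tuple[str, str, str]]:
    """Return (backplane, port, description), or None if the message doesn't match."""
    match = CABLE_ERROR_RE.search(message)
    if match is None:
        return None
    backplane, port = match.group(1).upper(), match.group(2).upper()
    return backplane, port, BACKPLANES.get(backplane, "unknown backplane")


msg = "The storage BP2 SAS A0 cable is not connected, or is improperly connected."
print(describe_cable_error(msg))
# ('BP2', 'A0', 'mid bay backplane')
```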

    This could happen for a number of reasons. If you just installed the mid bay, you likely have the wrong cable installed or have it plugged into the wrong port on the backplane. The easiest first troubleshooting step is making sure the cable is plugged in properly. The proper cable has 2 SAS connectors on one end that plug into the mid bay, and a single SAS connector on the other end that routes along the side of the chassis and plugs into the A1 port on the primary backplane. Perhaps you have it plugged into the A2 port (12-bay model) or the B1 port (24-bay model).

    You also might have the wrong cable. The R740xd comes in 2 main models: the 12-bay LFF version and the 24-bay SFF version. The mid bay is physically the same hardware in both, but because their primary backplanes are designed differently, each server uses a different cable to connect the mid bay. You might have a cable designed for the 24-bay installed in a 12-bay server, or vice versa.

    Take a look at the cable and then look at the port you’re plugging it into on the backplane. The port numbers should match. For example, on an R740xd 12 bay the ports for the mid bay and rear flex bay are labeled A1 and A2.

    A cable labeled A1 should only go into a port also labeled A1

    The cable is labeled the same way, and a cable labeled anything else is not going to work. Cables for the 12-bay and 24-bay systems might fit in each other’s ports, but you’ll notice they go in at awkward angles. In the 24-bay chassis, a cable designed for the 12-bay will block the B1 port because of the connector’s angle, and it may also throw errors. You must source the correct cable.

    However, if you’ve determined that you’re using the correct cable, it’s time to look deeper. I recently built a server and got this error at boot time. I was positive I had the correct SAS cable connected to the mid bay. Most of the time I simply replace the entire mid bay, but this time I looked a little deeper and noticed the cable was physically damaged. Sometimes the cable is visibly damaged from being stuffed against and forced down into the cable channels.

    Replacing the cable resolved the issue. Other times the backplane itself has been faulty, and in that case the backplane, and arguably the entire mid bay, should be replaced. My logic is that if you’re going to order parts to fix the problem, you should order every part that might be needed, so you don’t waste time reordering if one part alone fails to fix it.

    I’ve also had some success updating the CPLD firmware on the system. This was a classic fix we discovered at work. Sometimes we’d have a problem with one of the backplanes, and swapping in all-new hardware didn’t fix the issue. In those cases, updating the firmware for the CPLD (complex programmable logic device) was the solution. This chip is involved in detecting whether cables are plugged in and which SAS lanes are active, so to speak. It has worked enough times that I’d say it’s worth a shot.

    So just to reiterate:

    • Ensure you have the right cable
    • Ensure the cable is plugged into the correct port on the backplane (A1)
    • Ensure the cable is not damaged

    If all of the above checks out and the issue still isn’t resolved, try updating the CPLD firmware. Failing that, it’s probably time to order a new mid bay cable, another backplane for the mid bay, or the entire backplane configuration with the correct cable from a trusted source.
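    The checklist amounts to an ordered decision: work down the list and stop at the first item that fails. A minimal Python sketch of that order follows. The function and its parameter names are hypothetical, and the A1 port reflects the mid bay cabling described earlier in this post.

```python
def next_troubleshooting_step(correct_cable: bool,
                              correct_port: bool,
                              cable_undamaged: bool,
                              cpld_updated: bool) -> str:
    """Return the first unmet item from the BP2 checklist, in the order given in this post."""
    if not correct_cable:
        return "Source the correct mid bay cable for your chassis (12-bay vs 24-bay)."
    if not correct_port:
        return "Plug the mid bay cable into the A1 port on the primary backplane."
    if not cable_undamaged:
        return "Replace the damaged mid bay cable."
    if not cpld_updated:
        return "Update the CPLD firmware."
    # Everything checks out and the error persists: replace hardware.
    return "Order a new mid bay cable, mid bay backplane, or the full assembly."


print(next_troubleshooting_step(True, True, True, False))
# Update the CPLD firmware.
```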

    As a final note, I’ve also seen this error pop up on servers that have no flex bay installed at all. In some cases updating the CPLD firmware resolved the issue. In others we considered the server failed, at least for a serious production environment.

  • MIKROTIK hAP ax S Wifi running extremely slow on Thinkpad T480 [SOLVED]

    I have a MikroTik hAP ax S router configured with mostly default settings. I noticed Wi-Fi was running extremely slowly on the default network. I first tested the network while plugged straight into the router and saw more or less what I expected: a 500 Mbps download speed from my carrier.

    Testing the Wi-Fi connection was a different story. I was only achieving a maximum download speed of 3 Mbps.

    I ran the following at my Debian terminal:

    06:43 PM-adam@adampc:~$ iw dev wlp3s0 link
    Connected to d0:ea:11:13:af:b7 (on wlp3s0)
           SSID: MikroTik-13AFB7
           freq: 2462
           RX: 16009177 bytes (20247 packets)
           TX: 32704312 bytes (30902 packets)
           signal: -41 dBm
           rx bitrate: 6.0 MBit/s
           tx bitrate: 300.0 MBit/s MCS 15 40MHz short GI

           bss flags:      short-preamble short-slot-time
           dtim period:    1
           beacon int:     100

    This revealed a healthy TX rate but an absolutely abhorrent RX rate.
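    If you want to check this without eyeballing the output, the Python sketch below parses `iw dev <iface> link` text like the sample above and classifies the band from the channel frequency. It assumes the field layout shown here. Newer iw releases may format some fields slightly differently (for example, `freq: 2462.0`), which the float parsing tolerates.

```python
import re


def parse_iw_link(text: str) -> dict:
    """Extract SSID, frequency, signal, and bitrates from `iw dev <iface> link` output."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SSID:"):
            info["ssid"] = line.split(":", 1)[1].strip()
        elif line.startswith("freq:"):
            info["freq_mhz"] = float(line.split(":", 1)[1])
        elif line.startswith("signal:"):
            info["signal_dbm"] = int(line.split()[1])
        elif line.startswith(("rx bitrate:", "tx bitrate:")):
            match = re.search(r"([\d.]+) MBit/s", line)
            if match:
                # Key becomes "rx_mbit" or "tx_mbit".
                info[line[:2] + "_mbit"] = float(match.group(1))
    return info


def band(freq_mhz: float) -> str:
    """Classify a channel centre frequency (MHz) as 2.4 GHz, 5 GHz, or other."""
    if 2400 <= freq_mhz < 2500:
        return "2.4 GHz"
    if 4900 <= freq_mhz < 5925:
        return "5 GHz"
    return "other"


# The output from this post, trimmed to the relevant lines.
SAMPLE = """Connected to d0:ea:11:13:af:b7 (on wlp3s0)
        SSID: MikroTik-13AFB7
        freq: 2462
        signal: -41 dBm
        rx bitrate: 6.0 MBit/s
        tx bitrate: 300.0 MBit/s MCS 15 40MHz short GI
"""

link = parse_iw_link(SAMPLE)
print(band(link["freq_mhz"]), link["rx_mbit"], link["tx_mbit"], link["signal_dbm"])
# 2.4 GHz 6.0 300.0 -41
```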

    After a lot of troubleshooting, I discovered that the 2.4 GHz and 5 GHz bands were sharing the same SSID.

    The immediate solution is to put the 2 bands on different SSIDs.

    wifi → interface → wifi1 lets you change the SSID of the 2.4 GHz network.

    Why this fixes the problem:

    When both the 2.4 GHz and 5 GHz radios share the same SSID, your laptop (and most Wi-Fi clients on Linux in particular) has to decide which band to attach to using its own internal logic. That decision is not always optimal.

    In my case, the ThinkPad T480 consistently chose the 2.4 GHz network, even though a faster 5 GHz network was available. This is a common behavior because:

    • The 2.4 GHz signal often appears “stronger” or more stable at first scan
    • Linux Wi-Fi clients tend to be “sticky” and do not aggressively roam to better bands
    • The access point’s band steering (if present) is not strong enough to override the client decision

    Once connected to 2.4 GHz, performance was severely limited due to:

    • Heavy local congestion from neighboring networks
    • Narrow channel bandwidth
    • Legacy Wi-Fi rate fallback behavior

    This resulted in the Wi-Fi link negotiating a 6 Mbps receive rate, even though signal strength was excellent. Because real-world throughput always lands well below the negotiated link rate, that alone explains the observed ~2–3 Mbps.


    The key insight

    The issue was not raw signal strength or ISP bandwidth. It was that:

    The device was correctly connected, but on the wrong frequency band.

    Because 2.4 GHz is shared, crowded, and prone to legacy rate fallback, even a “good signal” connection can perform extremely poorly.


    Why separating SSIDs works

    By splitting the network into:

    • MikroTik-2G
    • MikroTik-5G

    you remove ambiguity. The client is now forced to make an explicit choice rather than an automatic one.

    This has several effects:

    • The 5 GHz network becomes directly selectable and predictable
    • The client stops defaulting to 2.4 GHz “by accident”
    • Roaming behavior becomes deterministic instead of heuristic
    • The high-speed band is consistently used for throughput-heavy traffic

    Final outcome

    After separating the SSIDs and connecting directly to the 5 GHz network, performance immediately returned to expected levels, with significantly higher link rates and throughput aligned with the ISP connection.

    The fix confirmed that the issue was not hardware, drivers, or ISP limitations, but simply band selection behavior combined with shared SSIDs and suboptimal Wi-Fi steering.

  • Changing Controller Mode on DELL RAID Controllers

    Many DELL RAID controllers, like the H730, can operate in either RAID or HBA mode. To switch between the two, you need to change the advanced controller settings.

    To start, press F2 at boot to enter the BIOS (System Setup) and select Device Settings:

    Under devices select your controller. In this case I have an H730 mini installed:

    From here, select Controller Management:

    Scroll down until you see Advanced Controller Management and press Enter:

    Lastly, select your preferred mode. If you’re in RAID mode, you’ll see the option to switch to HBA mode. If you’re in HBA mode, you’ll see the option to switch to RAID mode.

  • Invalid file signature errors on HP G10 servers

    When attempting to update individual components like the BIOS, you may receive the following error:

    The file signature is invalid. Make sure you are using a valid, signed flash file and try again.

    In my case the iLO 5 firmware was at version 1.46. You can’t easily jump from a version that old to the latest one, because old versions of iLO cannot verify the signatures of newer BIOS/firmware packages. The solution is to stair-step the iLO firmware up to the latest release, after which all the other packages should install without trouble.

    You also can’t jump from iLO 1.46 straight to the latest version (3.18 at the time of this writing).

    I found the following upgrade path works.

    1.46 → 2.14 → 2.35 → 3.18

    You might also try the following but it fails for me sometimes depending on the server:

    1.46 → 2.35 → 3.18

    Certain versions of iLO introduced the ability to handle larger file sizes. Perhaps some servers have smaller BIOS packages, which would explain why this shorter path works for some servers and fails for others.
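    The stair-step idea fits in a tiny Python helper. Given the current iLO 5 version, it lists the remaining hops on the path that worked for me. This is a sketch, not an HPE tool. It also assumes a server sitting between two steps (say, 2.20) can go straight to the next step on the list, which I haven’t verified.

```python
# Stair-step path that worked for me on iLO 5, starting from 1.46.
ILO5_PATH = ("1.46", "2.14", "2.35", "3.18")


def version_key(version: str) -> tuple:
    """Turn "2.35" into (2, 35) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def remaining_steps(current: str, path=ILO5_PATH) -> list:
    """Return the versions still to flash, in order, starting above `current`.

    Assumes any version between two steps can take the next step directly.
    """
    current_key = version_key(current)
    return [v for v in path if version_key(v) > current_key]


print(remaining_steps("1.46"))
# ['2.14', '2.35', '3.18']
```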

    Once the iLO is fully updated it processes the BIOS upload correctly:

  • iDRAC 9 fails to update firmware at versions 3.21 and below

    If you’re attempting to use DELL’s online updater to update the iDRAC firmware you might notice it fails. This is especially common with iDRAC firmware at 3.21 and earlier.

    The main symptom is of course the failed job in the job queue after using the online or local updater:

    The solution is to perform a manual local update to version 3.30. From 3.30 you can then upgrade to the latest version.

    3.30.30.30 acts as a “bridge” release in the original iDRAC9 3.x train. Dell changed the firmware architecture and update prerequisites in later major versions, so systems on 3.21.x are too old to jump directly to current releases.

    Dell doesn’t always state this as “you must install 3.30 first” in every release note, but in practice and in Dell-supported upgrade paths, the progression is:

    3.21.x → 3.30.x → 4.x → 5.x → 6.x → 7.x

    Trying to skip the bridge versions can result in failed updates, Lifecycle Controller incompatibilities, or an iDRAC that needs recovery. Community members and Dell admins consistently report having to step through the major-version boundaries rather than jumping from 3.21 directly to 6.x or 7.x.

    Why 3.30 specifically?

    • It contains major changes to iDRAC and Lifecycle Controller that later releases expect to already be present.
    • Dell’s later upgrade chains are built assuming the system has crossed the 3.30 baseline first.
    • Dell’s own notes contain special handling and migration behavior that first appears at 3.30, including inventory format changes and other internal data structure updates.

    A common successful upgrade path reported for servers starting on 3.21.26.22 is:

    1. 3.21.26.22
    2. 3.30.30.30
    3. 5.10.50.00
    4. 6.00.02.00
    5. Current release

    with BIOS updates performed along the way as required.
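    For quick reference, here is a Python sketch that picks the next bridge release from the reported path above. It assumes that path (3.30.30.30 → 5.10.50.00 → 6.00.02.00) applies to your server, and it compares the four-part version numbers numerically. Once you’re past the last bridge, it returns None, meaning you can go straight to the current release.

```python
from typing import Optional

# Path reported for iDRAC9 servers starting at 3.21.26.22. After the last
# bridge, the next hop is whatever the current release is.
BRIDGE_RELEASES = ("3.30.30.30", "5.10.50.00", "6.00.02.00")


def version_key(version: str) -> tuple:
    """Turn "5.10.50.00" into (5, 10, 50, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def next_hop(current: str) -> Optional[str]:
    """Return the next bridge release to install, or None once past them all."""
    current_key = version_key(current)
    for release in BRIDGE_RELEASES:
        if version_key(release) > current_key:
            return release
    return None


print(next_hop("3.21.26.22"))
# 3.30.30.30
```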

  • A configuration change was requested to clear this computer’s TPM (DELL R640)

    The TPM chip in your server typically holds encryption and security-related data. If a request was made to clear the chip, you will see the following prompt:

    “A configuration change was requested to clear this computer’s TPM (Trusted Platform Module).”

    It then warns you that clearing the TPM will erase all encryption keys stored on the chip.

    You can choose to clear the TPM or reject the change. Your choice depends entirely on your setup.

    Before clearing the TPM, determine whether the server uses:

    • BitLocker (if running Windows)
    • LUKS or other disk encryption (if running Linux)
    • Virtualization security features that store secrets in the TPM

    If the server is not using TPM-backed encryption or security keys, selecting Yes to clear the TPM is generally safe.

    If the server is using BitLocker or other TPM-based encryption, make sure you have the recovery keys before clearing it. Otherwise, the operating system may require recovery information at the next boot.

    A few questions to answer first:

    1. What operating system is installed (Windows Server, VMware ESXi, Linux, etc.)?
    2. Did this prompt appear after a BIOS/iDRAC/firmware update?
    3. Is this a production server or a lab/test machine?

    The answers will help determine whether clearing the TPM is appropriate.

    Other common items stored in a TPM include:

    • Disk encryption keys (or key protectors), such as those used by BitLocker.
    • Platform integrity measurements, which help verify that the server booted with trusted firmware and software.
    • Machine certificates and private keys used for authentication, VPNs, or secure communications.
    • Secure Boot and attestation data used to prove the system’s identity and integrity.
    • Virtualization and security feature secrets, such as credentials used by virtualization-based security features.
    • User authentication material, such as Windows Hello-related keys on desktop systems.

    On a server like a Dell PowerEdge R640, the most important concern is usually whether:

    1. The operating system drive is encrypted and uses the TPM.
    2. Applications or management tools store certificates or cryptographic keys in the TPM.

    What happens if you clear it?

    Clearing the TPM:

    • Deletes the TPM’s stored keys and secrets.
    • Does not erase disks or operating system files.
    • Does not delete application data.
    • May require recovery keys or re-enrollment of security features that depended on those TPM keys.

    If this is a recently purchased server and you have no encrypted data on it, it is generally OK to clear the TPM.
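    The decision above comes down to a few yes/no questions. The Python sketch below encodes them. It is a hypothetical helper for thinking the choice through, not a Dell or TPM API. The inputs are things you have to determine yourself, for example whether BitLocker or LUKS is bound to the TPM.

```python
from typing import List, Tuple


def tpm_clear_checks(tpm_backed_encryption: bool,
                     have_recovery_keys: bool,
                     tpm_held_credentials: bool) -> Tuple[bool, List[str]]:
    """Return (ok_to_clear, notes) for the TPM clear prompt.

    All three inputs are answers you work out yourself; nothing here
    talks to the TPM.
    """
    # Clearing without recovery keys can lock you out of a TPM-encrypted drive.
    if tpm_backed_encryption and not have_recovery_keys:
        return False, ["Reject the change until you have the recovery keys."]

    notes = []
    if tpm_backed_encryption:
        notes.append("Keep the recovery keys handy; the OS may ask for them at next boot.")
    if tpm_held_credentials:
        notes.append("Plan to re-enroll certificates or keys that were stored in the TPM.")
    if not notes:
        notes.append("No TPM-backed secrets in use; clearing is generally safe.")
    return True, notes


# Example: a freshly purchased server with no encryption.
print(tpm_clear_checks(False, False, False))
# (True, ['No TPM-backed secrets in use; clearing is generally safe.'])
```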

  • A PCIe link training failure is observed in Embedded Network Device

    I observed the following error on a server I was troubleshooting. “A PCIe link training failure is observed in Embedded Network Device and the link is disabled.”

    I love errors like this because they tell you in no uncertain terms which hardware device is causing the problem. In this case, it’s the NDC, or network daughter card.

    Drilling into BIOS → Devices, I can see the problem. This is a 4-port card, but only 2 ports are showing up.

    The best-case scenario is that you reseat the card and the problem goes away. That’s not the case for me, so my next course of action is to replace the card and, if that doesn’t work, reseat the processors and inspect the pins for damage. Because the PCIe lanes run to the CPUs, don’t rule out a problem with a CPU or with the socket pins on the motherboard.

    Luckily, I just needed to replace the card. All ports are now good to go and the PCIe errors are resolved.