Illegal Opcode error on HP ProLiant servers

[Update]
As per Jason's comment, with a new iLO 4 update HP has apparently fixed an issue related to booting from SD cards. Whether this is the same issue is unclear, though, since the original KB article I linked to has not been updated.
[/Update]

Important note: The general symptom of such a Red Screen of Death described here is NOT specific to ESXi or booting from SD cards in general. It can happen with Windows, Linux or any other OS as well as other boot media such as normal disks/RAID arrays, if the server has a problem booting from this device (broken boot sector/partition/boot loader etc).

A couple of weeks ago I was updating a few HP Proliant DL360p Gen8 servers running ESXi on a local SD card with ESXi patches via VUM, so business as usual. Almost, because on one of the servers I ran into the following issue:
After rebooting the host, the BIOS POST completed fine and the ProLiant DL360p Gen8 server should have booted ESXi from its attached USB SD card, where ESXi was installed; instead it displayed this unsightly screen telling me something had gone very, very wrong:

[Screenshot: iLO remote console showing the Illegal OpCode red screen]

I reset the server several times via iLO, but the issue persisted and I had no idea what exactly had gone wrong here. I then decided to boot a Linux live image, which worked fine, narrowing the issue down to the OS installation (device) itself. I thought the updates had corrupted the installation, but that actually wasn't the case.
When attempting to mount the SD card USB drive from within the live Linux, I noticed it was actually completely absent from the system. The USB bus was still OK, but lsusb showed no SD card reader device in the system at all!
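
For anyone wanting to run the same check, a minimal sketch from the live Linux shell (the exact vendor/product string of the internal reader varies, so adjust the grep pattern):

  lsusb                                   # list all enumerated USB devices; the internal SD reader should appear here
  lsusb | grep -i -E 'card|mass storage'  # quick filter for card reader / mass storage devices
  dmesg | grep -i usb                     # kernel log shows whether a USB storage device was detected at all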

Just to make sure I wasn't imagining things, I booted an ESXi installation medium too, and likewise it didn't detect the local SD card, only the local RAID controller volume:

So the Illegal OpCode Red Screen of Death was probably the result of the server trying to force a boot from the local RAID array volume, which is a pure GPT VMFS5 volume without a proper boot partition.
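
If you want to double-check this from an ESXi shell, a rough sketch (the naa.* device name below is only a placeholder; list your own disks first):

  ls /vmfs/devices/disks/                                  # identify the RAID volume device name
  partedUtil getptbl /vmfs/devices/disks/naa.XXXXXXXXXXXX  # print its partition table (a GPT with only a VMFS partition and no boot partition, in this case)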

I first thought the SD card reader or SD card was faulty but after googling around for a while I stumbled upon this article:
HP Advisory: ProLiant DL380p Gen8 Server - Server May Fail to Boot From an SD Card or USB Device After Frequent Reboots While Virtual Media Is Mounted in the HP Integrated Lights-Out 4 (iLO 4) Integrated Remote Console (IRC)

DESCRIPTION
In rare instances, a ProLiant DL380p Gen8 server may fail to boot from an SD card or a USB device after frequent reboots while Virtual Media is mounted in the HP Integrated Lights-Out 4 (iLO 4) Integrated Remote Console (IRC).
This issue can occur if the server is rebooted approximately every five minutes. If this occurs, the following message will be displayed: Non-System disk or disk error-replace and strike any key when ready
SCOPE
Any HP ProLiant DL380p Gen8 server with HP Integrated Lights-Out 4 (iLO 4).
RESOLUTION
If a ProLiant DL380p Gen8 server fails to boot from an SD card or a USB device, cold boot the server to recover from this issue.

The article only mentions DL380p Gen8 servers, but I imagine the same could apply to DL360p Gen8 or other servers as well. The problem description doesn't really fit my case all that well either, but I tried cold booting the server as instructed. And this did the trick. After leaving the server powered off for about 5 minutes and powering it on again, it detected the SD card again and booted the ESXi installation on it fine.
For good measure I rebooted the server another time, which also went without a hitch.

The key takeaway here:
1. As per the mentioned HP Advisory, the USB SD card device of a Proliant 380/360 Gen8 server might randomly disappear during a reboot, so be aware of that and try cold booting the server in that case.
2. When dealing with an Illegal OpCode boot error on an HP ProLiant server like the one shown above, make sure you have a valid boot device and that the BIOS is properly configured to boot from this device.
On a physical Linux host, for example, the grub boot loader might be corrupted, which can easily be fixed by re-installing grub from a live Linux, as sketched below. I've had that happen to me with physical Linux servers before.
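
For reference, a minimal sketch of such a grub re-install from a live Linux, assuming the installed root file system is on /dev/sda1 and the boot disk is /dev/sda (adjust to your layout):

  mount /dev/sda1 /mnt                  # mount the installed root file system
  mount --bind /dev /mnt/dev            # expose devices inside the chroot
  mount --bind /proc /mnt/proc
  mount --bind /sys /mnt/sys
  chroot /mnt grub-install /dev/sda     # re-install grub to the MBR of the boot disk
  chroot /mnt update-grub               # regenerate the grub configuration (Debian/Ubuntu)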

I was running ESXi 5.5, but this issue can affect most DL360p and/or DL380p servers. My ESXi instance is on an SD card; the server randomly threw an error which resulted in a purple screen, I rebooted the server, and then this red error appeared. Every reboot it came back, even a cold reboot.

I had this issue happen to me today; every reboot, it comes back.

I tried:

  1. Cold boot.
  2. Turning it off for 5 minutes and turning it back on.
  3. Removing each RAM stick and booting, one at a time.
  4. Disabling iLO DHCP and TCP/IP.
  5. Disabling iLO altogether.

The only way I got it working past the red Illegal OpCode error was to change the Boot Order so that USB DriveKey was at the top. The C drive doesn't exist, the CD-ROM is empty, and I don't have a floppy drive.

I saw this for the first time and really thought that it would try 1, 2, 3, and then boot from 4, but it didn't. I had to move it to the top.

  1. HP ProLiant DL580 G5: Illegal Opcode on boot

    After a nightmarish 24 hours, I’ve successfully installed Ubuntu 10.04 x86-64 Server on an HP ProLiant DL580 G5 server with an added P800 Raid Controller device. I wanted to make a public record on the steps to finish the installation in case it helps anyone in the future.

    The issue we were running into was the message "Illegal Opcode" given after BIOS startup, before the OS could load, even after a successful OS installation. HP support confirmed that this message is given when the MBR on the boot controller does not refer to a valid bootable partition.

    First, our configuration (after several troubleshooting iterations — I’ll leave out those steps):

    HP ProLiant DL580 G5 with 32GB ECC-RAM, storage:
    * 2x 70GB SAS storage via p400 Raid Controller, configured as a 140GB RAID 0 device ("/dev/cciss/c1d0"), designated in the raid configuration manager (ORCA) as the Boot Controller
    * 24x 1TB SAS storage via p800 Raid Controller, configured as 2x 10TB RAID 6 devices ("/dev/cciss/c0d0" and "/dev/cciss/c0d1")

    In BIOS, the boot settings were:

    Boot order was set to CD, USB, Floppy, Hard Drive, Ethernet
    Hard Drive order was set to p400, IDE, p800 (note that there were no IDE drives, but for some reason the BIOS wasn’t allowing us to move that device in the order.)

    c1d0 was partitioned as:
    * primary #1 — «/boot» — 2GB — ext2
    * extended #5 — «/» — 100GB — ext4
    * extended #6 — swap — remainder (~38GB) — swap

    c0d0 and c0d1 were partitioned with lvm. Note that small chunks (~1MB) were left 'free' on either side of the 10TB lvm partitions; I assume this is a 'parted' or an lvm issue, but it did not affect final performance.

    These partitions were then linked in lvm as a single 20TB JFS partition mounted inside of the root file system. (JFS because e2fsprogs doesn’t handle creation of ext4 drives larger than 16TB… still… more than a year after listing this as a ‘top priority’.)

    Installation then proceeded as expected, but note that

    grub-install uses the wrong drive. Specifically, grub-install (as executed by the install script) was installing grub onto /dev/cciss/c0d0, I assume because it was detecting that drive as hd(0). Because the p400 was addressed as /dev/cciss/c1d0 and was also set as the boot controller, grub was written to the wrong drive, which explains the "Illegal Opcode" error on boot.

    The Fix:

    (First I should mention that right before fixing the issue, we also updated all the firmware on the server at HP Support's suggestion. I cannot rule out that this contributed to the success, although I personally feel it did not make the difference.)

    As the very last step, when the install script ejects the install CD and asks you to press enter to reboot,

    do not press enter, and instead press alt+F2 to go to the install CD console. This screen should say "press enter to use this console" or something like that. Press enter, and use the following commands:

    Code:

    # chroot ./target
    # grub-install /dev/cciss/c1d0

    A large volume of text will scroll across the screen, including a lot of what looks like bad errors; don't worry, this is just grub polling devices that don't exist. I think you can use something like "--no-floppy" to suppress those warnings, but don't worry about it. The last message should be "Installation successful" or something like that; that is your indication that the grub-install succeeded.

    Press alt+f1 to return to the install script and press enter to reboot the machine. Your ProLiant DL580 with p800 raid controller should now boot without an illegal opcode exception.


Hi.

I have recently decided to rebuild my server and wanted to disable the hardware RAID so I could use software RAID in an Ubuntu installation. So I disabled the embedded B110i controller in the PCI Device configuration menu.

The server rebooted and during POST I received the ‘Illegal OpCode error’ message with a red background. I’ve attached a screenshot that shows a similar message.

The following URL describes exactly what I have done and how to repair it; I need to find the system maintenance switch to reset the NVRAM and hopefully get back into the BIOS setup.

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c03481454&prodSeriesId=5075942

Can anyone tell me where on the motherboard the system maintenance switch is located, so that I can jumper position 6 to reset the NVRAM?

Thanks in advance

Phill

@jl-montes

We recently installed CoreOS 410 onto the hard disks of a few HP BL460c blade servers, and they were running/booting OK from disk.

Late last week, after recent reboots, the servers began stalling at a red screen of death with the message "Illegal Opcode" and what appears to be a CPU register dump.

We've tried blade server resets, warm boots, cold boots, etc., and reseating the blade servers, but they will not boot from disk anymore.

We can PXE boot or boot from virtual or rescue DVDs, but we cannot boot the CoreOS that was installed on the hard disk.

We ran HP diagnostic tools on the hardware, array controller, and hard disk, but everything checks out OK.

Any known issues with these kinds of blade servers or HP gear?

@marineam

Nothing known to us. Is this alpha or beta/stable? If it is alpha, it could have started after the upgrade to Linux 3.16. When did it start? Can you capture the crash output? If you have access to the first serial port on this hardware, a full dump of the log from it would be ideal (baud 115200). Otherwise an image of the VGA console would be a start.

@jl-montes

Attached is a screenshot of the Red Screen of death

[Screenshot: BL460c G7 Illegal Opcode register dump]

@marineam

Hm, could you tell how far the boot got before the system clobbered the console with that screen? Did it get past the bootloader to the kernel? Did colorful [ OK ] messages start showing up indicating that it got at least as far as the initrd?

@jl-montes

The boot process got past all the hardware POST and checks; the screen goes blank for a little while, then the red screen appears. I never see any of the typical kernel loading or OS bootup messages.
I don't think it's getting far enough to start the OS. Perhaps it's crashing at the point where the boot block is being accessed/read.

@marineam

Strange, 410 used syslinux as its bootloader so if it started you should see a boot: prompt for a half second or so. Updates don't touch the bootloader or the boot_kernel (which kexecs the real kernel) so I'm stumped why it would have initially worked but failed later. Has this only happened to a single host or is it reproducible?

The latest alpha versions have switched to grub as the bootloader so you may have better luck with installing alpha.

@jl-montes

It's reproducible on two systems so far. We may have to try an alpha version and see how that works over a span of several days.

@marineam

Darn, twice is too much to be written off as cosmic rays or gremlins. Would it be possible for you to capture a dump of the first 200 MB or so of the disk on one of the systems? I just need enough to see the partition table and the first partition.
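
For reference, one way to take such a dump from a rescue/live environment, assuming the boot disk shows up as /dev/sda and /tmp has enough free space:

  dd if=/dev/sda of=/tmp/sda-head.img bs=1M count=200   # copy the first 200 MB (partition table plus first partition)
  gzip /tmp/sda-head.img                                # compress the image before copying it off the host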

Also, are these systems BIOS or UEFI based? If UEFI do you know if it is defaulting to booting in legacy or UEFI mode?

@jl-montes

They are BIOS boot by default, and I do not believe they support UEFI; I would have to confirm with HP.
Let me collect the disk info for the first 250 MB of drive space and make it available.

@jl-montes

When booting from a rescue CD and running parted /dev/sda print, this is what we see:

Model: HP LOGICAL VOLUME (scsi)
Disk /dev/sda: 300GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start   End     Size    File system  Name        Flags
 1      1049kB  135MB   134MB   fat16        EFI-SYSTEM  boot, legacy_boot
 3      135MB   1209MB  1074MB  ext2         USR-A
 2      1209MB  1276MB  67.1MB               BOOT-B
 4      1276MB  2350MB  1074MB  ext2         USR-B
 7      2350MB  2417MB  67.1MB               OEM-CONFIG
 6      2417MB  2551MB  134MB   ext4         OEM
 9      2551MB  300GB   297GB                ROOT

@marineam

Great, thanks. I looked through the spec sheet and it didn't mention UEFI so I assume that means BIOS. Which I suppose is good in that I have a few ideas for how BIOS booting may break in the way you've seen it but no theories yet for UEFI. :)

@jl-montes

@marineam

I can see from the partition table that the system did successfully update, so the failure happened after booting a version newer than 410. I've been unable to identify anything wrong with the image you provided, and it successfully loads a kernel under QEMU.

I suppose the next step is to install one alpha version back (438) which will upgrade to the current version (440) and see if it survives.

Images available from:
http://alpha.release.core-os.net/amd64-usr/438.0.0/
OR:
coreos-install -C alpha -V 438.0.0 …..

@jl-montes

We PXE booted the latest alpha 440.0.0 release, then ran coreos-install, directing 438.0.0 to be the version installed to the hard disk. The install script indicated that CoreOS was installed successfully, but there were GPT warnings:

GPT: Primary header thinks Alt. headers is not at the end of disk
GPT: Alternate GPT header not at end of disk
GPT: Use GNU Parted to correct GPT errors

We ran parted /dev/sda print; parted indicated errors and offered to fix them. We let parted fix the errors, and the warnings went away on subsequent reads of the partition table.
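
For reference, an equivalent non-interactive way to relocate the backup GPT header to the end of the disk is sgdisk from the gdisk package (assuming /dev/sda is the target disk):

  sgdisk -e /dev/sda     # move the backup GPT data structures to the end of the disk
  partprobe /dev/sda     # ask the kernel to re-read the partition table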

We rebooted the systems to test a hard-disk boot, but we encountered the Illegal Opcode red screen of death again on both blade servers that had CoreOS re-installed.

We also tried PXE booting 440.0.0 and directing the coreos-installer to install 440.0.0 onto the hard disk; we got the same GPT partition table errors and fixed them, and we saw the same Illegal Opcode red screen of death on both blade servers when attempting to boot from the hard disk.

[Screenshots: coreos-install output for 438.0.0 and 440.0.0, and the parted GPT fix]

@marineam

The GPT errors are perfectly normal; they are the result of writing the small disk image to the larger physical disk, leaving the secondary GPT in the middle of the disk instead of at the end. CoreOS automatically fixes this on first boot and additionally resizes the ROOT partition and filesystem to match the disk.

So it looks like the issue lies in the kernel, not the bootloader, and perhaps the bootloader and other early messages are just going by too fast to be able to catch them in the console. I'll need to do some more digging to figure out what to poke at next. For now, if you want to get these systems up and running again, you can install 'stable', which is still at version 410. I want to get this sorted out before we promote a new version to stable.

@jl-montes

We've been watching the console screen pretty diligently; we don't even see the kernel attempt to load. We see the HP splash screens for the POST tests, then the PXE boot screen (default boot from hard disk in 10 seconds unless other menu options are selected), and it continues after 10 seconds or a carriage return.

At the moment PXE boot hands over control to boot from the local disk, a text message indicating this is displayed; then we see a blank screen, then the red screen of death (RSOD), with no kernel load attempts that we can see.

We tried installing CoreOS 410 stable, Ubuntu 14.04, and CentOS 6.5; all have the same issue. We'll go back to HP one more time with this newer information and level of detail.

@jl-montes

@marineam

Awesome, I’ll close this issue but feel free to update with further details if you get any. That will help us point anyone else dealing with this hardware in the right direction.
