ESXi disk error checking - TopOshibok.ru - solutions and fixes for all kinds of errors

Use the vmkfstools command to check or repair a virtual disk if it gets corrupted.

-x|--fix [check|repair]

For example,

vmkfstools -x check /vmfs/volumes/my_datastore/my_disk.vmdk
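To check more than one disk, the same command can be wrapped in a loop. The following is only a sketch: the function name `check_vmdks`, the `VMKFSTOOLS` override variable and the list of skipped file suffixes are assumptions made for illustration, not part of the VMware documentation.

```shell
# Sketch: run "vmkfstools -x check" on every virtual disk descriptor under a directory.
# VMKFSTOOLS is a hypothetical override, e.g. VMKFSTOOLS=echo for a dry run.
check_vmdks() {
    dir=$1
    tool=${VMKFSTOOLS:-vmkfstools}
    # Assumption: extent and auxiliary files (-flat, -delta, -ctk, -sesparse)
    # are skipped, so that only descriptor .vmdk files are passed to the tool.
    find "$dir" -type f -name '*.vmdk' \
        ! -name '*-flat.vmdk' ! -name '*-delta.vmdk' \
        ! -name '*-ctk.vmdk' ! -name '*-sesparse.vmdk' |
    while IFS= read -r vmdk; do
        echo "== $vmdk"
        # stdin is redirected so the tool cannot consume the file list
        "$tool" -x check "$vmdk" < /dev/null ||
            echo "check reported a problem: $vmdk" >&2
    done
}

# Example with the datastore path used above:
# check_vmdks /vmfs/volumes/my_datastore
```

Running it first with `VMKFSTOOLS=echo` prints the commands that would be executed without touching any disk.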

Source

You have probably come across many hard disk failures on your physical servers. When one occurs, you need to identify exactly which disk has failed. This can easily be checked using hardware management tools such as HP System Management or HP iLO, or even in the Hardware Status tab of the ESXi host in the vSphere Client. This post covers checking disk failure status from the ESXi host command line. I am going to focus on HP hardware and walk through, step by step, how to verify the disk status on an ESXi host using the HPSSACLI utility, which is part of the HP ESXi Utilities Offline bundle for VMware ESXi 5.x.

The HP ESXi Utilities Offline bundle for VMware ESXi 5.x is included in the HP customized ESXi installer image. If your host was not installed from an HP customized image, you need to download and install the bundle yourself. The ZIP file contains three utilities for remote online configuration of servers: HPONCFG, HPBOOTCFG and HPSSACLI.

HPONCFG — Command line utility used for obtaining and setting ProLiant iLO configurations.
HPBOOTCFG — Command line utility used for configuring ProLiant server boot order.
HPSSACLI – Command line utility used for configuration and diagnostics of ProLiant server SmartArrays.

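After downloading the HP ESXi Utilities Offline bundle for ESXi 5.x, you can install it using the command below. Because the bundle is a ZIP depot rather than a single VIB file, it is passed with -d instead of -v:

esxcli software vib install -f -d /tmp/hp-esxi5.5uX-bundle-1.7-13.zip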

You can also download the HPSSACLI utility on its own, upload the VIB file to your ESXi host and run the command below to install it.

esxcli software vib install -f -v /tmp/hpssacli-1.60.17.0-5.5.0.vib

Once it is installed, browse to the directory /opt/hp/hpssacli/bin and verify the installation.

Check the Disk Failure Status:

Run the command below to check the status of the disks in your ESXi host. It displays the status of the disks in all arrays under the controller.

/opt/hp/hpssacli/bin/hpssacli controller slot=0 physicaldrive all show
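On a host with many drives it can help to filter that listing down to the drives that are not healthy. The sketch below assumes each drive is printed on one line of the form `physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, Failed)`, with the status as the last field; the function name `non_ok_drives` is made up for this example.

```shell
# Reads "physicaldrive all show" output on stdin and prints only the drives
# whose status field is not OK.
# Assumed line format (status is the last item inside the parentheses):
#   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, Failed)
non_ok_drives() {
    grep 'physicaldrive' | grep -v ', OK)'
}

# Usage on the host:
# /opt/hp/hpssacli/bin/hpssacli controller slot=0 physicaldrive all show | non_ok_drives
```

The function prints nothing and returns a non-zero status when every listed drive reports OK.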

That's it. Once the failed disk is identified, you may need to generate the HP ADU (Array Diagnostics Utility) report to raise a support case with the hardware vendor. Please refer to my blog post “How to Generate HP ADU Disk Report in ESXi host” for a step-by-step guide to generating the ADU report from the ESXi host command line. I hope this is informative for you. Thanks for reading!

Source

17 Replies

Is this on a SAN/NAS or a local disk on the ESXi Server?

Jaguar


What’s running this ESXi Host?

ACTS360 is an IT service provider.


This is on a local RAID array on the ESXi server.

ACTS360 is an IT service provider.


ESXi is running on a dell power edge 2950 server.

Replicate your data and replace your array. As far as I know, VMFS does its own housekeeping and there is no way to force a disk check at the VMFS level. An NTFS chkdsk will only be so effective because, as you said, it’s sitting on top of VMFS.

Whenever you suspect a bad block on disk in a production environment, it’s always better to replace first and ask questions later.

And I would also advise to stay away from RAID 5 if that is what you are using currently:

RAID 5 vs RAID 10

ACTS360 is an IT service provider.


Wow….that seems very drastic…can I just replace a drive? Is there a way to tell which drive in the array has the bad block?

Is the hardware under any type of warranty? If so, you can probably get it replaced on that error by talking to a support person. I’ve done it — as far as seeing which one is bad, you will need to go into the RAID controller software.

ACTS360 is an IT service provider.


Yes, the server is under warranty. Ok, I’ll see if Dell will replace the drive. Thanks

Wow….that seems very drastic…can I just replace a drive? Is there a way to tell which drive in the array has the bad block?

Doesn’t the server tell you where the error is when it tells you that there is an error?

Scott does make a point. Can you not see from the health status in vCenter which disk? You still may need the OMSA to rebuild your array and it’s nice to have available.

Jaguar


Scott Alan Miller wrote:

Josh@Acts360 wrote:

Wow….that seems very drastic…can I just replace a drive? Is there a way to tell which drive in the array has the bad block?

Doesn’t the server tell you where the error is when it tells you that there is an error?

ESXi should be able to tell you (though I’ve got the “Dell customized” version of ESXi installed; you can get it off VMware’s site).

Yeah, Jaguar nailed it. You should, at minimum, be able to see the state of your storage in vCenter. At that point you can identify the drive. Having OMSA on your host just makes it easier to perform some of your functions, like storage configurations and changes, without having to reboot and go through the bios to get to it.

ACTS360 is an IT service provider.


No vCenter; all I have is the free stuff. I can see some information listed. It actually shows that there is no problem with the system… I also have an iSCSI device connected, so maybe that device is the one throwing the errors.

Oh, looks like that is it… I just looked at the event logs and I see the hard disk is disk 1, which points to the iSCSI disk. I updated the firmware on this device (which is a Synology DiskStation 1010+) and this fixed the issue with this device.

Log Name:      System
Source:        disk
Date:          9/17/2010 9:11:31 AM
Event ID:      51
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      FBLDC.fbdomain.local
Description:
An error was detected on device \Device\Harddisk1\DR4 during a paging operation.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="disk" />
    <EventID Qualifiers="32772">51</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2010-09-17T13:11:31.421Z" />
    <EventRecordID>177451</EventRecordID>
    <Channel>System</Channel>
    <Computer>FBLDC.fbdomain.local</Computer>
    <Security />
</System>
<EventData>
    <Data>\Device\Harddisk1\DR4</Data>
    <Binary>030080000100000000000000330004802D0100000E0000C0000000000000000000000000000000006262170000000000FFFFFFFF010000005800002100000000BB20101242032040001000003C0000000000000000000000789BF70C80FAFFFF0000000000000000909B010A80FAFFFF0000000000000000E807640000000000880000000000006407E8000000080000000000000000000000000000000000000000000000000000</Binary>
</EventData>
</Event>

ACTS360 is an IT service provider.


Turns out that if I had just looked at the event log more closely I would have noticed that the drive the event referred to was my iSCSI disk, which was offline…

Jaguar


should have said it was an iSCSI

Glad you got it fixed.


Source

The procedure below documents the commands necessary to run a check of the system partitions of ESXi. The below image shows the output of fdisk -l and the partitions which will be checked are circled. The 2 partitions consisting of 49136 blocks are the Hypervisor1 and Hypervisor2 partitions. These are mounted by ESXi as /bootbank and /altbootbank and store the firmware which ESXi boots with. A system backup file state.tgz (local.tgz for ESXi Embedded) is also stored on these partitions.

ESXi reads /bootbank when booting and then backs up its configuration once per hour. The last partition, consisting of 552944 blocks, is Hypervisor3 and is mounted as /store by ESXi. This partition is used to store items like download files for the VI client, VMware Tools ISOs for VMs, and configuration and system files for the vCenter Server agent and the HA agent.

The last circled partition exists only with ESXi Installable. This partition is mounted as /scratch and is where ESXi places the userworld swap file. It corresponds to the location set by the advanced setting ScratchConfig.ConfiguredScratchLocation.

While not necessary for this procedure, you can use the commands esxcfg-vmhbadevs and ls to link the partitions shown by fdisk to the mounts ESXi has made to determine which partition is /altbootbank and which is /bootbank.

~ # esxcfg-vmhbadevs -f

[2009-03-19 01:19:03 'StorageInfo' warning] Skipping dir: /vmfs/volumes/0451af74-f19fbb7e-e274-97e1e6858ec4. Cannot open volume: /vmfs/volumes/0451af74-f19fbb7e-e274-97e1e6858ec4
vmhba1:0:0:8 /vmfs/devices/disks/vmhba1:0:0:8 e0a264ee-3bc421b8-cdd5-3a5cb7c2a09f
vmhba1:0:0:2 /vmfs/devices/disks/vmhba1:0:0:2 488fb202-34873070-edd2-00096b63ac0a
vmhba1:0:0:5 /vmfs/devices/disks/vmhba1:0:0:5 9820ef76-fed75a33-f596-a0e3aa642c3a
~ # ls -l | grep vmfs

l--------- 0 root root 1984 Jan 1 1970 altbootbank -> /vmfs/volumes/0451af74-f19fbb7e-e274-97e1e6858ec4
l--------- 0 root root 1984 Jan 1 1970 bootbank -> /vmfs/volumes/9820ef76-fed75a33-f596-a0e3aa642c3a
l--------- 0 root root 1984 Jan 1 1970 scratch -> /vmfs/volumes/488fb202-34873070-edd2-00096b63ac0a
l--------- 0 root root 1984 Jan 1 1970 store -> /vmfs/volumes/e0a264ee-3bc421b8-cdd5-3a5cb7c2a09f
drwxr-xr-x 1 root root 512 Jan 9 02:35 vmfs
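Matching the volume IDs in the two listings by eye is error-prone, so the join can be sketched with awk. This assumes both outputs were saved to files and have the column layout shown above (device path in column 2 and volume ID in column 3 of the esxcfg-vmhbadevs output; `name -> target` at the end of each symlink line from ls). The function name `map_mounts` and the temporary file names are hypothetical.

```shell
# Sketch: print "<mount name> <device path>" for every symlink whose target
# volume ID also appears in the esxcfg-vmhbadevs listing.
#   $1 = file holding the output of: esxcfg-vmhbadevs -f
#   $2 = file holding the output of: ls -l /
map_mounts() {
    awk '
        # First file: remember device path by volume ID, ignoring warning lines
        NR == FNR {
            if ($3 != "" && $2 ~ /^\/vmfs\/devices\//) dev[$3] = $2
            next
        }
        # Second file: symlink lines end in "name -> /vmfs/volumes/<id>"
        NF >= 3 && $(NF-1) == "->" {
            n = split($NF, parts, "/")
            id = parts[n]
            if (id in dev) print $(NF-2), dev[id]
        }
    ' "$1" "$2"
}

# Usage on the host (file names are examples only):
# esxcfg-vmhbadevs -f > /tmp/devs.txt
# ls -l / > /tmp/root.txt
# map_mounts /tmp/devs.txt /tmp/root.txt
```

A mount whose volume could not be opened (altbootbank in the listing above) has no matching device line and is simply left out of the result.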

Once you have identified the partitions to check, you can use the dosfsck command to check a partition. The command has a number of options, but you must at least specify the disk to check. The first example also includes the -v option, which provides verbose output. The -a option will automatically try to correct any issues.

dosfsck -v /dev/disks/vmhba1:0:0:5

dosfsck 2.11 (12 Mar 2005)
dosfsck 2.11, 12 Mar 2005, FAT32, LFN
Checking we can access the last sector of the filesystem
Boot sector contents:
System ID "mkdosfs"
Media byte 0xf8 (hard disk)
512 bytes per logical sector
1024 bytes per cluster
2 reserved sectors
First FAT starts at byte 1024 (sector 2)
2 FATs, 16 bit entries
98304 bytes per FAT (= 192 sectors)
Root directory starts at byte 197632 (sector 386)
512 root directory entries
Data area starts at byte 214016 (sector 418)
48927 data clusters (50101248 bytes)
32 sectors/track, 64 heads
0 hidden sectors
98272 sectors total
Checking for unused clusters.
/dev/disks/vmhba1:0:0:5: 10 files, 37485/48927 clusters

You can also use the -V option to run a verification pass of a partition or the -t option to test for bad sectors (this also requires the -a (automatically repair) or -r (interactively repair) options).

dosfsck -t -r /dev/disks/vmhba1:0:0:2

dosfsck 2.11, 12 Mar 2005, FAT32, LFN
Seek to 2147491840:Success

dosfsck -V /dev/disks/vmhba1:0:0:2

dosfsck 2.11, 12 Mar 2005, FAT32, LFN
Starting check/repair pass.
Starting verification pass.
/dev/disks/vmhba1:0:0:2: 8 files, 16390/65515 clusters

All the options for the command dosfsck are shown below.

dosfsck
usage: dosfsck [-aAflrtvVwy] [-d path -d ...] [-u path -u ...]
device
-a automatically repair the file system
-A toggle Atari file system format
-d path drop that file
-f salvage unused chains to files
-l list path names
-n no-op, check non-interactively without changing
-r interactively repair the file system
-t test for bad clusters
-u path try to undelete that (non-directory) file
-v verbose mode
-V perform a verification pass
-w write changes to disk immediately
-y same as -a, for compat with other *fsck
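Before attempting any repair, it may be worth running a read-only pass over all the system partitions using the -n option from the list above. The following is a minimal sketch; the function name `check_fat_partitions` and the `DOSFSCK` override variable are assumptions introduced for the example.

```shell
# Sketch: run a non-modifying dosfsck pass (-n) over each partition given.
# Returns 1 if dosfsck exits non-zero for any of them, 0 otherwise.
# DOSFSCK is a hypothetical override for the tool name.
check_fat_partitions() {
    tool=${DOSFSCK:-dosfsck}
    status=0
    for dev in "$@"; do
        echo "== $dev"
        "$tool" -n "$dev" || status=1
    done
    return $status
}

# Example with the partitions from the listing above:
# check_fat_partitions /dev/disks/vmhba1:0:0:5 /dev/disks/vmhba1:0:0:2
```

Only if this pass reports a problem would you then rerun dosfsck on the affected partition with -r or -a.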

Source

The system includes the esxcli command

List the hard disks

esxcli storage core device list

View the SMART information for a hard disk

esxcli storage core device smart get -d <disk>
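The two commands can be combined to query every device in one go. This sketch assumes the device list prints each device identifier on an unindented line followed by indented attribute lines; the function names `device_ids` and `smart_all` are made up for illustration.

```shell
# Extracts device identifiers from "esxcli storage core device list" output
# read on stdin. Assumption: identifiers start in column 1 and the attribute
# lines below them are indented with spaces.
device_ids() {
    awk '/^[^ ]/ { print $1 }'
}

# Prints the SMART data for every listed device.
smart_all() {
    esxcli storage core device list | device_ids |
    while IFS= read -r id; do
        echo "== $id"
        esxcli storage core device smart get -d "$id" < /dev/null
    done
}

# Usage on the host:
# smart_all
```

Devices that do not expose SMART data will most likely produce an error or empty values for that entry; the loop simply continues with the next device.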

Install the community-supported smartctl

First download the software:
http://pfoo.unscdf.org/esxi/smartctl-6.6-4433.x86_64.vib
Upload it to the ESXi host with WinSCP.
Set the acceptance level to allow community-supported software.
Install the software.
Use the smartctl command:

/opt/smartmontools/smartctl -d sat -a /vmfs/devices/disks/<disk>

The disk name can be obtained with the device list command above.

You can also do all of this with commands over SSH:

cd /vmfs/volumes/datastore1
wget http://pfoo.unscdf.org/esxi/smartctl-6.6-4433.x86_64.vib
esxcli software acceptance set --level=CommunitySupported
esxcli software vib install -v /vmfs/volumes/datastore1/smartctl-6.6-4433.x86_64.vib

Partial reference: https://wiki.csnu.org/index.php/ESXi_smart_/_smartctl

Source
