Checking btrfs for errors

SYNOPSIS

btrfs check [options] <device>

DESCRIPTION

The filesystem checker is used to verify structural integrity of a filesystem
and attempt to repair it if requested. It is recommended to unmount the
filesystem prior to running the check, but it is possible to start checking a
mounted filesystem (see --force).

By default, btrfs check will not modify the device, but you can reaffirm that
with the option --readonly.

btrfsck is an alias of the btrfs check command and is now deprecated.
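
For example, a plain read-only check of an unmounted filesystem might look like this (the device name is only illustrative):

# btrfs check /dev/sdb1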

Warning

Do not use --repair unless you are advised to do so by a developer
or an experienced user, and then only after having accepted that no fsck
can successfully repair all types of filesystem corruption. For example, some
other software or hardware bugs can fatally damage a volume.

The structural integrity check verifies if internal filesystem objects or
data structures satisfy the constraints, point to the right objects or are
correctly connected together.

There are several cross checks that can detect wrong reference counts of shared
extents, backreferences, missing extents of inodes, directory and inode
connectivity etc.

The amount of memory required can be high, depending on the size of the
filesystem, and the same applies to the run time. The operation mode (see
--mode) can also affect both.

SAFE OR ADVISORY OPTIONS

-b|--backup

use the first valid set of backup roots stored in the superblock

This can be combined with --super if some of the superblocks are damaged.
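
A hypothetical invocation combining the two options (the device name and the superblock copy number are only illustrative):

# btrfs check --super 1 --backup /dev/sdb1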

--check-data-csum

verify checksums of data blocks

This expects that the filesystem is otherwise OK, and is basically an offline
scrub that does not repair data from spare copies.
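
For example (the device name is illustrative):

# btrfs check --check-data-csum /dev/sdb1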

--chunk-root <bytenr>

use the given offset bytenr for the chunk tree root

-E|--subvol-extents <subvolid>

show extent state for the given subvolume

-p|--progress

indicate progress at various checking phases

-Q|--qgroup-report

verify qgroup accounting and compare against filesystem accounting

-r|--tree-root <bytenr>

use the given offset bytenr for the tree root
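
As a sketch (the device name and the bytenr value are placeholders; take the real numbers from your own superblock), alternative tree root and chunk root locations can be read from the superblock backups with btrfs inspect-internal dump-super and then passed to the checker:

# btrfs inspect-internal dump-super -f /dev/sdb1 | grep -E 'backup_tree_root|backup_chunk_root'
# btrfs check --tree-root 270041088 /dev/sdb1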

--readonly

(default)
run in read-only mode; this option exists to calm potential panic when users
are going to run the checker

-s|--super <N>

use Nth superblock copy, valid values are 0, 1 or 2 if the
respective superblock offset is within the device size

This can be used as a different starting point if the primary superblock is
damaged.

--clear-space-cache v1|v2

completely remove the free space cache of the given version

For free space cache v1, the clear_cache kernel mount option only rebuilds the
cache for block groups that are modified while the filesystem is mounted with
that option, so this option makes it possible to actually clear the entire v1
cache. For free space cache v2, the clear_cache mount option destroys the
entire cache; this option provides an alternative way of clearing it that does
not require mounting the filesystem.

See also the clear_cache mount option.
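
For example, clearing the v1 cache of an unmounted filesystem (device name illustrative):

# btrfs check --clear-space-cache v1 /dev/sdb1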

--clear-ino-cache

remove leftover items pertaining to the deprecated inode map feature

DANGEROUS OPTIONS

--repair

enable the repair mode and attempt to fix problems where possible

Note

There's a warning and a 10 second delay when this option is run without
--force, to give users a chance to think twice before running repair; the
warnings in the documentation have shown to be insufficient.
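
A hypothetical repair run on an unmounted device that has already been backed up (the delay described above applies unless --force is also given; device name illustrative):

# btrfs check --repair /dev/sdb1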

--init-csum-tree

create a new checksum tree and recalculate checksums in all files

Warning

Do not blindly use this option to fix checksum mismatch problems.

--init-extent-tree

build the extent tree from scratch

Warning

Do not use unless you know what you’re doing.

--mode <MODE>

select mode of operation regarding memory and IO

The MODE can be one of:

original

The metadata are read into memory and verified, thus the requirements are high
on large filesystems and can even lead to out-of-memory conditions. The
possible workaround is to export the block device over network to a machine
with enough memory.

lowmem

This mode is supposed to address the high memory consumption at the cost of
increased IO when it needs to re-read blocks. This may increase run time.

Note

lowmem mode does not work with --repair yet, and is still considered
experimental.
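
For example, running the check in the low-memory mode (device name illustrative):

# btrfs check --mode lowmem /dev/sdb1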

--force

allow work on a mounted filesystem and skip mount checks. Note that
this should work fine on a quiescent or read-only mounted filesystem
but may crash if the device is changed externally, e.g. by the kernel
module.

Note

It is possible to run with --repair on a mounted filesystem, but that
will most likely lead to corruption unless the filesystem is in a
quiescent state, which may not be possible to guarantee.

This option also skips the delay and warning in the repair mode (see
--repair).
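
For example, a forced read-only check of a filesystem that is currently mounted read-only (device name illustrative):

# btrfs check --force --readonly /dev/sda2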

EXIT STATUS

btrfs check returns a zero exit status if it succeeds. Non-zero is
returned in case of failure.
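
In a script the exit status can be used directly, e.g. (device name illustrative):

btrfs check /dev/sdb1 || echo "btrfs check reported problems on /dev/sdb1"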

AVAILABILITY

btrfs is part of btrfs-progs. Please refer to the documentation at
https://btrfs.readthedocs.io.

SEE ALSO

mkfs.btrfs(8),
btrfs-scrub(8),
btrfs-rescue(8)


Below is a copy of a good description of the procedure for repairing a BTRFS filesystem, in English.

The below are the steps I would recommend for ANY btrfs issue; smart
people reading dmesg or syslog can probably figure out which of these
steps they'd need to skip to in order to fix their particular problem.

Step 1 - boot to a suitable alternative system, such as a different
installation of openSUSE, a liveCD, or an openSUSE installation DVD.
The installation DVD for the version of openSUSE you are running is
usually the best choice as it will certainly use the same kernel/btrfs
version.
Step 2 - Go to a suitable console and make sure you do the below as root.
Step 3 - Try to mount your partition to /mnt, just to confirm it's
really broken (e.g. "mount /dev/sda1 /mnt").
Step 4 - If it mounts, are you sure it's broken? If yes, run "btrfs
scrub start /mnt" to scrub the system, and "btrfs scrub status /mnt"
to monitor it.
Step 5 - If it doesn't mount, try to scrub the device just in case it
works (e.g. "btrfs scrub start /dev/sda1" and "btrfs scrub status
/dev/sda1" to monitor). Try mounting again; if it works, you're fixed.
Step 6 - If scrubbing is not an option or does not resolve the issue,
then try "mount -o usebackuproot" instead (e.g. "mount -o usebackuproot
/dev/sda1 /mnt").

==Interlude==
All of the above steps are considered safe and should make no
destructive changes to disk, and they have fixed every filesystem issue
I've had on btrfs in the last 5 years.
Full disk issues need a different approach (basically, delete stuff
;)), documented here:
https://www.suse.com/documentation/sles-12/stor_admin/data/sect_filesystems_trouble.html

If the above doesn't fix things for you, you can continue with the
below steps, but the situation is serious enough to justify a bug
report, please!
==

Step 7 - Run "btrfs check <device>" (e.g. "btrfs check /dev/sda1").
This isn't going to help, but save the log somewhere; it will be
useful for the bug report.
Step 8 - Seriously consider running "btrfs restore <device> <target>"
(e.g. "btrfs restore /dev/sda1 /mnt/usbdrive"). This won't fix
anything, but it will scan the filesystem and recover everything it
can to the mounted target. This is especially useful if your btrfs
issues are actually caused by failing hardware and not a btrfs fault.
Step 9 - Run "btrfs rescue super-recover <device>" (e.g. "btrfs rescue
super-recover /dev/sda1"). Then try to mount the device normally. If
it works, stop here.
Step 10 - Run "btrfs rescue zero-log <device>" (e.g. "btrfs rescue
zero-log /dev/sda1"). Then try to mount the device normally. If it
works, stop here.
Step 11 - Run "btrfs rescue chunk-recover <device>" (e.g. "btrfs rescue
chunk-recover /dev/sda1"). This will take a LONG while. Then try to
mount the device normally. If it works, stop here.
Step 12 - Don't just consider it this time, don't be an idiot, run
"btrfs restore <device> <target>" (e.g. "btrfs restore
/dev/sda1 /mnt/usbdrive").
Step 13 - Seriously, don't be an idiot, use btrfs restore.

==Danger zone==
The above tools have a small chance of making unwelcome changes, but
now you're in seriously suicidal territory; do not do the
following unless you're prepared to accept the consequences of your
choice.
==

Step 14 - Now, ONLY NOW, try btrfsck aka "btrfs check --repair
<device>" (e.g. "btrfs check --repair /dev/sda1").

There, I'm very confident the above will help you, Andrei, and for
everyone else, can we please bury the nonsense that btrfs is lacking
when it comes to repair and recovery tools?

You have scrub, which is safe for day-to-day use, the perfectly safe
usebackuproot mount option, and the various "btrfs rescue" commands,
which are only moderately worrying compared to the practical Russian
roulette which is "btrfs check".

Source

In addition to the regular logging system, BTRFS does have a stats command, which keeps track of errors (including read, write and corruption/checksum errors) per drive:

# btrfs device stats /
[/dev/mapper/luks-123].write_io_errs   0
[/dev/mapper/luks-123].read_io_errs    0
[/dev/mapper/luks-123].flush_io_errs   0
[/dev/mapper/luks-123].corruption_errs 0
[/dev/mapper/luks-123].generation_errs 0

So you could create a simple root cronjob:

MAILTO=[email protected]
@hourly /sbin/btrfs device stats /data | grep -vE ' 0$'

This will check for positive error counts every hour and send you an email. Obviously, you would test such a scenario (for example by causing corruption or removing the grep) to verify that the email notification works.

In addition, with advanced filesystems like BTRFS (that have checksumming) it’s often recommended to schedule a scrub every couple of weeks to detect silent corruption caused by a bad drive.

@monthly /sbin/btrfs scrub start -Bq /data

The -B option will keep the scrub in the foreground, so that you will see the results in the email cron sends you. Otherwise, it’ll run in the background and you would have to remember to check the results manually as they would not be in the email.

Update: Improved grep as suggested by Michael Kjörling, thanks.

Update 2:
Additional notes on scrubbing vs. regular read operations (this does not apply only to BTRFS):
As pointed out by Ioan, a scrub can take many hours, depending on the size and type of the array (and other factors), even more than a day in some cases. And it is an active scan, it won't detect future errors; the goal of a scrub is to find and fix errors on your drives at that point in time. But as with other RAID systems, it is recommended to schedule periodic scrubs.

It's true that a typical I/O operation, like reading a file, does check if the data that was read is actually correct. But consider a simple mirror: if the first copy of the file is damaged, maybe by a drive that's about to die, but the second copy, which is correct, is actually read by BTRFS, then BTRFS won't know that there is corruption on one of the drives. This is simply because the requested data has been received, it matches the checksum BTRFS has stored for this file, so there's no need for BTRFS to read the other copy. This means that even if you specifically read a file that you know is corrupted on one drive, there is no guarantee that the corruption will be detected by this read operation.
Now, let’s assume that BTRFS only ever reads from the good drive, no scrub is run that would detect the damage on the bad drive, and then the good drive goes bad as well — the result would be data loss (at least BTRFS would know which files are still correct and will still allow you to read those). Of course, this is a simplified example; in reality, BTRFS won’t always read from one drive and ignore the other.
But the point is that periodic scrubs are important because they will find (and fix) errors that regular read operations won’t necessarily detect.

Faulted drives: Since this question is quite popular, I'd like to point out that this "monitoring solution" is for detecting problems with possibly bad drives (e.g., a dying drive causing errors but still accessible).

On the other hand, if a drive is suddenly gone (disconnected or completely dead rather than dying and producing errors), it would be a faulted drive (ZFS would mark such a drive as FAULTED). Unfortunately, BTRFS may not realize that a drive is gone while the filesystem is mounted, as pointed out in this mailing list entry from 09/2015 (it’s possible that this has been patched):

The difference is that we have code to detect a device not being present at mount, we don’t have code (yet) to detect it dropping on a mounted filesystem. Why having proper detection for a device disappearing does not appear to be a priority, I have no idea, but that is a separate issue from mount behavior.

https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg46598.html

There’d be tons of error messages in dmesg by that time, so grepping dmesg might not be reliable.
For a server using BTRFS, it might be an idea to have a custom check (cron job) that sends an alert if at least one of the drives in the RAID array is gone, i.e., not accessible anymore…
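
As a rough sketch of such a check (the device paths are placeholders for the members of your array, and cron's MAILTO mechanism is assumed to deliver any output as mail), one simple approach is to verify that every expected device node still exists:

@hourly for d in /dev/sda1 /dev/sdb1; do [ -b "$d" ] || echo "btrfs member $d is not accessible"; done

Alternatively, the output of btrfs filesystem show can be checked for missing devices, but verify how your version reports a lost device before relying on that.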

