
  • #1

I have a Dell R710 which I am running Proxmox 6.2 on. The host has a mix of some local storage and other storage presented via iSCSI/NFS. The local storage consists of 2x 500 GB drives in RAID 1 for the Proxmox OS, and 2x 500 GB SSDs in RAID 1.

Most of the VMs run on the iSCSI share, while I have a few running on the local SSD storage.

I have been running this in my homelab for a while now, but all of a sudden I have started to have an issue I am not sure how to track down. The host can be online for about a week, then suddenly several of my VMs go offline.
When this happens I cannot even access their consoles in Proxmox. They usually time out, and I believe the error says "error waiting on systemd". What the affected VMs have in common is that they are all on the SSD storage, but not all VMs on this storage are affected. For example, I also have pfSense running on these SSDs,
and it continues to work, and console access to it works as well.

It is also interesting that when this happens the I/O delay shoots up to about 8% and stays there.
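As a general diagnostic (not from the original post), when the I/O delay climbs like this it can help to see which block device is actually stalling; iostat from the sysstat package is the usual tool:

Code:

# extended device stats, refreshed every 2 seconds
# (install with: apt install sysstat)
iostat -x 2
# a device sitting near 100 %util with a large await value
# is the one requests are queuing behind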

I found this error in the syslog file, but I'm not sure what it means or what next steps I should take with it.

Jun 18 03:10:26 compute1 kernel: [623467.883697] INFO: task kvm:2361 blocked for more than 241 seconds.
Jun 18 03:10:26 compute1 kernel: [623467.883732] Tainted: P IOE 5.4.34-1-pve #1
Jun 18 03:10:26 compute1 kernel: [623467.883752] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 18 03:10:26 compute1 kernel: [623467.883777] kvm D 0 2361 1 0x00000000
Jun 18 03:10:26 compute1 kernel: [623467.883779] Call Trace:
Jun 18 03:10:26 compute1 kernel: [623467.883787] __schedule+0x2e6/0x700
Jun 18 03:10:26 compute1 kernel: [623467.883789] schedule+0x33/0xa0
Jun 18 03:10:26 compute1 kernel: [623467.883790] schedule_preempt_disabled+0xe/0x10
Jun 18 03:10:26 compute1 kernel: [623467.883792] __mutex_lock.isra.10+0x2c9/0x4c0
Jun 18 03:10:26 compute1 kernel: [623467.883823] ? kvm_arch_vcpu_put+0xe2/0x170 [kvm]
Jun 18 03:10:26 compute1 kernel: [623467.883825] __mutex_lock_slowpath+0x13/0x20
Jun 18 03:10:26 compute1 kernel: [623467.883826] mutex_lock+0x2c/0x30
Jun 18 03:10:26 compute1 kernel: [623467.883828] sr_block_ioctl+0x43/0xd0
Jun 18 03:10:26 compute1 kernel: [623467.883832] blkdev_ioctl+0x4c1/0x9e0
Jun 18 03:10:26 compute1 kernel: [623467.883835] block_ioctl+0x3d/0x50
Jun 18 03:10:26 compute1 kernel: [623467.883837] do_vfs_ioctl+0xa9/0x640
Jun 18 03:10:26 compute1 kernel: [623467.883838] ksys_ioctl+0x67/0x90
Jun 18 03:10:26 compute1 kernel: [623467.883840] __x64_sys_ioctl+0x1a/0x20
Jun 18 03:10:26 compute1 kernel: [623467.883843] do_syscall_64+0x57/0x190
Jun 18 03:10:26 compute1 kernel: [623467.883846] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 18 03:10:26 compute1 kernel: [623467.883848] RIP: 0033:0x7f2e40f97427
Jun 18 03:10:26 compute1 kernel: [623467.883852] Code: Bad RIP value.
Jun 18 03:10:26 compute1 kernel: [623467.883853] RSP: 002b:00007f2d75ffa098 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jun 18 03:10:26 compute1 kernel: [623467.883855] RAX: ffffffffffffffda RBX: 00007f2e33af4850 RCX: 00007f2e40f97427
Jun 18 03:10:26 compute1 kernel: [623467.883856] RDX: 000000007fffffff RSI: 0000000000005326 RDI: 0000000000000012
Jun 18 03:10:26 compute1 kernel: [623467.883856] RBP: 0000000000000001 R08: 0000559be29be890 R09: 0000000000000000
Jun 18 03:10:26 compute1 kernel: [623467.883857] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2d74a42268
Jun 18 03:10:26 compute1 kernel: [623467.883858] R13: 0000000000000000 R14: 0000559be2ef0d20 R15: 0000559be27fc740
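A note on the message itself: the "blocked for more than 241 seconds" warning comes from the kernel's hung-task detector, and the echo shown in the log only silences the warning rather than fixing anything. A minimal sketch using standard kernel interfaces (nothing Proxmox-specific) to inspect the timeout and dump every blocked task on demand:

Code:

# current hung-task timeout in seconds (0 disables the check)
cat /proc/sys/kernel/hung_task_timeout_secs

# dump stack traces of all uninterruptible (D-state) tasks into dmesg
echo 1 > /proc/sys/kernel/sysrq   # make sure sysrq is enabled
echo w > /proc/sysrq-trigger
dmesg | tail -n 100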

Steps I have taken so far:
- I ran a memtest for one entire pass; no issues were found.
- I ran SMART tests on all the local drives; no issues were found.
- I removed the SSDs from the host, checked their status in Windows, ran additional tests (no issues found), and applied available firmware updates.

Any help with a next step, or details on what this message might mean, would be appreciated!

Edit: If it helps, pveversion output is below:

Code:

proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1


wolfgang


  • #2

Hi,

please upgrade to the current kernel 5.4.44-1-pve and install the "intel-microcode" package.
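For reference, on a stock PVE 6.x install that advice translates to roughly the following (a sketch; intel-microcode lives in Debian's non-free component, which is assumed to be enabled in /etc/apt/sources.list):

Code:

apt update
apt dist-upgrade            # pulls in the current pve-kernel-5.4 packages
apt install intel-microcode # requires the non-free repository
reboot                      # the new kernel and microcode load on next boot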

  • #3

Hi,

please upgrade to the current kernel 5.4.44-1-pve and install the "intel-microcode" package.

I appreciate the response. I am upgrading now and I will update in about a week on how it seems to hold up!

  • #4

Hi,

please upgrade to the current kernel 5.4.44-1-pve and install the "intel-microcode" package.

Unfortunately, after another week of uptime I have seen the same error again. I did make one other change: I moved 2 of the 3 VMs to different storage than they were originally on, to troubleshoot whether it was something with the local SSD storage. That did not make a difference, and today the same 3 VMs on this host went down.

I installed the intel-microcode package through apt; was this the method you were referring to?

Code:

apt list | grep intel

intel-microcode/oldstable,now 3.20200609.2~deb9u1 amd64 [installed]
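One way to confirm the microcode is actually being applied at boot (generic kernel logging, not specific to this setup) is to look for the early-update messages after a reboot:

Code:

dmesg | grep -i microcode
# or via the journal for the current boot
journalctl -k | grep -i microcode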

Code:

Jun 30 05:10:25 compute1 kernel: [642437.024414] kvm             D    0  1929      1 0x00000000
Jun 30 05:10:25 compute1 kernel: [642437.024417] Call Trace:
Jun 30 05:10:25 compute1 kernel: [642437.024427]  __schedule+0x2e6/0x6f0
Jun 30 05:10:25 compute1 kernel: [642437.024429]  schedule+0x33/0xa0
Jun 30 05:10:25 compute1 kernel: [642437.024431]  schedule_preempt_disabled+0xe/0x10
Jun 30 05:10:25 compute1 kernel: [642437.024433]  __mutex_lock.isra.10+0x2c9/0x4c0
Jun 30 05:10:25 compute1 kernel: [642437.024464]  ? kvm_arch_vcpu_put+0xe2/0x170 [kvm]
Jun 30 05:10:25 compute1 kernel: [642437.024482]  ? kvm_skip_emulated_instruction+0x3b/0x60 [kvm]
Jun 30 05:10:25 compute1 kernel: [642437.024484]  __mutex_lock_slowpath+0x13/0x20
Jun 30 05:10:25 compute1 kernel: [642437.024485]  mutex_lock+0x2c/0x30
Jun 30 05:10:25 compute1 kernel: [642437.024488]  sr_block_ioctl+0x43/0xd0
Jun 30 05:10:25 compute1 kernel: [642437.024493]  blkdev_ioctl+0x4c1/0x9e0
Jun 30 05:10:25 compute1 kernel: [642437.024497]  ? __wake_up_locked_key+0x1b/0x20
Jun 30 05:10:25 compute1 kernel: [642437.024501]  block_ioctl+0x3d/0x50
Jun 30 05:10:25 compute1 kernel: [642437.024503]  do_vfs_ioctl+0xa9/0x640
Jun 30 05:10:25 compute1 kernel: [642437.024505]  ksys_ioctl+0x67/0x90
Jun 30 05:10:25 compute1 kernel: [642437.024506]  __x64_sys_ioctl+0x1a/0x20
Jun 30 05:10:25 compute1 kernel: [642437.024509]  do_syscall_64+0x57/0x190
Jun 30 05:10:25 compute1 kernel: [642437.024512]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 30 05:10:25 compute1 kernel: [642437.024514] RIP: 0033:0x7f8571b90427
Jun 30 05:10:25 compute1 kernel: [642437.024519] Code: Bad RIP value.
Jun 30 05:10:25 compute1 kernel: [642437.024520] RSP: 002b:00007f8563d790d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jun 30 05:10:25 compute1 kernel: [642437.024522] RAX: ffffffffffffffda RBX: 00007f85646ea9a0 RCX: 00007f8571b90427
Jun 30 05:10:25 compute1 kernel: [642437.024523] RDX: 000000007fffffff RSI: 0000000000005326 RDI: 0000000000000014
Jun 30 05:10:25 compute1 kernel: [642437.024524] RBP: 0000000000000000 R08: 0000560d289ff710 R09: 0000000000000000
Jun 30 05:10:25 compute1 kernel: [642437.024525] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f83ddb09800
Jun 30 05:10:25 compute1 kernel: [642437.024526] R13: 0000000000000006 R14: 0000560d28f31d20 R15: 0000560d2883d740
Jun 30 05:10:25 compute1 kernel: [642437.024607] kvm             D    0  2043      1 0x00000000

Code:

pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-1-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-3
pve-kernel-helper: 6.2-3
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-7
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1

wolfgang


  • #5

I installed the intel-microcode package through apt; was this the method you were referring to?

This is correct.

When you say you moved the VM to another storage, what is this storage?
Is it on the same HBA/RAID/onboard controller?

  • #6

The storage it was on was behind a RAID controller local to this Proxmox system. I moved it to an iSCSI share running on a FreeNAS install.

As a troubleshooting step, I thought I would first reinstall Proxmox 6.2 and then try 6.1 again. This morning it hit the same panic once again on 6.2, so I am going to try 6.1 next. It is also strange that this time it happened in under 24 hours, while usually it takes a week. I thought this might be a good way to rule out any hardware issues on my side.

wolfgang


  • #7

You don't have to install PVE 6.1.
It should be enough to install an older kernel and boot into it.

  • #8

So on PVE 6.1 and the older kernel it has been up for over a week (9 days).

pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2

I am not very familiar with running different kernels, but I assume I would just install these through apt-get? Is there a list of kernel versions so I know which to install?

I could try upgrading to PVE 6.2 again and using an older kernel as well.

  • #9

kweevuss, are you using storage replication in your configuration?

  • #10

kweevuss, are you using storage replication in your configuration?

No, nothing regarding storage replication is configured on this system.

wolfgang


  • #11

I am not very familiar with running different kernels, but I assume I would just install these through apt-get?

Yes, you can install it like this.

Code:

apt install pve-kernel-5.3.18-2-pve
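To the earlier question about finding which kernel versions exist: the packages follow the pve-kernel-&lt;version&gt;-pve naming pattern, so something like this should list them (a sketch; on a GRUB-booted system the older kernel can also be made the default):

Code:

# list available pve-kernel packages for the 5.3 series
apt search pve-kernel-5.3 2>/dev/null | grep ^pve-kernel

# after installing, either pick the older kernel in the GRUB menu at boot,
# or set GRUB_DEFAULT in /etc/default/grub and run:
update-grub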

udo



  • #12

Hi,
any news on this?
I see a similar effect: the VMs on LVM are still working, but all LVM commands (lvs, vgs, pvs) hang, and because of this the node and all VMs are marked with a ? in the GUI.

Code:

Sep 28 21:14:27 pve02 kernel: [ 2783.664724] INFO: task pvs:411112 blocked for more than 362 seconds.
Sep 28 21:14:27 pve02 kernel: [ 2783.692327]       Tainted: P           O      5.4.60-1-pve #1
Sep 28 21:14:27 pve02 kernel: [ 2783.716796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 21:14:27 pve02 kernel: [ 2783.725214] pvs             D    0 411112 401506 0x00000004
Sep 28 21:14:27 pve02 kernel: [ 2783.731267] Call Trace:
Sep 28 21:14:27 pve02 kernel: [ 2783.734179]  __schedule+0x2e6/0x6f0
Sep 28 21:14:27 pve02 kernel: [ 2783.738137]  schedule+0x33/0xa0
Sep 28 21:14:27 pve02 kernel: [ 2783.741746]  schedule_preempt_disabled+0xe/0x10
Sep 28 21:14:27 pve02 kernel: [ 2783.746757]  __mutex_lock.isra.10+0x2c9/0x4c0
Sep 28 21:14:27 pve02 kernel: [ 2783.751588]  __mutex_lock_slowpath+0x13/0x20
Sep 28 21:14:27 pve02 kernel: [ 2783.756325]  mutex_lock+0x2c/0x30
Sep 28 21:14:27 pve02 kernel: [ 2783.760072]  disk_block_events+0x31/0x80
Sep 28 21:14:27 pve02 kernel: [ 2783.764430]  __blkdev_get+0x72/0x560
Sep 28 21:14:27 pve02 kernel: [ 2783.768433]  blkdev_get+0xef/0x150
Sep 28 21:14:27 pve02 kernel: [ 2783.772264]  ? blkdev_get_by_dev+0x50/0x50
Sep 28 21:14:27 pve02 kernel: [ 2783.776787]  blkdev_open+0x87/0xa0
Sep 28 21:14:27 pve02 kernel: [ 2783.780614]  do_dentry_open+0x143/0x3a0
Sep 28 21:14:27 pve02 kernel: [ 2783.784942]  vfs_open+0x2d/0x30
Sep 28 21:14:27 pve02 kernel: [ 2783.788523]  path_openat+0x2e9/0x16f0
Sep 28 21:14:27 pve02 kernel: [ 2783.792615]  ? filename_lookup.part.60+0xe0/0x170
Sep 28 21:14:27 pve02 kernel: [ 2783.797748]  do_filp_open+0x93/0x100
Sep 28 21:14:27 pve02 kernel: [ 2783.801755]  ? __alloc_fd+0x46/0x150
Sep 28 21:14:27 pve02 kernel: [ 2783.805760]  do_sys_open+0x177/0x280
Sep 28 21:14:27 pve02 kernel: [ 2783.809845]  __x64_sys_openat+0x20/0x30
Sep 28 21:14:27 pve02 kernel: [ 2783.814140]  do_syscall_64+0x57/0x190
Sep 28 21:14:27 pve02 kernel: [ 2783.818731]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep 28 21:14:27 pve02 kernel: [ 2783.824695] RIP: 0033:0x7f44154d31ae
Sep 28 21:14:27 pve02 kernel: [ 2783.829163] Code: Bad RIP value.
Sep 28 21:14:27 pve02 kernel: [ 2783.833275] RSP: 002b:00007fffa0944800 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Sep 28 21:14:27 pve02 kernel: [ 2783.841829] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f44154d31ae
Sep 28 21:14:27 pve02 kernel: [ 2783.849860] RDX: 0000000000044000 RSI: 00005590d9097d70 RDI: 00000000ffffff9c
Sep 28 21:14:27 pve02 kernel: [ 2783.857990] RBP: 00007fffa0944960 R08: 00005590d7c5ca17 R09: 00007fffa0944a30
Sep 28 21:14:27 pve02 kernel: [ 2783.866100] R10: 0000000000000000 R11: 0000000000000246 R12: 00005590d7c53c68
Sep 28 21:14:27 pve02 kernel: [ 2783.874252] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000

After a reboot it works for a short time (approx. 1 hour). At first it looked like I had trouble with the RAID volumes, because I had tried to expand a RAID volume on the hardware RAID controller, but the RAID expansion is working now (which reduces I/O) and the issue has started again.
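As an aside, when commands like lvs/vgs/pvs hang this way they are usually stuck in uninterruptible sleep on the kernel side; a generic way to list such D-state processes and the kernel function they are waiting in (standard procps, not from this thread):

Code:

ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'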

Code:

pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.60-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-6
pve-kernel-helper: 6.2-6
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.13-3-pve: 5.3.13-3
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.11-pve1
ceph-fuse: 14.2.11-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-1
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-12
pve-cluster: 6.1-8
pve-container: 3.1-13
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-1
pve-qemu-kvm: 5.0.0-13
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-14
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1

The same thing happened one reboot earlier, when kernel pve-kernel-5.4.44-2-pve (5.4.44-2) was active: the issue started after the online RAID extension, which wasn't successful; the whole server stopped all I/O and needed a reset.
The server is a Dell R7415 with an EPYC 7451.

Udo

  • #13

For me, not much has changed. I ended up staying on the older Proxmox version and kernel, as I posted above:

pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2

Personally, I am planning on upgrading this server within the next 6 months, so I was hoping new hardware would change this outcome.

udo



  • #14

For me, not much has changed. I ended up staying on the older Proxmox version and kernel, as I posted above:

pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2

Personally, I am planning on upgrading this server within the next 6 months, so I was hoping new hardware would change this outcome.

Hi,
I'm not sure new hardware is the key; this happens for me on current hardware (less than 1 year old).
But I've been running the kernel on other (older) hardware without trouble…

Udo

wolfgang


  • #15

Udo, do you use an integrated PERC in this server? If yes, can you tell me which one it is?
Maybe it is related to the disk I/O devices.
We have seen no such problems here with EPYC CPUs.
Also, your storage config would be interesting.

udo



  • #16

Udo, do you use an integrated PERC in this server? If yes, can you tell me which one it is?
Maybe it is related to the disk I/O devices.
We have seen no such problems here with EPYC CPUs.
Also, your storage config would be interesting.

Hi Wolfgang,
yes, the LVM storage is behind a PERC:

Code:

PERC H740P Mini (Integriert)    Integrated RAID Controller 1    Firmware: 51.13.0-3485    Cache: 8192 MB

The LVM is on a RAID6 with 6 HDDs (now in the expansion process with 2 further disks).
On this node, there are only 2 LVM storages defined:

Code:

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

lvmthin: hdd-lvm
        thinpool data
        vgname hdd
        content rootdir,images

We have other PVE nodes with EPYC where LVM is working, but with a different kernel (and LVM on RAID-1 SSDs):

Code:

proxmox-ve: 6.2-1 (running kernel: 5.4.41-1-pve)

Two days ago a new BIOS update was published; it will be installed at the weekend, but I'm unsure which kernel I should use.

Udo

wolfgang


  • #17

I will try to reproduce it here with an LSI 3806 RAID controller. I will report if I find something.

udo



  • #18

I will try to reproduce it here with an LSI 3806 RAID controller. I will report if I find something.

Hi,
sounds good!
Don't forget that we have some I/O load (sdb is the LVM HDD RAID):

Code:

ATOP - pve02                                    2020/10/01  10:43:39                                    --------------                                    10s elapsed
PRC | sys    7.17s  | user   7.91s  | #proc    960  | #trun      4  | #tslpi  1048  | #tslpu     3  | #zombie    0  | clones  2116  |               | #exit   2101  |
CPU | sys      70%  | user     79%  | irq       1%  | idle   4475%  | wait    178%  | guest    53%  | ipc     0.89  | cycl  107MHz  | curf 2.89GHz  | curscal   ?%  |
CPL | avg1    5.85  | avg5    5.23  | avg15   5.10  |               |               | csw   642459  | intr  374679  |               |               | numcpu    48  |
MEM | tot   251.6G  | free  108.8G  | cache 882.2M  | buff  157.2M  | slab    3.9G  | shmem 189.6M  | shrss   0.0M  | vmbal   0.0M  | hptot   0.0M  | hpuse   0.0M  |
SWP | tot    32.0G  | free   32.0G  |               |               |               |               |               |               | vmcom 177.1G  | vmlim 157.8G  |
PSI | cs     0/0/0  | ms     0/0/0  | mf     0/0/0  |               | is  38/40/42  | if  38/40/42  |               |               |               |               |
LVM | d-data_tdata  | busy     26%  | read     130  | write   1105  | KiB/r     31  | KiB/w    215  | MBr/s    0.4  | MBw/s   23.2  | avq   101.70  | avio 2.09 ms  |
LVM | d-data-tpool  | busy     26%  | read     130  | write   1105  | KiB/r     31  | KiB/w    215  | MBr/s    0.4  | MBw/s   23.2  | avq   101.70  | avio 2.09 ms  |
LVM |        dm-37  | busy     12%  | read       5  | write   3327  | KiB/r      7  | KiB/w     10  | MBr/s    0.0  | MBw/s    3.3  | avq     0.09  | avio 0.37 ms  |
LVM | 205--disk--0  | busy      9%  | read     130  | write    160  | KiB/r     31  | KiB/w    799  | MBr/s    0.4  | MBw/s   12.5  | avq    56.41  | avio 3.13 ms  |
DSK |          sdb  | busy     28%  | read     135  | write   1040  | KiB/r     30  | KiB/w    203  | MBr/s    0.4  | MBw/s   20.7  | avq    85.64  | avio 2.35 ms  |
DSK |      nvme2n1  | busy     14%  | read      43  | write   4563  | KiB/r      9  | KiB/w     26  | MBr/s    0.0  | MBw/s   11.8  | avq     0.00  | avio 0.31 ms  |
DSK |      nvme1n1  | busy     14%  | read      23  | write   4593  | KiB/r     11  | KiB/w     26  | MBr/s    0.0  | MBw/s   11.8  | avq     0.00  | avio 0.31 ms  |
NET | transport     | tcpi    6790  | tcpo    7718  | udpi    2960  | udpo    2762  | tcpao      2  | tcppo     25  | tcprs      0  | tcpie      0  | udpie      0  |
NET | network       | ipi     9830  | ipo     8595  | ipfrw      0  | deliv   9790  |               |               |               | icmpi     40  | icmpo      0  |

Udo

wolfgang


  • #19

Do you have any special settings on the RAID? Block size, cache mode, …?

udo



  • #20

Do you have any special settings on the RAID? Block size, cache mode, …?

Hi Wolfgang,
not really; the special thing was a 100 GB RAID volume for the Proxmox system and the remaining space for one big LVM storage. But due to the extension I had to migrate and delete the system RAID (although the issue started right after a reboot, while the system RAID was still on the RAID group).

The RAID settings:

Code:

megacli -LDInfo -L1 -a0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1)
Name                :hdd-raid6
RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
Size                : 7.177 TB
Sector Size         : 512
Parity Size         : 3.588 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Cached, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Ongoing Progresses:
  Reconstruction           : Completed 70%, Taken 52 min.
Encryption Type     : None
Bad Blocks Exist: No
Is VD Cached: No

Udo

levente
PostPosted: Fri Mar 08, 2019 4:07 pm    Post subject: [solved] Linux 5.0 kernel panics with "bad RIP value"

Gentoo kernel panics on boot after upgrading kernel from 4.19.2 to 5.0

It gives me the error "Bad RIP value"; I'm unsure what causes it.

It was compiled using genkernel (using only the "all" command-line argument).

kernel config: https://0x0.st/zHiO.txt (https://pastebin.com/RWNZyKn0)

Last edited by levente on Sat Mar 23, 2019 8:46 am; edited 1 time in total


levente

mno wrote:
Can you post the output you get at load-time? Doing a cursory check online, this can be tied to a number of things and also to older kernel versions…

Since I’m not sure if Gentoo saves backtraces at all, I took the easy route and just took a picture of my computer

Picture: http://0x0.st/zXJn.jpg

I hope it didn’t cut anything important off

Muso

Verdazil wrote:
levente wrote:
Gentoo kernel panics on boot after upgrading kernel from 4.19.2 to 5.0

And what is this urgent need? Are you a developer?

It would be correct to upgrade to the 4.19.27-r1 stable kernel release and wait for branch 5 to become stable.

5 is stable.

https://www.kernel.org/

Quote:
Latest Stable Kernel : 5.0.2



Hu

levente wrote:
mno wrote:
Can you post the output you get at load-time? Doing a cursory check online, this can be tied to a number of things and also to older kernel versions…

Since I’m not sure if Gentoo saves backtraces at all, I took the easy route and just took a picture of my computer

That is a Linux kernel issue, not a Gentoo issue. The kernel does not persist panic text to your local disk because there is nowhere to save it. The kernel can send the text over the network or a serial port, so that some other system can save it.
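For completeness, one standard way to do that is the kernel's netconsole module, which streams kernel messages to another machine over UDP; a minimal sketch with placeholder addresses:

Code:

# on the panicking machine: send kernel log out of eth0
# to 192.168.1.10 port 6666 (addresses are examples)
modprobe netconsole netconsole=@/eth0,6666@192.168.1.10/

# on the receiving machine: capture the messages
nc -u -l 6666        # traditional netcat wants: nc -u -l -p 6666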

Verdazil wrote:
levente wrote:
Gentoo kernel panics on boot after upgrading kernel from 4.19.2 to 5.0

And what is this urgent need? Are you a developer?

It would be correct to upgrade to the 4.19.27-r1 stable kernel release and wait for branch 5 to become stable.

OP did not say it was urgent. The kernel he picked is a released kernel that should work if managed properly. He wants help managing it. It’s a reasonable request to put in this forum.

levente

Thanks for all the replies

Turns out it was a rookie mistake: I left out the --luks and --lvm options from genkernel
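In other words, with an encrypted LVM root the initramfs needs LUKS and LVM support baked in, so the working invocation would presumably have been:

Code:

genkernel --luks --lvm all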


System information

Type                  Version/Name
Distribution Name     CentOS
Distribution Version  7
Kernel Version        5.4.225-1.el7.elrepo
Architecture          x86_64
OpenZFS Version       zfs-2.1.6-1

Describe the problem you’re observing

Context: ZFS is running on a customer's VMware VM. They want to migrate off the current hypervisor and onto a newer host. But after migration, I see storage errors in "zpool status" and dmesg.

I got the RIP error while trying to avoid the storage errors (by using a virtual controller rather than a paravirtual one, among other experiments). It's probably not specific to the SCSI driver, though; I likely only noticed this (RIP) error because I was looking closely at syslog during this experiment.

The RIP error is not itself causing us any issue, and was only observed during errors from the scsi controller. Feel free to close the ticket if there’s nothing interesting to be done about it.
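For context, the storage errors mentioned above would typically be gathered with the standard OpenZFS diagnostics (generic commands, not taken from this report):

Code:

zpool status -v    # per-vdev read/write/checksum error counters
zpool events -v    # recent ZFS error events in detail
dmesg | grep -iE 'scsi|sd[a-z]|zfs'   # matching kernel-side I/O errors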

Describe how to reproduce the problem

Occurred when running on the customer’s newer vmware host…

Include any warning/errors/backtraces from the system logs

[  247.125781] INFO: task txg_sync:2314 blocked for more than 122 seconds.
[  247.125849]       Tainted: P           OE     5.4.225-1.el7.elrepo.x86_64 #1
[  247.125938] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.126010] txg_sync        D    0  2314      2 0x80004000
[  247.126014] Call Trace:
[  247.126025]  __schedule+0x2d2/0x730
[  247.126033]  ? __internal_add_timer+0x2d/0x40
[  247.126036]  schedule+0x42/0xb0
[  247.126040]  schedule_timeout+0x8a/0x160
[  247.126205]  ? zio_issue_async+0x53/0x90 [zfs]
[  247.126210]  ? __next_timer_interrupt+0xe0/0xe0
[  247.126215]  io_schedule_timeout+0x1e/0x50
[  247.126225]  __cv_timedwait_common+0x131/0x170 [spl]
[  247.126231]  ? finish_wait+0x80/0x80
[  247.126240]  __cv_timedwait_io+0x19/0x20 [spl]
[  247.126387]  zio_wait+0x136/0x2a0 [zfs]
[  247.126503]  dsl_pool_sync+0xf2/0x500 [zfs]
[  247.126633]  spa_sync_iterate_to_convergence+0xf8/0x2e0 [zfs]
[  247.126764]  spa_sync+0x476/0x930 [zfs]
[  247.126924]  txg_sync_thread+0x26f/0x3f0 [zfs]
[  247.127057]  ? txg_fini+0x270/0x270 [zfs]
[  247.127074]  thread_generic_wrapper+0x79/0x90 [spl]
[  247.127080]  kthread+0x106/0x140
[  247.127091]  ? __thread_exit+0x20/0x20 [spl]
[  247.127095]  ? __kthread_cancel_work+0x40/0x40
[  247.127100]  ret_from_fork+0x1f/0x40

[  247.127135] INFO: task postmaster:5349 blocked for more than 122 seconds.
[  247.127200]       Tainted: P           OE     5.4.225-1.el7.elrepo.x86_64 #1
[  247.127262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.127344] postmaster      D    0  5349   4284 0x00000080
[  247.127348] Call Trace:
[  247.127352]  __schedule+0x2d2/0x730
[  247.127356]  schedule+0x42/0xb0
[  247.127365]  cv_wait_common+0xfd/0x130 [spl]
[  247.127370]  ? finish_wait+0x80/0x80
[  247.127379]  __cv_wait+0x15/0x20 [spl]
[  247.127523]  zfs_rangelock_enter_impl+0x134/0x260 [zfs]
[  247.127661]  ? zfs_uio_prefaultpages+0x102/0x120 [zfs]
[  247.127810]  zfs_rangelock_enter+0x11/0x20 [zfs]
[  247.127959]  zfs_write+0xa19/0xd80 [zfs]
[  247.128096]  zpl_iter_write+0xf2/0x170 [zfs]
[  247.128103]  new_sync_write+0x125/0x1c0
[  247.128109]  __vfs_write+0x29/0x40
[  247.128113]  vfs_write+0xb9/0x1a0
[  247.128116]  ksys_write+0x67/0xe0
[  247.128120]  __x64_sys_write+0x1a/0x20
[  247.128125]  do_syscall_64+0x60/0x1b0
[  247.128130]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[  247.128134] RIP: 0033:0x7f05ff4c39b0
[  247.128141] Code: Bad RIP value.

Description of problem:

I have installed Debian 10 Buster. An EA01A NEC M2M LTE dongle is attached to the device and configured for internet access.
I have scheduled a reboot every day at 1 AM, and I have noticed that sometimes after the reboot the system does not boot up properly and shows "Code: Bad RIP value".
This issue does not always occur, and I have not been able to reproduce it yet.
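For reference, a daily 1 AM reboot of this kind is usually just a cron entry; a minimal sketch (the actual entry on this machine is an assumption):

Code:

# in /etc/crontab (the system crontab includes a user field):
0 1 * * * root /sbin/shutdown -r now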

Machine details:
root@DEB:/home# lscpu 
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       39 bits physical, 48 bits virtual
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               142
Model name:          Intel(R) Core(TM) i3-8109U CPU @ 3.00GHz
Stepping:            10
CPU MHz:             2000.001
CPU max MHz:         3600.0000
CPU min MHz:         400.0000
BogoMIPS:            6000.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            4096K
NUMA node0 CPU(s):   0-3

root@DEB:/home# uname -a
Linux DEB 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64 GNU/Linux

The following are the syslog entries from around the time of the reboot:

Logs:

2020-11-13T01:07:57.591035+09:00 <daemon.info> smartd[520]: Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.SPCC_M_2_SSD-P2000559000000025478.ata.state
2020-11-13T01:07:57.591139+09:00 <daemon.info> smartd[520]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
2020-11-13T01:07:57.591546+09:00 <daemon.info> systemd[1]: Started LSB: Color ANSI System Logo.
2020-11-13T01:07:57.593915+09:00 <daemon.info> smartd[520]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.SPCC_M_2_SSD-P2000559000000025478.ata.state
2020-11-13T01:07:57.598691+09:00 <daemon.info> accounts-daemon[502]: started daemon version 0.6.45
2020-11-13T01:07:57.598824+09:00 <daemon.info> systemd[1]: Started Accounts Service.
2020-11-13T01:07:57.601006+09:00 <daemon.info> dphys-swapfile[518]: computing size, want /var/swap=15708MByte, restricting to 50% of remaining disk size: 6701MBytes, restricting to config limit: 4096MBytes, checking existing: keeping it
2020-11-13T01:07:57.646702+09:00 <daemon.info> loadcpufreq[504]: Loading cpufreq kernel modules...done (acpi-cpufreq).
2020-11-13T01:07:57.647011+09:00 <daemon.info> systemd[1]: Started LSB: Load kernel modules needed to enable cpufreq scaling.
2020-11-13T01:07:57.647738+09:00 <daemon.info> systemd[1]: Starting LSB: set CPUFreq kernel parameters...
2020-11-13T01:07:57.654960+09:00 <daemon.info> cpufrequtils[645]: CPUFreq Utilities: Setting ondemand CPUFreq governor...disabled, governor not available...done.
2020-11-13T01:07:57.655231+09:00 <daemon.info> systemd[1]: Started LSB: set CPUFreq kernel parameters.
2020-11-13T01:07:57.673396+09:00 <daemon.info> systemd[1]: Started dphys-swapfile - set up, mount/unmount, and delete a swap file.
2020-11-13T01:07:57.674051+09:00 <kern.info> kernel: [  189.360869] Adding 4194300k swap on /var/swap.  Priority:-2 extents:8 across:6258688k SSFS
2020-11-13T01:07:57.872792+09:00 <daemon.info> systemd[1]: Started Login Service.
2020-11-13T01:07:58.538721+09:00 <daemon.info> systemd[1]: Started Save/Restore Sound Card State.
2020-11-13T01:07:58.539084+09:00 <daemon.info> systemd[1]: Reached target Sound Card.
2020-11-13T01:08:14.530986+09:00 <daemon.notice> nscd: 517 checking for monitored file `/etc/resolv.conf': No such file or directory
2020-11-13T01:08:27.606987+09:00 <daemon.warning> dbus-daemon[513]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30029ms)
2020-11-13T01:08:51.154249+09:00 <kern.err> kernel: [  242.838955] INFO: task systemd-udevd:291 blocked for more than 120 seconds.
2020-11-13T01:08:51.154283+09:00 <kern.err> kernel: [  242.838967]       Tainted: G     U            4.19.0-12-amd64 #1 Debian 4.19.152-1
2020-11-13T01:08:51.154287+09:00 <kern.err> kernel: [  242.838970] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-11-13T01:08:51.154291+09:00 <kern.info> kernel: [  242.838975] systemd-udevd   D    0   291    277 0x00000324
2020-11-13T01:08:51.154295+09:00 <kern.warning> kernel: [  242.838979] Call Trace:
2020-11-13T01:08:51.154321+09:00 <kern.warning> kernel: [  242.838989]  __schedule+0x29f/0x840
2020-11-13T01:08:51.154326+09:00 <kern.warning> kernel: [  242.838995]  schedule+0x28/0x80
2020-11-13T01:08:51.154330+09:00 <kern.warning> kernel: [  242.838999]  schedule_preempt_disabled+0xa/0x10
2020-11-13T01:08:51.154333+09:00 <kern.warning> kernel: [  242.839001]  __mutex_lock.isra.8+0x2b5/0x4a0
2020-11-13T01:08:51.154335+09:00 <kern.warning> kernel: [  242.839006]  ? addrconf_notify+0x31c/0xae0
2020-11-13T01:08:51.154338+09:00 <kern.warning> kernel: [  242.839016]  nf_tables_netdev_event+0x9b/0x1a0 [nf_tables]
2020-11-13T01:08:51.154341+09:00 <kern.warning> kernel: [  242.839023]  notifier_call_chain+0x47/0x70
2020-11-13T01:08:51.154344+09:00 <kern.warning> kernel: [  242.839028]  dev_change_name+0x1fa/0x330
2020-11-13T01:08:51.154347+09:00 <kern.warning> kernel: [  242.839033]  do_setlink+0x729/0xef0
2020-11-13T01:08:51.154350+09:00 <kern.warning> kernel: [  242.839040]  ? blk_mq_dispatch_rq_list+0x392/0x590
2020-11-13T01:08:51.154353+09:00 <kern.warning> kernel: [  242.839044]  ? elv_rb_del+0x1f/0x30
2020-11-13T01:08:51.154356+09:00 <kern.warning> kernel: [  242.839047]  ? deadline_remove_request+0x55/0xc0
2020-11-13T01:08:51.154359+09:00 <kern.warning> kernel: [  242.839050]  ? blk_mq_do_dispatch_sched+0x91/0x120
2020-11-13T01:08:51.154362+09:00 <kern.warning> kernel: [  242.839054]  ? __d_alloc+0x24/0x240
2020-11-13T01:08:51.154364+09:00 <kern.warning> kernel: [  242.839058]  rtnl_setlink+0xd9/0x130
2020-11-13T01:08:51.154368+09:00 <kern.warning> kernel: [  242.839065]  rtnetlink_rcv_msg+0x2b1/0x360
2020-11-13T01:08:51.154370+09:00 <kern.warning> kernel: [  242.839069]  ? _cond_resched+0x15/0x30
2020-11-13T01:08:51.154373+09:00 <kern.warning> kernel: [  242.839072]  ? rtnl_calcit.isra.33+0x100/0x100
2020-11-13T01:08:51.154374+09:00 <kern.warning> kernel: [  242.839076]  netlink_rcv_skb+0x4c/0x120
2020-11-13T01:08:51.154377+09:00 <kern.warning> kernel: [  242.839081]  netlink_unicast+0x181/0x210
2020-11-13T01:08:51.154380+09:00 <kern.warning> kernel: [  242.839084]  netlink_sendmsg+0x204/0x3d0
2020-11-13T01:08:51.154382+09:00 <kern.warning> kernel: [  242.839088]  sock_sendmsg+0x36/0x40
2020-11-13T01:08:51.154384+09:00 <kern.warning> kernel: [  242.839090]  __sys_sendto+0xee/0x160
2020-11-13T01:08:51.154387+09:00 <kern.warning> kernel: [  242.839096]  ? syscall_trace_enter+0x192/0x2b0
2020-11-13T01:08:51.154390+09:00 <kern.warning> kernel: [  242.839099]  __x64_sys_sendto+0x24/0x30
2020-11-13T01:08:51.154392+09:00 <kern.warning> kernel: [  242.839102]  do_syscall_64+0x53/0x110
2020-11-13T01:08:51.154395+09:00 <kern.warning> kernel: [  242.839105]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
2020-11-13T01:08:51.154398+09:00 <kern.warning> kernel: [  242.839108] RIP: 0033:0x7fdf27a749b7
2020-11-13T01:08:51.154401+09:00 <kern.warning> kernel: [  242.839115] Code: Bad RIP value.
2020-11-13T01:08:51.154403+09:00 <kern.warning> kernel: [  242.839117] RSP: 002b:00007ffc41f57038 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
2020-11-13T01:08:51.154406+09:00 <kern.warning> kernel: [  242.839119] RAX: ffffffffffffffda RBX: 0000557b32c847a0 RCX: 00007fdf27a749b7
2020-11-13T01:08:51.154408+09:00 <kern.warning> kernel: [  242.839120] RDX: 0000000000000034 RSI: 0000557b32c84be0 RDI: 000000000000000f
2020-11-13T01:08:51.154411+09:00 <kern.warning> kernel: [  242.839121] RBP: 0000000000000003 R08: 00007ffc41f570f0 R09: 0000000000000010
2020-11-13T01:08:51.154413+09:00 <kern.warning> kernel: [  242.839123] R10: 0000000000000000 R11: 0000000000000246 R12: 0000557b32c77898
2020-11-13T01:08:51.154416+09:00 <kern.warning> kernel: [  242.839124] R13: 0000557b32c84ca0 R14: 0000557b32c69150 R15: 0000557b32c5fe70
2020-11-13T01:08:51.154419+09:00 <kern.err> kernel: [  242.839148] INFO: task systemd-udevd:320 blocked for more than 120 seconds.
2020-11-13T01:08:51.154423+09:00 <kern.err> kernel: [  242.839154]       Tainted: G     U            4.19.0-12-amd64 #1 Debian 4.19.152-1
2020-11-13T01:08:51.154426+09:00 <kern.err> kernel: [  242.839157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-11-13T01:08:51.154428+09:00 <kern.info> kernel: [  242.839161] systemd-udevd   D    0   320    277 0x80000324
2020-11-13T01:08:51.154432+09:00 <kern.warning> kernel: [  242.839163] Call Trace:
2020-11-13T01:08:51.154436+09:00 <kern.warning> kernel: [  242.839169]  __schedule+0x29f/0x840
2020-11-13T01:08:51.154439+09:00 <kern.warning> kernel: [  242.839173]  schedule+0x28/0x80
2020-11-13T01:08:51.154443+09:00 <kern.warning> kernel: [  242.839177]  schedule_preempt_disabled+0xa/0x10
2020-11-13T01:08:51.154448+09:00 <kern.warning> kernel: [  242.839179]  __mutex_lock.isra.8+0x2b5/0x4a0
2020-11-13T01:08:51.154475+09:00 <kern.warning> kernel: [  242.839184]  register_netdevice_notifier+0x37/0x230
2020-11-13T01:08:51.154481+09:00 <kern.warning> kernel: [  242.839189]  ? kobject_put+0x23/0x1b0
2020-11-13T01:08:51.154486+09:00 <kern.warning> kernel: [  242.839192]  ? 0xffffffffc0c39000
2020-11-13T01:08:51.154490+09:00 <kern.warning> kernel: [  242.839231]  cfg80211_init+0x37/0xcb [cfg80211]
2020-11-13T01:08:51.154494+09:00 <kern.warning> kernel: [  242.839237]  do_one_initcall+0x46/0x1c3
2020-11-13T01:08:51.154497+09:00 <kern.warning> kernel: [  242.839241]  ? free_unref_page_commit+0x91/0x100
2020-11-13T01:08:51.154499+09:00 <kern.warning> kernel: [  242.839245]  ? _cond_resched+0x15/0x30
2020-11-13T01:08:51.154502+09:00 <kern.warning> kernel: [  242.839249]  ? kmem_cache_alloc_trace+0x15e/0x1e0
2020-11-13T01:08:51.154505+09:00 <kern.warning> kernel: [  242.839254]  do_init_module+0x5a/0x210
2020-11-13T01:08:51.154510+09:00 <kern.warning> kernel: [  242.839258]  load_module+0x2167/0x23d0
2020-11-13T01:08:51.154513+09:00 <kern.warning> kernel: [  242.839264]  ? __do_sys_finit_module+0xad/0x110
2020-11-13T01:08:51.154517+09:00 <kern.warning> kernel: [  242.839266]  __do_sys_finit_module+0xad/0x110
2020-11-13T01:08:51.154520+09:00 <kern.warning> kernel: [  242.839271]  do_syscall_64+0x53/0x110
2020-11-13T01:08:51.154523+09:00 <kern.warning> kernel: [  242.839274]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
2020-11-13T01:08:51.154526+09:00 <kern.warning> kernel: [  242.839276] RIP: 0033:0x7fdf27a6df59
2020-11-13T01:08:51.154545+09:00 <kern.warning> kernel: [  242.839281] Code: Bad RIP value.
2020-11-13T01:08:51.154548+09:00 <kern.warning> kernel: [  242.839282] RSP: 002b:00007ffc41f56b68 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
2020-11-13T01:08:51.154550+09:00 <kern.warning> kernel: [  242.839284] RAX: ffffffffffffffda RBX: 0000557b32c77180 RCX: 00007fdf27a6df59
2020-11-13T01:08:51.154553+09:00 <kern.warning> kernel: [  242.839285] RDX: 0000000000000000 RSI: 00007fdf27972cad RDI: 000000000000000f
2020-11-13T01:08:51.154556+09:00 <kern.warning> kernel: [  242.839286] RBP: 00007fdf27972cad R08: 0000000000000000 R09: 0000000000000000
2020-11-13T01:08:51.154558+09:00 <kern.warning> kernel: [  242.839287] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000
2020-11-13T01:08:51.154561+09:00 <kern.warning> kernel: [  242.839289] R13: 0000557b32c663d0 R14: 0000000000020000 R15: 0000557b32c77180
2020-11-13T01:08:51.154565+09:00 <kern.err> kernel: [  242.839294] INFO: task modprobe:422 blocked for more than 120 seconds.
2020-11-13T01:08:51.154567+09:00 <kern.err> kernel: [  242.839299]       Tainted: G     U            4.19.0-12-amd64 #1 Debian 4.19.152-1
2020-11-13T01:08:51.154570+09:00 <kern.err> kernel: [  242.839303] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-11-13T01:08:51.154572+09:00 <kern.info> kernel: [  242.839307] modprobe        D    0   422    406 0x80000000
2020-11-13T01:08:51.154575+09:00 <kern.warning> kernel: [  242.839309] Call Trace:
2020-11-13T01:08:51.154578+09:00 <kern.warning> kernel: [  242.839314]  __schedule+0x29f/0x840
2020-11-13T01:08:51.154580+09:00 <kern.warning> kernel: [  242.839318]  schedule+0x28/0x80
2020-11-13T01:08:51.154582+09:00 <kern.warning> kernel: [  242.839320]  rwsem_down_write_failed+0x17c/0x3a0
2020-11-13T01:08:51.154585+09:00 <kern.warning> kernel: [  242.839324]  ? __wake_up_common_lock+0x89/0xc0
2020-11-13T01:08:51.154587+09:00 <kern.warning> kernel: [  242.839328]  ? 0xffffffffc05cb000
2020-11-13T01:08:51.154589+09:00 <kern.warning> kernel: [  242.839331]  call_rwsem_down_write_failed+0x13/0x20
2020-11-13T01:08:51.154592+09:00 <kern.warning> kernel: [  242.839334]  down_write+0x29/0x40
2020-11-13T01:08:51.154594+09:00 <kern.warning> kernel: [  242.839338]  register_pernet_subsys+0x15/0x40
2020-11-13T01:08:51.154597+09:00 <kern.warning> kernel: [  242.839343]  nf_log_ipv6_init+0x12/0x1000 [nf_log_ipv6]
2020-11-13T01:08:51.154600+09:00 <kern.warning> kernel: [  242.839347]  do_one_initcall+0x46/0x1c3
2020-11-13T01:08:51.154602+09:00 <kern.warning> kernel: [  242.839350]  ? free_unref_page_commit+0x91/0x100
2020-11-13T01:08:51.154604+09:00 <kern.warning> kernel: [  242.839353]  ? _cond_resched+0x15/0x30
2020-11-13T01:08:51.154607+09:00 <kern.warning> kernel: [  242.839357]  ? kmem_cache_alloc_trace+0x15e/0x1e0
2020-11-13T01:08:51.154609+09:00 <kern.warning> kernel: [  242.839360]  do_init_module+0x5a/0x210
2020-11-13T01:08:51.154628+09:00 <kern.warning> kernel: [  242.839363]  load_module+0x2167/0x23d0
2020-11-13T01:08:51.154632+09:00 <kern.warning> kernel: [  242.839369]  ? __do_sys_finit_module+0xad/0x110
2020-11-13T01:08:51.154634+09:00 <kern.warning> kernel: [  242.839371]  __do_sys_finit_module+0xad/0x110
2020-11-13T01:08:51.154638+09:00 <kern.warning> kernel: [  242.839376]  do_syscall_64+0x53/0x110
2020-11-13T01:08:51.154640+09:00 <kern.warning> kernel: [  242.839379]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
2020-11-13T01:08:51.154643+09:00 <kern.warning> kernel: [  242.839380] RIP: 0033:0x7f2f86ff6f59
2020-11-13T01:08:51.154645+09:00 <kern.warning> kernel: [  242.839384] Code: Bad RIP value.
2020-11-13T01:08:51.154648+09:00 <kern.warning> kernel: [  242.839386] RSP: 002b:00007ffd08bdd2d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
2020-11-13T01:08:51.154650+09:00 <kern.warning> kernel: [  242.839387] RAX: ffffffffffffffda RBX: 0000564e805fefd0 RCX: 00007f2f86ff6f59
2020-11-13T01:08:51.154652+09:00 <kern.warning> kernel: [  242.839389] RDX: 0000000000000000 RSI: 0000564e7f08d3f0 RDI: 0000000000000000
2020-11-13T01:08:51.154655+09:00 <kern.warning> kernel: [  242.839390] RBP: 0000564e7f08d3f0 R08: 0000000000000000 R09: 0000000000000000
2020-11-13T01:08:51.154657+09:00 <kern.warning> kernel: [  242.839391] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
2020-11-13T01:08:51.154661+09:00 <kern.warning> kernel: [  242.839392] R13: 0000564e805feed0 R14: 0000000000040000 R15: 0000564e805fefd0
2020-11-13T01:09:27.747860+09:00 <daemon.warning> systemd[1]: systemd-networkd.service: Start operation timed out. Terminating.
2020-11-13T01:10:01.537439+09:00 <cron.info> CRON[653]: (root) CMD (/usr/sbin/logrotate /etc/logrotate.conf >/dev/null 2>&1)
2020-11-13T01:10:20.249057+09:00 <daemon.notice> nscd: 517 checking for monitored file `/etc/resolv.conf': No such file or directory
2020-11-13T01:10:35.249374+09:00 <daemon.notice> nscd: 517 checking for monitored file `/etc/resolv.conf': No such file or directory
2020-11-13T01:10:51.986271+09:00 <kern.err> kernel: [  363.670948] INFO: task systemd-udevd:291 blocked for more than 120 seconds.
2020-11-13T01:10:51.986302+09:00 <kern.err> kernel: [  363.670959]       Tainted: G     U            4.19.0-12-amd64 #1 Debian 4.19.152-1
2020-11-13T01:10:51.986304+09:00 <kern.err> kernel: [  363.670963] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-11-13T01:10:51.986306+09:00 <kern.info> kernel: [  363.670967] systemd-udevd   D    0   291    277 0x00000324
2020-11-13T01:10:51.986308+09:00 <kern.warning> kernel: [  363.670970] Call Trace:
2020-11-13T01:10:51.986309+09:00 <kern.warning> kernel: [  363.670980]  __schedule+0x29f/0x840
2020-11-13T01:10:51.986312+09:00 <kern.warning> kernel: [  363.670985]  schedule+0x28/0x80
2020-11-13T01:10:51.986314+09:00 <kern.warning> kernel: [  363.670989]  schedule_preempt_disabled+0xa/0x10
2020-11-13T01:10:51.986339+09:00 <kern.warning> kernel: [  363.670991]  __mutex_lock.isra.8+0x2b5/0x4a0
2020-11-13T01:10:51.986342+09:00 <kern.warning> kernel: [  363.670995]  ? addrconf_notify+0x31c/0xae0
2020-11-13T01:10:51.986344+09:00 <kern.warning> kernel: [  363.671006]  nf_tables_netdev_event+0x9b/0x1a0 [nf_tables]
2020-11-13T01:10:51.986346+09:00 <kern.warning> kernel: [  363.671012]  notifier_call_chain+0x47/0x70
2020-11-13T01:10:51.986348+09:00 <kern.warning> kernel: [  363.671017]  dev_change_name+0x1fa/0x330
2020-11-13T01:10:51.986350+09:00 <kern.warning> kernel: [  363.671022]  do_setlink+0x729/0xef0
2020-11-13T01:10:51.986351+09:00 <kern.warning> kernel: [  363.671028]  ? blk_mq_dispatch_rq_list+0x392/0x590
2020-11-13T01:10:51.986353+09:00 <kern.warning> kernel: [  363.671033]  ? elv_rb_del+0x1f/0x30
2020-11-13T01:10:51.986355+09:00 <kern.warning> kernel: [  363.671035]  ? deadline_remove_request+0x55/0xc0
2020-11-13T01:10:51.986356+09:00 <kern.warning> kernel: [  363.671038]  ? blk_mq_do_dispatch_sched+0x91/0x120
2020-11-13T01:10:51.986358+09:00 <kern.warning> kernel: [  363.671042]  ? __d_alloc+0x24/0x240
2020-11-13T01:10:51.986360+09:00 <kern.warning> kernel: [  363.671046]  rtnl_setlink+0xd9/0x130
2020-11-13T01:10:51.986362+09:00 <kern.warning> kernel: [  363.671054]  rtnetlink_rcv_msg+0x2b1/0x360
2020-11-13T01:10:51.986364+09:00 <kern.warning> kernel: [  363.671058]  ? _cond_resched+0x15/0x30
2020-11-13T01:10:51.986365+09:00 <kern.warning> kernel: [  363.671061]  ? rtnl_calcit.isra.33+0x100/0x100
2020-11-13T01:10:51.986367+09:00 <kern.warning> kernel: [  363.671065]  netlink_rcv_skb+0x4c/0x120
2020-11-13T01:10:51.986368+09:00 <kern.warning> kernel: [  363.671069]  netlink_unicast+0x181/0x210
2020-11-13T01:10:51.986370+09:00 <kern.warning> kernel: [  363.671073]  netlink_sendmsg+0x204/0x3d0
2020-11-13T01:10:51.986371+09:00 <kern.warning> kernel: [  363.671076]  sock_sendmsg+0x36/0x40
2020-11-13T01:10:51.986373+09:00 <kern.warning> kernel: [  363.671079]  __sys_sendto+0xee/0x160
2020-11-13T01:10:51.986374+09:00 <kern.warning> kernel: [  363.671084]  ? syscall_trace_enter+0x192/0x2b0
2020-11-13T01:10:51.986376+09:00 <kern.warning> kernel: [  363.671087]  __x64_sys_sendto+0x24/0x30
2020-11-13T01:10:51.986378+09:00 <kern.warning> kernel: [  363.671090]  do_syscall_64+0x53/0x110
2020-11-13T01:10:51.986379+09:00 <kern.warning> kernel: [  363.671093]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
2020-11-13T01:10:51.986381+09:00 <kern.warning> kernel: [  363.671096] RIP: 0033:0x7fdf27a749b7
2020-11-13T01:10:51.986383+09:00 <kern.warning> kernel: [  363.671103] Code: Bad RIP value.
2020-11-13T01:10:51.986384+09:00 <kern.warning> kernel: [  363.671105] RSP: 002b:00007ffc41f57038 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
2020-11-13T01:10:51.986386+09:00 <kern.warning> kernel: [  363.671108] RAX: ffffffffffffffda RBX: 0000557b32c847a0 RCX: 00007fdf27a749b7
2020-11-13T01:10:51.986387+09:00 <kern.warning> kernel: [  363.671109] RDX: 0000000000000034 RSI: 0000557b32c84be0 RDI: 000000000000000f
2020-11-13T01:10:51.986389+09:00 <kern.warning> kernel: [  363.671110] RBP: 0000000000000003 R08: 00007ffc41f570f0 R09: 0000000000000010
2020-11-13T01:10:51.986390+09:00 <kern.warning> kernel: [  363.671111] R10: 0000000000000000 R11: 0000000000000246 R12: 0000557b32c77898
2020-11-13T01:10:51.986391+09:00 <kern.warning> kernel: [  363.671112] R13: 0000557b32c84ca0 R14: 0000557b32c69150 R15: 0000557b32c5fe70
2020-11-13T01:10:51.986393+09:00 <kern.err> kernel: [  363.671120] INFO: task systemd-udevd:320 blocked for more than 120 seconds.
2020-11-13T01:10:51.986394+09:00 <kern.err> kernel: [  363.671125]       Tainted: G     U            4.19.0-12-amd64 #1 Debian 4.19.152-1
2020-11-13T01:10:51.986396+09:00 <kern.err> kernel: [  363.671129] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-11-13T01:10:51.986398+09:00 <kern.info> kernel: [  363.671132] systemd-udevd   D    0   320    277 0x80000324

I am wondering if this happens because of the USB dongle; from the above logs it seems systemd-udevd is blocked while sending and receiving netlink and rtnetlink messages in the call trace.
Thank you so much in advance for your help.

All of a sudden, Slack would not start. Slack was installed using Snap.

dmesg:

[63983.140086] ThreadPoolForeg[58617]: segfault at 34d0 ip 00000000000034d0 sp 00007fdb3e23ce08 error 14
[63983.140096] Code: Bad RIP value.
[63983.375855] traps: Chrome_IOThread[58504] trap int3 ip:55c2908ba1c4 sp:7f0ec54347e0 error:0 in slack[55c28e42e000+5caf000]

asked Jul 11, 2020 at 3:13 by Phil

I also had issues with Slack launching after an upgrade, with what looks like the same error:

Jul 11 16:45:43 samloyd kernel: [171452.625726] traps: Chrome_IOThread[114914] trap int3 ip:56465285b1c4 sp:7f543c1797e0 error:0 in slack[5646503cf000+5caf000]

Reverting to 4.4.3 fixed it.

answered Jul 11, 2020 at 23:49 by Jim Bumgardner

I followed the guidance in this article on the Snapcraft forum and reverted the snap:

sudo snap revert slack

The revert reported:

slack reverted to 4.4.3

Slack was able to start after being reverted.

answered Jul 13, 2020 at 16:43 by Phil

This workaround worked for me on the latest Slack 4.7.0 (which was causing the issue in the first place):

Open a terminal and run a snap shell:

snap run --shell slack

then execute the Slack binary:

$SNAP/usr/lib/slack/slack

Source:
https://forum.snapcraft.io/t/slack-4-7-0-sefgault-ubuntu-18-04/18708/3

So the segfault is caused by the --no-sandbox flag in the command argument list.

EDIT: The Ubuntu snap has since updated, so on Slack 4.8.0 there's no longer an issue with startup.

answered Jul 20, 2020 at 9:45 by stamster

