-
#1
I have a Dell R710 which I am running Proxmox 6.2 on. The host has a mix of local storage and other storage presented via iSCSI/NFS. The local storage consists of 2x 500 GB drives in RAID 1 for the Proxmox OS, and 2x 500 GB SSDs in RAID 1.
Most of the VMs run on the iSCSI share, while I have a few running on the local SSD storage.
I have been running this in my homelab for a while now, but all of a sudden I have started to have an issue I am not sure how to track down. It seems like the host can be online for about a week, then all of a sudden several of my VMs go offline.
When this happens I cannot even access the console in Proxmox. The console usually times out, and I believe it says "error waiting on systemd". What the affected VMs have in common is that they are all on the SSD storage, but not every VM on that storage is affected. For example, I also have pfSense running on these SSDs,
and it continues to work, and console access to it still works.
It is also interesting that when this happens the I/O delay shoots up to about 8% and stays there.
I find this error in the syslog file, but I'm not sure what it means or what next steps I should take with it.
Jun 18 03:10:26 compute1 kernel: [623467.883697] INFO: task kvm:2361 blocked for more than 241 seconds.
Jun 18 03:10:26 compute1 kernel: [623467.883732] Tainted: P IOE 5.4.34-1-pve #1
Jun 18 03:10:26 compute1 kernel: [623467.883752] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 18 03:10:26 compute1 kernel: [623467.883777] kvm D 0 2361 1 0x00000000
Jun 18 03:10:26 compute1 kernel: [623467.883779] Call Trace:
Jun 18 03:10:26 compute1 kernel: [623467.883787] __schedule+0x2e6/0x700
Jun 18 03:10:26 compute1 kernel: [623467.883789] schedule+0x33/0xa0
Jun 18 03:10:26 compute1 kernel: [623467.883790] schedule_preempt_disabled+0xe/0x10
Jun 18 03:10:26 compute1 kernel: [623467.883792] __mutex_lock.isra.10+0x2c9/0x4c0
Jun 18 03:10:26 compute1 kernel: [623467.883823] ? kvm_arch_vcpu_put+0xe2/0x170 [kvm]
Jun 18 03:10:26 compute1 kernel: [623467.883825] __mutex_lock_slowpath+0x13/0x20
Jun 18 03:10:26 compute1 kernel: [623467.883826] mutex_lock+0x2c/0x30
Jun 18 03:10:26 compute1 kernel: [623467.883828] sr_block_ioctl+0x43/0xd0
Jun 18 03:10:26 compute1 kernel: [623467.883832] blkdev_ioctl+0x4c1/0x9e0
Jun 18 03:10:26 compute1 kernel: [623467.883835] block_ioctl+0x3d/0x50
Jun 18 03:10:26 compute1 kernel: [623467.883837] do_vfs_ioctl+0xa9/0x640
Jun 18 03:10:26 compute1 kernel: [623467.883838] ksys_ioctl+0x67/0x90
Jun 18 03:10:26 compute1 kernel: [623467.883840] __x64_sys_ioctl+0x1a/0x20
Jun 18 03:10:26 compute1 kernel: [623467.883843] do_syscall_64+0x57/0x190
Jun 18 03:10:26 compute1 kernel: [623467.883846] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 18 03:10:26 compute1 kernel: [623467.883848] RIP: 0033:0x7f2e40f97427
Jun 18 03:10:26 compute1 kernel: [623467.883852] Code: Bad RIP value.
Jun 18 03:10:26 compute1 kernel: [623467.883853] RSP: 002b:00007f2d75ffa098 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jun 18 03:10:26 compute1 kernel: [623467.883855] RAX: ffffffffffffffda RBX: 00007f2e33af4850 RCX: 00007f2e40f97427
Jun 18 03:10:26 compute1 kernel: [623467.883856] RDX: 000000007fffffff RSI: 0000000000005326 RDI: 0000000000000012
Jun 18 03:10:26 compute1 kernel: [623467.883856] RBP: 0000000000000001 R08: 0000559be29be890 R09: 0000000000000000
Jun 18 03:10:26 compute1 kernel: [623467.883857] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2d74a42268
Jun 18 03:10:26 compute1 kernel: [623467.883858] R13: 0000000000000000 R14: 0000559be2ef0d20 R15: 0000559be27fc740
Steps I have done so far:
I ran a memtest for one entire pass, and no issues were found
I ran SMART tests on all the local drives (roughly the commands shown below); no issues were found
I removed the SSDs from the host, checked their status in Windows, ran additional tests (no issues found) and applied the available firmware updates.
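For reference, the SMART checks were roughly along these lines (smartmontools ships with Proxmox; /dev/sdX is a placeholder, so substitute each local drive):
Code:
smartctl -a /dev/sdX           # overall health, attributes and error log
smartctl -t long /dev/sdX      # start an extended offline self-test
smartctl -l selftest /dev/sdX  # read the self-test results once it finishes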
Any help with a next step, or details on what this message might mean, would be appreciated!
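Next time it happens I plan to capture the hung-task messages and the per-device I/O stats before rebooting, something like this (iostat comes from the sysstat package):
Code:
dmesg | grep -i "blocked for more"   # list the hung-task warnings
iostat -xz 5 3                       # per-device utilisation, queue depth and await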
Edit: If it helps, pveversion output is below:
Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
-
#2
Hi,
Please upgrade to the current kernel 5.4.44-1-pve and install the "intel-microcode" package.
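Roughly the steps, assuming the repositories are already set up (the intel-microcode package comes from Debian's non-free component, so that has to be enabled in your APT sources):
Code:
apt update
apt dist-upgrade              # should pull in the current 5.4 kernel
apt install intel-microcode
reboot                        # the new kernel and microcode only take effect after a reboot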
-
#3
Hi,
Please upgrade to the current kernel 5.4.44-1-pve and install the "intel-microcode" package.
I appreciate the response. I am upgrading now and will report back in about a week on how it holds up!
-
#4
Hi,
Please upgrade to the current kernel 5.4.44-1-pve and install the "intel-microcode" package.
Unfortunately, after another week of uptime I have seen the same error. I did make one other change: I moved 2 of the 3 affected VMs to different storage from what they were originally on, to troubleshoot whether it was something with the local SSD storage. That did not make a difference, and today the same 3 VMs on this host went down.
I installed the intel-microcode package through apt; was this the method you were referring to?
Code:
apt list | grep intel
intel-microcode/oldstable,now 3.20200609.2~deb9u1 amd64 [installed]
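I also checked that the newer microcode is actually applied at boot (the exact wording of the message varies between kernel versions):
Code:
dmesg | grep -i microcode
journalctl -k -b | grep -i microcode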
Code:
Jun 30 05:10:25 compute1 kernel: [642437.024414] kvm D 0 1929 1 0x00000000
Jun 30 05:10:25 compute1 kernel: [642437.024417] Call Trace:
Jun 30 05:10:25 compute1 kernel: [642437.024427] __schedule+0x2e6/0x6f0
Jun 30 05:10:25 compute1 kernel: [642437.024429] schedule+0x33/0xa0
Jun 30 05:10:25 compute1 kernel: [642437.024431] schedule_preempt_disabled+0xe/0x10
Jun 30 05:10:25 compute1 kernel: [642437.024433] __mutex_lock.isra.10+0x2c9/0x4c0
Jun 30 05:10:25 compute1 kernel: [642437.024464] ? kvm_arch_vcpu_put+0xe2/0x170 [kvm]
Jun 30 05:10:25 compute1 kernel: [642437.024482] ? kvm_skip_emulated_instruction+0x3b/0x60 [kvm]
Jun 30 05:10:25 compute1 kernel: [642437.024484] __mutex_lock_slowpath+0x13/0x20
Jun 30 05:10:25 compute1 kernel: [642437.024485] mutex_lock+0x2c/0x30
Jun 30 05:10:25 compute1 kernel: [642437.024488] sr_block_ioctl+0x43/0xd0
Jun 30 05:10:25 compute1 kernel: [642437.024493] blkdev_ioctl+0x4c1/0x9e0
Jun 30 05:10:25 compute1 kernel: [642437.024497] ? __wake_up_locked_key+0x1b/0x20
Jun 30 05:10:25 compute1 kernel: [642437.024501] block_ioctl+0x3d/0x50
Jun 30 05:10:25 compute1 kernel: [642437.024503] do_vfs_ioctl+0xa9/0x640
Jun 30 05:10:25 compute1 kernel: [642437.024505] ksys_ioctl+0x67/0x90
Jun 30 05:10:25 compute1 kernel: [642437.024506] __x64_sys_ioctl+0x1a/0x20
Jun 30 05:10:25 compute1 kernel: [642437.024509] do_syscall_64+0x57/0x190
Jun 30 05:10:25 compute1 kernel: [642437.024512] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 30 05:10:25 compute1 kernel: [642437.024514] RIP: 0033:0x7f8571b90427
Jun 30 05:10:25 compute1 kernel: [642437.024519] Code: Bad RIP value.
Jun 30 05:10:25 compute1 kernel: [642437.024520] RSP: 002b:00007f8563d790d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jun 30 05:10:25 compute1 kernel: [642437.024522] RAX: ffffffffffffffda RBX: 00007f85646ea9a0 RCX: 00007f8571b90427
Jun 30 05:10:25 compute1 kernel: [642437.024523] RDX: 000000007fffffff RSI: 0000000000005326 RDI: 0000000000000014
Jun 30 05:10:25 compute1 kernel: [642437.024524] RBP: 0000000000000000 R08: 0000560d289ff710 R09: 0000000000000000
Jun 30 05:10:25 compute1 kernel: [642437.024525] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f83ddb09800
Jun 30 05:10:25 compute1 kernel: [642437.024526] R13: 0000000000000006 R14: 0000560d28f31d20 R15: 0000560d2883d740
Jun 30 05:10:25 compute1 kernel: [642437.024607] kvm D 0 2043 1 0x00000000
Code:
pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-1-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-3
pve-kernel-helper: 6.2-3
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-7
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
-
#5
I installed the intel-microcode package through apt; was this the method you were referring to?
This is correct.
When you say you moved the VMs to another storage, what storage is this?
Is it on the same HBA/RAID/onboard controller?
-
#6
The storage it was on was local to this Proxmox system, behind a RAID controller. I moved it to an iSCSI share running on a FreeNAS install.
As a troubleshooting step I thought I would first reinstall Proxmox 6.2 and then try 6.1 again. This morning it did the same panic once again on 6.2, so I was going to try 6.1 next. It is also strange that this time it happened in under 24 hours, when usually it takes a week. I thought this might be a good way to rule out any hardware issues on my side.
-
#7
You don't have to install PVE 6.1.
It should be enough to install an older kernel and boot into it.
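If you don't want to pick the old kernel from the GRUB menu on every boot, you can also make it the default. The exact menu entry title varies, so check grub.cfg first; roughly:
Code:
grep menuentry /boot/grub/grub.cfg    # find the exact title of the 5.3 entry
# then set it in /etc/default/grub, for example:
# GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.3.18-2-pve"
update-grub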
-
#8
So on PVE 6.1 with the older kernel, it has now been up for over a week (9 days).
pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2
I am not very familiar with running different kernels, but I assume I would just install these through apt-get? Is there a list of kernel versions so I know which to install?
I could try upgrading to PVE 6.2 again and using an older kernel as well.
-
#9
kweevuss, are you using storage replication in your configuration?
-
#10
kweevuss, are you using storage replication in your configuration?
No, nothing regarding storage replication is configured on this system.
-
#11
I am not very familiar with running different kernels, but I assume I would just install these through apt-get?
Yes, you can install it like this.
Code:
apt install pve-kernel-5.3.18-2-pve
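To see which kernel packages are available in the repository, something like this gives you the list:
Code:
apt-cache search pve-kernel-5.3
apt list --all-versions 'pve-kernel-5.*' 2>/dev/null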
-
#12
Hi,
any news on this?
I have a similar effect: the VMs on LVM are still working, but all LVM commands (lvs, vgs, pvs) hang, and because of this the node and all VMs are marked with a ? in the GUI.
Code:
Sep 28 21:14:27 pve02 kernel: [ 2783.664724] INFO: task pvs:411112 blocked for more than 362 seconds.
Sep 28 21:14:27 pve02 kernel: [ 2783.692327] Tainted: P O 5.4.60-1-pve #1
Sep 28 21:14:27 pve02 kernel: [ 2783.716796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 21:14:27 pve02 kernel: [ 2783.725214] pvs D 0 411112 401506 0x00000004
Sep 28 21:14:27 pve02 kernel: [ 2783.731267] Call Trace:
Sep 28 21:14:27 pve02 kernel: [ 2783.734179] __schedule+0x2e6/0x6f0
Sep 28 21:14:27 pve02 kernel: [ 2783.738137] schedule+0x33/0xa0
Sep 28 21:14:27 pve02 kernel: [ 2783.741746] schedule_preempt_disabled+0xe/0x10
Sep 28 21:14:27 pve02 kernel: [ 2783.746757] __mutex_lock.isra.10+0x2c9/0x4c0
Sep 28 21:14:27 pve02 kernel: [ 2783.751588] __mutex_lock_slowpath+0x13/0x20
Sep 28 21:14:27 pve02 kernel: [ 2783.756325] mutex_lock+0x2c/0x30
Sep 28 21:14:27 pve02 kernel: [ 2783.760072] disk_block_events+0x31/0x80
Sep 28 21:14:27 pve02 kernel: [ 2783.764430] __blkdev_get+0x72/0x560
Sep 28 21:14:27 pve02 kernel: [ 2783.768433] blkdev_get+0xef/0x150
Sep 28 21:14:27 pve02 kernel: [ 2783.772264] ? blkdev_get_by_dev+0x50/0x50
Sep 28 21:14:27 pve02 kernel: [ 2783.776787] blkdev_open+0x87/0xa0
Sep 28 21:14:27 pve02 kernel: [ 2783.780614] do_dentry_open+0x143/0x3a0
Sep 28 21:14:27 pve02 kernel: [ 2783.784942] vfs_open+0x2d/0x30
Sep 28 21:14:27 pve02 kernel: [ 2783.788523] path_openat+0x2e9/0x16f0
Sep 28 21:14:27 pve02 kernel: [ 2783.792615] ? filename_lookup.part.60+0xe0/0x170
Sep 28 21:14:27 pve02 kernel: [ 2783.797748] do_filp_open+0x93/0x100
Sep 28 21:14:27 pve02 kernel: [ 2783.801755] ? __alloc_fd+0x46/0x150
Sep 28 21:14:27 pve02 kernel: [ 2783.805760] do_sys_open+0x177/0x280
Sep 28 21:14:27 pve02 kernel: [ 2783.809845] __x64_sys_openat+0x20/0x30
Sep 28 21:14:27 pve02 kernel: [ 2783.814140] do_syscall_64+0x57/0x190
Sep 28 21:14:27 pve02 kernel: [ 2783.818731] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep 28 21:14:27 pve02 kernel: [ 2783.824695] RIP: 0033:0x7f44154d31ae
Sep 28 21:14:27 pve02 kernel: [ 2783.829163] Code: Bad RIP value.
Sep 28 21:14:27 pve02 kernel: [ 2783.833275] RSP: 002b:00007fffa0944800 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Sep 28 21:14:27 pve02 kernel: [ 2783.841829] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f44154d31ae
Sep 28 21:14:27 pve02 kernel: [ 2783.849860] RDX: 0000000000044000 RSI: 00005590d9097d70 RDI: 00000000ffffff9c
Sep 28 21:14:27 pve02 kernel: [ 2783.857990] RBP: 00007fffa0944960 R08: 00005590d7c5ca17 R09: 00007fffa0944a30
Sep 28 21:14:27 pve02 kernel: [ 2783.866100] R10: 0000000000000000 R11: 0000000000000246 R12: 00005590d7c53c68
Sep 28 21:14:27 pve02 kernel: [ 2783.874252] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
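When it is in this state, this is roughly what I use to see where the hung commands are stuck (the sysrq dump needs kernel.sysrq enabled and lands in dmesg):
Code:
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'   # processes in uninterruptible sleep
cat /proc/$(pgrep -o pvs)/stack                  # kernel stack of the oldest hung pvs
echo w > /proc/sysrq-trigger                     # dump all blocked tasks to the kernel log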
After a reboot it works for a short time (approx. 1 h). At first it looked like I had trouble with the RAID volumes, because I had tried to expand a RAID volume on the hardware RAID controller, but the RAID expansion is running fine now (which reduces I/O) and the issue has started again.
Code:
pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.60-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-6
pve-kernel-helper: 6.2-6
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.13-3-pve: 5.3.13-3
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.11-pve1
ceph-fuse: 14.2.11-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-1
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-12
pve-cluster: 6.1-8
pve-container: 3.1-13
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-1
pve-qemu-kvm: 5.0.0-13
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-14
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1
The same thing happened one reboot earlier, when kernel pve-kernel-5.4.44-2-pve: 5.4.44-2 was active (the issue started after the online RAID extension, which wasn't successful; the whole server stopped doing I/O and needed a reset).
The server is a Dell R7415 with an EPYC 7451.
Udo
-
#13
For me, not much has changed. I ended up staying on the older Proxmox version and kernel, as I posted above:
pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2
Personally, I am planning on upgrading this server within the next six months, so I am hoping new hardware will change this outcome.
-
#14
For me, not much has changed. I ended up staying on the older Proxmox version and kernel, as I posted above:
pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2
Personally, I am planning on upgrading this server within the next six months, so I am hoping new hardware will change this outcome.
Hi,
I'm not sure if new hardware is the key; this happens for me on current hardware (less than 1 year old).
But I have been running the same kernel on other (older) hardware without trouble…
Udo
-
#15
Udo, do you use an iPERC on this server? If yes, can you tell me which one it is?
Maybe it is related to the disk I/O devices.
We have no such problems here with EPYC CPUs.
Also, your storage config would be interesting to see.
-
#16
Udo, do you use an iPERC on this server? If yes, can you tell me which one it is?
Maybe it is related to the disk I/O devices.
We have no such problems here with EPYC CPUs.
Also, your storage config would be interesting to see.
Hi Wolfgang,
yes, the LVM storage is behind a PERC:
Code:
PERC H740P Mini (Integriert) Integrated RAID Controller 1 Firmware: 51.13.0-3485 Cache: 8192 MB
The LVM is on a RAID 6 with 6 HDDs (currently in the expansion process with 2 further disks).
On this node, there are only 2 LVM storages defined:
Code:
lvmthin: local-lvm
thinpool data
vgname pve
content images,rootdir
lvmthin: hdd-lvm
thinpool data
vgname hdd
content rootdir,images
We have other PVE nodes with EPYC where LVM is working fine, but with a different kernel (and LVM on RAID 1 SSDs):
Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.41-1-pve)
Two days ago a new BIOS update was published; it will be installed at the weekend, but I'm unsure which kernel I should use.
Udo
-
#17
I will try to reproduce it here with an LSI 3806 RAID controller and will report if I find something.
-
#18
I will try to reproduce it here with an LSI 3806 RAID controller and will report if I find something.
Hi,
Sounds good!
Don't forget that we have some I/O load (sdb is the LVM HDD RAID):
Code:
ATOP - pve02 2020/10/01 10:43:39 -------------- 10s elapsed
PRC | sys 7.17s | user 7.91s | #proc 960 | #trun 4 | #tslpi 1048 | #tslpu 3 | #zombie 0 | clones 2116 | | #exit 2101 |
CPU | sys 70% | user 79% | irq 1% | idle 4475% | wait 178% | guest 53% | ipc 0.89 | cycl 107MHz | curf 2.89GHz | curscal ?% |
CPL | avg1 5.85 | avg5 5.23 | avg15 5.10 | | | csw 642459 | intr 374679 | | | numcpu 48 |
MEM | tot 251.6G | free 108.8G | cache 882.2M | buff 157.2M | slab 3.9G | shmem 189.6M | shrss 0.0M | vmbal 0.0M | hptot 0.0M | hpuse 0.0M |
SWP | tot 32.0G | free 32.0G | | | | | | | vmcom 177.1G | vmlim 157.8G |
PSI | cs 0/0/0 | ms 0/0/0 | mf 0/0/0 | | is 38/40/42 | if 38/40/42 | | | | |
LVM | d-data_tdata | busy 26% | read 130 | write 1105 | KiB/r 31 | KiB/w 215 | MBr/s 0.4 | MBw/s 23.2 | avq 101.70 | avio 2.09 ms |
LVM | d-data-tpool | busy 26% | read 130 | write 1105 | KiB/r 31 | KiB/w 215 | MBr/s 0.4 | MBw/s 23.2 | avq 101.70 | avio 2.09 ms |
LVM | dm-37 | busy 12% | read 5 | write 3327 | KiB/r 7 | KiB/w 10 | MBr/s 0.0 | MBw/s 3.3 | avq 0.09 | avio 0.37 ms |
LVM | 205--disk--0 | busy 9% | read 130 | write 160 | KiB/r 31 | KiB/w 799 | MBr/s 0.4 | MBw/s 12.5 | avq 56.41 | avio 3.13 ms |
DSK | sdb | busy 28% | read 135 | write 1040 | KiB/r 30 | KiB/w 203 | MBr/s 0.4 | MBw/s 20.7 | avq 85.64 | avio 2.35 ms |
DSK | nvme2n1 | busy 14% | read 43 | write 4563 | KiB/r 9 | KiB/w 26 | MBr/s 0.0 | MBw/s 11.8 | avq 0.00 | avio 0.31 ms |
DSK | nvme1n1 | busy 14% | read 23 | write 4593 | KiB/r 11 | KiB/w 26 | MBr/s 0.0 | MBw/s 11.8 | avq 0.00 | avio 0.31 ms |
NET | transport | tcpi 6790 | tcpo 7718 | udpi 2960 | udpo 2762 | tcpao 2 | tcppo 25 | tcprs 0 | tcpie 0 | udpie 0 |
NET | network | ipi 9830 | ipo 8595 | ipfrw 0 | deliv 9790 | | | | icmpi 40 | icmpo 0 |
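The "is 38/40/42" and "if 38/40/42" in the PSI line above are the pressure-stall figures for I/O. Since atop already shows them, the same numbers can also be read directly from the kernel to watch the stall percentage over time:
Code:
cat /proc/pressure/io    # avg10/avg60/avg300 = % of time tasks were stalled on I/O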
Udo
-
#19
Do you have any special settings on the RAID? Block size, cache mode, …?
-
#20
Do you have any special settings on the RAID? Block size, cache mode, …?
Hi Wolfgang,
not really; the special thing was a 100 GB RAID volume for the Proxmox system and the remaining space for a big LVM storage. Because of the extension I had to migrate and delete the system RAID (but the issue started right after a reboot, while the system RAID was still on the RAID group).
The RAID settings:
Code:
megacli -LDInfo -L1 -a0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1)
Name :hdd-raid6
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 7.177 TB
Sector Size : 512
Parity Size : 3.588 TB
State : Optimal
Strip Size : 256 KB
Number Of Drives : 6
Span Depth : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Cached, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Ongoing Progresses:
Reconstruction : Completed 70%, Taken 52 min.
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: No
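Note that the current cache policy has fallen back to WriteThrough with "No Write Cache if Bad BBU", so I am also checking the BBU; roughly (MegaCli option spelling differs slightly between versions):
Code:
megacli -AdpBbuCmd -GetBbuStatus -a0    # battery state, charge, replacement flag
megacli -LDGetProp -Cache -LAll -a0     # current cache policy per logical drive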
Udo
Posted: Fri Mar 08, 2019 4:07 pm    Post subject: [solved] Linux 5.0 kernel panics with "bad rip value"
The Gentoo kernel panics on boot after upgrading the kernel from 4.19.2 to 5.0. It gives me the error "Bad RIP value", and I'm unsure what causes it. It was compiled using genkernel (using only the "all" command-line argument). Kernel config: https://0x0.st/zHiO.txt (https://pastebin.com/RWNZyKn0)
Last edited by levente on Sat Mar 23, 2019 8:46 am; edited 1 time in total
Posted: Fri Mar 15, 2019 7:32 am    Post subject:
Since I'm not sure whether Gentoo saves backtraces at all, I took the easy route and just took a picture of my computer. Picture: http://0x0.st/zXJn.jpg I hope it didn't cut anything important off.
Posted: Fri Mar 15, 2019 5:17 pm    Post subject: Re: Linux 5.0 kernel panics with "bad rip value"
5.0 is stable. https://www.kernel.org/
Posted: Sat Mar 16, 2019 1:19 am    Post subject:
That is a Linux kernel issue, not a Gentoo issue. The kernel does not persist panic text to your local disk because there is nowhere to save it. The kernel can send the text over the network or a serial port, so that some other system can save it.
OP did not say it was urgent. The kernel he picked is a released kernel that should work if managed properly. He wants help managing it. It’s a reasonable request to put in this forum. |
n00b
Joined: 24 Mar 2018
Posts: 41
Posted: Sat Mar 23, 2019 8:46 am    Post subject:
Thanks for all the replies.
Turns out it was a rookie mistake: I left out the --luks and --lvm options from genkernel.
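For the record, the working invocation is presumably along these lines:
genkernel --luks --lvm all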
System information
Type | Version/Name
---|---
Distribution Name | CentOS
Distribution Version | 7
Kernel Version | 5.4.225-1.el7.elrepo
Architecture | x86_64
OpenZFS Version | zfs-2.1.6-1
Describe the problem you’re observing
Context: ZFS is running on a customer's VMware VM. They want to migrate off the current hypervisor and onto a newer host, but after migration I see storage errors in "zpool status" and dmesg.
I got the RIP error while trying to avoid the storage errors (by using a virtual SCSI controller rather than a paravirtual one, among other experiments). It is probably not specific to the SCSI driver, though; I likely only noticed this (RIP) error because I was looking closely at syslog during the experiment.
The RIP error is not itself causing us any issue, and it was only observed alongside errors from the SCSI controller. Feel free to close the ticket if there is nothing interesting to be done about it.
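For context, this is roughly how I was checking the pool and the kernel log during the experiments (nothing exotic; "tank" is a placeholder for the actual pool name):
zpool status -v tank
dmesg -T | grep -iE 'blocked for more|bad rip'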
Describe how to reproduce the problem
This occurred when running on the customer's newer VMware host…
Include any warning/errors/backtraces from the system logs
[ 247.125781] INFO: task txg_sync:2314 blocked for more than 122 seconds.
[ 247.125849] Tainted: P OE 5.4.225-1.el7.elrepo.x86_64 #1
[ 247.125938] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 247.126010] txg_sync D 0 2314 2 0x80004000
[ 247.126014] Call Trace:
[ 247.126025] __schedule+0x2d2/0x730
[ 247.126033] ? __internal_add_timer+0x2d/0x40
[ 247.126036] schedule+0x42/0xb0
[ 247.126040] schedule_timeout+0x8a/0x160
[ 247.126205] ? zio_issue_async+0x53/0x90 [zfs]
[ 247.126210] ? __next_timer_interrupt+0xe0/0xe0
[ 247.126215] io_schedule_timeout+0x1e/0x50
[ 247.126225] __cv_timedwait_common+0x131/0x170 [spl]
[ 247.126231] ? finish_wait+0x80/0x80
[ 247.126240] __cv_timedwait_io+0x19/0x20 [spl]
[ 247.126387] zio_wait+0x136/0x2a0 [zfs]
[ 247.126503] dsl_pool_sync+0xf2/0x500 [zfs]
[ 247.126633] spa_sync_iterate_to_convergence+0xf8/0x2e0 [zfs]
[ 247.126764] spa_sync+0x476/0x930 [zfs]
[ 247.126924] txg_sync_thread+0x26f/0x3f0 [zfs]
[ 247.127057] ? txg_fini+0x270/0x270 [zfs]
[ 247.127074] thread_generic_wrapper+0x79/0x90 [spl]
[ 247.127080] kthread+0x106/0x140
[ 247.127091] ? __thread_exit+0x20/0x20 [spl]
[ 247.127095] ? __kthread_cancel_work+0x40/0x40
[ 247.127100] ret_from_fork+0x1f/0x40
[ 247.127135] INFO: task postmaster:5349 blocked for more than 122 seconds.
[ 247.127200] Tainted: P OE 5.4.225-1.el7.elrepo.x86_64 #1
[ 247.127262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 247.127344] postmaster D 0 5349 4284 0x00000080
[ 247.127348] Call Trace:
[ 247.127352] __schedule+0x2d2/0x730
[ 247.127356] schedule+0x42/0xb0
[ 247.127365] cv_wait_common+0xfd/0x130 [spl]
[ 247.127370] ? finish_wait+0x80/0x80
[ 247.127379] __cv_wait+0x15/0x20 [spl]
[ 247.127523] zfs_rangelock_enter_impl+0x134/0x260 [zfs]
[ 247.127661] ? zfs_uio_prefaultpages+0x102/0x120 [zfs]
[ 247.127810] zfs_rangelock_enter+0x11/0x20 [zfs]
[ 247.127959] zfs_write+0xa19/0xd80 [zfs]
[ 247.128096] zpl_iter_write+0xf2/0x170 [zfs]
[ 247.128103] new_sync_write+0x125/0x1c0
[ 247.128109] __vfs_write+0x29/0x40
[ 247.128113] vfs_write+0xb9/0x1a0
[ 247.128116] ksys_write+0x67/0xe0
[ 247.128120] __x64_sys_write+0x1a/0x20
[ 247.128125] do_syscall_64+0x60/0x1b0
[ 247.128130] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[ 247.128134] RIP: 0033:0x7f05ff4c39b0
[ 247.128141] Code: Bad RIP value.
Description of problem:
I have installed Debian 10 Buster. An EA01A NEC M2M LTE dongle is attached to the device and configured for internet access.
I have scheduled a reboot every day at 1 AM, and I have noticed that sometimes after the reboot the system does not boot up properly and shows "Code: Bad RIP value".
This issue does not always occur, and I have not been able to reproduce it on demand yet.
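The scheduled reboot is just a root cron entry along these lines (the exact format depends on whether it lives in /etc/crontab or root's own crontab):
0 1 * * * root /sbin/shutdown -r now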
Machine details:
root@DEB:/home# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 142
Model name: Intel(R) Core(TM) i3-8109U CPU @ 3.00GHz
Stepping: 10
CPU MHz: 2000.001
CPU max MHz: 3600.0000
CPU min MHz: 400.0000
BogoMIPS: 6000.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0-3
root@DEB:/home# uname -a
Linux DEB 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64 GNU/Linux
Following are the syslogs at the time of reboot:
Logs:
2020-11-13T01:07:57.591035+09:00 <daemon.info> smartd[520]: Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.SPCC_M_2_SSD-P2000559000000025478.ata.state
2020-11-13T01:07:57.591139+09:00 <daemon.info> smartd[520]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
2020-11-13T01:07:57.591546+09:00 <daemon.info> systemd[1]: Started LSB: Color ANSI System Logo.
2020-11-13T01:07:57.593915+09:00 <daemon.info> smartd[520]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.SPCC_M_2_SSD-P2000559000000025478.ata.state
2020-11-13T01:07:57.598691+09:00 <daemon.info> accounts-daemon[502]: started daemon version 0.6.45
2020-11-13T01:07:57.598824+09:00 <daemon.info> systemd[1]: Started Accounts Service.
2020-11-13T01:07:57.601006+09:00 <daemon.info> dphys-swapfile[518]: computing size, want /var/swap=15708MByte, restricting to 50% of remaining disk size: 6701MBytes, restricting to config limit: 4096MBytes, checking existing: keeping it
2020-11-13T01:07:57.646702+09:00 <daemon.info> loadcpufreq[504]: Loading cpufreq kernel modules...done (acpi-cpufreq).
2020-11-13T01:07:57.647011+09:00 <daemon.info> systemd[1]: Started LSB: Load kernel modules needed to enable cpufreq scaling.
2020-11-13T01:07:57.647738+09:00 <daemon.info> systemd[1]: Starting LSB: set CPUFreq kernel parameters...
2020-11-13T01:07:57.654960+09:00 <daemon.info> cpufrequtils[645]: CPUFreq Utilities: Setting ondemand CPUFreq governor...disabled, governor not available...done.
2020-11-13T01:07:57.655231+09:00 <daemon.info> systemd[1]: Started LSB: set CPUFreq kernel parameters.
2020-11-13T01:07:57.673396+09:00 <daemon.info> systemd[1]: Started dphys-swapfile - set up, mount/unmount, and delete a swap file.
2020-11-13T01:07:57.674051+09:00 <kern.info> kernel: [ 189.360869] Adding 4194300k swap on /var/swap. Priority:-2 extents:8 across:6258688k SSFS
2020-11-13T01:07:57.872792+09:00 <daemon.info> systemd[1]: Started Login Service.
2020-11-13T01:07:58.538721+09:00 <daemon.info> systemd[1]: Started Save/Restore Sound Card State.
2020-11-13T01:07:58.539084+09:00 <daemon.info> systemd[1]: Reached target Sound Card.
2020-11-13T01:08:14.530986+09:00 <daemon.notice> nscd: 517 checking for monitored file `/etc/resolv.conf': No such file or directory
2020-11-13T01:08:27.606987+09:00 <daemon.warning> dbus-daemon[513]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30029ms)
2020-11-13T01:08:51.154249+09:00 <kern.err> kernel: [ 242.838955] INFO: task systemd-udevd:291 blocked for more than 120 seconds.
2020-11-13T01:08:51.154283+09:00 <kern.err> kernel: [ 242.838967] Tainted: G U 4.19.0-12-amd64 #1 Debian 4.19.152-1
2020-11-13T01:08:51.154287+09:00 <kern.err> kernel: [ 242.838970] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-11-13T01:08:51.154291+09:00 <kern.info> kernel: [ 242.838975] systemd-udevd D 0 291 277 0x00000324
2020-11-13T01:08:51.154295+09:00 <kern.warning> kernel: [ 242.838979] Call Trace:
2020-11-13T01:08:51.154321+09:00 <kern.warning> kernel: [ 242.838989] __schedule+0x29f/0x840
2020-11-13T01:08:51.154326+09:00 <kern.warning> kernel: [ 242.838995] schedule+0x28/0x80
2020-11-13T01:08:51.154330+09:00 <kern.warning> kernel: [ 242.838999] schedule_preempt_disabled+0xa/0x10
2020-11-13T01:08:51.154333+09:00 <kern.warning> kernel: [ 242.839001] __mutex_lock.isra.8+0x2b5/0x4a0
2020-11-13T01:08:51.154335+09:00 <kern.warning> kernel: [ 242.839006] ? addrconf_notify+0x31c/0xae0
2020-11-13T01:08:51.154338+09:00 <kern.warning> kernel: [ 242.839016] nf_tables_netdev_event+0x9b/0x1a0 [nf_tables]
2020-11-13T01:08:51.154341+09:00 <kern.warning> kernel: [ 242.839023] notifier_call_chain+0x47/0x70
2020-11-13T01:08:51.154344+09:00 <kern.warning> kernel: [ 242.839028] dev_change_name+0x1fa/0x330
2020-11-13T01:08:51.154347+09:00 <kern.warning> kernel: [ 242.839033] do_setlink+0x729/0xef0
2020-11-13T01:08:51.154350+09:00 <kern.warning> kernel: [ 242.839040] ? blk_mq_dispatch_rq_list+0x392/0x590
2020-11-13T01:08:51.154353+09:00 <kern.warning> kernel: [ 242.839044] ? elv_rb_del+0x1f/0x30
2020-11-13T01:08:51.154356+09:00 <kern.warning> kernel: [ 242.839047] ? deadline_remove_request+0x55/0xc0
2020-11-13T01:08:51.154359+09:00 <kern.warning> kernel: [ 242.839050] ? blk_mq_do_dispatch_sched+0x91/0x120
2020-11-13T01:08:51.154362+09:00 <kern.warning> kernel: [ 242.839054] ? __d_alloc+0x24/0x240
2020-11-13T01:08:51.154364+09:00 <kern.warning> kernel: [ 242.839058] rtnl_setlink+0xd9/0x130
2020-11-13T01:08:51.154368+09:00 <kern.warning> kernel: [ 242.839065] rtnetlink_rcv_msg+0x2b1/0x360
2020-11-13T01:08:51.154370+09:00 <kern.warning> kernel: [ 242.839069] ? _cond_resched+0x15/0x30
2020-11-13T01:08:51.154373+09:00 <kern.warning> kernel: [ 242.839072] ? rtnl_calcit.isra.33+0x100/0x100
2020-11-13T01:08:51.154374+09:00 <kern.warning> kernel: [ 242.839076] netlink_rcv_skb+0x4c/0x120
2020-11-13T01:08:51.154377+09:00 <kern.warning> kernel: [ 242.839081] netlink_unicast+0x181/0x210
2020-11-13T01:08:51.154380+09:00 <kern.warning> kernel: [ 242.839084] netlink_sendmsg+0x204/0x3d0
2020-11-13T01:08:51.154382+09:00 <kern.warning> kernel: [ 242.839088] sock_sendmsg+0x36/0x40
2020-11-13T01:08:51.154384+09:00 <kern.warning> kernel: [ 242.839090] __sys_sendto+0xee/0x160
2020-11-13T01:08:51.154387+09:00 <kern.warning> kernel: [ 242.839096] ? syscall_trace_enter+0x192/0x2b0
2020-11-13T01:08:51.154390+09:00 <kern.warning> kernel: [ 242.839099] __x64_sys_sendto+0x24/0x30
2020-11-13T01:08:51.154392+09:00 <kern.warning> kernel: [ 242.839102] do_syscall_64+0x53/0x110
2020-11-13T01:08:51.154395+09:00 <kern.warning> kernel: [ 242.839105] entry_SYSCALL_64_after_hwframe+0x44/0xa9
2020-11-13T01:08:51.154398+09:00 <kern.warning> kernel: [ 242.839108] RIP: 0033:0x7fdf27a749b7
2020-11-13T01:08:51.154401+09:00 <kern.warning> kernel: [ 242.839115] Code: Bad RIP value.
2020-11-13T01:08:51.154403+09:00 <kern.warning> kernel: [ 242.839117] RSP: 002b:00007ffc41f57038 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
2020-11-13T01:08:51.154406+09:00 <kern.warning> kernel: [ 242.839119] RAX: ffffffffffffffda RBX: 0000557b32c847a0 RCX: 00007fdf27a749b7
2020-11-13T01:08:51.154408+09:00 <kern.warning> kernel: [ 242.839120] RDX: 0000000000000034 RSI: 0000557b32c84be0 RDI: 000000000000000f
2020-11-13T01:08:51.154411+09:00 <kern.warning> kernel: [ 242.839121] RBP: 0000000000000003 R08: 00007ffc41f570f0 R09: 0000000000000010
2020-11-13T01:08:51.154413+09:00 <kern.warning> kernel: [ 242.839123] R10: 0000000000000000 R11: 0000000000000246 R12: 0000557b32c77898
2020-11-13T01:08:51.154416+09:00 <kern.warning> kernel: [ 242.839124] R13: 0000557b32c84ca0 R14: 0000557b32c69150 R15: 0000557b32c5fe70
2020-11-13T01:08:51.154419+09:00 <kern.err> kernel: [ 242.839148] INFO: task systemd-udevd:320 blocked for more than 120 seconds.
2020-11-13T01:08:51.154423+09:00 <kern.err> kernel: [ 242.839154] Tainted: G U 4.19.0-12-amd64 #1 Debian 4.19.152-1
2020-11-13T01:08:51.154426+09:00 <kern.err> kernel: [ 242.839157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-11-13T01:08:51.154428+09:00 <kern.info> kernel: [ 242.839161] systemd-udevd D 0 320 277 0x80000324
2020-11-13T01:08:51.154432+09:00 <kern.warning> kernel: [ 242.839163] Call Trace:
2020-11-13T01:08:51.154436+09:00 <kern.warning> kernel: [ 242.839169] __schedule+0x29f/0x840
2020-11-13T01:08:51.154439+09:00 <kern.warning> kernel: [ 242.839173] schedule+0x28/0x80
2020-11-13T01:08:51.154443+09:00 <kern.warning> kernel: [ 242.839177] schedule_preempt_disabled+0xa/0x10
2020-11-13T01:08:51.154448+09:00 <kern.warning> kernel: [ 242.839179] __mutex_lock.isra.8+0x2b5/0x4a0
2020-11-13T01:08:51.154475+09:00 <kern.warning> kernel: [ 242.839184] register_netdevice_notifier+0x37/0x230
2020-11-13T01:08:51.154481+09:00 <kern.warning> kernel: [ 242.839189] ? kobject_put+0x23/0x1b0
2020-11-13T01:08:51.154486+09:00 <kern.warning> kernel: [ 242.839192] ? 0xffffffffc0c39000
2020-11-13T01:08:51.154490+09:00 <kern.warning> kernel: [ 242.839231] cfg80211_init+0x37/0xcb [cfg80211]
2020-11-13T01:08:51.154494+09:00 <kern.warning> kernel: [ 242.839237] do_one_initcall+0x46/0x1c3
2020-11-13T01:08:51.154497+09:00 <kern.warning> kernel: [ 242.839241] ? free_unref_page_commit+0x91/0x100
2020-11-13T01:08:51.154499+09:00 <kern.warning> kernel: [ 242.839245] ? _cond_resched+0x15/0x30
2020-11-13T01:08:51.154502+09:00 <kern.warning> kernel: [ 242.839249] ? kmem_cache_alloc_trace+0x15e/0x1e0
2020-11-13T01:08:51.154505+09:00 <kern.warning> kernel: [ 242.839254] do_init_module+0x5a/0x210
2020-11-13T01:08:51.154510+09:00 <kern.warning> kernel: [ 242.839258] load_module+0x2167/0x23d0
2020-11-13T01:08:51.154513+09:00 <kern.warning> kernel: [ 242.839264] ? __do_sys_finit_module+0xad/0x110
2020-11-13T01:08:51.154517+09:00 <kern.warning> kernel: [ 242.839266] __do_sys_finit_module+0xad/0x110
2020-11-13T01:08:51.154520+09:00 <kern.warning> kernel: [ 242.839271] do_syscall_64+0x53/0x110
2020-11-13T01:08:51.154523+09:00 <kern.warning> kernel: [ 242.839274] entry_SYSCALL_64_after_hwframe+0x44/0xa9
2020-11-13T01:08:51.154526+09:00 <kern.warning> kernel: [ 242.839276] RIP: 0033:0x7fdf27a6df59
2020-11-13T01:08:51.154545+09:00 <kern.warning> kernel: [ 242.839281] Code: Bad RIP value.
2020-11-13T01:08:51.154548+09:00 <kern.warning> kernel: [ 242.839282] RSP: 002b:00007ffc41f56b68 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
2020-11-13T01:08:51.154550+09:00 <kern.warning> kernel: [ 242.839284] RAX: ffffffffffffffda RBX: 0000557b32c77180 RCX: 00007fdf27a6df59
2020-11-13T01:08:51.154553+09:00 <kern.warning> kernel: [ 242.839285] RDX: 0000000000000000 RSI: 00007fdf27972cad RDI: 000000000000000f
2020-11-13T01:08:51.154556+09:00 <kern.warning> kernel: [ 242.839286] RBP: 00007fdf27972cad R08: 0000000000000000 R09: 0000000000000000
2020-11-13T01:08:51.154558+09:00 <kern.warning> kernel: [ 242.839287] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000
2020-11-13T01:08:51.154561+09:00 <kern.warning> kernel: [ 242.839289] R13: 0000557b32c663d0 R14: 0000000000020000 R15: 0000557b32c77180
2020-11-13T01:08:51.154565+09:00 <kern.err> kernel: [ 242.839294] INFO: task modprobe:422 blocked for more than 120 seconds.
2020-11-13T01:08:51.154567+09:00 <kern.err> kernel: [ 242.839299] Tainted: G U 4.19.0-12-amd64 #1 Debian 4.19.152-1
2020-11-13T01:08:51.154570+09:00 <kern.err> kernel: [ 242.839303] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-11-13T01:08:51.154572+09:00 <kern.info> kernel: [ 242.839307] modprobe D 0 422 406 0x80000000
2020-11-13T01:08:51.154575+09:00 <kern.warning> kernel: [ 242.839309] Call Trace:
2020-11-13T01:08:51.154578+09:00 <kern.warning> kernel: [ 242.839314] __schedule+0x29f/0x840
2020-11-13T01:08:51.154580+09:00 <kern.warning> kernel: [ 242.839318] schedule+0x28/0x80
2020-11-13T01:08:51.154582+09:00 <kern.warning> kernel: [ 242.839320] rwsem_down_write_failed+0x17c/0x3a0
2020-11-13T01:08:51.154585+09:00 <kern.warning> kernel: [ 242.839324] ? __wake_up_common_lock+0x89/0xc0
2020-11-13T01:08:51.154587+09:00 <kern.warning> kernel: [ 242.839328] ? 0xffffffffc05cb000
2020-11-13T01:08:51.154589+09:00 <kern.warning> kernel: [ 242.839331] call_rwsem_down_write_failed+0x13/0x20
2020-11-13T01:08:51.154592+09:00 <kern.warning> kernel: [ 242.839334] down_write+0x29/0x40
2020-11-13T01:08:51.154594+09:00 <kern.warning> kernel: [ 242.839338] register_pernet_subsys+0x15/0x40
2020-11-13T01:08:51.154597+09:00 <kern.warning> kernel: [ 242.839343] nf_log_ipv6_init+0x12/0x1000 [nf_log_ipv6]
2020-11-13T01:08:51.154600+09:00 <kern.warning> kernel: [ 242.839347] do_one_initcall+0x46/0x1c3
2020-11-13T01:08:51.154602+09:00 <kern.warning> kernel: [ 242.839350] ? free_unref_page_commit+0x91/0x100
2020-11-13T01:08:51.154604+09:00 <kern.warning> kernel: [ 242.839353] ? _cond_resched+0x15/0x30
2020-11-13T01:08:51.154607+09:00 <kern.warning> kernel: [ 242.839357] ? kmem_cache_alloc_trace+0x15e/0x1e0
2020-11-13T01:08:51.154609+09:00 <kern.warning> kernel: [ 242.839360] do_init_module+0x5a/0x210
2020-11-13T01:08:51.154628+09:00 <kern.warning> kernel: [ 242.839363] load_module+0x2167/0x23d0
2020-11-13T01:08:51.154632+09:00 <kern.warning> kernel: [ 242.839369] ? __do_sys_finit_module+0xad/0x110
2020-11-13T01:08:51.154634+09:00 <kern.warning> kernel: [ 242.839371] __do_sys_finit_module+0xad/0x110
2020-11-13T01:08:51.154638+09:00 <kern.warning> kernel: [ 242.839376] do_syscall_64+0x53/0x110
2020-11-13T01:08:51.154640+09:00 <kern.warning> kernel: [ 242.839379] entry_SYSCALL_64_after_hwframe+0x44/0xa9
2020-11-13T01:08:51.154643+09:00 <kern.warning> kernel: [ 242.839380] RIP: 0033:0x7f2f86ff6f59
2020-11-13T01:08:51.154645+09:00 <kern.warning> kernel: [ 242.839384] Code: Bad RIP value.
2020-11-13T01:08:51.154648+09:00 <kern.warning> kernel: [ 242.839386] RSP: 002b:00007ffd08bdd2d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
2020-11-13T01:08:51.154650+09:00 <kern.warning> kernel: [ 242.839387] RAX: ffffffffffffffda RBX: 0000564e805fefd0 RCX: 00007f2f86ff6f59
2020-11-13T01:08:51.154652+09:00 <kern.warning> kernel: [ 242.839389] RDX: 0000000000000000 RSI: 0000564e7f08d3f0 RDI: 0000000000000000
2020-11-13T01:08:51.154655+09:00 <kern.warning> kernel: [ 242.839390] RBP: 0000564e7f08d3f0 R08: 0000000000000000 R09: 0000000000000000
2020-11-13T01:08:51.154657+09:00 <kern.warning> kernel: [ 242.839391] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
2020-11-13T01:08:51.154661+09:00 <kern.warning> kernel: [ 242.839392] R13: 0000564e805feed0 R14: 0000000000040000 R15: 0000564e805fefd0
2020-11-13T01:09:27.747860+09:00 <daemon.warning> systemd[1]: systemd-networkd.service: Start operation timed out. Terminating.
2020-11-13T01:10:01.537439+09:00 <cron.info> CRON[653]: (root) CMD (/usr/sbin/logrotate /etc/logrotate.conf >/dev/null 2>&1)
2020-11-13T01:10:20.249057+09:00 <daemon.notice> nscd: 517 checking for monitored file `/etc/resolv.conf': No such file or directory
2020-11-13T01:10:35.249374+09:00 <daemon.notice> nscd: 517 checking for monitored file `/etc/resolv.conf': No such file or directory
2020-11-13T01:10:51.986271+09:00 <kern.err> kernel: [ 363.670948] INFO: task systemd-udevd:291 blocked for more than 120 seconds.
2020-11-13T01:10:51.986302+09:00 <kern.err> kernel: [ 363.670959] Tainted: G U 4.19.0-12-amd64 #1 Debian 4.19.152-1
2020-11-13T01:10:51.986304+09:00 <kern.err> kernel: [ 363.670963] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-11-13T01:10:51.986306+09:00 <kern.info> kernel: [ 363.670967] systemd-udevd D 0 291 277 0x00000324
2020-11-13T01:10:51.986308+09:00 <kern.warning> kernel: [ 363.670970] Call Trace:
2020-11-13T01:10:51.986309+09:00 <kern.warning> kernel: [ 363.670980] __schedule+0x29f/0x840
2020-11-13T01:10:51.986312+09:00 <kern.warning> kernel: [ 363.670985] schedule+0x28/0x80
2020-11-13T01:10:51.986314+09:00 <kern.warning> kernel: [ 363.670989] schedule_preempt_disabled+0xa/0x10
2020-11-13T01:10:51.986339+09:00 <kern.warning> kernel: [ 363.670991] __mutex_lock.isra.8+0x2b5/0x4a0
2020-11-13T01:10:51.986342+09:00 <kern.warning> kernel: [ 363.670995] ? addrconf_notify+0x31c/0xae0
2020-11-13T01:10:51.986344+09:00 <kern.warning> kernel: [ 363.671006] nf_tables_netdev_event+0x9b/0x1a0 [nf_tables]
2020-11-13T01:10:51.986346+09:00 <kern.warning> kernel: [ 363.671012] notifier_call_chain+0x47/0x70
2020-11-13T01:10:51.986348+09:00 <kern.warning> kernel: [ 363.671017] dev_change_name+0x1fa/0x330
2020-11-13T01:10:51.986350+09:00 <kern.warning> kernel: [ 363.671022] do_setlink+0x729/0xef0
2020-11-13T01:10:51.986351+09:00 <kern.warning> kernel: [ 363.671028] ? blk_mq_dispatch_rq_list+0x392/0x590
2020-11-13T01:10:51.986353+09:00 <kern.warning> kernel: [ 363.671033] ? elv_rb_del+0x1f/0x30
2020-11-13T01:10:51.986355+09:00 <kern.warning> kernel: [ 363.671035] ? deadline_remove_request+0x55/0xc0
2020-11-13T01:10:51.986356+09:00 <kern.warning> kernel: [ 363.671038] ? blk_mq_do_dispatch_sched+0x91/0x120
2020-11-13T01:10:51.986358+09:00 <kern.warning> kernel: [ 363.671042] ? __d_alloc+0x24/0x240
2020-11-13T01:10:51.986360+09:00 <kern.warning> kernel: [ 363.671046] rtnl_setlink+0xd9/0x130
2020-11-13T01:10:51.986362+09:00 <kern.warning> kernel: [ 363.671054] rtnetlink_rcv_msg+0x2b1/0x360
2020-11-13T01:10:51.986364+09:00 <kern.warning> kernel: [ 363.671058] ? _cond_resched+0x15/0x30
2020-11-13T01:10:51.986365+09:00 <kern.warning> kernel: [ 363.671061] ? rtnl_calcit.isra.33+0x100/0x100
2020-11-13T01:10:51.986367+09:00 <kern.warning> kernel: [ 363.671065] netlink_rcv_skb+0x4c/0x120
2020-11-13T01:10:51.986368+09:00 <kern.warning> kernel: [ 363.671069] netlink_unicast+0x181/0x210
2020-11-13T01:10:51.986370+09:00 <kern.warning> kernel: [ 363.671073] netlink_sendmsg+0x204/0x3d0
2020-11-13T01:10:51.986371+09:00 <kern.warning> kernel: [ 363.671076] sock_sendmsg+0x36/0x40
2020-11-13T01:10:51.986373+09:00 <kern.warning> kernel: [ 363.671079] __sys_sendto+0xee/0x160
2020-11-13T01:10:51.986374+09:00 <kern.warning> kernel: [ 363.671084] ? syscall_trace_enter+0x192/0x2b0
2020-11-13T01:10:51.986376+09:00 <kern.warning> kernel: [ 363.671087] __x64_sys_sendto+0x24/0x30
2020-11-13T01:10:51.986378+09:00 <kern.warning> kernel: [ 363.671090] do_syscall_64+0x53/0x110
2020-11-13T01:10:51.986379+09:00 <kern.warning> kernel: [ 363.671093] entry_SYSCALL_64_after_hwframe+0x44/0xa9
2020-11-13T01:10:51.986381+09:00 <kern.warning> kernel: [ 363.671096] RIP: 0033:0x7fdf27a749b7
2020-11-13T01:10:51.986383+09:00 <kern.warning> kernel: [ 363.671103] Code: Bad RIP value.
2020-11-13T01:10:51.986384+09:00 <kern.warning> kernel: [ 363.671105] RSP: 002b:00007ffc41f57038 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
2020-11-13T01:10:51.986386+09:00 <kern.warning> kernel: [ 363.671108] RAX: ffffffffffffffda RBX: 0000557b32c847a0 RCX: 00007fdf27a749b7
2020-11-13T01:10:51.986387+09:00 <kern.warning> kernel: [ 363.671109] RDX: 0000000000000034 RSI: 0000557b32c84be0 RDI: 000000000000000f
2020-11-13T01:10:51.986389+09:00 <kern.warning> kernel: [ 363.671110] RBP: 0000000000000003 R08: 00007ffc41f570f0 R09: 0000000000000010
2020-11-13T01:10:51.986390+09:00 <kern.warning> kernel: [ 363.671111] R10: 0000000000000000 R11: 0000000000000246 R12: 0000557b32c77898
2020-11-13T01:10:51.986391+09:00 <kern.warning> kernel: [ 363.671112] R13: 0000557b32c84ca0 R14: 0000557b32c69150 R15: 0000557b32c5fe70
2020-11-13T01:10:51.986393+09:00 <kern.err> kernel: [ 363.671120] INFO: task systemd-udevd:320 blocked for more than 120 seconds.
2020-11-13T01:10:51.986394+09:00 <kern.err> kernel: [ 363.671125] Tainted: G U 4.19.0-12-amd64 #1 Debian 4.19.152-1
2020-11-13T01:10:51.986396+09:00 <kern.err> kernel: [ 363.671129] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-11-13T01:10:51.986398+09:00 <kern.info> kernel: [ 363.671132] systemd-udevd D 0 320 277 0x80000324
I am wondering if this happens because of the USB dongle; from the above logs it looks like systemd-udevd is blocked in netlink/rtnetlink send and receive calls in the "Call Trace".
Thank you so much in advance for your help.
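To test the dongle theory, my plan is to watch the udev events for the dongle live and to check the kernel log of the affected boots, roughly:
udevadm monitor --kernel --udev                      # watch kernel uevents and udev processing as the dongle is handled
journalctl -k -b | grep -iE 'blocked for more|udevd' # hung-task messages from the current boot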
Slack suddenly would not start. It was installed using Snap.
dmesg:
[63983.140086] ThreadPoolForeg[58617]: segfault at 34d0 ip 00000000000034d0 sp 00007fdb3e23ce08 error 14
[63983.140096] Code: Bad RIP value.
[63983.375855] traps: Chrome_IOThread[58504] trap int3 ip:55c2908ba1c4 sp:7f0ec54347e0 error:0 in slack[55c28e42e000+5caf000]
I also had issues with Slack launching after an upgrade, with what looks like the same error:
Jul 11 16:45:43 samloyd kernel: [171452.625726] traps: Chrome_IOThread[114914] trap int3 ip:56465285b1c4 sp:7f543c1797e0 error:0 in slack[5646503cf000+5caf000]
Reverting to 4.4.3 fixed it.
Followed guidance in this article on the Snapcraft forum and reverted the snap:
sudo snap revert slack
slack reverted to 4.4.3
Slack was able to start after being reverted.
This workaround worked for me using the latest Slack 4.7.0 (which was causing the issue in the first place):
Open a terminal and run a snap shell:
snap run --shell slack
then execute the Slack binary:
$SNAP/usr/lib/slack/slack
Source:
https://forum.snapcraft.io/t/slack-4-7-0-sefgault-ubuntu-18-04/18708/3
So the segfault is caused by the --no-sandbox flag in the command argument list.
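You can confirm where the flag comes from by grepping the snap's launch scripts (the path assumes the standard snap mount point):
grep -R -- '--no-sandbox' /snap/slack/current/ 2>/dev/null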
EDIT: The Ubuntu snap has since been updated, so on Slack 4.8.0 there is no longer an issue with startup.