Match messages in logs (every line must be present in the log output; copy from the "Messages before crash" column below): | |
Match messages in full crash (every line must be present in the crash log output; copy from the "Full Crash" column below): | |
Limit to a test (copy from the "Failing Test" column below): | |
Delete these reports as invalid (for example, a real bug in the patch under review): | |
Bug or comment: | |
Extra info: |
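The matching rule described in the form (every line entered must be present in the corresponding output) can be sketched as follows. This is a minimal illustration only: it assumes plain substring matching and an exact comparison for the test name, and the names `required_lines`, `report_matches`, and the dictionary keys are hypothetical, not taken from the actual triage tool.

```python
def required_lines(filter_text: str) -> list[str]:
    """Split a pasted filter into its non-empty, stripped lines."""
    return [ln.strip() for ln in filter_text.splitlines() if ln.strip()]


def report_matches(report: dict, log_filter: str = "",
                   crash_filter: str = "", test_filter: str = "") -> bool:
    """Return True if the report satisfies every filter that was filled in.

    Assumption: each filter line is matched as a plain substring of the
    corresponding column; an empty filter imposes no restriction.
    """
    if test_filter.strip() and report["failing_test"] != test_filter.strip():
        return False
    if not all(ln in report["messages"] for ln in required_lines(log_filter)):
        return False
    return all(ln in report["full_crash"] for ln in required_lines(crash_filter))


# Hypothetical report built from fragments of one table row.
example = {
    "failing_test": "sanity-pcc test 3a: Repeat attach/detach operations",
    "full_crash": "watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [khugepaged:44] "
                  "Kernel panic - not syncing: softlockup: hung tasks",
    "messages": "",
}

crash_filter = "watchdog: BUG: soft lockup\nKernel panic - not syncing: softlockup: hung tasks"
print(report_matches(example, crash_filter=crash_filter))
```

Under these assumptions, a filter line that includes a host-specific value (a PID, CPU number, or stall duration) would only match the report it was copied from, so shorter fragments generalize better across reports.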
Failing Test | Full Crash | Messages before crash | Comment |
---|---|---|---|
replay-vbr test 5b: link checks version of target parent | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [khugepaged:43] Modules linked in: tls osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common virtio_balloon i2c_piix4 pcspkr joydev drm fuse ext4 mbcache jbd2 ata_generic ata_piix crct10dif_pclmul crc32_pclmul libata crc32c_intel virtio_net ghash_clmulni_intel virtio_blk net_failover failover serio_raw CPU: 0 PID: 43 Comm: khugepaged Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf Code: 89 ca e9 cd fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc RSP: 0018:ffffa6a780167c58 EFLAGS: 00010286 RAX: ffff97e98adf2000 RBX: ffffde16802b7c80 RCX: 0000000000001000 RDX: 0000000000001000 RSI: ffff97e99edf3000 RDI: ffff97e98adf2000 RBP: ffff97e9828e1000 R08: ffff97e984cd6480 R09: ffffde16800a3828 R10: ffff97e9828e0000 R11: 000000000003a500 R12: ffff97e9828e0000 R13: ffff97e984cd6480 R14: ffffde16800a3828 R15: ffff97e9828e0f90 FS: 0000000000000000(0000) GS:ffff97ea3fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f89e61a1648 CR3: 000000008a610003 CR4: 00000000001706f0 Call Trace: <IRQ> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? __collapse_huge_page_copy.isra.0+0x6f/0x1c0 ? watchdog_timer_fn+0x1ad/0x210 ? __pfx_watchdog_timer_fn+0x10/0x10 ? __hrtimer_run_queues+0x112/0x2b0 ? hrtimer_interrupt+0xfc/0x210 ? kvm_sched_clock_read+0xd/0x20 ? 
__sysvec_apic_timer_interrupt+0x4e/0x100 ? sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? copy_mc_enhanced_fast_string+0x6/0xf __collapse_huge_page_copy.isra.0+0x6f/0x1c0 collapse_huge_page+0x4e7/0x740 hpage_collapse_scan_pmd+0x470/0x870 khugepaged_scan_mm_slot.constprop.0+0x2a3/0x520 khugepaged+0xdd/0x200 ? __pfx_khugepaged+0x10/0x10 kthread+0xe0/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 43 Comm: khugepaged Kdump: loaded Tainted: G OEL ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack_lvl+0x34/0x48 panic+0x107/0x2bb watchdog_timer_fn.cold+0xc/0x16 ? __pfx_watchdog_timer_fn+0x10/0x10 __hrtimer_run_queues+0x112/0x2b0 hrtimer_interrupt+0xfc/0x210 ? kvm_sched_clock_read+0xd/0x20 __sysvec_apic_timer_interrupt+0x4e/0x100 sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf | Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-48vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: onyx-48vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0000.sync_permission=0 Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdt.lustre-MDT0000.commit_on_sharing=0 Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds1_flakey --table "0 3964928 flakey 252:0 0 0 1800 1 drop_writes" Lustre: DEBUG 
MARKER: dmsetup resume /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000 Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 LustreError: 55958:0:(obd_class.h:478:obd_check_dev()) Device 33 not setup LustreError: 55958:0:(obd_class.h:478:obd_check_dev()) Skipped 23 previous similar messages LDISKFS-fs (dm-3): unmounting filesystem 98bf3e3f-9929-4c32-b724-c68276b25f7b. Lustre: server umount lustre-MDT0000 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: 10093:0:(client.c:2447:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1750357198/real 1750357198] req@ffff97e9ae431380 x1835380953927040/t0(0) o400->MGC10.240.28.49@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1750357214 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 projid:4294967295 LustreError: MGC10.240.28.49@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail | Link to test |
sanity-quota test 7d: Quota reintegration (Transfer index in multiple bulks) | watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [khugepaged:44] Modules linked in: tls mgc(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common pcspkr virtio_balloon i2c_piix4 joydev fuse drm ext4 mbcache jbd2 ata_generic ata_piix libata crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ghash_clmulni_intel virtio_blk net_failover failover serio_raw CPU: 1 PID: 44 Comm: khugepaged Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.38.1.el9_5.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf Code: 89 ca e9 cd fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc RSP: 0018:ffffb62c4016fc58 EFLAGS: 00010286 RAX: ffff9771749cb000 RBX: ffffec87c0d272c0 RCX: 0000000000001000 RDX: 0000000000001000 RSI: ffff97725aa0d000 RDI: ffff9771749cb000 RBP: ffff977244e70000 R08: ffff97726a342f00 R09: ffffec87c4139be8 R10: ffff977244e6f000 R11: 000000000003a640 R12: ffff977244e6f000 R13: ffff97726a342f00 R14: ffffec87c4139be8 R15: ffff977244e6fe58 FS: 0000000000000000(0000) GS:ffff97727bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6b66b15000 CR3: 0000000050410003 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? __collapse_huge_page_copy.isra.0+0x6f/0x1c0 ? watchdog_timer_fn+0x1ad/0x210 ? __pfx_watchdog_timer_fn+0x10/0x10 ? __hrtimer_run_queues+0x112/0x2b0 ? hrtimer_interrupt+0xfc/0x210 ? 
kvm_sched_clock_read+0xd/0x20 ? __sysvec_apic_timer_interrupt+0x4e/0x100 ? sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? copy_mc_enhanced_fast_string+0x6/0xf __collapse_huge_page_copy.isra.0+0x6f/0x1c0 collapse_huge_page+0x4e7/0x740 hpage_collapse_scan_pmd+0x470/0x870 khugepaged_scan_mm_slot.constprop.0+0x2a3/0x520 ? __pfx_wq_barrier_func+0x10/0x10 khugepaged+0xdd/0x200 ? __pfx_khugepaged+0x10/0x10 kthread+0xe0/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 44 Comm: khugepaged Kdump: loaded Tainted: G OEL ------- --- 5.14.0-503.38.1.el9_5.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack_lvl+0x34/0x48 panic+0x107/0x2bb watchdog_timer_fn.cold+0xc/0x16 ? __pfx_watchdog_timer_fn+0x10/0x10 __hrtimer_run_queues+0x112/0x2b0 hrtimer_interrupt+0xfc/0x210 ? kvm_sched_clock_read+0xd/0x20 __sysvec_apic_timer_interrupt+0x4e/0x100 sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf | Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete \*.lustre-OST0000.recovery_status 1475 Lustre: DEBUG MARKER: onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete *.lustre-OST0000.recovery_status 1475 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete \*.lustre-OST0001.recovery_status 1475 Lustre: DEBUG MARKER: onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete *.lustre-OST0001.recovery_status 1475 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete \*.lustre-OST0002.recovery_status 1475 Lustre: DEBUG MARKER: onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete *.lustre-OST0002.recovery_status 1475 Lustre: DEBUG MARKER: /usr/sbin/lctl 
mark onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete \*.lustre-OST0003.recovery_status 1475 Lustre: DEBUG MARKER: onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete *.lustre-OST0003.recovery_status 1475 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete \*.lustre-OST0004.recovery_status 1475 Lustre: DEBUG MARKER: onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete *.lustre-OST0004.recovery_status 1475 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete \*.lustre-OST0005.recovery_status 1475 Lustre: DEBUG MARKER: onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete *.lustre-OST0005.recovery_status 1475 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete \*.lustre-OST0006.recovery_status 1475 Lustre: DEBUG MARKER: onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete *.lustre-OST0006.recovery_status 1475 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete \*.lustre-OST0007.recovery_status 1475 Lustre: DEBUG MARKER: onyx-149vm3.onyx.whamcloud.com: executing _wait_recovery_complete *.lustre-OST0007.recovery_status 1475 | Link to test |
sanity-pcc test 3a: Repeat attach/detach operations | watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [khugepaged:44] Modules linked in: tls osp(OE) ofd(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill i2c_piix4 joydev virtio_balloon intel_rapl_msr intel_rapl_common pcspkr sunrpc drm fuse ext4 mbcache jbd2 ata_generic crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ata_piix ghash_clmulni_intel libata virtio_blk net_failover failover serio_raw CPU: 0 PID: 44 Comm: khugepaged Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.31.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf Code: 89 ca e9 cd fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc RSP: 0000:ffffa43a8016fc58 EFLAGS: 00010286 RAX: ffff97aa3b118000 RBX: ffffe58740ec4600 RCX: 0000000000001000 RDX: 0000000000001000 RSI: ffff97aa3db5d000 RDI: ffff97aa3b118000 RBP: ffff97aa2cf8a000 R08: ffff97aa36b9ce40 R09: ffffe58740b3e268 R10: ffff97aa2cf89000 R11: 000000000003a500 R12: ffff97aa2cf89000 R13: ffff97aa36b9ce40 R14: ffffe58740b3e268 R15: ffff97aa2cf898c0 FS: 0000000000000000(0000) GS:ffff97aabfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fafc7efd024 CR3: 000000005be10004 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? __collapse_huge_page_copy.isra.0+0x6f/0x1c0 ? watchdog_timer_fn+0x1ad/0x210 ? __pfx_watchdog_timer_fn+0x10/0x10 ? 
__hrtimer_run_queues+0x112/0x2b0 ? hrtimer_interrupt+0xfc/0x210 ? kvm_sched_clock_read+0xd/0x20 ? __sysvec_apic_timer_interrupt+0x4e/0x100 ? sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? copy_mc_enhanced_fast_string+0x6/0xf __collapse_huge_page_copy.isra.0+0x6f/0x1c0 collapse_huge_page+0x4e7/0x740 hpage_collapse_scan_pmd+0x470/0x870 khugepaged_scan_mm_slot.constprop.0+0x2a3/0x520 khugepaged+0xdd/0x200 ? __pfx_khugepaged+0x10/0x10 kthread+0xe0/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 44 Comm: khugepaged Kdump: loaded Tainted: G OEL ------- --- 5.14.0-503.31.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack_lvl+0x34/0x48 panic+0x107/0x2bb watchdog_timer_fn.cold+0xc/0x16 ? __pfx_watchdog_timer_fn+0x10/0x10 __hrtimer_run_queues+0x112/0x2b0 hrtimer_interrupt+0xfc/0x210 ? kvm_sched_clock_read+0xd/0x20 __sysvec_apic_timer_interrupt+0x4e/0x100 sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf | | Link to test |
sanity test 276: Race between mount and obd_statfs | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [khugepaged:43] Modules linked in: dm_flakey tls osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill intel_rapl_msr intel_rapl_common virtio_balloon pcspkr sunrpc joydev i2c_piix4 drm fuse ext4 mbcache jbd2 ata_generic ata_piix libata crct10dif_pclmul crc32_pclmul virtio_net crc32c_intel ghash_clmulni_intel virtio_blk net_failover failover serio_raw [last unloaded: dm_flakey] CPU: 0 PID: 43 Comm: khugepaged Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.31.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf Code: 89 ca e9 cd fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc RSP: 0000:ffffa302c0167c58 EFLAGS: 00010286 RAX: ffff8b4d435fb000 RBX: fffff912410d7ec0 RCX: 0000000000001000 RDX: 0000000000001000 RSI: ffff8b4d95b2c000 RDI: ffff8b4d435fb000 RBP: ffff8b4d64410000 R08: ffff8b4d90541480 R09: fffff912419103e8 R10: ffff8b4d6440f000 R11: 000000000003a500 R12: ffff8b4d6440f000 R13: ffff8b4d90541480 R14: fffff912419103e8 R15: ffff8b4d6440ffd8 FS: 0000000000000000(0000) GS:ffff8b4dbfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd213aa7000 CR3: 0000000014810006 CR4: 00000000001706f0 Call Trace: <IRQ> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? __collapse_huge_page_copy.isra.0+0x6f/0x1c0 ? watchdog_timer_fn+0x1ad/0x210 ? __pfx_watchdog_timer_fn+0x10/0x10 ? __hrtimer_run_queues+0x112/0x2b0 ? hrtimer_interrupt+0xfc/0x210 ? kvm_sched_clock_read+0xd/0x20 ? 
__sysvec_apic_timer_interrupt+0x4e/0x100 ? sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? copy_mc_enhanced_fast_string+0x6/0xf __collapse_huge_page_copy.isra.0+0x6f/0x1c0 collapse_huge_page+0x4e7/0x740 hpage_collapse_scan_pmd+0x470/0x870 khugepaged_scan_mm_slot.constprop.0+0x2a3/0x520 ? __pfx_wq_barrier_func+0x10/0x10 khugepaged+0xdd/0x200 ? __pfx_khugepaged+0x10/0x10 kthread+0xe0/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 43 Comm: khugepaged Kdump: loaded Tainted: G OEL ------- --- 5.14.0-503.31.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack_lvl+0x34/0x48 panic+0x107/0x2bb watchdog_timer_fn.cold+0xc/0x16 ? __pfx_watchdog_timer_fn+0x10/0x10 __hrtimer_run_queues+0x112/0x2b0 hrtimer_interrupt+0xfc/0x210 ? kvm_sched_clock_read+0xd/0x20 __sysvec_apic_timer_interrupt+0x4e/0x100 sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf | Lustre: lustre-OST0000-osc-MDT0001: Connection to lustre-OST0000 (at 10.240.25.177@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 1 previous similar message Lustre: lustre-OST0000-osc-MDT0003: Connection restored to 10.240.25.177@tcp (at 10.240.25.177@tcp) Lustre: Skipped 2 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all LustreError: lustre-OST0000-osc-MDT0001: operation ost_statfs to node 10.240.25.177@tcp failed: rc = -107 Lustre: DEBUG 
MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Autotest: Test running for 255 minutes (lustre-reviews_review-dne-part-1_112718.29) Lustre: DEBUG MARKER: 
onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all LustreError: lustre-OST0000-osc-MDT0003: operation ost_statfs to node 10.240.25.177@tcp failed: rc = -107 LustreError: Skipped 15 previous similar messages Lustre: lustre-OST0000-osc-MDT0003: Connection to lustre-OST0000 (at 10.240.25.177@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 23 previous similar messages Lustre: lustre-OST0000-osc-MDT0003: Connection restored to 10.240.25.177@tcp (at 10.240.25.177@tcp) Lustre: Skipped 23 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-67vm6.onyx.whamcloud.com: executing set_default_debug all all | Link to test |
obdfilter-survey test 1c: Object Storage Targets survey, big batch | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [khugepaged:44] Modules linked in: tls lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common virtio_balloon pcspkr i2c_piix4 joydev drm fuse ext4 mbcache jbd2 ata_generic ata_piix crct10dif_pclmul crc32_pclmul crc32c_intel libata ghash_clmulni_intel virtio_net virtio_blk net_failover failover serio_raw CPU: 0 PID: 44 Comm: khugepaged Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.31.1.el9_5.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf Code: 89 ca e9 cd fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc RSP: 0018:ffff9ac80016fc58 EFLAGS: 00010286 RAX: ffff8a3672173000 RBX: ffffd20800c85cc0 RCX: 0000000000001000 RDX: 0000000000001000 RSI: ffff8a36598a9000 RDI: ffff8a3672173000 RBP: ffff8a3675fb1000 R08: ffff8a364a098300 R09: ffffd20800d7ec28 R10: ffff8a3675fb0000 R11: 000000000003a500 R12: ffff8a3675fb0000 R13: ffff8a364a098300 R14: ffffd20800d7ec28 R15: ffff8a3675fb0b98 FS: 0000000000000000(0000) GS:ffff8a36ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1916fa8948 CR3: 0000000036134002 CR4: 00000000001706f0 Call Trace: <IRQ> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? __collapse_huge_page_copy.isra.0+0x6f/0x1c0 ? watchdog_timer_fn+0x1ad/0x210 ? __pfx_watchdog_timer_fn+0x10/0x10 ? __hrtimer_run_queues+0x112/0x2b0 ? hrtimer_interrupt+0xfc/0x210 ? kvm_sched_clock_read+0xd/0x20 ? __sysvec_apic_timer_interrupt+0x4e/0x100 ? 
sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? copy_mc_enhanced_fast_string+0x6/0xf __collapse_huge_page_copy.isra.0+0x6f/0x1c0 collapse_huge_page+0x4e7/0x740 hpage_collapse_scan_pmd+0x470/0x870 khugepaged_scan_mm_slot.constprop.0+0x2a3/0x520 khugepaged+0xdd/0x200 ? __pfx_khugepaged+0x10/0x10 kthread+0xe0/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 44 Comm: khugepaged Kdump: loaded Tainted: G OEL ------- --- 5.14.0-503.31.1.el9_5.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack_lvl+0x34/0x48 panic+0x107/0x2bb watchdog_timer_fn.cold+0xc/0x16 ? __pfx_watchdog_timer_fn+0x10/0x10 __hrtimer_run_queues+0x112/0x2b0 hrtimer_interrupt+0xfc/0x210 ? kvm_sched_clock_read+0xd/0x20 __sysvec_apic_timer_interrupt+0x4e/0x100 sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf | Autotest: Test running for 10 minutes (lustre-reviews_review-dne-part-9_112456.37) Autotest: Test running for 15 minutes (lustre-reviews_review-dne-part-9_112456.37) Autotest: Test running for 20 minutes (lustre-reviews_review-dne-part-9_112456.37) | Link to test |
sanityn test 109: Race with several mount instances on 1 node | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [khugepaged:44] Modules linked in: xfs libcrc32c ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lzstd(OE) llz4hc(OE) llz4(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey nfsv3 nfs_acl loop dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlxdevm(OE) ib_uverbs(OE) ib_core(OE) psample mlxfw(OE) mlx_compat(OE) macsec tls pci_hyperv_intf intel_rapl_msr intel_rapl_common virtio_balloon i2c_piix4 pcspkr joydev sunrpc drm fuse ext4 mbcache jbd2 ata_generic ata_piix libata crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ghash_clmulni_intel net_failover virtio_blk failover serio_raw [last unloaded: libcfs(OE)] CPU: 1 PID: 44 Comm: khugepaged Kdump: loaded Tainted: G W OE ------- --- 5.14.0-503.34.1_lustre.ddn1.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf Code: 89 ca e9 cd fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc RSP: 0018:ffffb7bbc016fc58 EFLAGS: 00010286 RAX: ffff9b6f8c5e3000 RBX: fffff933413178c0 RCX: 0000000000001000 RDX: 0000000000001000 RSI: ffff9b6f89214000 RDI: ffff9b6f8c5e3000 RBP: ffff9b6fd650c000 R08: ffff9b6f47ddd240 R09: fffff933425942e8 R10: ffff9b6fd650b000 R11: 000000000003a500 R12: ffff9b6fd650b000 R13: ffff9b6f47ddd240 R14: fffff933425942e8 R15: ffff9b6fd650bf18 FS: 0000000000000000(0000) GS:ffff9b6fffd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b8e6d8e008 CR3: 0000000003370004 
CR4: 00000000001706f0 Call Trace: <IRQ> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? __collapse_huge_page_copy.isra.0+0x6f/0x1c0 ? watchdog_timer_fn+0x1ad/0x210 ? __pfx_watchdog_timer_fn+0x10/0x10 ? __hrtimer_run_queues+0x112/0x2b0 ? hrtimer_interrupt+0xfc/0x210 ? kvm_sched_clock_read+0xd/0x20 ? __sysvec_apic_timer_interrupt+0x4e/0x100 ? sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? copy_mc_enhanced_fast_string+0x6/0xf __collapse_huge_page_copy.isra.0+0x6f/0x1c0 collapse_huge_page+0x4e7/0x740 hpage_collapse_scan_pmd+0x470/0x870 khugepaged_scan_mm_slot.constprop.0+0x2a3/0x520 ? __pfx_wq_barrier_func+0x10/0x10 khugepaged+0xdd/0x200 ? __pfx_autoremove_wake_function+0x10/0x10 ? __pfx_khugepaged+0x10/0x10 kthread+0xe0/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 44 Comm: khugepaged Kdump: loaded Tainted: G W OEL ------- --- 5.14.0-503.34.1_lustre.ddn1.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack_lvl+0x34/0x48 panic+0x107/0x2bb watchdog_timer_fn.cold+0xc/0x16 ? __pfx_watchdog_timer_fn+0x10/0x10 __hrtimer_run_queues+0x112/0x2b0 hrtimer_interrupt+0xfc/0x210 ? 
kvm_sched_clock_read+0xd/0x20 __sysvec_apic_timer_interrupt+0x4e/0x100 sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf | Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 1 Lustre: DEBUG MARKER: Iteration 1 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 2 Lustre: DEBUG MARKER: Iteration 2 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 3 Lustre: DEBUG MARKER: Iteration 3 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 4 Lustre: DEBUG MARKER: Iteration 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 5 Lustre: DEBUG MARKER: Iteration 5 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 6 Lustre: DEBUG MARKER: Iteration 6 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 7 Lustre: DEBUG MARKER: Iteration 7 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 8 Lustre: DEBUG MARKER: Iteration 8 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 9 Lustre: DEBUG MARKER: Iteration 9 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 10 Lustre: DEBUG MARKER: Iteration 10 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 11 Lustre: DEBUG MARKER: Iteration 11 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 12 Lustre: DEBUG MARKER: Iteration 12 Autotest: Test running for 855 minutes (lustre-b_es7_0_full-part-3_100.59) Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 13 Lustre: DEBUG MARKER: Iteration 13 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 14 Lustre: DEBUG MARKER: Iteration 14 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 15 Lustre: DEBUG MARKER: Iteration 15 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 16 Lustre: DEBUG MARKER: Iteration 16 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 17 Lustre: DEBUG MARKER: Iteration 17 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 18 Lustre: DEBUG MARKER: Iteration 18 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 19 Lustre: DEBUG MARKER: Iteration 19 Lustre: DEBUG 
MARKER: /usr/sbin/lctl mark Iteration 20 Lustre: DEBUG MARKER: Iteration 20 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 21 Lustre: DEBUG MARKER: Iteration 21 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 22 Lustre: DEBUG MARKER: Iteration 22 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 23 Lustre: DEBUG MARKER: Iteration 23 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 24 Lustre: DEBUG MARKER: Iteration 24 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 25 Lustre: DEBUG MARKER: Iteration 25 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 26 Lustre: DEBUG MARKER: Iteration 26 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 27 Lustre: DEBUG MARKER: Iteration 27 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 28 Lustre: DEBUG MARKER: Iteration 28 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 29 Lustre: DEBUG MARKER: Iteration 29 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 30 Lustre: DEBUG MARKER: Iteration 30 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 31 Lustre: DEBUG MARKER: Iteration 31 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Iteration 32 Lustre: DEBUG MARKER: Iteration 32 | Link to test |
sanity-lfsck test 44: umount while lfsck is stopping | watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [khugepaged:44] Modules linked in: dm_flakey tls osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill intel_rapl_msr intel_rapl_common joydev pcspkr i2c_piix4 virtio_balloon sunrpc drm fuse ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_blk net_failover failover serio_raw [last unloaded: dm_flakey] CPU: 1 PID: 44 Comm: khugepaged Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.23.2_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf Code: 89 ca e9 cd fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc RSP: 0018:ffffafe34016fc58 EFLAGS: 00010286 RAX: ffff9db34b1d4000 RBX: ffffdb15c12c7500 RCX: 0000000000001000 RDX: 0000000000001000 RSI: ffff9db358da9000 RDI: ffff9db34b1d4000 RBP: ffff9db304484000 R08: ffff9db303e64300 R09: ffffdb15c01120e8 R10: ffff9db304483000 R11: 000000000003a500 R12: ffff9db304483000 R13: ffff9db303e64300 R14: ffffdb15c01120e8 R15: ffff9db304483ea0 FS: 0000000000000000(0000) GS:ffff9db3bfd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa966fec6b4 CR3: 000000003602e005 CR4: 00000000001706f0 Call Trace: <IRQ> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? __collapse_huge_page_copy.isra.0+0x6f/0x1c0 ? watchdog_timer_fn+0x1ad/0x210 ? __pfx_watchdog_timer_fn+0x10/0x10 ? __hrtimer_run_queues+0x112/0x2b0 ? hrtimer_interrupt+0xfc/0x210 ? 
kvm_sched_clock_read+0xd/0x20 ? __sysvec_apic_timer_interrupt+0x4e/0x100 ? sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? copy_mc_enhanced_fast_string+0x6/0xf __collapse_huge_page_copy.isra.0+0x6f/0x1c0 collapse_huge_page+0x4e7/0x740 hpage_collapse_scan_pmd+0x470/0x870 khugepaged_scan_mm_slot.constprop.0+0x2a3/0x520 ? __pfx_wq_barrier_func+0x10/0x10 khugepaged+0xdd/0x200 ? __pfx_khugepaged+0x10/0x10 kthread+0xe0/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 44 Comm: khugepaged Kdump: loaded Tainted: G OEL ------- --- 5.14.0-503.23.2_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack_lvl+0x34/0x48 panic+0x107/0x2bb watchdog_timer_fn.cold+0xc/0x16 ? __pfx_watchdog_timer_fn+0x10/0x10 __hrtimer_run_queues+0x112/0x2b0 hrtimer_interrupt+0xfc/0x210 ? kvm_sched_clock_read+0xd/0x20 __sysvec_apic_timer_interrupt+0x4e/0x100 sysvec_apic_timer_interrupt+0x6d/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:copy_mc_enhanced_fast_string+0x6/0xf | Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_val=3 fail_loc=0x1600 Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r LustreError: 366549:0:(lfsck_engine.c:836:lfsck_master_oit_engine()) cfs_fail_timeout id 1600 sleeping for 3000ms LustreError: 366549:0:(lfsck_engine.c:836:lfsck_master_oit_engine()) Skipped 2 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_stop -M lustre-MDT0000 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 LustreError: 366549:0:(lfsck_engine.c:836:lfsck_master_oit_engine()) cfs_fail_timeout id 1600 awake LustreError: 366549:0:(lfsck_engine.c:836:lfsck_master_oit_engine()) Skipped 3 previous similar messages LDISKFS-fs (dm-3): unmounting filesystem 
84690e1c-4763-4b59-ad0a-b04dc3e90d4b. Lustre: server umount lustre-MDT0000 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_val=0 fail_loc=0 Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1 Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre /dev/mapper/mds1_flakey /mnt/lustre-mds1 LDISKFS-fs (dm-3): mounted filesystem 84690e1c-4763-4b59-ad0a-b04dc3e90d4b r/w with ordered data mode. Quota mode: journalled. LustreError: MGC10.240.28.46@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail LustreError: Skipped 4 previous similar messages Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180 Lustre: Skipped 5 previous similar messages Lustre: lustre-MDT0000: in recovery but waiting for the first client to connect Lustre: Skipped 2 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 5 clients reconnect Lustre: Skipped 2 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm3.onyx.whamcloud.com: executing set_default_debug 
vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace Autotest: Test running for 130 minutes (lustre-reviews_review-dne-part-2_111482.30) Lustre: lustre-MDT0000: Recovery over after 0:04, of 5 clients 5 recovered and 0 were evicted. Lustre: Skipped 2 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 LDISKFS-fs (dm-3): unmounting filesystem 84690e1c-4763-4b59-ad0a-b04dc3e90d4b. 
Lustre: server umount lustre-MDT0000 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1 Lustre: DEBUG MARKER: modprobe -r dm-flakey Lustre: 10105:0:(client.c:2340:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1741080224/real 1741080224] req@ffff9db33394a700 x1825647223615488/t0(0) o400->MGC10.240.28.46@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1741080240 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 Lustre: 10105:0:(client.c:2340:ptlrpc_expire_one_request()) Skipped 13 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3 LDISKFS-fs (dm-4): unmounting filesystem 3db05193-cc21-40ae-b04d-b9caccab2112. 
Lustre: server umount lustre-MDT0002 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds3_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds3_flakey Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds3_flakey Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1 Lustre: DEBUG MARKER: modprobe -r dm-flakey Lustre: DEBUG MARKER: sysctl -wq kernel/kptr_restrict=1 || true Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lfsck: start setup 09:24:56 \(1741080296\) === Lustre: DEBUG MARKER: === sanity-lfsck: start setup 09:24:56 (1741080296) === Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds3_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-70vm7.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: onyx-70vm7.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm3.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm3.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: 
onyx-99vm3.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: onyx-99vm3.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: [ -e /dev/vg_Role_MDS/mdt1 ] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: mkfs.lustre --mgs --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=1981808 --mkfsoptions="-b 4096" --reformat /dev/vg_Role_MDS/mdt1 LDISKFS-fs (dm-0): mounted filesystem f1be7494-9219-4a68-9eeb-469ba3c1059e r/w with ordered data mode. Quota mode: journalled. LDISKFS-fs (dm-0): unmounting filesystem f1be7494-9219-4a68-9eeb-469ba3c1059e. Autotest: Test running for 135 minutes (lustre-reviews_review-dne-part-2_111482.30) Autotest: Test running for 140 minutes (lustre-reviews_review-dne-part-2_111482.30) | Link to test |