Match messages in logs (every line must be present in the log output; copy from the "Messages before crash" column below): | |
Match messages in full crash (every line must be present in the crash log output; copy from the "Full Crash" column below): | |
Limit to a test (copy from the "Failing Test" column below): | |
Delete these reports as invalid (real bug in a patch under review or similar) | |
Bug or comment: | |
Extra info: | |
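For reference, the match fields above use all-lines-must-match semantics: a report is selected only if every line entered appears somewhere in the corresponding column's text. A minimal sketch of that rule in Python (the `report_matches` helper is hypothetical, not the triage tool's actual code):

```python
def report_matches(required: str, column_text: str) -> bool:
    """Return True only if every non-empty line of `required`
    occurs verbatim (as a substring) in `column_text`."""
    return all(
        line.strip() in column_text
        for line in required.splitlines()
        if line.strip()
    )

# Example against the first row below: require two messages
# from the "Messages before crash" column.
messages = (
    "Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1\n"
    "Lustre: Failing over lustre-MDT0000\n"
)
wanted = "umount -d /mnt/lustre-mds1\nLustre: Failing over lustre-MDT0000"
assert report_matches(wanted, messages)        # both lines present
assert not report_matches("no such line", messages)
```

Substring rather than exact-line matching is assumed here so that prefixes such as timestamps or PIDs in real console logs do not break the match.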
Failing Test | Full Crash | Messages before crash | Comment |
---|---|---|---|
recovery-small test 111: mdd setup fail should not cause umount oops | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ldlm_bl_03:16713] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc i2c_piix4 virtio_balloon joydev pcspkr ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_blk virtio_net net_failover failover CPU: 0 PID: 16713 Comm: ldlm_bl_03 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ee 15 2e e8 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0000:ffffba31c7c53d68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffba31c19b1008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffba31c1985000 RSI: ffffba31c7c53da0 RDI: ffff9de444eda100 RBP: ffffffffc0d10f10 R08: 0000000000000c62 R09: 000000000000000e R10: ffff9de473e7f000 R11: ffff9de473e7eb38 R12: 0000000000000000 R13: 0000000000000000 R14: ffff9de444eda100 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9de4ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9b27340030 CR3: 0000000070410002 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [obdclass] ? cfs_hash_for_each_relax+0x172/0x480 [obdclass] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [obdclass] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x400/0x400 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 16713 Comm: ldlm_bl_03 Kdump: loaded Tainted: G OEL -------- - - 4.18.0-553.53.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] | Lustre: DEBUG MARKER: lctl set_param fail_loc=0x151 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 Lustre: lustre-MDT0000: Not available for connect from 10.240.29.129@tcp (stopping) Lustre: Skipped 9 previous similar messages LustreError: 12753:0:(client.c:1375:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff9de46214b740 x1835375116966400/t0(0) o41->lustre-MDT0002-osp-MDT0000@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:QU/200/ffffffff rc 0/-1 job:'osp-pre-2-0.0' uid:0 gid:0 projid:4294967295 LustreError: 12753:0:(client.c:1375:ptlrpc_import_delay_req()) Skipped 2 previous similar messages | Link to test |
sanity test 160a: changelog sanity | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ldlm_bl_01:343647] Modules linked in: dm_flakey obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev i2c_piix4 virtio_balloon pcspkr sunrpc ext4 mbcache jbd2 ata_generic crc32c_intel ata_piix virtio_net libata serio_raw virtio_blk net_failover failover [last unloaded: dm_flakey] CPU: 0 PID: 343647 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ee c5 fa e3 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffa1c281833d68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa1c281f7c008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa1c281f3d000 RSI: ffffa1c281833da0 RDI: ffff8d57794fef00 RBP: ffffffffc0e45f10 R08: 0000000000000783 R09: 000000000000000e R10: ffff8d5781c04000 R11: ffff8d5781c03659 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8d57794fef00 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8d57ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000561d6b1eaf44 CR3: 000000007fe10003 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [obdclass] ? cfs_hash_for_each_relax+0x172/0x480 [obdclass] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [obdclass] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x400/0x400 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 343647 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL -------- - - 4.18.0-553.53.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param mdd.lustre-MDT0000.changelog_mask -n Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0000.changelog_mask=+hsm Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 changelog register -n Lustre: lustre-MDD0000: changelog on Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_mask=-MKDIR Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_mask=-CLOSE Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_mask=+MKDIR Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_mask=+CLOSE Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 | Link to test |
sanity-pcc test 4: Auto cache test for mmap | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_01:98994] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon i2c_piix4 joydev pcspkr sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_net net_failover failover virtio_blk CPU: 1 PID: 98994 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.50.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e 09 63 c4 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffab07c1d23d68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffab07c3d1e008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffab07c3cff000 RSI: ffffab07c1d23da0 RDI: ffff8a657b1c5100 RBP: ffffffffc0bc1dd0 R08: 0000000000000096 R09: 000000000000000e R10: ffff8a6574769000 R11: ffff8a6574768094 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8a657b1c5100 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8a65ffd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055d046569f44 CR3: 000000008ba10003 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [obdclass] ? cfs_hash_for_each_relax+0x172/0x480 [obdclass] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [obdclass] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x400/0x400 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 98994 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL -------- - - 4.18.0-553.50.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: lfs --list-commands Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 | Link to test |
recovery-small test 134: race between failover and search for reply data free slot | watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [ldlm_bl_02:13577] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net virtio_blk serio_raw net_failover failover CPU: 1 PID: 13577 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.50.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 fe 69 9d d1 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffa92c80fd3d68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa92c82ed8008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa92c82ea7000 RSI: ffffa92c80fd3da0 RDI: ffff8ffb67b8ed00 RBP: 0000000000000000 R08: 0000000000000008 R09: 000000000000000e R10: 000000000000031f R11: 000000000000b84e R12: 0000000000000000 R13: 0000000000000001 R14: ffff8ffb67b8ed00 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8ffbffd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0dcb81f010 CR3: 0000000078c10003 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cfs_hash_for_each_relax+0x17b/0x480 [obdclass] ? cfs_hash_for_each_relax+0x172/0x480 [obdclass] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [obdclass] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x400/0x400 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 13577 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OEL -------- - - 4.18.0-553.50.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] | Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-106vm3.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-106vm3.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x722 fail_val=5 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 Lustre: lustre-MDT0000: Not available for connect from 10.240.28.185@tcp (stopping) Lustre: Skipped 21 previous similar messages LustreError: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 6 previous similar messages Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 19 previous similar messages | Link to test |
sanity-pfl test 9: Replay layout extend object instantiation | watchdog: BUG: soft lockup - CPU#1 stuck for 21s! [ldlm_bl_02:11449] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul virtio_balloon ghash_clmulni_intel joydev pcspkr i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic virtio_net ata_piix libata crc32c_intel serio_raw net_failover failover virtio_blk CPU: 1 PID: 11449 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OE --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 76 6b 3f eb 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0000:ffffa0ebc1013d70 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa0ebc1e43008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa0ebc1e07000 RSI: ffffa0ebc1013da0 RDI: ffff8aeeb063a100 RBP: 0000000000000000 R08: ffffa0ebc1013d38 R09: ffffa0ebc1013d40 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000001 R14: ffff8aeeb063a100 R15: 0000000000000002 FS: 0000000000000000(0000) GS:ffff8aef3fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000557997e03990 CR3: 000000003e210003 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cfs_hash_for_each_relax+0x173/0x460 [libcfs] ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x172/0x180 [ptlrpc] ldlm_bl_thread_main+0x70c/0x930 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 11449 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OEL --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x11/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] | Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds1_flakey --table "0 4194304 flakey 252:0 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000 Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 Lustre: lustre-MDT0000: Not available for connect from 10.240.25.213@tcp (stopping) Lustre: lustre-MDT0000: Not available for connect from 10.240.25.212@tcp (stopping) Lustre: Skipped 2 previous similar messages | Link to test |
sanity-lfsck test 44: umount while lfsck is stopping | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_01:363922] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) zfs(POE) spl(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev virtio_balloon i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net crc32c_intel net_failover serio_raw virtio_blk failover [last unloaded: obdecho] CPU: 1 PID: 363922 Comm: ldlm_bl_01 Kdump: loaded Tainted: P OE -------- - - 4.18.0-553.50.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ce 4a 40 f8 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb36702047d68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb367055e8008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb367055d1000 RSI: ffffb36702047da0 RDI: ffff986b545c3100 RBP: 0000000000000000 R08: ffff986bbfd34278 R09: 000000000000000e R10: 0000000000000213 R11: 000000000000b981 R12: 0000000000000000 R13: 0000000000000001 R14: ffff986b545c3100 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff986bbfd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a6ce3e6437 CR3: 000000005a410006 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cfs_hash_for_each_relax+0x17b/0x480 [obdclass] ? cfs_hash_for_each_relax+0x172/0x480 [obdclass] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [obdclass] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x400/0x400 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 363922 Comm: ldlm_bl_01 Kdump: loaded Tainted: P OEL -------- - - 4.18.0-553.50.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] | Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_val=3 fail_loc=0x1600 Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r LustreError: 377721:0:(lfsck_engine.c:836:lfsck_master_oit_engine()) cfs_fail_timeout id 1600 sleeping for 3000ms LustreError: 377721:0:(lfsck_engine.c:836:lfsck_master_oit_engine()) Skipped 3 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_stop -M lustre-MDT0000 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 LustreError: 377721:0:(lfsck_engine.c:836:lfsck_master_oit_engine()) cfs_fail_timeout id 1600 awake | Link to test |
sanity test 804: verify agent entry for remote entry | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_01:549101] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 joydev virtio_balloon pcspkr sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_blk serio_raw virtio_net net_failover failover [last unloaded: obdecho] CPU: 1 PID: 549101 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.40.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ae b5 1e c3 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb4c582413d70 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb4c58610b008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb4c5860cf000 RSI: ffffb4c582413da8 RDI: ffff8ac940308f00 RBP: ffffffffc0da6950 R08: 0000000000000008 R09: 000000000000000e R10: ffff8ac96bc0c000 R11: ffff8ac96bc0b228 R12: 0000000000000000 R13: ffff8ac964b43b64 R14: ffff8ac940308f00 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8ac9bfd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055e0eab9841c CR3: 0000000012c10002 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x15f/0x170 [ptlrpc] ldlm_bl_thread_main+0x73d/0x880 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3e0/0x3e0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 549101 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL -------- - - 4.18.0-553.40.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds3_flakey Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 Lustre: lustre-MDT0000: Not available for connect from 10.240.23.85@tcp (stopping) Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0000: Not available for connect from 10.240.23.86@tcp (stopping) Lustre: 549121:0:(service.c:2177:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@000000008151d895 x1828382061764160/t0(0) o400-><?>@<?>:0/0 lens 224/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 job:'kworker.0' Lustre: lustre-MDT0000: Not available for connect from 10.240.23.85@tcp (stopping) Lustre: Skipped 8 previous similar messages LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0000: Not available for connect from 10.240.23.85@tcp (stopping) Lustre: Skipped 15 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.23.85@tcp (stopping) Lustre: Skipped 15 previous similar messages LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.23.86@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 14 previous similar messages Lustre: server umount lustre-MDT0000 complete LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.51@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 10 previous similar messages LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.51@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 13 previous similar messages Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: e2fsck -h Lustre: DEBUG MARKER: e2fsck -d -v -t -t -f -n /dev/mapper/mds1_flakey -m8 LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.23.87@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 22 previous similar messages LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.51@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. 
LustreError: Skipped 59 previous similar messages Lustre: 13063:0:(client.c:2336:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1743693650/real 1743693650] req@000000009896d7cd x1828382061771584/t0(0) o400->MGC10.240.28.50@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1743693694 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker.0' LustreError: 166-1: MGC10.240.28.50@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.51@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 112 previous similar messages Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1 Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov,skpath=/tmp/test-framework-keys /dev/mapper/mds1_flakey /mnt/lustre-mds1 LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc LustreError: 857804:0:(import.c:361:ptlrpc_invalidate_import()) MGS: timeout waiting for callback (1 != 0) LustreError: 857804:0:(import.c:387:ptlrpc_invalidate_import()) @@@ still on sending list req@000000005ea29fd0 x1828382061790080/t0(0) o250->MGC10.240.28.50@tcp@0@lo:26/25 lens 520/544 e 0 to 0 dl 1743693727 ref 1 fl Rpc:NQr/0/ffffffff rc 0/-1 job:'kworker.0' LustreError: 857804:0:(import.c:401:ptlrpc_invalidate_import()) MGS: Unregistering RPCs found (0). Network is sluggish? Waiting for them to error out. Lustre: Evicted from MGS (at 10.240.28.50@tcp) after server handle changed from 0x0 to 0xcd04847fe4b8f86 Lustre: MGC10.240.28.50@tcp: Connection restored to 10.240.28.50@tcp (at 0@lo) Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0000: Not available for connect from 10.240.28.51@tcp (not set up) Lustre: Skipped 16 previous similar messages Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0000: in recovery but waiting for the first client to connect Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 5 clients reconnect Lustre: lustre-MDT0000: Recovery over after 0:07, of 5 clients 5 recovered and 0 were evicted. 
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Autotest: Test running for 220 minutes (lustre-b_es-reviews_review-dne-selinux-ssk-part-1_22817.87) Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm7.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm7.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-99vm7.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-99vm7.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation mds_statfs to node 10.240.28.51@tcp failed: rc = -107 Lustre: lustre-MDT0001-osp-MDT0002: Connection to lustre-MDT0001 (at 10.240.28.51@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 1 previous similar message Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm8.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-99vm8.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds3 Lustre: Failing over lustre-MDT0002 Lustre: lustre-MDT0001-osp-MDT0000: Connection restored to (at 10.240.28.51@tcp) Lustre: Skipped 2 previous similar messages Lustre: lustre-MDT0002: Not available for connect from 10.240.28.51@tcp (stopping) Lustre: Skipped 3 previous similar messages | Link to test |
sanity test 60a: llog_test run from kernel module and test llog_reader | watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [ldlm_bl_01:44113] Modules linked in: obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 intel_rapl_msr auth_rpcgss intel_rapl_common nfsv4 crct10dif_pclmul crc32_pclmul dns_resolver nfs ghash_clmulni_intel lockd grace fscache joydev pcspkr i2c_piix4 virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_blk failover [last unloaded: llog_test] CPU: 1 PID: 44113 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.44.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 4e de b0 ee 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffbbab8174bd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffbbab8274f008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffbbab82713000 RSI: ffffbbab8174bda0 RDI: ffff995283d9f600 RBP: ffffffffc0ce0290 R08: 0000000000000785 R09: 000000000000000e R10: ffff9952b4e2b000 R11: ffff9952b4e2a65b R12: 0000000000000000 R13: 0000000000000000 R14: ffff995283d9f600 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff99533fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055d16e43dda0 CR3: 0000000070210004 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [obdclass] ? cfs_hash_for_each_relax+0x172/0x480 [obdclass] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [obdclass] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x400/0x400 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 44113 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL -------- - - 4.18.0-553.44.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] | Lustre: DEBUG MARKER: ! 
which run-llog.sh &> /dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_60 run 4146 - from kernel mode Lustre: DEBUG MARKER: test_60 run 4146 - from kernel mode Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /dev/null Lustre: DEBUG MARKER: bash run-llog.sh Lustre: 191345:0:(llog_test.c:2295:llog_test_setup()) Setup llog-test device over MGS device Lustre: 191345:0:(llog_test.c:96:llog_test_1()) 1a: create a log with name: de1119b9 Lustre: 191345:0:(llog_test.c:113:llog_test_1()) 1b: close newly-created log Lustre: 191345:0:(llog_test.c:144:llog_test_2()) 2a: re-open a log with name: de1119b9 Lustre: 191345:0:(llog_test.c:164:llog_test_2()) 2b: create a log without specified NAME & LOGID Lustre: 191345:0:(llog_test.c:182:llog_test_2()) 2b: write 1 llog records, check llh_count Lustre: 191345:0:(llog_test.c:195:llog_test_2()) 2c: re-open the log by LOGID and verify llh_count Lustre: 191345:0:(llog_test.c:242:llog_test_2()) 2d: destroy this log Lustre: 191345:0:(llog_test.c:402:llog_test_3()) 3a: write 1023 fixed-size llog records Lustre: 191345:0:(llog_test.c:366:llog_test3_process()) test3: processing records from index 501 to the end Lustre: 191345:0:(llog_test.c:376:llog_test3_process()) test3: total 525 records processed with 0 paddings Lustre: 191345:0:(llog_test.c:366:llog_test3_process()) test3: processing records from index 501 to the end Lustre: 191345:0:(llog_test.c:458:llog_test_3()) 3b: write 566 variable size llog records Lustre: 191345:0:(llog_test.c:366:llog_test3_process()) test3: processing records from index 1026 to the end Lustre: 191345:0:(llog_test.c:376:llog_test3_process()) test3: total 568 records processed with 7 paddings Lustre: 191345:0:(llog_test.c:376:llog_test3_process()) Skipped 1 previous similar message Lustre: 191345:0:(llog_test.c:366:llog_test3_process()) test3: processing records from index 1026 to the end Lustre: 191345:0:(llog_test.c:530:llog_test_3()) 3c: write records with variable size until BITMAP_SIZE, return -ENOSPC Autotest: Test running for 175 minutes (lustre-reviews_review-ldiskfs-arm_112182.38) Lustre: 191345:0:(llog_test.c:553:llog_test_3()) 3c: wrote 63962 more records before end of llog is reached Lustre: 191345:0:(llog_test.c:582:llog_test_4()) 4a: create a catalog log with name: de1119ba Lustre: 191345:0:(llog_test.c:597:llog_test_4()) 4b: write 1 record into the catalog Lustre: 191345:0:(llog_test.c:624:llog_test_4()) 4c: cancel 1 log record Lustre: 191345:0:(llog_test.c:636:llog_test_4()) 4d: write 64767 more log records Lustre: 191345:0:(llog_test.c:652:llog_test_4()) 4e: add 5 large records, one record per block Lustre: 191345:0:(llog_test.c:672:llog_test_4()) 4f: put newly-created catalog Lustre: 191345:0:(llog_test.c:771:llog_test_5()) 5a: re-open catalog by id Lustre: 191345:0:(llog_test.c:784:llog_test_5()) 5b: print the catalog entries.. we expect 2 Lustre: 191410:0:(llog_test.c:701:cat_print_cb()) seeing record at index 1 - [0x1:0x34:0x0] in log [0xa:0x10:0x0] Lustre: 191345:0:(llog_test.c:796:llog_test_5()) 5c: Cancel 64767 records, see one log zapped Lustre: 191345:0:(llog_test.c:804:llog_test_5()) 5c: print the catalog entries.. we expect 1 Lustre: 191411:0:(llog_test.c:701:cat_print_cb()) seeing record at index 2 - [0x1:0x35:0x0] in log [0xa:0x10:0x0] Lustre: 191411:0:(llog_test.c:701:cat_print_cb()) Skipped 1 previous similar message Lustre: 191345:0:(llog_test.c:816:llog_test_5()) 5d: add 1 record to the log with many canceled empty pages Lustre: 191345:0:(llog_test.c:824:llog_test_5()) 5e: print plain log entries.. 
expect 6 Lustre: 191345:0:(llog_test.c:836:llog_test_5()) 5f: print plain log entries reversely.. expect 6 Lustre: 191345:0:(llog_test.c:850:llog_test_5()) 5g: close re-opened catalog Lustre: 191345:0:(llog_test.c:880:llog_test_6()) 6a: re-open log de1119b9 using client API Lustre: MGS: non-config logname received: de1119b9 Lustre: 191345:0:(llog_test.c:912:llog_test_6()) 6b: process log de1119b9 using client API Lustre: 191345:0:(llog_test.c:916:llog_test_6()) 6b: processed 63962 records Lustre: 191345:0:(llog_test.c:923:llog_test_6()) 6c: process log de1119b9 reversely using client API Lustre: 191345:0:(llog_test.c:927:llog_test_6()) 6c: processed 63962 records Lustre: 191345:0:(llog_test.c:1075:llog_test_7()) 7a: test llog_logid_rec Lustre: 191345:0:(llog_test.c:1086:llog_test_7()) 7b: test llog_unlink64_rec Lustre: 191345:0:(llog_test.c:1097:llog_test_7()) 7c: test llog_setattr64_rec Lustre: 191345:0:(llog_test.c:1108:llog_test_7()) 7d: test llog_size_change_rec Lustre: 191345:0:(llog_test.c:1119:llog_test_7()) 7e: test llog_changelog_rec Lustre: 191345:0:(llog_test.c:1026:llog_test_7_sub()) 7_sub: records are not aligned, written 64071 from 64767 Lustre: 191345:0:(llog_test.c:1131:llog_test_7()) 7f: test llog_changelog_user_rec2 Lustre: 191345:0:(llog_test.c:1026:llog_test_7_sub()) 7_sub: records are not aligned, written 64139 from 64767 Lustre: 191345:0:(llog_test.c:1142:llog_test_7()) 7g: test llog_gen_rec Lustre: 191345:0:(llog_test.c:1153:llog_test_7()) 7h: test llog_setattr64_rec_v2 Lustre: 191345:0:(llog_test.c:1026:llog_test_7_sub()) 7_sub: records are not aligned, written 64071 from 64767 Lustre: 191345:0:(llog_test.c:1260:llog_test_8()) 8a: fill the first plain llog Lustre: 191345:0:(llog_test.c:1289:llog_test_8()) 8b: first llog [0x1:0x41:0x0] Lustre: 191345:0:(llog_test.c:1307:llog_test_8()) 8b: fill the second plain llog Lustre: 191345:0:(llog_test.c:1331:llog_test_8()) 8b: pin llog [0x1:0x43:0x0] Lustre: 191345:0:(llog_test.c:1334:llog_test_8()) 8b: clean first llog record in catalog Lustre: 191345:0:(llog_test.c:1347:llog_test_8()) 8c: corrupt first chunk in the middle Lustre: 191345:0:(llog_test.c:1350:llog_test_8()) 8c: corrupt second chunk at start Lustre: 191345:0:(llog_test.c:1353:llog_test_8()) 8d: count survived records LustreError: 191345:0:(llog.c:468:llog_verify_record()) MGS: [0xa:0x10:0x0] rec type=0 idx=0 len=0, magic is bad Lustre: 191345:0:(llog_test.c:1383:llog_test_8()) 8d: close re-opened catalog Lustre: 191345:0:(llog_test.c:1446:llog_test_9()) 9a: test llog_logid_rec Lustre: 191345:0:(llog_test.c:1430:llog_test_9_sub()) 9_sub: record type 1064553b in log 0x1:0x45:0x0 Lustre: 191345:0:(llog_test.c:1457:llog_test_9()) 9b: test llog_obd_cfg_rec Lustre: 191345:0:(llog_test.c:1468:llog_test_9()) 9c: test llog_changelog_rec Lustre: 191345:0:(llog_test.c:1480:llog_test_9()) 9d: test llog_changelog_user_rec2 Lustre: 191345:0:(llog_test.c:1581:llog_test_10()) 10a: create a catalog log with name: de1119bb Lustre: 191345:0:(llog_test.c:1611:llog_test_10()) 10b: write 64767 log records Lustre: 191345:0:(llog_test.c:1637:llog_test_10()) 10c: write 129534 more log records Lustre: 191345:0:(llog_test.c:1669:llog_test_10()) 10c: write 64767 more log records Lustre: 191345:0:(llog_cat.c:80:llog_cat_new_log()) MGS: there are no more free slots in catalog de1119bb Lustre: 191345:0:(llog_test.c:1696:llog_test_10()) 10c: wrote 64011 records then 756 failed with ENOSPC Lustre: 191345:0:(llog_test.c:1715:llog_test_10()) 10d: Cancel 64767 records, see one log zapped Lustre: 
191345:0:(llog_test.c:1729:llog_test_10()) 10d: print the catalog entries.. we expect 3 Lustre: 191423:0:(llog_test.c:701:cat_print_cb()) seeing record at index 2 - [0x1:0x4a:0x0] in log [0xa:0x11:0x0] Lustre: 191345:0:(llog_test.c:1759:llog_test_10()) 10e: write 64767 more log records Lustre: 191345:0:(llog_cat.c:80:llog_cat_new_log()) MGS: there are no more free slots in catalog de1119bb Lustre: 191345:0:(llog_cat.c:80:llog_cat_new_log()) Skipped 755 previous similar messages Lustre: 191345:0:(llog_test.c:1786:llog_test_10()) 10e: wrote 64578 records then 189 failed with ENOSPC Lustre: 191345:0:(llog_test.c:1788:llog_test_10()) 10e: print the catalog entries.. we expect 4 Lustre: 191345:0:(llog_cat.c:968:llog_cat_process_or_fork()) MGS: catlog [0xa:0x11:0x0] crosses index zero Lustre: 191345:0:(llog_test.c:701:cat_print_cb()) seeing record at index 2 - [0x1:0x4a:0x0] in log [0xa:0x11:0x0] Lustre: 191345:0:(llog_test.c:701:cat_print_cb()) Skipped 2 previous similar messages Lustre: 191345:0:(llog_test.c:1825:llog_test_10()) 10e: catalog successfully wrap around, last_idx 1, first 1 Lustre: 191345:0:(llog_test.c:1842:llog_test_10()) 10f: Cancel 64767 records, see one log zapped Lustre: 191345:0:(llog_test.c:1856:llog_test_10()) 10f: print the catalog entries.. we expect 3 Lustre: 191345:0:(llog_cat.c:968:llog_cat_process_or_fork()) MGS: catlog [0xa:0x11:0x0] crosses index zero Lustre: 191345:0:(llog_cat.c:968:llog_cat_process_or_fork()) Skipped 1 previous similar message Lustre: 191345:0:(llog_test.c:1887:llog_test_10()) 10f: write 64767 more log records Lustre: 191345:0:(llog_cat.c:80:llog_cat_new_log()) MGS: there are no more free slots in catalog de1119bb Lustre: 191345:0:(llog_cat.c:80:llog_cat_new_log()) Skipped 188 previous similar messages Lustre: 191345:0:(llog_test.c:1914:llog_test_10()) 10f: wrote 64578 records then 189 failed with ENOSPC Lustre: 191345:0:(llog_test.c:1961:llog_test_10()) 10g: Cancel 64767 records, see one log zapped Lustre: 191345:0:(llog_cat.c:968:llog_cat_process_or_fork()) MGS: catlog [0xa:0x11:0x0] crosses index zero Lustre: 191345:0:(llog_test.c:1973:llog_test_10()) 10g: print the catalog entries.. we expect 3 Lustre: 191345:0:(llog_cat.c:968:llog_cat_process_or_fork()) MGS: catlog [0xa:0x11:0x0] crosses index zero Lustre: 191345:0:(llog_test.c:701:cat_print_cb()) seeing record at index 4 - [0x1:0x4c:0x0] in log [0xa:0x11:0x0] Lustre: 191345:0:(llog_test.c:701:cat_print_cb()) Skipped 6 previous similar messages Lustre: 191345:0:(llog_test.c:2003:llog_test_10()) 10g: Cancel 64767 records, see one log zapped Lustre: 191345:0:(llog_test.c:2017:llog_test_10()) 10g: print the catalog entries.. we expect 2 Lustre: 191345:0:(llog_test.c:2055:llog_test_10()) 10g: Cancel 64767 records, see one log zapped Lustre: 191345:0:(llog_test.c:2069:llog_test_10()) 10g: print the catalog entries.. we expect 1 Lustre: 191345:0:(llog_test.c:2095:llog_test_10()) 10g: llh_cat_idx has also successfully wrapped! 
Lustre: 191425:0:(llog_test.c:1540:cat_check_old_cb()) seeing record at index 2 - [0x1:0x4e:0x0] in log [0xa:0x11:0x0] Lustre: 191345:0:(llog_test.c:2119:llog_test_10()) 10h: write 64767 more log records LustreError: 191345:0:(llog_osd.c:693:llog_osd_write_rec()) cfs_race id 1317 sleeping LustreError: 191425:0:(llog.c:638:llog_process_thread()) cfs_fail_race id 1317 waking LustreError: 191345:0:(llog_osd.c:693:llog_osd_write_rec()) cfs_fail_race id 1317 awake: rc=4490 LustreError: 191425:0:(llog.c:638:llog_process_thread()) cfs_fail_race id 1317 waking Lustre: 191425:0:(llog_test.c:1540:cat_check_old_cb()) seeing record at index 3 - [0x1:0x4f:0x0] in log [0xa:0x11:0x0] LustreError: 191345:0:(llog_osd.c:693:llog_osd_write_rec()) cfs_fail_race id 1317 waking Lustre: 191345:0:(llog_test.c:2146:llog_test_10()) 10h: wrote 64767 records then 0 failed with ENOSPC Lustre: 191345:0:(llog_test.c:2159:llog_test_10()) 10: put newly-created catalog Lustre: DEBUG MARKER: /usr/sbin/lctl dk Lustre: DEBUG MARKER: which llog_reader 2> /dev/null Lustre: DEBUG MARKER: ls -d /usr/sbin/llog_reader Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 Lustre: lustre-MDT0000: Not available for connect from 10.240.27.151@tcp (stopping) | Link to test |
sanity test 133f: Check reads/writes of client lustre proc files with bad area io | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_01:12733] Modules linked in: obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 intel_rapl_msr dns_resolver intel_rapl_common nfs crct10dif_pclmul lockd grace fscache crc32_pclmul ghash_clmulni_intel pcspkr i2c_piix4 joydev virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net net_failover crc32c_intel serio_raw failover virtio_blk [last unloaded: dm_flakey] CPU: 1 PID: 12733 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.44.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 4e 0e 7d c9 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffc108c103fd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffc108c1f14008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffc108c1ee1000 RSI: ffffc108c103fda0 RDI: ffff9b797860de00 RBP: ffffffffc0c76290 R08: 0000000000000b06 R09: 000000000000000e R10: ffff9b7969e99000 R11: ffff9b7969e989dc R12: 0000000000000000 R13: 0000000000000000 R14: ffff9b797860de00 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9b79ffd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055ec238a14c0 CR3: 000000002f010005 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [obdclass] ? cfs_hash_for_each_relax+0x172/0x480 [obdclass] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [obdclass] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x400/0x400 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 12733 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL -------- - - 4.18.0-553.44.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] | Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000: Not available for connect from 10.240.28.47@tcp (stopping) Lustre: Skipped 3 previous similar messages Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 10 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping) Lustre: Skipped 11 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.28.47@tcp (stopping) Lustre: Skipped 1 previous similar message LustreError: 13023:0:(ldlm_lib.c:1103:target_handle_connect()) lustre-MDT0000: not available for connect from 10.240.24.213@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 13023:0:(ldlm_lib.c:1103:target_handle_connect()) Skipped 10 previous similar messages LustreError: 391633:0:(ldlm_lib.c:1103:target_handle_connect()) lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 391633:0:(ldlm_lib.c:1103:target_handle_connect()) Skipped 4 previous similar messages Lustre: server umount lustre-MDT0000 complete LustreError: 13024:0:(ldlm_lib.c:1103:target_handle_connect()) lustre-MDT0000: not available for connect from 10.240.28.47@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 13024:0:(ldlm_lib.c:1103:target_handle_connect()) Skipped 1 previous similar message Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey LustreError: 390261:0:(ldlm_lib.c:1103:target_handle_connect()) lustre-MDT0000: not available for connect from 10.240.28.47@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 390261:0:(ldlm_lib.c:1103:target_handle_connect()) Skipped 12 previous similar messages Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1 Lustre: DEBUG MARKER: modprobe -r dm-flakey LustreError: 14163:0:(ldlm_lockd.c:2594:ldlm_cancel_handler()) ldlm_cancel from 10.240.28.47@tcp arrived at 1743633655 with bad export cookie 7738126123090489695 LustreError: 391633:0:(ldlm_lib.c:1103:target_handle_connect()) lustre-MDT0000: not available for connect from 10.240.28.47@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. 
LustreError: 391633:0:(ldlm_lib.c:1103:target_handle_connect()) Skipped 13 previous similar messages Lustre: 12688:0:(client.c:2346:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1743633643/real 1743633643] req@ffff9b7984109380 x1828323340011264/t0(0) o400->MGC10.240.28.46@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1743633659 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 LustreError: MGC10.240.28.46@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail LustreError: 389970:0:(ldlm_lib.c:1103:target_handle_connect()) lustre-MDT0000: not available for connect from 10.240.28.47@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 389970:0:(ldlm_lib.c:1103:target_handle_connect()) Skipped 32 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3 Lustre: lustre-MDT0002: Not available for connect from 10.240.24.213@tcp (stopping) Lustre: Skipped 3 previous similar messages Lustre: lustre-MDT0002: Not available for connect from 10.240.28.47@tcp (stopping) Lustre: Skipped 16 previous similar messages LustreError: 389970:0:(ldlm_lib.c:1103:target_handle_connect()) lustre-MDT0000: not available for connect from 10.240.28.47@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 389970:0:(ldlm_lib.c:1103:target_handle_connect()) Skipped 33 previous similar messages | Link to test |
sanity test 256: Check llog delete for empty and not full state | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ldlm_bl_01:640547] Modules linked in: dm_flakey obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover failover virtio_blk [last unloaded: dm_flakey] CPU: 0 PID: 640547 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.44.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 de a5 70 d3 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffbff9025ebd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffbff902cea008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffbff902cad000 RSI: ffffbff9025ebda0 RDI: ffff9f19b373ed00 RBP: ffffffffc0ee62e0 R08: 0000000000000008 R09: 000000000000000e R10: ffff9f19f8174000 R11: ffff9f19f8173658 R12: 0000000000000000 R13: 0000000000000000 R14: ffff9f19b373ed00 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9f1a3fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000557e4c8c5488 CR3: 000000001ac10005 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [obdclass] ? cfs_hash_for_each_relax+0x172/0x480 [obdclass] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [obdclass] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x400/0x400 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 640547 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL -------- - - 4.18.0-553.44.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param mdd.lustre-MDT0000.changelog_mask -n Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0000.changelog_mask=+hsm Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 changelog register -n Lustre: lustre-MDD0000: changelog on Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 Lustre: lustre-MDT0000: Not available for connect from 10.240.26.144@tcp (stopping) Lustre: Skipped 7 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.26.142@tcp (stopping) Lustre: Skipped 7 previous similar messages | Link to test |
sanity test 256: Check llog delete for empty and not full state | watchdog: BUG: soft lockup - CPU#1 stuck for 21s! [ldlm_bl_05:743397] Modules linked in: dm_flakey obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 virtio_balloon pcspkr joydev sunrpc ext4 mbcache jbd2 ata_generic ata_piix virtio_net libata crc32c_intel serio_raw net_failover virtio_blk failover [last unloaded: dm_flakey] CPU: 1 PID: 743397 Comm: ldlm_bl_05 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.40.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ae ee 43 e9 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffa4950401bd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa49508bc4008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa49508bb7000 RSI: ffffa4950401bda0 RDI: ffff96137ea3e200 RBP: 0000000000000000 R08: 0000000000000008 R09: 000000000000000e R10: ffff961376a68000 R11: ffff961376a67c18 R12: 0000000000000000 R13: 0000000000000001 R14: ffff96137ea3e200 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff9613fcd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055631275d6e0 CR3: 00000000ae010002 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cfs_hash_for_each_relax+0x17b/0x480 [obdclass] ? cfs_hash_for_each_relax+0x172/0x480 [obdclass] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [obdclass] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x400/0x400 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 743397 Comm: ldlm_bl_05 Kdump: loaded Tainted: G OEL -------- - - 4.18.0-553.40.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass] | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param mdd.lustre-MDT0000.changelog_mask -n Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0000.changelog_mask=+hsm Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 changelog register -n Lustre: lustre-MDD0000: changelog on Lustre: Skipped 1 previous similar message Lustre: DEBUG MARKER: /usr/sbin/lctl get_param mdd.lustre-MDT0002.changelog_mask -n Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0002.changelog_mask=+hsm Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0002 changelog register -n Lustre: lustre-MDD0002: changelog on Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 Lustre: lustre-MDT0000: Not available for connect from 10.240.30.120@tcp (stopping) Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 5 previous similar messages LustreError: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107 Lustre: lustre-MDT0000: Not available for connect from 10.240.30.120@tcp (stopping) Lustre: Skipped 17 previous similar messages | Link to test |
sanity test 160a: changelog sanity | watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ldlm_bl_02:341023] Modules linked in: dm_flakey obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel libata virtio_net serio_raw net_failover failover virtio_blk [last unloaded: dm_flakey] CPU: 0 PID: 341023 Comm: ldlm_bl_02 Kdump: loaded Tainted: G W OE -------- - - 4.18.0-553.37.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 0e ff 8e dd 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0000:ffffbc280111bd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffbc2802e91008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffbc2802e55000 RSI: ffffbc280111bda0 RDI: ffff9c6aab4b5800 RBP: ffffffffc0b54940 R08: 0000000000000784 R09: 000000000000000e R10: ffff9c6a70a7a000 R11: ffff9c6a70a7965a R12: 0000000000000000 R13: 0000000000000000 R14: ffff9c6aab4b5800 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9c6affc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f209fb4c000 CR3: 0000000038810003 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? ldlm_lock_mode_downgrade+0x310/0x310 [ptlrpc] ? ldlm_lock_mode_downgrade+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3e0/0x3e0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 341023 Comm: ldlm_bl_02 Kdump: loaded Tainted: G W OEL -------- - - 4.18.0-553.37.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param mdd.lustre-MDT0000.changelog_mask -n Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0000.changelog_mask=+hsm Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 changelog register -n Lustre: lustre-MDD0000: changelog on Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_mask=-MKDIR Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_mask=-CLOSE Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_mask=+MKDIR Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_mask=+CLOSE Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 Lustre: lustre-MDT0000: Not available for connect from 10.240.28.169@tcp (stopping) Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.28.170@tcp (stopping) Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0000: Not available for connect from 10.240.28.168@tcp (stopping) Lustre: Skipped 7 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.28.170@tcp (stopping) | Link to test |
sanity test 804: verify agent entry for remote entry | watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [ldlm_bl_03:389362] Modules linked in: dm_flakey obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 virtio_balloon pcspkr joydev sunrpc ext4 ata_generic mbcache jbd2 ata_piix libata virtio_net crc32c_intel net_failover virtio_blk failover serio_raw [last unloaded: dm_flakey] CPU: 1 PID: 389362 Comm: ldlm_bl_03 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.37.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 0e ff 33 f1 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffa67546fdfd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa67543e93008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa67543e8f000 RSI: ffffa67546fdfda0 RDI: ffff8c72f5901c00 RBP: ffffffffc0bbf940 R08: 0000000000000ce0 R09: 000000000000000e R10: ffff8c72ff850000 R11: ffff8c72ff84fbb6 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8c72f5901c00 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8c737fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fea54f8b000 CR3: 0000000083c10003 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? ldlm_lock_mode_downgrade+0x310/0x310 [ptlrpc] ? ldlm_lock_mode_downgrade+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3e0/0x3e0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 389362 Comm: ldlm_bl_03 Kdump: loaded Tainted: G OEL -------- - - 4.18.0-553.37.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds4_flakey LustreError: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.28.45@tcp failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: lustre-MDT0000-osp-MDT0003: Connection to lustre-MDT0000 (at 10.240.28.45@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: 11161:0:(client.c:2346:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1740006744/real 1740006744] req@ffff8c72e3ed16c0 x1824483322337408/t0(0) o400->MGC10.240.28.45@tcp@10.240.28.45@tcp:26/25 lens 224/224 e 0 to 1 dl 1740006760 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 LustreError: MGC10.240.28.45@tcp: Connection to MGS (at 10.240.28.45@tcp) was lost; in progress operations using this service will fail Lustre: lustre-MDT0000-lwp-MDT0003: Connection restored to (at 10.240.28.45@tcp) Lustre: Evicted from MGS (at 10.240.28.45@tcp) after server handle changed from 0xadc9f1bca10a4cb7 to 0xadc9f1bca126224e Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm2.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-99vm2.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 Lustre: lustre-MDT0001: Not available for connect from 10.240.27.159@tcp (stopping) Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && LustreError: 392596:0:(ldlm_lib.c:1094:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.27.158@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 392596:0:(ldlm_lib.c:1094:target_handle_connect()) Skipped 4 previous similar messages Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: e2fsck -h Lustre: DEBUG MARKER: e2fsck -d -v -t -t -f -n /dev/mapper/mds2_flakey -m8 LustreError: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 3 previous similar messages LustreError: 353955:0:(ldlm_lib.c:1094:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.29.130@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 353955:0:(ldlm_lib.c:1094:target_handle_connect()) Skipped 21 previous similar messages LustreError: 352978:0:(ldlm_lib.c:1094:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.27.159@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. 
LustreError: 352978:0:(ldlm_lib.c:1094:target_handle_connect()) Skipped 39 previous similar messages Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2 Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,acl,no_mbcache,nodelalloc Lustre: 547460:0:(mgc_request_server.c:553:mgc_llog_local_copy()) MGC10.240.28.45@tcp: no remote llog for lustre-sptlrpc, check MGS config Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 5 clients reconnect Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm3.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm3.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-99vm3.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-99vm3.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: lustre-MDT0001: Recovery over after 0:04, of 5 clients 5 recovered and 0 were evicted. Lustre: lustre-MDT0001-osp-MDT0003: Connection restored to 10.240.28.46@tcp (at 0@lo) Lustre: Skipped 4 previous similar messages Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null LustreError: lustre-MDT0002-osp-MDT0003: operation mds_statfs to node 10.240.28.45@tcp failed: rc = -107 Lustre: lustre-MDT0002-osp-MDT0003: Connection to lustre-MDT0002 (at 10.240.28.45@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: lustre-MDT0002-osp-MDT0001: Connection restored to (at 10.240.28.45@tcp) Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm2.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-99vm2.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 Lustre: lustre-MDT0003: Not available for connect from 10.240.27.159@tcp (stopping) LustreError: lustre-MDT0003-osp-MDT0001: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: lustre-MDT0003-osp-MDT0001: Connection to lustre-MDT0003 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0003: Not available for connect from 10.240.28.45@tcp (stopping) Lustre: Skipped 11 previous similar messages | Link to test |
sanity test 160a: changelog sanity | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_01:372798] Modules linked in: dm_flakey obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon pcspkr joydev i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_blk virtio_net net_failover failover [last unloaded: dm_flakey] CPU: 1 PID: 372798 Comm: ldlm_bl_01 Kdump: loaded Tainted: G W OE -------- - - 4.18.0-553.37.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 0e 1f e8 c3 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffaf49c190bd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffaf49c2273008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffaf49c2233000 RSI: ffffaf49c190bda0 RDI: ffff9e64eccabb00 RBP: ffffffffc0b59950 R08: 0000000000000783 R09: 000000000000000e R10: ffff9e6513683000 R11: ffff9e6513682659 R12: 0000000000000000 R13: 0000000000000000 R14: ffff9e64eccabb00 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9e657fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005611657df180 CR3: 0000000041410003 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? ldlm_lock_mode_downgrade+0x310/0x310 [ptlrpc] ? ldlm_lock_mode_downgrade+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3e0/0x3e0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 372798 Comm: ldlm_bl_01 Kdump: loaded Tainted: G W OEL -------- - - 4.18.0-553.37.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param mdd.lustre-MDT0000.changelog_mask -n Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0000.changelog_mask=+hsm Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 changelog register -n Lustre: lustre-MDD0000: changelog on Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_mask=-MKDIR Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_mask=-CLOSE Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_mask=+MKDIR Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_mask=+CLOSE Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 Lustre: lustre-MDT0000: Not available for connect from 10.240.26.141@tcp (stopping) | Link to test |
sanity test 818: unlink with failed llog | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_05:716348] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon joydev pcspkr i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_net net_failover virtio_blk failover [last unloaded: obdecho] CPU: 1 PID: 716348 Comm: ldlm_bl_05 Kdump: loaded Tainted: G W OE -------- - - 4.18.0-553.16.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e 0c 63 cc 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffbc1b0256fd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffbc1b03b79008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffbc1b03b51000 RSI: ffffbc1b0256fda0 RDI: ffff9ea4b1132f00 RBP: ffffffffc0d5b360 R08: 0000000000000008 R09: 000000000000000e R10: ffff9ea4ea451000 R11: ffff9ea4ea450b3c R12: 0000000000000000 R13: 0000000000000000 R14: ffff9ea4b1132f00 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9ea53fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd497dc4048 CR3: 0000000062410004 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 716348 Comm: ldlm_bl_05 Kdump: loaded Tainted: G W OEL -------- - - 4.18.0-553.16.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 Lustre: lustre-MDT0000: Not available for connect from 10.240.24.214@tcp (stopping) Lustre: Skipped 28 previous similar messages LustreError: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 5 previous similar messages Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 1 previous similar message | Link to test |
sanity-pcc test 13c: Check auto RW-PCC create caching for UID/GID/ProjID/fname rule | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_01:2851513] Modules linked in: nfsd nfs_acl osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc i2c_piix4 virtio_balloon pcspkr joydev dm_mod ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net serio_raw crc32c_intel virtio_blk net_failover failover [last unloaded: libcfs] CPU: 1 PID: 2851513 Comm: ldlm_bl_01 Kdump: loaded Tainted: G W OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 9e 08 e4 e4 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffa62f41cebd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa62f42699008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa62f42689000 RSI: ffffa62f41cebda0 RDI: ffff9a18398b2c00 RBP: 0000000000000000 R08: 0000000000000008 R09: 000000000000000e R10: 00000000000000d1 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000001 R14: ffff9a18398b2c00 R15: 0000000000000005 FS: 0000000000000000(0000) GS:ffff9a18bfd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4ccf04b010 CR3: 000000004b410005 CR4: 00000000001706e0 Call Trace: ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 2851513 Comm: ldlm_bl_01 Kdump: loaded Tainted: G W OEL --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x51/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: lfs --list-commands Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000: Not available for connect from 10.240.25.235@tcp (stopping) Lustre: Skipped 54 previous similar messages | Link to test |
sanityn test complete, duration 10014 sec | watchdog: BUG: soft lockup - CPU#1 stuck for 21s! [ldlm_bl_02:249319] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon pcspkr joydev i2c_piix4 dm_mod ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel libata virtio_net serio_raw net_failover failover virtio_blk CPU: 1 PID: 249319 Comm: ldlm_bl_02 Kdump: loaded Tainted: P W OE -------- - - 4.18.0-553.16.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e dc 7e da 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffa5e30189fd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa5e303c6b008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa5e303c55000 RSI: ffffa5e30189fda0 RDI: ffff9ad0ceebe800 RBP: ffffffffc1111430 R08: 000000000000066f R09: 000000000000000e R10: ffff9ad0a6d1d000 R11: ffff9ad0a6d1c66d R12: 0000000000000000 R13: 0000000000000000 R14: ffff9ad0ceebe800 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9ad13fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000562f00e82008 CR3: 000000004a810002 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 249319 Comm: ldlm_bl_02 Kdump: loaded Tainted: P W OEL -------- - - 4.18.0-553.16.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? syscall_return_via_sysret+0x6e/0x94 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000: Not available for connect from 10.240.24.50@tcp (stopping) Lustre: Skipped 3 previous similar messages Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 1 previous similar message | Link to test |
watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [ldlm_bl_02:549925] Modules linked in: dm_flakey obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic crc32c_intel ata_piix virtio_net serio_raw net_failover virtio_blk failover libata [last unloaded: dm_flakey] CPU: 1 PID: 549925 Comm: ldlm_bl_02 Kdump: loaded Tainted: G W OE -------- - - 4.18.0-553.8.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e a6 43 fb 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffff9c8d818c3d68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffff9c8d84e20008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffff9c8d84e1f000 RSI: ffff9c8d818c3da0 RDI: ffff8982b9dfa000 RBP: ffffffffc0ca53f0 R08: 0000000000000008 R09: 000000000000000e R10: ffff8982aede8000 R11: ffff8982aede7659 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8982b9dfa000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff89833fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8c4f9ea000 CR3: 0000000055c10006 CR4: 00000000000606e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 549925 Comm: ldlm_bl_02 Kdump: loaded Tainted: G W OEL -------- - - 4.18.0-553.8.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x11/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Link to test | ||
sanity test 804: verify agent entry for remote entry | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_01:346316] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 joydev pcspkr virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel libata serio_raw virtio_net net_failover failover virtio_blk [last unloaded: dm_flakey] CPU: 1 PID: 346316 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 de 61 14 d7 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb907c19f3d68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb907c523e008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb907c5203000 RSI: ffffb907c19f3da0 RDI: ffff9aa112ab0300 RBP: ffffffffc0c863e0 R08: 000000000000077e R09: 000000000000000e R10: ffff9aa0f29e5000 R11: ffff9aa0f29e4654 R12: 0000000000000000 R13: 0000000000000000 R14: ffff9aa112ab0300 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9aa17fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f75735373e0 CR3: 0000000063210003 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6e4/0x960 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 346316 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL -------- - - 4.18.0-553.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x11/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds4_flakey LustreError: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.28.48@tcp failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug all all LustreError: MGC10.240.28.48@tcp: Connection to MGS (at 10.240.28.48@tcp) was lost; in progress operations using this service will fail Lustre: Evicted from MGS (at 10.240.28.48@tcp) after server handle changed from 0xc6782c38a5f5aba0 to 0xc6782c38a60ce77e Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: e2fsck -h Lustre: DEBUG MARKER: e2fsck -d -v -t -t -f -n /dev/mapper/mds2_flakey -m8 LustreError: 346324:0:(ldlm_lib.c:1112:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.28.9@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 346324:0:(ldlm_lib.c:1112:target_handle_connect()) Skipped 20 previous similar messages Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 4 previous similar messages Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2 Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. 
Opts: user_xattr,acl,no_mbcache,nodelalloc Lustre: lustre-MDT0001: Not available for connect from 10.240.28.8@tcp (not set up) Lustre: Skipped 9 previous similar messages Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 5 clients reconnect Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null LustreError: lustre-MDT0002-osp-MDT0001: operation mds_statfs to node 10.240.28.48@tcp failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: lustre-MDT0001: Recovery over after 0:04, of 5 clients 5 recovered and 0 were evicted. Lustre: lustre-MDT0001-osp-MDT0003: Connection restored to (at 0@lo) Lustre: Skipped 5 previous similar messages Lustre: lustre-MDT0001: haven't heard from client lustre-MDT0002-mdtlov_UUID (at 10.240.28.48@tcp) in 31 seconds. I think it's dead, and I am evicting it. exp ffff9aa0f7665c00, cur 1719496597 expire 1719496567 last 1719496566 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 Lustre: lustre-MDT0003: Not available for connect from 10.240.28.9@tcp (stopping) LustreError: lustre-MDT0003-osp-MDT0001: operation mds_statfs to node 0@lo failed: rc = -107 | Link to test |
sanity test 160h: changelog gc thread stop upon umount, orphan records delete | watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ldlm_bl_02:277952] Modules linked in: obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr virtio_balloon joydev i2c_piix4 ext4 mbcache ata_generic jbd2 ata_piix libata virtio_net crc32c_intel net_failover serio_raw virtio_blk failover [last unloaded: lnet_selftest] CPU: 0 PID: 277952 Comm: ldlm_bl_02 Kdump: loaded Tainted: P OE --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 86 a4 db ec 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0000:ffff9da842c1fd70 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffff9da844809008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffff9da8447eb000 RSI: ffff9da842c1fda0 RDI: ffff8c67036b6300 RBP: ffffffffc0ff0d50 R08: 00000000000003bd R09: ffff9da842c1fc48 R10: ffff9da842c1fe28 R11: ffff8c6742bb73bb R12: 0000000000000000 R13: 0000000000000000 R14: ffff8c67036b6300 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8c67bfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005557984b9c70 CR3: 0000000018c10001 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] ? cfs_hash_for_each_relax+0x173/0x460 [libcfs] ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x172/0x180 [ptlrpc] ldlm_bl_thread_main+0x70c/0x930 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 277952 Comm: ldlm_bl_02 Kdump: loaded Tainted: P OEL --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x11/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param mdd.lustre-MDT0001.changelog_mask -n Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0001.changelog_mask=+hsm Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 changelog_register -n Lustre: DEBUG MARKER: /usr/sbin/lctl get_param mdd.lustre-MDT0003.changelog_mask -n Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0003.changelog_mask=+hsm Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 changelog_register -n Lustre: DEBUG MARKER: /usr/sbin/lctl get_param mdd.lustre-MDT0001.changelog_mask -n Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0001.changelog_mask=+hsm Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 changelog_register -n Lustre: DEBUG MARKER: /usr/sbin/lctl get_param mdd.lustre-MDT0003.changelog_mask -n Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0003.changelog_mask=+hsm Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 changelog_register -n Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_max_idle_time=10 Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_gc=1 Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.*.changelog_min_gc_interval=2 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0003.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0003.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0003.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0003.changelog_users Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x1316 Lustre: *** cfs_fail_loc=1316, val=0*** Lustre: 277959:0:(mdd_dir.c:895:mdd_changelog_store()) lustre-MDD0001: simulate starting changelog garbage collection Lustre: 277959:0:(mdd_dir.c:895:mdd_changelog_store()) Skipped 1 previous similar message Lustre: 314564:0:(mdd_trans.c:160:mdd_chlg_garbage_collect()) lustre-MDD0001: force deregister of changelog user cl11 idle for 23s with 3 unprocessed records Lustre: 314564:0:(mdd_trans.c:160:mdd_chlg_garbage_collect()) Skipped 1 previous similar message Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 LustreError: 11-0: lustre-MDT0002-osp-MDT0001: operation mds_statfs to node 10.240.28.48@tcp failed: rc = -107 Lustre: lustre-MDT0002-osp-MDT0001: Connection to lustre-MDT0002 (at 10.240.28.48@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 2 previous similar messages LustreError: 11-0: lustre-MDT0000-lwp-MDT0001: operation mds_disconnect to node 10.240.28.48@tcp failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: Failing over lustre-MDT0001 LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation mds_disconnect to node 
10.240.28.48@tcp failed: rc = -107 Lustre: lustre-MDT0001: Not available for connect from 10.240.24.151@tcp (stopping) Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0001: Not available for connect from 10.240.27.49@tcp (stopping) Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0001: Not available for connect from 10.240.24.152@tcp (stopping) Lustre: Skipped 7 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 10.240.24.151@tcp (stopping) Lustre: lustre-MDT0001: Not available for connect from 10.240.24.151@tcp (stopping) Lustre: Skipped 10 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 10.240.24.152@tcp (stopping) Lustre: Skipped 21 previous similar messages | Link to test |
sanity test 427: Failed DNE2 update request shouldn't corrupt updatelog | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ldlm_bl_02:327878] Modules linked in: dm_flakey obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon sunrpc i2c_piix4 ext4 mbcache jbd2 ata_generic crc32c_intel virtio_net ata_piix libata net_failover failover serio_raw virtio_blk [last unloaded: dm_flakey] CPU: 0 PID: 327878 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OE --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 86 64 81 e8 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffff9f6881633d70 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffff9f6888672008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffff9f6888641000 RSI: ffff9f6881633da0 RDI: ffff925438d0b900 RBP: ffffffffc0d34d50 R08: 0000000000000882 R09: ffff9f6881633c48 R10: ffff9f6881633e28 R11: ffff9254371f4880 R12: 0000000000000000 R13: 0000000000000000 R14: ffff925438d0b900 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9254bfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005633fc494020 CR3: 0000000081410006 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] ? cfs_hash_for_each_relax+0x173/0x460 [libcfs] ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x172/0x180 [ptlrpc] ldlm_bl_thread_main+0x70c/0x930 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 327878 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OEL --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x11/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] | Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x80001708 LustreError: 417350:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 1708 sleeping LustreError: 417348:0:(libcfs_fail.h:180:cfs_race()) cfs_fail_race id 1708 waking LustreError: 417350:0:(libcfs_fail.h:178:cfs_race()) cfs_fail_race id 1708 awake: rc=5000 LustreError: 417348:0:(libcfs_fail.h:180:cfs_race()) Skipped 4 previous similar messages LustreError: 327883:0:(libcfs_fail.h:180:cfs_race()) cfs_fail_race id 1708 waking Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 Lustre: lustre-MDT0001: Not available for connect from 10.240.25.172@tcp (stopping) Lustre: lustre-MDT0001: Not available for connect from 10.240.25.173@tcp (stopping) Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 50 previous similar messages | Link to test |
sanity test 427: Failed DNE2 update request shouldn't corrupt updatelog | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ldlm_bl_02:329332] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul sunrpc ghash_clmulni_intel virtio_balloon joydev i2c_piix4 pcspkr dm_mod ext4 mbcache jbd2 ata_generic virtio_net ata_piix libata crc32c_intel net_failover serio_raw failover virtio_blk [last unloaded: dm_flakey] CPU: 0 PID: 329332 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 66 56 4d f4 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffff9d29819fbd70 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffff9d298804a008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffff9d2988041000 RSI: ffff9d29819fbda0 RDI: ffff891babff6f00 RBP: ffffffffc0e5ed50 R08: 00000000000006f3 R09: ffff9d29819fbc48 R10: ffff9d29819fbe28 R11: ffff891bae5f26f1 R12: 0000000000000000 R13: 0000000000000000 R14: ffff891babff6f00 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff891c3fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f13f9ce42e0 CR3: 0000000081810003 CR4: 00000000001706f0 Call Trace: ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x172/0x180 [ptlrpc] ldlm_bl_thread_main+0x70c/0x930 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 329332 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OEL --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x51/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] | Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x80001708 LustreError: 419527:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 1708 sleeping LustreError: 419525:0:(libcfs_fail.h:180:cfs_race()) cfs_fail_race id 1708 waking LustreError: 419527:0:(libcfs_fail.h:178:cfs_race()) cfs_fail_race id 1708 awake: rc=4998 LustreError: 329339:0:(libcfs_fail.h:180:cfs_race()) cfs_fail_race id 1708 waking Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 LustreError: 419563:0:(llog_cat.c:753:llog_cat_cancel_arr_rec()) lustre-MDT0000-osp-MDT0001: fail to cancel 1 llog-records: rc = -116 LustreError: 419563:0:(llog_cat.c:753:llog_cat_cancel_arr_rec()) Skipped 2 previous similar messages LustreError: 419563:0:(llog_cat.c:789:llog_cat_cancel_records()) lustre-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116 LustreError: 419563:0:(llog_cat.c:789:llog_cat_cancel_records()) Skipped 2 previous similar messages Lustre: Failing over lustre-MDT0001 Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 10 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0001: Not available for connect from 10.240.28.44@tcp (stopping) Lustre: lustre-MDT0001: Not available for connect from 10.240.30.192@tcp (stopping) Lustre: Skipped 11 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 12 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 10.240.30.193@tcp (stopping) Lustre: Skipped 12 previous similar messages | Link to test |
sanity test 133f: Check reads/writes of client lustre proc files with bad area io | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ldlm_bl_04:407559] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon i2c_piix4 joydev pcspkr sunrpc ext4 mbcache jbd2 ata_generic virtio_net crc32c_intel ata_piix libata serio_raw net_failover failover virtio_blk [last unloaded: dm_flakey] CPU: 0 PID: 407559 Comm: ldlm_bl_04 Kdump: loaded Tainted: G OE --------- - - 4.18.0-513.18.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 8e a6 28 da 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffa6a3c3e17d68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa6a3c554f008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa6a3c5539000 RSI: ffffa6a3c3e17da0 RDI: ffff9b31c0fafc00 RBP: ffffffffc0d236f0 R08: 00000000000004f7 R09: 000000000000000e R10: 000000000000000f R11: ffff9b31c41f43cd R12: 0000000000000000 R13: 0000000000000000 R14: ffff9b31c0fafc00 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9b323fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdd45f7d000 CR3: 000000000f610001 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6d7/0x950 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 407559 Comm: ldlm_bl_04 Kdump: loaded Tainted: G OEL --------- - - 4.18.0-513.18.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x11/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000: Not available for connect from 10.240.28.48@tcp (stopping) LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0000: Not available for connect from 10.240.26.145@tcp (stopping) Lustre: Skipped 9 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.28.48@tcp (stopping) Lustre: Skipped 3 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.26.145@tcp (stopping) Lustre: Skipped 9 previous similar messages Lustre: server umount lustre-MDT0000 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.26.145@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 21 previous similar messages Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.48@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 2 previous similar messages Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1 Lustre: DEBUG MARKER: modprobe -r dm-flakey LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.26.145@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 10 previous similar messages LustreError: 12998:0:(ldlm_lockd.c:2594:ldlm_cancel_handler()) ldlm_cancel from 10.240.28.48@tcp arrived at 1712940342 with bad export cookie 8664967581398574734 LustreError: 12998:0:(ldlm_lockd.c:2594:ldlm_cancel_handler()) Skipped 4 previous similar messages LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation mds_statfs to node 10.240.28.48@tcp failed: rc = -107 LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.26.145@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 12 previous similar messages Lustre: 11760:0:(client.c:2340:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1712940337/real 1712940337] req@ffff9b31be22f740 x1796133276873536/t0(0) o400->MGC10.240.28.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1712940353 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 LustreError: 166-1: MGC10.240.28.47@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.48@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. 
LustreError: Skipped 27 previous similar messages Autotest: Test running for 240 minutes (lustre-reviews_review-dne-selinux-ssk-part-1_104115.32) LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.26.145@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 37 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3 Lustre: lustre-MDT0002: Not available for connect from 10.240.28.48@tcp (stopping) Lustre: Skipped 6 previous similar messages | Link to test |
sanity test 427: Failed DNE2 update request shouldn't corrupt updatelog | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_03:644106] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon pcspkr joydev i2c_piix4 ext4 mbcache jbd2 ata_generic crc32c_intel ata_piix virtio_net serio_raw virtio_blk libata net_failover failover CPU: 1 PID: 644106 Comm: ldlm_bl_03 Kdump: loaded Tainted: P OE --------- - - 4.18.0-513.9.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 2e 72 ec e3 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffa4964875bd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa4964c507008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa4964c4f9000 RSI: ffffa4964875bda0 RDI: ffff8c86c04e4000 RBP: ffffffffc133a2c0 R08: 0000000000000008 R09: 000000000000000e R10: ffffa4964875bd68 R11: 0000000000000010 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8c86c04e4000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8c873fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055bb6f998ec0 CR3: 000000008ac10002 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6d7/0x950 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 644106 Comm: ldlm_bl_03 Kdump: loaded Tainted: P OEL --------- - - 4.18.0-513.9.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x51/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x80001708 LustreError: 839930:0:(osp_md_object.c:1187:osp_write_interpreter()) cfs_race id 1708 sleeping LustreError: 839928:0:(osp_md_object.c:1187:osp_write_interpreter()) cfs_fail_race id 1708 waking LustreError: 839928:0:(osp_md_object.c:1187:osp_write_interpreter()) Skipped 4 previous similar messages LustreError: 839930:0:(osp_md_object.c:1187:osp_write_interpreter()) cfs_fail_race id 1708 awake: rc=4995 LustreError: 642529:0:(update_trans.c:1031:top_trans_stop()) cfs_fail_race id 1708 waking Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 Lustre: lustre-MDT0001: Not available for connect from 10.240.28.48@tcp (stopping) Lustre: lustre-MDT0001: Not available for connect from 10.240.28.48@tcp (stopping) Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: lustre-MDT0001: Not available for connect from 10.240.24.250@tcp (stopping) Lustre: Skipped 3 previous similar messages | Link to test |
racer test complete, duration 400 sec | watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [ldlm_bl_02:27673] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common joydev pcspkr crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic crc32c_intel ata_piix libata virtio_net serio_raw virtio_blk net_failover failover [last unloaded: dm_flakey] CPU: 1 PID: 27673 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 1 previous similar message RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e b2 ef f0 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb068414a3d68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb06842600008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb068425f5000 RSI: ffffb068414a3da0 RDI: ffff8fd5f7f7e800 RBP: ffffffffc0e97ce0 R08: 000000000000021c R09: 000000000000000e R10: ffffb068414a3e28 R11: ffff8fd60064121a R12: 0000000000000000 R13: 0000000000000000 R14: ffff8fd5f7f7e800 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8fd67fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005606c8718a70 CR3: 000000001b810005 CR4: 00000000001706e0 Call Trace: ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6df/0x940 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 27673 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OEL --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x51/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: 10717:0:(client.c:2338:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1709134192/real 1709134192] req@000000009a4a5e12 x1792133931430912/t0(0) o400->MGC10.240.28.46@tcp@10.240.28.46@tcp:26/25 lens 224/224 e 0 to 1 dl 1709134208 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 LustreError: 166-1: MGC10.240.28.46@tcp: Connection to MGS (at 10.240.28.46@tcp) was lost; in progress operations using this service will fail Lustre: lustre-MDT0000-osp-MDT0003: Connection to lustre-MDT0000 (at 10.240.28.46@tcp) was lost; in progress operations using this service will wait for recovery to complete LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.240.28.46@tcp failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: MGC10.240.28.46@tcp: Connection restored to (at 10.240.28.46@tcp) Lustre: 10717:0:(client.c:2338:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1709134218/real 1709134218] req@000000001582e504 x1792133931447168/t0(0) o400->MGC10.240.28.46@tcp@10.240.28.46@tcp:26/25 lens 224/224 e 0 to 1 dl 1709134234 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 LustreError: 166-1: MGC10.240.28.46@tcp: Connection to MGS (at 10.240.28.46@tcp) was lost; in progress operations using this service will fail Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 3 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 15 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: 10716:0:(client.c:2338:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1709134388/real 1709134388] req@000000000be96df7 x1792133931541056/t0(0) o400->lustre-OST0000-osc-MDT0001@10.240.25.137@tcp:28/4 lens 224/224 e 0 to 1 dl 1709134404 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 Lustre: lustre-OST0000-osc-MDT0001: Connection to lustre-OST0000 (at 10.240.25.137@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) | Link to test |
| watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_01:1146673] Modules linked in: dm_flakey obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_net virtio_blk net_failover failover [last unloaded: dm_flakey] CPU: 1 PID: 1146673 Comm: ldlm_bl_01 Kdump: loaded Tainted: G W OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e d2 90 c4 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffc3140420fd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffc314074b7008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffc314074b1000 RSI: ffffc3140420fda0 RDI: ffff9fbe31a5d000 RBP: 0000000000000000 R08: 0000000000000008 R09: 000000000000000e R10: ffffc3140420fd30 R11: ffff9fbdf8b37c70 R12: 0000000000000000 R13: 0000000000000001 R14: ffff9fbe31a5d000 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff9fbe7fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055bd251471c8 CR3: 0000000027010005 CR4: 00000000001706e0 Call Trace: ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x177/0x180 [ptlrpc] ldlm_bl_thread_main+0x6df/0x940 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 1146673 Comm: ldlm_bl_01 Kdump: loaded Tainted: G W OEL --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x51/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | | Link to test |
recovery-small test 111: mdd setup fail should not cause umount oops | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ldlm_bl_02:13362] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc pcspkr joydev i2c_piix4 virtio_balloon ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_blk failover CPU: 0 PID: 13362 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 7e ff 79 fa 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0000:ffffafbb80f7fd70 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffafbb82a0c008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffafbb82a09000 RSI: ffffafbb80f7fda8 RDI: ffff9c81452b7600 RBP: ffffffffc0dc0b60 R08: 0000000000000008 R09: 000000000000000e R10: 000000000000000f R11: ffff9c8173ea3af5 R12: 0000000000000000 R13: ffff9c8145acc664 R14: ffff9c81452b7600 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9c81ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055adb643da08 CR3: 000000008c010005 CR4: 00000000001706f0 Call Trace: ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x15f/0x170 [ptlrpc] ldlm_bl_thread_main+0x725/0x8f0 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 13362 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OEL --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x51/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: DEBUG MARKER: lctl set_param fail_loc=0x151 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 Lustre: Skipped 2 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.28.45@tcp (stopping) Lustre: Skipped 34 previous similar messages | Link to test |
sanity test 133f: Check reads/writes of client lustre proc files with bad area io | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ldlm_bl_03:12985] Modules linked in: obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_blk failover [last unloaded: lnet_selftest] CPU: 0 PID: 12985 Comm: ldlm_bl_03 Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.21.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 8e 77 aa d1 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0000:ffffaf94869bbd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffaf9482dc6008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffaf9482dbb000 RSI: ffffaf94869bbda0 RDI: ffff9021c3406600 RBP: ffffffffc0f37a80 R08: 00000000000001be R09: 000000000000000e R10: ffffaf94869bbd68 R11: ffff9021f2e7e094 R12: 0000000000000000 R13: 0000000000000000 R14: ffff9021c3406600 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff90227fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005557ea2bd660 CR3: 0000000072e10005 CR4: 00000000000606f0 Call Trace: ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x172/0x180 [ptlrpc] ldlm_bl_thread_main+0x71b/0x920 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 12985 Comm: ldlm_bl_03 Kdump: loaded Tainted: G OEL --------- - - 4.18.0-477.21.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x51/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] | Lustre: lustre-MDT0000-lwp-MDT0003: Connection to lustre-MDT0000 (at 10.240.39.229@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 2 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 LustreError: 166-1: MGC10.240.39.229@tcp: Connection to MGS (at 10.240.39.229@tcp) was lost; in progress operations using this service will fail Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) | Link to test |
sanityn test complete, duration 10719 sec | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_04:378549] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 virtio_balloon pcspkr sunrpc joydev dm_mod ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_blk failover [last unloaded: dm_flakey] CPU: 1 PID: 378549 Comm: ldlm_bl_04 Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.10.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17d/0x490 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 7c 46 d4 e2 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffa62285f4fd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa62282cd7008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa62282cc5000 RSI: ffffa62285f4fda0 RDI: ffff8e2fc1bc5900 RBP: ffffffffc0bec160 R08: 0000000000000571 R09: 0000000000000000 R10: ffffa62285f4fe28 R11: ffff8e2fc55e756f R12: 0000000000000000 R13: 0000000000000000 R14: ffff8e2fc1bc5900 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8e307fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b527eac9b7 CR3: 0000000098e10002 CR4: 00000000000606e0 Call Trace: ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x172/0x180 [ptlrpc] ldlm_bl_thread_main+0x70c/0x930 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x10b/0x130 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 378549 Comm: ldlm_bl_04 Kdump: loaded Tainted: G OEL --------- - - 4.18.0-425.10.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x51/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17d/0x490 [libcfs] | Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000: Not available for connect from 10.240.38.82@tcp (stopping) Lustre: Skipped 4 previous similar messages | Link to test |
sanity test 427: Failed DNE2 update request shouldn't corrupt updatelog | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_02:361447] Modules linked in: dm_flakey obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev virtio_balloon i2c_piix4 sunrpc dm_mod ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw virtio_blk net_failover failover [last unloaded: dm_flakey] CPU: 1 PID: 361447 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.10.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17d/0x490 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 7c 06 50 ef 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb5398104bd68 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb5398254f008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb53982513000 RSI: ffffb5398104bda0 RDI: ffff923580617800 RBP: ffffffffc0e34030 R08: 00000000000003c4 R09: 0000000000000000 R10: ffffb5398104be28 R11: ffff9235871503c2 R12: 0000000000000000 R13: 0000000000000000 R14: ffff923580617800 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9235fcd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1c10c8b000 CR3: 00000000b2210005 CR4: 00000000000606e0 Call Trace: ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc] cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x172/0x180 [ptlrpc] ldlm_bl_thread_main+0x70c/0x930 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x10b/0x130 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 361447 Comm: ldlm_bl_02 Kdump: loaded Tainted: G OEL --------- - - 4.18.0-425.10.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x41/0x60 panic+0xe7/0x2ac ? __switch_to_asm+0x51/0x80 watchdog_timer_fn.cold.10+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x101/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x17d/0x490 [libcfs] | Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x80001708 LustreError: 472534:0:(libcfs_fail.h:180:cfs_race()) cfs_fail_race id 1708 waking LustreError: 472536:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 1708 sleeping LustreError: 472536:0:(libcfs_fail.h:178:cfs_race()) cfs_fail_race id 1708 awake: rc=5000 LustreError: 519938:0:(libcfs_fail.h:180:cfs_race()) cfs_fail_race id 1708 waking Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 LustreError: 472570:0:(llog_cat.c:737:llog_cat_cancel_arr_rec()) lustre-MDT0000-osp-MDT0001: fail to cancel 1 llog-records: rc = -116 LustreError: 472570:0:(llog_cat.c:737:llog_cat_cancel_arr_rec()) Skipped 1 previous similar message LustreError: 472570:0:(llog_cat.c:773:llog_cat_cancel_records()) lustre-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116 LustreError: 472570:0:(llog_cat.c:773:llog_cat_cancel_records()) Skipped 1 previous similar message Lustre: Failing over lustre-MDT0001 LustreError: 11-0: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 6 previous similar messages Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0001: Not available for connect from 10.240.25.133@tcp (stopping) Lustre: Skipped 8 previous similar messages Lustre: 10571:0:(client.c:2305:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1680086414/real 1680086414] req@00000000b1c02f04 x1761661942170496/t0(0) o13->lustre-OST0005-osc-MDT0003@10.240.25.132@tcp:7/4 lens 224/368 e 0 to 1 dl 1680086421 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'osp-pre-5-3.0' Lustre: 10571:0:(client.c:2305:ptlrpc_expire_one_request()) Skipped 1 previous similar message Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 3 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 1 previous similar message LustreError: 166-1: MGC10.240.25.133@tcp: Connection to MGS (at 10.240.25.133@tcp) was lost; in progress operations using this service will fail Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 2 previous similar messages | Link to test |
sanity test 133f: Check reads/writes of client lustre proc files with bad area io | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ldlm_bl_01:11358] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm sunrpc iw_cm ib_core intel_rapl_msr dm_mod intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon joydev pcspkr i2c_piix4 ip_tables ext4 mbcache jbd2 ata_generic 8139too ata_piix libata 8139cp crc32c_intel mii serio_raw virtio_blk CPU: 1 PID: 11358 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OE --------- - - 4.18.0-240.15.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x173/0x450 [libcfs] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 c6 45 a6 f6 48 85 c0 0f 84 e2 01 00 00 <48> 8b 18 48 85 db 0f 84 bc 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffa49980a2bd78 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa4998145e008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa4998143f000 RSI: ffffa49980a2bda8 RDI: ffff8accd13e9a00 RBP: ffffffffc0cb4950 R08: 0000000000000876 R09: ffffa49980a2bc38 R10: ffffa49980a2be30 R11: ffff8acceddf3874 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8accd13e9a00 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8accffd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000088001 CR3: 0000000072e0a003 CR4: 00000000000606e0 Call Trace: ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc] cfs_hash_for_each_nolock+0x11d/0x1a0 [libcfs] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc] ldlm_export_cancel_locks+0x15b/0x170 [ptlrpc] ldlm_bl_thread_main+0x721/0x8f0 [ptlrpc] ? finish_wait+0x80/0x80 ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc] kthread+0x112/0x130 ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x35/0x40 Kernel panic - not syncing: softlockup: hung tasks CPU: 1 PID: 11358 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL --------- - - 4.18.0-240.15.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <IRQ> dump_stack+0x5c/0x80 panic+0xe7/0x2a9 ? __switch_to_asm+0x51/0x70 watchdog_timer_fn.cold.8+0x85/0x9e ? 
watchdog+0x30/0x30 __hrtimer_run_queues+0x100/0x280 hrtimer_interrupt+0x100/0x220 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:cfs_hash_for_each_relax+0x173/0x450 [libcfs] | Lustre: 14589:0:(sec_gss.c:2326:gss_svc_handle_destroy()) destroy svc ctx 000000009666d34a idx 0xc915cffc3a5f4261 (0->10.9.4.5@tcp) Lustre: 14589:0:(sec_gss.c:2326:gss_svc_handle_destroy()) Skipped 1 previous similar message Lustre: 280719:0:(sec_gss.c:1224:gss_cli_ctx_fini_common()) reverse sec 000000003e4b56e8: destroy ctx 00000000dac0caf7 LustreError: 11-0: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.9.4.8@tcp failed: rc = -107 Lustre: lustre-MDT0000-osp-MDT0003: Connection to lustre-MDT0000 (at 10.9.4.8@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 3 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 LustreError: 166-1: MGC10.9.4.8@tcp: Connection to MGS (at 10.9.4.8@tcp) was lost; in progress operations using this service will fail LustreError: 11-0: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 3 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: 299781:0:(client.c:2286:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1618164406/real 1618164406] req@0000000049e07da8 x1696759082285312/t0(0) o39->lustre-MDT0002-osp-MDT0001@10.9.4.8@tcp:24/4 lens 224/224 e 0 to 1 dl 1618164412 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'umount.0' | Link to test |
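The sanity test 427 rows above all reproduce with the same two-step recipe, visible in their "Messages before crash" columns: arm a one-shot race fail point, then fail over the MDT. Below is a minimal sketch of that sequence, reconstructed only from the DEBUG MARKER lines in those logs; the mount point and fail_loc value are the ones actually logged, and the decoding of the 0x8000 prefix assumes Lustre's standard CFS_FAIL_ONCE flag layout.

```sh
# 0x80001708 = CFS_FAIL_ONCE (0x80000000) | fail point 0x1708; the fail point
# pairs two threads via cfs_race()/cfs_fail_race(), producing the
# "cfs_race id 1708 sleeping" / "cfs_fail_race id 1708 waking" console lines
# seen in the log columns above.
lctl set_param fail_loc=0x80001708

# ...run the DNE2 update workload, then fail over the MDT as the logs show:
umount -d /mnt/lustre-mds2

# Clear the fail point once the crash data has been collected.
lctl set_param fail_loc=0
```

Whatever the trigger, every row in this group shares one spin signature: an ldlm_bl_* thread soft-locked in cfs_hash_for_each_relax() reached via ldlm_export_cancel_locks() -> ldlm_reprocess_recovery_done() -> cfs_hash_for_each_nolock() while an MDT is being unmounted, so new reports matching that stack likely belong under this entry.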