Editing crashreport #69011

Reason: watchdog: BUG: soft lockup -
Crashing Function: cfs_hash_for_each_relax
Where to cut Backtrace:
  cfs_hash_for_each_nolock
  ldlm_namespace_cleanup
  __ldlm_namespace_free
  ldlm_namespace_free_prior
  mdt_device_fini
  class_cleanup
  class_process_config
  class_manual_cleanup
  server_put_super
  generic_shutdown_super
  kill_anon_super
  deactivate_locked_super
  cleanup_mnt
  task_work_run
  exit_to_usermode_loop
  do_syscall_64
  entry_SYSCALL_64_after_hwframe
Reports Count: 29

Added fields:

Match messages in logs
(every line must be present in the log output;
copy from the "Messages before crash" column below):
Match messages in full crash
(every line must be present in the crash log output;
copy from the "Full Crash" column below; an example follows this form):
Limit to a test:
(copy from the "Failing Test" column below):
Delete these reports as invalid (real bug in review or some such)
Bug or comment:
Extra info:
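
As a hedged illustration (not part of the original report), the "Match messages in full crash" field could be filled with distinctive frames copied verbatim from the "Full Crash" column of an entry below, for example:

  cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
  ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
  __ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
  ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
  mdt_device_fini+0x557/0xf30 [mdt]

A report matches only if every pasted line appears in its crash output.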

Failures list (last 100):

Failing Test | Full Crash | Messages before crash | Comment
sanity-quota test 39: Project ID interface works correctly
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:510746]
Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev i2c_piix4 virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net crc32c_intel serio_raw net_failover virtio_blk failover [last unloaded: dm_flakey]
CPU: 1 PID: 510746 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.58.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 4e 16 51 dc 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffb79084137ab0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffb79083568008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffb79083531000 RSI: ffffb79084137ae8 RDI: ffff8f8dc5a5b500
RBP: ffffffffc0e863c0 R08: 0000000000000018 R09: 000000000000000e
R10: ffff8f8df3cad000 R11: ffff8f8df3cac3f0 R12: ffffb79084137b60
R13: 0000000000000000 R14: ffff8f8dc5a5b500 R15: 0000000000000000
FS: 00007f0b161a0080(0000) GS:ffff8f8e7fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055a45914e6e8 CR3: 00000000051c2004 CR4: 00000000001706e0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0x557/0xf30 [mdt]
class_cleanup+0x6a3/0xbf0 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x191/0x740 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x269/0x740 [obdclass]
server_put_super+0x7f9/0x1090 [obdclass]
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x195/0x1a0
entry_SYSCALL_64_after_hwframe+0x66/0xcb
RIP: 0033:0x7f0b150f88fb
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osp.*.destroys_in_flight
Lustre: DEBUG MARKER: lctl set_param fail_val=0 fail_loc=0
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
LustreError: 166-1: MGC10.240.28.46@tcp: Connection to MGS (at 10.240.28.46@tcp) was lost; in progress operations using this service will fail
Lustre: lustre-MDT0001: Not available for connect from 10.240.28.46@tcp (stopping)
Lustre: Skipped 10 previous similar messages
Link to test
recovery-small test 110k: FID_QUERY failed during recovery
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:91928]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon joydev pcspkr i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic virtio_net ata_piix crc32c_intel libata net_failover serio_raw virtio_blk failover
CPU: 1 PID: 91928 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 9e 27 f1 c2 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffad1548817a88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffad1541ace008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffad1541aaf000 RSI: ffffad1548817ac0 RDI: ffff8eeb21e2cc00
RBP: ffffffffc0e833c0 R08: 0000000000000018 R09: 000000000000000e
R10: ffff8eeb33df8000 R11: ffff8eeb33df78e3 R12: ffffad1548817b38
R13: 0000000000000000 R14: ffff8eeb21e2cc00 R15: 0000000000000000
FS: 00007fe9d2246080(0000) GS:ffff8eebbfd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000562ea4b092e0 CR3: 0000000034a20001 CR4: 00000000001706e0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
? cfs_hash_for_each_relax+0x172/0x480 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0x557/0xf30 [mdt]
class_cleanup+0x6a3/0xbf0 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x191/0x740 [obdclass]
class_manual_cleanup+0x269/0x740 [obdclass]
server_put_super+0x7f9/0x12b0 [obdclass]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x195/0x1a0
entry_SYSCALL_64_after_hwframe+0x66/0xcb
RIP: 0033:0x7fe9d119e8fb
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2
Lustre: Failing over lustre-MDT0001
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: Skipped 1 previous similar message
Lustre: lustre-MDT0001: Not available for connect from 10.240.28.46@tcp (stopping)
Lustre: lustre-MDT0001: Not available for connect from 10.240.27.48@tcp (stopping)
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: Skipped 10 previous similar messages
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: Skipped 11 previous similar messages
Lustre: lustre-MDT0001: Not available for connect from 10.240.27.47@tcp (stopping)
Lustre: Skipped 22 previous similar messages
Autotest: Test running for 135 minutes (lustre-b_es-reviews_review-dne-part-5_24389.29)
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: Skipped 37 previous similar messages
Link to test
conf-sanity test 50d: lazystatfs client/server conn race
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:679883]
Modules linked in: dm_flakey ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lzstd(OE) llz4hc(OE) llz4(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) nfsv3 nfs_acl loop dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 sunrpc joydev pcspkr virtio_balloon ext4 ata_generic mbcache jbd2 ata_piix libata virtio_net crc32c_intel serio_raw virtio_blk net_failover failover [last unloaded: dm_flakey]
CPU: 0 PID: 679883 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 9e 57 78 f8 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffb8c141663a88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffb8c143167008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffb8c143145000 RSI: ffffb8c141663ac0 RDI: ffff9bd235009b00
RBP: ffffffffc0d6b3c0 R08: 0000000000000018 R09: 000000000000000e
R10: ffff9bd227b40000 R11: ffff9bd227b3f3ae R12: ffffb8c141663b38
R13: 0000000000000000 R14: ffff9bd235009b00 R15: 0000000000000000
FS: 00007f4448860080(0000) GS:ffff9bd2bfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000560159b986e8 CR3: 000000002ca8e002 CR4: 00000000001706f0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
? cfs_hash_for_each_relax+0x172/0x480 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0x557/0xf30 [mdt]
? lu_context_init+0xac/0x1a0 [obdclass]
class_cleanup+0x6a3/0xbf0 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x191/0x740 [obdclass]
class_manual_cleanup+0x269/0x740 [obdclass]
server_put_super+0x7f9/0x12b0 [obdclass]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x195/0x1a0
entry_SYSCALL_64_after_hwframe+0x66/0xcb
RIP: 0033:0x7f44477b88fb
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Lustre: MGS: Logs for fs lustre were removed by user request. All servers must be restarted in order to regenerate the logs: rc = 0
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null
Lustre: MGS: Regenerating lustre-MDT0001 log by user request: rc = 0
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm7.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm7.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds3
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds3_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds3_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds3_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds3; mount -t lustre -o localrecov /dev/mapper/mds3_flakey /mnt/lustre-mds3
LDISKFS-fs (dm-5): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Lustre: MGS: Regenerating lustre-MDT0002 log by user request: rc = 0
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey 2>/dev/null
Lustre: MGS: Regenerating lustre-MDT0003 log by user request: rc = 0
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm7.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm7.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid
Autotest: Test running for 170 minutes (lustre-b_es-reviews_review-dne-part-3_24164.34)
Lustre: MGS: Regenerating lustre-OST0000 log by user request: rc = 0
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-39vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-39vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
Lustre: DEBUG MARKER: onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
Lustre: DEBUG MARKER: onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-39vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-39vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) osc.lustre-OST0001-osc-[-0-9a-f]*.ost_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) osc.lustre-OST0001-osc-[-0-9a-f]*.ost_server_uuid
Lustre: DEBUG MARKER: onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0001-osc-[-0-9a-f]*.ost_server_uuid
Lustre: DEBUG MARKER: onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0001-osc-[-0-9a-f]*.ost_server_uuid
LustreError: 11-0: lustre-OST0000-osc-MDT0000: operation ost_statfs to node 10.240.23.128@tcp failed: rc = -107
Lustre: lustre-OST0000-osc-MDT0002: Connection to lustre-OST0000 (at 10.240.23.128@tcp) was lost; in progress operations using this service will wait for recovery to complete
LustreError: Skipped 15 previous similar messages
Lustre: Skipped 30 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Link to test
sanity-flr test complete, duration 2056 sec
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:1309913]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_blk failover [last unloaded: dm_flakey]
CPU: 0 PID: 1309913 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e d7 48 c3 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffbe43c10ffa88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffbe43c723c008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffbe43c721d000 RSI: ffffbe43c10ffac0 RDI: ffff9ba3f6352800
RBP: ffffffffc0f5a3c0 R08: 0000000000000019 R09: 000000000000000e
R10: ffff9ba3f3ca7000 R11: ffff9ba3f3ca631e R12: ffffbe43c10ffb38
R13: 0000000000000000 R14: ffff9ba3f6352800 R15: 0000000000000000
FS: 00007f67b24ef080(0000) GS:ffff9ba47fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000559891819fa0 CR3: 000000002ea8e002 CR4: 00000000001706f0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
? cfs_hash_for_each_relax+0x172/0x480 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0x557/0xf30 [mdt]
? lu_context_init+0xac/0x1a0 [obdclass]
class_cleanup+0x6a3/0xbf0 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x191/0x740 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x269/0x740 [obdclass]
server_put_super+0x7f9/0x12b0 [obdclass]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x195/0x1a0
entry_SYSCALL_64_after_hwframe+0x66/0xcb
RIP: 0033:0x7f67b14478fb
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-flr: start cleanup 00:41:12 \(1749256872\) ===
Lustre: DEBUG MARKER: === sanity-flr: start cleanup 00:41:12 (1749256872) ===
Lustre: 1094478:0:(osd_handler.c:2078:osd_trans_start()) lustre-MDT0000: credits 2832 > trans_max 2464
Lustre: 1094478:0:(osd_handler.c:1979:osd_trans_dump_creds()) create: 10/40/0, destroy: 1/4/0
Lustre: 1094478:0:(osd_handler.c:1986:osd_trans_dump_creds()) attr_set: 137/137/0, xattr_set: 205/2024/0
Lustre: 1094478:0:(osd_handler.c:1996:osd_trans_dump_creds()) write: 44/433/0, punch: 0/0/0, quota 0/0/0
Lustre: 1094478:0:(osd_handler.c:2003:osd_trans_dump_creds()) insert: 11/186/0, delete: 2/5/0
Lustre: 1094478:0:(osd_handler.c:2010:osd_trans_dump_creds()) ref_add: 1/1/0, ref_del: 2/2/0
CPU: 1 PID: 1094478 Comm: mdt00_004 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
dump_stack+0x41/0x60
osd_trans_start+0x4be/0x520 [osd_ldiskfs]
top_trans_start+0x427/0x950 [ptlrpc]
? lod_trans_start+0x7a/0x330 [lod]
? mdd_buf_get+0x1e/0x90 [mdd]
mdd_unlink+0x4aa/0xc90 [mdd]
mdt_reint_unlink+0xbf4/0x1380 [mdt]
mdt_reint_rec+0x127/0x260 [mdt]
mdt_reint_internal+0x4ac/0x7a0 [mdt]
mdt_reint+0x5e/0x100 [mdt]
tgt_request_handle+0xc9c/0x1970 [ptlrpc]
ptlrpc_server_handle_request+0x346/0xc10 [ptlrpc]
? ptlrpc_server_handle_req_in+0x7a8/0x8f0 [ptlrpc]
ptlrpc_main+0xb45/0x13a0 [ptlrpc]
? ptlrpc_register_service+0xf30/0xf30 [ptlrpc]
kthread+0x134/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x35/0x40
Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.mdt=none
Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=none
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-flr: finish cleanup 00:41:15 \(1749256875\) ===
Lustre: DEBUG MARKER: === sanity-flr: finish cleanup 00:41:15 (1749256875) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-lsnapshot ============----- Sat Jun 7 00:41:15 UTC 2025
Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-lsnapshot ============----- Sat Jun 7 00:41:15 UTC 2025
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: cat /etc/system-release
Lustre: DEBUG MARKER: test -r /etc/os-release
Lustre: DEBUG MARKER: cat /etc/os-release
Lustre: DEBUG MARKER: cat /etc/system-release
Lustre: DEBUG MARKER: test -r /etc/os-release
Lustre: DEBUG MARKER: cat /etc/os-release
Lustre: DEBUG MARKER: ls /usr/lib64/lustre/tests/except/sanity-lsnapshot.*ex || true
Lustre: DEBUG MARKER: cat /usr/lib64/lustre/tests/except/sanity-lsnapshot.*ex 2>/dev/null ||true
Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests:
Lustre: DEBUG MARKER: excepting tests:
Lustre: DEBUG MARKER: /usr/sbin/lctl mark SKIP: sanity-lsnapshot ZFS only test
Lustre: DEBUG MARKER: SKIP: sanity-lsnapshot ZFS only test
Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: mmp ============----- Sat Jun 7 00:41:23 UTC 2025
Lustre: DEBUG MARKER: -----============= acceptance-small: mmp ============----- Sat Jun 7 00:41:23 UTC 2025
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: cat /etc/system-release
Lustre: DEBUG MARKER: test -r /etc/os-release
Lustre: DEBUG MARKER: cat /etc/os-release
Lustre: DEBUG MARKER: cat /etc/system-release
Lustre: DEBUG MARKER: test -r /etc/os-release
Lustre: DEBUG MARKER: cat /etc/os-release
Lustre: DEBUG MARKER: ls /usr/lib64/lustre/tests/except/mmp.*ex || true
Lustre: DEBUG MARKER: cat /usr/lib64/lustre/tests/except/mmp.*ex 2>/dev/null ||true
Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests:
Lustre: DEBUG MARKER: excepting tests:
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107
Lustre: Skipped 33 previous similar messages
LustreError: Skipped 7 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
Lustre: Skipped 24 previous similar messages
Lustre: server umount lustre-MDT0000 complete
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey
LustreError: 137-5: lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 53 previous similar messages
Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1
Lustre: DEBUG MARKER: modprobe -r dm-flakey
LustreError: 1092726:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 10.240.28.47@tcp arrived at 1749256904 with bad export cookie 16998891392739004986
Lustre: 12831:0:(client.c:2355:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1749256902/real 1749256902] req@0000000092b6de4a x1834186612638720/t0(0) o400->MGC10.240.28.46@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1749256909 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker.0'
LustreError: 166-1: MGC10.240.28.46@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
LustreError: Skipped 1 previous similar message
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3
Lustre: lustre-MDT0002: Not available for connect from 10.240.28.229@tcp (stopping)
Lustre: Skipped 13 previous similar messages
Lustre: lustre-MDT0002: Not available for connect from 10.240.28.229@tcp (stopping)
Lustre: Skipped 44 previous similar messages
Link to test
sanity-compr test complete, duration 2959 sec
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:50562]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev i2c_piix4 virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel libata serio_raw virtio_net virtio_blk net_failover failover
CPU: 0 PID: 50562 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e 27 9a cd 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffaa37839b3a88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffaa378287d008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffaa3782855000 RSI: ffffaa37839b3ac0 RDI: ffff8e7ee1872600
RBP: ffffffffc0c453c0 R08: 0000000000000019 R09: 000000000000000e
R10: ffff8e7ef04da000 R11: ffff8e7ef04d9fa9 R12: ffffaa37839b3b38
R13: 0000000000000000 R14: ffff8e7ee1872600 R15: 0000000000000000
FS: 00007f803addd080(0000) GS:ffff8e7f7fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000558f3db950a8 CR3: 0000000030f60006 CR4: 00000000001706f0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
? cfs_hash_for_each_relax+0x172/0x480 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0x557/0xf30 [mdt]
class_cleanup+0x6a3/0xbf0 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x191/0x740 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x269/0x740 [obdclass]
server_put_super+0x7f9/0x12b0 [obdclass]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x195/0x1a0
entry_SYSCALL_64_after_hwframe+0x66/0xcb
RIP: 0033:0x7f8039d358fb
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-compr: start cleanup 01:33:18 \(1749087198\) ===
Lustre: DEBUG MARKER: === sanity-compr: start cleanup 01:33:18 (1749087198) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-compr: finish cleanup 01:33:20 \(1749087200\) ===
Lustre: DEBUG MARKER: === sanity-compr: finish cleanup 01:33:20 (1749087200) ===
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.191@tcp (stopping)
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.191@tcp (stopping)
Lustre: Skipped 6 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.191@tcp (stopping)
Lustre: Skipped 6 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.191@tcp (stopping)
Lustre: Skipped 6 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.191@tcp (stopping)
Lustre: Skipped 6 previous similar messages
Link to test
sanity test 133f: Check reads/writes of client lustre proc files with bad area io
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:388366]
Modules linked in: lzstd(OE) llz4hc(OE) llz4(OE) obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel libata serio_raw virtio_net virtio_blk net_failover failover [last unloaded: dm_flakey]
CPU: 1 PID: 388366 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e 37 b2 c7 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffb25241263a88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffb25244c31008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffb25244bf5000 RSI: ffffb25241263ac0 RDI: ffff8d9a74669f00
RBP: ffffffffc0ec13c0 R08: 0000000000000019 R09: 000000000000000e
R10: ffff8d9a73319000 R11: ffff8d9a73318872 R12: ffffb25241263b38
R13: 0000000000000000 R14: ffff8d9a74669f00 R15: 0000000000000000
FS: 00007ff46f4a4080(0000) GS:ffff8d9affd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561786e04048 CR3: 0000000005664001 CR4: 00000000001706e0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
? cfs_hash_for_each_relax+0x172/0x480 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0x557/0xf30 [mdt]
? lu_context_init+0xac/0x1a0 [obdclass]
class_cleanup+0x6a3/0xbf0 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x191/0x740 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x269/0x740 [obdclass]
server_put_super+0x7f9/0x12b0 [obdclass]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x195/0x1a0
entry_SYSCALL_64_after_hwframe+0x66/0xcb
RIP: 0033:0x7ff46e3fc8fb
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 6 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
Lustre: Skipped 1 previous similar message
Lustre: lustre-MDT0000: Not available for connect from 10.240.28.51@tcp (stopping)
Lustre: Skipped 8 previous similar messages
Lustre: server umount lustre-MDT0000 complete
LustreError: 137-5: lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.51@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 9 previous similar messages
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey
LustreError: 137-5: lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 3 previous similar messages
Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1
Lustre: DEBUG MARKER: modprobe -r dm-flakey
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.51@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 9 previous similar messages
LustreError: 13426:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 10.240.28.51@tcp arrived at 1748345981 with bad export cookie 10960556717268018208
LustreError: 13426:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) Skipped 4 previous similar messages
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.51@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 13 previous similar messages
LustreError: 137-5: lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 16 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3
LustreError: 13426:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 0@lo arrived at 1748345996 with bad export cookie 10960556717268017746
LustreError: 166-1: MGC10.240.28.50@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
Lustre: lustre-MDT0002: Not available for connect from 10.240.28.51@tcp (stopping)
Lustre: Skipped 3 previous similar messages
Lustre: lustre-MDT0002: Not available for connect from 10.240.28.51@tcp (stopping)
Lustre: Skipped 9 previous similar messages
Lustre: lustre-MDT0002: Not available for connect from 10.240.28.51@tcp (stopping)
Lustre: Skipped 8 previous similar messages
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.51@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 40 previous similar messages
Lustre: lustre-MDT0002: Not available for connect from 10.240.27.207@tcp (stopping)
Lustre: Skipped 17 previous similar messages
Link to test
sanity-lipe-scan3 test complete, duration 1164 sec
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:1228784]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lzstd(OE) llz4hc(OE) llz4(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 pcspkr joydev virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic virtio_net crc32c_intel ata_piix libata serio_raw net_failover failover virtio_blk [last unloaded: dm_flakey]
CPU: 0 PID: 1228784 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.46.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 be b5 1a fb 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffab010117fa88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffab01082f4008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffab01082df000 RSI: ffffab010117fac0 RDI: ffff96a2f512a900
RBP: ffffffffc0b1b3c0 R08: 0000000000000019 R09: 000000000000000e
R10: ffff96a2f123d000 R11: ffff96a2f123c6ac R12: ffffab010117fb38
R13: 0000000000000000 R14: ffff96a2f512a900 R15: 0000000000000000
FS: 00007fc875f95080(0000) GS:ffff96a37fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055855f6e86e8 CR3: 00000000411ac001 CR4: 00000000001706f0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
? cfs_hash_for_each_relax+0x172/0x480 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0x557/0xf30 [mdt]
? lu_context_init+0xac/0x1a0 [obdclass]
class_cleanup+0x6a3/0xbf0 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x191/0x740 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x269/0x740 [obdclass]
server_put_super+0x7f9/0x12b0 [obdclass]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x195/0x1a0
entry_SYSCALL_64_after_hwframe+0x66/0xcb
RIP: 0033:0x7fc874eed8fb
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-scan3: start cleanup 23:55:57 \(1745452557\) ===
Lustre: DEBUG MARKER: === sanity-lipe-scan3: start cleanup 23:55:57 (1745452557) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-scan3: finish cleanup 23:55:58 \(1745452558\) ===
Lustre: DEBUG MARKER: === sanity-lipe-scan3: finish cleanup 23:55:58 (1745452558) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-lipe-find3 ============----- Wed Apr 23 11:55:58 PM UTC 2025
Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-lipe-find3 ============----- Wed Apr 23 11:55:58 PM UTC 2025
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: cat /etc/system-release
Lustre: DEBUG MARKER: test -r /etc/os-release
Lustre: DEBUG MARKER: cat /etc/os-release
Lustre: DEBUG MARKER: cat /etc/system-release
Lustre: DEBUG MARKER: test -r /etc/os-release
Lustre: DEBUG MARKER: cat /etc/os-release
Lustre: DEBUG MARKER: which lipe_find3
Lustre: DEBUG MARKER: ls /usr/lib64/lustre/tests/except/sanity-lipe-find3.*ex || true
Lustre: DEBUG MARKER: cat /usr/lib64/lustre/tests/except/sanity-lipe-find3.*ex 2>/dev/null ||true
Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: 361
Lustre: DEBUG MARKER: excepting tests: 361
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-find3: start setup 23:56:05 \(1745452565\) ===
Lustre: DEBUG MARKER: === sanity-lipe-find3: start setup 23:56:05 (1745452565) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-154vm19.onyx.whamcloud.com: executing check_config_client \/mnt\/lustre
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-154vm20.onyx.whamcloud.com: executing check_config_client \/mnt\/lustre
Lustre: DEBUG MARKER: onyx-154vm19.onyx.whamcloud.com: executing check_config_client /mnt/lustre
Lustre: DEBUG MARKER: onyx-154vm20.onyx.whamcloud.com: executing check_config_client /mnt/lustre
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts);
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds3' ' /proc/mounts);
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts);
Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null
Lustre: DEBUG MARKER: cat /proc/mounts
Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey 2>/dev/null
Lustre: DEBUG MARKER: cat /proc/mounts
Lustre: DEBUG MARKER: lctl get_param -n timeout
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
Lustre: DEBUG MARKER: Using TIMEOUT=20
Lustre: DEBUG MARKER: [ -f /sys/module/mgc/parameters/mgc_requeue_timeout_min ] && echo 1 > /sys/module/mgc/parameters/mgc_requeue_timeout_min; exit 0
Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param lod.*.mdt_hash=crush
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/us
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-154vm20.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-154vm20.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-58vm2.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-58vm2.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm7.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-99vm7.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param osd-ldiskfs.track_declares_assert=1 || true
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n lod.lustre-MDT0000-mdtlov.enable_compr_rotational
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n lod.lustre-MDT0002-mdtlov.enable_compr_rotational
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-find3: finish setup 23:56:31 \(1745452591\) ===
Lustre: DEBUG MARKER: === sanity-lipe-find3: finish setup 23:56:31 (1745452591) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param printk=+lfsck
Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -r -A -t all
Lustre: lustre-MDT0000-osd: layout LFSCK reset: rc = 0
Lustre: lustre-MDT0000-osd: namespace LFSCK reset: rc = 0
Lustre: lustre-MDT0000: OI scrub prep, flags = 0x4
Lustre: lustre-MDT0000: reset OI scrub file, old flags = 0x0, add flags = 0x0
Lustre: lustre-MDT0000: store scrub file: rc = 0
Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread start
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 0, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 5, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 1, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 3, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 4, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 6, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 7, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 2, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master prep done, start pos [1]
Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread start
Lustre: lustre-MDT0000-osd: namespace LFSCK prep done, start pos [1, [0x0:0x0:0x0], 0x0]: rc = 0
Lustre: lustre-MDT0000: OI scrub start, flags = 0x4, pos = 12
Lustre: lustre-MDT0000-osd: layout LFSCK master checkpoint at the pos [13], status = 1: rc = 0
Lustre: lustre-MDT0000-osd: namespace LFSCK checkpoint at the pos [13, [0x0:0x0:0x0], 0x0], status = 1: rc = 0
Lustre: LFSCK entry: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 13, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 1226882
Lustre: lustre-MDT0002-osd: layout LFSCK reset: rc = 0
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from MDT 3, status 1, flags 0, flags2 0
Lustre: lustre-MDT0002-osd: layout LFSCK master handles notify 3 from MDT 3, status 1, flags 0, flags2 0
Lustre: lustre-MDT0000-osd: namespace LFSCK handles notify 3 from MDT 3, status 1, flags 0
Lustre: lustre-MDT0002-osd: namespace LFSCK handles notify 3 from MDT 3, status 1, flags 0
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from MDT 1, status 1, flags 0, flags2 0
Lustre: lustre-MDT0002-osd: layout LFSCK master handles notify 3 from MDT 1, status 1, flags 0, flags2 0
Lustre: lustre-MDT0000-osd: namespace LFSCK handles notify 3 from MDT 1, status 1, flags 0
Lustre: lustre-MDT0002-osd: namespace LFSCK handles notify 3 from MDT 1, status 1, flags 0
Lustre: lustre-MDT0000: OI scrub post with result = 1
Lustre: lustre-MDT0000: store scrub file: rc = 0
Lustre: lustre-MDT0000: OI scrub: stop, pos = 792801: rc = 1
Lustre: lustre-MDT0002-osd: namespace LFSCK reset: rc = 0
Lustre: lustre-MDT0002: OI scrub prep, flags = 0x46
Lustre: lustre-MDT0002: reset OI scrub file, old flags = 0x0, add flags = 0x0
Lustre: lustre-MDT0002: store scrub file: rc = 0
Lustre: lustre-MDT0002-osd: lfsck_layout LFSCK assistant thread start
Lustre: lustre-MDT0002-osd: layout LFSCK master prep done, start pos [1]
Lustre: lustre-MDT0002-osd: lfsck_namespace LFSCK assistant thread start
Lustre: lustre-MDT0002-osd: namespace LFSCK prep done, start pos [1, [0x0:0x0:0x0], 0x0]: rc = 0
Lustre: lustre-MDT0002: OI scrub start, flags = 0x46, pos = 12
Lustre: lustre-MDT0002-osd: layout LFSCK master checkpoint at the pos [13], status = 1: rc = 0
Lustre: lustre-MDT0002-osd: namespace LFSCK checkpoint at the pos [13, [0x0:0x0:0x0], 0x0], status = 1: rc = 0
Lustre: LFSCK entry: oit_flags = 0x60003, dir_flags = 0x8006, oit_cookie = 13, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 1226887
Lustre: LFSCK exit: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 308, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 1226882, rc = 1
Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_layout post, rc = 1
Lustre: lustre-MDT0000-osd: the assistant has done lfsck_layout post, rc = 1
Lustre: lustre-MDT0000-osd: layout LFSCK master post done: rc = 0
Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_namespace post, rc = 1
Lustre: lustre-MDT0000-osd: the assistant has done lfsck_namespace post, rc = 1
Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread post
Lustre: lustre-MDT0000-osd: namespace LFSCK post done: rc = 0
Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_layout double_scan, status 2
Lustre: lustre-MDT0002-osd: layout LFSCK master handles notify 3 from MDT 0, status 1, flags 0, flags2 0
Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread post
Lustre: lustre-MDT0002-osd: namespace LFSCK handles notify 3 from MDT 0, status 1, flags 1
Lustre: lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_layout post: rc = 0
Lustre: lustre-MDT0000-osd: LFSCK assistant sync before the second-stage scaning
Lustre: lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_namespace post: rc = 0
Lustre: lustre-MDT0000-osd: the assistant has done lfsck_layout double_scan, status 0
Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_namespace double_scan, status 2
Lustre: lustre-MDT0000-osd: LFSCK assistant sync before the second-stage scaning
Lustre: lustre-MDT0000-osd: the assistant has done lfsck_namespace double_scan, status 0
Lustre: lustre-MDT0000-osd: namespace LFSCK handles notify 4 from MDT 1, status 1, flags 0
Lustre: lustre-MDT0000-osd: namespace LFSCK handles notify 4 from MDT 3, status 1, flags 0
Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan start, synced: rc = 0
Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan start, synced: rc = 0
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
Lustre: lustre-MDT0002: OI scrub post with result = 1
Lustre: lustre-MDT0002: store scrub file: rc = 0
Lustre: lustre-MDT0002: OI scrub: stop, pos = 792801: rc = 1
Lustre: lustre-MDT0002-osd: namespace LFSCK add flags for [0x280000405:0x1:0x0] in the trace file, flags 1, old 0, new 1: rc = 0
Lustre: LFSCK exit: oit_flags = 0x60003, dir_flags = 0x8006, oit_cookie = 280, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 1226887, rc = 1
Lustre: lustre-MDT0002-osd: waiting for assistant to do lfsck_layout post, rc = 1
Lustre: lustre-MDT0002-osd: lfsck_layout LFSCK assistant thread post
Lustre: lustre-MDT0002-osd: the assistant has done lfsck_layout post, rc = 1
Lustre: lustre-MDT0002-osd: layout LFSCK master post done: rc = 0
Lustre: lustre-MDT0002-osd: waiting for assistant to do lfsck_namespace post, rc = 1
Lustre: lustre-MDT0002-osd: the assistant has done lfsck_namespace post, rc = 1
Lustre: lustre-MDT0002-osd: namespace LFSCK post done: rc = 0
Lustre: lustre-MDT0002-osd: waiting for assistant to do lfsck_layout double_scan, status 2
Lustre: lustre-MDT0002-osd: lfsck_namespace LFSCK assistant thread post
Lustre: lustre-MDT0000-osd: namespace LFSCK handles notify 3 from MDT 2, status 1, flags 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from MDT 2, status 1, flags 0, flags2 0
Lustre: lustre-MDT0002-osd: LFSCK assistant notified others for lfsck_namespace post: rc = 0
Lustre: lustre-MDT0002-osd: LFSCK assistant notified others for lfsck_layout post: rc = 0
Lustre: lustre-MDT0002-osd: LFSCK assistant sync before the second-stage scaning
Lustre: lustre-MDT0002-osd: the assistant has done lfsck_layout double_scan, status 0
Lustre: lustre-MDT0002-osd: waiting for assistant to do lfsck_namespace double_scan, status 2
Lustre: lustre-MDT0002-osd: LFSCK assistant sync before the second-stage scaning
Lustre: lustre-MDT0002-osd: the assistant has done lfsck_namespace double_scan, status 0
Lustre: lustre-MDT0002-osd: LFSCK assistant phase2 scan start, synced: rc = 0
Lustre: lustre-MDT0000-osd: namespace LFSCK phase2 scan start
Lustre: lustre-MDT0002-osd: LFSCK assistant phase2 scan start, synced: rc = 0
Lustre: lustre-MDT0000-osd: start to scan backend /lost+found
Lustre: lustre-MDT0002-osd: namespace LFSCK phase2 scan start
Lustre: lustre-MDT0002-osd: start to scan backend /lost+found
Lustre: lustre-MDT0002-osd: layout LFSCK phase2 scan start
Lustre: lustre-MDT0000-osd: layout LFSCK phase2 scan start
Lustre: lustre-MDT0000-osd: layout LFSCK phase2 scan stop: rc = 1
Lustre: lustre-MDT0002-osd: layout LFSCK phase2 scan stop: rc = 1
Lustre: lustre-MDT0002-osd: layout LFSCK master handles notify 4 from MDT 0, status 1, flags 0, flags2 0
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 4 from MDT 2, status 1, flags 0, flags2 0
Lustre: lustre-MDT0000-osd: stop to scan backend /lost+found: rc = 1
Lustre: lustre-MDT0000-osd: namespace LFSCK phase2 scan stop at the No. 16 trace file: rc = 1
Lustre: lustre-MDT0002-osd: namespace LFSCK handles notify 4 from MDT 0, status 1, flags 0
Lustre: lustre-MDT0002-osd: stop to scan backend /lost+found: rc = 1
Lustre: lustre-MDT0000-osd: LFSCK assistant sync before exit
Lustre: lustre-MDT0002-osd: namespace LFSCK phase2 scan stop at the No. 16 trace file: rc = 1
Lustre: lustre-MDT0002-osd: LFSCK assistant sync before exit
Lustre: lustre-MDT0002-osd: LFSCK assistant synced before exit: rc = 0
Lustre: lustre-MDT0002-osd: LFSCK assistant phase2 scan finished: rc = 1
Lustre: lustre-MDT0002-osd: lfsck_namespace LFSCK assistant thread exit: rc = 1
Lustre: lustre-MDT0002-osd: LFSCK assistant sync before exit
Lustre: lustre-MDT0000-osd: LFSCK assistant synced before exit: rc = 0
Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan finished: rc = 1
Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread exit: rc = 1
Lustre: lustre-MDT0000-osd: LFSCK assistant sync before exit
Lustre: lustre-MDT0000-osd: LFSCK assistant synced before exit: rc = 0
Lustre: lustre-MDT0000-osd: layout LFSCK double scan: rc = 1
Lustre: lustre-MDT0000-osd: layout LFSCK double scan result 3: rc = 0
Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan finished: rc = 1
Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread exit: rc = 1
Lustre: lustre-MDT0002-osd: LFSCK assistant synced before exit: rc = 0
Lustre: lustre-MDT0002-osd: layout LFSCK double scan: rc = 1
Lustre: lustre-MDT0002-osd: layout LFSCK double scan result 3: rc = 0
Lustre: lustre-MDT0002-osd: LFSCK assistant phase2 scan finished: rc = 1
Lustre: lustre-MDT0002-osd: lfsck_layout LFSCK assistant thread exit: rc = 1
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0002.lfsck_layout |
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0002.lfsck_namespace |
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-*.lfsck_*
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Lustre: lustre-MDT0000: Not available for connect from 10.240.28.50@tcp (stopping)
Lustre: Skipped 16 previous similar messages
LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107
Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: lustre-MDT0000: Not available for connect from 10.240.28.50@tcp (stopping)
Lustre: Skipped 13 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.24.249@tcp (stopping)
Lustre: Skipped 25 previous similar messages
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.24.249@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 10 previous similar messages
Lustre: server umount lustre-MDT0000 complete
Autotest: Test running for 200 minutes (lustre-b_es-reviews_review-dne-exa6-part-1_23070.72)
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.24.249@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 13 previous similar messages
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.24.249@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 13 previous similar messages
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1
Lustre: DEBUG MARKER: modprobe -r dm-flakey
LustreError: 1135962:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 10.240.28.50@tcp arrived at 1745452680 with bad export cookie 3096739595288226873
Lustre: lustre-MDT0001-osp-MDT0002: Connection to lustre-MDT0001 (at 10.240.28.50@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 1 previous similar message
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.50@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 26 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3
LustreError: 1138269:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 0@lo arrived at 1745452698 with bad export cookie 3096739595288226663
LustreError: 1138269:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) Skipped 4 previous similar messages
LustreError: 166-1: MGC10.240.28.49@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.24.249@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 36 previous similar messages
Lustre: lustre-MDT0002: Not available for connect from 10.240.24.249@tcp (stopping)
Lustre: Skipped 13 previous similar messages
Lustre: lustre-MDT0002: Not available for connect from 10.240.28.50@tcp (stopping)
Lustre: Skipped 17 previous similar messages
Link to test

ost-pools test complete, duration 2813 sec
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:423378]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev virtio_balloon i2c_piix4 ext4 mbcache jbd2 ata_generic crc32c_intel virtio_net net_failover serio_raw failover ata_piix libata virtio_blk
CPU: 1 PID: 423378 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.40.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ae e5 f8 f3 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffbdee8142ba88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffbdee833c4008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffbdee83393000 RSI: ffffbdee8142bac0 RDI: ffff989af1d47500
RBP: ffffffffc0c593c0 R08: 0000000000000019 R09: 000000000000000e
R10: ffff989af32f7000 R11: ffff989af32f652c R12: ffffbdee8142bb38
R13: 0000000000000000 R14: ffff989af1d47500 R15: 0000000000000000
FS: 00007f0a26b4b080(0000) GS:ffff989b7fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005651ce8aac00 CR3: 000000003265e003 CR4: 00000000001706e0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
? cfs_hash_for_each_relax+0x172/0x480 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
? lu_context_fini+0xa7/0x190 [obdclass]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0x557/0xf30 [mdt]
class_cleanup+0x6a3/0xbf0 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x191/0x740 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x269/0x740 [obdclass]
server_put_super+0x7f9/0x12b0 [obdclass]
? lustre_lwp_setup+0x880/0x880 [obdclass]
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x195/0x1a0
entry_SYSCALL_64_after_hwframe+0x66/0xcb
RIP: 0033:0x7f0a25aa38fb
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === ost-pools: start cleanup 18:31:18 \(1742927478\) ===
Lustre: DEBUG MARKER: === ost-pools: start cleanup 18:31:18 (1742927478) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === ost-pools: finish cleanup 18:32:35 \(1742927555\) ===
Lustre: DEBUG MARKER: === ost-pools: finish cleanup 18:32:35 (1742927555) ===
Lustre: Evicted from MGS (at 10.240.28.46@tcp) after server handle changed from 0xbeaf07ac59863b39 to 0xbeaf07ac5986a11b
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-97vm7.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-66vm7.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-97vm7.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-66vm7.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.240.28.46@tcp failed: rc = -107
LustreError: Skipped 1 previous similar message
Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.28.46@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 11 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
LustreError: 166-1: MGC10.240.28.46@tcp: Connection to MGS (at 10.240.28.46@tcp) was lost; in progress operations using this service will fail
LustreError: Skipped 2 previous similar messages
Lustre: lustre-MDT0001: Not available for connect from 10.240.25.38@tcp (stopping)
Lustre: Skipped 22 previous similar messages
Autotest: Test running for 230 minutes (lustre-b_es-reviews_review-dne-part-6_22664.23)
Link to test

conf-sanity test 152: seq allocation error in OSP
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:1463310]
Modules linked in: lzstd(OE) llz4hc(OE) llz4(OE) obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev i2c_piix4 virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel libata serio_raw virtio_net virtio_blk net_failover failover [last unloaded: lzstd]
CPU: 0 PID: 1463310 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.40.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ae 45 21 e8 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffa95d8127ba88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffa95d82f17008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffa95d82f09000 RSI: ffffa95d8127bac0 RDI: ffff991a6de15100
RBP: ffffffffc0bd33c0 R08: 0000000000000018 R09: 000000000000000e
R10: ffff991a6ece7000 R11: ffff991a6ece6953 R12: ffffa95d8127bb38
R13: 0000000000000000 R14: ffff991a6de15100 R15: 0000000000000000
FS: 00007f98fe4fb080(0000) GS:ffff991affc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f98fe0b73e0 CR3: 0000000031c7e003 CR4: 00000000001706f0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
? cfs_hash_for_each_relax+0x172/0x480 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0x557/0xf30 [mdt]
? lu_context_init+0xac/0x1a0 [obdclass]
class_cleanup+0x6a3/0xbf0 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x191/0x740 [obdclass]
class_manual_cleanup+0x269/0x740 [obdclass]
server_put_super+0x7f9/0x12b0 [obdclass]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x195/0x1a0
entry_SYSCALL_64_after_hwframe+0x66/0xcb
RIP: 0033:0x7f98fd4538fb
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds2' ' /proc/mounts);
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds4' ' /proc/mounts);
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm12.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm12.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm12.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm12.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4
LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
Lustre: DEBUG MARKER: Using TIMEOUT=20
Lustre: DEBUG MARKER: [ -f /sys/module/mgc/parameters/mgc_requeue_timeout_min ] && echo 1 > /sys/module/mgc/parameters/mgc_requeue_timeout_min; exit 0
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param lod.*.mdt_hash=crush
Lustre: DEBUG MARKER: /usr/sbin/lctl mark ADD OST9
Lustre: DEBUG MARKER: ADD OST9
Lustre: DEBUG MARKER: /usr/sbin/lctl mark STOP OST9
Lustre: DEBUG MARKER: STOP OST9
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark START OST9 again
Lustre: DEBUG MARKER: START OST9 again
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Autotest: Test running for 415 minutes (lustre-b_es-reviews_review-dne-part-3_22528.20)
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
Lustre: lustre-MDT0003: haven't heard from client 912c563b-fd78-411a-a91b-bd9fa284e499 (at 10.240.27.23@tcp) in 31 seconds. I think it's dead, and I am evicting it. exp 00000000c860166c, cur 1742279170 expire 1742279140 last 1742279139
Lustre: Skipped 1 previous similar message
LustreError: 166-1: MGC10.240.27.31@tcp: Connection to MGS (at 10.240.27.31@tcp) was lost; in progress operations using this service will fail
LustreError: Skipped 4 previous similar messages
Link to test

sanity-sec test 27a: test fileset in various nodemaps
watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [umount:926254]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul sunrpc ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover failover virtio_blk
CPU: 1 PID: 926254 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.40.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ae f5 12 cc 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffba5fc1097a88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffba5fc5190008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffba5fc517b000 RSI: ffffba5fc1097ac0 RDI: ffff9928b03db100
RBP: ffffffffc0cb83c0 R08: 0000000000000019 R09: 000000000000000e
R10: ffff9928b6d98000 R11: ffff9928b6d97fb3 R12: ffffba5fc1097b38
R13: 0000000000000000 R14: ffff9928b03db100 R15: 0000000000000000
FS: 00007fa6c1b02080(0000) GS:ffff99293fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055601a587080 CR3: 00000000265ae002 CR4: 00000000001706e0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
? cfs_hash_for_each_relax+0x172/0x480 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0x557/0xf30 [mdt]
class_cleanup+0x6a3/0xbf0 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x191/0x740 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x269/0x740 [obdclass]
server_put_super+0x7f9/0x12b0 [obdclass]
? lustre_lwp_setup+0x880/0x880 [obdclass]
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x195/0x1a0
entry_SYSCALL_64_after_hwframe+0x66/0xcb
RIP: 0033:0x7fa6c0a5a8fb
Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_activate 1
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.active
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_modify --name default --property admin --value 1
Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_modify --name default --property trusted --value 1
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.trusted_nodemap
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.default.admin_nodemap
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.default.trusted_nodemap
Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_modify --name default --property admin --value 1
Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_modify --name default --property trusted --value 1
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.admin_nodemap
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.trusted_nodemap
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_modify --name default --property admin --value 1
Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_modify --name default --property trusted --value 1
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.admin_nodemap
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.trusted_nodemap
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_set_fileset --name default --fileset /thisisaverylongsubdirtotestlongfilesetsandtotestmultiplefilesetfragmentsonthenodemapiam_default
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.fileset
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.fileset
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.128@tcp (stopping)
Lustre: Skipped 6 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.128@tcp (stopping)
Lustre: Skipped 5 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.128@tcp (stopping)
Lustre: Skipped 6 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.128@tcp (stopping)
Lustre: Skipped 6 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.128@tcp (stopping)
Lustre: Skipped 13 previous similar messages
Link to test

sanity-lipe-scan3 test complete, duration 1231 sec
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:997067]
Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lzstd(OE) llz4hc(OE) llz4(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common virtio_balloon crct10dif_pclmul joydev pcspkr i2c_piix4 crc32_pclmul ghash_clmulni_intel sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net net_failover serio_raw crc32c_intel failover virtio_blk [last unloaded: dm_flakey]
CPU: 0 PID: 997067 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e 1f f4 cc 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffb7dfc106ba98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffb7dfc30c6008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffb7dfc3095000 RSI: ffffb7dfc106bad0 RDI: ffff9d90afc49100
RBP: ffffffffc0d8b870 R08: 0000000000000019 R09: 000000000000000e
R10: ffff9d90afca5000 R11: ffff9d90afca4824 R12: ffffb7dfc106bb48
R13: 0000000000000000 R14: ffff9d90afc49100 R15: 0000000000000000
FS: 00007f9dcab39080(0000) GS:ffff9d913fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8b08b92000 CR3: 0000000024ecc003 CR4: 00000000001706f0
Call Trace:
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xe2/0x910 [mdt]
class_cleanup+0x6a3/0xc00 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
class_manual_cleanup+0x456/0x740 [obdclass]
server_put_super+0x7f0/0x12e0 [obdclass]
? lustre_register_lwp_item+0x690/0x690 [obdclass]
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f9dc9aa3e9b
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-scan3: start cleanup 11:24:57 \(1734348297\) ===
Lustre: DEBUG MARKER: === sanity-lipe-scan3: start cleanup 11:24:57 (1734348297) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-scan3: finish cleanup 11:25:00 \(1734348300\) ===
Lustre: DEBUG MARKER: === sanity-lipe-scan3: finish cleanup 11:25:00 (1734348300) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-lipe-find3 ============----- Mon Dec 16 11:25:01 AM UTC 2024
Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-lipe-find3 ============----- Mon Dec 16 11:25:01 AM UTC 2024
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: cat /etc/system-release
Lustre: DEBUG MARKER: test -r /etc/os-release
Lustre: DEBUG MARKER: cat /etc/os-release
Lustre: DEBUG MARKER: cat /etc/system-release
Lustre: DEBUG MARKER: test -r /etc/os-release
Lustre: DEBUG MARKER: cat /etc/os-release
Lustre: DEBUG MARKER: which lipe_find3
Lustre: DEBUG MARKER: ls /usr/lib64/lustre/tests/except/sanity-lipe-find3.*ex || true
Lustre: DEBUG MARKER: cat /usr/lib64/lustre/tests/except/sanity-lipe-find3.*ex 2>/dev/null ||true
Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: 361
Lustre: DEBUG MARKER: excepting tests: 361
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-find3: start setup 11:25:17 \(1734348317\) ===
Lustre: DEBUG MARKER: === sanity-lipe-find3: start setup 11:25:17 (1734348317) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-91vm19.onyx.whamcloud.com: executing check_config_client \/mnt\/lustre
Lustre: DEBUG MARKER: onyx-91vm19.onyx.whamcloud.com: executing check_config_client /mnt/lustre
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-91vm20.onyx.whamcloud.com: executing check_config_client \/mnt\/lustre
Lustre: DEBUG MARKER: onyx-91vm20.onyx.whamcloud.com: executing check_config_client /mnt/lustre
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts);
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts);
Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null
Lustre: DEBUG MARKER: cat /proc/mounts
Lustre: DEBUG MARKER: lctl get_param -n timeout
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
Lustre: DEBUG MARKER: Using TIMEOUT=20
Lustre: DEBUG MARKER: [ -f /sys/module/mgc/parameters/mgc_requeue_timeout_min ] && echo 1 > /sys/module/mgc/parameters/mgc_requeue_timeout_min; exit 0
Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param lod.*.mdt_hash=crush
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/us
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-116vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-116vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-91vm20.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-91vm20.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param osd-ldiskfs.track_declares_assert=1 || true
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-find3: finish setup 11:26:08 \(1734348368\) ===
Lustre: DEBUG MARKER: === sanity-lipe-find3: finish setup 11:26:08 (1734348368) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param printk=+lfsck
Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -r -A -t all
Lustre: lustre-MDT0000-osd: layout LFSCK reset: rc = 0
Lustre: lustre-MDT0000-osd: namespace LFSCK reset: rc = 0
Lustre: lustre-MDT0000: OI scrub prep, flags = 0x4
Lustre: lustre-MDT0000: reset OI scrub file, old flags = 0x0, add flags = 0x0
Lustre: lustre-MDT0000: store scrub file: rc = 0
Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread start
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 5, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 3, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 1, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 2, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 4, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 0, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 6, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master prep done, start pos [1]
Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread start
Lustre: lustre-MDT0000-osd: namespace LFSCK prep done, start pos [1, [0x0:0x0:0x0], 0x0]: rc = 0
Lustre: lustre-MDT0000: OI scrub start, flags = 0x4, pos = 12
Lustre: lustre-MDT0000-osd: layout LFSCK master checkpoint at the pos [13], status = 1: rc = 0
Lustre: lustre-MDT0000-osd: namespace LFSCK checkpoint at the pos [13, [0x0:0x0:0x0], 0x0], status = 1: rc = 0
Lustre: LFSCK entry: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 13, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 996358
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
Lustre: lustre-MDT0000: OI scrub post with result = 1
Lustre: lustre-MDT0000: store scrub file: rc = 0
Lustre: lustre-MDT0000: OI scrub: stop, pos = 838865: rc = 1
Lustre: LFSCK exit: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 293, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 996358, rc = 1
Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_layout post, rc = 1
Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread post
Lustre: lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_layout post: rc = 0
Lustre: lustre-MDT0000-osd: the assistant has done lfsck_layout post, rc = 1
Lustre: lustre-MDT0000-osd: layout LFSCK master post done: rc = 0
Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_namespace post, rc = 1
Lustre: lustre-MDT0000-osd: the assistant has done lfsck_namespace post, rc = 1
Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread post
Lustre: lustre-MDT0000-osd: namespace LFSCK post done: rc = 0
Lustre: lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_namespace post: rc = 0
Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_layout double_scan, status 2
Lustre: lustre-MDT0000-osd: LFSCK assistant sync before the second-stage scaning
Lustre: lustre-MDT0000-osd: the assistant has done lfsck_layout double_scan, status 0
Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_namespace double_scan, status 2
Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan start, synced: rc = 0
Lustre: lustre-MDT0000-osd: layout LFSCK phase2 scan start
Lustre: lustre-MDT0000-osd: layout LFSCK phase2 scan stop: rc = 1
Lustre: lustre-MDT0000-osd: LFSCK assistant sync before the second-stage scaning
Lustre: lustre-MDT0000-osd: the assistant has done lfsck_namespace double_scan, status 0
Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan start, synced: rc = 0
Lustre: lustre-MDT0000-osd: namespace LFSCK phase2 scan start
Lustre: lustre-MDT0000-osd: LFSCK assistant sync before exit
Lustre: lustre-MDT0000-osd: start to scan backend /lost+found
Lustre: lustre-MDT0000-osd: LFSCK assistant synced before exit: rc = 0
Lustre: lustre-MDT0000-osd: layout LFSCK double scan: rc = 1
Lustre: lustre-MDT0000-osd: layout LFSCK double scan result 3: rc = 0
Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan finished: rc = 1
Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread exit: rc = 1
Lustre: lustre-MDT0000-osd: stop to scan backend /lost+found: rc = 1
Lustre: lustre-MDT0000-osd: namespace LFSCK phase2 scan stop at the No. 16 trace file: rc = 1
Lustre: lustre-MDT0000-osd: LFSCK assistant sync before exit
Lustre: lustre-MDT0000-osd: LFSCK assistant synced before exit: rc = 0
Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan finished: rc = 1
Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread exit: rc = 1
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-*.lfsck_*
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Autotest: Test running for 170 minutes (lustre-b_es-reviews_review-dne-exa6-part-1_21170.61)
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.130@tcp (stopping)
Lustre: Skipped 6 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.130@tcp (stopping)
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.130@tcp (stopping)
Lustre: Skipped 6 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.130@tcp (stopping)
Lustre: Skipped 6 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.130@tcp (stopping)
Lustre: Skipped 6 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.130@tcp (stopping)
Lustre: Skipped 13 previous similar messages
Link to test

sanity-pcc test 34: Cache rule with comparator (>, <) for Project ID range
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:109539]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic crc32c_intel ata_piix libata serio_raw virtio_net net_failover virtio_blk failover
CPU: 0 PID: 109539 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e af 38 ef 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffafeb01833a98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffafeb07162008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffafeb07161000 RSI: ffffafeb01833ad0 RDI: ffff9cecb5a80200
RBP: ffffffffc0e38870 R08: 0000000000000019 R09: 000000000000000e
R10: ffff9cecbbc64000 R11: ffff9cecbbc633a8 R12: ffffafeb01833b48
R13: 0000000000000000 R14: ffff9cecb5a80200 R15: 0000000000000000
FS: 00007f4228bc3080(0000) GS:ffff9ced3fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055aead4d1528 CR3: 000000004345e003 CR4: 00000000001706f0
Call Trace:
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xe2/0x910 [mdt]
? lu_context_init+0xa8/0x1b0 [obdclass]
class_cleanup+0x6a3/0xc00 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x116/0x740 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x456/0x740 [obdclass]
server_put_super+0x7f0/0x12e0 [obdclass]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f4227b2de9b
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: lfs --list-commands
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: lfs --list-commands
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Link to test

sanity test 115: verify dynamic thread creation
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:311807]
Modules linked in: obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_umad mlx5_ib ib_uverbs mlx5_core psample intel_rapl_msr intel_rapl_common mlxfw crct10dif_pclmul tls pci_hyperv_intf crc32_pclmul ib_core ghash_clmulni_intel virtio_balloon pcspkr joydev i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic crc32c_intel ata_piix libata virtio_net net_failover failover virtio_blk serio_raw [last unloaded: llog_test]
CPU: 0 PID: 311807 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.19.2.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17d/0x490 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 9c df ab d0 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffc08704303a98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffc08705897008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffc08705873000 RSI: ffffc08704303ad0 RDI: ffff9e57b3af2c00
RBP: ffffffffc10ad8a0 R08: 0000000000000019 R09: 0000000000000000
R10: ffff9e57c1524000 R11: ffff9e57c152317f R12: ffffc08704303b48
R13: 0000000000000000 R14: ffff9e57b3af2c00 R15: 0000000000000000
FS: 00007ffade5e2080(0000) GS:ffff9e583fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f419bf8f44 CR3: 00000000344e4002 CR4: 00000000001706f0
Call Trace:
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xe2/0x910 [mdt]
class_cleanup+0x6a3/0xc00 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x116/0x740 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x456/0x740 [obdclass]
server_put_super+0x7f0/0x12e0 [obdclass]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7ffadd54ce9b
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 6 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
Lustre: Skipped 3 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.24.111@tcp (stopping)
Lustre: Skipped 2 previous similar messages
LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107
Lustre: lustre-MDT0000: Not available for connect from 10.240.28.49@tcp (stopping)
Lustre: Skipped 9 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.28.49@tcp (stopping)
Lustre: Skipped 3 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.28.49@tcp (stopping)
Lustre: Skipped 25 previous similar messages
LustreError: 311403:0:(client.c:1278:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@0000000091f543f0 x1815799527234560/t0(0) o101->lustre-MDT0000-lwp-MDT0000@0@lo:23/10 lens 456/496 e 0 to 0 dl 0 ref 2 fl Rpc:QU/0/ffffffff rc 0/-1 job:'qsd_reint_0.lus.0'
LustreError: 311403:0:(client.c:1278:ptlrpc_import_delay_req()) Skipped 1 previous similar message
LustreError: 311403:0:(qsd_reint.c:56:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5
LustreError: 311403:0:(qsd_reint.c:56:qsd_reint_completion()) Skipped 1 previous similar message
LustreError: 137-5: lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 16 previous similar messages
Lustre: server umount lustre-MDT0000 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.49@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 1 previous similar message
Lustre: DEBUG MARKER: modprobe dm-flakey;
LustreError: 12543:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 10.240.28.49@tcp arrived at 1731686845 with bad export cookie 9190102256405190223
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.49@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 14 previous similar messages
Lustre: lustre-MDT0001-osp-MDT0002: Connection to lustre-MDT0001 (at 10.240.28.49@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 1 previous similar message
LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation mds_statfs to node 10.240.28.49@tcp failed: rc = -107
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3
LustreError: 12543:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 0@lo arrived at 1731686853 with bad export cookie 9190102256405189740
LustreError: 12543:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) Skipped 4 previous similar messages
LustreError: 166-1: MGC10.240.28.48@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.49@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 12 previous similar messages
Lustre: lustre-MDT0002: Not available for connect from 10.240.24.111@tcp (stopping)
Lustre: Skipped 25 previous similar messages
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.49@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 9 previous similar messages
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.49@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 19 previous similar messages
Lustre: lustre-MDT0002: Not available for connect from 10.240.28.49@tcp (stopping)
Lustre: Skipped 35 previous similar messages
Link to test

sanity-sec test complete, duration 2943 sec
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:295653]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 ext4 mbcache jbd2 ata_generic crc32c_intel ata_piix libata virtio_net serio_raw net_failover virtio_blk failover [last unloaded: obdecho]
CPU: 0 PID: 295653 Comm: umount Kdump: loaded Tainted: P OE --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs]
Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 56 8b 55 c1 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffaa4e823e7aa0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffaa4e82a4e008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffaa4e82a49000 RSI: ffffaa4e823e7ad0 RDI: ffff9e89c360f300
RBP: ffffffffc12febb0 R08: 0000000000000000 R09: 0000000000000000
R10: ffff9e89c2c90000 R11: 0000000000000001 R12: ffffaa4e823e7b48
R13: 0000000000000000 R14: ffff9e89c360f300 R15: 0000000000000000
FS: 00007f179cc70080(0000) GS:ffff9e8a7fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055e220b9c010 CR3: 0000000002d08005 CR4: 00000000001706f0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x173/0x460 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4f0 [ptlrpc]
? key_fini+0x4e/0x160 [obdclass]
? lu_context_fini+0xa6/0x1c0 [obdclass]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xda/0x930 [mdt]
class_cleanup+0x6f5/0xc90 [obdclass]
class_process_config+0x3ad/0x2080 [obdclass]
? class_manual_cleanup+0x191/0x780 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x456/0x780 [obdclass]
server_put_super+0xadc/0x1350 [obdclass]
? __dentry_kill+0x121/0x170
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f179bbe0e9b
LustreError: 11-0: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.25.141@tcp failed: rc = -107
LustreError: Skipped 6 previous similar messages
Lustre: lustre-MDT0000-osp-MDT0003: Connection to lustre-MDT0000 (at 10.240.25.141@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 14 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
LustreError: 166-1: MGC10.240.25.141@tcp: Connection to MGS (at 10.240.25.141@tcp) was lost; in progress operations using this service will fail
Lustre: lustre-MDT0001: Not available for connect from 10.240.25.115@tcp (stopping)
Lustre: Skipped 6 previous similar messages
Link to test

recovery-small test complete, duration 5640 sec
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:105972]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon joydev pcspkr i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net net_failover serio_raw virtio_blk failover
CPU: 1 PID: 105972 Comm: umount Kdump: loaded Tainted: P OE --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs]
Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 56 2b 0f c4 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffbafdc1d1baa0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffbafdc7479008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffbafdc743d000 RSI: ffffbafdc1d1bad0 RDI: ffff99bc81b82800
RBP: ffffffffc0f92bb0 R08: 000000000000046b R09: ffffbafdc1d1ba18
R10: ffffbafdc1d1baa0 R11: ffff99bc831682db R12: ffffbafdc1d1bb48
R13: 0000000000000000 R14: ffff99bc81b82800 R15: 0000000000000000
FS: 00007f77704a4080(0000) GS:ffff99bcffd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b7b83f48d0 CR3: 000000003b1a2004 CR4: 00000000001706e0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x173/0x460 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4f0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xda/0x930 [mdt]
class_cleanup+0x6f5/0xc90 [obdclass]
class_process_config+0x3ad/0x2080 [obdclass]
class_manual_cleanup+0x456/0x780 [obdclass]
server_put_super+0xadc/0x1350 [obdclass]
? __dentry_kill+0x121/0x170
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f776f40ee9b
Autotest: Test running for 120 minutes (lustre-reviews_review-dne-zfs-part-5_108937.34)
Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-scrub ============----- Wed Nov 13 02:34:02 UTC 2024
Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-scrub ============----- Wed Nov 13 02:34:02 UTC 2024
Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests:
Lustre: DEBUG MARKER: excepting tests:
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
Lustre: lustre-MDT0001: Not available for connect from 10.240.28.47@tcp (stopping)
Lustre: Skipped 2 previous similar messages
LustreError: 105580:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
LustreError: 105580:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 2 previous similar messages
Lustre: server umount lustre-MDT0001 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: ! zpool list -H lustre-mdt2 >/dev/null 2>&1 ||
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 22 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4
Lustre: lustre-MDT0003: Not available for connect from 10.240.29.210@tcp (stopping)
Lustre: Skipped 9 previous similar messages
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.29.210@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 34 previous similar messages
Lustre: lustre-MDT0003: Not available for connect from 10.240.29.210@tcp (stopping)
Lustre: Skipped 15 previous similar messages
Link to test
sanity-sec test complete, duration 8000 sec
watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [umount:1031692]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common i2c_piix4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc joydev virtio_balloon pcspkr ext4 mbcache jbd2 ata_generic virtio_net ata_piix libata crc32c_intel serio_raw net_failover failover virtio_blk [last unloaded: libcfs]
CPU: 1 PID: 1031692 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e 0f 50 cb 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffc10343fafa98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffc10341b9d008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffc10341b5d000 RSI: ffffc10343fafad0 RDI: ffff9d252d68b800
RBP: ffffffffc0ba08a0 R08: 0000000000000019 R09: 000000000000000e
R10: ffff9d2521e0c000 R11: ffff9d2521e0b99d R12: ffffc10343fafb48
R13: 0000000000000000 R14: ffff9d252d68b800 R15: 0000000000000000
FS: 00007fc5adfe5080(0000) GS:ffff9d25bfd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000562895a350a8 CR3: 000000002d1c2005 CR4: 00000000001706e0
Call Trace:
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xe2/0x910 [mdt]
class_cleanup+0x6a3/0xc00 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
class_manual_cleanup+0x456/0x740 [obdclass]
server_put_super+0x7f0/0x12e0 [obdclass]
? lustre_register_lwp_item+0x690/0x690 [obdclass]
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7fc5acf4fe9b
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-sec: start cleanup 01:07:51 \(1731460071\) ===
Lustre: DEBUG MARKER: === sanity-sec: start cleanup 01:07:51 (1731460071) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-sec: finish cleanup 01:07:56 \(1731460076\) ===
Lustre: DEBUG MARKER: === sanity-sec: finish cleanup 01:07:56 (1731460076) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-lipe-scan3 ============----- Wed Nov 13 01:07:58 AM UTC 2024
Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-lipe-scan3 ============----- Wed Nov 13 01:07:58 AM UTC 2024
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: cat /etc/system-release
Lustre: DEBUG MARKER: test -r /etc/os-release
Lustre: DEBUG MARKER: cat /etc/os-release
Lustre: DEBUG MARKER: cat /etc/system-release
Lustre: DEBUG MARKER: test -r /etc/os-release
Lustre: DEBUG MARKER: cat /etc/os-release
Lustre: DEBUG MARKER: which lipe_scan3
Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests:
Lustre: DEBUG MARKER: excepting tests:
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-scan3: start setup 01:08:13 \(1731460093\) ===
Lustre: DEBUG MARKER: === sanity-lipe-scan3: start setup 01:08:13 (1731460093) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-91vm21.onyx.whamcloud.com: executing check_config_client \/mnt\/lustre
Lustre: DEBUG MARKER: onyx-91vm21.onyx.whamcloud.com: executing check_config_client /mnt/lustre
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-91vm22.onyx.whamcloud.com: executing check_config_client \/mnt\/lustre
Lustre: DEBUG MARKER: onyx-91vm22.onyx.whamcloud.com: executing check_config_client /mnt/lustre
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts);
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts);
Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null
Lustre: DEBUG MARKER: cat /proc/mounts
Lustre: DEBUG MARKER: lctl get_param -n timeout
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
Lustre: DEBUG MARKER: Using TIMEOUT=20
Lustre: DEBUG MARKER: [ -f /sys/module/mgc/parameters/mgc_requeue_timeout_min ] && echo 1 > /sys/module/mgc/parameters/mgc_requeue_timeout_min; exit 0
Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param lod.*.mdt_hash=crush
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/us
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-96vm4.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-91vm22.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-96vm4.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-91vm22.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm1.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm1.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-99vm1.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: onyx-99vm1.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param osd-ldiskfs.track_declares_assert=1 || true
Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-scan3: finish setup 01:09:08 \(1731460148\) ===
Lustre: DEBUG MARKER: === sanity-lipe-scan3: finish setup 01:09:08 (1731460148) ===
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param printk=+lfsck
Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -r -A -t all
Lustre: lustre-MDT0000-osd: layout LFSCK reset: rc = 0
Lustre: lustre-MDT0000-osd: namespace LFSCK reset: rc = 0
Lustre: lustre-MDT0000: OI scrub prep, flags = 0x4
Lustre: lustre-MDT0000: reset OI scrub file, old flags = 0x0, add flags = 0x0
Lustre: lustre-MDT0000: store scrub file: rc = 0
Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread start
Lustre: lustre-MDT0000-osd: layout LFSCK master prep done, start pos [1]
Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread start
Lustre: lustre-MDT0000-osd: namespace LFSCK prep done, start pos [1, [0x0:0x0:0x0], 0x0]: rc = 0
Lustre: lustre-MDT0000: OI scrub start, flags = 0x4, pos = 12
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 3, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 6, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 2, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 4, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 0, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 1, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 5, status 1, flags 2, flags2 1
Lustre: lustre-MDT0000-osd: layout LFSCK master checkpoint at the pos [13], status = 1: rc = 0
Lustre: lustre-MDT0000-osd: namespace LFSCK checkpoint at the pos [13, [0x0:0x0:0x0], 0x0], status = 1: rc = 0
Lustre: LFSCK entry: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 13, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 1031043
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
Lustre: lustre-MDT0000: OI scrub post with result = 1
Lustre: lustre-MDT0000: store scrub file: rc = 0
Lustre: lustre-MDT0000: OI scrub: stop, pos = 838865: rc = 1
Lustre: LFSCK exit: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 423, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 1031043, rc = 1
Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_layout post, rc = 1
Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread post
Lustre: lustre-MDT0000-osd: the assistant has done lfsck_layout post, rc = 1
Lustre: lustre-MDT0000-osd: layout LFSCK master post done: rc = 0
Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_namespace post, rc = 1
Lustre: lustre-MDT0000-osd: the assistant has done lfsck_namespace post, rc = 1
Lustre: lustre-MDT0000-osd: namespace LFSCK post done: rc = 0
Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_layout double_scan, status 2
Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread post
Lustre: lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_namespace post: rc = 0
Lustre: lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_layout post: rc = 0
Lustre: lustre-MDT0000-osd: LFSCK assistant sync before the second-stage scaning
Lustre: lustre-MDT0000-osd: the assistant has done lfsck_layout double_scan, status 0
Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_namespace double_scan, status 2
Lustre: lustre-MDT0000-osd: LFSCK assistant sync before the second-stage scaning
Lustre: lustre-MDT0000-osd: the assistant has done lfsck_namespace double_scan, status 0
Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan start, synced: rc = 0
Lustre: lustre-MDT0000-osd: namespace LFSCK phase2 scan start
Lustre: lustre-MDT0000-osd: start to scan backend /lost+found
Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan start, synced: rc = 0
Lustre: lustre-MDT0000-osd: layout LFSCK phase2 scan start
Lustre: lustre-MDT0000-osd: layout LFSCK phase2 scan stop: rc = 1
Lustre: lustre-MDT0000-osd: stop to scan backend /lost+found: rc = 1
Lustre: lustre-MDT0000-osd: namespace LFSCK phase2 scan stop at the No. 16 trace file: rc = 1
Lustre: lustre-MDT0000-osd: LFSCK assistant sync before exit
Lustre: lustre-MDT0000-osd: LFSCK assistant synced before exit: rc = 0
Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan finished: rc = 1
Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread exit: rc = 1
Lustre: lustre-MDT0000-osd: LFSCK assistant sync before exit
Lustre: lustre-MDT0000-osd: LFSCK assistant synced before exit: rc = 0
Lustre: lustre-MDT0000-osd: layout LFSCK double scan: rc = 1
Lustre: lustre-MDT0000-osd: layout LFSCK double scan result 3: rc = 0
Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan finished: rc = 1
Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread exit: rc = 1
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-*.lfsck_*
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Link to test
sanity-pcc test 38: Verify LFS pcc state does not trigger prefetch for auto PCC-RO
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:130106]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net net_failover serio_raw failover virtio_blk
CPU: 0 PID: 130106 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e bf 03 f6 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffb20bc431fa98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffb20bc2179008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffb20bc2149000 RSI: ffffb20bc431fad0 RDI: ffff9bec67cd9a00
RBP: ffffffffc0b858a0 R08: 0000000000000019 R09: 000000000000000e
R10: ffff9bec6626b000 R11: ffff9bec6626afa5 R12: ffffb20bc431fb48
R13: 0000000000000000 R14: ffff9bec67cd9a00 R15: 0000000000000000
FS: 00007f24a79b2080(0000) GS:ffff9becffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff5c2e2fb0 CR3: 000000002b5b2002 CR4: 00000000001706f0
Call Trace:
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xe2/0x910 [mdt]
class_cleanup+0x6a3/0xc00 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
class_manual_cleanup+0x456/0x740 [obdclass]
server_put_super+0x7f0/0x12e0 [obdclass]
? lustre_register_lwp_item+0x690/0x690 [obdclass]
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f24a691ce9b
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: lfs --list-commands
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: lfs --list-commands
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Lustre: lustre-MDT0000: Not available for connect from 10.240.26.104@tcp (stopping)
Lustre: Skipped 13 previous similar messages
Link to test
sanity test 133f: Check reads/writes of client lustre proc files with bad area io
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:392095]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc pcspkr joydev virtio_balloon i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix virtio_net libata crc32c_intel serio_raw virtio_blk net_failover failover [last unloaded: dm_flakey]
CPU: 0 PID: 392095 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e 5f 68 d0 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffa3ea8104fa98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffa3ea81cc4008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffa3ea81ca9000 RSI: ffffa3ea8104fad0 RDI: ffff91369f148e00
RBP: ffffffffc0b3b8a0 R08: 0000000000000019 R09: 000000000000000e
R10: ffff91367a1c5000 R11: ffff91367a1c49b7 R12: ffffa3ea8104fb48
R13: 0000000000000000 R14: ffff91369f148e00 R15: 0000000000000000
FS: 00007fea99a5c080(0000) GS:ffff9136ffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b69d50d6a8 CR3: 0000000020a38005 CR4: 00000000001706f0
Call Trace:
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xe2/0x910 [mdt]
class_cleanup+0x6a3/0xc00 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x116/0x740 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x456/0x740 [obdclass]
server_put_super+0x7f0/0x12e0 [obdclass]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7fea989c6e9b
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 2 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
Lustre: Skipped 4 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.240.24.253@tcp (stopping)
Lustre: Skipped 2 previous similar messages
Lustre: server umount lustre-MDT0000 complete
LustreError: 137-5: lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 34 previous similar messages
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1
Lustre: DEBUG MARKER: modprobe -r dm-flakey
LustreError: 346614:0:(ldlm_lockd.c:2564:ldlm_cancel_handler()) ldlm_cancel from 10.240.28.47@tcp arrived at 1729843545 with bad export cookie 12976622451113876197
LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation mds_statfs to node 10.240.28.47@tcp failed: rc = -107
Lustre: lustre-MDT0001-osp-MDT0002: Connection to lustre-MDT0001 (at 10.240.28.47@tcp) was lost; in progress operations using this service will wait for recovery to complete
LustreError: Skipped 2 previous similar messages
Lustre: Skipped 1 previous similar message
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.24.253@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 49 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3
LustreError: 346614:0:(ldlm_lockd.c:2564:ldlm_cancel_handler()) ldlm_cancel from 0@lo arrived at 1729843568 with bad export cookie 12976622451113875756
LustreError: 166-1: MGC10.240.28.46@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
Lustre: lustre-MDT0002: Not available for connect from 10.240.28.47@tcp (stopping)
Lustre: Skipped 2 previous similar messages
Lustre: lustre-MDT0002: Not available for connect from 10.240.28.47@tcp (stopping)
Lustre: Skipped 8 previous similar messages
LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.24.253@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 68 previous similar messages
Lustre: lustre-MDT0002: Not available for connect from 10.240.24.253@tcp (stopping)
Lustre: Skipped 18 previous similar messages
Link to test
conf-sanity test 49a: check PARAM_SYS_LDLM_TIMEOUT option of mkfs.lustre
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:573275]
Modules linked in: dm_flakey ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) nfsv3 nfs_acl loop dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver intel_rapl_msr nfs lockd grace fscache intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev i2c_piix4 pcspkr virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_net virtio_blk net_failover failover [last unloaded: dm_flakey]
CPU: 1 PID: 573275 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.22.1.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 de 15 cf cd 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0000:ffffb3590b63ba88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffb35904487008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffb35904467000 RSI: ffffb3590b63bac0 RDI: ffff945e65c0d000
RBP: ffffffffc0bd98a0 R08: 0000000000000018 R09: 000000000000000e
R10: ffff945e43220000 R11: ffff945e4321fa3a R12: ffffb3590b63bb38
R13: 0000000000000000 R14: ffff945e65c0d000 R15: 0000000000000000
FS: 00007f15ba6bd080(0000) GS:ffff945effd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000562335d22048 CR3: 0000000005f1e004 CR4: 00000000001706e0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
? cfs_hash_for_each_relax+0x172/0x480 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0x480/0xf40 [mdt]
? lu_context_init+0xa8/0x1b0 [obdclass]
class_cleanup+0x6a3/0xc00 [obdclass]
class_process_config+0x39e/0x1a40 [obdclass]
? class_manual_cleanup+0x191/0x750 [obdclass]
class_manual_cleanup+0x443/0x750 [obdclass]
server_put_super+0x805/0x1300 [obdclass]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x195/0x1a0
entry_SYSCALL_64_after_hwframe+0x66/0xcb
RIP: 0033:0x7f15b96158fb
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_hostid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-33vm4.onyx.whamcloud.com: executing set_hostid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_hostid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_hostid
Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_hostid
Lustre: DEBUG MARKER: onyx-33vm4.onyx.whamcloud.com: executing set_hostid
Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_hostid
Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_hostid
Lustre: DEBUG MARKER: [ -e /dev/mapper/mds1_flakey ]
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: mkfs.lustre --mgs --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=sys.ldlm_timeout=20 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=200000 --mkfsoptions="-O ea_inode,large_dir -E lazy_itable_init" --
LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Lustre: DEBUG MARKER: [ -e /dev/mapper/mds3_flakey ]
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: mkfs.lustre --mgsnode=onyx-99vm5@tcp --fsname=lustre --mdt --index=2 --param=sys.timeout=20 --param=sys.ldlm_timeout=20 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=200000 --mkfsoptions="-O ea_inode,large_dir -E l
LDISKFS-fs (dm-5): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: errors=remount-ro
LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space: rc = -61
Lustre: lustre-MDT0000: new disk, initializing
Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400]:0:mdt
Lustre: Skipped 4 previous similar messages
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: sync; sleep 1; sync
Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds3
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds3_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds3_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds3_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds3; mount -t lustre -o localrecov /dev/mapper/mds3_flakey /mnt/lustre-mds3
LDISKFS-fs (dm-5): mounted filesystem with ordered data mode. Opts: errors=remount-ro
LDISKFS-fs (dm-5): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Lustre: cli-ctl-lustre-MDT0002: Allocated super-sequence [0x0000000280000400-0x00000002c0000400]:2:mdt]
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: sync; sleep 1; sync
Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid
Lustre: lustre-OST0000-osc-MDT0000: update sequence from 0x100000000 to 0x300000403
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-33vm4.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: onyx-33vm4.onyx.whamcloud.com: executing set_default_debug -1 all 4
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
Lustre: DEBUG MARKER: onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
Lustre: DEBUG MARKER: onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n ldlm_timeout
LustreError: 11-0: lustre-OST0000-osc-MDT0002: operation ost_statfs to node 10.240.23.7@tcp failed: rc = -107
LustreError: Skipped 17 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Link to test
runtests test 1: All Runtests
watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [umount:693944]
Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 joydev pcspkr virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net crc32c_intel net_failover failover virtio_blk serio_raw [last unloaded: dm_flakey]
CPU: 1 PID: 693944 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs]
Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 86 64 50 ec 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffff9e3e8652faa0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffff9e3e82379008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffff9e3e8235f000 RSI: ffff9e3e8652fad0 RDI: ffff8ce74510a900
RBP: ffffffffc0e47bb0 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000005b R11: 0000000000005be5 R12: ffff9e3e8652fb48
R13: 0000000000000000 R14: ffff8ce74510a900 R15: 0000000000000000
FS: 00007fc2d105a080(0000) GS:ffff8ce7ffd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561c46b705e8 CR3: 0000000066e4c001 CR4: 00000000001706e0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x310/0x310 [ptlrpc]
? cfs_hash_for_each_relax+0x173/0x460 [libcfs]
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4f0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xda/0x930 [mdt]
class_cleanup+0x6f5/0xc90 [obdclass]
class_process_config+0x3ad/0x2080 [obdclass]
? class_manual_cleanup+0x191/0x780 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x456/0x780 [obdclass]
server_put_super+0xadc/0x1350 [obdclass]
? __dentry_kill+0x121/0x170
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7fc2cffc4e9b
Lustre: DEBUG MARKER: /usr/sbin/lctl mark touching \/mnt\/lustre at Thu Jun 27 10:59:17 UTC 2024 \(@1719485957\)
Lustre: DEBUG MARKER: touching /mnt/lustre at Thu Jun 27 10:59:17 UTC 2024 (@1719485957)
Lustre: DEBUG MARKER: /usr/sbin/lctl mark create an empty file \/mnt\/lustre\/hosts.1024451
Lustre: DEBUG MARKER: create an empty file /mnt/lustre/hosts.1024451
Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.1024451
Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.1024451
Lustre: DEBUG MARKER: /usr/sbin/lctl mark comparing \/etc\/hosts and \/mnt\/lustre\/hosts.1024451
Lustre: DEBUG MARKER: comparing /etc/hosts and /mnt/lustre/hosts.1024451
Lustre: DEBUG MARKER: /usr/sbin/lctl mark renaming \/mnt\/lustre\/hosts.1024451 to \/mnt\/lustre\/hosts.1024451.ren
Lustre: DEBUG MARKER: renaming /mnt/lustre/hosts.1024451 to /mnt/lustre/hosts.1024451.ren
Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.1024451 again
Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.1024451 again
Lustre: DEBUG MARKER: /usr/sbin/lctl mark truncating \/mnt\/lustre\/hosts.1024451
Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.1024451
Lustre: DEBUG MARKER: /usr/sbin/lctl mark removing \/mnt\/lustre\/hosts.1024451
Lustre: DEBUG MARKER: removing /mnt/lustre/hosts.1024451
Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.1024451.2
Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.1024451.2
Lustre: DEBUG MARKER: /usr/sbin/lctl mark truncating \/mnt\/lustre\/hosts.1024451.2 to 123 bytes
Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.1024451.2 to 123 bytes
Lustre: DEBUG MARKER: /usr/sbin/lctl mark creating \/mnt\/lustre\/d1.runtests
Lustre: DEBUG MARKER: creating /mnt/lustre/d1.runtests
Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying 595 files from \/etc \/bin to \/mnt\/lustre\/d1.runtests\/etc \/bin at Thu Jun 27 10:59:25 UTC 2024
Lustre: DEBUG MARKER: copying 595 files from /etc /bin to /mnt/lustre/d1.runtests/etc /bin at Thu Jun 27 10:59:25 UTC 2024
Autotest: Test running for 810 minutes (lustre-b2_15_full-dne-part-1_93.112)
Lustre: DEBUG MARKER: /usr/sbin/lctl mark comparing 595 newly copied files at Thu Jun 27 10:59:59 UTC 2024
Lustre: DEBUG MARKER: comparing 595 newly copied files at Thu Jun 27 10:59:59 UTC 2024
Lustre: DEBUG MARKER: /usr/sbin/lctl mark running createmany -d \/mnt\/lustre\/d1.runtests\/d 595
Lustre: DEBUG MARKER: running createmany -d /mnt/lustre/d1.runtests/d 595
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n debug=0
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug="super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck"
Lustre: DEBUG MARKER: /usr/sbin/lctl mark finished at Thu Jun 27 11:00:09 UTC 2024 \(52\)
Lustre: DEBUG MARKER: finished at Thu Jun 27 11:00:09 UTC 2024 (52)
Lustre: lustre-MDT0000-lwp-MDT0001: Connection to lustre-MDT0000 (at 10.240.28.50@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 27 previous similar messages
Lustre: 11254:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1719486031/real 1719486031] req@00000000dac500b2 x1802961275982080/t0(0) o400->MGC10.240.28.50@tcp@10.240.28.50@tcp:26/25 lens 224/224 e 0 to 1 dl 1719486038 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:2.0'
Lustre: 11254:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 1 previous similar message
LustreError: 166-1: MGC10.240.28.50@tcp: Connection to MGS (at 10.240.28.50@tcp) was lost; in progress operations using this service will fail
LustreError: Skipped 3 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
Lustre: lustre-MDT0001: Not available for connect from 10.240.28.50@tcp (stopping)
Lustre: lustre-MDT0001: Not available for connect from 10.240.23.7@tcp (stopping)
Lustre: Skipped 1 previous similar message
LustreError: 11-0: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107
LustreError: Skipped 27 previous similar messages
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: lustre-MDT0001: Not available for connect from 10.240.23.7@tcp (stopping)
Lustre: Skipped 1 previous similar message
LustreError: 693551:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
LustreError: 693551:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 11 previous similar messages
Lustre: server umount lustre-MDT0001 complete
Lustre: Skipped 4 previous similar messages
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 99 previous similar messages
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: lustre-MDT0002-osp-MDT0003: Connection to lustre-MDT0002 (at 10.240.28.50@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 3 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4
Lustre: lustre-MDT0003: Not available for connect from 10.240.23.7@tcp (stopping)
Lustre: Skipped 7 previous similar messages
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.23.7@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 155 previous similar messages
Lustre: lustre-MDT0003: Not available for connect from 10.240.23.7@tcp (stopping)
Lustre: Skipped 15 previous similar messages
Lustre: lustre-MDT0003: Not available for connect from 10.240.23.7@tcp (stopping)
Lustre: Skipped 31 previous similar messages
Link to test
runtests test 1: All Runtests
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:1250907]
Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc virtio_balloon i2c_piix4 pcspkr joydev ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net serio_raw net_failover crc32c_intel failover virtio_blk [last unloaded: dm_flakey]
CPU: 1 PID: 1250907 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs]
Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 66 56 03 d9 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffb254852dbaa0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffb25482919008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffb254828fb000 RSI: ffffb254852dbad0 RDI: ffff8cd0b2e28c00
RBP: ffffffffc0d02bb0 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000002d4 R11: 000000000000ac1a R12: ffffb254852dbb48
R13: 0000000000000000 R14: ffff8cd0b2e28c00 R15: 0000000000000000
FS: 00007f10e1fa8080(0000) GS:ffff8cd0ffd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b881bfa680 CR3: 0000000004c9a004 CR4: 00000000001706e0
Call Trace:
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4f0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xda/0x930 [mdt]
? lu_context_init+0xa5/0x1b0 [obdclass]
class_cleanup+0x6f5/0xc90 [obdclass]
class_process_config+0x3ad/0x2080 [obdclass]
? class_manual_cleanup+0x116/0x770 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x469/0x770 [obdclass]
server_put_super+0xac7/0x1330 [obdclass]
? __dentry_kill+0x121/0x170
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f10e0f12e9b
Lustre: DEBUG MARKER: /usr/sbin/lctl mark touching \/mnt\/lustre at Fri May 31 05:06:55 UTC 2024 \(@1717132015\)
Lustre: DEBUG MARKER: touching /mnt/lustre at Fri May 31 05:06:55 UTC 2024 (@1717132015)
Lustre: DEBUG MARKER: /usr/sbin/lctl mark create an empty file \/mnt\/lustre\/hosts.1028631
Lustre: DEBUG MARKER: create an empty file /mnt/lustre/hosts.1028631
Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.1028631
Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.1028631
Lustre: DEBUG MARKER: /usr/sbin/lctl mark comparing \/etc\/hosts and \/mnt\/lustre\/hosts.1028631
Lustre: DEBUG MARKER: comparing /etc/hosts and /mnt/lustre/hosts.1028631
Lustre: DEBUG MARKER: /usr/sbin/lctl mark renaming \/mnt\/lustre\/hosts.1028631 to \/mnt\/lustre\/hosts.1028631.ren
Lustre: DEBUG MARKER: renaming /mnt/lustre/hosts.1028631 to /mnt/lustre/hosts.1028631.ren
Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.1028631 again
Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.1028631 again
Lustre: DEBUG MARKER: /usr/sbin/lctl mark truncating \/mnt\/lustre\/hosts.1028631
Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.1028631
Lustre: DEBUG MARKER: /usr/sbin/lctl mark removing \/mnt\/lustre\/hosts.1028631
Lustre: DEBUG MARKER: removing /mnt/lustre/hosts.1028631
Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.1028631.2
Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.1028631.2
Lustre: DEBUG MARKER: /usr/sbin/lctl mark truncating \/mnt\/lustre\/hosts.1028631.2 to 123 bytes
Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.1028631.2 to 123 bytes
Lustre: DEBUG MARKER: /usr/sbin/lctl mark creating \/mnt\/lustre\/d1.runtests
Lustre: DEBUG MARKER: creating /mnt/lustre/d1.runtests
Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying 593 files from \/etc \/bin to \/mnt\/lustre\/d1.runtests\/etc \/bin at Fri May 31 05:07:05 UTC 2024
Lustre: DEBUG MARKER: copying 593 files from /etc /bin to /mnt/lustre/d1.runtests/etc /bin at Fri May 31 05:07:05 UTC 2024
Lustre: DEBUG MARKER: /usr/sbin/lctl mark comparing 593 newly copied files at Fri May 31 05:07:25 UTC 2024
Lustre: DEBUG MARKER: comparing 593 newly copied files at Fri May 31 05:07:25 UTC 2024
Lustre: DEBUG MARKER: /usr/sbin/lctl mark running createmany -d \/mnt\/lustre\/d1.runtests\/d 593
Lustre: DEBUG MARKER: running createmany -d /mnt/lustre/d1.runtests/d 593
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n debug
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n debug=0
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug="super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck"
Lustre: DEBUG MARKER: /usr/sbin/lctl mark finished at Fri May 31 05:07:34 UTC 2024 \(39\)
Lustre: DEBUG MARKER: finished at Fri May 31 05:07:34 UTC 2024 (39)
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 24 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
Lustre: Skipped 2 previous similar messages
LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107
LustreError: Skipped 6 previous similar messages
LustreError: 1250509:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
LustreError: 1250509:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 2 previous similar messages
LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.240.28.45@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 67 previous similar messages
Lustre: server umount lustre-MDT0000 complete
Lustre: 13019:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1717132075/real 1717132075] req@000000000c961a0d x1800490966519232/t0(0) o400->MGC10.240.28.44@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1717132082 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:2.0'
Lustre: 13019:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
LustreError: 166-1: MGC10.240.28.44@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
LustreError: Skipped 3 previous similar messages
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3
Link to test
sanity-quota test complete, duration 8226 sec
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:762648]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr i2c_piix4 virtio_balloon ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_net net_failover virtio_blk failover
LustreError: 137-5: lustre-MDT0001: not available for connect from 10.240.25.220@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 59 previous similar messages
CPU: 0 PID: 762648 Comm: umount Kdump: loaded Tainted: P OE --------- - - 4.18.0-513.9.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e a2 98 f1 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffba1a0139bab0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffba1a0759f008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffba1a0759d000 RSI: ffffba1a0139bae8 RDI: ffff976ef2fba600
RBP: ffffffffc126cd00 R08: 0000000000000000 R09: 000000000000000e
R10: ffff976f220f4600 R11: 0000000000000001 R12: ffffba1a0139bb60
R13: 0000000000000000 R14: ffff976ef2fba600 R15: 0000000000000000
FS: 00007fd39a42f080(0000) GS:ffff976f7cc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055cb7fea8048 CR3: 0000000040052003 CR4: 00000000001706f0
Call Trace:
<IRQ>
? watchdog_timer_fn.cold.10+0x46/0x9e
? watchdog+0x30/0x30
? __hrtimer_run_queues+0x101/0x280
? hrtimer_interrupt+0x100/0x220
? smp_apic_timer_interrupt+0x6a/0x130
? apic_timer_interrupt+0xf/0x20
</IRQ>
? cleanup_resource+0x330/0x330 [ptlrpc]
? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
? cleanup_resource+0x330/0x330 [ptlrpc]
? cleanup_resource+0x330/0x330 [ptlrpc]
cfs_hash_for_each_nolock+0x124/0x200 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
Lustre: lustre-MDT0003: Not available for connect from 10.240.25.220@tcp (stopping)
Lustre: Skipped 55 previous similar messages
__ldlm_namespace_free+0x52/0x4f0 [ptlrpc]
? _cond_resched+0x15/0x30
? mutex_lock+0xe/0x30
? set_cdt_state+0x37/0x50 [mdt]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0x474/0xf60 [mdt]
? lu_context_init+0xa5/0x1b0 [obdclass]
class_cleanup+0x70d/0xca0 [obdclass]
class_process_config+0x3ad/0x21e0 [obdclass]
? class_manual_cleanup+0x191/0x780 [obdclass]
? __kmalloc+0x113/0x250
? lprocfs_counter_add+0x12a/0x1a0 [obdclass]
class_manual_cleanup+0x456/0x780 [obdclass]
server_put_super+0x7b7/0x10f0 [ptlrpc]
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7fd399399e9b
LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.240.28.44@tcp failed: rc = -107
LustreError: Skipped 1 previous similar message
Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.28.44@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 3 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
LustreError: 166-1: MGC10.240.28.44@tcp: Connection to MGS (at 10.240.28.44@tcp) was lost; in progress operations using this service will fail
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: Skipped 1 previous similar message
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: Skipped 9 previous similar messages
Lustre: lustre-MDT0001: Not available for connect from 10.240.25.220@tcp (stopping)
Lustre: Skipped 11 previous similar messages
Lustre: server umount lustre-MDT0001 complete
LustreError: 137-5: lustre-MDT0001: not available for connect from 10.240.25.220@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 16 previous similar messages
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: ! zpool list -H lustre-mdt2 >/dev/null 2>&1 ||
LustreError: 137-5: lustre-MDT0001: not available for connect from 10.240.25.220@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 44 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4
Lustre: lustre-MDT0003: Not available for connect from 10.240.25.220@tcp (stopping)
Lustre: Skipped 7 previous similar messages
Link to test
sanityn test complete, duration 7095 sec
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:921019]
Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 joydev pcspkr virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net net_failover failover crc32c_intel serio_raw virtio_blk [last unloaded: dm_flakey]
CPU: 0 PID: 921019 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e a2 3c cd 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffaedc8562fa98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffaedc83d07008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffaedc83d03000 RSI: ffffaedc8562fad0 RDI: ffff9729a7dbec00
RBP: ffffffffc0bcbbb0 R08: 0000000000000000 R09: 000000000000000e
R10: 0000000000000000 R11: 000000000000000f R12: ffffaedc8562fb48
R13: 0000000000000000 R14: ffff9729a7dbec00 R15: 0000000000000000
FS: 00007f6d1b700080(0000) GS:ffff972a3cc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055e296f4b9d0 CR3: 0000000040afa001 CR4: 00000000001706f0
Call Trace:
? cleanup_resource+0x330/0x330 [ptlrpc]
? cleanup_resource+0x330/0x330 [ptlrpc]
cfs_hash_for_each_nolock+0x124/0x200 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4f0 [ptlrpc]
? set_cdt_state_locked.isra.11+0x15/0xd0 [mdt]
? set_cdt_state+0x37/0x50 [mdt]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xda/0x920 [mdt]
class_cleanup+0x70d/0xca0 [obdclass]
class_process_config+0x3ad/0x2160 [obdclass]
class_manual_cleanup+0x469/0x770 [obdclass]
server_put_super+0x7a2/0x1300 [ptlrpc]
? lustre_register_lwp_item+0x6a0/0x6a0 [ptlrpc]
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f6d1a66ae9b
Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.28.130@tcp) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 11-0: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.28.130@tcp failed: rc = -107
LustreError: Skipped 1 previous similar message
Lustre: lustre-MDT0000-osp-MDT0003: Connection to lustre-MDT0000 (at 10.240.28.130@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
LustreError: 166-1: MGC10.240.28.130@tcp: Connection to MGS (at 10.240.28.130@tcp) was lost; in progress operations using this service will fail
Lustre: lustre-MDT0001: Not available for connect from 10.240.28.193@tcp (stopping)
Lustre: Skipped 20 previous similar messages
LustreError: 11-0: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107
Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 2 previous similar messages
Link to test
replay-single test 110c: DNE: create striped dir, fail MDT2
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:270459]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel virtio_net libata serio_raw virtio_blk net_failover failover
CPU: 1 PID: 270459 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e 02 55 f1 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffbb570a937a98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffbb5703a17008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffbb57039dd000 RSI: ffffbb570a937ad0 RDI: ffff969372749000
RBP: ffffffffc0c45bb0 R08: 0000000000000000 R09: 000000000000000e
R10: 0000000000000009 R11: 0000000000000000 R12: ffffbb570a937b48
R13: 0000000000000000 R14: ffff969372749000 R15: 0000000000000000
FS: 00007f18765d4080(0000) GS:ffff9693ffd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055adc3e6e858 CR3: 0000000030062003 CR4: 00000000001706e0
Call Trace:
? cleanup_resource+0x330/0x330 [ptlrpc]
? cleanup_resource+0x330/0x330 [ptlrpc]
cfs_hash_for_each_nolock+0x124/0x200 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4f0 [ptlrpc]
? set_cdt_state_locked.isra.11+0x15/0xd0 [mdt]
? set_cdt_state+0x37/0x50 [mdt]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xda/0x920 [mdt]
? lu_context_init+0xa5/0x1b0 [obdclass]
class_cleanup+0x70d/0xca0 [obdclass]
class_process_config+0x3ad/0x2160 [obdclass]
? class_manual_cleanup+0x116/0x770 [obdclass]
? __kmalloc+0x113/0x250
? lprocfs_counter_add+0x12a/0x1a0 [obdclass]
class_manual_cleanup+0x469/0x770 [obdclass]
server_put_super+0x7a2/0x1300 [ptlrpc]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f187553ee9b
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 3964928 flakey 252:0 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2
Lustre: Failing over lustre-MDT0001
Lustre: lustre-MDT0001: Not available for connect from 10.240.25.36@tcp (stopping)
Lustre: 10712:0:(client.c:2338:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1708742277/real 1708742277] req@ffff96937328b400 x1791737181335680/t0(0) o13->lustre-OST0005-osc-MDT0003@10.240.25.37@tcp:7/4 lens 224/368 e 0 to 1 dl 1708742293 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'osp-pre-5-3.0' uid:0 gid:0
Lustre: 10712:0:(client.c:2338:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Link to test
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:899023]
Modules linked in: dm_flakey obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon i2c_piix4 joydev pcspkr sunrpc ext4 mbcache jbd2 ata_generic crc32c_intel virtio_net serio_raw ata_piix libata net_failover failover virtio_blk [last unloaded: dm_flakey]
CPU: 0 PID: 899023 Comm: umount Kdump: loaded Tainted: G W OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e 55 c4 ea 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffab5082e63a98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffab50836b2008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffab5083691000 RSI: ffffab5082e63ad0 RDI: ffff9d7b3e70fb00
RBP: ffffffffc0b4d890 R08: 00000000000003f1 R09: 000000000000000e
R10: ffffab5082e63a98 R11: ffff9d7b34283261 R12: ffffab5082e63b48
R13: 0000000000000000 R14: ffff9d7b3e70fb00 R15: 0000000000000000
FS: 00007f2970f21080(0000) GS:ffff9d7bbfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fea4b835000 CR3: 000000002c768004 CR4: 00000000001706f0
Call Trace:
? cleanup_resource+0x330/0x330 [ptlrpc]
? cleanup_resource+0x330/0x330 [ptlrpc]
cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4f0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xda/0x920 [mdt]
class_cleanup+0x705/0xca0 [obdclass]
class_process_config+0x3ad/0x2160 [obdclass]
class_manual_cleanup+0x469/0x770 [obdclass]
server_put_super+0x7a5/0x1300 [ptlrpc]
? lustre_register_lwp_item+0x690/0x690 [ptlrpc]
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f296fe8be9b
Link to test
obdfilter-survey test 3a: Network survey
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:657010]
Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul sunrpc ghash_clmulni_intel virtio_balloon joydev pcspkr i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_blk virtio_net net_failover failover [last unloaded: dm_flakey]
CPU: 1 PID: 657010 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.19.2.el8_lustre.ddn17.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17d/0x490 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 5c 6f 95 eb 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffbbec0131fa98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffbbec05fa7008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffbbec05f79000 RSI: ffffbbec0131fad0 RDI: ffffa0185c594800
RBP: ffffffffc0c0b880 R08: 0000000000000000 R09: 0000000000000000
R10: ffffa01867811200 R11: 0000000000000001 R12: ffffbbec0131fb48
R13: 0000000000000000 R14: ffffa0185c594800 R15: 0000000000000000
FS: 00007f1a2c7d9080(0000) GS:ffffa018fcd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9ecf8de000 CR3: 0000000004eb6003 CR4: 00000000000606e0
Call Trace:
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4c0 [ptlrpc]
? _raw_spin_lock+0xc/0x30
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xda/0x8f0 [mdt]
class_cleanup+0x6a3/0xc00 [obdclass]
class_process_config+0x393/0x1a40 [obdclass]
class_manual_cleanup+0x453/0x740 [obdclass]
server_put_super+0x7e2/0x12d0 [obdclass]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f1a2b743e9b
LustreError: 11-0: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.41.37@tcp failed: rc = -107
LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.240.41.37@tcp failed: rc = -107
LustreError: Skipped 29 previous similar messages
LustreError: Skipped 29 previous similar messages
Lustre: lustre-MDT0000-osp-MDT0003: Connection to lustre-MDT0000 (at 10.240.41.37@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 47 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
LustreError: 166-1: MGC10.240.41.37@tcp: Connection to MGS (at 10.240.41.37@tcp) was lost; in progress operations using this service will fail
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: Skipped 52 previous similar messages
Lustre: 11362:0:(client.c:2321:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1707965786/real 1707965786] req@00000000c06f3ad3 x1790910940088512/t0(0) o400->lustre-MDT0002-osp-MDT0003@10.240.41.37@tcp:24/4 lens 224/224 e 0 to 1 dl 1707965793 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker.0'
Lustre: 11362:0:(client.c:2321:ptlrpc_expire_one_request()) Skipped 20 previous similar messages
Link to test
runtests test 1: All Runtests
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:107515]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 joydev pcspkr virtio_balloon ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_net net_failover virtio_blk failover [last unloaded: obdecho]
CPU: 1 PID: 107515 Comm: umount Kdump: loaded Tainted: P OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ae f5 91 c2 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffaa1681e0ba98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffaa1687049008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffaa168701f000 RSI: ffffaa1681e0bad0 RDI: ffff8fdd2f4d5b00
RBP: ffffffffc12dbab0 R08: 0000000000000000 R09: 000000000000000e
R10: ffff8fdd4168f000 R11: 0000000000000001 R12: ffffaa1681e0bb48
R13: 0000000000000000 R14: ffff8fdd2f4d5b00 R15: 0000000000000000
FS: 00007f43dfb81080(0000) GS:ffff8fddbfd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c21b5f3cb0 CR3: 000000004df70006 CR4: 00000000001706e0
Call Trace:
? cleanup_resource+0x330/0x330 [ptlrpc]
? cleanup_resource+0x330/0x330 [ptlrpc]
cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4f0 [ptlrpc]
? _cond_resched+0x15/0x30
? mutex_lock+0xe/0x30
? set_cdt_state+0x37/0x50 [mdt]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xda/0x920 [mdt]
? lu_context_init+0xa5/0x1b0 [obdclass]
class_cleanup+0x705/0xca0 [obdclass]
class_process_config+0x3ad/0x2160 [obdclass]
? class_manual_cleanup+0x116/0x770 [obdclass]
? __kmalloc+0x113/0x250
? lprocfs_counter_add+0x12a/0x1a0 [obdclass]
class_manual_cleanup+0x469/0x770 [obdclass]
server_put_super+0x7a5/0x1300 [ptlrpc]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f43deaebe9b
Lustre: DEBUG MARKER: /usr/sbin/lctl mark touching \/mnt\/lustre at Fri Feb 9 02:02:10 UTC 2024 \(@1707444130\)
Lustre: DEBUG MARKER: touching /mnt/lustre at Fri Feb 9 02:02:10 UTC 2024 (@1707444130)
Lustre: DEBUG MARKER: /usr/sbin/lctl mark create an empty file \/mnt\/lustre\/hosts.146389
Lustre: DEBUG MARKER: create an empty file /mnt/lustre/hosts.146389
Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.146389
Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.146389
Lustre: DEBUG MARKER: /usr/sbin/lctl mark comparing \/etc\/hosts and \/mnt\/lustre\/hosts.146389
Lustre: DEBUG MARKER: comparing /etc/hosts and /mnt/lustre/hosts.146389
Lustre: DEBUG MARKER: /usr/sbin/lctl mark renaming \/mnt\/lustre\/hosts.146389 to \/mnt\/lustre\/hosts.146389.ren
Lustre: DEBUG MARKER: renaming /mnt/lustre/hosts.146389 to /mnt/lustre/hosts.146389.ren
Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.146389 again
Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.146389 again
Lustre: DEBUG MARKER: /usr/sbin/lctl mark truncating \/mnt\/lustre\/hosts.146389
Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.146389
Lustre: DEBUG MARKER: /usr/sbin/lctl mark removing \/mnt\/lustre\/hosts.146389
Lustre: DEBUG MARKER: removing /mnt/lustre/hosts.146389
Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.146389.2
Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.146389.2
Lustre: DEBUG MARKER: /usr/sbin/lctl mark truncating \/mnt\/lustre\/hosts.146389.2 to 123 bytes
Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.146389.2 to 123 bytes
Lustre: DEBUG MARKER: /usr/sbin/lctl mark creating \/mnt\/lustre\/d1.runtests
Lustre: DEBUG MARKER: creating /mnt/lustre/d1.runtests
Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying 589 files from \/etc \/bin to \/mnt\/lustre\/d1.runtests\/etc \/bin at Fri Feb 9 02:02:17 UTC 2024
Lustre: DEBUG MARKER: copying 589 files from /etc /bin to /mnt/lustre/d1.runtests/etc /bin at Fri Feb 9 02:02:17 UTC 2024
Lustre: DEBUG MARKER: /usr/sbin/lctl mark comparing 589 newly copied files at Fri Feb 9 02:02:24 UTC 2024
Lustre: DEBUG MARKER: comparing 589 newly copied files at Fri Feb 9 02:02:24 UTC 2024
Lustre: DEBUG MARKER: /usr/sbin/lctl mark running createmany -d \/mnt\/lustre\/d1.runtests\/d 589
Lustre: DEBUG MARKER: running createmany -d /mnt/lustre/d1.runtests/d 589
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n debug=0
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug="super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck"
Lustre: DEBUG MARKER: /usr/sbin/lctl mark finished at Fri Feb 9 02:02:44 UTC 2024 \(34\)
Lustre: DEBUG MARKER: finished at Fri Feb 9 02:02:44 UTC 2024 (34)
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
Lustre: server umount lustre-MDT0001 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: ! zpool list -H lustre-mdt2 >/dev/null 2>&1 ||
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4
Link to test
sanity-quota test complete, duration 9937 sec
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:610505]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel virtio_net serio_raw libata net_failover virtio_blk failover
CPU: 0 PID: 610505 Comm: umount Kdump: loaded Tainted: P OE --------- - - 4.18.0-477.21.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 8e c7 ea f2 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffb6ab4cba7a98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffb6ab44acb008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffb6ab44abb000 RSI: ffffb6ab4cba7ad0 RDI: ffff9b0bf012e400
RBP: ffffffffc13b8920 R08: 0000000000000000 R09: 000000000000000e
R10: 0000000000000017 R11: 0000000000000000 R12: ffffb6ab4cba7b48
R13: 0000000000000000 R14: ffff9b0bf012e400 R15: 0000000000000000
FS: 00007f5e33d39080(0000) GS:ffff9b0c7cc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055655b0343a0 CR3: 0000000024b06006 CR4: 00000000000606f0
Call Trace:
? cleanup_resource+0x330/0x330 [ptlrpc]
? cleanup_resource+0x330/0x330 [ptlrpc]
cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4f0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xda/0x920 [mdt]
class_cleanup+0x6f5/0xc90 [obdclass]
class_process_config+0x3ad/0x2160 [obdclass]
? class_manual_cleanup+0x116/0x770 [obdclass]
? __kmalloc+0x113/0x250
? lprocfs_counter_add+0x12a/0x1a0 [obdclass]
class_manual_cleanup+0x469/0x770 [obdclass]
server_put_super+0x7a5/0x1300 [ptlrpc]
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f5e32ca3e9b
LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.240.39.107@tcp failed: rc = -107
LustreError: Skipped 1 previous similar message
Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.39.107@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 3 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
LustreError: 166-1: MGC10.240.39.107@tcp: Connection to MGS (at 10.240.39.107@tcp) was lost; in progress operations using this service will fail
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: Skipped 9 previous similar messages
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: Skipped 8 previous similar messages
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: Skipped 2 previous similar messages
Lustre: 10309:0:(client.c:2310:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1695910174/real 1695910174] req@00000000ce5dd934 x1778270680951936/t0(0) o400->lustre-OST0000-osc-MDT0003@10.240.39.106@tcp:28/4 lens 224/224 e 0 to 1 dl 1695910190 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0
Lustre: 10309:0:(client.c:2310:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1695910177/real 1695910177] req@0000000030fbe816 x1778270680952512/t0(0) o13->lustre-OST0003-osc-MDT0003@10.240.39.106@tcp:7/4 lens 224/368 e 0 to 1 dl 1695910193 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'osp-pre-3-3.0' uid:0 gid:0
Lustre: 10309:0:(client.c:2310:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
Lustre: 10309:0:(client.c:2310:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1695910178/real 1695910178] req@00000000397a5abd x1778270680953152/t0(0) o400->lustre-OST0000-osc-MDT0003@10.240.39.106@tcp:28/4 lens 224/224 e 0 to 1 dl 1695910194 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0
Lustre: 10309:0:(client.c:2310:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
Link to test
sanity-benchmark test complete, duration 7045 sec
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:84920]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 virtio_balloon joydev pcspkr sunrpc dm_mod ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_blk failover
CPU: 0 PID: 84920 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs]
Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 a6 56 69 eb 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
RSP: 0018:ffffb6ac40f13aa0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
RAX: ffffb6ac44752008 RBX: 0000000000000000 RCX: 000000000000000e
RDX: ffffb6ac4473b000 RSI: ffffb6ac40f13ad0 RDI: ffff9c6d6d3cc900
RBP: ffffffffc0ca0b90 R08: 0000000000000000 R09: 0000000000000000
R10: ffff9c6d70be2898 R11: 000000000000016f R12: ffffb6ac40f13b48
R13: 0000000000000000 R14: ffff9c6d6d3cc900 R15: 0000000000000000
FS: 00007f545d8fb080(0000) GS:ffff9c6dffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff159b580a8 CR3: 00000000483d0002 CR4: 00000000000606f0
Call Trace:
? cleanup_resource+0x310/0x310 [ptlrpc]
? cleanup_resource+0x310/0x310 [ptlrpc]
cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs]
ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
__ldlm_namespace_free+0x52/0x4f0 [ptlrpc]
ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc]
mdt_device_fini+0xda/0x930 [mdt]
? lu_context_init+0xa5/0x1b0 [obdclass]
class_cleanup+0x6f5/0xca0 [obdclass]
class_process_config+0x3ad/0x1ff0 [obdclass]
? class_manual_cleanup+0x116/0x770 [obdclass]
? __kmalloc+0x113/0x250
class_manual_cleanup+0x469/0x770 [obdclass]
server_put_super+0xac7/0x1330 [obdclass]
? __dentry_kill+0x121/0x170
? evict_inodes+0x160/0x1b0
generic_shutdown_super+0x6c/0x110
kill_anon_super+0x14/0x30
deactivate_locked_super+0x34/0x70
cleanup_mnt+0x3b/0x70
task_work_run+0x8a/0xb0
exit_to_usermode_loop+0xef/0x100
do_syscall_64+0x19c/0x1b0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f545c865e9b
Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: conf-sanity ============----- Thu Aug 3 01:29:34 UTC 2023
Lustre: DEBUG MARKER: -----============= acceptance-small: conf-sanity ============----- Thu Aug 3 01:29:34 UTC 2023
Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: 32newtarball 110
Lustre: DEBUG MARKER: excepting tests: 32newtarball 110
Lustre: lustre-MDT0003: haven't heard from client 09f8f5e7-51ea-48ae-b4f8-f28e030221a6 (at 10.240.25.39@tcp) in 48 seconds. I think it's dead, and I am evicting it. exp 00000000f3d752b6, cur 1691026226 expire 1691026196 last 1691026178
Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.25.42@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 4 previous similar messages
Lustre: 10685:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1691026237/real 1691026237] req@00000000e3e59d80 x1773146447045504/t0(0) o400->MGC10.240.25.42@tcp@10.240.25.42@tcp:26/25 lens 224/224 e 0 to 1 dl 1691026244 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:0.0'
LustreError: 166-1: MGC10.240.25.42@tcp: Connection to MGS (at 10.240.25.42@tcp) was lost; in progress operations using this service will fail
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
Lustre: lustre-MDT0001: Not available for connect from 10.240.25.42@tcp (stopping)
Lustre: lustre-MDT0001: Not available for connect from 10.240.25.42@tcp (stopping)
Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 2 previous similar messages
Lustre: lustre-MDT0001: Not available for connect from 10.240.25.42@tcp (stopping)
LustreError: 11-0: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107
LustreError: Skipped 1 previous similar message
LustreError: 84477:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
Lustre: lustre-MDT0001: Not available for connect from 10.240.25.41@tcp (stopping)
Lustre: Skipped 9 previous similar messages
Lustre: server umount lustre-MDT0001 complete
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.41@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 7 previous similar messages
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.42@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 1 previous similar message
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.41@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.42@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 8 previous similar messages
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.42@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 9 previous similar messages
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.41@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 19 previous similar messages
Lustre: lustre-MDT0002-osp-MDT0003: Connection to lustre-MDT0002 (at 10.240.25.42@tcp) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.41@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 28 previous similar messages
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4
Lustre: lustre-MDT0003: Not available for connect from 10.240.25.41@tcp (stopping)
Lustre: Skipped 7 previous similar messages
Lustre: lustre-MDT0003: Not available for connect from 10.240.25.41@tcp (stopping)
Lustre: Skipped 15 previous similar messages
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.41@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 82 previous similar messages
Link to test