Match messages in logs (every listed line must be present in the log output; copy from the "Messages before crash" column below): | |
Match messages in full crash (every listed line must be present in the crash log output; copy from the "Full Crash" column below): | |
Limit to a test (copy from the "Failing Test" column below): | |
Delete these reports as invalid (real bug under review, or similar): |
Bug or comment: | |
Extra info: |
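
Both matching fields above use the same rule: a report is claimed by this entry only if every listed line appears somewhere in the corresponding log. A minimal sketch of that check is below; this is a hypothetical helper for illustration only, not part of the autotest or triage tooling, and the function name and sample strings are assumptions.

```python
def report_matches(report_text: str, required_lines: list[str]) -> bool:
    """Return True only if every non-empty required line occurs in report_text."""
    return all(line in report_text for line in required_lines if line.strip())


# Example markers taken from the soft-lockup reports in the table below.
required = [
    "watchdog: BUG: soft lockup",
    "cfs_hash_for_each_relax+0x17b/0x480 [libcfs]",
]

# A tiny excerpt standing in for a full console/crash log.
sample = (
    "watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:510746]\n"
    "RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]\n"
)

print(report_matches(sample, required))  # True: both markers are present
```

If any one of the listed lines is missing from the log, the report is not matched and stays in the table for manual triage.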
Failing Test | Full Crash | Messages before crash | Comment |
---|---|---|---|
sanity-quota test 39: Project ID interface works correctly | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:510746] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev i2c_piix4 virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net crc32c_intel serio_raw net_failover virtio_blk failover [last unloaded: dm_flakey] CPU: 1 PID: 510746 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.58.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 4e 16 51 dc 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb79084137ab0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb79083568008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb79083531000 RSI: ffffb79084137ae8 RDI: ffff8f8dc5a5b500 RBP: ffffffffc0e863c0 R08: 0000000000000018 R09: 000000000000000e R10: ffff8f8df3cad000 R11: ffff8f8df3cac3f0 R12: ffffb79084137b60 R13: 0000000000000000 R14: ffff8f8dc5a5b500 R15: 0000000000000000 FS: 00007f0b161a0080(0000) GS:ffff8f8e7fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a45914e6e8 CR3: 00000000051c2004 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0x557/0xf30 [mdt] class_cleanup+0x6a3/0xbf0 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x191/0x740 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x269/0x740 [obdclass] server_put_super+0x7f9/0x1090 [obdclass] generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x195/0x1a0 entry_SYSCALL_64_after_hwframe+0x66/0xcb RIP: 0033:0x7f0b150f88fb | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osc.*MDT*.sync_* Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osp.*.destroys_in_flight Lustre: DEBUG MARKER: lctl set_param fail_val=0 fail_loc=0 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 LustreError: 166-1: MGC10.240.28.46@tcp: Connection to MGS (at 10.240.28.46@tcp) was lost; in progress operations using this service will fail Lustre: lustre-MDT0001: Not available for connect from 10.240.28.46@tcp (stopping) Lustre: Skipped 10 previous similar messages | Link to test |
recovery-small test 110k: FID_QUERY failed during recovery | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:91928] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon joydev pcspkr i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic virtio_net ata_piix crc32c_intel libata net_failover serio_raw virtio_blk failover CPU: 1 PID: 91928 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 9e 27 f1 c2 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffad1548817a88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffad1541ace008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffad1541aaf000 RSI: ffffad1548817ac0 RDI: ffff8eeb21e2cc00 RBP: ffffffffc0e833c0 R08: 0000000000000018 R09: 000000000000000e R10: ffff8eeb33df8000 R11: ffff8eeb33df78e3 R12: ffffad1548817b38 R13: 0000000000000000 R14: ffff8eeb21e2cc00 R15: 0000000000000000 FS: 00007fe9d2246080(0000) GS:ffff8eebbfd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000562ea4b092e0 CR3: 0000000034a20001 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0x557/0xf30 [mdt] class_cleanup+0x6a3/0xbf0 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x191/0x740 [obdclass] class_manual_cleanup+0x269/0x740 [obdclass] server_put_super+0x7f9/0x12b0 [obdclass] ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x195/0x1a0 entry_SYSCALL_64_after_hwframe+0x66/0xcb RIP: 0033:0x7fe9d119e8fb | Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0001: Not available for connect from 10.240.28.46@tcp (stopping) Lustre: lustre-MDT0001: Not available for connect from 10.240.27.48@tcp (stopping) Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 10 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 11 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 10.240.27.47@tcp (stopping) Lustre: Skipped 22 previous similar messages Autotest: Test running for 135 minutes (lustre-b_es-reviews_review-dne-part-5_24389.29) Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 37 previous similar messages | Link to test |
conf-sanity test 50d: lazystatfs client/server conn race | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:679883] Modules linked in: dm_flakey ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lzstd(OE) llz4hc(OE) llz4(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) nfsv3 nfs_acl loop dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 sunrpc joydev pcspkr virtio_balloon ext4 ata_generic mbcache jbd2 ata_piix libata virtio_net crc32c_intel serio_raw virtio_blk net_failover failover [last unloaded: dm_flakey] CPU: 0 PID: 679883 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 9e 57 78 f8 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb8c141663a88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb8c143167008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb8c143145000 RSI: ffffb8c141663ac0 RDI: ffff9bd235009b00 RBP: ffffffffc0d6b3c0 R08: 0000000000000018 R09: 000000000000000e R10: ffff9bd227b40000 R11: ffff9bd227b3f3ae R12: ffffb8c141663b38 R13: 0000000000000000 R14: ffff9bd235009b00 R15: 0000000000000000 FS: 00007f4448860080(0000) GS:ffff9bd2bfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000560159b986e8 CR3: 000000002ca8e002 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0x557/0xf30 [mdt] ? lu_context_init+0xac/0x1a0 [obdclass] class_cleanup+0x6a3/0xbf0 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x191/0x740 [obdclass] class_manual_cleanup+0x269/0x740 [obdclass] server_put_super+0x7f9/0x12b0 [obdclass] ? evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x195/0x1a0 entry_SYSCALL_64_after_hwframe+0x66/0xcb RIP: 0033:0x7f44477b88fb | Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1 Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. 
Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc Lustre: MGS: Logs for fs lustre were removed by user request. All servers must be restarted in order to regenerate the logs: rc = 0 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null Lustre: MGS: Regenerating lustre-MDT0001 log by user request: rc = 0 Lustre: Skipped 1 previous similar message Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm7.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm7.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds3 Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds3_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds3_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds3_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds3; mount -t lustre -o localrecov /dev/mapper/mds3_flakey /mnt/lustre-mds3 LDISKFS-fs (dm-5): mounted filesystem with ordered data mode. 
Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc Lustre: MGS: Regenerating lustre-MDT0002 log by user request: rc = 0 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey 2>/dev/null Lustre: MGS: Regenerating lustre-MDT0003 log by user request: rc = 0 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm7.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm7.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL 
mdc.lustre-MDT0003-mdc-*.mds_server_uuid Autotest: Test running for 170 minutes (lustre-b_es-reviews_review-dne-part-3_24164.34) Lustre: MGS: Regenerating lustre-OST0000 log by user request: rc = 0 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-39vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-39vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid Lustre: DEBUG MARKER: onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid Lustre: DEBUG MARKER: onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-39vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-39vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) osc.lustre-OST0001-osc-[-0-9a-f]*.ost_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) osc.lustre-OST0001-osc-[-0-9a-f]*.ost_server_uuid Lustre: DEBUG MARKER: onyx-77vm5.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0001-osc-[-0-9a-f]*.ost_server_uuid Lustre: DEBUG MARKER: onyx-77vm9.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0001-osc-[-0-9a-f]*.ost_server_uuid LustreError: 11-0: lustre-OST0000-osc-MDT0000: operation ost_statfs to node 10.240.23.128@tcp failed: rc = -107 Lustre: lustre-OST0000-osc-MDT0002: Connection to lustre-OST0000 (at 10.240.23.128@tcp) was lost; in progress operations using this service will wait for recovery to complete LustreError: Skipped 15 previous similar messages Lustre: Skipped 30 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 | Link to test |
sanity-flr test complete, duration 2056 sec | watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:1309913] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_blk failover [last unloaded: dm_flakey] CPU: 0 PID: 1309913 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e d7 48 c3 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffbe43c10ffa88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffbe43c723c008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffbe43c721d000 RSI: ffffbe43c10ffac0 RDI: ffff9ba3f6352800 RBP: ffffffffc0f5a3c0 R08: 0000000000000019 R09: 000000000000000e R10: ffff9ba3f3ca7000 R11: ffff9ba3f3ca631e R12: ffffbe43c10ffb38 R13: 0000000000000000 R14: ffff9ba3f6352800 R15: 0000000000000000 FS: 00007f67b24ef080(0000) GS:ffff9ba47fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000559891819fa0 CR3: 000000002ea8e002 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0x557/0xf30 [mdt] ? lu_context_init+0xac/0x1a0 [obdclass] class_cleanup+0x6a3/0xbf0 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x191/0x740 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x269/0x740 [obdclass] server_put_super+0x7f9/0x12b0 [obdclass] ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x195/0x1a0 entry_SYSCALL_64_after_hwframe+0x66/0xcb RIP: 0033:0x7f67b14478fb | Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-flr: start cleanup 00:41:12 \(1749256872\) === Lustre: DEBUG MARKER: === sanity-flr: start cleanup 00:41:12 (1749256872) === Lustre: 1094478:0:(osd_handler.c:2078:osd_trans_start()) lustre-MDT0000: credits 2832 > trans_max 2464 Lustre: 1094478:0:(osd_handler.c:1979:osd_trans_dump_creds()) create: 10/40/0, destroy: 1/4/0 Lustre: 1094478:0:(osd_handler.c:1986:osd_trans_dump_creds()) attr_set: 137/137/0, xattr_set: 205/2024/0 Lustre: 1094478:0:(osd_handler.c:1996:osd_trans_dump_creds()) write: 44/433/0, punch: 0/0/0, quota 0/0/0 Lustre: 1094478:0:(osd_handler.c:2003:osd_trans_dump_creds()) insert: 11/186/0, delete: 2/5/0 Lustre: 1094478:0:(osd_handler.c:2010:osd_trans_dump_creds()) ref_add: 1/1/0, ref_del: 2/2/0 CPU: 1 PID: 1094478 Comm: mdt00_004 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: dump_stack+0x41/0x60 osd_trans_start+0x4be/0x520 [osd_ldiskfs] top_trans_start+0x427/0x950 [ptlrpc] ? lod_trans_start+0x7a/0x330 [lod] ? mdd_buf_get+0x1e/0x90 [mdd] mdd_unlink+0x4aa/0xc90 [mdd] mdt_reint_unlink+0xbf4/0x1380 [mdt] mdt_reint_rec+0x127/0x260 [mdt] mdt_reint_internal+0x4ac/0x7a0 [mdt] mdt_reint+0x5e/0x100 [mdt] tgt_request_handle+0xc9c/0x1970 [ptlrpc] ptlrpc_server_handle_request+0x346/0xc10 [ptlrpc] ? ptlrpc_server_handle_req_in+0x7a8/0x8f0 [ptlrpc] ptlrpc_main+0xb45/0x13a0 [ptlrpc] ? ptlrpc_register_service+0xf30/0xf30 [ptlrpc] kthread+0x134/0x150 ? 
set_kthread_struct+0x50/0x50 ret_from_fork+0x35/0x40 Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.mdt=none Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=none Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-flr: finish cleanup 00:41:15 \(1749256875\) === Lustre: DEBUG MARKER: === sanity-flr: finish cleanup 00:41:15 (1749256875) === Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-lsnapshot ============----- Sat Jun 7 00:41:15 UTC 2025 Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-lsnapshot ============----- Sat Jun 7 00:41:15 UTC 2025 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: cat /etc/system-release Lustre: DEBUG MARKER: test -r /etc/os-release Lustre: DEBUG MARKER: cat /etc/os-release Lustre: DEBUG MARKER: cat /etc/system-release Lustre: DEBUG MARKER: test -r /etc/os-release Lustre: DEBUG MARKER: cat /etc/os-release Lustre: DEBUG MARKER: ls /usr/lib64/lustre/tests/except/sanity-lsnapshot.*ex || true Lustre: DEBUG MARKER: cat /usr/lib64/lustre/tests/except/sanity-lsnapshot.*ex 2>/dev/null ||true Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: Lustre: DEBUG MARKER: excepting tests: Lustre: DEBUG MARKER: /usr/sbin/lctl mark SKIP: sanity-lsnapshot ZFS only test Lustre: DEBUG MARKER: SKIP: sanity-lsnapshot ZFS only test Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: mmp ============----- Sat Jun 7 00:41:23 UTC 2025 Lustre: DEBUG MARKER: -----============= acceptance-small: mmp ============----- Sat Jun 7 00:41:23 UTC 2025 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: cat /etc/system-release Lustre: DEBUG MARKER: test -r /etc/os-release Lustre: DEBUG MARKER: cat /etc/os-release Lustre: DEBUG MARKER: cat /etc/system-release Lustre: DEBUG MARKER: test -r /etc/os-release Lustre: DEBUG MARKER: cat /etc/os-release Lustre: DEBUG MARKER: ls /usr/lib64/lustre/tests/except/mmp.*ex || true Lustre: DEBUG MARKER: cat /usr/lib64/lustre/tests/except/mmp.*ex 2>/dev/null ||true Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: Lustre: DEBUG MARKER: excepting tests: Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107 Lustre: Skipped 33 previous similar messages LustreError: Skipped 7 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping) Lustre: Skipped 24 previous similar messages Lustre: server umount lustre-MDT0000 complete Lustre: Skipped 1 previous similar message Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup table 
/dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey LustreError: 137-5: lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 53 previous similar messages Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1 Lustre: DEBUG MARKER: modprobe -r dm-flakey LustreError: 1092726:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 10.240.28.47@tcp arrived at 1749256904 with bad export cookie 16998891392739004986 Lustre: 12831:0:(client.c:2355:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1749256902/real 1749256902] req@0000000092b6de4a x1834186612638720/t0(0) o400->MGC10.240.28.46@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1749256909 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker.0' LustreError: 166-1: MGC10.240.28.46@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail LustreError: Skipped 1 previous similar message Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3 Lustre: lustre-MDT0002: Not available for connect from 10.240.28.229@tcp (stopping) Lustre: Skipped 13 previous similar messages Lustre: lustre-MDT0002: Not available for connect from 10.240.28.229@tcp (stopping) Lustre: Skipped 44 previous similar messages | Link to test |
sanity-compr test complete, duration 2959 sec | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:50562] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev i2c_piix4 virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel libata serio_raw virtio_net virtio_blk net_failover failover CPU: 0 PID: 50562 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e 27 9a cd 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffaa37839b3a88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffaa378287d008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffaa3782855000 RSI: ffffaa37839b3ac0 RDI: ffff8e7ee1872600 RBP: ffffffffc0c453c0 R08: 0000000000000019 R09: 000000000000000e R10: ffff8e7ef04da000 R11: ffff8e7ef04d9fa9 R12: ffffaa37839b3b38 R13: 0000000000000000 R14: ffff8e7ee1872600 R15: 0000000000000000 FS: 00007f803addd080(0000) GS:ffff8e7f7fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000558f3db950a8 CR3: 0000000030f60006 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0x557/0xf30 [mdt] class_cleanup+0x6a3/0xbf0 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x191/0x740 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x269/0x740 [obdclass] server_put_super+0x7f9/0x12b0 [obdclass] ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x195/0x1a0 entry_SYSCALL_64_after_hwframe+0x66/0xcb RIP: 0033:0x7f8039d358fb | Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-compr: start cleanup 01:33:18 \(1749087198\) === Lustre: DEBUG MARKER: === sanity-compr: start cleanup 01:33:18 (1749087198) === Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-compr: finish cleanup 01:33:20 \(1749087200\) === Lustre: DEBUG MARKER: === sanity-compr: finish cleanup 01:33:20 (1749087200) === Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000: Not available for connect from 10.240.29.191@tcp (stopping) Lustre: lustre-MDT0000: Not available for connect from 10.240.29.191@tcp (stopping) Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.29.191@tcp (stopping) Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.29.191@tcp (stopping) Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.29.191@tcp (stopping) Lustre: Skipped 6 previous similar messages | Link to test |
sanity test 133f: Check reads/writes of client lustre proc files with bad area io | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:388366] Modules linked in: lzstd(OE) llz4hc(OE) llz4(OE) obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel libata serio_raw virtio_net virtio_blk net_failover failover [last unloaded: dm_flakey] CPU: 1 PID: 388366 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.53.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e 37 b2 c7 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb25241263a88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb25244c31008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb25244bf5000 RSI: ffffb25241263ac0 RDI: ffff8d9a74669f00 RBP: ffffffffc0ec13c0 R08: 0000000000000019 R09: 000000000000000e R10: ffff8d9a73319000 R11: ffff8d9a73318872 R12: ffffb25241263b38 R13: 0000000000000000 R14: ffff8d9a74669f00 R15: 0000000000000000 FS: 00007ff46f4a4080(0000) GS:ffff8d9affd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000561786e04048 CR3: 0000000005664001 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0x557/0xf30 [mdt] ? lu_context_init+0xac/0x1a0 [obdclass] class_cleanup+0x6a3/0xbf0 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x191/0x740 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x269/0x740 [obdclass] server_put_super+0x7f9/0x12b0 [obdclass] ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x195/0x1a0 entry_SYSCALL_64_after_hwframe+0x66/0xcb RIP: 0033:0x7ff46e3fc8fb | Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping) Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0000: Not available for connect from 10.240.28.51@tcp (stopping) Lustre: Skipped 8 previous similar messages Lustre: server umount lustre-MDT0000 complete LustreError: 137-5: lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.51@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 9 previous similar messages Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey LustreError: 137-5: lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 3 previous similar messages Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1 Lustre: DEBUG MARKER: modprobe -r dm-flakey LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.51@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 9 previous similar messages LustreError: 13426:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 10.240.28.51@tcp arrived at 1748345981 with bad export cookie 10960556717268018208 LustreError: 13426:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) Skipped 4 previous similar messages LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.51@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 13 previous similar messages LustreError: 137-5: lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. 
LustreError: Skipped 16 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3 LustreError: 13426:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 0@lo arrived at 1748345996 with bad export cookie 10960556717268017746 LustreError: 166-1: MGC10.240.28.50@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Lustre: lustre-MDT0002: Not available for connect from 10.240.28.51@tcp (stopping) Lustre: Skipped 3 previous similar messages Lustre: lustre-MDT0002: Not available for connect from 10.240.28.51@tcp (stopping) Lustre: Skipped 9 previous similar messages Lustre: lustre-MDT0002: Not available for connect from 10.240.28.51@tcp (stopping) Lustre: Skipped 8 previous similar messages LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.51@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 40 previous similar messages Lustre: lustre-MDT0002: Not available for connect from 10.240.27.207@tcp (stopping) Lustre: Skipped 17 previous similar messages | Link to test |
sanity-lipe-scan3 test complete, duration 1164 sec | watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:1228784] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lzstd(OE) llz4hc(OE) llz4(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 pcspkr joydev virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic virtio_net crc32c_intel ata_piix libata serio_raw net_failover failover virtio_blk [last unloaded: dm_flakey] CPU: 0 PID: 1228784 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.46.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 be b5 1a fb 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffab010117fa88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffab01082f4008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffab01082df000 RSI: ffffab010117fac0 RDI: ffff96a2f512a900 RBP: ffffffffc0b1b3c0 R08: 0000000000000019 R09: 000000000000000e R10: ffff96a2f123d000 R11: ffff96a2f123c6ac R12: ffffab010117fb38 R13: 0000000000000000 R14: ffff96a2f512a900 R15: 0000000000000000 FS: 00007fc875f95080(0000) GS:ffff96a37fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055855f6e86e8 CR3: 00000000411ac001 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0x557/0xf30 [mdt] ? lu_context_init+0xac/0x1a0 [obdclass] class_cleanup+0x6a3/0xbf0 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x191/0x740 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x269/0x740 [obdclass] server_put_super+0x7f9/0x12b0 [obdclass] ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x195/0x1a0 entry_SYSCALL_64_after_hwframe+0x66/0xcb RIP: 0033:0x7fc874eed8fb | Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-scan3: start cleanup 23:55:57 \(1745452557\) === Lustre: DEBUG MARKER: === sanity-lipe-scan3: start cleanup 23:55:57 (1745452557) === Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-scan3: finish cleanup 23:55:58 \(1745452558\) === Lustre: DEBUG MARKER: === sanity-lipe-scan3: finish cleanup 23:55:58 (1745452558) === Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-lipe-find3 ============----- Wed Apr 23 11:55:58 PM UTC 2025 Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-lipe-find3 ============----- Wed Apr 23 11:55:58 PM UTC 2025 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: cat /etc/system-release Lustre: DEBUG MARKER: test -r /etc/os-release Lustre: DEBUG MARKER: cat /etc/os-release Lustre: DEBUG MARKER: cat /etc/system-release Lustre: DEBUG MARKER: test -r /etc/os-release Lustre: DEBUG MARKER: cat /etc/os-release Lustre: DEBUG MARKER: which lipe_find3 Lustre: DEBUG MARKER: ls /usr/lib64/lustre/tests/except/sanity-lipe-find3.*ex || true Lustre: DEBUG MARKER: cat /usr/lib64/lustre/tests/except/sanity-lipe-find3.*ex 2>/dev/null ||true Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: 361 Lustre: DEBUG MARKER: excepting tests: 361 Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-find3: start setup 23:56:05 \(1745452565\) === Lustre: DEBUG MARKER: === sanity-lipe-find3: start setup 23:56:05 (1745452565) === Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-154vm19.onyx.whamcloud.com: executing check_config_client \/mnt\/lustre Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-154vm20.onyx.whamcloud.com: executing check_config_client \/mnt\/lustre Lustre: DEBUG MARKER: onyx-154vm19.onyx.whamcloud.com: executing check_config_client /mnt/lustre Lustre: DEBUG MARKER: onyx-154vm20.onyx.whamcloud.com: executing check_config_client /mnt/lustre Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts); Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds3' ' /proc/mounts); Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts); Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null Lustre: DEBUG MARKER: cat /proc/mounts Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey 2>/dev/null Lustre: DEBUG MARKER: cat /proc/mounts Lustre: DEBUG MARKER: lctl get_param -n timeout Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20 Lustre: DEBUG MARKER: Using TIMEOUT=20 Lustre: DEBUG MARKER: [ -f /sys/module/mgc/parameters/mgc_requeue_timeout_min ] && echo 1 > /sys/module/mgc/parameters/mgc_requeue_timeout_min; exit 0 Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l Lustre: DEBUG MARKER: /usr/sbin/lctl set_param lod.*.mdt_hash=crush Lustre: DEBUG MARKER: 
PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/us Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-154vm20.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-154vm20.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-58vm2.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-58vm2.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm7.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-99vm7.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl set_param osd-ldiskfs.track_declares_assert=1 || true Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n lod.lustre-MDT0000-mdtlov.enable_compr_rotational Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n lod.lustre-MDT0002-mdtlov.enable_compr_rotational Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-find3: finish setup 23:56:31 \(1745452591\) === Lustre: DEBUG MARKER: === sanity-lipe-find3: finish setup 23:56:31 (1745452591) === Lustre: DEBUG MARKER: /usr/sbin/lctl set_param printk=+lfsck Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -r -A -t all Lustre: lustre-MDT0000-osd: layout LFSCK reset: rc = 0 Lustre: lustre-MDT0000-osd: namespace LFSCK reset: rc = 0 Lustre: lustre-MDT0000: OI scrub prep, flags = 0x4 Lustre: lustre-MDT0000: reset OI scrub file, old flags = 0x0, add flags = 0x0 Lustre: lustre-MDT0000: store scrub file: rc = 0 Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread start Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 0, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 5, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 1, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 3, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 4, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 6, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from 
OST 7, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 2, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master prep done, start pos [1] Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread start Lustre: lustre-MDT0000-osd: namespace LFSCK prep done, start pos [1, [0x0:0x0:0x0], 0x0]: rc = 0 Lustre: lustre-MDT0000: OI scrub start, flags = 0x4, pos = 12 Lustre: lustre-MDT0000-osd: layout LFSCK master checkpoint at the pos [13], status = 1: rc = 0 Lustre: lustre-MDT0000-osd: namespace LFSCK checkpoint at the pos [13, [0x0:0x0:0x0], 0x0], status = 1: rc = 0 Lustre: LFSCK entry: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 13, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 1226882 Lustre: lustre-MDT0002-osd: layout LFSCK reset: rc = 0 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from MDT 3, status 1, flags 0, flags2 0 Lustre: lustre-MDT0002-osd: layout LFSCK master handles notify 3 from MDT 3, status 1, flags 0, flags2 0 Lustre: lustre-MDT0000-osd: namespace LFSCK handles notify 3 from MDT 3, status 1, flags 0 Lustre: lustre-MDT0002-osd: namespace LFSCK handles notify 3 from MDT 3, status 1, flags 0 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from MDT 1, status 1, flags 0, flags2 0 Lustre: lustre-MDT0002-osd: layout LFSCK master handles notify 3 from MDT 1, status 1, flags 0, flags2 0 Lustre: lustre-MDT0000-osd: namespace LFSCK handles notify 3 from MDT 1, status 1, flags 0 Lustre: lustre-MDT0002-osd: namespace LFSCK handles notify 3 from MDT 1, status 1, flags 0 Lustre: lustre-MDT0000: OI scrub post with result = 1 Lustre: lustre-MDT0000: store scrub file: rc = 0 Lustre: lustre-MDT0000: OI scrub: stop, pos = 792801: rc = 1 Lustre: lustre-MDT0002-osd: namespace LFSCK reset: rc = 0 Lustre: lustre-MDT0002: OI scrub prep, flags = 0x46 Lustre: lustre-MDT0002: reset OI scrub file, old flags = 0x0, add flags = 0x0 Lustre: lustre-MDT0002: store scrub file: rc = 0 Lustre: lustre-MDT0002-osd: lfsck_layout LFSCK assistant thread start Lustre: lustre-MDT0002-osd: layout LFSCK master prep done, start pos [1] Lustre: lustre-MDT0002-osd: lfsck_namespace LFSCK assistant thread start Lustre: lustre-MDT0002-osd: namespace LFSCK prep done, start pos [1, [0x0:0x0:0x0], 0x0]: rc = 0 Lustre: lustre-MDT0002: OI scrub start, flags = 0x46, pos = 12 Lustre: lustre-MDT0002-osd: layout LFSCK master checkpoint at the pos [13], status = 1: rc = 0 Lustre: lustre-MDT0002-osd: namespace LFSCK checkpoint at the pos [13, [0x0:0x0:0x0], 0x0], status = 1: rc = 0 Lustre: LFSCK entry: oit_flags = 0x60003, dir_flags = 0x8006, oit_cookie = 13, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 1226887 Lustre: LFSCK exit: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 308, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 1226882, rc = 1 Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_layout post, rc = 1 Lustre: lustre-MDT0000-osd: the assistant has done lfsck_layout post, rc = 1 Lustre: lustre-MDT0000-osd: layout LFSCK master post done: rc = 0 Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_namespace post, rc = 1 Lustre: lustre-MDT0000-osd: the assistant has done lfsck_namespace post, rc = 1 Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread post Lustre: lustre-MDT0000-osd: namespace LFSCK post done: rc = 0 Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_layout double_scan, status 2 Lustre: lustre-MDT0002-osd: layout LFSCK master handles 
notify 3 from MDT 0, status 1, flags 0, flags2 0 Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread post Lustre: lustre-MDT0002-osd: namespace LFSCK handles notify 3 from MDT 0, status 1, flags 1 Lustre: lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_layout post: rc = 0 Lustre: lustre-MDT0000-osd: LFSCK assistant sync before the second-stage scaning Lustre: lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_namespace post: rc = 0 Lustre: lustre-MDT0000-osd: the assistant has done lfsck_layout double_scan, status 0 Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_namespace double_scan, status 2 Lustre: lustre-MDT0000-osd: LFSCK assistant sync before the second-stage scaning Lustre: lustre-MDT0000-osd: the assistant has done lfsck_namespace double_scan, status 0 Lustre: lustre-MDT0000-osd: namespace LFSCK handles notify 4 from MDT 1, status 1, flags 0 Lustre: lustre-MDT0000-osd: namespace LFSCK handles notify 4 from MDT 3, status 1, flags 0 Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan start, synced: rc = 0 Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan start, synced: rc = 0 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | Lustre: lustre-MDT0002: OI scrub post with result = 1 Lustre: lustre-MDT0002: store scrub file: rc = 0 Lustre: lustre-MDT0002: OI scrub: stop, pos = 792801: rc = 1 Lustre: lustre-MDT0002-osd: namespace LFSCK add flags for [0x280000405:0x1:0x0] in the trace file, flags 1, old 0, new 1: rc = 0 Lustre: LFSCK exit: oit_flags = 0x60003, dir_flags = 0x8006, oit_cookie = 280, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 1226887, rc = 1 Lustre: lustre-MDT0002-osd: waiting for assistant to do lfsck_layout post, rc = 1 Lustre: lustre-MDT0002-osd: lfsck_layout LFSCK assistant thread post Lustre: lustre-MDT0002-osd: the assistant has done lfsck_layout post, rc = 1 Lustre: lustre-MDT0002-osd: layout LFSCK master post done: rc = 0 Lustre: lustre-MDT0002-osd: waiting for assistant to do lfsck_namespace post, rc = 1 Lustre: lustre-MDT0002-osd: the assistant has done lfsck_namespace post, rc = 1 Lustre: lustre-MDT0002-osd: namespace LFSCK post done: rc = 0 Lustre: lustre-MDT0002-osd: waiting for assistant to do lfsck_layout double_scan, status 2 Lustre: lustre-MDT0002-osd: lfsck_namespace LFSCK assistant thread post Lustre: lustre-MDT0000-osd: namespace LFSCK handles notify 3 from MDT 2, status 1, flags 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from MDT 2, status 1, flags 0, flags2 0 Lustre: lustre-MDT0002-osd: LFSCK assistant notified others for lfsck_namespace post: rc = 0 Lustre: lustre-MDT0002-osd: LFSCK assistant notified others for lfsck_layout post: rc = 0 Lustre: lustre-MDT0002-osd: LFSCK assistant sync before the second-stage scaning Lustre: lustre-MDT0002-osd: the assistant has done lfsck_layout double_scan, status 0 Lustre: lustre-MDT0002-osd: waiting for assistant to do lfsck_namespace double_scan, status 2 Lustre: lustre-MDT0002-osd: LFSCK assistant sync before the second-stage scaning Lustre: lustre-MDT0002-osd: the assistant has done lfsck_namespace double_scan, status 0 Lustre: lustre-MDT0002-osd: LFSCK assistant phase2 scan start, synced: rc = 0 Lustre: lustre-MDT0000-osd: namespace LFSCK phase2 scan start Lustre: lustre-MDT0002-osd: LFSCK assistant phase2 scan start, synced: rc = 0 Lustre: lustre-MDT0000-osd: start to scan backend /lost+found Lustre: lustre-MDT0002-osd: namespace LFSCK phase2 scan start Lustre: lustre-MDT0002-osd: start 
to scan backend /lost+found Lustre: lustre-MDT0002-osd: layout LFSCK phase2 scan start Lustre: lustre-MDT0000-osd: layout LFSCK phase2 scan start Lustre: lustre-MDT0000-osd: layout LFSCK phase2 scan stop: rc = 1 Lustre: lustre-MDT0002-osd: layout LFSCK phase2 scan stop: rc = 1 Lustre: lustre-MDT0002-osd: layout LFSCK master handles notify 4 from MDT 0, status 1, flags 0, flags2 0 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 4 from MDT 2, status 1, flags 0, flags2 0 Lustre: lustre-MDT0000-osd: stop to scan backend /lost+found: rc = 1 Lustre: lustre-MDT0000-osd: namespace LFSCK phase2 scan stop at the No. 16 trace file: rc = 1 Lustre: lustre-MDT0002-osd: namespace LFSCK handles notify 4 from MDT 0, status 1, flags 0 Lustre: lustre-MDT0002-osd: stop to scan backend /lost+found: rc = 1 Lustre: lustre-MDT0000-osd: LFSCK assistant sync before exit Lustre: lustre-MDT0002-osd: namespace LFSCK phase2 scan stop at the No. 16 trace file: rc = 1 Lustre: lustre-MDT0002-osd: LFSCK assistant sync before exit Lustre: lustre-MDT0002-osd: LFSCK assistant synced before exit: rc = 0 Lustre: lustre-MDT0002-osd: LFSCK assistant phase2 scan finished: rc = 1 Lustre: lustre-MDT0002-osd: lfsck_namespace LFSCK assistant thread exit: rc = 1 Lustre: lustre-MDT0002-osd: LFSCK assistant sync before exit Lustre: lustre-MDT0000-osd: LFSCK assistant synced before exit: rc = 0 Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan finished: rc = 1 Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread exit: rc = 1 Lustre: lustre-MDT0000-osd: LFSCK assistant sync before exit Lustre: lustre-MDT0000-osd: LFSCK assistant synced before exit: rc = 0 Lustre: lustre-MDT0000-osd: layout LFSCK double scan: rc = 1 Lustre: lustre-MDT0000-osd: layout LFSCK double scan result 3: rc = 0 Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan finished: rc = 1 Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread exit: rc = 1 Lustre: lustre-MDT0002-osd: LFSCK assistant synced before exit: rc = 0 Lustre: lustre-MDT0002-osd: layout LFSCK double scan: rc = 1 Lustre: lustre-MDT0002-osd: layout LFSCK double scan result 3: rc = 0 Lustre: lustre-MDT0002-osd: LFSCK assistant phase2 scan finished: rc = 1 Lustre: lustre-MDT0002-osd: lfsck_layout LFSCK assistant thread exit: rc = 1 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0002.lfsck_layout | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0002.lfsck_namespace | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-*.lfsck_* Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000: Not available for connect from 10.240.28.50@tcp (stopping) Lustre: Skipped 16 previous similar messages LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107 Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: lustre-MDT0000: Not available for connect from 10.240.28.50@tcp (stopping) Lustre: Skipped 13 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.24.249@tcp (stopping) Lustre: Skipped 25 previous similar messages LustreError: 137-5: lustre-MDT0000: not available for connect from 
10.240.24.249@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 10 previous similar messages Lustre: server umount lustre-MDT0000 complete Autotest: Test running for 200 minutes (lustre-b_es-reviews_review-dne-exa6-part-1_23070.72) LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.24.249@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 13 previous similar messages LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.24.249@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 13 previous similar messages Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1 Lustre: DEBUG MARKER: modprobe -r dm-flakey LustreError: 1135962:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 10.240.28.50@tcp arrived at 1745452680 with bad export cookie 3096739595288226873 Lustre: lustre-MDT0001-osp-MDT0002: Connection to lustre-MDT0001 (at 10.240.28.50@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 1 previous similar message LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.50@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 26 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3 LustreError: 1138269:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 0@lo arrived at 1745452698 with bad export cookie 3096739595288226663 LustreError: 1138269:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) Skipped 4 previous similar messages LustreError: 166-1: MGC10.240.28.49@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.24.249@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 36 previous similar messages Lustre: lustre-MDT0002: Not available for connect from 10.240.24.249@tcp (stopping) Lustre: Skipped 13 previous similar messages Lustre: lustre-MDT0002: Not available for connect from 10.240.28.50@tcp (stopping) Lustre: Skipped 17 previous similar messages | Link to test |
ost-pools test complete, duration 2813 sec | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:423378] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev virtio_balloon i2c_piix4 ext4 mbcache jbd2 ata_generic crc32c_intel virtio_net net_failover serio_raw failover ata_piix libata virtio_blk CPU: 1 PID: 423378 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.40.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ae e5 f8 f3 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffbdee8142ba88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffbdee833c4008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffbdee83393000 RSI: ffffbdee8142bac0 RDI: ffff989af1d47500 RBP: ffffffffc0c593c0 R08: 0000000000000019 R09: 000000000000000e R10: ffff989af32f7000 R11: ffff989af32f652c R12: ffffbdee8142bb38 R13: 0000000000000000 R14: ffff989af1d47500 R15: 0000000000000000 FS: 00007f0a26b4b080(0000) GS:ffff989b7fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005651ce8aac00 CR3: 000000003265e003 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ? lu_context_fini+0xa7/0x190 [obdclass] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0x557/0xf30 [mdt] class_cleanup+0x6a3/0xbf0 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x191/0x740 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x269/0x740 [obdclass] server_put_super+0x7f9/0x12b0 [obdclass] ? 
lustre_lwp_setup+0x880/0x880 [obdclass] generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x195/0x1a0 entry_SYSCALL_64_after_hwframe+0x66/0xcb RIP: 0033:0x7f0a25aa38fb | Lustre: DEBUG MARKER: /usr/sbin/lctl mark === ost-pools: start cleanup 18:31:18 \(1742927478\) === Lustre: DEBUG MARKER: === ost-pools: start cleanup 18:31:18 (1742927478) === Lustre: DEBUG MARKER: /usr/sbin/lctl mark === ost-pools: finish cleanup 18:32:35 \(1742927555\) === Lustre: DEBUG MARKER: === ost-pools: finish cleanup 18:32:35 (1742927555) === Lustre: Evicted from MGS (at 10.240.28.46@tcp) after server handle changed from 0xbeaf07ac59863b39 to 0xbeaf07ac5986a11b Lustre: Skipped 1 previous similar message Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-97vm7.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-66vm7.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-97vm7.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-66vm7.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.240.28.46@tcp failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.28.46@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 11 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 LustreError: 166-1: MGC10.240.28.46@tcp: Connection to MGS (at 10.240.28.46@tcp) was lost; in progress operations using this service will fail LustreError: Skipped 2 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 10.240.25.38@tcp (stopping) Lustre: Skipped 22 previous similar messages Autotest: Test running for 230 minutes (lustre-b_es-reviews_review-dne-part-6_22664.23) | Link to test |
conf-sanity test 152: seq allocation error in OSP | watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:1463310] Modules linked in: lzstd(OE) llz4hc(OE) llz4(OE) obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev i2c_piix4 virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel libata serio_raw virtio_net virtio_blk net_failover failover [last unloaded: lzstd] CPU: 0 PID: 1463310 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.40.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ae 45 21 e8 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffa95d8127ba88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa95d82f17008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa95d82f09000 RSI: ffffa95d8127bac0 RDI: ffff991a6de15100 RBP: ffffffffc0bd33c0 R08: 0000000000000018 R09: 000000000000000e R10: ffff991a6ece7000 R11: ffff991a6ece6953 R12: ffffa95d8127bb38 R13: 0000000000000000 R14: ffff991a6de15100 R15: 0000000000000000 FS: 00007f98fe4fb080(0000) GS:ffff991affc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f98fe0b73e0 CR3: 0000000031c7e003 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0x557/0xf30 [mdt] ? lu_context_init+0xac/0x1a0 [obdclass] class_cleanup+0x6a3/0xbf0 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x191/0x740 [obdclass] class_manual_cleanup+0x269/0x740 [obdclass] server_put_super+0x7f9/0x12b0 [obdclass] ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x195/0x1a0 entry_SYSCALL_64_after_hwframe+0x66/0xcb RIP: 0033:0x7f98fd4538fb | Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds2' ' /proc/mounts); Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds4' ' /proc/mounts); Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm12.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm12.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2 Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm12.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm12.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4 Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4 LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. 
Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20 Lustre: DEBUG MARKER: Using TIMEOUT=20 Lustre: DEBUG MARKER: [ -f /sys/module/mgc/parameters/mgc_requeue_timeout_min ] && echo 1 > /sys/module/mgc/parameters/mgc_requeue_timeout_min; exit 0 Lustre: DEBUG MARKER: /usr/sbin/lctl set_param lod.*.mdt_hash=crush Lustre: DEBUG MARKER: /usr/sbin/lctl mark ADD OST9 Lustre: DEBUG MARKER: ADD OST9 Lustre: DEBUG MARKER: /usr/sbin/lctl mark STOP OST9 Lustre: DEBUG MARKER: STOP OST9 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark START OST9 again Lustre: DEBUG MARKER: START 
OST9 again Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-146vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Autotest: Test running for 415 minutes (lustre-b_es-reviews_review-dne-part-3_22528.20) Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 Lustre: lustre-MDT0003: haven't heard from client 912c563b-fd78-411a-a91b-bd9fa284e499 (at 10.240.27.23@tcp) in 31 seconds. I think it's dead, and I am evicting it. exp 00000000c860166c, cur 1742279170 expire 1742279140 last 1742279139 Lustre: Skipped 1 previous similar message LustreError: 166-1: MGC10.240.27.31@tcp: Connection to MGS (at 10.240.27.31@tcp) was lost; in progress operations using this service will fail LustreError: Skipped 4 previous similar messages | Link to test |
sanity-sec test 27a: test fileset in various nodemaps | watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [umount:926254] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul sunrpc ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover failover virtio_blk CPU: 1 PID: 926254 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.40.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ae f5 12 cc 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffba5fc1097a88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffba5fc5190008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffba5fc517b000 RSI: ffffba5fc1097ac0 RDI: ffff9928b03db100 RBP: ffffffffc0cb83c0 R08: 0000000000000019 R09: 000000000000000e R10: ffff9928b6d98000 R11: ffff9928b6d97fb3 R12: ffffba5fc1097b38 R13: 0000000000000000 R14: ffff9928b03db100 R15: 0000000000000000 FS: 00007fa6c1b02080(0000) GS:ffff99293fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055601a587080 CR3: 00000000265ae002 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0x557/0xf30 [mdt] class_cleanup+0x6a3/0xbf0 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x191/0x740 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x269/0x740 [obdclass] server_put_super+0x7f9/0x12b0 [obdclass] ? 
lustre_lwp_setup+0x880/0x880 [obdclass] generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x195/0x1a0 entry_SYSCALL_64_after_hwframe+0x66/0xcb RIP: 0033:0x7fa6c0a5a8fb | Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_activate 1 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.active Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_modify --name default --property admin --value 1 Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_modify --name default --property trusted --value 1 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.trusted_nodemap Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.default.admin_nodemap Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.default.trusted_nodemap Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_modify --name default --property admin --value 1 Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_modify --name default --property trusted --value 1 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.admin_nodemap Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.trusted_nodemap Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_modify --name default --property admin --value 1 Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_modify --name default --property trusted --value 1 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.admin_nodemap Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.trusted_nodemap Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl nodemap_set_fileset --name default --fileset /thisisaverylongsubdirtotestlongfilesetsandtotestmultiplefilesetfragmentsonthenodemapiam_default Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.fileset Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n nodemap.active Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @ Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.fileset Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000: Not 
available for connect from 10.240.29.128@tcp (stopping) Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.29.128@tcp (stopping) Lustre: Skipped 5 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.29.128@tcp (stopping) Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.29.128@tcp (stopping) Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.29.128@tcp (stopping) Lustre: Skipped 13 previous similar messages | Link to test |
sanity-lipe-scan3 test complete, duration 1231 sec | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:997067] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lzstd(OE) llz4hc(OE) llz4(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common virtio_balloon crct10dif_pclmul joydev pcspkr i2c_piix4 crc32_pclmul ghash_clmulni_intel sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net net_failover serio_raw crc32c_intel failover virtio_blk [last unloaded: dm_flakey] CPU: 0 PID: 997067 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e 1f f4 cc 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb7dfc106ba98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb7dfc30c6008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb7dfc3095000 RSI: ffffb7dfc106bad0 RDI: ffff9d90afc49100 RBP: ffffffffc0d8b870 R08: 0000000000000019 R09: 000000000000000e R10: ffff9d90afca5000 R11: ffff9d90afca4824 R12: ffffb7dfc106bb48 R13: 0000000000000000 R14: ffff9d90afc49100 R15: 0000000000000000 FS: 00007f9dcab39080(0000) GS:ffff9d913fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8b08b92000 CR3: 0000000024ecc003 CR4: 00000000001706f0 Call Trace: ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xe2/0x910 [mdt] class_cleanup+0x6a3/0xc00 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] class_manual_cleanup+0x456/0x740 [obdclass] server_put_super+0x7f0/0x12e0 [obdclass] ? 
lustre_register_lwp_item+0x690/0x690 [obdclass] generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f9dc9aa3e9b | Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-scan3: start cleanup 11:24:57 \(1734348297\) === Lustre: DEBUG MARKER: === sanity-lipe-scan3: start cleanup 11:24:57 (1734348297) === Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-scan3: finish cleanup 11:25:00 \(1734348300\) === Lustre: DEBUG MARKER: === sanity-lipe-scan3: finish cleanup 11:25:00 (1734348300) === Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-lipe-find3 ============----- Mon Dec 16 11:25:01 AM UTC 2024 Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-lipe-find3 ============----- Mon Dec 16 11:25:01 AM UTC 2024 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: cat /etc/system-release Lustre: DEBUG MARKER: test -r /etc/os-release Lustre: DEBUG MARKER: cat /etc/os-release Lustre: DEBUG MARKER: cat /etc/system-release Lustre: DEBUG MARKER: test -r /etc/os-release Lustre: DEBUG MARKER: cat /etc/os-release Lustre: DEBUG MARKER: which lipe_find3 Lustre: DEBUG MARKER: ls /usr/lib64/lustre/tests/except/sanity-lipe-find3.*ex || true Lustre: DEBUG MARKER: cat /usr/lib64/lustre/tests/except/sanity-lipe-find3.*ex 2>/dev/null ||true Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: 361 Lustre: DEBUG MARKER: excepting tests: 361 Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-find3: start setup 11:25:17 \(1734348317\) === Lustre: DEBUG MARKER: === sanity-lipe-find3: start setup 11:25:17 (1734348317) === Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-91vm19.onyx.whamcloud.com: executing check_config_client \/mnt\/lustre Lustre: DEBUG MARKER: onyx-91vm19.onyx.whamcloud.com: executing check_config_client /mnt/lustre Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-91vm20.onyx.whamcloud.com: executing check_config_client \/mnt\/lustre Lustre: DEBUG MARKER: onyx-91vm20.onyx.whamcloud.com: executing check_config_client /mnt/lustre Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts); Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts); Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null Lustre: DEBUG MARKER: cat /proc/mounts Lustre: DEBUG MARKER: lctl get_param -n timeout Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20 Lustre: DEBUG MARKER: Using TIMEOUT=20 Lustre: DEBUG MARKER: [ -f /sys/module/mgc/parameters/mgc_requeue_timeout_min ] && echo 1 > /sys/module/mgc/parameters/mgc_requeue_timeout_min; exit 0 Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l Lustre: DEBUG MARKER: /usr/sbin/lctl set_param lod.*.mdt_hash=crush Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/us Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-116vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-116vm3.onyx.whamcloud.com: executing set_default_debug 
vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-91vm20.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-91vm20.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-99vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl set_param osd-ldiskfs.track_declares_assert=1 || true Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-find3: finish setup 11:26:08 \(1734348368\) === Lustre: DEBUG MARKER: === sanity-lipe-find3: finish setup 11:26:08 (1734348368) === Lustre: DEBUG MARKER: /usr/sbin/lctl set_param printk=+lfsck Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -r -A -t all Lustre: lustre-MDT0000-osd: layout LFSCK reset: rc = 0 Lustre: lustre-MDT0000-osd: namespace LFSCK reset: rc = 0 Lustre: lustre-MDT0000: OI scrub prep, flags = 0x4 Lustre: lustre-MDT0000: reset OI scrub file, old flags = 0x0, add flags = 0x0 Lustre: lustre-MDT0000: store scrub file: rc = 0 Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread start Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 5, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 3, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 1, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 2, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 4, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 0, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 6, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master prep done, start pos [1] Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread start Lustre: lustre-MDT0000-osd: namespace LFSCK prep done, start pos [1, [0x0:0x0:0x0], 0x0]: rc = 0 Lustre: lustre-MDT0000: OI scrub start, flags = 0x4, pos = 12 Lustre: lustre-MDT0000-osd: layout LFSCK master checkpoint at the pos [13], status = 1: rc = 0 Lustre: lustre-MDT0000-osd: namespace LFSCK checkpoint at the pos [13, [0x0:0x0:0x0], 0x0], status = 1: rc = 0 Lustre: LFSCK entry: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 13, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 996358 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | Lustre: lustre-MDT0000: OI scrub post with result = 1 Lustre: lustre-MDT0000: store scrub file: rc = 0 Lustre: lustre-MDT0000: OI scrub: stop, pos = 838865: rc = 1 Lustre: LFSCK exit: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 293, dir_cookie = 0x0, parent = 
[0x0:0x0:0x0], pid = 996358, rc = 1 Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_layout post, rc = 1 Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread post Lustre: lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_layout post: rc = 0 Lustre: lustre-MDT0000-osd: the assistant has done lfsck_layout post, rc = 1 Lustre: lustre-MDT0000-osd: layout LFSCK master post done: rc = 0 Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_namespace post, rc = 1 Lustre: lustre-MDT0000-osd: the assistant has done lfsck_namespace post, rc = 1 Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread post Lustre: lustre-MDT0000-osd: namespace LFSCK post done: rc = 0 Lustre: lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_namespace post: rc = 0 Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_layout double_scan, status 2 Lustre: lustre-MDT0000-osd: LFSCK assistant sync before the second-stage scaning Lustre: lustre-MDT0000-osd: the assistant has done lfsck_layout double_scan, status 0 Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_namespace double_scan, status 2 Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan start, synced: rc = 0 Lustre: lustre-MDT0000-osd: layout LFSCK phase2 scan start Lustre: lustre-MDT0000-osd: layout LFSCK phase2 scan stop: rc = 1 Lustre: lustre-MDT0000-osd: LFSCK assistant sync before the second-stage scaning Lustre: lustre-MDT0000-osd: the assistant has done lfsck_namespace double_scan, status 0 Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan start, synced: rc = 0 Lustre: lustre-MDT0000-osd: namespace LFSCK phase2 scan start Lustre: lustre-MDT0000-osd: LFSCK assistant sync before exit Lustre: lustre-MDT0000-osd: start to scan backend /lost+found Lustre: lustre-MDT0000-osd: LFSCK assistant synced before exit: rc = 0 Lustre: lustre-MDT0000-osd: layout LFSCK double scan: rc = 1 Lustre: lustre-MDT0000-osd: layout LFSCK double scan result 3: rc = 0 Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan finished: rc = 1 Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread exit: rc = 1 Lustre: lustre-MDT0000-osd: stop to scan backend /lost+found: rc = 1 Lustre: lustre-MDT0000-osd: namespace LFSCK phase2 scan stop at the No. 
16 trace file: rc = 1 Lustre: lustre-MDT0000-osd: LFSCK assistant sync before exit Lustre: lustre-MDT0000-osd: LFSCK assistant synced before exit: rc = 0 Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan finished: rc = 1 Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread exit: rc = 1 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-*.lfsck_* Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Autotest: Test running for 170 minutes (lustre-b_es-reviews_review-dne-exa6-part-1_21170.61) Lustre: lustre-MDT0000: Not available for connect from 10.240.29.130@tcp (stopping) Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.29.130@tcp (stopping) Lustre: lustre-MDT0000: Not available for connect from 10.240.29.130@tcp (stopping) Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.29.130@tcp (stopping) Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.29.130@tcp (stopping) Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.29.130@tcp (stopping) Lustre: Skipped 13 previous similar messages | Link to test |
sanity-pcc test 34: Cache rule with comparator (>, <) for Project ID range | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:109539] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic crc32c_intel ata_piix libata serio_raw virtio_net net_failover virtio_blk failover CPU: 0 PID: 109539 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e af 38 ef 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffafeb01833a98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffafeb07162008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffafeb07161000 RSI: ffffafeb01833ad0 RDI: ffff9cecb5a80200 RBP: ffffffffc0e38870 R08: 0000000000000019 R09: 000000000000000e R10: ffff9cecbbc64000 R11: ffff9cecbbc633a8 R12: ffffafeb01833b48 R13: 0000000000000000 R14: ffff9cecb5a80200 R15: 0000000000000000 FS: 00007f4228bc3080(0000) GS:ffff9ced3fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055aead4d1528 CR3: 000000004345e003 CR4: 00000000001706f0 Call Trace: ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xe2/0x910 [mdt] ? lu_context_init+0xa8/0x1b0 [obdclass] class_cleanup+0x6a3/0xc00 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x116/0x740 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x456/0x740 [obdclass] server_put_super+0x7f0/0x12e0 [obdclass] ? evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f4227b2de9b | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: lfs --list-commands Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: lfs --list-commands Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 | Link to test |
sanity test 115: verify dynamic thread creation | watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:311807] Modules linked in: obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_umad mlx5_ib ib_uverbs mlx5_core psample intel_rapl_msr intel_rapl_common mlxfw crct10dif_pclmul tls pci_hyperv_intf crc32_pclmul ib_core ghash_clmulni_intel virtio_balloon pcspkr joydev i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic crc32c_intel ata_piix libata virtio_net net_failover failover virtio_blk serio_raw [last unloaded: llog_test] CPU: 0 PID: 311807 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.19.2.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17d/0x490 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 9c df ab d0 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffc08704303a98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffc08705897008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffc08705873000 RSI: ffffc08704303ad0 RDI: ffff9e57b3af2c00 RBP: ffffffffc10ad8a0 R08: 0000000000000019 R09: 0000000000000000 R10: ffff9e57c1524000 R11: ffff9e57c152317f R12: ffffc08704303b48 R13: 0000000000000000 R14: ffff9e57b3af2c00 R15: 0000000000000000 FS: 00007ffade5e2080(0000) GS:ffff9e583fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f419bf8f44 CR3: 00000000344e4002 CR4: 00000000001706f0 Call Trace: ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xe2/0x910 [mdt] class_cleanup+0x6a3/0xc00 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x116/0x740 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x456/0x740 [obdclass] server_put_super+0x7f0/0x12e0 [obdclass] ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7ffadd54ce9b | Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 6 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping) Lustre: Skipped 3 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.24.111@tcp (stopping) Lustre: Skipped 2 previous similar messages LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107 Lustre: lustre-MDT0000: Not available for connect from 10.240.28.49@tcp (stopping) Lustre: Skipped 9 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.28.49@tcp (stopping) Lustre: Skipped 3 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.28.49@tcp (stopping) Lustre: Skipped 25 previous similar messages LustreError: 311403:0:(client.c:1278:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@0000000091f543f0 x1815799527234560/t0(0) o101->lustre-MDT0000-lwp-MDT0000@0@lo:23/10 lens 456/496 e 0 to 0 dl 0 ref 2 fl Rpc:QU/0/ffffffff rc 0/-1 job:'qsd_reint_0.lus.0' LustreError: 311403:0:(client.c:1278:ptlrpc_import_delay_req()) Skipped 1 previous similar message LustreError: 311403:0:(qsd_reint.c:56:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5 LustreError: 311403:0:(qsd_reint.c:56:qsd_reint_completion()) Skipped 1 previous similar message LustreError: 137-5: lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 16 previous similar messages Lustre: server umount lustre-MDT0000 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.49@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 1 previous similar message Lustre: DEBUG MARKER: modprobe dm-flakey; LustreError: 12543:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 10.240.28.49@tcp arrived at 1731686845 with bad export cookie 9190102256405190223 LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.49@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. 
LustreError: Skipped 14 previous similar messages Lustre: lustre-MDT0001-osp-MDT0002: Connection to lustre-MDT0001 (at 10.240.28.49@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 1 previous similar message LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation mds_statfs to node 10.240.28.49@tcp failed: rc = -107 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3 LustreError: 12543:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) ldlm_cancel from 0@lo arrived at 1731686853 with bad export cookie 9190102256405189740 LustreError: 12543:0:(ldlm_lockd.c:2566:ldlm_cancel_handler()) Skipped 4 previous similar messages LustreError: 166-1: MGC10.240.28.48@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.49@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 12 previous similar messages Lustre: lustre-MDT0002: Not available for connect from 10.240.24.111@tcp (stopping) Lustre: Skipped 25 previous similar messages LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.49@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 9 previous similar messages LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.28.49@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 19 previous similar messages Lustre: lustre-MDT0002: Not available for connect from 10.240.28.49@tcp (stopping) Lustre: Skipped 35 previous similar messages | Link to test |
sanity-sec test complete, duration 2943 sec | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:295653] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 ext4 mbcache jbd2 ata_generic crc32c_intel ata_piix libata virtio_net serio_raw net_failover virtio_blk failover [last unloaded: obdecho] CPU: 0 PID: 295653 Comm: umount Kdump: loaded Tainted: P OE --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 56 8b 55 c1 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffaa4e823e7aa0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffaa4e82a4e008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffaa4e82a49000 RSI: ffffaa4e823e7ad0 RDI: ffff9e89c360f300 RBP: ffffffffc12febb0 R08: 0000000000000000 R09: 0000000000000000 R10: ffff9e89c2c90000 R11: 0000000000000001 R12: ffffaa4e823e7b48 R13: 0000000000000000 R14: ffff9e89c360f300 R15: 0000000000000000 FS: 00007f179cc70080(0000) GS:ffff9e8a7fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055e220b9c010 CR3: 0000000002d08005 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x173/0x460 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4f0 [ptlrpc] ? key_fini+0x4e/0x160 [obdclass] ? lu_context_fini+0xa6/0x1c0 [obdclass] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xda/0x930 [mdt] class_cleanup+0x6f5/0xc90 [obdclass] class_process_config+0x3ad/0x2080 [obdclass] ? class_manual_cleanup+0x191/0x780 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x456/0x780 [obdclass] server_put_super+0xadc/0x1350 [obdclass] ? __dentry_kill+0x121/0x170 ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f179bbe0e9b | LustreError: 11-0: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.25.141@tcp failed: rc = -107 LustreError: Skipped 6 previous similar messages Lustre: lustre-MDT0000-osp-MDT0003: Connection to lustre-MDT0000 (at 10.240.25.141@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 14 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 LustreError: 166-1: MGC10.240.25.141@tcp: Connection to MGS (at 10.240.25.141@tcp) was lost; in progress operations using this service will fail Lustre: lustre-MDT0001: Not available for connect from 10.240.25.115@tcp (stopping) Lustre: Skipped 6 previous similar messages | Link to test |
recovery-small test complete, duration 5640 sec | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:105972] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon joydev pcspkr i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net net_failover serio_raw virtio_blk failover CPU: 1 PID: 105972 Comm: umount Kdump: loaded Tainted: P OE --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 56 2b 0f c4 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffbafdc1d1baa0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffbafdc7479008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffbafdc743d000 RSI: ffffbafdc1d1bad0 RDI: ffff99bc81b82800 RBP: ffffffffc0f92bb0 R08: 000000000000046b R09: ffffbafdc1d1ba18 R10: ffffbafdc1d1baa0 R11: ffff99bc831682db R12: ffffbafdc1d1bb48 R13: 0000000000000000 R14: ffff99bc81b82800 R15: 0000000000000000 FS: 00007f77704a4080(0000) GS:ffff99bcffd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b7b83f48d0 CR3: 000000003b1a2004 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x173/0x460 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4f0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xda/0x930 [mdt] class_cleanup+0x6f5/0xc90 [obdclass] class_process_config+0x3ad/0x2080 [obdclass] class_manual_cleanup+0x456/0x780 [obdclass] server_put_super+0xadc/0x1350 [obdclass] ? __dentry_kill+0x121/0x170 ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f776f40ee9b | Autotest: Test running for 120 minutes (lustre-reviews_review-dne-zfs-part-5_108937.34) Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-scrub ============----- Wed Nov 13 02:34:02 UTC 2024 Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-scrub ============----- Wed Nov 13 02:34:02 UTC 2024 Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: Lustre: DEBUG MARKER: excepting tests: Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 Lustre: lustre-MDT0001: Not available for connect from 10.240.28.47@tcp (stopping) Lustre: Skipped 2 previous similar messages LustreError: 105580:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items LustreError: 105580:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 2 previous similar messages Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: ! zpool list -H lustre-mdt2 >/dev/null 2>&1 || LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 22 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4 Lustre: lustre-MDT0003: Not available for connect from 10.240.29.210@tcp (stopping) Lustre: Skipped 9 previous similar messages LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.29.210@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 34 previous similar messages Lustre: lustre-MDT0003: Not available for connect from 10.240.29.210@tcp (stopping) Lustre: Skipped 15 previous similar messages | Link to test |
sanity-sec test complete, duration 8000 sec | watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [umount:1031692] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common i2c_piix4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc joydev virtio_balloon pcspkr ext4 mbcache jbd2 ata_generic virtio_net ata_piix libata crc32c_intel serio_raw net_failover failover virtio_blk [last unloaded: libcfs] CPU: 1 PID: 1031692 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e 0f 50 cb 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffc10343fafa98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffc10341b9d008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffc10341b5d000 RSI: ffffc10343fafad0 RDI: ffff9d252d68b800 RBP: ffffffffc0ba08a0 R08: 0000000000000019 R09: 000000000000000e R10: ffff9d2521e0c000 R11: ffff9d2521e0b99d R12: ffffc10343fafb48 R13: 0000000000000000 R14: ffff9d252d68b800 R15: 0000000000000000 FS: 00007fc5adfe5080(0000) GS:ffff9d25bfd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000562895a350a8 CR3: 000000002d1c2005 CR4: 00000000001706e0 Call Trace: ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xe2/0x910 [mdt] class_cleanup+0x6a3/0xc00 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] class_manual_cleanup+0x456/0x740 [obdclass] server_put_super+0x7f0/0x12e0 [obdclass] ? 
lustre_register_lwp_item+0x690/0x690 [obdclass] generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7fc5acf4fe9b | Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-sec: start cleanup 01:07:51 \(1731460071\) === Lustre: DEBUG MARKER: === sanity-sec: start cleanup 01:07:51 (1731460071) === Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-sec: finish cleanup 01:07:56 \(1731460076\) === Lustre: DEBUG MARKER: === sanity-sec: finish cleanup 01:07:56 (1731460076) === Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-lipe-scan3 ============----- Wed Nov 13 01:07:58 AM UTC 2024 Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-lipe-scan3 ============----- Wed Nov 13 01:07:58 AM UTC 2024 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: cat /etc/system-release Lustre: DEBUG MARKER: test -r /etc/os-release Lustre: DEBUG MARKER: cat /etc/os-release Lustre: DEBUG MARKER: cat /etc/system-release Lustre: DEBUG MARKER: test -r /etc/os-release Lustre: DEBUG MARKER: cat /etc/os-release Lustre: DEBUG MARKER: which lipe_scan3 Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: Lustre: DEBUG MARKER: excepting tests: Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-scan3: start setup 01:08:13 \(1731460093\) === Lustre: DEBUG MARKER: === sanity-lipe-scan3: start setup 01:08:13 (1731460093) === Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-91vm21.onyx.whamcloud.com: executing check_config_client \/mnt\/lustre Lustre: DEBUG MARKER: onyx-91vm21.onyx.whamcloud.com: executing check_config_client /mnt/lustre Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-91vm22.onyx.whamcloud.com: executing check_config_client \/mnt\/lustre Lustre: DEBUG MARKER: onyx-91vm22.onyx.whamcloud.com: executing check_config_client /mnt/lustre Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts); Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts); Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null Lustre: DEBUG MARKER: cat /proc/mounts Lustre: DEBUG MARKER: lctl get_param -n timeout Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20 Lustre: DEBUG MARKER: Using TIMEOUT=20 Lustre: DEBUG MARKER: [ -f /sys/module/mgc/parameters/mgc_requeue_timeout_min ] && echo 1 > /sys/module/mgc/parameters/mgc_requeue_timeout_min; exit 0 Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l Lustre: DEBUG MARKER: /usr/sbin/lctl set_param lod.*.mdt_hash=crush Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/us Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-96vm4.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-91vm22.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-96vm4.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl 
super lfsck all 4 Lustre: DEBUG MARKER: onyx-91vm22.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm1.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm1.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-99vm1.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: onyx-99vm1.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl set_param osd-ldiskfs.track_declares_assert=1 || true Lustre: DEBUG MARKER: /usr/sbin/lctl mark === sanity-lipe-scan3: finish setup 01:09:08 \(1731460148\) === Lustre: DEBUG MARKER: === sanity-lipe-scan3: finish setup 01:09:08 (1731460148) === Lustre: DEBUG MARKER: /usr/sbin/lctl set_param printk=+lfsck Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -r -A -t all Lustre: lustre-MDT0000-osd: layout LFSCK reset: rc = 0 Lustre: lustre-MDT0000-osd: namespace LFSCK reset: rc = 0 Lustre: lustre-MDT0000: OI scrub prep, flags = 0x4 Lustre: lustre-MDT0000: reset OI scrub file, old flags = 0x0, add flags = 0x0 Lustre: lustre-MDT0000: store scrub file: rc = 0 Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread start Lustre: lustre-MDT0000-osd: layout LFSCK master prep done, start pos [1] Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread start Lustre: lustre-MDT0000-osd: namespace LFSCK prep done, start pos [1, [0x0:0x0:0x0], 0x0]: rc = 0 Lustre: lustre-MDT0000: OI scrub start, flags = 0x4, pos = 12 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 3, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 6, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 2, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 4, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 0, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 1, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master handles notify 3 from OST 5, status 1, flags 2, flags2 1 Lustre: lustre-MDT0000-osd: layout LFSCK master checkpoint at the pos [13], status = 1: rc = 0 Lustre: lustre-MDT0000-osd: namespace LFSCK checkpoint at the pos [13, [0x0:0x0:0x0], 0x0], status = 1: rc = 0 Lustre: LFSCK entry: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 13, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 1031043 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | Lustre: lustre-MDT0000: OI scrub post with result = 1 Lustre: lustre-MDT0000: store scrub file: rc = 0 Lustre: lustre-MDT0000: OI scrub: stop, pos = 838865: rc = 1 Lustre: LFSCK exit: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 423, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 1031043, rc = 1 Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_layout post, rc = 1 Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread post Lustre: lustre-MDT0000-osd: the 
assistant has done lfsck_layout post, rc = 1 Lustre: lustre-MDT0000-osd: layout LFSCK master post done: rc = 0 Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_namespace post, rc = 1 Lustre: lustre-MDT0000-osd: the assistant has done lfsck_namespace post, rc = 1 Lustre: lustre-MDT0000-osd: namespace LFSCK post done: rc = 0 Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_layout double_scan, status 2 Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread post Lustre: lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_namespace post: rc = 0 Lustre: lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_layout post: rc = 0 Lustre: lustre-MDT0000-osd: LFSCK assistant sync before the second-stage scaning Lustre: lustre-MDT0000-osd: the assistant has done lfsck_layout double_scan, status 0 Lustre: lustre-MDT0000-osd: waiting for assistant to do lfsck_namespace double_scan, status 2 Lustre: lustre-MDT0000-osd: LFSCK assistant sync before the second-stage scaning Lustre: lustre-MDT0000-osd: the assistant has done lfsck_namespace double_scan, status 0 Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan start, synced: rc = 0 Lustre: lustre-MDT0000-osd: namespace LFSCK phase2 scan start Lustre: lustre-MDT0000-osd: start to scan backend /lost+found Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan start, synced: rc = 0 Lustre: lustre-MDT0000-osd: layout LFSCK phase2 scan start Lustre: lustre-MDT0000-osd: layout LFSCK phase2 scan stop: rc = 1 Lustre: lustre-MDT0000-osd: stop to scan backend /lost+found: rc = 1 Lustre: lustre-MDT0000-osd: namespace LFSCK phase2 scan stop at the No. 16 trace file: rc = 1 Lustre: lustre-MDT0000-osd: LFSCK assistant sync before exit Lustre: lustre-MDT0000-osd: LFSCK assistant synced before exit: rc = 0 Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan finished: rc = 1 Lustre: lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread exit: rc = 1 Lustre: lustre-MDT0000-osd: LFSCK assistant sync before exit Lustre: lustre-MDT0000-osd: LFSCK assistant synced before exit: rc = 0 Lustre: lustre-MDT0000-osd: layout LFSCK double scan: rc = 1 Lustre: lustre-MDT0000-osd: layout LFSCK double scan result 3: rc = 0 Lustre: lustre-MDT0000-osd: LFSCK assistant phase2 scan finished: rc = 1 Lustre: lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread exit: rc = 1 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdd.lustre-*.lfsck_* Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 | Link to test |
sanity-pcc test 38: Verify LFS pcc state does not trigger prefetch for auto PCC-RO | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:130106] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net net_failover serio_raw failover virtio_blk CPU: 0 PID: 130106 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e bf 03 f6 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb20bc431fa98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb20bc2179008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb20bc2149000 RSI: ffffb20bc431fad0 RDI: ffff9bec67cd9a00 RBP: ffffffffc0b858a0 R08: 0000000000000019 R09: 000000000000000e R10: ffff9bec6626b000 R11: ffff9bec6626afa5 R12: ffffb20bc431fb48 R13: 0000000000000000 R14: ffff9bec67cd9a00 R15: 0000000000000000 FS: 00007f24a79b2080(0000) GS:ffff9becffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff5c2e2fb0 CR3: 000000002b5b2002 CR4: 00000000001706f0 Call Trace: ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xe2/0x910 [mdt] class_cleanup+0x6a3/0xc00 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] class_manual_cleanup+0x456/0x740 [obdclass] server_put_super+0x7f0/0x12e0 [obdclass] ? lustre_register_lwp_item+0x690/0x690 [obdclass] generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f24a691ce9b | Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: lfs --list-commands Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: lfs --list-commands Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000: Not available for connect from 10.240.26.104@tcp (stopping) Lustre: Skipped 13 previous similar messages | Link to test |
sanity test 133f: Check reads/writes of client lustre proc files with bad area io | watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:392095] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc pcspkr joydev virtio_balloon i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix virtio_net libata crc32c_intel serio_raw virtio_blk net_failover failover [last unloaded: dm_flakey] CPU: 0 PID: 392095 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 3e 5f 68 d0 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffa3ea8104fa98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffa3ea81cc4008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffa3ea81ca9000 RSI: ffffa3ea8104fad0 RDI: ffff91369f148e00 RBP: ffffffffc0b3b8a0 R08: 0000000000000019 R09: 000000000000000e R10: ffff91367a1c5000 R11: ffff91367a1c49b7 R12: ffffa3ea8104fb48 R13: 0000000000000000 R14: ffff91369f148e00 R15: 0000000000000000 FS: 00007fea99a5c080(0000) GS:ffff9136ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b69d50d6a8 CR3: 0000000020a38005 CR4: 00000000001706f0 Call Trace: ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xe2/0x910 [mdt] class_cleanup+0x6a3/0xc00 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x116/0x740 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x456/0x740 [obdclass] server_put_super+0x7f0/0x12e0 [obdclass] ? evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7fea989c6e9b | Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 2 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping) Lustre: Skipped 4 previous similar messages Lustre: lustre-MDT0000: Not available for connect from 10.240.24.253@tcp (stopping) Lustre: Skipped 2 previous similar messages Lustre: server umount lustre-MDT0000 complete LustreError: 137-5: lustre-MDT0000: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. 
LustreError: Skipped 34 previous similar messages Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup remove /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup mknodes >/dev/null 2>&1 Lustre: DEBUG MARKER: modprobe -r dm-flakey LustreError: 346614:0:(ldlm_lockd.c:2564:ldlm_cancel_handler()) ldlm_cancel from 10.240.28.47@tcp arrived at 1729843545 with bad export cookie 12976622451113876197 LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation mds_statfs to node 10.240.28.47@tcp failed: rc = -107 Lustre: lustre-MDT0001-osp-MDT0002: Connection to lustre-MDT0001 (at 10.240.28.47@tcp) was lost; in progress operations using this service will wait for recovery to complete LustreError: Skipped 2 previous similar messages Lustre: Skipped 1 previous similar message LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.24.253@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 49 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3 LustreError: 346614:0:(ldlm_lockd.c:2564:ldlm_cancel_handler()) ldlm_cancel from 0@lo arrived at 1729843568 with bad export cookie 12976622451113875756 LustreError: 166-1: MGC10.240.28.46@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Lustre: lustre-MDT0002: Not available for connect from 10.240.28.47@tcp (stopping) Lustre: Skipped 2 previous similar messages Lustre: lustre-MDT0002: Not available for connect from 10.240.28.47@tcp (stopping) Lustre: Skipped 8 previous similar messages LustreError: 137-5: lustre-MDT0000: not available for connect from 10.240.24.253@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 68 previous similar messages Lustre: lustre-MDT0002: Not available for connect from 10.240.24.253@tcp (stopping) Lustre: Skipped 18 previous similar messages | Link to test |
conf-sanity test 49a: check PARAM_SYS_LDLM_TIMEOUT option of mkfs.lustre | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:573275] Modules linked in: dm_flakey ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) nfsv3 nfs_acl loop dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver intel_rapl_msr nfs lockd grace fscache intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev i2c_piix4 pcspkr virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_net virtio_blk net_failover failover [last unloaded: dm_flakey] CPU: 1 PID: 573275 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.22.1.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 de 15 cf cd 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0000:ffffb3590b63ba88 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb35904487008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb35904467000 RSI: ffffb3590b63bac0 RDI: ffff945e65c0d000 RBP: ffffffffc0bd98a0 R08: 0000000000000018 R09: 000000000000000e R10: ffff945e43220000 R11: ffff945e4321fa3a R12: ffffb3590b63bb38 R13: 0000000000000000 R14: ffff945e65c0d000 R15: 0000000000000000 FS: 00007f15ba6bd080(0000) GS:ffff945effd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000562335d22048 CR3: 0000000005f1e004 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0x480/0xf40 [mdt] ? lu_context_init+0xa8/0x1b0 [obdclass] class_cleanup+0x6a3/0xc00 [obdclass] class_process_config+0x39e/0x1a40 [obdclass] ? class_manual_cleanup+0x191/0x750 [obdclass] class_manual_cleanup+0x443/0x750 [obdclass] server_put_super+0x805/0x1300 [obdclass] ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x195/0x1a0 entry_SYSCALL_64_after_hwframe+0x66/0xcb RIP: 0033:0x7f15b96158fb | Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-33vm4.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: onyx-33vm4.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_hostid Lustre: DEBUG MARKER: [ -e /dev/mapper/mds1_flakey ] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: mkfs.lustre --mgs --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=sys.ldlm_timeout=20 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=200000 --mkfsoptions="-O ea_inode,large_dir -E lazy_itable_init" -- LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: errors=remount-ro Lustre: DEBUG MARKER: [ -e /dev/mapper/mds3_flakey ] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: mkfs.lustre --mgsnode=onyx-99vm5@tcp --fsname=lustre --mdt --index=2 --param=sys.timeout=20 --param=sys.ldlm_timeout=20 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=200000 --mkfsoptions="-O ea_inode,large_dir -E l LDISKFS-fs (dm-5): mounted filesystem with ordered data mode. Opts: errors=remount-ro Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1 Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: errors=remount-ro LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc Lustre: ctl-lustre-MDT0000: No data found on store. 
Initialize space: rc = -61 Lustre: lustre-MDT0000: new disk, initializing Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400]:0:mdt Lustre: Skipped 4 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: sync; sleep 1; sync Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds3 Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds3_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds3_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds3_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds3; mount -t lustre -o localrecov /dev/mapper/mds3_flakey /mnt/lustre-mds3 LDISKFS-fs (dm-5): mounted filesystem with ordered data mode. Opts: errors=remount-ro LDISKFS-fs (dm-5): mounted filesystem with ordered data mode. 
Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc Lustre: cli-ctl-lustre-MDT0002: Allocated super-sequence [0x0000000280000400-0x00000002c0000400]:2:mdt] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm5.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: sync; sleep 1; sync Lustre: DEBUG MARKER: e2label /dev/mapper/mds3_flakey 2>/dev/null Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-99vm6.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount 
FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid Lustre: lustre-OST0000-osc-MDT0000: update sequence from 0x100000000 to 0x300000403 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-33vm4.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: onyx-33vm4.onyx.whamcloud.com: executing set_default_debug -1 all 4 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid Lustre: DEBUG MARKER: onyx-34vm5.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid Lustre: DEBUG MARKER: onyx-34vm4.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n ldlm_timeout LustreError: 11-0: lustre-OST0000-osc-MDT0002: operation ost_statfs to node 10.240.23.7@tcp failed: rc = -107 LustreError: Skipped 17 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 | Link to test |
runtests test 1: All Runtests | watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [umount:693944] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 joydev pcspkr virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net crc32c_intel net_failover failover virtio_blk serio_raw [last unloaded: dm_flakey] CPU: 1 PID: 693944 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 86 64 50 ec 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffff9e3e8652faa0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffff9e3e82379008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffff9e3e8235f000 RSI: ffff9e3e8652fad0 RDI: ffff8ce74510a900 RBP: ffffffffc0e47bb0 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000005b R11: 0000000000005be5 R12: ffff9e3e8652fb48 R13: 0000000000000000 R14: ffff8ce74510a900 R15: 0000000000000000 FS: 00007fc2d105a080(0000) GS:ffff8ce7ffd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000561c46b705e8 CR3: 0000000066e4c001 CR4: 00000000001706e0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x310/0x310 [ptlrpc] ? cfs_hash_for_each_relax+0x173/0x460 [libcfs] ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4f0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xda/0x930 [mdt] class_cleanup+0x6f5/0xc90 [obdclass] class_process_config+0x3ad/0x2080 [obdclass] ? class_manual_cleanup+0x191/0x780 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x456/0x780 [obdclass] server_put_super+0xadc/0x1350 [obdclass] ? __dentry_kill+0x121/0x170 ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7fc2cffc4e9b | Lustre: DEBUG MARKER: /usr/sbin/lctl mark touching \/mnt\/lustre at Thu Jun 27 10:59:17 UTC 2024 \(@1719485957\) Lustre: DEBUG MARKER: touching /mnt/lustre at Thu Jun 27 10:59:17 UTC 2024 (@1719485957) Lustre: DEBUG MARKER: /usr/sbin/lctl mark create an empty file \/mnt\/lustre\/hosts.1024451 Lustre: DEBUG MARKER: create an empty file /mnt/lustre/hosts.1024451 Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.1024451 Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.1024451 Lustre: DEBUG MARKER: /usr/sbin/lctl mark comparing \/etc\/hosts and \/mnt\/lustre\/hosts.1024451 Lustre: DEBUG MARKER: comparing /etc/hosts and /mnt/lustre/hosts.1024451 Lustre: DEBUG MARKER: /usr/sbin/lctl mark renaming \/mnt\/lustre\/hosts.1024451 to \/mnt\/lustre\/hosts.1024451.ren Lustre: DEBUG MARKER: renaming /mnt/lustre/hosts.1024451 to /mnt/lustre/hosts.1024451.ren Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.1024451 again Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.1024451 again Lustre: DEBUG MARKER: /usr/sbin/lctl mark truncating \/mnt\/lustre\/hosts.1024451 Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.1024451 Lustre: DEBUG MARKER: /usr/sbin/lctl mark removing \/mnt\/lustre\/hosts.1024451 Lustre: DEBUG MARKER: removing /mnt/lustre/hosts.1024451 Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.1024451.2 Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.1024451.2 Lustre: DEBUG MARKER: /usr/sbin/lctl mark truncating \/mnt\/lustre\/hosts.1024451.2 to 123 bytes Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.1024451.2 to 123 bytes Lustre: DEBUG MARKER: /usr/sbin/lctl mark creating \/mnt\/lustre\/d1.runtests Lustre: DEBUG MARKER: creating /mnt/lustre/d1.runtests Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying 595 files from \/etc \/bin to \/mnt\/lustre\/d1.runtests\/etc \/bin at Thu Jun 27 10:59:25 UTC 2024 Lustre: DEBUG MARKER: copying 595 files from /etc /bin to /mnt/lustre/d1.runtests/etc /bin at Thu Jun 27 10:59:25 UTC 2024 Autotest: Test running for 810 minutes (lustre-b2_15_full-dne-part-1_93.112) Lustre: DEBUG MARKER: /usr/sbin/lctl mark comparing 595 newly copied files at Thu Jun 27 10:59:59 UTC 2024 Lustre: DEBUG MARKER: comparing 595 newly copied files at Thu Jun 27 10:59:59 UTC 2024 Lustre: DEBUG MARKER: /usr/sbin/lctl mark running createmany -d \/mnt\/lustre\/d1.runtests\/d 595 Lustre: DEBUG MARKER: running createmany -d /mnt/lustre/d1.runtests/d 595 Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n debug=0 Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug="super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck" Lustre: DEBUG MARKER: /usr/sbin/lctl mark finished at Thu Jun 27 11:00:09 UTC 2024 \(52\) Lustre: DEBUG MARKER: finished at Thu Jun 27 11:00:09 UTC 2024 (52) Lustre: lustre-MDT0000-lwp-MDT0001: Connection to lustre-MDT0000 (at 10.240.28.50@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 27 previous similar messages Lustre: 11254:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1719486031/real 
1719486031] req@00000000dac500b2 x1802961275982080/t0(0) o400->MGC10.240.28.50@tcp@10.240.28.50@tcp:26/25 lens 224/224 e 0 to 1 dl 1719486038 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:2.0' Lustre: 11254:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 1 previous similar message LustreError: 166-1: MGC10.240.28.50@tcp: Connection to MGS (at 10.240.28.50@tcp) was lost; in progress operations using this service will fail LustreError: Skipped 3 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 Lustre: lustre-MDT0001: Not available for connect from 10.240.28.50@tcp (stopping) Lustre: lustre-MDT0001: Not available for connect from 10.240.23.7@tcp (stopping) Lustre: Skipped 1 previous similar message LustreError: 11-0: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 27 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: lustre-MDT0001: Not available for connect from 10.240.23.7@tcp (stopping) Lustre: Skipped 1 previous similar message LustreError: 693551:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items LustreError: 693551:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 11 previous similar messages Lustre: server umount lustre-MDT0001 complete Lustre: Skipped 4 previous similar messages LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 99 previous similar messages Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: lustre-MDT0002-osp-MDT0003: Connection to lustre-MDT0002 (at 10.240.28.50@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 3 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4 Lustre: lustre-MDT0003: Not available for connect from 10.240.23.7@tcp (stopping) Lustre: Skipped 7 previous similar messages LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.23.7@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 155 previous similar messages Lustre: lustre-MDT0003: Not available for connect from 10.240.23.7@tcp (stopping) Lustre: Skipped 15 previous similar messages Lustre: lustre-MDT0003: Not available for connect from 10.240.23.7@tcp (stopping) Lustre: Skipped 31 previous similar messages | Link to test |
runtests test 1: All Runtests | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:1250907] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc virtio_balloon i2c_piix4 pcspkr joydev ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net serio_raw net_failover crc32c_intel failover virtio_blk [last unloaded: dm_flakey] CPU: 1 PID: 1250907 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 66 56 03 d9 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb254852dbaa0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb25482919008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb254828fb000 RSI: ffffb254852dbad0 RDI: ffff8cd0b2e28c00 RBP: ffffffffc0d02bb0 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000002d4 R11: 000000000000ac1a R12: ffffb254852dbb48 R13: 0000000000000000 R14: ffff8cd0b2e28c00 R15: 0000000000000000 FS: 00007f10e1fa8080(0000) GS:ffff8cd0ffd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b881bfa680 CR3: 0000000004c9a004 CR4: 00000000001706e0 Call Trace: ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4f0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xda/0x930 [mdt] ? lu_context_init+0xa5/0x1b0 [obdclass] class_cleanup+0x6f5/0xc90 [obdclass] class_process_config+0x3ad/0x2080 [obdclass] ? class_manual_cleanup+0x116/0x770 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x469/0x770 [obdclass] server_put_super+0xac7/0x1330 [obdclass] ? __dentry_kill+0x121/0x170 ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f10e0f12e9b | Lustre: DEBUG MARKER: /usr/sbin/lctl mark touching \/mnt\/lustre at Fri May 31 05:06:55 UTC 2024 \(@1717132015\) Lustre: DEBUG MARKER: touching /mnt/lustre at Fri May 31 05:06:55 UTC 2024 (@1717132015) Lustre: DEBUG MARKER: /usr/sbin/lctl mark create an empty file \/mnt\/lustre\/hosts.1028631 Lustre: DEBUG MARKER: create an empty file /mnt/lustre/hosts.1028631 Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.1028631 Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.1028631 Lustre: DEBUG MARKER: /usr/sbin/lctl mark comparing \/etc\/hosts and \/mnt\/lustre\/hosts.1028631 Lustre: DEBUG MARKER: comparing /etc/hosts and /mnt/lustre/hosts.1028631 Lustre: DEBUG MARKER: /usr/sbin/lctl mark renaming \/mnt\/lustre\/hosts.1028631 to \/mnt\/lustre\/hosts.1028631.ren Lustre: DEBUG MARKER: renaming /mnt/lustre/hosts.1028631 to /mnt/lustre/hosts.1028631.ren Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.1028631 again Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.1028631 again Lustre: DEBUG MARKER: /usr/sbin/lctl mark truncating \/mnt\/lustre\/hosts.1028631 Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.1028631 Lustre: DEBUG MARKER: /usr/sbin/lctl mark removing \/mnt\/lustre\/hosts.1028631 Lustre: DEBUG MARKER: removing /mnt/lustre/hosts.1028631 Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.1028631.2 Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.1028631.2 Lustre: DEBUG MARKER: /usr/sbin/lctl mark truncating \/mnt\/lustre\/hosts.1028631.2 to 123 bytes Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.1028631.2 to 123 bytes Lustre: DEBUG MARKER: /usr/sbin/lctl mark creating \/mnt\/lustre\/d1.runtests Lustre: DEBUG MARKER: creating /mnt/lustre/d1.runtests Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying 593 files from \/etc \/bin to \/mnt\/lustre\/d1.runtests\/etc \/bin at Fri May 31 05:07:05 UTC 2024 Lustre: DEBUG MARKER: copying 593 files from /etc /bin to /mnt/lustre/d1.runtests/etc /bin at Fri May 31 05:07:05 UTC 2024 Lustre: DEBUG MARKER: /usr/sbin/lctl mark comparing 593 newly copied files at Fri May 31 05:07:25 UTC 2024 Lustre: DEBUG MARKER: comparing 593 newly copied files at Fri May 31 05:07:25 UTC 2024 Lustre: DEBUG MARKER: /usr/sbin/lctl mark running createmany -d \/mnt\/lustre\/d1.runtests\/d 593 Lustre: DEBUG MARKER: running createmany -d /mnt/lustre/d1.runtests/d 593 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n debug Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n debug=0 Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug="super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck" Lustre: DEBUG MARKER: /usr/sbin/lctl mark finished at Fri May 31 05:07:34 UTC 2024 \(39\) Lustre: DEBUG MARKER: finished at Fri May 31 05:07:34 UTC 2024 (39) Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 Lustre: lustre-MDT0000-lwp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 24 previous similar messages Lustre: lustre-MDT0000: Not available for 
connect from 0@lo (stopping) Lustre: Skipped 2 previous similar messages LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 6 previous similar messages LustreError: 1250509:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items LustreError: 1250509:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 2 previous similar messages LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.240.28.45@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 67 previous similar messages Lustre: server umount lustre-MDT0000 complete Lustre: 13019:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1717132075/real 1717132075] req@000000000c961a0d x1800490966519232/t0(0) o400->MGC10.240.28.44@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1717132082 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:2.0' Lustre: 13019:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 6 previous similar messages LustreError: 166-1: MGC10.240.28.44@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail LustreError: Skipped 3 previous similar messages Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds3' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds3 | Link to test |
sanity-quota test complete, duration 8226 sec | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:762648] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr i2c_piix4 virtio_balloon ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_net net_failover virtio_blk failover LustreError: 137-5: lustre-MDT0001: not available for connect from 10.240.25.220@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 59 previous similar messages CPU: 0 PID: 762648 Comm: umount Kdump: loaded Tainted: P OE --------- - - 4.18.0-513.9.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e a2 98 f1 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffba1a0139bab0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffba1a0759f008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffba1a0759d000 RSI: ffffba1a0139bae8 RDI: ffff976ef2fba600 RBP: ffffffffc126cd00 R08: 0000000000000000 R09: 000000000000000e R10: ffff976f220f4600 R11: 0000000000000001 R12: ffffba1a0139bb60 R13: 0000000000000000 R14: ffff976ef2fba600 R15: 0000000000000000 FS: 00007fd39a42f080(0000) GS:ffff976f7cc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055cb7fea8048 CR3: 0000000040052003 CR4: 00000000001706f0 Call Trace: <IRQ> ? watchdog_timer_fn.cold.10+0x46/0x9e ? watchdog+0x30/0x30 ? __hrtimer_run_queues+0x101/0x280 ? hrtimer_interrupt+0x100/0x220 ? smp_apic_timer_interrupt+0x6a/0x130 ? apic_timer_interrupt+0xf/0x20 </IRQ> ? cleanup_resource+0x330/0x330 [ptlrpc] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs] ? cleanup_resource+0x330/0x330 [ptlrpc] ? cleanup_resource+0x330/0x330 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] Lustre: lustre-MDT0003: Not available for connect from 10.240.25.220@tcp (stopping) Lustre: Skipped 55 previous similar messages __ldlm_namespace_free+0x52/0x4f0 [ptlrpc] ? _cond_resched+0x15/0x30 ? mutex_lock+0xe/0x30 ? set_cdt_state+0x37/0x50 [mdt] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0x474/0xf60 [mdt] ? lu_context_init+0xa5/0x1b0 [obdclass] class_cleanup+0x70d/0xca0 [obdclass] class_process_config+0x3ad/0x21e0 [obdclass] ? class_manual_cleanup+0x191/0x780 [obdclass] ? __kmalloc+0x113/0x250 ? 
lprocfs_counter_add+0x12a/0x1a0 [obdclass] class_manual_cleanup+0x456/0x780 [obdclass] server_put_super+0x7b7/0x10f0 [ptlrpc] generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7fd399399e9b | LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.240.28.44@tcp failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.28.44@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 3 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 LustreError: 166-1: MGC10.240.28.44@tcp: Connection to MGS (at 10.240.28.44@tcp) was lost; in progress operations using this service will fail Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 9 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 10.240.25.220@tcp (stopping) Lustre: Skipped 11 previous similar messages Lustre: server umount lustre-MDT0001 complete LustreError: 137-5: lustre-MDT0001: not available for connect from 10.240.25.220@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 16 previous similar messages Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: ! zpool list -H lustre-mdt2 >/dev/null 2>&1 || LustreError: 137-5: lustre-MDT0001: not available for connect from 10.240.25.220@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 44 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4 Lustre: lustre-MDT0003: Not available for connect from 10.240.25.220@tcp (stopping) Lustre: Skipped 7 previous similar messages | Link to test |
sanityn test complete, duration 7095 sec | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:921019] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 joydev pcspkr virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net net_failover failover crc32c_intel serio_raw virtio_blk [last unloaded: dm_flakey] CPU: 0 PID: 921019 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e a2 3c cd 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffaedc8562fa98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffaedc83d07008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffaedc83d03000 RSI: ffffaedc8562fad0 RDI: ffff9729a7dbec00 RBP: ffffffffc0bcbbb0 R08: 0000000000000000 R09: 000000000000000e R10: 0000000000000000 R11: 000000000000000f R12: ffffaedc8562fb48 R13: 0000000000000000 R14: ffff9729a7dbec00 R15: 0000000000000000 FS: 00007f6d1b700080(0000) GS:ffff972a3cc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055e296f4b9d0 CR3: 0000000040afa001 CR4: 00000000001706f0 Call Trace: ? cleanup_resource+0x330/0x330 [ptlrpc] ? cleanup_resource+0x330/0x330 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4f0 [ptlrpc] ? set_cdt_state_locked.isra.11+0x15/0xd0 [mdt] ? set_cdt_state+0x37/0x50 [mdt] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xda/0x920 [mdt] class_cleanup+0x70d/0xca0 [obdclass] class_process_config+0x3ad/0x2160 [obdclass] class_manual_cleanup+0x469/0x770 [obdclass] server_put_super+0x7a2/0x1300 [ptlrpc] ? 
lustre_register_lwp_item+0x6a0/0x6a0 [ptlrpc] generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f6d1a66ae9b | Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.28.130@tcp) was lost; in progress operations using this service will wait for recovery to complete LustreError: 11-0: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.28.130@tcp failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: lustre-MDT0000-osp-MDT0003: Connection to lustre-MDT0000 (at 10.240.28.130@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 LustreError: 166-1: MGC10.240.28.130@tcp: Connection to MGS (at 10.240.28.130@tcp) was lost; in progress operations using this service will fail Lustre: lustre-MDT0001: Not available for connect from 10.240.28.193@tcp (stopping) Lustre: Skipped 20 previous similar messages LustreError: 11-0: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107 Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 2 previous similar messages | Link to test |
replay-single test 110c: DNE: create striped dir, fail MDT2 | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:270459] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel virtio_net libata serio_raw virtio_blk net_failover failover CPU: 1 PID: 270459 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e 02 55 f1 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffbb570a937a98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffbb5703a17008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffbb57039dd000 RSI: ffffbb570a937ad0 RDI: ffff969372749000 RBP: ffffffffc0c45bb0 R08: 0000000000000000 R09: 000000000000000e R10: 0000000000000009 R11: 0000000000000000 R12: ffffbb570a937b48 R13: 0000000000000000 R14: ffff969372749000 R15: 0000000000000000 FS: 00007f18765d4080(0000) GS:ffff9693ffd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055adc3e6e858 CR3: 0000000030062003 CR4: 00000000001706e0 Call Trace: ? cleanup_resource+0x330/0x330 [ptlrpc] ? cleanup_resource+0x330/0x330 [ptlrpc] cfs_hash_for_each_nolock+0x124/0x200 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4f0 [ptlrpc] ? set_cdt_state_locked.isra.11+0x15/0xd0 [mdt] ? set_cdt_state+0x37/0x50 [mdt] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xda/0x920 [mdt] ? lu_context_init+0xa5/0x1b0 [obdclass] class_cleanup+0x70d/0xca0 [obdclass] class_process_config+0x3ad/0x2160 [obdclass] ? class_manual_cleanup+0x116/0x770 [obdclass] ? __kmalloc+0x113/0x250 ? lprocfs_counter_add+0x12a/0x1a0 [obdclass] class_manual_cleanup+0x469/0x770 [obdclass] server_put_super+0x7a2/0x1300 [ptlrpc] ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f187553ee9b | Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 3964928 flakey 252:0 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 Lustre: lustre-MDT0001: Not available for connect from 10.240.25.36@tcp (stopping) Lustre: 10712:0:(client.c:2338:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1708742277/real 1708742277] req@ffff96937328b400 x1791737181335680/t0(0) o13->lustre-OST0005-osc-MDT0003@10.240.25.37@tcp:7/4 lens 224/368 e 0 to 1 dl 1708742293 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'osp-pre-5-3.0' uid:0 gid:0 Lustre: 10712:0:(client.c:2338:ptlrpc_expire_one_request()) Skipped 1 previous similar message | Link to test |
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:899023] Modules linked in: dm_flakey obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon i2c_piix4 joydev pcspkr sunrpc ext4 mbcache jbd2 ata_generic crc32c_intel virtio_net serio_raw ata_piix libata net_failover failover virtio_blk [last unloaded: dm_flakey] CPU: 0 PID: 899023 Comm: umount Kdump: loaded Tainted: G W OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 6e 55 c4 ea 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffab5082e63a98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffab50836b2008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffab5083691000 RSI: ffffab5082e63ad0 RDI: ffff9d7b3e70fb00 RBP: ffffffffc0b4d890 R08: 00000000000003f1 R09: 000000000000000e R10: ffffab5082e63a98 R11: ffff9d7b34283261 R12: ffffab5082e63b48 R13: 0000000000000000 R14: ffff9d7b3e70fb00 R15: 0000000000000000 FS: 00007f2970f21080(0000) GS:ffff9d7bbfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fea4b835000 CR3: 000000002c768004 CR4: 00000000001706f0 Call Trace: ? cleanup_resource+0x330/0x330 [ptlrpc] ? cleanup_resource+0x330/0x330 [ptlrpc] cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4f0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xda/0x920 [mdt] class_cleanup+0x705/0xca0 [obdclass] class_process_config+0x3ad/0x2160 [obdclass] class_manual_cleanup+0x469/0x770 [obdclass] server_put_super+0x7a5/0x1300 [ptlrpc] ? lustre_register_lwp_item+0x690/0x690 [ptlrpc] generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f296fe8be9b | Link to test | ||
obdfilter-survey test 3a: Network survey | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:657010] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul sunrpc ghash_clmulni_intel virtio_balloon joydev pcspkr i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_blk virtio_net net_failover failover [last unloaded: dm_flakey] CPU: 1 PID: 657010 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.19.2.el8_lustre.ddn17.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17d/0x490 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 5c 6f 95 eb 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffbbec0131fa98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffbbec05fa7008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffbbec05f79000 RSI: ffffbbec0131fad0 RDI: ffffa0185c594800 RBP: ffffffffc0c0b880 R08: 0000000000000000 R09: 0000000000000000 R10: ffffa01867811200 R11: 0000000000000001 R12: ffffbbec0131fb48 R13: 0000000000000000 R14: ffffa0185c594800 R15: 0000000000000000 FS: 00007f1a2c7d9080(0000) GS:ffffa018fcd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9ecf8de000 CR3: 0000000004eb6003 CR4: 00000000000606e0 Call Trace: ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x12e/0x1c0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4c0 [ptlrpc] ? _raw_spin_lock+0xc/0x30 ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xda/0x8f0 [mdt] class_cleanup+0x6a3/0xc00 [obdclass] class_process_config+0x393/0x1a40 [obdclass] class_manual_cleanup+0x453/0x740 [obdclass] server_put_super+0x7e2/0x12d0 [obdclass] ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f1a2b743e9b | LustreError: 11-0: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.41.37@tcp failed: rc = -107 LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.240.41.37@tcp failed: rc = -107 LustreError: Skipped 29 previous similar messages LustreError: Skipped 29 previous similar messages Lustre: lustre-MDT0000-osp-MDT0003: Connection to lustre-MDT0000 (at 10.240.41.37@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 47 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 LustreError: 166-1: MGC10.240.41.37@tcp: Connection to MGS (at 10.240.41.37@tcp) was lost; in progress operations using this service will fail Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 52 previous similar messages Lustre: 11362:0:(client.c:2321:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1707965786/real 1707965786] req@00000000c06f3ad3 x1790910940088512/t0(0) o400->lustre-MDT0002-osp-MDT0003@10.240.41.37@tcp:24/4 lens 224/224 e 0 to 1 dl 1707965793 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker.0' Lustre: 11362:0:(client.c:2321:ptlrpc_expire_one_request()) Skipped 20 previous similar messages | Link to test |
runtests test 1: All Runtests | watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:107515] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 joydev pcspkr virtio_balloon ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_net net_failover virtio_blk failover [last unloaded: obdecho] CPU: 1 PID: 107515 Comm: umount Kdump: loaded Tainted: P OE --------- - - 4.18.0-477.27.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 ae f5 91 c2 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffaa1681e0ba98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffaa1687049008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffaa168701f000 RSI: ffffaa1681e0bad0 RDI: ffff8fdd2f4d5b00 RBP: ffffffffc12dbab0 R08: 0000000000000000 R09: 000000000000000e R10: ffff8fdd4168f000 R11: 0000000000000001 R12: ffffaa1681e0bb48 R13: 0000000000000000 R14: ffff8fdd2f4d5b00 R15: 0000000000000000 FS: 00007f43dfb81080(0000) GS:ffff8fddbfd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055c21b5f3cb0 CR3: 000000004df70006 CR4: 00000000001706e0 Call Trace: ? cleanup_resource+0x330/0x330 [ptlrpc] ? cleanup_resource+0x330/0x330 [ptlrpc] cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4f0 [ptlrpc] ? _cond_resched+0x15/0x30 ? mutex_lock+0xe/0x30 ? set_cdt_state+0x37/0x50 [mdt] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xda/0x920 [mdt] ? lu_context_init+0xa5/0x1b0 [obdclass] class_cleanup+0x705/0xca0 [obdclass] class_process_config+0x3ad/0x2160 [obdclass] ? class_manual_cleanup+0x116/0x770 [obdclass] ? __kmalloc+0x113/0x250 ? lprocfs_counter_add+0x12a/0x1a0 [obdclass] class_manual_cleanup+0x469/0x770 [obdclass] server_put_super+0x7a5/0x1300 [ptlrpc] ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f43deaebe9b | Lustre: DEBUG MARKER: /usr/sbin/lctl mark touching \/mnt\/lustre at Fri Feb 9 02:02:10 UTC 2024 \(@1707444130\) Lustre: DEBUG MARKER: touching /mnt/lustre at Fri Feb 9 02:02:10 UTC 2024 (@1707444130) Lustre: DEBUG MARKER: /usr/sbin/lctl mark create an empty file \/mnt\/lustre\/hosts.146389 Lustre: DEBUG MARKER: create an empty file /mnt/lustre/hosts.146389 Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.146389 Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.146389 Lustre: DEBUG MARKER: /usr/sbin/lctl mark comparing \/etc\/hosts and \/mnt\/lustre\/hosts.146389 Lustre: DEBUG MARKER: comparing /etc/hosts and /mnt/lustre/hosts.146389 Lustre: DEBUG MARKER: /usr/sbin/lctl mark renaming \/mnt\/lustre\/hosts.146389 to \/mnt\/lustre\/hosts.146389.ren Lustre: DEBUG MARKER: renaming /mnt/lustre/hosts.146389 to /mnt/lustre/hosts.146389.ren Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.146389 again Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.146389 again Lustre: DEBUG MARKER: /usr/sbin/lctl mark truncating \/mnt\/lustre\/hosts.146389 Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.146389 Lustre: DEBUG MARKER: /usr/sbin/lctl mark removing \/mnt\/lustre\/hosts.146389 Lustre: DEBUG MARKER: removing /mnt/lustre/hosts.146389 Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying \/etc\/hosts to \/mnt\/lustre\/hosts.146389.2 Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.146389.2 Lustre: DEBUG MARKER: /usr/sbin/lctl mark truncating \/mnt\/lustre\/hosts.146389.2 to 123 bytes Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.146389.2 to 123 bytes Lustre: DEBUG MARKER: /usr/sbin/lctl mark creating \/mnt\/lustre\/d1.runtests Lustre: DEBUG MARKER: creating /mnt/lustre/d1.runtests Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying 589 files from \/etc \/bin to \/mnt\/lustre\/d1.runtests\/etc \/bin at Fri Feb 9 02:02:17 UTC 2024 Lustre: DEBUG MARKER: copying 589 files from /etc /bin to /mnt/lustre/d1.runtests/etc /bin at Fri Feb 9 02:02:17 UTC 2024 Lustre: DEBUG MARKER: /usr/sbin/lctl mark comparing 589 newly copied files at Fri Feb 9 02:02:24 UTC 2024 Lustre: DEBUG MARKER: comparing 589 newly copied files at Fri Feb 9 02:02:24 UTC 2024 Lustre: DEBUG MARKER: /usr/sbin/lctl mark running createmany -d \/mnt\/lustre\/d1.runtests\/d 589 Lustre: DEBUG MARKER: running createmany -d /mnt/lustre/d1.runtests/d 589 Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n debug=0 Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug="super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck" Lustre: DEBUG MARKER: /usr/sbin/lctl mark finished at Fri Feb 9 02:02:44 UTC 2024 \(34\) Lustre: DEBUG MARKER: finished at Fri Feb 9 02:02:44 UTC 2024 (34) Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: ! zpool list -H lustre-mdt2 >/dev/null 2>&1 || Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4 | Link to test |
sanity-quota test complete, duration 9937 sec | watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:610505] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel virtio_net serio_raw libata net_failover virtio_blk failover CPU: 0 PID: 610505 Comm: umount Kdump: loaded Tainted: P OE --------- - - 4.18.0-477.21.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 8e c7 ea f2 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb6ab4cba7a98 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb6ab44acb008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb6ab44abb000 RSI: ffffb6ab4cba7ad0 RDI: ffff9b0bf012e400 RBP: ffffffffc13b8920 R08: 0000000000000000 R09: 000000000000000e R10: 0000000000000017 R11: 0000000000000000 R12: ffffb6ab4cba7b48 R13: 0000000000000000 R14: ffff9b0bf012e400 R15: 0000000000000000 FS: 00007f5e33d39080(0000) GS:ffff9b0c7cc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055655b0343a0 CR3: 0000000024b06006 CR4: 00000000000606f0 Call Trace: ? cleanup_resource+0x330/0x330 [ptlrpc] ? cleanup_resource+0x330/0x330 [ptlrpc] cfs_hash_for_each_nolock+0x126/0x1f0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4f0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xda/0x920 [mdt] class_cleanup+0x6f5/0xc90 [obdclass] class_process_config+0x3ad/0x2160 [obdclass] ? class_manual_cleanup+0x116/0x770 [obdclass] ? __kmalloc+0x113/0x250 ? lprocfs_counter_add+0x12a/0x1a0 [obdclass] class_manual_cleanup+0x469/0x770 [obdclass] server_put_super+0x7a5/0x1300 [ptlrpc] ? 
evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f5e32ca3e9b | LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.240.39.107@tcp failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.39.107@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 3 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 LustreError: 166-1: MGC10.240.39.107@tcp: Connection to MGS (at 10.240.39.107@tcp) was lost; in progress operations using this service will fail Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 9 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 8 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 2 previous similar messages Lustre: 10309:0:(client.c:2310:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1695910174/real 1695910174] req@00000000ce5dd934 x1778270680951936/t0(0) o400->lustre-OST0000-osc-MDT0003@10.240.39.106@tcp:28/4 lens 224/224 e 0 to 1 dl 1695910190 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 Lustre: 10309:0:(client.c:2310:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1695910177/real 1695910177] req@0000000030fbe816 x1778270680952512/t0(0) o13->lustre-OST0003-osc-MDT0003@10.240.39.106@tcp:7/4 lens 224/368 e 0 to 1 dl 1695910193 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'osp-pre-3-3.0' uid:0 gid:0 Lustre: 10309:0:(client.c:2310:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Lustre: 10309:0:(client.c:2310:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1695910178/real 1695910178] req@00000000397a5abd x1778270680953152/t0(0) o400->lustre-OST0000-osc-MDT0003@10.240.39.106@tcp:28/4 lens 224/224 e 0 to 1 dl 1695910194 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 Lustre: 10309:0:(client.c:2310:ptlrpc_expire_one_request()) Skipped 6 previous similar messages | Link to test |
sanity-benchmark test complete, duration 7045 sec | watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:84920] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4 virtio_balloon joydev pcspkr sunrpc dm_mod ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_blk failover CPU: 0 PID: 84920 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 a6 56 69 eb 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7 RSP: 0018:ffffb6ac40f13aa0 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13 RAX: ffffb6ac44752008 RBX: 0000000000000000 RCX: 000000000000000e RDX: ffffb6ac4473b000 RSI: ffffb6ac40f13ad0 RDI: ffff9c6d6d3cc900 RBP: ffffffffc0ca0b90 R08: 0000000000000000 R09: 0000000000000000 R10: ffff9c6d70be2898 R11: 000000000000016f R12: ffffb6ac40f13b48 R13: 0000000000000000 R14: ffff9c6d6d3cc900 R15: 0000000000000000 FS: 00007f545d8fb080(0000) GS:ffff9c6dffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff159b580a8 CR3: 00000000483d0002 CR4: 00000000000606f0 Call Trace: ? cleanup_resource+0x310/0x310 [ptlrpc] ? cleanup_resource+0x310/0x310 [ptlrpc] cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc] __ldlm_namespace_free+0x52/0x4f0 [ptlrpc] ldlm_namespace_free_prior+0x5d/0x200 [ptlrpc] mdt_device_fini+0xda/0x930 [mdt] ? lu_context_init+0xa5/0x1b0 [obdclass] class_cleanup+0x6f5/0xca0 [obdclass] class_process_config+0x3ad/0x1ff0 [obdclass] ? class_manual_cleanup+0x116/0x770 [obdclass] ? __kmalloc+0x113/0x250 class_manual_cleanup+0x469/0x770 [obdclass] server_put_super+0xac7/0x1330 [obdclass] ? __dentry_kill+0x121/0x170 ? evict_inodes+0x160/0x1b0 generic_shutdown_super+0x6c/0x110 kill_anon_super+0x14/0x30 deactivate_locked_super+0x34/0x70 cleanup_mnt+0x3b/0x70 task_work_run+0x8a/0xb0 exit_to_usermode_loop+0xef/0x100 do_syscall_64+0x19c/0x1b0 entry_SYSCALL_64_after_hwframe+0x61/0xc6 RIP: 0033:0x7f545c865e9b | Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: conf-sanity ============----- Thu Aug 3 01:29:34 UTC 2023 Lustre: DEBUG MARKER: -----============= acceptance-small: conf-sanity ============----- Thu Aug 3 01:29:34 UTC 2023 Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: 32newtarball 110 Lustre: DEBUG MARKER: excepting tests: 32newtarball 110 Lustre: lustre-MDT0003: haven't heard from client 09f8f5e7-51ea-48ae-b4f8-f28e030221a6 (at 10.240.25.39@tcp) in 48 seconds. I think it's dead, and I am evicting it. 
exp 00000000f3d752b6, cur 1691026226 expire 1691026196 last 1691026178 Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.25.42@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 4 previous similar messages Lustre: 10685:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1691026237/real 1691026237] req@00000000e3e59d80 x1773146447045504/t0(0) o400->MGC10.240.25.42@tcp@10.240.25.42@tcp:26/25 lens 224/224 e 0 to 1 dl 1691026244 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:0.0' LustreError: 166-1: MGC10.240.25.42@tcp: Connection to MGS (at 10.240.25.42@tcp) was lost; in progress operations using this service will fail Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2 Lustre: lustre-MDT0001: Not available for connect from 10.240.25.42@tcp (stopping) Lustre: lustre-MDT0001: Not available for connect from 10.240.25.42@tcp (stopping) Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 2 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 10.240.25.42@tcp (stopping) LustreError: 11-0: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 1 previous similar message LustreError: 84477:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items Lustre: lustre-MDT0001: Not available for connect from 10.240.25.41@tcp (stopping) Lustre: Skipped 9 previous similar messages Lustre: server umount lustre-MDT0001 complete LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.41@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 7 previous similar messages LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.42@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 1 previous similar message LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.41@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.42@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 8 previous similar messages LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.42@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 9 previous similar messages Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.41@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 19 previous similar messages Lustre: lustre-MDT0002-osp-MDT0003: Connection to lustre-MDT0002 (at 10.240.25.42@tcp) was lost; in progress operations using this service will wait for recovery to complete LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.41@tcp (no target). 
If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 28 previous similar messages Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4 Lustre: lustre-MDT0003: Not available for connect from 10.240.25.41@tcp (stopping) Lustre: Skipped 7 previous similar messages Lustre: lustre-MDT0003: Not available for connect from 10.240.25.41@tcp (stopping) Lustre: Skipped 15 previous similar messages LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.240.25.41@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 82 previous similar messages | Link to test |