Editing crashreport #74689

Reason: ASSERTION( nidtbl_is_sane(tbl) ) failed
Crashing function: mgs_nidtbl_read
Where to cut backtrace: mgs_nidtbl_read
Backtrace:
mgs_get_ir_logs
mgs_config_read
tgt_handle_request0
tgt_request_handle
ptlrpc_server_handle_request
ptlrpc_main
kthread
ret_from_fork
Reports count: 11

Added fields:

Match messages in logs
(every line must be present in the log output; copy from the "Messages before crash" column below):
Match messages in full crash
(every line must be present in the crash log output; copy from the "Full Crash" column below):
Limit to a test:
(copy from the "Failing Test" column below):
Delete these reports as invalid (real bug in review or some such)
Bug or comment:
Extra info:
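The matching rule described above ("every line required to be present") can be sketched as a small filter. This is an illustrative sketch only; `matches` and its signature are hypothetical, not the tool's actual code:

```python
def matches(required_lines, log_text):
    """Return True when every non-empty required line occurs
    verbatim somewhere in the report's log output."""
    return all(line in log_text for line in required_lines if line.strip())

log = "LustreError: ... LBUG\nLustre: Failing over lustre-MDT0001\n"
print(matches(["LBUG"], log))           # True: the fragment is present
print(matches(["LBUG", "panic"], log))  # False: "panic" never appears
```

Under this rule an empty set of required lines matches every report, which is why leaving the field blank applies the edit to all listed failures.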

Failures list (last 100):

Failing Test | Full Crash | Messages before crash | Comment
replay-dual test 26: dbench and tar with mds failover
LustreError: 57138:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed:
LustreError: 57138:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG
CPU: 0 PID: 57138 Comm: ll_mgs_0000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-570.62.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs]
? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc]
mgs_get_ir_logs+0x4f4/0xa00 [mgs]
? __req_capsule_get+0x143/0x3b0 [ptlrpc]
mgs_config_read+0x1c6/0x1e0 [mgs]
tgt_handle_request0+0x147/0x770 [ptlrpc]
tgt_request_handle+0x3fd/0xd00 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
? srso_alias_return_thunk+0x5/0xfbef5
ptlrpc_main+0x9bf/0xea0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xdd/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
Lustre: DEBUG MARKER: hostname -I
Lustre: 5659:0:(client.c:2472:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1774425810/real 1774425810] req@ff2fed3e83d223c0 x1860617644960128/t0(0) o103->MGC10.240.28.143@tcp@10.240.28.143@tcp:17/18 lens 328/224 e 0 to 1 dl 1774425826 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'ldlm_bl.0' uid:0 gid:0 projid:4294967295
LustreError: 54653:0:(ldlm_resource.c:1167:ldlm_resource_complain()) MGC10.240.28.143@tcp: namespace resource [0x65727473756c:0x2:0x0].0x0 (ff2fed3e45377cc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
LustreError: 54653:0:(ldlm_resource.c:1167:ldlm_resource_complain()) Skipped 1 previous similar message
Lustre: 5688:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.28.143@tcp: IR log lustre-mdtir failed, not fatal: rc = -5
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm163.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: onyx-157vm163.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times
Lustre: DEBUG MARKER: test_26 fail mds1 1 times
Lustre: 5688:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.28.143@tcp: IR log lustre-mdtir failed, not fatal: rc = -5
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm165.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: onyx-157vm165.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-157vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm162.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: onyx-157vm162.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 flakey 252:0 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey
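The `dmsetup load` line above uses the kernel's dm-flakey table format, `<start> <length> flakey <dev> <offset> <up_interval> <down_interval> [<num_features> <feature>...]`. A minimal sketch that labels the fields of the logged table (the parser itself is hypothetical, written only to annotate the log line):

```python
def parse_flakey_table(table):
    """Label the positional fields of a dm-flakey table line."""
    f = table.split()
    assert f[2] == "flakey", "not a flakey target"
    return {
        "start_sector": int(f[0]),
        "num_sectors": int(f[1]),
        "device": f[3],                # major:minor of the backing device
        "offset": int(f[4]),
        "up_interval_s": int(f[5]),    # seconds the device behaves normally
        "down_interval_s": int(f[6]),  # seconds the device misbehaves
        "features": f[8:8 + int(f[7])] if len(f) > 7 else [],
    }

t = parse_flakey_table("0 4071424 flakey 252:0 0 0 1800 1 drop_writes")
print(t["features"])  # ['drop_writes']
```

With `up_interval` 0 and `down_interval` 1800, the device silently drops all writes for the whole window, which is how the replay-barrier step discards unsynced MDT writes before failover.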
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 2 times
Lustre: DEBUG MARKER: test_26 fail mds2 2 times
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2
Lustre: Failing over lustre-MDT0001
Lustre: lustre-MDT0001: Not available for connect from 10.240.28.140@tcp (stopping)
Lustre: Skipped 24 previous similar messages
LDISKFS-fs (dm-3): unmounting filesystem 38fd68a5-cde2-41dd-9dcf-f098238957fe.
Lustre: server umount lustre-MDT0001 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 linear 252:0 0"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
LDISKFS-fs (dm-3): 12 truncates cleaned up
LDISKFS-fs (dm-3): recovery complete
LDISKFS-fs (dm-3): mounted filesystem 38fd68a5-cde2-41dd-9dcf-f098238957fe r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect
Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm166.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm166.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: onyx-157vm166.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: onyx-157vm166.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null
Lustre: lustre-MDT0001: Recovery over after 0:05, of 7 clients 7 recovered and 0 were evicted.
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-157vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm162.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: onyx-157vm162.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds3 3 times
Lustre: DEBUG MARKER: test_26 fail mds3 3 times
LustreError: lustre-MDT0002-osp-MDT0003: operation ldlm_enqueue to node 10.240.28.143@tcp failed: rc = -19
LustreError: Skipped 26 previous similar messages
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm165.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: onyx-157vm165.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-157vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm162.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: onyx-157vm162.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds4 4 times
Lustre: DEBUG MARKER: test_26 fail mds4 4 times
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4
Lustre: Failing over lustre-MDT0003
Lustre: lustre-MDT0003: Not available for connect from 10.240.28.140@tcp (stopping)
Autotest: Test running for 50 minutes (lustre-reviews_review-dne-part-2_122892.52)
LDISKFS-fs (dm-4): unmounting filesystem 7cfa4b01-d375-4c46-aa03-96010de6b3f6.
Lustre: server umount lustre-MDT0003 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
LustreError: 5695:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0003: not available for connect from 10.240.28.143@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 5695:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 88 previous similar messages
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 linear 252:1 0"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4
LDISKFS-fs (dm-4): 21 truncates cleaned up
LDISKFS-fs (dm-4): recovery complete
LDISKFS-fs (dm-4): mounted filesystem 7cfa4b01-d375-4c46-aa03-96010de6b3f6 r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Link to test
insanity test 7: Seventh Failure Mode: CLIENT/MDS Wed Mar 25 07:43:26 AM UTC 2026
LustreError: 39156:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed:
LustreError: 39156:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG
CPU: 1 PID: 39156 Comm: ll_mgs_0002 Kdump: loaded Tainted: G OE ------- --- 5.14.0-570.62.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs]
? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc]
mgs_get_ir_logs+0x4f4/0xa00 [mgs]
? __req_capsule_get+0x143/0x3b0 [ptlrpc]
mgs_config_read+0x1c6/0x1e0 [mgs]
tgt_handle_request0+0x147/0x770 [ptlrpc]
tgt_request_handle+0x3fd/0xd00 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
? srso_alias_return_thunk+0x5/0xfbef5
ptlrpc_main+0x9bf/0xea0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xdd/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Request fail clients: 2, to fail: 1, failed: 0
Lustre: DEBUG MARKER: Request fail clients: 2, to fail: 1, failed: 0
Lustre: DEBUG MARKER: /usr/sbin/lctl mark shutdown_client: onyx-155vm164 uptime: 07:43:29 up 28 min, 0 users, load average: 0.06, 0.06, 0.03
Lustre: DEBUG MARKER: shutdown_client: onyx-155vm164 uptime: 07:43:29 up 28 min, 0 users, load average: 0.06, 0.06, 0.03
Lustre: lustre-MDT0003: haven't heard from client 6a20e600-9f84-4ad3-9b3e-f31516c4e3ee (at 10.240.31.168@tcp) in 101 seconds. I think it's dead, and I am evicting it. exp ff3027c3a50a4800, cur 1774424709 deadline 1774424708 last 1774424608
Lustre: Skipped 1 previous similar message
Autotest: Test running for 30 minutes (lustre-reviews_review-dne-part-4_122892.54)
Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.31.170@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 17 previous similar messages
LustreError: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.240.31.170@tcp failed: rc = -107
LustreError: Skipped 10 previous similar messages
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
LustreError: MGC10.240.31.170@tcp: Connection to MGS (at 10.240.31.170@tcp) was lost; in progress operations using this service will fail
Lustre: Evicted from MGS (at 10.240.31.170@tcp) after server handle changed from 0x3c867190b29de547 to 0x3c867190b29df1db
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2
Lustre: Failing over lustre-MDT0001
LDISKFS-fs (dm-3): unmounting filesystem e57ebb81-5709-48ea-9f86-2953df0fbfb9.
Lustre: server umount lustre-MDT0001 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
LustreError: 34098:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.31.170@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 34098:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 83 previous similar messages
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
LDISKFS-fs (dm-3): mounted filesystem e57ebb81-5709-48ea-9f86-2953df0fbfb9 r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180
Lustre: Skipped 1 previous similar message
Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null
Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 4 clients reconnect
Lustre: Skipped 1 previous similar message
Lustre: lustre-MDT0001: Recovery over after 0:02, of 4 clients 4 recovered and 0 were evicted.
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4
Lustre: Failing over lustre-MDT0003
LDISKFS-fs (dm-4): unmounting filesystem d8712dfe-f7e9-48a0-babf-c9fef9d341f2.
Lustre: server umount lustre-MDT0003 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4
LDISKFS-fs (dm-4): mounted filesystem d8712dfe-f7e9-48a0-babf-c9fef9d341f2 r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Link to test
sanity-hsm test 302: HSM tunnable are persistent when CDT is off
LustreError: 183239:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed:
LustreError: 183239:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG
CPU: 0 PID: 183239 Comm: ll_mgs_0001 Kdump: loaded Tainted: G OE ------- --- 5.14.0-570.62.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs]
? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc]
mgs_get_ir_logs+0x4f4/0xa00 [mgs]
? __req_capsule_get+0x143/0x3b0 [ptlrpc]
mgs_config_read+0x1c6/0x1e0 [mgs]
tgt_handle_request0+0x147/0x770 [ptlrpc]
tgt_request_handle+0x3fd/0xd00 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
? srso_alias_return_thunk+0x5/0xfbef5
ptlrpc_main+0x9bf/0xea0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xdd/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdt.lustre-MDT0001.hsm_control='shutdown'
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdt.lustre-MDT0003.hsm_control='shutdown'
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0001.hsm_control
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0003.hsm_control
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
LustreError: MGC10.240.31.170@tcp: Connection to MGS (at 10.240.31.170@tcp) was lost; in progress operations using this service will fail
Lustre: Evicted from MGS (at 10.240.31.170@tcp) after server handle changed from 0xfe0d6ed644a8050f to 0xfe0d6ed644a80c0f
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2
Lustre: Failing over lustre-MDT0001
LDISKFS-fs (dm-3): unmounting filesystem 2d8baffc-0e0e-41b5-8a62-8ae0ced334cf.
Lustre: server umount lustre-MDT0001 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 7 previous similar messages
LustreError: 17306:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
Lustre: DEBUG MARKER: modprobe dm-flakey;
LustreError: 15516:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.31.168@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 15516:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 3 previous similar messages
LustreError: 18198:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.31.168@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 18198:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 14 previous similar messages
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
LDISKFS-fs (dm-3): mounted filesystem 2d8baffc-0e0e-41b5-8a62-8ae0ced334cf r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
debugfs: Directory '10.240.31.167@tcp' with parent 'exports' already present!
debugfs: Directory '10.240.31.169@tcp' with parent 'exports' already present!
debugfs: File 'stats' in directory '/' already present!
debugfs: File 'ldlm_stats' in directory '/' already present!
debugfs: File 'open_files' in directory '/' already present!
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null
Lustre: lustre-MDT0001-osp-MDT0003: Connection restored to 0@lo (at 0@lo)
Lustre: Skipped 9 previous similar messages
Lustre: lustre-MDT0001: Recovery over after 0:05, of 7 clients 7 recovered and 0 were evicted.
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
LustreError: lustre-MDT0002-osp-MDT0001: operation mds_statfs to node 10.240.31.170@tcp failed: rc = -107
LustreError: Skipped 1 previous similar message
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4
Lustre: Failing over lustre-MDT0003
LDISKFS-fs (dm-4): unmounting filesystem 69b6b624-61a9-4f1c-953b-dfeaf95c3731.
Lustre: server umount lustre-MDT0003 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
LustreError: 130351:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0003: not available for connect from 10.240.31.169@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 130351:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 14 previous similar messages
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4
LDISKFS-fs (dm-4): mounted filesystem 69b6b624-61a9-4f1c-953b-dfeaf95c3731 r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: lustre-MDT0003: Will be in recovery for at least 1:00, or until 7 clients reconnect
Link to test
replay-dual test 26: dbench and tar with mds failover
LustreError: 53360:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed:
LustreError: 53360:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG
CPU: 1 PID: 53360 Comm: ll_mgs_0000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs]
? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc]
mgs_get_ir_logs+0x4f4/0xa00 [mgs]
? __req_capsule_get+0x143/0x3b0 [ptlrpc]
mgs_config_read+0x1c6/0x1e0 [mgs]
tgt_handle_request0+0x147/0x770 [ptlrpc]
tgt_request_handle+0x3fd/0xd00 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
? srso_alias_return_thunk+0x5/0xfbef5
ptlrpc_main+0x9bf/0xea0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xdd/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
Lustre: DEBUG MARKER: hostname -I
Lustre: 5306:0:(client.c:2472:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1774425600/real 1774425600] req@ff1a5db243f89a00 x1860617578804736/t0(0) o103->MGC10.240.47.145@tcp@10.240.47.145@tcp:17/18 lens 328/224 e 0 to 1 dl 1774425616 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'ldlm_bl.0' uid:0 gid:0 projid:4294967295
Lustre: 5335:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.47.145@tcp: IR log lustre-mdtir failed, not fatal: rc = -5
LustreError: 51041:0:(ldlm_resource.c:1167:ldlm_resource_complain()) MGC10.240.47.145@tcp: namespace resource [0x65727473756c:0x0:0x0].0x0 (ff1a5db25614cd80) refcount nonzero (1) after lock cleanup; forcing cleanup.
LustreError: 51041:0:(ldlm_resource.c:1167:ldlm_resource_complain()) Skipped 1 previous similar message
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm97.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: trevis-152vm97.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times
Lustre: DEBUG MARKER: test_26 fail mds1 1 times
LustreError: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.47.145@tcp failed: rc = -107
LustreError: Skipped 17 previous similar messages
Lustre: 5335:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.47.145@tcp: IR log lustre-mdtir failed, not fatal: rc = -5
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm99.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: trevis-152vm99.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-152vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm96.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: trevis-152vm96.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 flakey 252:0 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 2 times
Lustre: DEBUG MARKER: test_26 fail mds2 2 times
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2
Lustre: Failing over lustre-MDT0001
LustreError: 52430:0:(ldlm_resource.c:1167:ldlm_resource_complain()) lustre-MDT0000-osp-MDT0001: namespace resource [0x200011d41:0x38:0x0].0x0 (ff1a5db269c29240) refcount nonzero (1) after lock cleanup; forcing cleanup.
Lustre: lustre-MDT0001: Not available for connect from 10.240.47.142@tcp (stopping)
Lustre: Skipped 18 previous similar messages
LDISKFS-fs (dm-3): unmounting filesystem 541cd59f-c6c5-4d67-9a3e-c78820cde152.
Lustre: server umount lustre-MDT0001 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 linear 252:0 0"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
LDISKFS-fs (dm-3): 12 truncates cleaned up
LDISKFS-fs (dm-3): recovery complete
LDISKFS-fs (dm-3): mounted filesystem 541cd59f-c6c5-4d67-9a3e-c78820cde152 r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm100.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm100.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: trevis-152vm100.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: trevis-152vm100.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null
Lustre: lustre-MDT0001: Recovery over after 0:05, of 7 clients 7 recovered and 0 were evicted.
Lustre: 5338:0:(mdt_recovery.c:102:mdt_req_from_lrd()) @@@ restoring transno req@ff1a5db244075040 x1860617287465984/t30064775061(0) o36->c825b920-6586-4065-9c52-62f281e71c4e@10.240.47.142@tcp:575/0 lens 488/3152 e 0 to 0 dl 1774425735 ref 1 fl Interpret:/202/0 rc 0/0 job:'tar.0' uid:0 gid:0 projid:4294967295
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-152vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm96.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: trevis-152vm96.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds3 3 times
Lustre: DEBUG MARKER: test_26 fail mds3 3 times
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm99.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: trevis-152vm99.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-152vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm96.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: trevis-152vm96.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds4 4 times
Lustre: DEBUG MARKER: test_26 fail mds4 4 times
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4
Lustre: Failing over lustre-MDT0003
Lustre: lustre-MDT0003: Not available for connect from 10.240.47.142@tcp (stopping)
Lustre: Skipped 19 previous similar messages
LDISKFS-fs (dm-4): unmounting filesystem 6333fee0-e171-4a98-a5bb-7f435750d7cf.
Lustre: server umount lustre-MDT0003 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
LustreError: 5343:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0003: not available for connect from 10.240.47.145@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 5343:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 51 previous similar messages
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 linear 252:1 0"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4
LDISKFS-fs (dm-4): 21 truncates cleaned up
LDISKFS-fs (dm-4): recovery complete
LDISKFS-fs (dm-4): mounted filesystem 6333fee0-e171-4a98-a5bb-7f435750d7cf r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Link to test
insanity test 7: Seventh Failure Mode: CLIENT/MDS Wed Mar 25 07:41:11 AM UTC 2026
LustreError: 36686:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed:
LustreError: 36686:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG
CPU: 0 PID: 36686 Comm: ll_mgs_0001 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs]
? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc]
mgs_get_ir_logs+0x4f4/0xa00 [mgs]
? __req_capsule_get+0x143/0x3b0 [ptlrpc]
mgs_config_read+0x1c6/0x1e0 [mgs]
tgt_handle_request0+0x147/0x770 [ptlrpc]
tgt_request_handle+0x3fd/0xd00 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
? srso_alias_return_thunk+0x5/0xfbef5
ptlrpc_main+0x9bf/0xea0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xdd/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Request fail clients: 2, to fail: 1, failed: 0
Lustre: DEBUG MARKER: Request fail clients: 2, to fail: 1, failed: 0
Lustre: DEBUG MARKER: /usr/sbin/lctl mark shutdown_client: trevis-156vm98 uptime: 07:41:14 up 27 min, 0 users, load average: 0.09, 0.05, 0.01
Lustre: DEBUG MARKER: shutdown_client: trevis-156vm98 uptime: 07:41:14 up 27 min, 0 users, load average: 0.09, 0.05, 0.01
Lustre: lustre-MDT0003: haven't heard from client c1c0ea49-a747-445c-b7d1-94d151f7f8d2 (at 10.240.46.200@tcp) in 101 seconds. I think it's dead, and I am evicting it. exp ff3c941817a63400, cur 1774424576 deadline 1774424575 last 1774424475
Lustre: Skipped 1 previous similar message
LustreError: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.46.202@tcp failed: rc = -107
Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.46.202@tcp) was lost; in progress operations using this service will wait for recovery to complete
LustreError: Skipped 16 previous similar messages
Lustre: Skipped 16 previous similar messages
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
LustreError: MGC10.240.46.202@tcp: Connection to MGS (at 10.240.46.202@tcp) was lost; in progress operations using this service will fail
Lustre: Evicted from MGS (at 10.240.46.202@tcp) after server handle changed from 0x8aec7cde9c7bb0c4 to 0x8aec7cde9c7bbe7e
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2
Lustre: Failing over lustre-MDT0001
LDISKFS-fs (dm-3): unmounting filesystem 56cf54a1-ca28-46ef-9432-1fa315f27ed8.
Lustre: server umount lustre-MDT0001 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
LustreError: 31940:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 31940:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 74 previous similar messages
Autotest: Test running for 30 minutes (lustre-reviews_review-dne-part-4_122892.46)
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
LDISKFS-fs (dm-3): mounted filesystem 56cf54a1-ca28-46ef-9432-1fa315f27ed8 r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180
Lustre: Skipped 1 previous similar message
Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null
Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 4 clients reconnect
Lustre: Skipped 1 previous similar message
Lustre: lustre-MDT0001: Recovery over after 0:01, of 4 clients 4 recovered and 0 were evicted.
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4
Lustre: Failing over lustre-MDT0003
LDISKFS-fs (dm-4): unmounting filesystem 961f933e-1dba-4213-887e-3725b11f823d.
Lustre: server umount lustre-MDT0003 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4
LDISKFS-fs (dm-4): mounted filesystem 961f933e-1dba-4213-887e-3725b11f823d r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Link to test
sanity-hsm test 302: HSM tunnable are persistent when CDT is off
LustreError: 170570:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed:
LustreError: 170570:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG
CPU: 1 PID: 170570 Comm: ll_mgs_0002 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs]
? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc]
mgs_get_ir_logs+0x4f4/0xa00 [mgs]
? __req_capsule_get+0x143/0x3b0 [ptlrpc]
mgs_config_read+0x1c6/0x1e0 [mgs]
tgt_handle_request0+0x147/0x770 [ptlrpc]
tgt_request_handle+0x3fd/0xd00 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
? srso_alias_return_thunk+0x5/0xfbef5
ptlrpc_main+0x9bf/0xea0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xdd/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdt.lustre-MDT0001.hsm_control='shutdown'
Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdt.lustre-MDT0003.hsm_control='shutdown'
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0001.hsm_control
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0003.hsm_control
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
LustreError: MGC10.240.46.202@tcp: Connection to MGS (at 10.240.46.202@tcp) was lost; in progress operations using this service will fail
Lustre: lustre-MDT0000-lwp-MDT0001: Connection restored to 10.240.46.202@tcp (at 10.240.46.202@tcp)
Lustre: Skipped 9 previous similar messages
Lustre: Evicted from MGS (at 10.240.46.202@tcp) after server handle changed from 0x2ad993970e1a1e60 to 0x2ad993970e1a253d
Lustre: DEBUG MARKER: trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2
Lustre: Failing over lustre-MDT0001
LDISKFS-fs (dm-3): unmounting filesystem ce689057-7d87-4efa-968f-b47826aa6305.
Lustre: server umount lustre-MDT0001 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
LustreError: 14506:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 14507:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 14507:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 3 previous similar messages
LustreError: 14506:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 3 previous similar messages
LustreError: 105618:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.201@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 105618:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 5 previous similar messages
LustreError: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107
LustreError: Skipped 1 previous similar message
Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 5 previous similar messages
LustreError: 105618:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.200@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 105618:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 3 previous similar messages
LustreError: 16310:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 16310:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 2 previous similar messages
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1
LustreError: 14507:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 14507:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 14 previous similar messages
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
LDISKFS-fs (dm-3): mounted filesystem ce689057-7d87-4efa-968f-b47826aa6305 r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
debugfs: Directory '10.240.46.200@tcp' with parent 'exports' already present!
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null
Autotest: Test running for 85 minutes (lustre-reviews_review-dne-part-4_122892.46)
debugfs: Directory '10.240.46.202@tcp' with parent 'exports' already present!
debugfs: File 'stats' in directory '/' already present!
debugfs: File 'ldlm_stats' in directory '/' already present!
debugfs: File 'open_files' in directory '/' already present!
debugfs: Directory '10.240.46.201@tcp' with parent 'exports' already present!
debugfs: File 'stats' in directory '/' already present!
debugfs: File 'ldlm_stats' in directory '/' already present!
debugfs: File 'open_files' in directory '/' already present!
Lustre: lustre-MDT0001: Recovery over after 0:05, of 7 clients 7 recovered and 0 were evicted.
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
LustreError: lustre-MDT0002-osp-MDT0001: operation mds_statfs to node 10.240.46.202@tcp failed: rc = -107
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4
Lustre: Failing over lustre-MDT0003
LDISKFS-fs (dm-4): unmounting filesystem 931af2f6-deea-43bb-9243-f8e3f10102c7.
Lustre: server umount lustre-MDT0003 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
LustreError: 16310:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0003: not available for connect from 10.240.46.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 16310:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 9 previous similar messages
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4
LDISKFS-fs (dm-4): mounted filesystem 931af2f6-deea-43bb-9243-f8e3f10102c7 r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Link to test
sanity test 804: verify agent entry for remote entry
LustreError: 466238:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed:
LustreError: 466238:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG
CPU: 0 PID: 466238 Comm: ll_mgs_0002 Kdump: loaded Tainted: G OE ------- --- 5.14.0-570.62.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs]
? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc]
mgs_get_ir_logs+0x4f4/0xa00 [mgs]
? __req_capsule_get+0x143/0x3b0 [ptlrpc]
mgs_config_read+0x1c6/0x1e0 [mgs]
tgt_handle_request0+0x147/0x770 [ptlrpc]
tgt_request_handle+0x3fd/0xd00 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
? srso_alias_return_thunk+0x5/0xfbef5
ptlrpc_main+0x9bf/0xea0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xdd/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds4_flakey
LustreError: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.44.97@tcp failed: rc = -107
LustreError: MGC10.240.44.97@tcp: Connection to MGS (at 10.240.44.97@tcp) was lost; in progress operations using this service will fail
Lustre: Evicted from MGS (at 10.240.44.97@tcp) after server handle changed from 0x9d26a2c865163d56 to 0x9d26a2c86530bb53
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-153vm103.trevis.whamcloud.com: executing set_default_debug all all
Lustre: DEBUG MARKER: trevis-153vm103.trevis.whamcloud.com: executing set_default_debug all all
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2
Lustre: Failing over lustre-MDT0001
LustreError: 5635:0:(import.c:706:ptlrpc_connect_import_locked()) can't connect to a closed import
LDISKFS-fs (dm-3): unmounting filesystem 4905abfe-0623-4437-ab65-2a2921dfa2dc.
Lustre: server umount lustre-MDT0001 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: e2fsck -h
LustreError: 298584:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.44.96@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 298584:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 14 previous similar messages
Lustre: DEBUG MARKER: e2fsck -d -v -t -t -f -n /dev/mapper/mds2_flakey -m8
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1
Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
LDISKFS-fs (dm-3): mounted filesystem 4905abfe-0623-4437-ab65-2a2921dfa2dc r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-153vm104.trevis.whamcloud.com: executing set_default_debug all all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-153vm104.trevis.whamcloud.com: executing set_default_debug all all
Lustre: DEBUG MARKER: trevis-153vm104.trevis.whamcloud.com: executing set_default_debug all all
Lustre: DEBUG MARKER: trevis-153vm104.trevis.whamcloud.com: executing set_default_debug all all
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null
Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 5 clients reconnect
Lustre: lustre-MDT0001: Received new MDS connection from 10.240.44.97@tcp, keep former export from same NID
Lustre: lustre-MDT0001: Received new MDS connection from 10.240.44.97@tcp, keep former export from same NID
Lustre: lustre-MDT0001: Client lustre-MDT0002-mdtlov_UUID (at 10.240.44.97@tcp) refused connection, still busy with 7 references
Lustre: lustre-MDT0001: Received new MDS connection from 10.240.44.97@tcp, keep former export from same NID
Lustre: lustre-MDT0001: Client lustre-MDT0002-mdtlov_UUID (at 10.240.44.97@tcp) refused connection, still busy with 7 references
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-153vm103.trevis.whamcloud.com: executing set_default_debug all all
Lustre: DEBUG MARKER: trevis-153vm103.trevis.whamcloud.com: executing set_default_debug all all
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4
Lustre: Failing over lustre-MDT0003
LustreError: 5635:0:(import.c:706:ptlrpc_connect_import_locked()) can't connect to a closed import
LDISKFS-fs (dm-4): unmounting filesystem ca84123f-0ba0-4ed7-a94c-06a09f5e229c.
Lustre: server umount lustre-MDT0003 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: e2fsck -h
Lustre: DEBUG MARKER: e2fsck -d -v -t -t -f -n /dev/mapper/mds4_flakey -m8
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1
Lustre: lustre-MDT0003-osp-MDT0001: Connection to lustre-MDT0003 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 7 previous similar messages
Lustre: lustre-MDT0001: Received new MDS connection from 10.240.44.97@tcp, keep former export from same NID
Lustre: lustre-MDT0001: Client lustre-MDT0002-mdtlov_UUID (at 10.240.44.97@tcp) refused connection, still busy with 7 references
Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4
LDISKFS-fs (dm-4): mounted filesystem ca84123f-0ba0-4ed7-a94c-06a09f5e229c r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0001: Recovery over after 0:12, of 5 clients 5 recovered and 0 were evicted.
Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: lustre-MDT0003: Will be in recovery for at least 1:00, or until 5 clients reconnect
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-153vm104.trevis.whamcloud.com: executing set_default_debug all all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-153vm104.trevis.whamcloud.com: executing set_default_debug all all
Link to test
replay-single test 112d: DNE: cross MDT rename, fail MDT4
LustreError: 188667:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed:
LustreError: 188667:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG
CPU: 1 PID: 188667 Comm: ll_mgs_0000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs]
? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc]
mgs_get_ir_logs+0x4f4/0xa00 [mgs]
? __req_capsule_get+0x143/0x3b0 [ptlrpc]
mgs_config_read+0x1c6/0x1e0 [mgs]
tgt_handle_request0+0x147/0x770 [ptlrpc]
tgt_request_handle+0x3fd/0xd00 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
? srso_alias_return_thunk+0x5/0xfbef5
ptlrpc_main+0x9bf/0xea0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xdd/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4
Lustre: Failing over lustre-MDT0003
LDISKFS-fs (dm-4): unmounting filesystem 57190e03-b795-45ec-8ae8-0bc75a4cf658.
Lustre: server umount lustre-MDT0003 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 linear 252:1 0"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4
LDISKFS-fs (dm-4): 12 truncates cleaned up
LDISKFS-fs (dm-4): recovery complete
LDISKFS-fs (dm-4): mounted filesystem 57190e03-b795-45ec-8ae8-0bc75a4cf658 r/w with ordered data mode. Quota mode: journalled.
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Link to test
replay-single test 70d: mkdir/rmdir striped dir 4mdts recovery
LustreError: 81456:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed:
LustreError: 81456:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG
CPU: 0 PID: 81456 Comm: ll_mgs_0000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-570.62.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs]
? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc]
mgs_get_ir_logs+0x4f4/0xa00 [mgs]
? __req_capsule_get+0x143/0x3b0 [ptlrpc]
mgs_config_read+0x1c6/0x1e0 [mgs]
tgt_handle_request0+0x147/0x770 [ptlrpc]
tgt_request_handle+0x3fd/0xd00 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
? srso_alias_return_thunk+0x5/0xfbef5
ptlrpc_main+0x9bf/0xea0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xdd/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
Lustre: DEBUG MARKER: hostname -I
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-155vm105.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: trevis-155vm105.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 flakey 252:0 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_70d fail mds2 1 times
Lustre: DEBUG MARKER: test_70d fail mds2 1 times
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2
Lustre: Failing over lustre-MDT0001
Lustre: lustre-MDT0001: Not available for connect from 10.240.46.6@tcp (stopping)
Lustre: Skipped 2 previous similar messages
LustreError: 80424:0:(ldlm_resource.c:1167:ldlm_resource_complain()) lustre-MDT0002-osp-MDT0001: namespace resource [0x280000fbc:0x966:0x0].0x0 (ff2b95df028f5f00) refcount nonzero (1) after lock cleanup; forcing cleanup.
LustreError: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107
LustreError: Skipped 6 previous similar messages
Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 11 previous similar messages
LustreError: 15995:0:(ldlm_lockd.c:2560:ldlm_cancel_handler()) ldlm_cancel from 10.240.46.9@tcp arrived at 1774427240 with bad export cookie 3003157324093779003
Autotest: Test running for 70 minutes (lustre-reviews_review-dne-part-6_122892.56)
LDISKFS-fs (dm-3): unmounting filesystem 2c9bb5f6-8089-4839-a321-516fc16f5c19.
Lustre: server umount lustre-MDT0001 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
LustreError: 5669:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.6@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 5669:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 35 previous similar messages
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 linear 252:0 0"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
LDISKFS-fs (dm-3): 12 truncates cleaned up
LDISKFS-fs (dm-3): recovery complete
LDISKFS-fs (dm-3): mounted filesystem 2c9bb5f6-8089-4839-a321-516fc16f5c19 r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 5 clients reconnect
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-155vm108.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-155vm108.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: trevis-155vm108.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: trevis-155vm108.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null
Lustre: lustre-MDT0001: Recovery over after 0:07, of 5 clients 5 recovered and 0 were evicted.
Lustre: lustre-MDT0001-osp-MDT0003: Connection restored to 0@lo (at 0@lo)
Lustre: Skipped 14 previous similar messages
Lustre: 75556:0:(mdt_recovery.c:102:mdt_req_from_lrd()) @@@ restoring transno req@ff2b95df1c5b5a00 x1860617651767808/t12885020237(0) o36->ea54ed17-e5df-4930-83f1-a1cef264e6b1@10.240.46.6@tcp:647/0 lens 496/2888 e 0 to 0 dl 1774427317 ref 1 fl Interpret:/202/0 rc 0/0 job:'rm.0' uid:0 gid:0 projid:4294967295
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-155vm105.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-155vm105.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_70d fail mds4 2 times
Lustre: DEBUG MARKER: test_70d fail mds4 2 times
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4
Lustre: Failing over lustre-MDT0003
Lustre: lustre-MDT0003: Not available for connect from 10.240.46.6@tcp (stopping)
Lustre: Skipped 17 previous similar messages
LDISKFS-fs (dm-4): unmounting filesystem 249c35b5-9887-4289-92b0-3c440fe3bf8f.
Lustre: server umount lustre-MDT0003 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
LustreError: 78652:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0003: not available for connect from 10.240.46.8@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 78652:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 37 previous similar messages
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 linear 252:1 0"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4
LDISKFS-fs (dm-4): 12 truncates cleaned up
LDISKFS-fs (dm-4): recovery complete
LDISKFS-fs (dm-4): mounted filesystem 249c35b5-9887-4289-92b0-3c440fe3bf8f r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: lustre-MDT0003: Will be in recovery for at least 1:00, or until 5 clients reconnect
Link to test
replay-dual test 26: dbench and tar with mds failover
LustreError: 68871:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed:
LustreError: 68871:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG
CPU: 0 PID: 68871 Comm: ll_mgs_0000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-570.62.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs]
? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc]
mgs_get_ir_logs+0x4f4/0xa00 [mgs]
? __req_capsule_get+0x143/0x3b0 [ptlrpc]
mgs_config_read+0x1c6/0x1e0 [mgs]
tgt_handle_request0+0x147/0x770 [ptlrpc]
tgt_request_handle+0x3fd/0xd00 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
? srso_alias_return_thunk+0x5/0xfbef5
ptlrpc_main+0x9bf/0xea0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xdd/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
Lustre: DEBUG MARKER: hostname -I
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm166.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: onyx-156vm166.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times
Lustre: DEBUG MARKER: test_26 fail mds1 1 times
Lustre: 5658:0:(client.c:2472:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1774427817/real 1774427817] req@ff17b6b06a3230c0 x1860617434105984/t0(0) o103->MGC10.240.24.16@tcp@10.240.24.16@tcp:17/18 lens 328/224 e 0 to 1 dl 1774427833 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'ldlm_bl.0' uid:0 gid:0 projid:4294967295
Lustre: 5687:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.24.16@tcp: IR log lustre-mdtir failed, not fatal: rc = -5
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm168.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: onyx-156vm168.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: 5687:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.24.16@tcp: IR log lustre-mdtir failed, not fatal: rc = -5
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm166.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-156vm166.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm165.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: onyx-156vm165.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 flakey 252:0 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 2 times
Lustre: DEBUG MARKER: test_26 fail mds2 2 times
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2
Lustre: Failing over lustre-MDT0001
Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
Lustre: Skipped 14 previous similar messages
LDISKFS-fs (dm-3): unmounting filesystem 4510c1d6-0961-4c6f-9f82-601c6611a776.
Lustre: server umount lustre-MDT0001 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
LustreError: 8485:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.23.243@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 8485:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 31 previous similar messages
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 linear 252:0 0"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
LDISKFS-fs (dm-3): 12 truncates cleaned up
LDISKFS-fs (dm-3): recovery complete
LDISKFS-fs (dm-3): mounted filesystem 4510c1d6-0961-4c6f-9f82-601c6611a776 r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect
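The restore half of the failover, shown above, reverses the barrier: a plain `linear` table is loaded over the same backing device so writes flow again, and the target is remounted to run recovery. A sketch of the linear table line, using the same sector count and device the log shows (helper name illustrative):

```shell
#!/bin/sh
# Build the dm-linear table used to undo the flakey barrier, matching
# the "dmsetup load ... linear ..." line in the log above.
# $1: device length in sectors, $2: backing device (major:minor)
make_linear_table() {
    # linear target fields: <dev> <start offset>
    echo "0 $1 linear $2 0"
}

make_linear_table 4071424 252:0
```

After `dmsetup resume`, the journal replay seen in the log (`recovery complete`, `truncates cleaned up`) discards whatever the flakey target dropped, simulating an MDS crash at the barrier point.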
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm169.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm169.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: onyx-156vm169.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: onyx-156vm169.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null
Lustre: lustre-MDT0001: Recovery over after 0:05, of 7 clients 7 recovered and 0 were evicted.
Lustre: lustre-MDT0001-osp-MDT0003: Connection restored to 0@lo (at 0@lo)
Lustre: Skipped 68 previous similar messages
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm166.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-156vm166.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm165.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: onyx-156vm165.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds3 3 times
Lustre: DEBUG MARKER: test_26 fail mds3 3 times
LustreError: lustre-MDT0002-osp-MDT0001: operation mds_statfs to node 10.240.24.16@tcp failed: rc = -107
LustreError: Skipped 19 previous similar messages
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm168.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: onyx-156vm168.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm166.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: onyx-156vm166.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm165.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: onyx-156vm165.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds4 4 times
Lustre: DEBUG MARKER: test_26 fail mds4 4 times
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4
Lustre: Failing over lustre-MDT0003
LDISKFS-fs (dm-4): unmounting filesystem 619002bf-cc7e-4d1a-94c4-e05ca1825a5c.
Lustre: server umount lustre-MDT0003 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 linear 252:1 0"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4
LDISKFS-fs (dm-4): 21 truncates cleaned up
LDISKFS-fs (dm-4): recovery complete
LDISKFS-fs (dm-4): mounted filesystem 619002bf-cc7e-4d1a-94c4-e05ca1825a5c r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: lustre-MDT0003: Will be in recovery for at least 1:00, or until 7 clients reconnect
Link to test
replay-dual test 26: dbench and tar with mds failover
LustreError: 64194:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed:
LustreError: 64194:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG
CPU: 1 PID: 64194 Comm: ll_mgs_0000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs]
? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc]
mgs_get_ir_logs+0x4f4/0xa00 [mgs]
? __req_capsule_get+0x143/0x3b0 [ptlrpc]
mgs_config_read+0x1c6/0x1e0 [mgs]
tgt_handle_request0+0x147/0x770 [ptlrpc]
tgt_request_handle+0x3fd/0xd00 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc]
? srso_alias_return_thunk+0x5/0xfbef5
ptlrpc_main+0x9bf/0xea0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xdd/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
Lustre: DEBUG MARKER: hostname -I
Lustre: 5305:0:(client.c:2472:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1774427640/real 1774427640] req@ff305529e9263a80 x1860617328025088/t0(0) o103->MGC10.240.41.242@tcp@10.240.41.242@tcp:17/18 lens 328/224 e 0 to 1 dl 1774427656 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'ldlm_bl.0' uid:0 gid:0 projid:4294967295
Lustre: 5335:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.41.242@tcp: IR log lustre-mdtir failed, not fatal: rc = -5
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm101.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: trevis-154vm101.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times
Lustre: DEBUG MARKER: test_26 fail mds1 1 times
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm103.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: trevis-154vm103.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: 5305:0:(client.c:2472:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1774427667/real 1774427667] req@ff305529e74349c0 x1860617328616192/t0(0) o400->MGC10.240.41.242@tcp@10.240.41.242@tcp:26/25 lens 224/224 e 0 to 1 dl 1774427683 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 projid:4294967295
Lustre: 5305:0:(client.c:2472:ptlrpc_expire_one_request()) Skipped 1 previous similar message
LustreError: 62086:0:(ldlm_resource.c:1167:ldlm_resource_complain()) MGC10.240.41.242@tcp: namespace resource [0x65727473756c:0x2:0x0].0x0 (ff305529d7edb540) refcount nonzero (1) after lock cleanup; forcing cleanup.
LustreError: 62086:0:(ldlm_resource.c:1167:ldlm_resource_complain()) Skipped 1 previous similar message
Lustre: 5335:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.41.242@tcp: IR log lustre-mdtir failed, not fatal: rc = -5
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm101.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-154vm101.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm100.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: trevis-154vm100.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 flakey 252:0 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 2 times
Lustre: DEBUG MARKER: test_26 fail mds2 2 times
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2
Lustre: Failing over lustre-MDT0001
Lustre: lustre-MDT0001: Not available for connect from 10.240.41.239@tcp (stopping)
Lustre: Skipped 16 previous similar messages
LDISKFS-fs (dm-3): unmounting filesystem 7305e73d-5281-45c6-87f2-dcff4964fe9d.
Lustre: server umount lustre-MDT0001 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Autotest: Test running for 80 minutes (lustre-reviews_review-dne-part-8_122892.50)
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 linear 252:0 0"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
LDISKFS-fs (dm-3): 12 truncates cleaned up
LDISKFS-fs (dm-3): recovery complete
LDISKFS-fs (dm-3): mounted filesystem 7305e73d-5281-45c6-87f2-dcff4964fe9d r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm104.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm104.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: trevis-154vm104.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: trevis-154vm104.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null
Lustre: lustre-MDT0001: Recovery over after 0:04, of 7 clients 7 recovered and 0 were evicted.
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm101.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-154vm101.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm100.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: trevis-154vm100.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds3 3 times
Lustre: DEBUG MARKER: test_26 fail mds3 3 times
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm103.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: trevis-154vm103.trevis.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm101.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: trevis-154vm101.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm100.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: trevis-154vm100.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds4 4 times
Lustre: DEBUG MARKER: test_26 fail mds4 4 times
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4
Lustre: Failing over lustre-MDT0003
Lustre: lustre-MDT0003: Not available for connect from 10.240.41.239@tcp (stopping)
Lustre: Skipped 1 previous similar message
LDISKFS-fs (dm-4): unmounting filesystem 8b9f3749-5b8b-46aa-9107-16227669ef3d.
Lustre: server umount lustre-MDT0003 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
LustreError: 5342:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0003: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 5342:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 89 previous similar messages
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 linear 252:1 0"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4
LDISKFS-fs (dm-4): 21 truncates cleaned up
LDISKFS-fs (dm-4): recovery complete
LDISKFS-fs (dm-4): mounted filesystem 8b9f3749-5b8b-46aa-9107-16227669ef3d r/w with ordered data mode. Quota mode: journalled.
Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180
Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect
Lustre: lustre-MDT0003: Will be in recovery for at least 1:00, or until 7 clients reconnect
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Link to test