| Failing Test | Full Crash | Messages before crash | Comment |
|---|---|---|---|
| replay-dual test 26: dbench and tar with mds failover | LustreError: 57138:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed: LustreError: 57138:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG CPU: 0 PID: 57138 Comm: ll_mgs_0000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-570.62.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 lbug_with_loc.cold+0x5/0x43 [libcfs] mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs] ? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc] mgs_get_ir_logs+0x4f4/0xa00 [mgs] ? __req_capsule_get+0x143/0x3b0 [ptlrpc] mgs_config_read+0x1c6/0x1e0 [mgs] tgt_handle_request0+0x147/0x770 [ptlrpc] tgt_request_handle+0x3fd/0xd00 [ptlrpc] ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfbef5 ptlrpc_main+0x9bf/0xea0 [ptlrpc] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> | Lustre: DEBUG MARKER: hostname -I Lustre: 5659:0:(client.c:2472:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1774425810/real 1774425810] req@ff2fed3e83d223c0 x1860617644960128/t0(0) o103->MGC10.240.28.143@tcp@10.240.28.143@tcp:17/18 lens 328/224 e 0 to 1 dl 1774425826 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'ldlm_bl.0' uid:0 gid:0 projid:4294967295 LustreError: 54653:0:(ldlm_resource.c:1167:ldlm_resource_complain()) MGC10.240.28.143@tcp: namespace resource [0x65727473756c:0x2:0x0].0x0 (ff2fed3e45377cc0) refcount nonzero (1) after lock cleanup; forcing cleanup. LustreError: 54653:0:(ldlm_resource.c:1167:ldlm_resource_complain()) Skipped 1 previous similar message Lustre: 5688:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.28.143@tcp: IR log lustre-mdtir failed, not fatal: rc = -5 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm163.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-157vm163.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times Lustre: DEBUG MARKER: test_26 fail mds1 1 times Lustre: 5688:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.28.143@tcp: IR log lustre-mdtir failed, not fatal: rc = -5 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm165.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-157vm165.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: onyx-157vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm162.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: onyx-157vm162.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 flakey 252:0 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 2 times Lustre: DEBUG MARKER: test_26 fail mds2 2 times Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 Lustre: lustre-MDT0001: Not available for connect from 10.240.28.140@tcp (stopping) Lustre: Skipped 24 previous similar messages LDISKFS-fs (dm-3): unmounting filesystem 38fd68a5-cde2-41dd-9dcf-f098238957fe. Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 linear 252:0 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): 12 truncates cleaned up LDISKFS-fs (dm-3): recovery complete LDISKFS-fs (dm-3): mounted filesystem 38fd68a5-cde2-41dd-9dcf-f098238957fe r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm166.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm166.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-157vm166.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-157vm166.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null Lustre: lustre-MDT0001: Recovery over after 0:05, of 7 clients 7 recovered and 0 were evicted. Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: onyx-157vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm162.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: onyx-157vm162.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds3 3 times Lustre: DEBUG MARKER: test_26 fail mds3 3 times LustreError: lustre-MDT0002-osp-MDT0003: operation ldlm_enqueue to node 10.240.28.143@tcp failed: rc = -19 LustreError: Skipped 26 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm165.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-157vm165.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: onyx-157vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-157vm162.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: onyx-157vm162.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds4 4 times Lustre: DEBUG MARKER: test_26 fail mds4 4 times Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 Lustre: lustre-MDT0003: Not available for connect from 10.240.28.140@tcp (stopping) Autotest: Test running for 50 minutes (lustre-reviews_review-dne-part-2_122892.52) LDISKFS-fs (dm-4): unmounting filesystem 7cfa4b01-d375-4c46-aa03-96010de6b3f6. Lustre: server umount lustre-MDT0003 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && LustreError: 5695:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0003: not available for connect from 10.240.28.143@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 5695:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 88 previous similar messages Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 linear 252:1 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4 LDISKFS-fs (dm-4): 21 truncates cleaned up LDISKFS-fs (dm-4): recovery complete LDISKFS-fs (dm-4): mounted filesystem 7cfa4b01-d375-4c46-aa03-96010de6b3f6 r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u | Link to test |
| insanity test 7: Seventh Failure Mode: CLIENT/MDS Wed Mar 25 07:43:26 AM UTC 2026 | LustreError: 39156:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed: LustreError: 39156:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG CPU: 1 PID: 39156 Comm: ll_mgs_0002 Kdump: loaded Tainted: G OE ------- --- 5.14.0-570.62.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 lbug_with_loc.cold+0x5/0x43 [libcfs] mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs] ? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc] mgs_get_ir_logs+0x4f4/0xa00 [mgs] ? __req_capsule_get+0x143/0x3b0 [ptlrpc] mgs_config_read+0x1c6/0x1e0 [mgs] tgt_handle_request0+0x147/0x770 [ptlrpc] tgt_request_handle+0x3fd/0xd00 [ptlrpc] ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfbef5 ptlrpc_main+0x9bf/0xea0 [ptlrpc] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> | Lustre: DEBUG MARKER: /usr/sbin/lctl mark Request fail clients: 2, to fail: 1, failed: 0 Lustre: DEBUG MARKER: Request fail clients: 2, to fail: 1, failed: 0 Lustre: DEBUG MARKER: /usr/sbin/lctl mark shutdown_client: onyx-155vm164 uptime: 07:43:29 up 28 min, 0 users, load average: 0.06, 0.06, 0.03 Lustre: DEBUG MARKER: shutdown_client: onyx-155vm164 uptime: 07:43:29 up 28 min, 0 users, load average: 0.06, 0.06, 0.03 Lustre: lustre-MDT0003: haven't heard from client 6a20e600-9f84-4ad3-9b3e-f31516c4e3ee (at 10.240.31.168@tcp) in 101 seconds. I think it's dead, and I am evicting it. exp ff3027c3a50a4800, cur 1774424709 deadline 1774424708 last 1774424608 Lustre: Skipped 1 previous similar message Autotest: Test running for 30 minutes (lustre-reviews_review-dne-part-4_122892.54) Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.31.170@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 17 previous similar messages LustreError: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.240.31.170@tcp failed: rc = -107 LustreError: Skipped 10 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all LustreError: MGC10.240.31.170@tcp: Connection to MGS (at 10.240.31.170@tcp) was lost; in progress operations using this service will fail Lustre: Evicted from MGS (at 10.240.31.170@tcp) after server handle changed from 0x3c867190b29de547 to 0x3c867190b29df1db Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 LDISKFS-fs (dm-3): unmounting filesystem e57ebb81-5709-48ea-9f86-2953df0fbfb9. Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; LustreError: 34098:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.31.170@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 34098:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 83 previous similar messages Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): mounted filesystem e57ebb81-5709-48ea-9f86-2953df0fbfb9 r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180 Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect Lustre: Skipped 1 previous similar message Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 4 clients reconnect Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0001: Recovery over after 0:02, of 4 clients 4 recovered and 0 were evicted. Lustre: Skipped 1 previous similar message Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 LDISKFS-fs (dm-4): unmounting filesystem d8712dfe-f7e9-48a0-babf-c9fef9d341f2. Lustre: server umount lustre-MDT0003 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4 LDISKFS-fs (dm-4): mounted filesystem d8712dfe-f7e9-48a0-babf-c9fef9d341f2 r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u | Link to test |
| sanity-hsm test 302: HSM tunnable are persistent when CDT is off | LustreError: 183239:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed: LustreError: 183239:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG CPU: 0 PID: 183239 Comm: ll_mgs_0001 Kdump: loaded Tainted: G OE ------- --- 5.14.0-570.62.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 lbug_with_loc.cold+0x5/0x43 [libcfs] mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs] ? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc] mgs_get_ir_logs+0x4f4/0xa00 [mgs] ? __req_capsule_get+0x143/0x3b0 [ptlrpc] mgs_config_read+0x1c6/0x1e0 [mgs] tgt_handle_request0+0x147/0x770 [ptlrpc] tgt_request_handle+0x3fd/0xd00 [ptlrpc] ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfbef5 ptlrpc_main+0x9bf/0xea0 [ptlrpc] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> | Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdt.lustre-MDT0001.hsm_control='shutdown' Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdt.lustre-MDT0003.hsm_control='shutdown' Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0001.hsm_control Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0003.hsm_control Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all LustreError: MGC10.240.31.170@tcp: Connection to MGS (at 10.240.31.170@tcp) was lost; in progress operations using this service will fail Lustre: Evicted from MGS (at 10.240.31.170@tcp) after server handle changed from 0xfe0d6ed644a8050f to 0xfe0d6ed644a80c0f Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 LDISKFS-fs (dm-3): unmounting filesystem 2d8baffc-0e0e-41b5-8a62-8ae0ced334cf. Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 7 previous similar messages LustreError: 17306:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. Lustre: DEBUG MARKER: modprobe dm-flakey; LustreError: 15516:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.31.168@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 15516:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 3 previous similar messages LustreError: 18198:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.31.168@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 18198:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 14 previous similar messages Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): mounted filesystem 2d8baffc-0e0e-41b5-8a62-8ae0ced334cf r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u debugfs: Directory '10.240.31.167@tcp' with parent 'exports' already present! debugfs: Directory '10.240.31.169@tcp' with parent 'exports' already present! debugfs: File 'stats' in directory '/' already present! debugfs: File 'ldlm_stats' in directory '/' already present! debugfs: File 'open_files' in directory '/' already present! Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: onyx-155vm167.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null Lustre: lustre-MDT0001-osp-MDT0003: Connection restored to 0@lo (at 0@lo) Lustre: Skipped 9 previous similar messages Lustre: lustre-MDT0001: Recovery over after 0:05, of 7 clients 7 recovered and 0 were evicted. Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec LustreError: lustre-MDT0002-osp-MDT0001: operation mds_statfs to node 10.240.31.170@tcp failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: onyx-155vm166.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: onyx-155vm163.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: onyx-155vm164.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 LDISKFS-fs (dm-4): unmounting filesystem 69b6b624-61a9-4f1c-953b-dfeaf95c3731. Lustre: server umount lustre-MDT0003 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; LustreError: 130351:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0003: not available for connect from 10.240.31.169@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 130351:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 14 previous similar messages Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4 LDISKFS-fs (dm-4): mounted filesystem 69b6b624-61a9-4f1c-953b-dfeaf95c3731 r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: lustre-MDT0003: Will be in recovery for at least 1:00, or until 7 clients reconnect | Link to test |
| replay-dual test 26: dbench and tar with mds failover | LustreError: 53360:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed: LustreError: 53360:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG CPU: 1 PID: 53360 Comm: ll_mgs_0000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 lbug_with_loc.cold+0x5/0x43 [libcfs] mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs] ? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc] mgs_get_ir_logs+0x4f4/0xa00 [mgs] ? __req_capsule_get+0x143/0x3b0 [ptlrpc] mgs_config_read+0x1c6/0x1e0 [mgs] tgt_handle_request0+0x147/0x770 [ptlrpc] tgt_request_handle+0x3fd/0xd00 [ptlrpc] ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfbef5 ptlrpc_main+0x9bf/0xea0 [ptlrpc] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> | Lustre: DEBUG MARKER: hostname -I Lustre: 5306:0:(client.c:2472:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1774425600/real 1774425600] req@ff1a5db243f89a00 x1860617578804736/t0(0) o103->MGC10.240.47.145@tcp@10.240.47.145@tcp:17/18 lens 328/224 e 0 to 1 dl 1774425616 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'ldlm_bl.0' uid:0 gid:0 projid:4294967295 Lustre: 5335:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.47.145@tcp: IR log lustre-mdtir failed, not fatal: rc = -5 LustreError: 51041:0:(ldlm_resource.c:1167:ldlm_resource_complain()) MGC10.240.47.145@tcp: namespace resource [0x65727473756c:0x0:0x0].0x0 (ff1a5db25614cd80) refcount nonzero (1) after lock cleanup; forcing cleanup. LustreError: 51041:0:(ldlm_resource.c:1167:ldlm_resource_complain()) Skipped 1 previous similar message Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm97.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-152vm97.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times Lustre: DEBUG MARKER: test_26 fail mds1 1 times LustreError: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.47.145@tcp failed: rc = -107 LustreError: Skipped 17 previous similar messages Lustre: 5335:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.47.145@tcp: IR log lustre-mdtir failed, not fatal: rc = -5 Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm99.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-152vm99.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-152vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm96.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: trevis-152vm96.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 flakey 252:0 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 2 times Lustre: DEBUG MARKER: test_26 fail mds2 2 times Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 LustreError: 52430:0:(ldlm_resource.c:1167:ldlm_resource_complain()) lustre-MDT0000-osp-MDT0001: namespace resource [0x200011d41:0x38:0x0].0x0 (ff1a5db269c29240) refcount nonzero (1) after lock cleanup; forcing cleanup. Lustre: lustre-MDT0001: Not available for connect from 10.240.47.142@tcp (stopping) Lustre: Skipped 18 previous similar messages LDISKFS-fs (dm-3): unmounting filesystem 541cd59f-c6c5-4d67-9a3e-c78820cde152. Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 linear 252:0 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): 12 truncates cleaned up LDISKFS-fs (dm-3): recovery complete LDISKFS-fs (dm-3): mounted filesystem 541cd59f-c6c5-4d67-9a3e-c78820cde152 r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm100.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm100.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-152vm100.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-152vm100.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null Lustre: lustre-MDT0001: Recovery over after 0:05, of 7 clients 7 recovered and 0 were evicted. Lustre: 5338:0:(mdt_recovery.c:102:mdt_req_from_lrd()) @@@ restoring transno req@ff1a5db244075040 x1860617287465984/t30064775061(0) o36->c825b920-6586-4065-9c52-62f281e71c4e@10.240.47.142@tcp:575/0 lens 488/3152 e 0 to 0 dl 1774425735 ref 1 fl Interpret:/202/0 rc 0/0 job:'tar.0' uid:0 gid:0 projid:4294967295 Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-152vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm96.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: trevis-152vm96.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds3 3 times Lustre: DEBUG MARKER: test_26 fail mds3 3 times Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm99.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-152vm99.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-152vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm96.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: trevis-152vm96.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds4 4 times Lustre: DEBUG MARKER: test_26 fail mds4 4 times Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 Lustre: lustre-MDT0003: Not available for connect from 10.240.47.142@tcp (stopping) Lustre: Skipped 19 previous similar messages LDISKFS-fs (dm-4): unmounting filesystem 6333fee0-e171-4a98-a5bb-7f435750d7cf. Lustre: server umount lustre-MDT0003 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && LustreError: 5343:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0003: not available for connect from 10.240.47.145@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 5343:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 51 previous similar messages Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 linear 252:1 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4 LDISKFS-fs (dm-4): 21 truncates cleaned up LDISKFS-fs (dm-4): recovery complete LDISKFS-fs (dm-4): mounted filesystem 6333fee0-e171-4a98-a5bb-7f435750d7cf r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u | Link to test |
| insanity test 7: Seventh Failure Mode: CLIENT/MDS Wed Mar 25 07:41:11 AM UTC 2026 | LustreError: 36686:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed: LustreError: 36686:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG CPU: 0 PID: 36686 Comm: ll_mgs_0001 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 lbug_with_loc.cold+0x5/0x43 [libcfs] mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs] ? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc] mgs_get_ir_logs+0x4f4/0xa00 [mgs] ? __req_capsule_get+0x143/0x3b0 [ptlrpc] mgs_config_read+0x1c6/0x1e0 [mgs] tgt_handle_request0+0x147/0x770 [ptlrpc] tgt_request_handle+0x3fd/0xd00 [ptlrpc] ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfbef5 ptlrpc_main+0x9bf/0xea0 [ptlrpc] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> | Lustre: DEBUG MARKER: /usr/sbin/lctl mark Request fail clients: 2, to fail: 1, failed: 0 Lustre: DEBUG MARKER: Request fail clients: 2, to fail: 1, failed: 0 Lustre: DEBUG MARKER: /usr/sbin/lctl mark shutdown_client: trevis-156vm98 uptime: 07:41:14 up 27 min, 0 users, load average: 0.09, 0.05, 0.01 Lustre: DEBUG MARKER: shutdown_client: trevis-156vm98 uptime: 07:41:14 up 27 min, 0 users, load average: 0.09, 0.05, 0.01 Lustre: lustre-MDT0003: haven't heard from client c1c0ea49-a747-445c-b7d1-94d151f7f8d2 (at 10.240.46.200@tcp) in 101 seconds. I think it's dead, and I am evicting it. exp ff3c941817a63400, cur 1774424576 deadline 1774424575 last 1774424475 Lustre: Skipped 1 previous similar message LustreError: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.46.202@tcp failed: rc = -107 Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.240.46.202@tcp) was lost; in progress operations using this service will wait for recovery to complete LustreError: Skipped 16 previous similar messages Lustre: Skipped 16 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all LustreError: MGC10.240.46.202@tcp: Connection to MGS (at 10.240.46.202@tcp) was lost; in progress operations using this service will fail Lustre: Evicted from MGS (at 10.240.46.202@tcp) after server handle changed from 0x8aec7cde9c7bb0c4 to 0x8aec7cde9c7bbe7e Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 LDISKFS-fs (dm-3): unmounting filesystem 56cf54a1-ca28-46ef-9432-1fa315f27ed8. Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; LustreError: 31940:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 31940:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 74 previous similar messages Autotest: Test running for 30 minutes (lustre-reviews_review-dne-part-4_122892.46) Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): mounted filesystem 56cf54a1-ca28-46ef-9432-1fa315f27ed8 r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180 Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect Lustre: Skipped 1 previous similar message Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 4 clients reconnect Lustre: Skipped 1 previous similar message Lustre: lustre-MDT0001: Recovery over after 0:01, of 4 clients 4 recovered and 0 were evicted. Lustre: Skipped 1 previous similar message Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 LDISKFS-fs (dm-4): unmounting filesystem 961f933e-1dba-4213-887e-3725b11f823d. Lustre: server umount lustre-MDT0003 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4 LDISKFS-fs (dm-4): mounted filesystem 961f933e-1dba-4213-887e-3725b11f823d r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all | Link to test |
| sanity-hsm test 302: HSM tunnable are persistent when CDT is off | LustreError: 170570:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed: LustreError: 170570:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG CPU: 1 PID: 170570 Comm: ll_mgs_0002 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 lbug_with_loc.cold+0x5/0x43 [libcfs] mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs] ? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc] mgs_get_ir_logs+0x4f4/0xa00 [mgs] ? __req_capsule_get+0x143/0x3b0 [ptlrpc] mgs_config_read+0x1c6/0x1e0 [mgs] tgt_handle_request0+0x147/0x770 [ptlrpc] tgt_request_handle+0x3fd/0xd00 [ptlrpc] ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfbef5 ptlrpc_main+0x9bf/0xea0 [ptlrpc] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> | Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdt.lustre-MDT0001.hsm_control='shutdown' Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdt.lustre-MDT0003.hsm_control='shutdown' Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0001.hsm_control Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0003.hsm_control Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all LustreError: MGC10.240.46.202@tcp: Connection to MGS (at 10.240.46.202@tcp) was lost; in progress operations using this service will fail Lustre: lustre-MDT0000-lwp-MDT0001: Connection restored to 10.240.46.202@tcp (at 10.240.46.202@tcp) Lustre: Skipped 9 previous similar messages Lustre: Evicted from MGS (at 10.240.46.202@tcp) after server handle changed from 0x2ad993970e1a1e60 to 0x2ad993970e1a253d Lustre: DEBUG MARKER: trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 LDISKFS-fs (dm-3): unmounting filesystem ce689057-7d87-4efa-968f-b47826aa6305. 
Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; LustreError: 14506:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 14507:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 14507:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 3 previous similar messages LustreError: 14506:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 3 previous similar messages LustreError: 105618:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.201@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 105618:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 5 previous similar messages LustreError: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 1 previous similar message Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 5 previous similar messages LustreError: 105618:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.200@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 105618:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 3 previous similar messages LustreError: 16310:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 16310:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 2 previous similar messages Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 LustreError: 14507:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 14507:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 14 previous similar messages Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): mounted filesystem ce689057-7d87-4efa-968f-b47826aa6305 r/w with ordered data mode. Quota mode: journalled. 
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u debugfs: Directory '10.240.46.200@tcp' with parent 'exports' already present! Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: trevis-156vm101.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null Autotest: Test running for 85 minutes (lustre-reviews_review-dne-part-4_122892.46) debugfs: Directory '10.240.46.202@tcp' with parent 'exports' already present! debugfs: File 'stats' in directory '/' already present! debugfs: File 'ldlm_stats' in directory '/' already present! debugfs: File 'open_files' in directory '/' already present! debugfs: Directory '10.240.46.201@tcp' with parent 'exports' already present! debugfs: File 'stats' in directory '/' already present! debugfs: File 'ldlm_stats' in directory '/' already present! debugfs: File 'open_files' in directory '/' already present! Lustre: lustre-MDT0001: Recovery over after 0:05, of 7 clients 7 recovered and 0 were evicted. 
Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec LustreError: lustre-MDT0002-osp-MDT0001: operation mds_statfs to node 10.240.46.202@tcp failed: rc = -107 Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: trevis-156vm100.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-156vm97.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: trevis-156vm98.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 LDISKFS-fs (dm-4): unmounting filesystem 931af2f6-deea-43bb-9243-f8e3f10102c7. Lustre: server umount lustre-MDT0003 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; LustreError: 16310:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0003: not available for connect from 10.240.46.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. 
LustreError: 16310:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 9 previous similar messages Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4 LDISKFS-fs (dm-4): mounted filesystem 931af2f6-deea-43bb-9243-f8e3f10102c7 r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u | Link to test |
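The `wait_import_state_mount` markers in this row poll the MDC import until it reports FULL or IDLE. A rough equivalent of that loop, assuming the parameter prints the server UUID followed by the import state (the parameter path and the "in FULL state after N sec" message are taken from the markers; the 90 s timeout is an assumption):

```bash
#!/bin/bash
# Poll an MDC import until it reaches FULL or IDLE, as the markers do.
PARAM="mdc.lustre-MDT0002-mdc-*.mds_server_uuid"   # pattern from the log

for i in $(seq 0 89); do
    # assumed output format: "<server uuid> <import state>"
    state=$(lctl get_param -n "$PARAM" 2>/dev/null | awk '{print $2}' | uniq)
    if [ "$state" = FULL ] || [ "$state" = IDLE ]; then
        echo "$PARAM in $state state after $i sec"
        exit 0
    fi
    sleep 1
done
echo "$PARAM never reached FULL or IDLE" >&2
exit 1
```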
| sanity test 804: verify agent entry for remote entry | LustreError: 466238:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed: LustreError: 466238:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG CPU: 0 PID: 466238 Comm: ll_mgs_0002 Kdump: loaded Tainted: G OE ------- --- 5.14.0-570.62.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 lbug_with_loc.cold+0x5/0x43 [libcfs] mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs] ? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc] mgs_get_ir_logs+0x4f4/0xa00 [mgs] ? __req_capsule_get+0x143/0x3b0 [ptlrpc] mgs_config_read+0x1c6/0x1e0 [mgs] tgt_handle_request0+0x147/0x770 [ptlrpc] tgt_request_handle+0x3fd/0xd00 [ptlrpc] ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfbef5 ptlrpc_main+0x9bf/0xea0 [ptlrpc] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> | Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: debugfs -c -R 'ls /REMOTE_PARENT_DIR' /dev/mapper/mds4_flakey LustreError: lustre-MDT0000-osp-MDT0003: operation mds_statfs to node 10.240.44.97@tcp failed: rc = -107 LustreError: MGC10.240.44.97@tcp: Connection to MGS (at 10.240.44.97@tcp) was lost; in progress operations using this service will fail Lustre: Evicted from MGS (at 10.240.44.97@tcp) after server handle changed from 0x9d26a2c865163d56 to 0x9d26a2c86530bb53 Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-153vm103.trevis.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: trevis-153vm103.trevis.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 LustreError: 5635:0:(import.c:706:ptlrpc_connect_import_locked()) can't connect to a closed import LDISKFS-fs (dm-3): unmounting filesystem 4905abfe-0623-4437-ab65-2a2921dfa2dc. Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: e2fsck -h LustreError: 298584:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.44.96@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 298584:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 14 previous similar messages Lustre: DEBUG MARKER: e2fsck -d -v -t -t -f -n /dev/mapper/mds2_flakey -m8 Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2 Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): mounted filesystem 4905abfe-0623-4437-ab65-2a2921dfa2dc r/w with ordered data mode. Quota mode: journalled. 
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-153vm104.trevis.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-153vm104.trevis.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: trevis-153vm104.trevis.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: trevis-153vm104.trevis.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 5 clients reconnect Lustre: lustre-MDT0001: Received new MDS connection from 10.240.44.97@tcp, keep former export from same NID Lustre: lustre-MDT0001: Received new MDS connection from 10.240.44.97@tcp, keep former export from same NID Lustre: lustre-MDT0001: Client lustre-MDT0002-mdtlov_UUID (at 10.240.44.97@tcp) refused connection, still busy with 7 references Lustre: lustre-MDT0001: Received new MDS connection from 10.240.44.97@tcp, keep former export from same NID Lustre: lustre-MDT0001: Client lustre-MDT0002-mdtlov_UUID (at 10.240.44.97@tcp) refused connection, still busy with 7 references Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-153vm103.trevis.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: trevis-153vm103.trevis.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 LustreError: 5635:0:(import.c:706:ptlrpc_connect_import_locked()) can't connect to a closed import LDISKFS-fs (dm-4): unmounting filesystem ca84123f-0ba0-4ed7-a94c-06a09f5e229c. 
Lustre: server umount lustre-MDT0003 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: e2fsck -h Lustre: DEBUG MARKER: e2fsck -d -v -t -t -f -n /dev/mapper/mds4_flakey -m8 Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4 Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1 Lustre: lustre-MDT0003-osp-MDT0001: Connection to lustre-MDT0003 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 7 previous similar messages Lustre: lustre-MDT0001: Received new MDS connection from 10.240.44.97@tcp, keep former export from same NID Lustre: lustre-MDT0001: Client lustre-MDT0002-mdtlov_UUID (at 10.240.44.97@tcp) refused connection, still busy with 7 references Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4 LDISKFS-fs (dm-4): mounted filesystem ca84123f-0ba0-4ed7-a94c-06a09f5e229c r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0001: Recovery over after 0:12, of 5 clients 5 recovered and 0 were evicted. Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: lustre-MDT0003: Will be in recovery for at least 1:00, or until 5 clients reconnect Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-153vm104.trevis.whamcloud.com: executing set_default_debug all all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-153vm104.trevis.whamcloud.com: executing set_default_debug all all | Link to test |
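Between umount and remount, this row also runs a read-only `e2fsck` pass over the flakey device; `-n` answers "no" to every prompt, so the check never modifies the filesystem. A sketch of that check with the flags from the log (`-m8` is assumed to be the parallel-fsck thread count from Whamcloud's e2fsprogs; plain upstream e2fsck does not accept it):

```bash
#!/bin/bash
# Read-only metadata check of an MDT device before remounting it.
# -f force, -n read-only, -d/-v debug/verbose, -t -t timing stats;
# -m8 (assumed: 8 pfsck threads, Whamcloud e2fsprogs extension).
DEV=/dev/mapper/mds2_flakey

e2fsck -d -v -t -t -f -n "$DEV" -m8
rc=$?
# e2fsck exit status: 0 clean, 4 errors left uncorrected, 8 operational error.
if [ "$rc" -ge 4 ]; then
    echo "e2fsck reported problems on $DEV (rc=$rc)" >&2
    exit "$rc"
fi
```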
| replay-single test 112d: DNE: cross MDT rename, fail MDT4 | LustreError: 188667:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed: LustreError: 188667:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG CPU: 1 PID: 188667 Comm: ll_mgs_0000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 lbug_with_loc.cold+0x5/0x43 [libcfs] mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs] ? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc] mgs_get_ir_logs+0x4f4/0xa00 [mgs] ? __req_capsule_get+0x143/0x3b0 [ptlrpc] mgs_config_read+0x1c6/0x1e0 [mgs] tgt_handle_request0+0x147/0x770 [ptlrpc] tgt_request_handle+0x3fd/0xd00 [ptlrpc] ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfbef5 ptlrpc_main+0x9bf/0xea0 [ptlrpc] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> | Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 LDISKFS-fs (dm-4): unmounting filesystem 57190e03-b795-45ec-8ae8-0bc75a4cf658. Lustre: server umount lustre-MDT0003 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 linear 252:1 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4 LDISKFS-fs (dm-4): 12 truncates cleaned up LDISKFS-fs (dm-4): recovery complete LDISKFS-fs (dm-4): mounted filesystem 57190e03-b795-45ec-8ae8-0bc75a4cf658 r/w with ordered data mode. Quota mode: journalled. Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u | Link to test |
| replay-single test 70d: mkdir/rmdir striped dir 4mdts recovery | LustreError: 81456:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed: LustreError: 81456:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG CPU: 0 PID: 81456 Comm: ll_mgs_0000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-570.62.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 lbug_with_loc.cold+0x5/0x43 [libcfs] mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs] ? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc] mgs_get_ir_logs+0x4f4/0xa00 [mgs] ? __req_capsule_get+0x143/0x3b0 [ptlrpc] mgs_config_read+0x1c6/0x1e0 [mgs] tgt_handle_request0+0x147/0x770 [ptlrpc] tgt_request_handle+0x3fd/0xd00 [ptlrpc] ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfbef5 ptlrpc_main+0x9bf/0xea0 [ptlrpc] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> | Lustre: DEBUG MARKER: hostname -I Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-155vm105.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: trevis-155vm105.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 flakey 252:0 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_70d fail mds2 1 times Lustre: DEBUG MARKER: test_70d fail mds2 1 times Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 Lustre: lustre-MDT0001: Not available for connect from 10.240.46.6@tcp (stopping) Lustre: Skipped 2 previous similar messages LustreError: 80424:0:(ldlm_resource.c:1167:ldlm_resource_complain()) lustre-MDT0002-osp-MDT0001: namespace resource [0x280000fbc:0x966:0x0].0x0 (ff2b95df028f5f00) refcount nonzero (1) after lock cleanup; forcing cleanup. LustreError: lustre-MDT0001-osp-MDT0003: operation mds_statfs to node 0@lo failed: rc = -107 LustreError: Skipped 6 previous similar messages Lustre: lustre-MDT0001-osp-MDT0003: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 11 previous similar messages LustreError: 15995:0:(ldlm_lockd.c:2560:ldlm_cancel_handler()) ldlm_cancel from 10.240.46.9@tcp arrived at 1774427240 with bad export cookie 3003157324093779003 Autotest: Test running for 70 minutes (lustre-reviews_review-dne-part-6_122892.56) LDISKFS-fs (dm-3): unmounting filesystem 2c9bb5f6-8089-4839-a321-516fc16f5c19. Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && LustreError: 5669:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.46.6@tcp (no target). 
If you are running an HA pair check that the target is mounted on the other server. LustreError: 5669:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 35 previous similar messages Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 linear 252:0 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): 12 truncates cleaned up LDISKFS-fs (dm-3): recovery complete LDISKFS-fs (dm-3): mounted filesystem 2c9bb5f6-8089-4839-a321-516fc16f5c19 r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 5 clients reconnect Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-155vm108.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-155vm108.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: trevis-155vm108.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: trevis-155vm108.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null Lustre: lustre-MDT0001: Recovery over after 0:07, of 5 clients 5 recovered and 0 were evicted. 
Lustre: lustre-MDT0001-osp-MDT0003: Connection restored to 0@lo (at 0@lo) Lustre: Skipped 14 previous similar messages Lustre: 75556:0:(mdt_recovery.c:102:mdt_req_from_lrd()) @@@ restoring transno req@ff2b95df1c5b5a00 x1860617651767808/t12885020237(0) o36->ea54ed17-e5df-4930-83f1-a1cef264e6b1@10.240.46.6@tcp:647/0 lens 496/2888 e 0 to 0 dl 1774427317 ref 1 fl Interpret:/202/0 rc 0/0 job:'rm.0' uid:0 gid:0 projid:4294967295 Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-155vm105.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-155vm105.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_70d fail mds4 2 times Lustre: DEBUG MARKER: test_70d fail mds4 2 times Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 Lustre: lustre-MDT0003: Not available for connect from 10.240.46.6@tcp (stopping) Lustre: Skipped 17 previous similar messages LDISKFS-fs (dm-4): unmounting filesystem 249c35b5-9887-4289-92b0-3c440fe3bf8f. Lustre: server umount lustre-MDT0003 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; LustreError: 78652:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0003: not available for connect from 10.240.46.8@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 78652:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 37 previous similar messages Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 linear 252:1 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4 LDISKFS-fs (dm-4): 12 truncates cleaned up LDISKFS-fs (dm-4): recovery complete LDISKFS-fs (dm-4): mounted filesystem 249c35b5-9887-4289-92b0-3c440fe3bf8f r/w with ordered data mode. Quota mode: journalled. 
Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: lustre-MDT0003: Will be in recovery for at least 1:00, or until 5 clients reconnect | Link to test |
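The "REPLAY BARRIER" markers in these replay-single rows map to a fixed sequence: flush, stop transno assignment on the target, then swap the device-mapper table for a flakey target that silently drops writes, so everything after the barrier exists only in client memory and must be replayed on recovery. A sketch assembled from the commands in the log (sector count 4071424 and backing device 252:1 are the values shown there):

```bash
#!/bin/bash
# Replay-barrier sequence reconstructed from the log markers.
set -e

DM=/dev/mapper/mds4_flakey
TGT=lustre-MDT0003

sync; sync; sync                          # flush dirty data first
lctl --device "$TGT" notransno            # stop assigning new transnos
modprobe dm-flakey
dmsetup suspend --nolockfs --noflush "$DM"
# table: <start> <len> flakey <dev> <offset> <up secs> <down secs> <nfeat> <feature>
# up=0/down=1800 keeps the device "down" (dropping writes) for 30 minutes
dmsetup load "$DM" --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes"
dmsetup resume "$DM"
lctl mark "mds4 REPLAY BARRIER on $TGT"
```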
| replay-dual test 26: dbench and tar with mds failover | LustreError: 68871:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed: LustreError: 68871:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG CPU: 0 PID: 68871 Comm: ll_mgs_0000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-570.62.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 lbug_with_loc.cold+0x5/0x43 [libcfs] mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs] ? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc] mgs_get_ir_logs+0x4f4/0xa00 [mgs] ? __req_capsule_get+0x143/0x3b0 [ptlrpc] mgs_config_read+0x1c6/0x1e0 [mgs] tgt_handle_request0+0x147/0x770 [ptlrpc] tgt_request_handle+0x3fd/0xd00 [ptlrpc] ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfbef5 ptlrpc_main+0x9bf/0xea0 [ptlrpc] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> | Lustre: DEBUG MARKER: hostname -I Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm166.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-156vm166.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times Lustre: DEBUG MARKER: test_26 fail mds1 1 times Lustre: 5658:0:(client.c:2472:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1774427817/real 1774427817] req@ff17b6b06a3230c0 x1860617434105984/t0(0) o103->MGC10.240.24.16@tcp@10.240.24.16@tcp:17/18 lens 328/224 e 0 to 1 dl 1774427833 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'ldlm_bl.0' uid:0 gid:0 projid:4294967295 Lustre: 5687:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.24.16@tcp: IR log lustre-mdtir failed, not fatal: rc = -5 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm168.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-156vm168.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: 5687:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.24.16@tcp: IR log lustre-mdtir failed, not fatal: rc = -5 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm166.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: onyx-156vm166.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm165.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: onyx-156vm165.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 
flakey 252:0 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 2 times Lustre: DEBUG MARKER: test_26 fail mds2 2 times Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping) Lustre: Skipped 14 previous similar messages LDISKFS-fs (dm-3): unmounting filesystem 4510c1d6-0961-4c6f-9f82-601c6611a776. Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; LustreError: 8485:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0001: not available for connect from 10.240.23.243@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 8485:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 31 previous similar messages Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 linear 252:0 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): 12 truncates cleaned up LDISKFS-fs (dm-3): recovery complete LDISKFS-fs (dm-3): mounted filesystem 4510c1d6-0961-4c6f-9f82-601c6611a776 r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm169.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm169.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-156vm169.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-156vm169.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null Lustre: lustre-MDT0001: Recovery over after 0:05, of 7 clients 7 recovered and 0 were evicted. 
Lustre: lustre-MDT0001-osp-MDT0003: Connection restored to 0@lo (at 0@lo) Lustre: Skipped 68 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm166.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: onyx-156vm166.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm165.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: onyx-156vm165.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds3 3 times Lustre: DEBUG MARKER: test_26 fail mds3 3 times LustreError: lustre-MDT0002-osp-MDT0001: operation mds_statfs to node 10.240.24.16@tcp failed: rc = -107 LustreError: Skipped 19 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm168.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-156vm168.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm166.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: onyx-156vm166.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-156vm165.onyx.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: onyx-156vm165.onyx.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds4 4 times Lustre: DEBUG MARKER: test_26 fail mds4 4 times Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 LDISKFS-fs (dm-4): unmounting filesystem 
619002bf-cc7e-4d1a-94c4-e05ca1825a5c. Lustre: server umount lustre-MDT0003 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 linear 252:1 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4 LDISKFS-fs (dm-4): 21 truncates cleaned up LDISKFS-fs (dm-4): recovery complete LDISKFS-fs (dm-4): mounted filesystem 619002bf-cc7e-4d1a-94c4-e05ca1825a5c r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: lustre-MDT0003: Will be in recovery for at least 1:00, or until 7 clients reconnect | Link to test |
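Failback then undoes the barrier: the same sector range is reloaded as a plain `linear` mapping before the target is remounted, at which point LDISKFS journal recovery ("21 truncates cleaned up", "recovery complete") and the Lustre recovery window run. A minimal sketch of that restore step with the values from this row:

```bash
#!/bin/bash
# Restore the pass-through mapping after a replay barrier and remount.
set -e

DM=/dev/mapper/mds4_flakey
MNT=/mnt/lustre-mds4

dmsetup suspend --nolockfs --noflush "$DM"
dmsetup load "$DM" --table "0 4071424 linear 252:1 0"   # drop the flakey layer
dmsetup resume "$DM"
test -b "$DM"
mkdir -p "$MNT"
mount -t lustre -o localrecov "$DM" "$MNT"   # journal replay, then recovery window
```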
| replay-dual test 26: dbench and tar with mds failover | LustreError: 64194:0:(mgs_nids.c:195:mgs_nidtbl_read()) ASSERTION( nidtbl_is_sane(tbl) ) failed: LustreError: 64194:0:(mgs_nids.c:195:mgs_nidtbl_read()) LBUG CPU: 1 PID: 64194 Comm: ll_mgs_0000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 lbug_with_loc.cold+0x5/0x43 [libcfs] mgs_nidtbl_read.constprop.0+0x811/0x8b0 [mgs] ? __pfx_lustre_swab_mgs_config_res+0x10/0x10 [ptlrpc] mgs_get_ir_logs+0x4f4/0xa00 [mgs] ? __req_capsule_get+0x143/0x3b0 [ptlrpc] mgs_config_read+0x1c6/0x1e0 [mgs] tgt_handle_request0+0x147/0x770 [ptlrpc] tgt_request_handle+0x3fd/0xd00 [ptlrpc] ptlrpc_server_handle_request.isra.0+0x2e5/0xd80 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfbef5 ptlrpc_main+0x9bf/0xea0 [ptlrpc] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> | Lustre: DEBUG MARKER: hostname -I Lustre: 5305:0:(client.c:2472:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1774427640/real 1774427640] req@ff305529e9263a80 x1860617328025088/t0(0) o103->MGC10.240.41.242@tcp@10.240.41.242@tcp:17/18 lens 328/224 e 0 to 1 dl 1774427656 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'ldlm_bl.0' uid:0 gid:0 projid:4294967295 Lustre: 5335:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.41.242@tcp: IR log lustre-mdtir failed, not fatal: rc = -5 Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm101.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-154vm101.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times Lustre: DEBUG MARKER: test_26 fail mds1 1 times Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm103.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-154vm103.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: 5305:0:(client.c:2472:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1774427667/real 1774427667] req@ff305529e74349c0 x1860617328616192/t0(0) o400->MGC10.240.41.242@tcp@10.240.41.242@tcp:26/25 lens 224/224 e 0 to 1 dl 1774427683 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 projid:4294967295 Lustre: 5305:0:(client.c:2472:ptlrpc_expire_one_request()) Skipped 1 previous similar message LustreError: 62086:0:(ldlm_resource.c:1167:ldlm_resource_complain()) MGC10.240.41.242@tcp: namespace resource [0x65727473756c:0x2:0x0].0x0 (ff305529d7edb540) refcount nonzero (1) after lock cleanup; forcing cleanup. 
LustreError: 62086:0:(ldlm_resource.c:1167:ldlm_resource_complain()) Skipped 1 previous similar message Lustre: 5335:0:(mgc_request.c:2047:mgc_process_log()) MGC10.240.41.242@tcp: IR log lustre-mdtir failed, not fatal: rc = -5 Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm101.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-154vm101.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm100.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: trevis-154vm100.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 flakey 252:0 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 2 times Lustre: DEBUG MARKER: test_26 fail mds2 2 times Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds2 Lustre: Failing over lustre-MDT0001 Lustre: lustre-MDT0001: Not available for connect from 10.240.41.239@tcp (stopping) Lustre: Skipped 16 previous similar messages LDISKFS-fs (dm-3): unmounting filesystem 7305e73d-5281-45c6-87f2-dcff4964fe9d. Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Autotest: Test running for 80 minutes (lustre-reviews_review-dne-part-8_122892.50) Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds2_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds2_flakey --table "0 4071424 linear 252:0 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 LDISKFS-fs (dm-3): 12 truncates cleaned up LDISKFS-fs (dm-3): recovery complete LDISKFS-fs (dm-3): mounted filesystem 7305e73d-5281-45c6-87f2-dcff4964fe9d r/w with ordered data mode. Quota mode: journalled. 
Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm104.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm104.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-154vm104.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-154vm104.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds2_flakey 2>/dev/null Lustre: lustre-MDT0001: Recovery over after 0:04, of 7 clients 7 recovered and 0 were evicted. Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm101.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-154vm101.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm100.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0001-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: trevis-154vm100.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0001-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds3 3 times Lustre: DEBUG MARKER: test_26 fail mds3 3 times Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm103.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-154vm103.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm101.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: trevis-154vm101.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-154vm100.trevis.whamcloud.com: executing wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0002-mdc-\*.mds_server_uuid Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: trevis-154vm100.trevis.whamcloud.com: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0002-mdc-*.mds_server_uuid Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-\*.mds_server_uuid in 
FULL state after 0 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 flakey 252:1 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds4 4 times Lustre: DEBUG MARKER: test_26 fail mds4 4 times Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds4' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds4 Lustre: Failing over lustre-MDT0003 Lustre: lustre-MDT0003: Not available for connect from 10.240.41.239@tcp (stopping) Lustre: Skipped 1 previous similar message LDISKFS-fs (dm-4): unmounting filesystem 8b9f3749-5b8b-46aa-9107-16227669ef3d. Lustre: server umount lustre-MDT0003 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; LustreError: 5342:0:(ldlm_lib.c:1175:target_handle_connect()) lustre-MDT0003: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: 5342:0:(ldlm_lib.c:1175:target_handle_connect()) Skipped 89 previous similar messages Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds4_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds4_flakey --table "0 4071424 linear 252:1 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre -o localrecov /dev/mapper/mds4_flakey /mnt/lustre-mds4 LDISKFS-fs (dm-4): 21 truncates cleaned up LDISKFS-fs (dm-4): recovery complete LDISKFS-fs (dm-4): mounted filesystem 8b9f3749-5b8b-46aa-9107-16227669ef3d r/w with ordered data mode. Quota mode: journalled. Lustre: lustre-MDT0003: Imperative Recovery not enabled, recovery window 60-180 Lustre: lustre-MDT0003: in recovery but waiting for the first client to connect Lustre: lustre-MDT0003: Will be in recovery for at least 1:00, or until 7 clients reconnect Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u | Link to test |
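Every row above carries the same crash signature, so the "Match messages" boxes at the top of this report only need the assertion text and one or two frames from the call trace. A hypothetical triage filter along those lines (the log path is a placeholder):

```bash
#!/bin/bash
# Check a console log for the mgs_nids.c:195 LBUG signature shared by
# all rows in this report. $1 is a placeholder path to the console log.
LOG=${1:?usage: $0 <console.log>}

grep -Fq "ASSERTION( nidtbl_is_sane(tbl) ) failed" "$LOG" &&
grep -Fq "mgs_nidtbl_read" "$LOG" &&
grep -Fq "mgs_get_ir_logs" "$LOG" &&
echo "matches the mgs_nidtbl_read() LBUG signature"
```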