| Match messages in logs (every line must be present in the log output; copy from the "Messages before crash" column below): | |
| Match messages in full crash (every line must be present in the crash log output; copy from the "Full Crash" column below): | |
| Limit to a test (copy from the "Failing Test" column below): | |
| Delete these reports as invalid (e.g., a real bug in a patch under review): | |
| Bug or comment: | |
| Extra info: | |
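The matching rule above (every pasted line must appear somewhere in the captured output) can be sketched as a plain substring check. This is an illustrative sketch only; `matches_log` and the sample data are hypothetical and not part of any triage tool:

```python
def matches_log(patterns: list[str], log_text: str) -> bool:
    """Return True only if every non-blank pattern line occurs in the log.

    Mirrors the form's rule: each line pasted into a "Match messages"
    box must be present in the captured output for a report to match.
    """
    return all(line in log_text for line in patterns if line.strip())


# Hypothetical example: two marker lines taken from a "Messages before
# crash" cell, checked against a captured console log.
patterns = [
    "Lustre: Failing over lustre-MDT0000",
    "Lustre: server umount lustre-MDT0000 complete",
]
log_text = (
    "Lustre: Failing over lustre-MDT0000\n"
    "LustreError: ... IMP_CLOSED ...\n"
    "Lustre: server umount lustre-MDT0000 complete\n"
)
print(matches_log(patterns, log_text))          # True: both lines present
print(matches_log(["no such line"], log_text))  # False
```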
| Failing Test | Full Crash | Messages before crash | Comment |
|---|---|---|---|
| replay-dual test 26: dbench and tar with mds failover | BUG: unable to handle kernel NULL pointer dereference at 0000000000000024 PGD 0 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 144809 Comm: tgt_recover_0 Kdump: loaded Tainted: P W OE -------- - - 4.18.0-553.89.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 RIP: 0010:lu_context_key_get+0x2b/0x80 [obdclass] Code: 1f 44 00 00 55 53 48 63 46 20 48 39 34 c5 80 74 2f c1 75 1f 48 89 f3 8b 37 48 89 fd f7 c6 00 02 00 00 74 3f 48 8b 55 10 5b 5d <48> 8b 04 c2 e9 c7 d7 bc e2 48 c7 c7 a0 77 1d c1 48 c7 c2 38 83 19 RSP: 0018:ff3966f38a483ca8 EFLAGS: 00010282 RAX: 0000000000000004 RBX: ff1660d8ce746f78 RCX: 0000000000000000 RDX: 0000000000000004 RSI: ff1660d8fba1e698 RDI: ff1660d8fba1e698 RBP: ff1660d8e1dd2800 R08: 0000000000000000 R09: c0000000ffff7fff R10: 0000000000000001 R11: ff3966f38a483ab0 R12: ff1660d8d21f12b0 R13: ff1660d8b9666400 R14: ff1660d8bc9db600 R15: ff1660d8af6ff150 FS: 0000000000000000(0000) GS:ff1660d8fba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000024 CR3: 0000000137610005 CR4: 0000000000771ef0 PKRU: 55555554 Call Trace: ? __die_body+0x1a/0x60 ? no_context+0x1ba/0x3f0 ? __bad_area_nosemaphore+0x157/0x180 ? do_error_trap+0x9e/0xd0 ? do_page_fault+0x37/0x12d ? page_fault+0x1e/0x30 ? lu_context_key_get+0x2b/0x80 [obdclass] mdd_close+0x73/0xf00 [mdd] mdt_mfd_close+0x6e2/0xc10 [mdt] mdt_obd_disconnect+0x23f/0x820 [mdt] class_disconnect_export_list+0x21c/0x590 [obdclass] class_disconnect_stale_exports+0x26f/0x3b0 [obdclass] ? exp_lock_replay_healthy+0x30/0x30 [ptlrpc] target_recovery_thread+0x62d/0x1250 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfcdfd ? replay_request_or_update.isra.31+0xa90/0xa90 [ptlrpc] kthread+0x134/0x150 ? 
set_kthread_struct+0x50/0x50 ret_from_fork+0x1f/0x40 Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) dm_mod zfs(POE) spl(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common kvm_amd ccp kvm irqbypass iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev i2c_i801 lpc_ich virtio_balloon ext4 mbcache jbd2 ahci libahci libata crc32c_intel virtio_net net_failover failover serio_raw virtio_blk [last unloaded: obdecho] CR2: 0000000000000024 | Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm44.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-152vm44.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000 Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000 Lustre: MGS: Client 32319d8c-256e-44cc-9164-5b758577f787 (at 10.240.47.90@tcp) reconnecting Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times Lustre: DEBUG MARKER: test_26 fail mds1 1 times Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: 7988:0:(client.c:2478:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1774605265/real 1774605265] req@ff1660d8bb9f1380 x1860806070566400/t0(0) o103->MGC10.240.47.92@tcp@0@lo:17/18 lens 328/224 e 0 to 1 dl 1774605281 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'ldlm_bl.0' uid:0 gid:0 projid:4294967295 Lustre: MGS: Client d7f22288-df3e-4cee-95a6-a381e17f9cb8 (at 0@lo) reconnecting LustreError: 
144173:0:(ldlm_resource.c:1170:ldlm_resource_complain()) MGC10.240.47.92@tcp: namespace resource [0x65727473756c:0x2:0x0].0x0 (ff1660d8b9fa8300) refcount nonzero (1) after lock cleanup; forcing cleanup. Lustre: 8024:0:(mgc_request.c:1917:mgc_process_log()) MGC10.240.47.92@tcp: IR log lustre-mdtir failed, not fatal: rc = -5 Lustre: Failing over lustre-MDT0000 LustreError: 8016:0:(client.c:1380:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ff1660d8cd3ddd40 x1860806071278080/t0(0) o105->MGS@10.240.47.90@tcp:15/16 lens 336/224 e 0 to 0 dl 0 ref 1 fl Rpc:QU/0/ffffffff rc 0/-1 job:'' uid:4294967295 gid:4294967295 projid:4294967295 LustreError: 8016:0:(client.c:1380:ptlrpc_import_delay_req()) Skipped 2 previous similar messages Lustre: server umount lustre-MDT0000 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: ! zpool list -H lustre-mdt1 >/dev/null 2>&1 || Lustre: DEBUG MARKER: lsmod | grep zfs >&/dev/null || modprobe zfs; Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-mdt1/mdt1 Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 LustreError: 144757:0:(ldlm_resource.c:1170:ldlm_resource_complain()) MGC10.240.47.92@tcp: namespace resource [0x65727473756c:0x2:0x0].0x0 (ff1660d8b9fa8f00) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Lustre: 8024:0:(mgc_request.c:1917:mgc_process_log()) MGC10.240.47.92@tcp: IR log lustre-mdtir failed, not fatal: rc = -5 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm46.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-152vm46.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-152vm46.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: trevis-152vm46.trevis.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null Lustre: 144809:0:(ldlm_lib.c:2067:extend_recovery_timer()) lustre-MDT0000: extended recovery timer reached hard limit: 180, extend: 1 Lustre: 144809:0:(ldlm_lib.c:2067:extend_recovery_timer()) Skipped 4 previous similar messages LustreError: 144809:0:(mdt_open.c:1729:mdt_reint_open()) lustre-MDT0000: name 'RESULTS1.PRN' present, but FID [0x20000afe1:0x17:0x0] is invalid LustreError: 144809:0:(mdt_handler.c:5295:mdt_intent_open()) @@@ Replay open failed with -5 req@ff1660d8c8086a40 x1860805781154176/t0(154618823102) o101->9196f5d8-a1a4-49be-ad23-4d93e7c96dd9@10.240.47.89@tcp:544/0 lens 584/608 e 0 to 0 dl 1774605394 ref 1 fl Complete:/604/0 rc 0/0 job:'dbench.0' uid:0 gid:0 projid:0 Lustre: 144809:0:(genops.c:1620:class_disconnect_stale_exports()) lustre-MDT0000: disconnect stale client 9196f5d8-a1a4-49be-ad23-4d93e7c96dd9@10.240.47.89@tcp Lustre: lustre-MDT0000: disconnecting 1 stale clients 
------------[ cut here ]------------ Probable access of uninitialized array lc_tags:d1f4f000 WARNING: CPU: 0 PID: 144809 at /tmp/rpmbuild-lustre-jenkins-RFOiZdjK/BUILD/lustre-2.17.51_23_g649b37b/lustre/obdclass/lu_object.c:1613 lu_context_key_get+0x70/0x80 [obdclass] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) dm_mod zfs(POE) spl(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common kvm_amd ccp kvm irqbypass iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev i2c_i801 lpc_ich virtio_balloon ext4 mbcache jbd2 ahci libahci libata crc32c_intel virtio_net net_failover failover serio_raw virtio_blk [last unloaded: obdecho] CPU: 0 PID: 144809 Comm: tgt_recover_0 Kdump: loaded Tainted: P OE -------- - - 4.18.0-553.89.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 RIP: 0010:lu_context_key_get+0x70/0x80 [obdclass] Code: 44 1a c1 c7 05 55 0e 0a 00 00 00 04 00 e8 e8 2c 63 ff 48 c7 c7 a0 77 1d c1 e8 2c 0b 63 ff 48 c7 c7 58 83 19 c1 e8 8d 2c fc e1 <0f> 0b 48 63 43 20 eb ad 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 RSP: 0018:ff3966f38a483c98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffc11e6a80 RCX: 0000000000000000 RDX: ff1660d8fba2eec0 RSI: ff1660d8fba1e698 RDI: ff1660d8fba1e698 RBP: ff3966f38a483dc8 R08: 0000000000000000 R09: c0000000ffff7fff R10: 0000000000000001 R11: ff3966f38a483ab0 R12: ff1660d8d21f12b0 R13: ff1660d8b9666400 R14: ff1660d8bc9db600 R15: ff1660d8af6ff150 FS: 0000000000000000(0000) GS:ff1660d8fba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055ac9a521d68 CR3: 0000000137610005 CR4: 0000000000771ef0 PKRU: 55555554 Call Trace: ? __warn+0x94/0xe0 ? lu_context_key_get+0x70/0x80 [obdclass] ? 
lu_context_key_get+0x70/0x80 [obdclass] ? report_bug+0xb1/0xe0 ? do_error_trap+0x9e/0xd0 ? do_invalid_op+0x36/0x40 ? lu_context_key_get+0x70/0x80 [obdclass] ? invalid_op+0x14/0x20 ? lu_context_key_get+0x70/0x80 [obdclass] ? lu_context_key_get+0x70/0x80 [obdclass] mdd_close+0x73/0xf00 [mdd] mdt_mfd_close+0x6e2/0xc10 [mdt] mdt_obd_disconnect+0x23f/0x820 [mdt] class_disconnect_export_list+0x21c/0x590 [obdclass] class_disconnect_stale_exports+0x26f/0x3b0 [obdclass] ? exp_lock_replay_healthy+0x30/0x30 [ptlrpc] target_recovery_thread+0x62d/0x1250 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfcdfd ? replay_request_or_update.isra.31+0xa90/0xa90 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x1f/0x40 ---[ end trace 1f6d74f64f49d4c8 ]--- | Link to test |
| replay-dual test 26: dbench and tar with mds failover | BUG: unable to handle kernel NULL pointer dereference at 0000000000000024 PGD 0 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 259990 Comm: tgt_recover_0 Kdump: loaded Tainted: G W OE -------- - - 4.18.0-553.89.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 RIP: 0010:lu_context_key_get+0x2b/0x80 [obdclass] Code: 1f 44 00 00 55 53 48 63 46 20 48 39 34 c5 80 24 a1 c0 75 1f 48 89 f3 8b 37 48 89 fd f7 c6 00 02 00 00 74 3f 48 8b 55 10 5b 5d <48> 8b 04 c2 e9 c7 27 4b d3 48 c7 c7 a0 27 8f c0 48 c7 c2 38 33 8b RSP: 0018:ff6c89fe4a8a7ca8 EFLAGS: 00010282 RAX: 0000000000000004 RBX: ff2d3859c3d6b618 RCX: 0000000000000000 RDX: 0000000000000004 RSI: ff2d3859fba1e698 RDI: ff2d3859fba1e698 RBP: ff2d3859c097b700 R08: 0000000000000000 R09: c0000000ffff7fff R10: 0000000000000001 R11: ff6c89fe4a8a7ab0 R12: ff2d3859c3db22b0 R13: ff2d3859c050f800 R14: ff2d3859c0bc4780 R15: ff2d3859c2d6d000 FS: 0000000000000000(0000) GS:ff2d3859fba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000024 CR3: 000000012f410004 CR4: 0000000000771ef0 PKRU: 55555554 Call Trace: ? __die_body+0x1a/0x60 ? no_context+0x1ba/0x3f0 ? __bad_area_nosemaphore+0x157/0x180 ? do_error_trap+0x9e/0xd0 ? do_page_fault+0x37/0x12d ? page_fault+0x1e/0x30 ? lu_context_key_get+0x2b/0x80 [obdclass] mdd_close+0x73/0xf00 [mdd] mdt_mfd_close+0x6e2/0xc10 [mdt] mdt_obd_disconnect+0x23f/0x820 [mdt] class_disconnect_export_list+0x21c/0x590 [obdclass] class_disconnect_stale_exports+0x26f/0x3b0 [obdclass] ? exp_lock_replay_healthy+0x30/0x30 [ptlrpc] target_recovery_thread+0x62d/0x1250 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfcdfd ? replay_request_or_update.isra.31+0xa90/0xa90 [ptlrpc] kthread+0x134/0x150 ? 
set_kthread_struct+0x50/0x50 ret_from_fork+0x1f/0x40 Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod intel_rapl_msr intel_rapl_common kvm_amd ccp kvm rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache iTCO_wdt iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr i2c_i801 virtio_balloon lpc_ich sunrpc ext4 mbcache jbd2 ahci libahci libata crc32c_intel virtio_net serio_raw net_failover failover virtio_blk CR2: 0000000000000024 | Lustre: MGS: Client 1ed6ab3b-59cf-48ef-ae99-bff3729f331a (at 10.240.29.127@tcp) reconnecting Lustre: 8365:0:(client.c:2478:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1774607453/real 1774607453] req@ff2d3859b35aba80 x1860805848136192/t0(0) o103->MGC10.240.29.129@tcp@0@lo:17/18 lens 328/224 e 0 to 1 dl 1774607469 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'ldlm_bl.0' uid:0 gid:0 projid:4294967295 LustreError: 257449:0:(ldlm_resource.c:1170:ldlm_resource_complain()) MGC10.240.29.129@tcp: namespace resource [0x65727473756c:0x2:0x0].0x0 (ff2d3859b9f806c0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Lustre: 8407:0:(mgc_request.c:1917:mgc_process_log()) MGC10.240.29.129@tcp: IR log lustre-mdtir failed, not fatal: rc = -5 Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm255.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-155vm255.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds1_flakey --table "0 4071424 flakey 252:0 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000 Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times Lustre: DEBUG MARKER: test_26 fail mds1 1 times Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 LustreError: 258713:0:(ldlm_resource.c:1170:ldlm_resource_complain()) lustre-MDT0001-osp-MDT0000: namespace resource [0x2400032e0:0xf:0x0].0x0 (ff2d3859c0053300) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Lustre: lustre-MDT0000: Not available for connect from 10.240.29.126@tcp (stopping) LustreError: 10755:0:(client.c:1380:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ff2d38598598b740 x1860805848453888/t0(0) o105->MGS@0@lo:15/16 lens 336/224 e 0 to 0 dl 0 ref 1 fl Rpc:QU/0/ffffffff rc 0/-1 job:'' uid:4294967295 gid:4294967295 projid:4294967295 LustreError: 10755:0:(client.c:1380:ptlrpc_import_delay_req()) Skipped 3 previous similar messages Lustre: server umount lustre-MDT0000 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Autotest: Test running for 80 minutes (lustre-reviews_review-dne-part-8_122951.33) Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds1_flakey --table "0 4071424 linear 252:0 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 LDISKFS-fs (dm-3): 9 truncates cleaned up LDISKFS-fs (dm-3): recovery complete LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc LustreError: 259937:0:(ldlm_resource.c:1170:ldlm_resource_complain()) MGC10.240.29.129@tcp: namespace resource [0x65727473756c:0x2:0x0].0x0 (ff2d3859b9f80a80) refcount nonzero (1) after lock cleanup; forcing cleanup. 
LustreError: 259937:0:(ldlm_resource.c:1170:ldlm_resource_complain()) Skipped 1 previous similar message Lustre: 8407:0:(mgc_request.c:1917:mgc_process_log()) MGC10.240.29.129@tcp: IR log lustre-mdtir failed, not fatal: rc = -5 Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm257.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-155vm257.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-155vm257.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-155vm257.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null Lustre: 259990:0:(ldlm_lib.c:2067:extend_recovery_timer()) lustre-MDT0000: extended recovery timer reached hard limit: 180, extend: 1 Lustre: 259990:0:(ldlm_lib.c:2067:extend_recovery_timer()) Skipped 39 previous similar messages Lustre: 259990:0:(genops.c:1620:class_disconnect_stale_exports()) lustre-MDT0000: disconnect stale client 221af37f-3824-4662-b987-01202e96fc97@10.240.29.126@tcp Lustre: 259990:0:(genops.c:1620:class_disconnect_stale_exports()) Skipped 2 previous similar messages Lustre: lustre-MDT0000: disconnecting 1 stale clients Lustre: Skipped 2 previous similar messages ------------[ cut here ]------------ Probable access of uninitialized array lc_tags:c020c000 WARNING: CPU: 0 PID: 259990 at /tmp/rpmbuild-lustre-jenkins-RFOiZdjK/BUILD/lustre-2.17.51_23_g649b37b/lustre/obdclass/lu_object.c:1613 lu_context_key_get+0x70/0x80 [obdclass] 
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod intel_rapl_msr intel_rapl_common kvm_amd ccp kvm rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache iTCO_wdt iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr i2c_i801 virtio_balloon lpc_ich sunrpc ext4 mbcache jbd2 ahci libahci libata crc32c_intel virtio_net serio_raw net_failover failover virtio_blk CPU: 0 PID: 259990 Comm: tgt_recover_0 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.89.1.el8_lustre.x86_64 #1 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014 RIP: 0010:lu_context_key_get+0x70/0x80 [obdclass] Code: f4 8b c0 c7 05 55 0e 0a 00 00 00 04 00 e8 e8 6c ab ff 48 c7 c7 a0 27 8f c0 e8 2c 4b ab ff 48 c7 c7 58 33 8b c0 e8 8d 7c 8a d2 <0f> 0b 48 63 43 20 eb ad 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 RSP: 0018:ff6c89fe4a8a7c98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffc0901a80 RCX: 0000000000000000 RDX: ff2d3859fba2eec0 RSI: ff2d3859fba1e698 RDI: ff2d3859fba1e698 RBP: ff6c89fe4a8a7dc8 R08: 0000000000000000 R09: c0000000ffff7fff R10: 0000000000000001 R11: ff6c89fe4a8a7ab0 R12: ff2d3859c3db22b0 R13: ff2d3859c050f800 R14: ff2d3859c0bc4780 R15: ff2d3859c2d6d000 FS: 0000000000000000(0000) GS:ff2d3859fba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc6ecafa040 CR3: 000000012f410004 CR4: 0000000000771ef0 PKRU: 55555554 Call Trace: ? __warn+0x94/0xe0 ? lu_context_key_get+0x70/0x80 [obdclass] ? lu_context_key_get+0x70/0x80 [obdclass] ? report_bug+0xb1/0xe0 ? do_error_trap+0x9e/0xd0 ? do_invalid_op+0x36/0x40 ? lu_context_key_get+0x70/0x80 [obdclass] ? invalid_op+0x14/0x20 ? lu_context_key_get+0x70/0x80 [obdclass] ? 
lu_context_key_get+0x70/0x80 [obdclass] mdd_close+0x73/0xf00 [mdd] mdt_mfd_close+0x6e2/0xc10 [mdt] mdt_obd_disconnect+0x23f/0x820 [mdt] class_disconnect_export_list+0x21c/0x590 [obdclass] class_disconnect_stale_exports+0x26f/0x3b0 [obdclass] ? exp_lock_replay_healthy+0x30/0x30 [ptlrpc] target_recovery_thread+0x62d/0x1250 [ptlrpc] ? srso_alias_return_thunk+0x5/0xfcdfd ? replay_request_or_update.isra.31+0xa90/0xa90 [ptlrpc] kthread+0x134/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x1f/0x40 ---[ end trace 0f85b6467e16cbf4 ]--- | Link to test |
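Both reports above carry the identical call trace (`lu_context_key_get` → `mdd_close` → `mdt_mfd_close` → `mdt_obd_disconnect` → `class_disconnect_export_list` → `class_disconnect_stale_exports` → `target_recovery_thread`), one from a zfs run and one from ldiskfs, which is why they group as a single crash signature. One way to compare such traces across builds is to strip the per-build `+0xOFF/0xSIZE` offsets and keep only the frame names; `trace_signature` below is a hypothetical sketch of that idea, not part of any existing tool:

```python
import re


def trace_signature(crash_text: str) -> tuple[str, ...]:
    """Extract frame names from a 'Call Trace:' dump, dropping the
    build-specific +0xOFF/0xSIZE offsets and '?' markers so the same
    crash from different builds compares equal."""
    trace = crash_text.split("Call Trace:", 1)[1]
    # Match e.g. "mdd_close+0x73/0xf00" or "replay_request_or_update.isra.31+0xa90/0xa90"
    return tuple(re.findall(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+", trace))


# Hypothetical fragments of the two 'Full Crash' cells above: offsets
# differ per build, but the frame sequence is the same.
zfs_run = "Call Trace: ? lu_context_key_get+0x2b/0x80 mdd_close+0x73/0xf00 mdt_mfd_close+0x6e2/0xc10"
ldiskfs_run = "Call Trace: ? lu_context_key_get+0x2b/0x80 mdd_close+0x73/0xf00 mdt_mfd_close+0x6e2/0xc10"
print(trace_signature(zfs_run) == trace_signature(ldiskfs_run))  # True
```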