Failing Test | Full Crash | Messages before crash | Comment |
---|---|---|---|
replay-dual test 15a: timeout waiting for lost client during replay, 1 client completes | BUG: unable to handle page fault for address: 00000011005d3068 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 140327 Comm: tgt_recover_0 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.38.1_lustre.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tgt_txn_start_cb+0x23b/0x3b0 [ptlrpc] Code: 8b 42 38 48 83 c2 38 49 8b 0c 24 4c 8d 68 e8 48 39 c2 75 16 e9 f2 fe ff ff 49 8b 45 18 4c 8d 68 e8 48 39 c2 0f 84 e1 fe ff ff <49> 3b 4d 08 75 e9 e9 d9 fe ff ff 48 c7 c7 40 bf 34 c1 48 c7 c2 5e RSP: 0018:ffffada047ca7b78 EFLAGS: 00010296 RAX: 00000011005d3078 RBX: ffffa028765d2d48 RCX: ffffa02861510000 RDX: ffffa028429d5a00 RSI: ffffa02875fa6000 RDI: ffffa02875a84f00 RBP: ffffa0289b758e00 R08: 0000000000000011 R09: ffffa02975fa5d49 R10: ffffffffffffffff R11: 000000000000000f R12: ffffa0284418a000 R13: 00000011005d3060 R14: ffffa028429d5968 R15: ffffa0286de87000 FS: 0000000000000000(0000) GS:ffffa028ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000011005d3068 CR3: 000000000376e006 CR4: 00000000001706f0 Call Trace: <TASK> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? dt_txn_hook_start+0x52/0x80 [obdclass] ? __die_body.cold+0x8/0xd ? page_fault_oops+0x134/0x170 ? kernelmode_fixup_or_oops+0x84/0x110 ? exc_page_fault+0x62/0x150 ? asm_exc_page_fault+0x22/0x30 ? tgt_txn_start_cb+0x23b/0x3b0 [ptlrpc] ? tgt_txn_start_cb+0x1ee/0x3b0 [ptlrpc] dt_txn_hook_start+0x52/0x80 [obdclass] osd_trans_start+0xc3/0x750 [osd_ldiskfs] tgt_server_data_update+0x38f/0x5e0 [ptlrpc] tgt_client_del+0x362/0x780 [ptlrpc] mdt_export_cleanup+0x2d4/0x3d0 [mdt] mdt_obd_disconnect+0xc1/0x280 [mdt] obd_disconnect+0x10e/0x250 [obdclass] class_disconnect_export_list+0x1fa/0x390 [obdclass] class_disconnect_stale_exports+0x2a3/0x3a0 [obdclass] ? __pfx_exp_finished_or_from_mdt+0x10/0x10 [ptlrpc] ? __pfx_check_for_next_transno+0x10/0x10 [ptlrpc] target_recovery_overseer+0x497/0x660 [ptlrpc] ? __pfx_exp_req_replay_healthy_or_from_mdt+0x10/0x10 [ptlrpc] ? dtrq_destroy+0x42c/0x600 [ptlrpc] replay_request_or_update+0x90/0x900 [ptlrpc] target_recovery_thread+0x5c8/0xf50 [ptlrpc] ? __pfx_target_recovery_thread+0x10/0x10 [ptlrpc] kthread+0xe0/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Modules linked in: tls osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill intel_rapl_msr intel_rapl_common virtio_balloon i2c_piix4 pcspkr joydev sunrpc drm fuse ext4 mbcache jbd2 ata_generic ata_piix libata crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net virtio_blk net_failover ghash_clmulni_intel failover serio_raw CR2: 00000011005d3068 | Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds1_flakey --table "0 3964928 flakey 252:0 0 0 1800 1 drop_writes" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000 Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000 Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1 Lustre: Failing over lustre-MDT0000 LDISKFS-fs (dm-3): unmounting filesystem b83e6038-59a3-4ee0-9851-dd0622c23ad6. Lustre: server umount lustre-MDT0000 complete Lustre: DEBUG MARKER: lsmod \| grep lnet > /dev/null && Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: modprobe dm-flakey; Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1 Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1 Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds1_flakey --table "0 3964928 linear 252:0 0" Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 LDISKFS-fs (dm-3): recovery complete LDISKFS-fs (dm-3): mounted filesystem b83e6038-59a3-4ee0-9851-dd0622c23ad6 r/w with ordered data mode. Quota mode: journalled. Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-72vm9.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-72vm9.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-72vm9.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: onyx-72vm9.onyx.whamcloud.com: executing set_default_debug -1 all Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null \| grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null Lustre: lustre-MDT0000: recovery is timed out, evict stale exports Lustre: Skipped 2 previous similar messages Lustre: 140327:0:(genops.c:1616:class_disconnect_stale_exports()) lustre-MDT0000: disconnect stale client 1bd4b473-0dee-42b8-8c67-4f1537a961d3@<unknown> Lustre: 140327:0:(genops.c:1616:class_disconnect_stale_exports()) Skipped 8 previous similar messages Lustre: lustre-MDT0000: disconnecting 1 stale clients Lustre: Skipped 4 previous similar messages LustreError: 140327:0:(tgt_grant.c:233:tgt_grant_sanity_check()) mdt_obd_disconnect: tot_granted 0 != fo_tot_granted 4194304 LustreError: 140327:0:(tgt_grant.c:233:tgt_grant_sanity_check()) Skipped 1 previous similar message | Link to test |