Editing crashreport #73747

Reason: BUG: unable to handle page fault
Crashing Function: tgt_txn_start_cb
Where to cut Backtrace:
dt_txn_hook_start
osd_trans_start
tgt_server_data_update
tgt_client_del
mdt_export_cleanup
mdt_obd_disconnect
obd_disconnect
class_disconnect_export_list
class_disconnect_stale_exports
target_recovery_overseer
replay_request_or_update
target_recovery_thread
kthread
ret_from_fork
Reports Count: 1

Failures list (last 100):

Failing Test: replay-dual test 15a: timeout waiting for lost client during replay, 1 client completes
Full Crash:
BUG: unable to handle page fault for address: 00000011005d3068
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 140327 Comm: tgt_recover_0 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.38.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:tgt_txn_start_cb+0x23b/0x3b0 [ptlrpc]
Code: 8b 42 38 48 83 c2 38 49 8b 0c 24 4c 8d 68 e8 48 39 c2 75 16 e9 f2 fe ff ff 49 8b 45 18 4c 8d 68 e8 48 39 c2 0f 84 e1 fe ff ff <49> 3b 4d 08 75 e9 e9 d9 fe ff ff 48 c7 c7 40 bf 34 c1 48 c7 c2 5e
RSP: 0018:ffffada047ca7b78 EFLAGS: 00010296
RAX: 00000011005d3078 RBX: ffffa028765d2d48 RCX: ffffa02861510000
RDX: ffffa028429d5a00 RSI: ffffa02875fa6000 RDI: ffffa02875a84f00
RBP: ffffa0289b758e00 R08: 0000000000000011 R09: ffffa02975fa5d49
R10: ffffffffffffffff R11: 000000000000000f R12: ffffa0284418a000
R13: 00000011005d3060 R14: ffffa028429d5968 R15: ffffa0286de87000
FS: 0000000000000000(0000) GS:ffffa028ffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000011005d3068 CR3: 000000000376e006 CR4: 00000000001706f0
Call Trace:
<TASK>
? show_trace_log_lvl+0x1c4/0x2df
? show_trace_log_lvl+0x1c4/0x2df
? dt_txn_hook_start+0x52/0x80 [obdclass]
? __die_body.cold+0x8/0xd
? page_fault_oops+0x134/0x170
? kernelmode_fixup_or_oops+0x84/0x110
? exc_page_fault+0x62/0x150
? asm_exc_page_fault+0x22/0x30
? tgt_txn_start_cb+0x23b/0x3b0 [ptlrpc]
? tgt_txn_start_cb+0x1ee/0x3b0 [ptlrpc]
dt_txn_hook_start+0x52/0x80 [obdclass]
osd_trans_start+0xc3/0x750 [osd_ldiskfs]
tgt_server_data_update+0x38f/0x5e0 [ptlrpc]
tgt_client_del+0x362/0x780 [ptlrpc]
mdt_export_cleanup+0x2d4/0x3d0 [mdt]
mdt_obd_disconnect+0xc1/0x280 [mdt]
obd_disconnect+0x10e/0x250 [obdclass]
class_disconnect_export_list+0x1fa/0x390 [obdclass]
class_disconnect_stale_exports+0x2a3/0x3a0 [obdclass]
? __pfx_exp_finished_or_from_mdt+0x10/0x10 [ptlrpc]
? __pfx_check_for_next_transno+0x10/0x10 [ptlrpc]
target_recovery_overseer+0x497/0x660 [ptlrpc]
? __pfx_exp_req_replay_healthy_or_from_mdt+0x10/0x10 [ptlrpc]
? dtrq_destroy+0x42c/0x600 [ptlrpc]
replay_request_or_update+0x90/0x900 [ptlrpc]
target_recovery_thread+0x5c8/0xf50 [ptlrpc]
? __pfx_target_recovery_thread+0x10/0x10 [ptlrpc]
kthread+0xe0/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
Modules linked in: tls osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill intel_rapl_msr intel_rapl_common virtio_balloon i2c_piix4 pcspkr joydev sunrpc drm fuse ext4 mbcache jbd2 ata_generic ata_piix libata crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net virtio_blk net_failover ghash_clmulni_intel failover serio_raw
CR2: 00000011005d3068
Messages before crash:
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds1_flakey --table "0 3964928 flakey 252:0 0 0 1800 1 drop_writes"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1
Lustre: Failing over lustre-MDT0000
LDISKFS-fs (dm-3): unmounting filesystem b83e6038-59a3-4ee0-9851-dd0622c23ad6.
Lustre: server umount lustre-MDT0000 complete
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: modprobe dm-flakey;
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1
Lustre: DEBUG MARKER: dmsetup table /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: dmsetup suspend --nolockfs --noflush /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: dmsetup load /dev/mapper/mds1_flakey --table "0 3964928 linear 252:0 0"
Lustre: DEBUG MARKER: dmsetup resume /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
LDISKFS-fs (dm-3): recovery complete
LDISKFS-fs (dm-3): mounted filesystem b83e6038-59a3-4ee0-9851-dd0622c23ad6 r/w with ordered data mode. Quota mode: journalled.
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-72vm9.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-72vm9.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: onyx-72vm9.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: onyx-72vm9.onyx.whamcloud.com: executing set_default_debug -1 all
Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null
Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
Lustre: Skipped 2 previous similar messages
Lustre: 140327:0:(genops.c:1616:class_disconnect_stale_exports()) lustre-MDT0000: disconnect stale client 1bd4b473-0dee-42b8-8c67-4f1537a961d3@<unknown>
Lustre: 140327:0:(genops.c:1616:class_disconnect_stale_exports()) Skipped 8 previous similar messages
Lustre: lustre-MDT0000: disconnecting 1 stale clients
Lustre: Skipped 4 previous similar messages
LustreError: 140327:0:(tgt_grant.c:233:tgt_grant_sanity_check()) mdt_obd_disconnect: tot_granted 0 != fo_tot_granted 4194304
LustreError: 140327:0:(tgt_grant.c:233:tgt_grant_sanity_check()) Skipped 1 previous similar message
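
The register values in the oops can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not a root-cause analysis. It assumes, from a manual reading of the "Code:" bytes that has not been verified against the source, that the faulting instruction compares against 8(%r13) and that R13 was computed as RAX - 0x18, which would fit a linked-list walk that followed a bad "next" pointer. The two offset names and the helper function are hypothetical labels introduced for illustration.

```python
# Register values copied verbatim from the oops in this report.
CR2 = 0x00000011005D3068  # faulting address
RAX = 0x00000011005D3078
R13 = 0x00000011005D3060

# Assumed offsets, taken from a manual decode of the "Code:" bytes.
# Both names are hypothetical; the real structure is not identified here.
LIST_MEMBER_OFFSET = 0x18     # assumed: R13 = RAX - 0x18
COMPARED_FIELD_OFFSET = 0x08  # assumed: fault on a read of 8(%r13)

# Start of the upper (kernel) half of the x86_64 canonical address space.
KERNEL_HALF_START = 0xFFFF800000000000


def is_kernel_half_address(addr: int) -> bool:
    """Return True if addr lies in the upper half of the 64-bit address space."""
    return addr >= KERNEL_HALF_START


entry_matches = (RAX - LIST_MEMBER_OFFSET) == R13
fault_matches = (R13 + COMPARED_FIELD_OFFSET) == CR2

print("R13 == RAX - 0x18:", entry_matches)
print("CR2 == R13 + 0x08:", fault_matches)
print("CR2 is a kernel-half address:", is_kernel_half_address(CR2))
```

Under these assumptions both relations hold, and CR2 falls outside the kernel half of the address space, unlike the other pointers in the dump (for example RBX, ffffa028765d2d48). That is consistent with a corrupted or stale pointer being dereferenced, but it does not identify which structure was damaged.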