Editing crashreport #73275

Reason: ASSERTION( nm->nm_md_stats ) failed
Crashing Function: mdt_counter_incr
Where to cut Backtrace:
mdt_counter_incr
mdt_statfs
tgt_handle_request0
tgt_request_handle
ptlrpc_server_handle_request
ptlrpc_main
kthread
ret_from_fork
Reports Count: 2
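
The backtrace summary above lists bare function names, while the raw call traces in the failures list carry offsets, module tags, compiler suffixes such as `.isra.0`, and unreliable `?` frames. The following is a minimal sketch of how such a summary could be derived from a raw trace. The rules are inferred only from comparing the two in this report; the name `summarize_backtrace` and the `NOISE_FRAMES` set are hypothetical and do not describe the crash database's actual implementation.

```python
import re

# Assumption: frames produced by the assertion machinery itself are dropped,
# so the summary starts at the crashing function.
NOISE_FRAMES = {"dump_stack_lvl", "lbug_with_loc"}

# Matches a frame such as "mdt_statfs+0x55b/0x8b0 [mdt]".
FRAME_RE = re.compile(r"^(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_backtrace(trace_lines):
    """Return bare function names from a kernel call trace, in order."""
    frames = []
    for line in trace_lines:
        line = line.strip()
        if line.startswith("?"):
            continue  # unreliable frame, not part of the summary
        match = FRAME_RE.match(line)
        if not match:
            continue  # e.g. "Call Trace:", "<TASK>", "</TASK>"
        # Drop compiler suffixes such as ".isra.0" or ".cold".
        func = match.group("func").split(".", 1)[0]
        if func in NOISE_FRAMES:
            continue
        frames.append(func)
    return frames


# Excerpt of the call trace from this report.
TRACE = [
    "Call Trace:",
    "<TASK>",
    "dump_stack_lvl+0x34/0x48",
    "lbug_with_loc.cold+0x5/0x43 [libcfs]",
    "mdt_counter_incr+0x188/0x190 [mdt]",
    "mdt_statfs+0x55b/0x8b0 [mdt]",
    "? tgt_request_preprocess+0x20f/0x4b0 [ptlrpc]",
    "ptlrpc_server_handle_request.isra.0+0x2a0/0xce0 [ptlrpc]",
    "</TASK>",
]

print(summarize_backtrace(TRACE))
```

Stripping offsets keeps the summary stable across kernel builds: the two reports below differ in one `ptlrpc_main` offset (`+0xa7e` versus `+0xa7b`) yet reduce to the same function list.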

Added fields:

Match messages in logs
(every line must be present in the log output;
copy from the "Messages before crash" column below):
Match messages in full crash
(every line must be present in the crash log output;
copy from the "Full Crash" column below):
Limit to a test
(copy from the "Failing Test" column below):
Delete these reports as invalid (real bug in review or some such)
Bug or comment:
Extra info:
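
The two "match messages" fields state that every entered line must be present in the corresponding log. A minimal sketch of that rule follows, assuming plain substring matching with blank lines ignored. The function name `log_matches` and the matching semantics are assumptions made for illustration; the tool's real behavior (for example regular expressions or whitespace handling) is not shown on this page.

```python
def log_matches(required_text, log_text):
    """Return True if every non-blank required line occurs in log_text."""
    required = [ln.strip() for ln in required_text.splitlines() if ln.strip()]
    # Assumption: a line "matches" when it appears as a substring of the log.
    return all(line in log_text for line in required)


# Two lines taken from the "Full Crash" column of this report.
crash_log = (
    "LustreError: 189733:0:(mdt_lproc.c:1610:mdt_counter_incr()) "
    "ASSERTION( nm->nm_md_stats ) failed:\n"
    "LustreError: 189733:0:(mdt_lproc.c:1610:mdt_counter_incr()) LBUG\n"
)

# Patterns omit the PID so they can match both reports.
patterns = "ASSERTION( nm->nm_md_stats ) failed\nmdt_counter_incr()) LBUG"

print(log_matches(patterns, crash_log))
print(log_matches(patterns + "\nmdt_statfs failed", crash_log))
```

Leaving out per-run values such as PIDs, timestamps, and request addresses lets a single set of patterns cover every report of the same crash.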

Failures list (last 100):

Failing Test | Full Crash | Messages before crash | Comment
sanity-sec test 15: test id mapping
LustreError: 189733:0:(mdt_lproc.c:1610:mdt_counter_incr()) ASSERTION( nm->nm_md_stats ) failed:
LustreError: 189733:0:(mdt_lproc.c:1610:mdt_counter_incr()) LBUG
CPU: 0 PID: 189733 Comm: mdt_out00_002 Kdump: loaded Tainted: G OE ------- --- 5.14.0-427.42.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mdt_counter_incr+0x188/0x190 [mdt]
mdt_statfs+0x55b/0x8b0 [mdt]
? tgt_request_preprocess+0x20f/0x4b0 [ptlrpc]
tgt_handle_request0+0x14a/0x770 [ptlrpc]
tgt_request_handle+0x1eb/0xb80 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2a0/0xce0 [ptlrpc]
ptlrpc_main+0xa7e/0xfa0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xe0/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.squash_uid
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.squash_gid
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.59652_0.id
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.59652_1.id
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.59652_2.id
Lustre: 8843:0:(client.c:2445:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1746449314/real 1746449314] req@ffff947b4c630d00 x1831277218547968/t0(0) o13->lustre-OST0000-osc-MDT0001@10.240.28.6@tcp:7/4 lens 224/368 e 0 to 1 dl 1746449330 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'osp-pre-0-1.0' uid:0 gid:0 projid:4294967295
Lustre: 8843:0:(client.c:2445:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Lustre: lustre-OST0000-osc-MDT0001: Connection to lustre-OST0000 (at 10.240.28.6@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 19 previous similar messages
Lustre: 8842:0:(client.c:2445:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1746449315/real 1746449315] req@ffff947b4420d6c0 x1831277218548608/t0(0) o13->lustre-OST0007-osc-MDT0003@10.240.28.6@tcp:7/4 lens 224/368 e 0 to 1 dl 1746449331 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'osp-pre-7-3.0' uid:0 gid:0 projid:4294967295
Lustre: 8842:0:(client.c:2445:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
Lustre: 8842:0:(client.c:2445:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1746449316/real 1746449316] req@ffff947b4fdd0340 x1831277218552320/t0(0) o13->lustre-OST0006-osc-MDT0001@10.240.28.6@tcp:7/4 lens 224/368 e 0 to 1 dl 1746449332 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'osp-pre-6-1.0' uid:0 gid:0 projid:4294967295
Lustre: 8842:0:(client.c:2445:ptlrpc_expire_one_request()) Skipped 10 previous similar messages
Lustre: 8842:0:(client.c:2445:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1746449318/real 1746449318] req@ffff947b4420e080 x1831277218552960/t0(0) o13->lustre-OST0005-osc-MDT0001@10.240.28.6@tcp:7/4 lens 224/368 e 0 to 1 dl 1746449334 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'osp-pre-5-1.0' uid:0 gid:0 projid:4294967295
Lustre: 8842:0:(client.c:2445:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
Lustre: 8842:0:(client.c:2445:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1746449325/real 1746449325] req@ffff947b3dff0d00 x1831277218560000/t0(0) o400->lustre-OST0006-osc-MDT0003@10.240.28.6@tcp:28/4 lens 224/224 e 0 to 1 dl 1746449341 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 projid:4294967295
Lustre: 8842:0:(client.c:2445:ptlrpc_expire_one_request()) Skipped 19 previous similar messages
Link to test
sanity-sec test 15: test id mapping
LustreError: 186025:0:(mdt_lproc.c:1610:mdt_counter_incr()) ASSERTION( nm->nm_md_stats ) failed:
LustreError: 186025:0:(mdt_lproc.c:1610:mdt_counter_incr()) LBUG
CPU: 0 PID: 186025 Comm: mdt_out00_000 Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.38.1_lustre.el9.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
lbug_with_loc.cold+0x5/0x43 [libcfs]
mdt_counter_incr+0x188/0x190 [mdt]
mdt_statfs+0x55b/0x8b0 [mdt]
? tgt_request_preprocess+0x20f/0x4b0 [ptlrpc]
tgt_handle_request0+0x14a/0x770 [ptlrpc]
tgt_request_handle+0x1eb/0xb80 [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x2a0/0xce0 [ptlrpc]
ptlrpc_main+0xa7b/0xfa0 [ptlrpc]
? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
kthread+0xe0/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.squash_uid
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.default.squash_gid
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.42148_0.id
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.42148_1.id
Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.42148_2.id
LNet: Host 10.240.40.250 reset our connection while we were sending data; it may have rebooted: rc = -104
Lustre: 8869:0:(client.c:2445:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1746449120/real 1746449120] req@ffff992d04022a40 x1831276865482624/t0(0) o400->lustre-OST0000-osc-MDT0001@10.240.40.250@tcp:28/4 lens 224/224 e 0 to 1 dl 1746449136 ref 1 fl Rpc:eXNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0 projid:4294967295
Lustre: 8869:0:(client.c:2445:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Lustre: lustre-OST0000-osc-MDT0001: Connection to lustre-OST0000 (at 10.240.40.250@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 19 previous similar messages
Autotest: Killing test framework, node(s) in the cluster crashed (lustre-reviews_review-dne-part-2_113101.30)
Autotest: Sleeping to ensure other nodes in the cluster have not crashed (lustre-reviews_review-dne-part-2_113101.30)
Lustre: 8869:0:(client.c:2445:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1746449116/real 1746449116] req@ffff992d081b9d40 x1831276865479296/t0(0) o13->lustre-OST0002-osc-MDT0003@10.240.40.250@tcp:7/4 lens 224/368 e 0 to 1 dl 1746449132 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'osp-pre-2-3.0' uid:0 gid:0 projid:4294967295
Lustre: 8869:0:(client.c:2445:ptlrpc_expire_one_request()) Skipped 16 previous similar messages
Lustre: 8870:0:(client.c:2445:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1746449117/real 1746449117] req@ffff992d081b8680 x1831276865480704/t0(0) o13->lustre-OST0001-osc-MDT0003@10.240.40.250@tcp:7/4 lens 224/368 e 0 to 1 dl 1746449133 ref 1 fl Rpc:XQr/200/ffffffff rc 0/-1 job:'osp-pre-1-3.0' uid:0 gid:0 projid:4294967295
Lustre: 8870:0:(client.c:2445:ptlrpc_expire_one_request()) Skipped 27 previous similar messages
Autotest: Test running for 130 minutes (lustre-reviews_review-dne-part-2_113101.30)
Autotest: trevis-58vm3 crashed during sanity-sec (lustre-reviews_review-dne-part-2_113101.30)
Lustre: lustre-MDT0003: haven't heard from client lustre-MDT0003-lwp-OST0000_UUID (at 10.240.40.250@tcp) in 104 seconds. I think it's dead, and I am evicting it. exp ffff992cf7508000, cur 1746449213 deadline 1746449209 last 1746449109
Link to test