百度糯米网站怎么做南阳哪里做网站比较好
- 作者: 五速梦信息网
- 时间: 2026年03月21日 10:03
当前位置: 首页 > news >正文
百度糯米网站怎么做,南阳哪里做网站比较好,php做网站优势,打开wordpress1.服务器环境以及配置 物理机/虚拟机/云/容器 物理机 外网/私有网络/无网络 私有网络 处理器#xff1a; PHYTIUM FT2000PLUS 2200 MHz 内存#xff1a; 128 GiB 整机类型/架构#xff1a; HIKVISION DS-V BIOS版本#xff1a; HK 601FBE02HK 网卡#xff1…
1.服务器环境以及配置 物理机/虚拟机/云/容器 物理机 外网/私有网络/无网络 私有网络 处理器 PHYTIUM FT2000PLUS 2200 MHz 内存 128 GiB 整机类型/架构 HIKVISION DS-V BIOS版本 HK 601FBE02HK 网卡 mlx5_core v5.0-0 / fw 14.29.2002 (MT_2420110004) 具体操作系统版本 (7860) Kylin-Server-V10-SP3-General-Release-2303-arm64 内核版本 4.19.90-52.22.v2207.ky10.aarch64 2.问题现象描述
服务器开机之后反复重启用户现场已经收集了日志需要分析日志。
3.问题分析
3.1 查看系统环境 cat ./soscommands/networking/ethtool-i_enp6s0f1 driver: mlx5_core version: 5.0-0firmware-version: 14.29.2002 (MT_2420110004) expansion-rom-version: bus-info: 0000:06:00.1 supports-statistics: yes supports-test: yes supports-eeprom-access: no supports-register-dump: no supports-priv-flags: yes
3.2 vmcore-dmesg分析
分析127.0.0.1-2023-09-19-07:58:37的vmcore-dmesg [1865270.773139] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865271.773142] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865272.773093] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865273.773195] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865274.773168] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865275.773147] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865276.773151] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865277.773127] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865278.773147] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865279.773174] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865280.773149] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865281.773120] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865282.773112] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865283.773139] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865284.773150] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865285.773097] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865286.773067] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry [1865287.773095] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1026764): failed to allocate command entry
分析127.0.0.1-2023-10-29-14:17:27的vmcore-dmesg [3478639.193325] mlx5_core 0000:06:00.0: wait_func:1034:(pid 4068733): MODIFY_CQ(0x403) timeout. Will cause a leak of a command resource [3478639.193333] mlx5_core 0000:06:00.0: wait_func:1034:(pid 4166044): MODIFY_CQ(0x403) timeout. Will cause a leak of a command resource [3478639.193334] mlx5_core 0000:06:00.0: wait_func:1034:(pid 2812292): MODIFY_CQ(0x403) timeout. Will cause a leak of a command resource [3478639.193342] mlx5_core 0000:06:00.0: wait_func:1034:(pid 3401828): MODIFY_CQ(0x403) timeout. Will cause a leak of a command resource [3478639.193344] mlx5_core 0000:06:00.0: wait_func:1034:(pid 3822606): MODIFY_CQ(0x403) timeout. Will cause a leak of a command resource [3478639.193354] mlx5_core 0000:06:00.0: wait_func:1034:(pid 188898): MODIFY_CQ(0x403) timeout. Will cause a leak of a command resource [3478639.193363] mlx5_core 0000:06:00.0: wait_func:1034:(pid 3288787): MODIFY_CQ(0x403) timeout. Will cause a leak of a command resource [3478639.194469] mlx5_core 0000:06:00.0: wait_func:1038:(pid 192618): MODIFY_CQ(0x403) canceled on out of queue timeout. [3478639.194487] mlx5_core 0000:06:00.0: wait_func:1038:(pid 192616): MODIFY_CQ(0x403) canceled on out of queue timeout. [3478639.194495] mlx5_core 0000:06:00.0: wait_func:1038:(pid 192615): MODIFY_CQ(0x403) canceled on out of queue timeout. [3478639.194505] mlx5_core 0000:06:00.0: wait_func:1038:(pid 192613): MODIFY_CQ(0x403) canceled on out of queue timeout. [3478639.194515] mlx5_core 0000:06:00.0: wait_func:1038:(pid 192610): MODIFY_CQ(0x403) canceled on out of queue timeout.[3478640.183135] mlx5_core 0000:06:00.0: cmd_work_handler:906:(pid 1268656): failed to allocate command entry
从日志可见不断打印failed to allocate command entry可见该打印来自于drivers/net/ethernet/mellanox/mlx5/core/cmd.c文件的cmd_work_handler函数。 分析异常打印时的函数调用流程如下所示 - mlx5_cmd_exec - cmd_exec - mlx5_cmd_invoke - cmd_work_handler - alloc_ent - find_first_bit static int alloc_ent(struct mlx5_cmd cmd) { unsigned long flags; int ret; spin_lock_irqsave(cmd-alloc_lock, flags); ret find_first_bit(cmd-bitmask, cmd-max_reg_cmds); if (ret cmd-max_reg_cmds) clear_bit(ret, cmd-bitmask); spin_unlock_irqrestore(cmd-alloc_lock, flags); return ret cmd-max_reg_cmds ? ret : -ENOMEM; } / * Find the first set bit in a memory region. */ unsigned long find_first_bit(const unsigned long *addr, unsigned long size) { unsigned long idx; for (idx 0; idx * BITS_PER_LONG size; idx) { if (addr[idx]) return min(idx * BITS_PER_LONG __ffs(addr[idx]), size); } return size; } static void free_ent(struct mlx5_cmd *cmd, int idx) { unsigned long flags; spin_lock_irqsave(cmd-alloc_lock, flags); set_bit(idx, cmd-bitmask); spin_unlock_irqrestore(cmd-alloc_lock, flags); }
从上述代码可见通过cmd-bitmask进行标记是否有还可以申请cmd-bitmask的初始化如下假设获取出来的cmd-log_sz0x80则cmd-max_reg_cmds0xFFcmd-bitmask0x1FD。应该是cmd-bitmask初始化的时候有多少bit位为1就可以alloc_ent多少次。 int mlx5_cmd_init(struct mlx5_core_dev *dev) { int size sizeof(struct mlx5_cmd_prot_block); int align roundup_pow_of_two(size); struct mlx5_cmd *cmd dev-cmd; u32 cmd_h, cmd_l; u16 cmd_if_rev; int err; int i; memset(cmd, 0, sizeof(*cmd)); cmd_if_rev cmdif_rev(dev); if (cmd_if_rev ! CMD_IF_REV) { dev_err(dev-pdev-dev, Driver cmdif rev(%d) differs from firmwares(%d)\n, CMD_IF_REV, cmd_if_rev); return -EINVAL; } cmd-pool dma_pool_create(mlx5_cmd, dev-pdev-dev, size, align, 0); if (!cmd-pool) return -ENOMEM; err alloc_cmd_page(dev, cmd); if (err) goto err_free_pool; cmd_l ioread32be(dev-iseg-cmdq_addr_l_sz) 0xff; cmd-log_sz cmd_l 4 0xf; cmd-log_stride cmd_l 0xf; if (1 cmd-log_sz MLX5_MAX_COMMANDS) { dev_err(dev-pdev-dev, firmware reports too many outstanding commands %d\n, 1 cmd-log_sz); err -EINVAL; goto err_free_page; } if (cmd-log_sz cmd-log_stride MLX5_ADAPTER_PAGE_SHIFT) { dev_err(dev-pdev-dev, command queue size overflow\n); err -EINVAL; goto err_free_page; } cmd-checksum_disabled 1; cmd-max_reg_cmds (1 cmd-log_sz) - 1; cmd-bitmask (1UL cmd-max_reg_cmds) - 1;
执行cmd_exec的时候会去调用alloc_ent找到cmd-bitmask中最近为1的位并调用clear_bit清除为0cmd传到固件执行完以后要去释放command entry调用free_ent将对应位重新标志为1。所以出现问题有两种可能性
1、当cmd在传到固件一直未能执行完而一直有cmd_exec执行被积压导致cmd-bitmask的位使用完cmd-bitmask所有位被置为了0。alloc_ent无法申请到cmd-bitmask中的标识位。
2、cmd_exec执行完后由于某种原因没能调用free_ent导致cmd-bitmask的所有位被使用完。
3.3 vmcore分析
日志中提示Internal error: SP/PC alignment exceptionPC寄存器出现了未对齐的打印。 [3478640.184122] Internal error: SP/PC alignment exception: 8a000000 [#1] SMP [3478640.184720] Modules linked in: fuse tun bridge stp llc fccore(POE) fchook(OE) bonding ip_set_hash_ip ip_set nfnetlink rfkill ipmi_ssif vfat fat rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser rdma_cm ib_umad iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx5_ib joydev ib_uverbs ib_core ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel mlx5_core megaraid_sas igb ast mlxfw dm_mirror dm_region_hash dm_log [last unloaded: ip_tables] [3478640.188127] Process kworker/u128:0 (pid: 1268656, stack limit 0x0000000074e0af67) [3478640.188820] CPU: 15 PID: 1268656 Comm: kworker/u128:0 Kdump: loaded Tainted: P OE 4.19.90-52.22.v2207.ky10.aarch64 #1 [3478640.189835] Source Version: 52f32c50138f4eec54ab90ce9775f0d75e1d6968 [3478640.190505] Hardware name: HIKVISION DS-V/N/A, BIOS 601FBE02HK 2020/04/20 [3478640.191173] Workqueue: mlx5_cmd_0000:06:00.0 cmd_work_handler [mlx5_core] [3478640.191790] pstate: 40000005 (nZcv daif -PAN -UAO) [3478640.192235] pc : 0x42[3478640.192539] lr : cmd_work_handler0x434/0x578 [mlx5_core][3478640.193029] sp : ffffda857e113d20 [3478640.193348] x29: ffffda857e113d20 x28: ffffda8553b4c400 [3478640.193838] x27: 0000000000000000 x26: ffff5607dffd1000 [3478640.194341] x25: 0000000000000000 x24: ffffda8511e001c8 [3478640.194830] x23: ffffda8511e000e0 x22: ffffda8511e00160 [3478640.195345] x21: ffffda8553b4c548 x20: 0000000000000001 [3478640.195840] x19: ffffda8511e001d8 x18: 0000000000000001 [3478640.196332] x17: 0000fffc58585308 x16: ffff5607def87070 [3478640.196812] x15: ffffffffffffffff x14: 742064656c696166 [3478640.197283] x13: 203a293635363836 x12: 323120646970283a [3478640.197769] x11: 3630393a72656c64 x10: ffff5607dffd5b70 [3478640.198278] x9 : 5f646d63203a302e x8 : ffff5607deee95d8 [3478640.198769] x7 : ffff5607e07230f8 x6 : 000000000000056b [3478640.199307] x5 : 0000000000000000 x4 : 0000000000000000 [3478640.199801] x3 : 0000000000000000 x2 : 0000000000000042 [3478640.200358] x1 : 0000000000000000 x0 : 00000000fffffff5 [3478640.200835] Call trace: [3478640.201084] 0x42 [3478640.201307] process_one_work0x1fc/0x4a8 [3478640.201705] worker_thread0x50/0x4d8 [3478640.202068] kthread0x134/0x138 [3478640.202386] ret_from_fork0x10/0x18 [3478640.202781] Code: bad PC value [3478640.203165] SMP: stopping secondary CPUs [3478640.205196] Starting crashdump kernel…
crash /usr/lib/debug/lib/modules/4.19.90-52.22.v2207.ky10.aarch64/vmlinux vmcore分析vmcore LOAD AVERAGE: 102.08, 41.38, 22.56 TASKS: 7790 NODENAME: localhost.localdomain RELEASE: 4.19.90-52.22.v2207.ky10.aarch64 VERSION: #1 SMP Tue Mar 14 11:52:45 CST 2023 MACHINE: aarch64 (unknown Mhz) MEMORY: 126.4 GB PANIC: PID: 1268656 COMMAND: kworker/u128:0 TASK: ffffdc8435076a00 [THREAD_INFO: ffffdc8435076a00] CPU: 15 STATE: TASK_RUNNING (PANIC) crash bt PID: 1268656 TASK: ffffdc8435076a00 CPU: 15 COMMAND: kworker/u128:0 #0 [ffffda857e1138c0] machine_kexec at ffff5607de7b2e8c #1 [ffffda857e113920] __crash_kexec at ffff5607de8ca148 #2 [ffffda857e113a90] crash_kexec at ffff5607de8ca258 #3 [ffffda857e113ac0] die at ffff5607de79f65c #4 [ffffda857e113b00] arm64_notify_die at ffff5607de79f964 #5 [ffffda857e113b30] do_sp_pc_abort at ffff5607de7914f8 #6 [ffffda857e113d10] el1_pc at ffff5607de792f94 #7 ffffda857e113d20 at 3e #8 [ffffda857e113db0] process_one_work at ffff5607de8226f8 #9 [ffffda857e113e00] worker_thread at ffff5607de8229f4 #10 [ffffda857e113e70] kthread at ffff5607de82a1e0
根据vmcore-dmesg中pc : 0x42 lr : cmd_work_handler0x434/0x578 [mlx5_core]查看lr寄存器执行的位置可见执行 blr x2的时候触发了空指针。从而可知问题发生在drivers/net/ethernet/mellanox/mlx5/core/cmd.c: 908 crash mod -s mlx5_core crash dis -rl cmd_work_handler0x434 |tail -n 20 0xffff56077bcf6808 cmd_work_handler1016: mrs x1, sp_el0 /usr/src/debug/kernel-4.19.90/linux-4.19.90-52.22.v2207.ky10.aarch64/drivers/net/ethernet/mellanox/mlx5/core/cmd.c: 906 0xffff56077bcf680c cmd_work_handler1020: ldr x0, [x22,#-128] 0xffff56077bcf6810 cmd_work_handler1024: adrp x2, 0xffff56077bd70000 0xffff56077bcf6814 cmd_work_handler1028: ldr w4, [x1,#1712] 0xffff56077bcf6818 cmd_work_handler1032: add x2, x2, #0xe50 0xffff56077bcf681c cmd_work_handler1036: add x2, x2, #0x7c0 0xffff56077bcf6820 cmd_work_handler1040: mov w3, #0x38a // #906 0xffff56077bcf6824 cmd_work_handler1044: adrp x1, 0xffff56077bd7e000 0xffff56077bcf6828 cmd_work_handler1048: add x1, x1, #0xcb0 0xffff56077bcf682c cmd_work_handler1052: bl 0xffff56077bd65e60 plt:_dev_err /usr/src/debug/kernel-4.19.90/linux-4.19.90-52.22.v2207.ky10.aarch64/drivers/net/ethernet/mellanox/mlx5/core/cmd.c: 9070xffff56077bcf6830 cmd_work_handler1056: ldr x2, [x28,#40]0xffff56077bcf6834 cmd_work_handler1060: cbz x2, 0xffff56077bcf6970 cmd_work_handler1376 /usr/src/debug/kernel-4.19.90/linux-4.19.90-52.22.v2207.ky10.aarch64/drivers/net/ethernet/mellanox/mlx5/core/cmd.c: 9080xffff56077bcf6838 cmd_work_handler1064: ldr x1, [x21,#-88]0xffff56077bcf683c cmd_work_handler1068: mov w0, #0xfffffff5 // #-110xffff56077bcf6840 cmd_work_handler1072: blr x2 /usr/src/debug/kernel-4.19.90/linux-4.19.90-52.22.v2207.ky10.aarch64/drivers/net/ethernet/mellanox/mlx5/core/cmd.c: 909 0xffff56077bcf6844 cmd_work_handler1076: ldr x1, [x28,#16] crash struct mlx5_cmd_work_ent是一个自定义的结构体它包含了额外的字段和数据用于处理MLX5驱动程序中的命令工作项。可见callback是一个有问题的值该值出现问题应是网卡固件导致。 crash struct mlx5_cmd_work_ent 0xffffda8553b4c400 struct mlx5_cmd_work_ent { state 1, in 0x383ef300000001, out 0x1, uout 0x0, uout_size 0, callback 0x42, cb_timeout_work { work { data { counter 0 }, 4.问题分析结果
通过上述分析同时查看相关资料https://bugzilla.redhat.com/show_bug.cgi?id1767678和https://forums.developer.nvidia.com/t/occure-mlx5-core-failed-to-allocate-command-entry-with-mcx4421a-acan-in-ubtuntu-18-04-1/206943该问题更可能是网卡固件原因。可以尝试升级网卡固件进行观察。
经查找有新的网卡固件版本为v14.32.1010。
Firmware for ConnectX®-4 Lx EN
具体的OPN号需要联系整机或网卡厂商获取再找到对应的OPN网卡固件驱动由硬件厂商进行升级 5.参考资料
1767678 – [mlx5_core] OVS offload: mlx5_core 0000:04:00.0: cmd_work_handler:855:(pid 87647): failed to allocate command entry
RHEL8: Mellanox driver [mlx5_core] causing kernel stack overflow. - Red Hat Customer Portal
Occure mlx5_core : failed to allocate command entry with MCX4421A-ACAN in Ubtuntu 18.04.1 - Software And Drivers - NVIDIA Developer Forums
https://www.mellanox.com/downloads/MFT/mft-4.21.0-99-x86_64-rpm.tgz
https://www.mellanox.com/downloads/firmware/fw-ConnectX4Lx-rel-14_32_1010-MCX4121A-ACA_Ax-UEFI-14.25.17-FlexBoot-3.6.502.bin.zip
Bug Fixes History - NVIDIA Docs
- 上一篇: 百度秒收网站网站建设技术质量指标
- 下一篇: 百度排名 网站标题展示营销型网站
相关文章
-
百度秒收网站网站建设技术质量指标
百度秒收网站网站建设技术质量指标
- 技术栈
- 2026年03月21日
-
百度免费校园网站建设网站建设的理由
百度免费校园网站建设网站建设的理由
- 技术栈
- 2026年03月21日
-
百度免费网站制作长沙优化网站多少钱
百度免费网站制作长沙优化网站多少钱
- 技术栈
- 2026年03月21日
-
百度排名 网站标题展示营销型网站
百度排名 网站标题展示营销型网站
- 技术栈
- 2026年03月21日
-
百度企业云网站建设当当网站开发系统说明
百度企业云网站建设当当网站开发系统说明
- 技术栈
- 2026年03月21日
-
百度权重等级企业网站优化怎么提高关键词排名
百度权重等级企业网站优化怎么提高关键词排名
- 技术栈
- 2026年03月21日






