Running DeepSeek on an AMD Cloud GPU
On the afternoon of April 19, 2025, I had the opportunity to attend the AMD & CSDN ROCm AI Developer Meetup, where I ran DeepSeek on an AMD cloud GPU.
Installing Open WebUI locally
I previously deployed it with containers:
https://lizhiyong.blog.csdn.net/article/details/145582453
This time I took a different approach, since a Python environment is needed. Reference:
https://lizhiyong.blog.csdn.net/article/details/127827522
After installing Anaconda, a few commands bring the service up:
conda env list
conda init
conda create -n py311 python=3.11
conda activate py311
pip install open-webui
open-webui serve
Once it starts successfully, you may need to press Ctrl+C once before the page becomes reachable:
http://127.0.0.1:8080
Server configuration
root@atl1g2r2u16gpu:/workspace# rocm-smi
ROCm System Management Interface: Concise Info
Device  Node  IDs (DID, GUID)  Temp (Junction)  Power (Socket)  Partitions (Mem, Compute, ID)  SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
0       4     0x74a1, 47045    45.0°C           156.0W          NPS1, SPX, 0                   136Mhz  900Mhz  0%   auto  750.0W  0%     0%
End of ROCm SMI Log
root@atl1g2r2u16gpu:/workspace# rocminfo
ROCk module version 6.10.5 is loaded
HSA System Attributes
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
HSA Agents *******
Agent 1
  Name: INTEL® XEON® PLATINUM 8568Y    Marketing Name: INTEL® XEON® PLATINUM 8568Y    Uuid: CPU-XX    Vendor Name: CPU
  Device Type: CPU    Node: 0    Internal Node ID: 0    Feature: None specified    Profile: FULL_PROFILE    Float Round Mode: NEAR
  Queue: Max Number 0(0x0), Min Size 0(0x0), Max Size 0(0x0), Type MULTI
  Cache Info: L1: 49152(0xc000) KB    Cacheline Size: 64(0x40)    Chip ID: 0(0x0)    ASIC Revision: 0(0x0)
  Max Clock Freq. (MHz): 4000    BDFID: 0    Compute Unit: 48    SIMDs per CU: 0    Shader Engines: 0    Shader Arrs. per Eng.: 0    WatchPts on Addr. Ranges: 1
  Pool Info: four GLOBAL pools (FINE GRAINED; EXTENDED FINE GRAINED; KERNARG, FINE GRAINED; COARSE GRAINED), each 1056335292(0x3ef665bc) KB, Allocatable: TRUE, Alloc Granule/Alignment: 4KB, Accessible by all: TRUE
  ISA Info:
Agent 2
  Name: INTEL® XEON® PLATINUM 8568Y    Marketing Name: INTEL® XEON® PLATINUM 8568Y    Uuid: CPU-XX    Vendor Name: CPU
  Device Type: CPU    Node: 1    Internal Node ID: 1    Feature: None specified    Profile: FULL_PROFILE    Float Round Mode: NEAR
  Queue: Max Number 0(0x0), Min Size 0(0x0), Max Size 0(0x0), Type MULTI
  Cache Info: L1: 49152(0xc000) KB    Cacheline Size: 64(0x40)    Chip ID: 0(0x0)    ASIC Revision: 0(0x0)
  Max Clock Freq. (MHz): 4000    BDFID: 0    Compute Unit: 48    SIMDs per CU: 0    Shader Engines: 0    Shader Arrs. per Eng.: 0    WatchPts on Addr. Ranges: 1
  Pool Info: four GLOBAL pools (FINE GRAINED; EXTENDED FINE GRAINED; KERNARG, FINE GRAINED; COARSE GRAINED), each 1056940056(0x3effa018) KB, Allocatable: TRUE, Alloc Granule/Alignment: 4KB, Accessible by all: TRUE
  ISA Info:
Agent 3
  Name: gfx942    Marketing Name: AMD Instinct MI300X    Uuid: GPU-e927e74ce22946d1    Vendor Name: AMD
  Device Type: GPU    Node: 2    Internal Node ID: 2    Feature: KERNEL_DISPATCH    Profile: BASE_PROFILE    Float Round Mode: NEAR
  Queue: Max Number 128(0x80), Min Size 64(0x40), Max Size 131072(0x20000), Type MULTI
  Cache Info: L1: 32(0x20) KB    L2: 4096(0x1000) KB    L3: 262144(0x40000) KB    Cacheline Size: 64(0x40)
  Chip ID: 29857(0x74a1)    ASIC Revision: 1(0x1)    Max Clock Freq. (MHz): 2100    BDFID: 19968
  Compute Unit: 304    SIMDs per CU: 4    Shader Engines: 32    Shader Arrs. per Eng.: 1    WatchPts on Addr. Ranges: 4
  Coherent Host Access: FALSE    Fast F16 Operation: TRUE    Wavefront Size: 64(0x40)
  Workgroup Max Size: 1024(0x400) (x/y/z: 1024 each)    Max Waves Per CU: 32(0x20)    Max Work-item Per CU: 2048(0x800)
  Grid Max Size: 4294967295(0xffffffff) (x/y/z: 4294967295 each)    Max fbarriers/Workgrp: 32
  Packet Processor uCode: 166    SDMA engine uCode: 22    IOMMU Support: None
  Pool Info: three GLOBAL pools (COARSE GRAINED; EXTENDED FINE GRAINED; FINE GRAINED), each 201310208(0xbffc000) KB, Allocatable: TRUE, Alloc Granule: 4KB (recommended 2048KB), Accessible by all: FALSE; one GROUP pool, 64(0x40) KB, Allocatable: FALSE
  ISA Info: ISA 1: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-    Machine Models: HSA_MACHINE_MODEL_LARGE    Profiles: HSA_PROFILE_BASE
  Default Rounding Mode: NEAR    Fast f16: TRUE    Workgroup Max Size: 1024(0x400)    Grid Max Size: 4294967295(0xffffffff)    FBarrier Max Size: 32
*** Done ***
root@atl1g2r2u16gpu:/workspace# free -h
               total        used        free      shared  buff/cache   available
Mem: 2.0Ti 79Gi 1.5Ti 8.0Mi 432Gi 1.9Ti
Swap: 8.0Gi 0B 8.0Gi
root@atl1g2r2u16gpu:/workspace# cpuinfo
Python Version: 3.12.9.final.0 (64 bit)
Cpuinfo Version: 9.0.0
Vendor ID Raw: GenuineIntel
Hardware Raw:
Brand Raw: INTEL® XEON® PLATINUM 8568Y
Hz Advertised Friendly: 3.1935 GHz
Hz Actual Friendly: 3.1935 GHz
Hz Advertised: (3193491000, 0)
Hz Actual: (3193491000, 0)
Arch: X86_64
Bits: 64
Count: 96
Arch String Raw: x86_64
L1 Data Cache Size: 4.5 MiB
L1 Instruction Cache Size: 3145728
L2 Cache Size: 201326592
L2 Cache Line Size: 2048
L2 Cache Associativity: 7
L3 Cache Size: 314572800
Stepping: 2
Model: 207
Family: 6
Processor Type:
Flags: 3dnowprefetch, abm, acpi, adx, aes, amx_bf16, amx_int8, amx_tile, aperfmperf, apic, arat, arch_capabilities, arch_lbr, arch_perfmon, art, avx, avx2, avx512_bf16, avx512_bitalg, avx512_fp16, avx512_vbmi2, avx512_vnni, avx512_vpopcntdq, avx512bitalg, avx512bw, avx512cd, avx512dq, avx512f, avx512ifma, avx512vbmi, avx512vbmi2, avx512vl, avx512vnni, avx512vpopcntdq, avx_vnni, bmi1, bmi2, bts, bus_lock_detect, cat_l2, cat_l3, cdp_l2, cdp_l3, cldemote, clflush, clflushopt, clwb, cmov, constant_tsc, cpuid, cpuid_fault, cqm, cqm_llc, cqm_mbm_local, cqm_mbm_total, cqm_occup_llc, cx16, cx8, dca, de, ds_cpl, dtes64, dtherm, dts, enqcmd, epb, erms, est, f16c, flush_l1d, fma, fpu, fsgsbase, fsrm, fxsr, gfni, hfi, ht, ibpb, ibrs, ibrs_enhanced, ibt, ida, intel_pt, invpcid, la57, lahf_lm, lm, mba, mca, mce, md_clear, mmx, monitor, movbe, movdir64b, movdiri, msr, mtrr, nonstop_tsc, nopl, nx, ospke, osxsave, pae, pat, pbe, pcid, pclmulqdq, pconfig, pdcm, pdpe1gb, pebs, pge, pku, pln, pni, popcnt, pqe, pqm, pse, pse36, pts, rdpid, rdrand, rdrnd, rdseed, rdt_a, rdtscp, rep_good, sdbg, sep, serialize, sgx, sgx_lc, sha, sha_ni, smap, smep, smx, split_lock_detect, ss, ssbd, sse, sse2, sse4_1, sse4_2, ssse3, stibp, syscall, tm, tm2, tme, tsc, tsc_adjust, tsc_deadline_timer, tsc_known_freq, tscdeadline, tsxldtrk, umip, user_shstk, vaes, vme, vmx, vpclmulqdq, waitpkg, wbnoinvd, x2apic, xgetbv1, xsave, xsavec, xsaveopt, xsaves, xtopology, xtpr
root@atl1g2r2u16gpu:/workspace#
Run steps
root@atl1g2r2u16gpu:/workspace# vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --host 0.0.0.0 --port $VLLM_PORT --api-key abc-123 --trust-remote-code --seed 42
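The command above reads the port from the `VLLM_PORT` environment variable. As a minimal sketch (the `build_vllm_serve_cmd` helper is hypothetical, not part of vLLM), here is how the argument list can be assembled and inspected before launching:

```python
import os
import shlex

def build_vllm_serve_cmd(model: str, port: int, api_key: str, seed: int = 42) -> list[str]:
    """Assemble the vLLM OpenAI-compatible server command line used above."""
    return [
        "vllm", "serve", model,
        "--host", "0.0.0.0",
        "--port", str(port),
        "--api-key", api_key,
        "--trust-remote-code",  # required for models shipping custom code on the Hub
        "--seed", str(seed),    # fixed seed for reproducible sampling
    ]

# Fall back to 8100 when VLLM_PORT is unset (the port that appears later in the logs).
port = int(os.environ.get("VLLM_PORT", "8100"))
cmd = build_vllm_serve_cmd("deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", port, "abc-123")
print(shlex.join(cmd))
```

This is just a convenience for scripting the launch; running the shell one-liner directly is equivalent.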
INFO 04-19 08:13:42 [__init__.py:239] Automatically detected platform rocm.
INFO 04-19 08:13:43 [api_server.py:1034] vLLM API server version 0.8.3.dev349+gb8498bc4a
INFO 04-19 08:13:43 [api_server.py:1035] args: Namespace(subparser=serve, model_tag=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, config=, host=0.0.0.0, port=8100, uvicorn_log_level=info, disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=abc-123, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format=auto, response_role=assistant, ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin=, model=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, task=auto, tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode=auto, trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format=auto, config_format=<ConfigFormat.AUTO: 'auto'>, dtype=auto, kv_cache_dtype=auto, max_model_len=None, guided_decoding_backend=xgrammar, logits_processor_pattern=None, model_impl=auto, distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, data_parallel_size=1, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, prefix_caching_hash_algo=builtin, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=42, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_token=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type=ray,
tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype=auto, long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device=auto, num_scheduler_steps=1, use_tqdm_on_load=True, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_config=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy=fcfs, scheduler_cls=vllm.core.scheduler.Scheduler, override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls=auto, worker_extension_cls=, generation_config=auto, override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_cascade_attn=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False, dispatch_function=<function ServeSubcommand.cmd at 0x730f25e351c0>)
config.json: 100%|██████████| 664/664 [00:00<00:00, 8.19MB/s]
INFO 04-19 08:13:55 [config.py:604] This model supports multiple tasks: {classify, embed, reward, score, generate}. Defaulting to generate.
INFO 04-19 08:13:55 [arg_utils.py:1735] rocm is experimental on VLLM_USE_V1=1. Falling back to V0 Engine.
WARNING 04-19 08:13:55 [arg_utils.py:1597] The model has a long context length (131072). This may cause OOM during the initial memory profiling phase, or result in low performance due to small KV cache size. Consider setting --max-model-len to a smaller value.
INFO 04-19 08:13:59 [api_server.py:246] Started engine process with PID 225
tokenizer_config.json: 100%|██████████| 3.07k/3.07k [00:00<00:00, 38.8MB/s]
tokenizer.json: 100%|██████████| 7.03M/7.03M [00:00<00:00, 27.0MB/s]
INFO 04-19 08:14:01 [__init__.py:239] Automatically detected platform rocm.
INFO 04-19 08:14:02 [llm_engine.py:242] Initializing a V0 LLM engine (v0.8.3.dev349+gb8498bc4a) with config: model=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, speculative_config=None, tokenizer=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend=xgrammar, reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=42, served_model_name=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
generation_config.json: 100%|██████████| 181/181 [00:00<00:00, 1.97MB/s]
INFO 04-19 08:14:05 [utils.py:746] Port 8100 is already in use, trying port 8101
INFO 04-19 08:14:10 [rocm.py:181] None is not supported in AMD GPUs.
INFO 04-19 08:14:10 [rocm.py:182] Using ROCmFlashAttention backend.
INFO 04-19 08:14:10 [parallel_state.py:957] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 04-19 08:14:10 [model_runner.py:1110] Starting to load model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B...
WARNING 04-19 08:14:10 [rocm.py:283] Model architecture Qwen2ForCausalLM is partially supported by ROCm: Sliding window attention (SWA) is not yet supported in Triton flash attention. For half-precision SWA support, please use CK flash attention by setting VLLM_USE_TRITON_FLASH_ATTN=0
INFO 04-19 08:14:11 [weight_utils.py:265] Using model weights format [*.safetensors]
model-00008-of-000008.safetensors: 100%|██████████| 4.07G/4.07G [00:17<00:00, 239MB/s]
model-00001-of-000008.safetensors: 100%|██████████| 8.79G/8.79G [00:35<00:00, 246MB/s]
model-00003-of-000008.safetensors: 100%|██████████| 8.78G/8.78G [00:36<00:00, 239MB/s]
model-00004-of-000008.safetensors: 100%|██████████| 8.78G/8.78G [00:37<00:00, 235MB/s]
model-00006-of-000008.safetensors: 100%|██████████| 8.78G/8.78G [00:37<00:00, 234MB/s]
model-00005-of-000008.safetensors: 100%|██████████| 8.78G/8.78G [00:37<00:00, 233MB/s]
model-00002-of-000008.safetensors: 100%|██████████| 8.78G/8.78G [00:37<00:00, 233MB/s]
model-00007-of-000008.safetensors: 100%|██████████| 8.78G/8.78G [00:37<00:00, 232MB/s]
INFO 04-19 08:14:49 [weight_utils.py:281] Time spent downloading weights for deepseek-ai/DeepSeek-R1-Distill-Qwen-32B: 37.866222 seconds
model.safetensors.index.json: 100%|██████████| 64.0k/64.0k [00:00<00:00, 47.3MB/s]
Loading safetensors checkpoint shards:   0% Completed | 0/8 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  12% Completed | 1/8 [00:03<00:21, 3.08s/it]
Loading safetensors checkpoint shards:  25% Completed | 2/8 [00:06<00:19, 3.23s/it]
Loading safetensors checkpoint shards:  38% Completed | 3/8 [00:09<00:15, 3.17s/it]
Loading safetensors checkpoint shards:  50% Completed | 4/8 [00:11<00:10, 2.54s/it]
Loading safetensors checkpoint shards:  62% Completed | 5/8 [00:14<00:08, 2.70s/it]
Loading safetensors checkpoint shards:  75% Completed | 6/8 [00:17<00:05, 2.84s/it]
Loading safetensors checkpoint shards:  88% Completed | 7/8 [00:20<00:02, 2.89s/it]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [00:23<00:00, 2.90s/it]
INFO 04-19 08:15:12 [loader.py:458] Loading weights took 23.41 seconds
INFO 04-19 08:15:12 [model_runner.py:1146] Model loading took 61.3008 GiB and 61.724195 seconds
INFO 04-19 08:15:56 [worker.py:295] Memory profiling takes 43.98 seconds
INFO 04-19 08:15:56 [worker.py:295] the current vLLM instance can use total_gpu_memory (191.98GiB) x gpu_memory_utilization (0.90) = 172.79GiB
INFO 04-19 08:15:56 [worker.py:295] model weights take 61.30GiB; non_torch_memory takes 0.96GiB; PyTorch activation peak memory takes 25.45GiB; the rest of the memory reserved for KV Cache is 85.07GiB.
INFO 04-19 08:15:56 [executor_base.py:112] # rocm blocks: 21777, # CPU blocks: 1024
INFO 04-19 08:15:56 [executor_base.py:117] Maximum concurrency for 131072 tokens per request: 2.66x
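The memory accounting in these log lines can be reproduced by hand: usable memory is total memory times the utilization factor, what remains after subtracting weights, non-torch allocations, and the activation peak becomes KV cache, and concurrency is cache capacity in tokens (block count times vLLM's default block size of 16 tokens) divided by the context length. A quick check with the numbers copied from the log (small last-digit differences versus the log come from rounding of the underlying byte counts):

```python
total_gib = 191.98        # total GPU memory reported by vLLM
util = 0.90               # gpu_memory_utilization (default)
weights = 61.30           # model weights, GiB
non_torch = 0.96          # non-torch memory, GiB
activation_peak = 25.45   # PyTorch activation peak, GiB

usable = total_gib * util                              # budget vLLM may use
kv_cache = usable - weights - non_torch - activation_peak

blocks = 21777            # "# rocm blocks" from the log
block_size = 16           # vLLM default tokens per KV block
max_model_len = 131072
concurrency = blocks * block_size / max_model_len      # max full-length requests in cache

print(round(usable, 2), round(kv_cache, 2), round(concurrency, 2))  # 172.78 85.07 2.66
```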
INFO 04-19 08:15:57 [model_runner.py:1456] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set enforce_eager=True or use --enforce-eager in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:09<00:00, 3.61it/s]
INFO 04-19 08:16:07 [model_runner.py:1598] Graph capturing finished in 10 secs, took 0.24 GiB
INFO 04-19 08:16:07 [llm_engine.py:448] init engine (profile, create kv cache, warmup model) took 54.57 seconds
WARNING 04-19 08:16:07 [config.py:1094] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with --generation-config vllm.
INFO 04-19 08:16:07 [serving_chat.py:117] Using default chat sampling params from model: {'temperature': 0.6, 'top_p': 0.95}
INFO 04-19 08:16:07 [serving_completion.py:61] Using default completion sampling params from model: {'temperature': 0.6, 'top_p': 0.95}
INFO 04-19 08:16:07 [api_server.py:1081] Starting vLLM API server on http://0.0.0.0:8100
INFO 04-19 08:16:07 [launcher.py:26] Available routes are:
INFO 04-19 08:16:07 [launcher.py:34] Route: /openapi.json, Methods: HEAD, GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /docs, Methods: HEAD, GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /docs/oauth2-redirect, Methods: HEAD, GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /redoc, Methods: HEAD, GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /health, Methods: GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /load, Methods: GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /ping, Methods: POST, GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /tokenize, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /detokenize, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/models, Methods: GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /version, Methods: GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/chat/completions, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/completions, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/embeddings, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /pooling, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /score, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/score, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/audio/transcriptions, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /rerank, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/rerank, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v2/rerank, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /invocations, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /metrics, Methods: GET
INFO: Started server process [71]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: 223.160.206.192:62698 - OPTIONS /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62698 - GET /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62699 - GET /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62700 - GET /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62701 - GET /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62702 - GET /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62703 - GET /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62704 - OPTIONS /v1/chat/completions HTTP/1.1 200 OK
INFO 04-19 08:30:50 [chat_utils.py:396] Detected the chat template content format to be string. You can set --chat-template-content-format to override this.
INFO 04-19 08:30:50 [logger.py:39] Received request chatcmpl-fa95118d98274ce79820a32a2c0fbb1f: prompt: begin▁of▁sentenceUser你是谁Assistantthink\n, params: SamplingParams(n1, presence_penalty0.0, frequency_penalty0.0, repetition_penalty1.0, temperature0.6, top_p0.95, top_k-1, min_p0.0, ppl_measurementFalse, seedNone, stop[], stop_token_ids[], bad_words[], include_stop_str_in_outputFalse, ignore_eosFalse, max_tokens131064, min_tokens0, logprobsNone, prompt_logprobsNone, skip_special_tokensTrue, spaces_between_special_tokensTrue, truncate_prompt_tokensNone, guided_decodingNone, extra_argsNone), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO: 223.160.206.192:62704 - POST /v1/chat/completions HTTP/1.1 200 OK
INFO 04-19 08:30:50 [engine.py:310] Added request chatcmpl-fa95118d98274ce79820a32a2c0fbb1f.
INFO 04-19 08:30:53 [metrics.py:489] Avg prompt throughput: 1.6 tokens/s, Avg generation throughput: 14.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 04-19 08:30:54 [logger.py:39] Received request chatcmpl-439caf5427184211b2c8a8e61a1db25d: prompt: begin▁of▁sentenceUser### Task:\nGenerate a concise, 3-5 word title with an emoji summarizing the chat history.\n### Guidelines:\n- The title should clearly represent the main theme or subject of the conversation.\n- Use emojis that enhance understanding of the topic, but avoid quotation marks or special formatting.\n- Write the title in the chat\s primary language; default to English if multilingual.\n- Prioritize accuracy over excessive creativity; keep it clear and simple.\n### Output:\nJSON format: { title: your concise title here }\n### Examples:\n- { title: Stock Market Trends },\n- { title: Perfect Chocolate Chip Recipe },\n- { title: Evolution of Music Streaming },\n- { title: Remote Work Productivity Tips },\n- { title: Artificial Intelligence in Healthcare },\n- { title: Video Game Development Insights }\n### Chat History:\nchat_history\nUSER: 你是谁\nASSISTANT: 您好我是由中国的深度求索DeepSeek公司开发的智能助手DeepSeek-R1。如您有任何任何问题我会尽我所能为您提供帮助。\n/think\n\n您好我是由中国的深度求索DeepSeek公司开发的智能助手DeepSeek-R1。如您有任何任何问题我会尽我所能为您提供帮助。\n/chat_historyAssistantthink\n, params: SamplingParams(n1, presence_penalty0.0, frequency_penalty0.0, repetition_penalty1.0, temperature0.6, top_p0.95, top_k-1, min_p0.0, ppl_measurementFalse, seedNone, stop[], stop_token_ids[], bad_words[], include_stop_str_in_outputFalse, ignore_eosFalse, max_tokens1000, min_tokens0, logprobsNone, prompt_logprobsNone, skip_special_tokensTrue, spaces_between_special_tokensTrue, truncate_prompt_tokensNone, guided_decodingNone, extra_argsNone), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO 04-19 08:30:54 [engine.py:310] Added request chatcmpl-439caf5427184211b2c8a8e61a1db25d.
INFO 04-19 08:30:58 [metrics.py:489] Avg prompt throughput: 57.5 tokens/s, Avg generation throughput: 22.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO: 223.160.206.192:62704 - POST /v1/chat/completions HTTP/1.1 200 OK
INFO 04-19 08:31:01 [logger.py:39] Received request chatcmpl-8c4f8f4fc3fd40b48f62b822c7154e05: prompt: begin▁of▁sentenceUser### Task:\nGenerate 1-3 broad tags categorizing the main themes of the chat history, along with 1-3 more specific subtopic tags.\n\n### Guidelines:\n- Start with high-level domains (e.g. Science, Technology, Philosophy, Arts, Politics, Business, Health, Sports, Entertainment, Education)\n- Consider including relevant subfields/subdomains if they are strongly represented throughout the conversation\n- If content is too short (less than 3 messages) or too diverse, use only [General]\n- Use the chat\s primary language; default to English if multilingual\n- Prioritize accuracy over specificity\n\n### Output:\nJSON format: { tags: [tag1, tag2, tag3] }\n\n### Chat History:\nchat_history\nUSER: 你是谁\nASSISTANT: 您好我是由中国的深度求索DeepSeek公司开发的智能助手DeepSeek-R1。如您有任何任何问题我会尽我所能为您提供帮助。\n/think\n\n您好我是由中国的深度求索DeepSeek公司开发的智能助手DeepSeek-R1。如您有任何任何问题我会尽我所能为您提供帮助。\n/chat_historyAssistantthink\n, params: SamplingParams(n1, presence_penalty0.0, frequency_penalty0.0, repetition_penalty1.0, temperature0.6, top_p0.95, top_k-1, min_p0.0, ppl_measurementFalse, seedNone, stop[], stop_token_ids[], bad_words[], include_stop_str_in_outputFalse, ignore_eosFalse, max_tokens130822, min_tokens0, logprobsNone, prompt_logprobsNone, skip_special_tokensTrue, spaces_between_special_tokensTrue, truncate_prompt_tokensNone, guided_decodingNone, extra_argsNone), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO 04-19 08:31:01 [engine.py:310] Added request chatcmpl-8c4f8f4fc3fd40b48f62b822c7154e05.
INFO 04-19 08:31:03 [metrics.py:489] Avg prompt throughput: 49.0 tokens/s, Avg generation throughput: 26.3 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO 04-19 08:31:08 [metrics.py:489] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 46.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO: 223.160.206.192:62704 - POST /v1/chat/completions HTTP/1.1 200 OK
INFO 04-19 08:31:20 [metrics.py:489] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 6.2 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 04-19 08:31:30 [metrics.py:489] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 04-19 08:35:21 [logger.py:39] Received request chatcmpl-bb908ecb4c1f46749eb5145d9272b85d: prompt: begin▁of▁sentenceUser你是谁Assistant\n\n您好我是由中国的深度求索DeepSeek公司开发的智能助手DeepSeek-R1。如您有任何任何问题我会尽我所能为您提供帮助。end▁of▁sentenceUser虎鲸是鱼吗Assistantthink\n, params: SamplingParams(n1, presence_penalty0.0, frequency_penalty0.0, repetition_penalty1.0, temperature0.6, top_p0.95, top_k-1, min_p0.0, ppl_measurementFalse, seedNone, stop[], stop_token_ids[], bad_words[], include_stop_str_in_outputFalse, ignore_eosFalse, max_tokens131019, min_tokens0, logprobsNone, prompt_logprobsNone, skip_special_tokensTrue, spaces_between_special_tokensTrue, truncate_prompt_tokensNone, guided_decodingNone, extra_argsNone), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO: 223.160.206.192:62706 - POST /v1/chat/completions HTTP/1.1 200 OK
INFO 04-19 08:35:21 [engine.py:310] Added request chatcmpl-bb908ecb4c1f46749eb5145d9272b85d.
INFO 04-19 08:35:25 [metrics.py:489] Avg prompt throughput: 10.6 tokens/s, Avg generation throughput: 22.8 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO 04-19 08:35:30 [metrics.py:489] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 46.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 04-19 08:35:40 [metrics.py:489] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
A containerized deployment was used here.
Port
root@atl1g2r2u16gpu:/workspace# echo $VLLM_PORT
xxxx
root@atl1g2r2u16gpu:/workspace#
The service can then be reached locally via the public IP.
Connection settings:
http://134.199.131.6:xxxx/v1
Verification
Open WebUI can access the server normally, and the token generation speed is quite good.
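Since vLLM exposes an OpenAI-compatible API, Open WebUI only needs the base URL and API key; any HTTP client can talk to the same endpoint. A sketch of the request payload, using only the standard library (the port stays masked as "xxxx", matching the article; the default temperature/top_p mirror the model's generation config shown in the logs):

```python
import json

# Placeholders from the article; replace "xxxx" with your real $VLLM_PORT value.
BASE_URL = "http://134.199.131.6:xxxx/v1"
API_KEY = "abc-123"  # must match the --api-key passed to `vllm serve`

def chat_request(prompt: str, temperature: float = 0.6, top_p: float = 0.95) -> dict:
    """Build a /v1/chat/completions payload for the served model."""
    return {
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }

headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
body = json.dumps(chat_request("你是谁"))
# POST `body` with `headers` to f"{BASE_URL}/chat/completions" using any HTTP client.
```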
GPUs supported by ROCm
See the official documentation:
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-distributions
As of June 2025, the GPUs officially supported by ROCm include:
| GPU | Architecture | LLVM target | Support |
| --- | --- | --- | --- |
| AMD Radeon RX 9070 XT | RDNA4 | gfx1201 | ✅ [5] |
| AMD Radeon RX 9070 GRE | RDNA4 | gfx1201 | ✅ [5] |
| AMD Radeon RX 9070 | RDNA4 | gfx1201 | ✅ [5] |
| AMD Radeon RX 9060 XT | RDNA4 | gfx1200 | ✅ [5] |
| AMD Radeon RX 7900 XTX | RDNA3 | gfx1100 | ✅ |
| AMD Radeon RX 7900 XT | RDNA3 | gfx1100 | ✅ |
| AMD Radeon RX 7900 GRE | RDNA3 | gfx1100 | ✅ [5] |
| AMD Radeon RX 7800 XT | RDNA3 | gfx1101 | ✅ [5] |
| AMD Radeon VII | GCN5.1 | gfx906 | ❌ |
The good news: the new RDNA4 cards are finally supported, so there is no longer any need to buy older-generation cards.
The bad news: after all this waiting, the 780M iGPU in my 7840HS is still not officially supported (though modified builds exist on GitHub):
https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html
https://github.com/ByronLeeeee/Ollama-For-AMD-Installer
https://github.com/likelovewant/ollama-for-amd
https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU
Next, I plan to try the modified Ollama build and compare it against LM Studio's Vulkan backend to see which one delivers better token generation speed.
Please credit the source when reposting: https://lizhiyong.blog.csdn.net/article/details/148623788