Running DeepSeek on an AMD Cloud GPU
On the afternoon of April 19, 2025, I had the opportunity to attend the AMD & CSDN ROCm AI Developer Meetup, where I ran DeepSeek on an AMD cloud GPU.
Installing Open WebUI locally
I previously deployed it with containers:
https://lizhiyong.blog.csdn.net/article/details/145582453
This time I took a different approach, since a Python environment is needed. Reference:
https://lizhiyong.blog.csdn.net/article/details/127827522
After installing Anaconda, a few commands bring the service up:
conda env list
conda init
conda create -n py311 python=3.11
conda activate py311
pip install open-webui
open-webui serve
Once it starts successfully, you may need to press Ctrl+C once before the page becomes reachable:
http://127.0.0.1:8080
Server configuration
root@atl1g2r2u16gpu:/workspace# rocm-smi
ROCm System Management Interface: Concise Info
Device  Node  IDs (DID, GUID)  Temp (Junction)  Power (Socket)  Partitions (Mem, Compute, ID)  SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
0       4     0x74a1, 47045    45.0°C           156.0W          NPS1, SPX, 0                   136Mhz  900Mhz  0%   auto  750.0W  0%     0%
End of ROCm SMI Log
root@atl1g2r2u16gpu:/workspace# rocminfo
ROCk module version 6.10.5 is loaded
HSA System Attributes
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
HSA Agents *******
Agent 1
  Name: INTEL® XEON® PLATINUM 8568Y    Marketing Name: INTEL® XEON® PLATINUM 8568Y    Uuid: CPU-XX    Vendor Name: CPU
  Device Type: CPU    Node: 0    Internal Node ID: 0    Feature: None specified    Profile: FULL_PROFILE    Float Round Mode: NEAR
  Queue: Max Number 0(0x0), Min Size 0(0x0), Max Size 0(0x0), Type MULTI
  Cache Info: L1: 49152(0xc000) KB    Cacheline Size: 64(0x40)    Chip ID: 0(0x0)    ASIC Revision: 0(0x0)
  Max Clock Freq. (MHz): 4000    BDFID: 0    Compute Unit: 48    SIMDs per CU: 0    Shader Engines: 0    Shader Arrs. per Eng.: 0    WatchPts on Addr. Ranges: 1
  Pool Info: four GLOBAL pools (FINE GRAINED; EXTENDED FINE GRAINED; KERNARG, FINE GRAINED; COARSE GRAINED), each 1056335292(0x3ef665bc) KB, Allocatable: TRUE, Alloc Granule/Alignment: 4KB, Accessible by all: TRUE
  ISA Info:
Agent 2
  Name: INTEL® XEON® PLATINUM 8568Y    Marketing Name: INTEL® XEON® PLATINUM 8568Y    Uuid: CPU-XX    Vendor Name: CPU
  Device Type: CPU    Node: 1    Internal Node ID: 1    Feature: None specified    Profile: FULL_PROFILE    Float Round Mode: NEAR
  Queue: Max Number 0(0x0), Min Size 0(0x0), Max Size 0(0x0), Type MULTI
  Cache Info: L1: 49152(0xc000) KB    Cacheline Size: 64(0x40)    Chip ID: 0(0x0)    ASIC Revision: 0(0x0)
  Max Clock Freq. (MHz): 4000    BDFID: 0    Compute Unit: 48    SIMDs per CU: 0    Shader Engines: 0    Shader Arrs. per Eng.: 0    WatchPts on Addr. Ranges: 1
  Pool Info: four GLOBAL pools (FINE GRAINED; EXTENDED FINE GRAINED; KERNARG, FINE GRAINED; COARSE GRAINED), each 1056940056(0x3effa018) KB, Allocatable: TRUE, Alloc Granule/Alignment: 4KB, Accessible by all: TRUE
  ISA Info:
Agent 3
  Name: gfx942    Marketing Name: AMD Instinct MI300X    Uuid: GPU-e927e74ce22946d1    Vendor Name: AMD
  Device Type: GPU    Node: 2    Internal Node ID: 2    Feature: KERNEL_DISPATCH    Profile: BASE_PROFILE    Float Round Mode: NEAR
  Queue: Max Number 128(0x80), Min Size 64(0x40), Max Size 131072(0x20000), Type MULTI
  Cache Info: L1: 32(0x20) KB    L2: 4096(0x1000) KB    L3: 262144(0x40000) KB    Cacheline Size: 64(0x40)
  Chip ID: 29857(0x74a1)    ASIC Revision: 1(0x1)    Max Clock Freq. (MHz): 2100    BDFID: 19968
  Compute Unit: 304    SIMDs per CU: 4    Shader Engines: 32    Shader Arrs. per Eng.: 1    WatchPts on Addr. Ranges: 4
  Coherent Host Access: FALSE    Fast F16 Operation: TRUE    Wavefront Size: 64(0x40)
  Workgroup Max Size: 1024(0x400) (x/y/z: 1024 each)    Max Waves Per CU: 32(0x20)    Max Work-item Per CU: 2048(0x800)
  Grid Max Size: 4294967295(0xffffffff) (x/y/z: 4294967295 each)    Max fbarriers/Workgrp: 32
  Packet Processor uCode: 166    SDMA engine uCode: 22    IOMMU Support: None
  Pool Info: three GLOBAL pools (COARSE GRAINED; EXTENDED FINE GRAINED; FINE GRAINED), each 201310208(0xbffc000) KB, Allocatable: TRUE, Alloc Granule: 4KB (recommended 2048KB), Accessible by all: FALSE; one GROUP pool, 64(0x40) KB, Allocatable: FALSE
  ISA Info: ISA 1: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-    Machine Models: HSA_MACHINE_MODEL_LARGE    Profiles: HSA_PROFILE_BASE
  Default Rounding Mode: NEAR    Fast f16: TRUE    Workgroup Max Size: 1024(0x400)    Grid Max Size: 4294967295(0xffffffff)    FBarrier Max Size: 32
*** Done ***
root@atl1g2r2u16gpu:/workspace# free -h
               total        used        free      shared  buff/cache   available
Mem: 2.0Ti 79Gi 1.5Ti 8.0Mi 432Gi 1.9Ti
Swap: 8.0Gi 0B 8.0Gi
root@atl1g2r2u16gpu:/workspace# cpuinfo
Python Version: 3.12.9.final.0 (64 bit)
Cpuinfo Version: 9.0.0
Vendor ID Raw: GenuineIntel
Hardware Raw:
Brand Raw: INTEL® XEON® PLATINUM 8568Y
Hz Advertised Friendly: 3.1935 GHz
Hz Actual Friendly: 3.1935 GHz
Hz Advertised: (3193491000, 0)
Hz Actual: (3193491000, 0)
Arch: X86_64
Bits: 64
Count: 96
Arch String Raw: x86_64
L1 Data Cache Size: 4.5 MiB
L1 Instruction Cache Size: 3145728
L2 Cache Size: 201326592
L2 Cache Line Size: 2048
L2 Cache Associativity: 7
L3 Cache Size: 314572800
Stepping: 2
Model: 207
Family: 6
Processor Type:
Flags: 3dnowprefetch, abm, acpi, adx, aes, amx_bf16, amx_int8, amx_tile, aperfmperf, apic, arat, arch_capabilities, arch_lbr, arch_perfmon, art, avx, avx2, avx512_bf16, avx512_bitalg, avx512_fp16, avx512_vbmi2, avx512_vnni, avx512_vpopcntdq, avx512bitalg, avx512bw, avx512cd, avx512dq, avx512f, avx512ifma, avx512vbmi, avx512vbmi2, avx512vl, avx512vnni, avx512vpopcntdq, avx_vnni, bmi1, bmi2, bts, bus_lock_detect, cat_l2, cat_l3, cdp_l2, cdp_l3, cldemote, clflush, clflushopt, clwb, cmov, constant_tsc, cpuid, cpuid_fault, cqm, cqm_llc, cqm_mbm_local, cqm_mbm_total, cqm_occup_llc, cx16, cx8, dca, de, ds_cpl, dtes64, dtherm, dts, enqcmd, epb, erms, est, f16c, flush_l1d, fma, fpu, fsgsbase, fsrm, fxsr, gfni, hfi, ht, ibpb, ibrs, ibrs_enhanced, ibt, ida, intel_pt, invpcid, la57, lahf_lm, lm, mba, mca, mce, md_clear, mmx, monitor, movbe, movdir64b, movdiri, msr, mtrr, nonstop_tsc, nopl, nx, ospke, osxsave, pae, pat, pbe, pcid, pclmulqdq, pconfig, pdcm, pdpe1gb, pebs, pge, pku, pln, pni, popcnt, pqe, pqm, pse, pse36, pts, rdpid, rdrand, rdrnd, rdseed, rdt_a, rdtscp, rep_good, sdbg, sep, serialize, sgx, sgx_lc, sha, sha_ni, smap, smep, smx, split_lock_detect, ss, ssbd, sse, sse2, sse4_1, sse4_2, ssse3, stibp, syscall, tm, tm2, tme, tsc, tsc_adjust, tsc_deadline_timer, tsc_known_freq, tscdeadline, tsxldtrk, umip, user_shstk, vaes, vme, vmx, vpclmulqdq, waitpkg, wbnoinvd, x2apic, xgetbv1, xsave, xsavec, xsaveopt, xsaves, xtopology, xtpr
root@atl1g2r2u16gpu:/workspace#
Run steps
root@atl1g2r2u16gpu:/workspace# vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --host 0.0.0.0 --port $VLLM_PORT --api-key abc-123 --trust-remote-code --seed 42
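The command above reads the port from the `VLLM_PORT` environment variable. As a minimal sketch (the `build_vllm_serve_cmd` helper is hypothetical, not part of vLLM), here is how the argument list can be assembled and inspected before launching:

```python
import os
import shlex

def build_vllm_serve_cmd(model: str, port: int, api_key: str, seed: int = 42) -> list[str]:
    """Assemble the vLLM OpenAI-compatible server command line used above."""
    return [
        "vllm", "serve", model,
        "--host", "0.0.0.0",
        "--port", str(port),
        "--api-key", api_key,
        "--trust-remote-code",  # required for models shipping custom code on the Hub
        "--seed", str(seed),    # fixed seed for reproducible sampling
    ]

# Fall back to 8100 when VLLM_PORT is unset (the port that appears later in the logs).
port = int(os.environ.get("VLLM_PORT", "8100"))
cmd = build_vllm_serve_cmd("deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", port, "abc-123")
print(shlex.join(cmd))
```

This is just a convenience for scripting the launch; running the shell one-liner directly is equivalent.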
INFO 04-19 08:13:42 [__init__.py:239] Automatically detected platform rocm.
INFO 04-19 08:13:43 [api_server.py:1034] vLLM API server version 0.8.3.dev349+gb8498bc4a
INFO 04-19 08:13:43 [api_server.py:1035] args: Namespace(subparser=serve, model_tag=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, config=, host=0.0.0.0, port=8100, uvicorn_log_level=info, disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=abc-123, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format=auto, response_role=assistant, ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin=, model=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, task=auto, tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode=auto, trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format=auto, config_format=<ConfigFormat.AUTO: 'auto'>, dtype=auto, kv_cache_dtype=auto, max_model_len=None, guided_decoding_backend=xgrammar, logits_processor_pattern=None, model_impl=auto, distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, data_parallel_size=1, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, prefix_caching_hash_algo=builtin, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=42, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_token=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type=ray,
tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype=auto, long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device=auto, num_scheduler_steps=1, use_tqdm_on_load=True, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_config=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy=fcfs, scheduler_cls=vllm.core.scheduler.Scheduler, override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls=auto, worker_extension_cls=, generation_config=auto, override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_cascade_attn=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False, dispatch_function=<function ServeSubcommand.cmd at 0x730f25e351c0>)
config.json: 100%|██████████| 664/664 [00:00<00:00, 8.19MB/s]
INFO 04-19 08:13:55 [config.py:604] This model supports multiple tasks: {classify, embed, reward, score, generate}. Defaulting to generate.
INFO 04-19 08:13:55 [arg_utils.py:1735] rocm is experimental on VLLM_USE_V1=1. Falling back to V0 Engine.
WARNING 04-19 08:13:55 [arg_utils.py:1597] The model has a long context length (131072). This may cause OOM during the initial memory profiling phase, or result in low performance due to small KV cache size. Consider setting --max-model-len to a smaller value.
INFO 04-19 08:13:59 [api_server.py:246] Started engine process with PID 225
tokenizer_config.json: 100%|██████████| 3.07k/3.07k [00:00<00:00, 38.8MB/s]
tokenizer.json: 100%|██████████| 7.03M/7.03M [00:00<00:00, 27.0MB/s]
INFO 04-19 08:14:01 [__init__.py:239] Automatically detected platform rocm.
INFO 04-19 08:14:02 [llm_engine.py:242] Initializing a V0 LLM engine (v0.8.3.dev349+gb8498bc4a) with config: model=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, speculative_config=None, tokenizer=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend=xgrammar, reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=42, served_model_name=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
generation_config.json: 100%|██████████| 181/181 [00:00<00:00, 1.97MB/s]
INFO 04-19 08:14:05 [utils.py:746] Port 8100 is already in use, trying port 8101
INFO 04-19 08:14:10 [rocm.py:181] None is not supported in AMD GPUs.
INFO 04-19 08:14:10 [rocm.py:182] Using ROCmFlashAttention backend.
INFO 04-19 08:14:10 [parallel_state.py:957] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 04-19 08:14:10 [model_runner.py:1110] Starting to load model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B...
WARNING 04-19 08:14:10 [rocm.py:283] Model architecture Qwen2ForCausalLM is partially supported by ROCm: Sliding window attention (SWA) is not yet supported in Triton flash attention. For half-precision SWA support, please use CK flash attention by setting VLLM_USE_TRITON_FLASH_ATTN=0
INFO 04-19 08:14:11 [weight_utils.py:265] Using model weights format [*.safetensors]
model-00008-of-000008.safetensors: 100%|██████████| 4.07G/4.07G [00:17<00:00, 239MB/s]
model-00001-of-000008.safetensors: 100%|██████████| 8.79G/8.79G [00:35<00:00, 246MB/s]
model-00003-of-000008.safetensors: 100%|██████████| 8.78G/8.78G [00:36<00:00, 239MB/s]
model-00004-of-000008.safetensors: 100%|██████████| 8.78G/8.78G [00:37<00:00, 235MB/s]
model-00006-of-000008.safetensors: 100%|██████████| 8.78G/8.78G [00:37<00:00, 234MB/s]
model-00005-of-000008.safetensors: 100%|██████████| 8.78G/8.78G [00:37<00:00, 233MB/s]
model-00002-of-000008.safetensors: 100%|██████████| 8.78G/8.78G [00:37<00:00, 233MB/s]
model-00007-of-000008.safetensors: 100%|██████████| 8.78G/8.78G [00:37<00:00, 232MB/s]
INFO 04-19 08:14:49 [weight_utils.py:281] Time spent downloading weights for deepseek-ai/DeepSeek-R1-Distill-Qwen-32B: 37.866222 seconds
model.safetensors.index.json: 100%|██████████| 64.0k/64.0k [00:00<00:00, 47.3MB/s]
Loading safetensors checkpoint shards:   0% Completed | 0/8 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  12% Completed | 1/8 [00:03<00:21, 3.08s/it]
Loading safetensors checkpoint shards:  25% Completed | 2/8 [00:06<00:19, 3.23s/it]
Loading safetensors checkpoint shards:  38% Completed | 3/8 [00:09<00:15, 3.17s/it]
Loading safetensors checkpoint shards:  50% Completed | 4/8 [00:11<00:10, 2.54s/it]
Loading safetensors checkpoint shards:  62% Completed | 5/8 [00:14<00:08, 2.70s/it]
Loading safetensors checkpoint shards:  75% Completed | 6/8 [00:17<00:05, 2.84s/it]
Loading safetensors checkpoint shards:  88% Completed | 7/8 [00:20<00:02, 2.89s/it]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [00:23<00:00, 2.90s/it]
INFO 04-19 08:15:12 [loader.py:458] Loading weights took 23.41 seconds
INFO 04-19 08:15:12 [model_runner.py:1146] Model loading took 61.3008 GiB and 61.724195 seconds
INFO 04-19 08:15:56 [worker.py:295] Memory profiling takes 43.98 seconds
INFO 04-19 08:15:56 [worker.py:295] the current vLLM instance can use total_gpu_memory (191.98GiB) x gpu_memory_utilization (0.90) = 172.79GiB
INFO 04-19 08:15:56 [worker.py:295] model weights take 61.30GiB; non_torch_memory takes 0.96GiB; PyTorch activation peak memory takes 25.45GiB; the rest of the memory reserved for KV Cache is 85.07GiB.
INFO 04-19 08:15:56 [executor_base.py:112] # rocm blocks: 21777, # CPU blocks: 1024
INFO 04-19 08:15:56 [executor_base.py:117] Maximum concurrency for 131072 tokens per request: 2.66x
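The memory accounting in these log lines can be reproduced by hand: usable memory is total memory times the utilization factor, what remains after subtracting weights, non-torch allocations, and the activation peak becomes KV cache, and concurrency is cache capacity in tokens (block count times vLLM's default block size of 16 tokens) divided by the context length. A quick check with the numbers copied from the log (small last-digit differences versus the log come from rounding of the underlying byte counts):

```python
total_gib = 191.98        # total GPU memory reported by vLLM
util = 0.90               # gpu_memory_utilization (default)
weights = 61.30           # model weights, GiB
non_torch = 0.96          # non-torch memory, GiB
activation_peak = 25.45   # PyTorch activation peak, GiB

usable = total_gib * util                              # budget vLLM may use
kv_cache = usable - weights - non_torch - activation_peak

blocks = 21777            # "# rocm blocks" from the log
block_size = 16           # vLLM default tokens per KV block
max_model_len = 131072
concurrency = blocks * block_size / max_model_len      # max full-length requests in cache

print(round(usable, 2), round(kv_cache, 2), round(concurrency, 2))  # 172.78 85.07 2.66
```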
INFO 04-19 08:15:57 [model_runner.py:1456] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set enforce_eager=True or use --enforce-eager in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:09<00:00, 3.61it/s]
INFO 04-19 08:16:07 [model_runner.py:1598] Graph capturing finished in 10 secs, took 0.24 GiB
INFO 04-19 08:16:07 [llm_engine.py:448] init engine (profile, create kv cache, warmup model) took 54.57 seconds
WARNING 04-19 08:16:07 [config.py:1094] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with --generation-config vllm.
INFO 04-19 08:16:07 [serving_chat.py:117] Using default chat sampling params from model: {'temperature': 0.6, 'top_p': 0.95}
INFO 04-19 08:16:07 [serving_completion.py:61] Using default completion sampling params from model: {'temperature': 0.6, 'top_p': 0.95}
INFO 04-19 08:16:07 [api_server.py:1081] Starting vLLM API server on http://0.0.0.0:8100
INFO 04-19 08:16:07 [launcher.py:26] Available routes are:
INFO 04-19 08:16:07 [launcher.py:34] Route: /openapi.json, Methods: HEAD, GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /docs, Methods: HEAD, GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /docs/oauth2-redirect, Methods: HEAD, GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /redoc, Methods: HEAD, GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /health, Methods: GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /load, Methods: GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /ping, Methods: POST, GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /tokenize, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /detokenize, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/models, Methods: GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /version, Methods: GET
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/chat/completions, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/completions, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/embeddings, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /pooling, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /score, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/score, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/audio/transcriptions, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /rerank, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v1/rerank, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /v2/rerank, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /invocations, Methods: POST
INFO 04-19 08:16:07 [launcher.py:34] Route: /metrics, Methods: GET
INFO: Started server process [71]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: 223.160.206.192:62698 - OPTIONS /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62698 - GET /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62699 - GET /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62700 - GET /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62701 - GET /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62702 - GET /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62703 - GET /v1/models HTTP/1.1 200 OK
INFO: 223.160.206.192:62704 - OPTIONS /v1/chat/completions HTTP/1.1 200 OK
INFO 04-19 08:30:50 [chat_utils.py:396] Detected the chat template content format to be string. You can set --chat-template-content-format to override this.
INFO 04-19 08:30:50 [logger.py:39] Received request chatcmpl-fa95118d98274ce79820a32a2c0fbb1f: prompt: begin▁of▁sentenceUser你是谁Assistantthink\n, params: SamplingParams(n1, presence_penalty0.0, frequency_penalty0.0, repetition_penalty1.0, temperature0.6, top_p0.95, top_k-1, min_p0.0, ppl_measurementFalse, seedNone, stop[], stop_token_ids[], bad_words[], include_stop_str_in_outputFalse, ignore_eosFalse, max_tokens131064, min_tokens0, logprobsNone, prompt_logprobsNone, skip_special_tokensTrue, spaces_between_special_tokensTrue, truncate_prompt_tokensNone, guided_decodingNone, extra_argsNone), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO: 223.160.206.192:62704 - POST /v1/chat/completions HTTP/1.1 200 OK
INFO 04-19 08:30:50 [engine.py:310] Added request chatcmpl-fa95118d98274ce79820a32a2c0fbb1f.
INFO 04-19 08:30:53 [metrics.py:489] Avg prompt throughput: 1.6 tokens/s, Avg generation throughput: 14.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 04-19 08:30:54 [logger.py:39] Received request chatcmpl-439caf5427184211b2c8a8e61a1db25d: prompt: begin▁of▁sentenceUser### Task:\nGenerate a concise, 3-5 word title with an emoji summarizing the chat history.\n### Guidelines:\n- The title should clearly represent the main theme or subject of the conversation.\n- Use emojis that enhance understanding of the topic, but avoid quotation marks or special formatting.\n- Write the title in the chat\s primary language; default to English if multilingual.\n- Prioritize accuracy over excessive creativity; keep it clear and simple.\n### Output:\nJSON format: { title: your concise title here }\n### Examples:\n- { title: Stock Market Trends },\n- { title: Perfect Chocolate Chip Recipe },\n- { title: Evolution of Music Streaming },\n- { title: Remote Work Productivity Tips },\n- { title: Artificial Intelligence in Healthcare },\n- { title: Video Game Development Insights }\n### Chat History:\nchat_history\nUSER: 你是谁\nASSISTANT: 您好我是由中国的深度求索DeepSeek公司开发的智能助手DeepSeek-R1。如您有任何任何问题我会尽我所能为您提供帮助。\n/think\n\n您好我是由中国的深度求索DeepSeek公司开发的智能助手DeepSeek-R1。如您有任何任何问题我会尽我所能为您提供帮助。\n/chat_historyAssistantthink\n, params: SamplingParams(n1, presence_penalty0.0, frequency_penalty0.0, repetition_penalty1.0, temperature0.6, top_p0.95, top_k-1, min_p0.0, ppl_measurementFalse, seedNone, stop[], stop_token_ids[], bad_words[], include_stop_str_in_outputFalse, ignore_eosFalse, max_tokens1000, min_tokens0, logprobsNone, prompt_logprobsNone, skip_special_tokensTrue, spaces_between_special_tokensTrue, truncate_prompt_tokensNone, guided_decodingNone, extra_argsNone), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO 04-19 08:30:54 [engine.py:310] Added request chatcmpl-439caf5427184211b2c8a8e61a1db25d.
INFO 04-19 08:30:58 [metrics.py:489] Avg prompt throughput: 57.5 tokens/s, Avg generation throughput: 22.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO: 223.160.206.192:62704 - POST /v1/chat/completions HTTP/1.1 200 OK
INFO 04-19 08:31:01 [logger.py:39] Received request chatcmpl-8c4f8f4fc3fd40b48f62b822c7154e05: prompt: begin▁of▁sentenceUser### Task:\nGenerate 1-3 broad tags categorizing the main themes of the chat history, along with 1-3 more specific subtopic tags.\n\n### Guidelines:\n- Start with high-level domains (e.g. Science, Technology, Philosophy, Arts, Politics, Business, Health, Sports, Entertainment, Education)\n- Consider including relevant subfields/subdomains if they are strongly represented throughout the conversation\n- If content is too short (less than 3 messages) or too diverse, use only [General]\n- Use the chat\s primary language; default to English if multilingual\n- Prioritize accuracy over specificity\n\n### Output:\nJSON format: { tags: [tag1, tag2, tag3] }\n\n### Chat History:\nchat_history\nUSER: 你是谁\nASSISTANT: 您好我是由中国的深度求索DeepSeek公司开发的智能助手DeepSeek-R1。如您有任何任何问题我会尽我所能为您提供帮助。\n/think\n\n您好我是由中国的深度求索DeepSeek公司开发的智能助手DeepSeek-R1。如您有任何任何问题我会尽我所能为您提供帮助。\n/chat_historyAssistantthink\n, params: SamplingParams(n1, presence_penalty0.0, frequency_penalty0.0, repetition_penalty1.0, temperature0.6, top_p0.95, top_k-1, min_p0.0, ppl_measurementFalse, seedNone, stop[], stop_token_ids[], bad_words[], include_stop_str_in_outputFalse, ignore_eosFalse, max_tokens130822, min_tokens0, logprobsNone, prompt_logprobsNone, skip_special_tokensTrue, spaces_between_special_tokensTrue, truncate_prompt_tokensNone, guided_decodingNone, extra_argsNone), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO 04-19 08:31:01 [engine.py:310] Added request chatcmpl-8c4f8f4fc3fd40b48f62b822c7154e05.
INFO 04-19 08:31:03 [metrics.py:489] Avg prompt throughput: 49.0 tokens/s, Avg generation throughput: 26.3 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO 04-19 08:31:08 [metrics.py:489] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 46.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO: 223.160.206.192:62704 - POST /v1/chat/completions HTTP/1.1 200 OK
INFO 04-19 08:31:20 [metrics.py:489] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 6.2 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 04-19 08:31:30 [metrics.py:489] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 04-19 08:35:21 [logger.py:39] Received request chatcmpl-bb908ecb4c1f46749eb5145d9272b85d: prompt: begin▁of▁sentenceUser你是谁Assistant\n\n您好我是由中国的深度求索DeepSeek公司开发的智能助手DeepSeek-R1。如您有任何任何问题我会尽我所能为您提供帮助。end▁of▁sentenceUser虎鲸是鱼吗Assistantthink\n, params: SamplingParams(n1, presence_penalty0.0, frequency_penalty0.0, repetition_penalty1.0, temperature0.6, top_p0.95, top_k-1, min_p0.0, ppl_measurementFalse, seedNone, stop[], stop_token_ids[], bad_words[], include_stop_str_in_outputFalse, ignore_eosFalse, max_tokens131019, min_tokens0, logprobsNone, prompt_logprobsNone, skip_special_tokensTrue, spaces_between_special_tokensTrue, truncate_prompt_tokensNone, guided_decodingNone, extra_argsNone), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO: 223.160.206.192:62706 - POST /v1/chat/completions HTTP/1.1 200 OK
INFO 04-19 08:35:21 [engine.py:310] Added request chatcmpl-bb908ecb4c1f46749eb5145d9272b85d.
INFO 04-19 08:35:25 [metrics.py:489] Avg prompt throughput: 10.6 tokens/s, Avg generation throughput: 22.8 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO 04-19 08:35:30 [metrics.py:489] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 46.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 04-19 08:35:40 [metrics.py:489] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
A containerized deployment was used here.
Port
root@atl1g2r2u16gpu:/workspace# echo $VLLM_PORT
xxxx
root@atl1g2r2u16gpu:/workspace#
The service can then be reached locally via the public IP.
Connection settings:
http://134.199.131.6:xxxx/v1
Verification
Open WebUI can access the server normally, and the token generation speed is quite good.
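Since vLLM exposes an OpenAI-compatible API, Open WebUI only needs the base URL and API key; any HTTP client can talk to the same endpoint. A sketch of the request payload, using only the standard library (the port stays masked as "xxxx", matching the article; the default temperature/top_p mirror the model's generation config shown in the logs):

```python
import json

# Placeholders from the article; replace "xxxx" with your real $VLLM_PORT value.
BASE_URL = "http://134.199.131.6:xxxx/v1"
API_KEY = "abc-123"  # must match the --api-key passed to `vllm serve`

def chat_request(prompt: str, temperature: float = 0.6, top_p: float = 0.95) -> dict:
    """Build a /v1/chat/completions payload for the served model."""
    return {
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }

headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
body = json.dumps(chat_request("你是谁"))
# POST `body` with `headers` to f"{BASE_URL}/chat/completions" using any HTTP client.
```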
GPUs supported by ROCm
See the official documentation:
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-distributions
As of June 2025, the GPUs officially supported by ROCm include:
| GPU | Architecture | LLVM target | Support |
| --- | --- | --- | --- |
| AMD Radeon RX 9070 XT | RDNA4 | gfx1201 | ✅ [5] |
| AMD Radeon RX 9070 GRE | RDNA4 | gfx1201 | ✅ [5] |
| AMD Radeon RX 9070 | RDNA4 | gfx1201 | ✅ [5] |
| AMD Radeon RX 9060 XT | RDNA4 | gfx1200 | ✅ [5] |
| AMD Radeon RX 7900 XTX | RDNA3 | gfx1100 | ✅ |
| AMD Radeon RX 7900 XT | RDNA3 | gfx1100 | ✅ |
| AMD Radeon RX 7900 GRE | RDNA3 | gfx1100 | ✅ [5] |
| AMD Radeon RX 7800 XT | RDNA3 | gfx1101 | ✅ [5] |
| AMD Radeon VII | GCN5.1 | gfx906 | ❌ |
The good news: the new RDNA4 cards are finally supported, so there is no longer any need to buy older-generation cards.
The bad news: after all this waiting, the 780M iGPU in my 7840HS is still not officially supported (though modified builds exist on GitHub):
https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html
https://github.com/ByronLeeeee/Ollama-For-AMD-Installer
https://github.com/likelovewant/ollama-for-amd
https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU
Next, I plan to try the modified Ollama build and compare it against LM Studio's Vulkan backend to see which one delivers better token generation speed.
Please credit the source when reposting: https://lizhiyong.blog.csdn.net/article/details/148623788