Adding Swin Transformer to YOLOv8


Contents

- Abstract
- How Swin Transformer Works
- Code Implementation
- Adding It to YOLOv8, Step by Step
- YAML Configurations
  - one_swinTrans
  - three_swinTrans
- Launch Command
- Full Code Download

Abstract

Swin Transformer distinguishes itself architecturally through an innovative hierarchical attention mechanism: the attention region is partitioned into windows and attention is computed within them, which effectively reduces computational complexity. Its main structure is hierarchical, with each stage consisting of a group of basic blocks that capture feature representations at a different level, forming a staged feature-extraction process. Multi-scale attention lets the model attend to features of different sizes at the same time, improving its perception of information at different scales in an image. On multiple image-classification benchmarks, Swin Transformer performs on par with, or better than, other state-of-the-art models, achieving excellent results at comparatively low parameter counts and computational cost. Its modular design also makes it broadly applicable to other computer-vision tasks such as object detection and semantic segmentation.

How Swin Transformer Works

A key design element of Swin Transformer is the shifting of the window partition between consecutive self-attention layers, as illustrated below. The shifted windows bridge the windows of the preceding layer, providing connections among them that significantly enhance modeling power. The strategy is also efficient with respect to real-world latency: all query patches within a window share the same key set, which facilitates memory access in hardware. By contrast, earlier sliding-window self-attention approaches suffer on general-purpose hardware because each query pixel has a different key set.

[Figure: the shifted-window approach for computing self-attention in the Swin Transformer architecture]
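To make the saving concrete, the Swin paper puts the cost of global attention at Ω(MSA) = 4hwC² + 2(hw)²C and of window attention at Ω(W-MSA) = 4hwC² + 2M²hwC, for an h × w feature map with C channels and window size M: the term quadratic in hw becomes linear. A quick back-of-envelope check (the feature-map size here is purely illustrative, not from the original post):

```python
# Rough FLOPs comparison of global vs. windowed self-attention,
# using the complexity formulas from the Swin Transformer paper.
h, w, C, M = 80, 80, 256, 8  # illustrative feature map and window size

global_msa = 4 * h * w * C ** 2 + 2 * (h * w) ** 2 * C    # quadratic in h*w
window_msa = 4 * h * w * C ** 2 + 2 * M ** 2 * h * w * C  # linear in h*w

print(f"global MSA: {global_msa:.3e} FLOPs")
print(f"window MSA: {window_msa:.3e} FLOPs")  # roughly an order of magnitude cheaper here
```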
The figure below gives an overview of the Swin Transformer architecture (the tiny variant is shown). It first splits an input RGB image into non-overlapping patches with a patch-splitting module, as in ViT. Each patch is treated as a "token", whose feature is the concatenation of the raw pixel RGB values. The implementation uses a patch size of 4 × 4, so the feature dimension of each patch is 4 × 4 × 3 = 48. A linear embedding layer is applied to this raw-valued feature to project it to an arbitrary dimension, denoted C.

[Figure: Swin Transformer architecture]

Several Transformer blocks with modified self-attention computation (Swin Transformer blocks) are applied to these patch tokens. The blocks maintain the number of tokens (H/4 × W/4) and, together with the linear embedding, are referred to as "Stage 1".

[Figure: two successive Swin Transformer blocks]

A Swin Transformer block is built by replacing the standard multi-head self-attention (MSA) module in a Transformer block with a module based on shifted windows, the other layers being kept unchanged. As shown above, each block consists of a shifted-window MSA module, followed by a 2-layer MLP with a GELU non-linearity in between. A LayerNorm (LN) layer is applied before each MSA module and each MLP, and a residual connection is applied after each module.

Code Implementation

```python
# Imports needed if this code is used standalone; inside
# ultralytics/nn/modules/conv.py, torch/nn and Conv are already available.
import torch
import torch.nn as nn
import torch.nn.functional as F
from timm.models.layers import DropPath  # or vendor a DropPath implementation if timm is unavailable


class WindowAttention(nn.Module):
    # Window-based multi-head self-attention with relative position bias.

    def __init__(self, dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0., proj_drop=0.):
        super().__init__()
        self.dim = dim
        self.window_size = window_size  # Wh, Ww
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim ** -0.5

        # define a parameter table of relative position bias
        self.relative_position_bias_table = nn.Parameter(
            torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads))  # 2*Wh-1 * 2*Ww-1, nH

        # get pair-wise relative position index for each token inside the window
        coords_h = torch.arange(self.window_size[0])
        coords_w = torch.arange(self.window_size[1])
        coords = torch.stack(torch.meshgrid([coords_h, coords_w]))  # 2, Wh, Ww
        coords_flatten = torch.flatten(coords, 1)  # 2, Wh*Ww
        relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :]  # 2, Wh*Ww, Wh*Ww
        relative_coords = relative_coords.permute(1, 2, 0).contiguous()  # Wh*Ww, Wh*Ww, 2
        relative_coords[:, :, 0] += self.window_size[0] - 1  # shift to start from 0
        relative_coords[:, :, 1] += self.window_size[1] - 1
        relative_coords[:, :, 0] *= 2 * self.window_size[1] - 1
        relative_position_index = relative_coords.sum(-1)  # Wh*Ww, Wh*Ww
        self.register_buffer('relative_position_index', relative_position_index)

        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)

        nn.init.normal_(self.relative_position_bias_table, std=.02)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x, mask=None):
        B_, N, C = x.shape
        qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]  # make torchscript happy (cannot use tensor as tuple)

        q = q * self.scale
        attn = q @ k.transpose(-2, -1)

        relative_position_bias = self.relative_position_bias_table[self.relative_position_index.view(-1)].view(
            self.window_size[0] * self.window_size[1], self.window_size[0] * self.window_size[1], -1)  # Wh*Ww,Wh*Ww,nH
        relative_position_bias = relative_position_bias.permute(2, 0, 1).contiguous()  # nH, Wh*Ww, Wh*Ww
        attn = attn + relative_position_bias.unsqueeze(0)

        if mask is not None:
            nW = mask.shape[0]
            attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)
            attn = attn.view(-1, self.num_heads, N, N)
        attn = self.softmax(attn)
        attn = self.attn_drop(attn)

        try:
            x = (attn @ v).transpose(1, 2).reshape(B_, N, C)
        except RuntimeError:  # dtype mismatch under mixed precision
            x = (attn.half() @ v).transpose(1, 2).reshape(B_, N, C)

        x = self.proj(x)
        x = self.proj_drop(x)
        return x


class SwinTransformer(nn.Module):
    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1, 1)
        num_heads = c_ // 32
        self.m = SwinTransformerBlock(c_, c_, num_heads, n)
        # self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])

    def forward(self, x):
        y1 = self.m(self.cv1(x))
        y2 = self.cv2(x)
        return self.cv3(torch.cat((y1, y2), dim=1))


class SwinTransformerB(nn.Module):
    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1, 1)
        num_heads = c_ // 32
        self.m = SwinTransformerBlock(c_, c_, num_heads, n)

    def forward(self, x):
        x1 = self.cv1(x)
        y1 = self.m(x1)
        y2 = self.cv2(x1)
        return self.cv3(torch.cat((y1, y2), dim=1))


class SwinTransformerC(nn.Module):
    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(c_, c_, 1, 1)
        self.cv4 = Conv(2 * c_, c2, 1, 1)
        num_heads = c_ // 32
        self.m = SwinTransformerBlock(c_, c_, num_heads, n)

    def forward(self, x):
        y1 = self.cv3(self.m(self.cv1(x)))
        y2 = self.cv2(x)
        return self.cv4(torch.cat((y1, y2), dim=1))


class Mlp(nn.Module):

    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.SiLU, drop=0.):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x


def window_partition(x, window_size):
    B, H, W, C = x.shape
    assert H % window_size == 0 and W % window_size == 0, 'feature map h and w can not divide by window size'
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    windows = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)
    return windows


def window_reverse(windows, window_size, H, W):
    B = int(windows.shape[0] / (H * W / window_size / window_size))
    x = windows.view(B, H // window_size, W // window_size, window_size, window_size, -1)
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, H, W, -1)
    return x


class SwinTransformerLayer(nn.Module):

    def __init__(self, dim, num_heads, window_size=8, shift_size=0,
                 mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0., drop_path=0.,
                 act_layer=nn.SiLU, norm_layer=nn.LayerNorm):
        super().__init__()
        self.dim = dim
        self.num_heads = num_heads
        self.window_size = window_size
        self.shift_size = shift_size
        self.mlp_ratio = mlp_ratio
        # if min(self.input_resolution) <= self.window_size:
        #     # if window size is larger than input resolution, we don't partition windows
        #     self.shift_size = 0
        #     self.window_size = min(self.input_resolution)
        assert 0 <= self.shift_size < self.window_size, 'shift_size must in 0-window_size'

        self.norm1 = norm_layer(dim)
        self.attn = WindowAttention(
            dim, window_size=(self.window_size, self.window_size), num_heads=num_heads,
            qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)

        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

    def create_mask(self, H, W):
        # calculate attention mask for SW-MSA
        img_mask = torch.zeros((1, H, W, 1))  # 1 H W 1
        h_slices = (slice(0, -self.window_size),
                    slice(-self.window_size, -self.shift_size),
                    slice(-self.shift_size, None))
        w_slices = (slice(0, -self.window_size),
                    slice(-self.window_size, -self.shift_size),
                    slice(-self.shift_size, None))
        cnt = 0
        for h in h_slices:
            for w in w_slices:
                img_mask[:, h, w, :] = cnt
                cnt += 1

        mask_windows = window_partition(img_mask, self.window_size)  # nW, window_size, window_size, 1
        mask_windows = mask_windows.view(-1, self.window_size * self.window_size)
        attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)
        attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, float(0.0))
        return attn_mask

    def forward(self, x):
        # reshape x[b c h w] to x[b l c]
        _, _, H_, W_ = x.shape

        Padding = False
        if min(H_, W_) < self.window_size or H_ % self.window_size != 0 or W_ % self.window_size != 0:
            Padding = True
            # print(f'img_size {min(H_, W_)} is less than (or not divided by) window_size {self.window_size}, Padding.')
            pad_r = (self.window_size - W_ % self.window_size) % self.window_size
            pad_b = (self.window_size - H_ % self.window_size) % self.window_size
            x = F.pad(x, (0, pad_r, 0, pad_b))

        B, C, H, W = x.shape
        L = H * W
        x = x.permute(0, 2, 3, 1).contiguous().view(B, L, C)  # b, L, c

        # create mask, moved from __init__ to forward
        if self.shift_size > 0:
            attn_mask = self.create_mask(H, W).to(x.device)
        else:
            attn_mask = None

        shortcut = x
        x = self.norm1(x)
        x = x.view(B, H, W, C)

        # cyclic shift
        if self.shift_size > 0:
            shifted_x = torch.roll(x, shifts=(-self.shift_size, -self.shift_size), dims=(1, 2))
        else:
            shifted_x = x

        # partition windows
        x_windows = window_partition(shifted_x, self.window_size)  # nW*B, window_size, window_size, C
        x_windows = x_windows.view(-1, self.window_size * self.window_size, C)  # nW*B, window_size*window_size, C

        # W-MSA/SW-MSA
        attn_windows = self.attn(x_windows, mask=attn_mask)  # nW*B, window_size*window_size, C

        # merge windows
        attn_windows = attn_windows.view(-1, self.window_size, self.window_size, C)
        shifted_x = window_reverse(attn_windows, self.window_size, H, W)  # B H W C

        # reverse cyclic shift
        if self.shift_size > 0:
            x = torch.roll(shifted_x, shifts=(self.shift_size, self.shift_size), dims=(1, 2))
        else:
            x = shifted_x

        x = x.view(B, H * W, C)

        # FFN
        x = shortcut + self.drop_path(x)
        x = x + self.drop_path(self.mlp(self.norm2(x)))

        x = x.permute(0, 2, 1).contiguous().view(-1, C, H, W)  # b c h w

        if Padding:
            x = x[:, :, :H_, :W_]  # reverse padding

        return x


class SwinTransformerBlock(nn.Module):

    def __init__(self, c1, c2, num_heads, num_layers, window_size=8):
        super().__init__()
        self.conv = None
        if c1 != c2:
            self.conv = Conv(c1, c2)

        # remove input_resolution
        self.blocks = nn.Sequential(*[SwinTransformerLayer(dim=c2, num_heads=num_heads, window_size=window_size,
                                                           shift_size=0 if (i % 2 == 0) else window_size // 2)
                                      for i in range(num_layers)])

    def forward(self, x):
        if self.conv is not None:
            x = self.conv(x)
        x = self.blocks(x)
        return x
```
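Before wiring the block into YOLOv8, a quick shape check is worthwhile. This smoke test is my own addition, not part of the original post; it assumes the classes above (plus Ultralytics' Conv) are importable:

```python
import torch

# SwinTransformerBlock(c1, c2, num_heads, num_layers): stacked layers that
# alternate regular (shift 0) and shifted (shift window_size // 2) windows.
x = torch.randn(1, 128, 40, 40)  # B, C, H, W — 40 is divisible by the default window_size 8
block = SwinTransformerBlock(128, 128, num_heads=4, num_layers=2)
y = block(x)
print(y.shape)  # expected: torch.Size([1, 128, 40, 40]) — spatial size and channels preserved
```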
Adding It to YOLOv8, Step by Step

1. Copy the code above into ultralytics/nn/modules/conv.py.
2. Register SwinTransformer in ultralytics/nn/modules/__init__.py.
3. Register SwinTransformer in ultralytics/nn/tasks.py (two registration points).
4. SwinTransformer has now been added successfully.

A sketch of the registration edits for steps 2 and 3 is shown below.
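The exact lines vary by Ultralytics version, so treat this as a minimal sketch of where the hooks go rather than a verbatim diff:

```python
# Step 2 — ultralytics/nn/modules/__init__.py (sketch):
from .conv import SwinTransformer  # next to the existing .conv imports
# ...and append 'SwinTransformer' to the __all__ tuple.

# Step 3 — ultralytics/nn/tasks.py, inside parse_model() (sketch):
#   (a) import it alongside the other modules:
#         from ultralytics.nn.modules import SwinTransformer
#   (b) add it to the tuple that rewrites the channel arguments, next to C2f:
#         if m in (Classify, Conv, ..., C2f, SwinTransformer):
#             c1, c2 = ch[f], args[0]
#             ...
#       and to the repeat-insertion tuple, so the YAML 'repeats' column is
#       passed to the block as its n argument:
#         if m in (BottleneckCSP, C1, C2, C2f, SwinTransformer):
#             args.insert(2, n)
#             n = 1
```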
YAML Configurations

Two example configs follow: one_swinTrans swaps in a single SwinTransformer block at P3, while three_swinTrans replaces the C2f blocks at P3, P4, and P5.

one_swinTrans

```yaml
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 6  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, SwinTransformer, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]  # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 12

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [1024]]  # 21 (P5/32-large)

  - [[15, 18, 21], 1, Detect, [nc]]  # Detect(P3, P4, P5)
```
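Note that the repeats column is subject to depth scaling: with the n scale (depth 0.33), the 6 in [-1, 6, SwinTransformer, [256, True]] becomes 2 stacked layers. The snippet below mirrors the rounding rule Ultralytics' parse_model applies (a sketch of the rule, not the library code itself):

```python
# How the YAML 'repeats' column is scaled by the depth multiple,
# mirroring parse_model's rule: n = max(round(n * depth), 1) if n > 1 else n
depth = 0.33  # the 'n' scale
for repeats in (1, 3, 6):
    scaled = max(round(repeats * depth), 1) if repeats > 1 else repeats
    print(f"repeats {repeats} -> {scaled}")
# The scaled value becomes SwinTransformerBlock's num_layers, so shift_size
# alternates between 0 and window_size // 2 across the stacked layers.
```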

three_swinTrans

```yaml
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 6  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, SwinTransformer, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, SwinTransformer, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32
  - [-1, 3, SwinTransformer, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]  # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 12

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [1024]]  # 21 (P5/32-large)

  - [[15, 18, 21], 1, Detect, [nc]]  # Detect(P3, P4, P5)
```

Launch Command

```python
from ultralytics import YOLO

# Load a model — pick ONE of the lines below (each assignment replaces the previous one)
model = YOLO('yolov8s.yaml')  # build a stock model from YAML
model = YOLO('/ultralytics/cfg/models/v8/yolov8_swinTrans.yaml')  # build the modified model from the new YAML
model = YOLO('yolov8s.yaml').load('yolov8s.pt')  # build from YAML and transfer weights

# Train the model
if __name__ == '__main__':
    model.train()
```
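A real run needs at least a dataset config; something like the following works (the dataset path and hyperparameters below are placeholders, not from the original post):

```python
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('/ultralytics/cfg/models/v8/yolov8_swinTrans.yaml')
    model.train(
        data='coco128.yaml',  # replace with your dataset yaml
        epochs=100,
        imgsz=640,
        batch=16,
    )
```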

Full Code Download

https://download.csdn.net/download/m0_6764732188890624