1. Parsing the model output
Suppose the network input size is 1280×768 and the number of classes is 2. yolov5 then outputs three feature maps whose shapes are [1, 3, 96, 160, 7], [1, 3, 48, 80, 7] and [1, 3, 24, 40, 7]. In other words, the model predicts (96*160 + 48*80 + 24*40) * 3 = 60480 candidate boxes in total.
In [1, 3, 96, 160, 7], the 1 is the number of input images (the batch size), the 3 is the number of anchors at this scale, and (96, 160) is the spatial size of the feature map. The 7 values are (center_x, center_y, width, height, obj_conf, class_1_prob, class_2_prob): the box center coordinates x and y, the box width and height, the confidence obj_conf that the box contains an object, and the confidences for class 1 and class 2. If class_1_prob > class_2_prob, the box is assigned to class 1. So obj_conf is the probability that the box contains an object at all, while class_1_prob is the probability that the box belongs to class 1.
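To make this layout concrete, here is a minimal sketch (with randomly generated tensors standing in for the real model outputs; the names p8, p16 and p32 are made up for illustration) that flattens the three feature maps into a single (1, 60480, 7) prediction tensor and reads the 7 fields of one box:

import torch

# stand-ins for the three yolov5 heads: (batch, anchors, H, W, 5 + num_classes)
p8  = torch.rand(1, 3, 96, 160, 7)   # stride-8 feature map
p16 = torch.rand(1, 3, 48, 80, 7)    # stride-16 feature map
p32 = torch.rand(1, 3, 24, 40, 7)    # stride-32 feature map

# flatten each head to (1, num_boxes, 7) and concatenate along the box dimension
prediction = torch.cat([p.reshape(1, -1, 7) for p in (p8, p16, p32)], dim=1)
print(prediction.shape)  # torch.Size([1, 60480, 7])

# the 7 values of one box
cx, cy, w, h, obj_conf, class_1_prob, class_2_prob = prediction[0, 0]
cls_id = int(prediction[0, 0, 5:].argmax())  # 0 -> class 1, 1 -> class 2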
2. Parsing the yolov5 post-processing
As shown in part 1, the model outputs 60480 candidate boxes, so they have to be filtered with NMS. Before NMS there is a first filtering pass that removes every box whose obj_conf is below the confidence threshold we set. Next, the combined confidence of each remaining box is computed as conf = obj_conf * max(class_1_prob, class_2_prob); this conf merges obj_conf and the class probability into a single score. A second filtering pass then removes the boxes whose conf is below the threshold, and finally the surviving boxes go through the NMS algorithm (which uses the iou_thres we set) to produce the final boxes.
Therefore, the confidence we visualize above each box is this combined score of obj_conf and class_prob.
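Here is a minimal sketch of this pipeline for a single image, assuming a prediction tensor of shape (60480, 7) as described in part 1 and using torchvision's built-in NMS; the full yolov5 source below additionally offsets boxes per class, supports multi-label mode, batching and a few other details:

import torch
import torchvision

def simple_postprocess(pred, conf_thres=0.25, iou_thres=0.45):
    # pred: (N, 7) = (center_x, center_y, width, height, obj_conf, class_1_prob, class_2_prob)
    pred = pred[pred[:, 4] > conf_thres]             # 1) drop boxes with low obj_conf
    cls_prob, cls_id = pred[:, 5:].max(dim=1)        # best class probability and its index
    conf = pred[:, 4] * cls_prob                     # 2) conf = obj_conf * max(class_prob)
    keep = conf > conf_thres                         # 3) drop boxes with low combined conf
    pred, conf, cls_id = pred[keep], conf[keep], cls_id[keep]

    # convert (cx, cy, w, h) to (x1, y1, x2, y2), the format torchvision.ops.nms expects
    boxes = torch.empty_like(pred[:, :4])
    boxes[:, 0] = pred[:, 0] - pred[:, 2] / 2
    boxes[:, 1] = pred[:, 1] - pred[:, 3] / 2
    boxes[:, 2] = pred[:, 0] + pred[:, 2] / 2
    boxes[:, 3] = pred[:, 1] + pred[:, 3] / 2

    i = torchvision.ops.nms(boxes, conf, iou_thres)  # 4) NMS with the iou_thres we set
    # rows of (x1, y1, x2, y2, conf, cls); conf is the combined score drawn above each box
    return torch.cat([boxes[i], conf[i, None], cls_id[i, None].float()], dim=1)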
Below is the NMS source code from yolov5, where these details can be checked:
# yolov5's non_max_suppression from utils/general.py; besides torch, torchvision and time,
# it relies on helpers defined elsewhere in the yolov5 repo (xywh2xyxy, box_iou, LOGGER)
import time

import torch
import torchvision


def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=(), max_det=300):
    """Non-Maximum Suppression (NMS) on inference results to reject overlapping bounding boxes

    Returns:
         list of detections, on (n,6) tensor per image [xyxy, conf, cls]
    """

    bs = prediction.shape[0]  # batch size
    nc = prediction.shape[2] - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'

    # Settings
    # min_wh = 2  # (pixels) minimum box width and height
    max_wh = 7680  # (pixels) maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 0.3 + 0.03 * bs  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    output = [torch.zeros((0, 6), device=prediction.device)] * bs
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + 5), device=x.device)
            v[:, :4] = lb[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(lb)), lb[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
        else:  # best class only
            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        elif n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        if i.shape[0] > max_det:  # limit detections
            i = i[:max_det]
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if (time.time() - t) > time_limit:
            LOGGER.warning(f'WARNING: NMS time limit {time_limit:.3f}s exceeded')
            break  # time limit exceeded

    return output
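As a usage sketch (the variable names here are made up for illustration): feeding in the (1, 60480, 7) prediction tensor from part 1 returns a list with one (n, 6) tensor of [x1, y1, x2, y2, conf, cls] per image:

# prediction: the (1, 60480, 7) tensor produced by the model for one 1280x768 image
detections = non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45)
det = detections[0]                    # (n, 6): x1, y1, x2, y2, conf, cls
for *xyxy, conf, cls in det.tolist():  # conf is obj_conf * class_prob, as explained above
    print(xyxy, conf, int(cls))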