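# Object Detection

# Introduction

Object detection is a core computer vision task: it finds the objects in an image or video and labels each one with a class and a position. Common algorithms include YOLO, SSD, and Faster R-CNN, which are widely used in autonomous driving, security surveillance, and industrial inspection. With deep learning, object detection now combines high accuracy with real-time speed and has become a foundation of intelligent vision systems.

SmartJavaAI fully integrates YOLOv12 as well as self-trained models based on the YOLO architecture. A link to a complete tutorial on training an object detection model with a custom dataset appears further down this page.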

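# Installation

# Maven

Add the following dependency to your project's pom.xml. For details, see the Maven setup guide.

To pull in every feature at once, use the all module instead (not recommended ❌).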

<dependency>
    <groupId>cn.smartjavaai</groupId>
    <artifactId>vision</artifactId>
    <version>1.0.27</version>
</dependency>

# Getting an Object Detection Model

DetectorModelConfig config = new DetectorModelConfig();
config.setModelEnum(DetectorModelEnum.YOLOV12_OFFICIAL_ONNX);
config.setModelPath("/Users/xxx/Documents/develop/model/vision/object/yolov12/yolov12n.onnx");
DetectorModel detectorModel = ObjectDetectionModelFactory.getInstance().getModel(config);

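# DetectorModelConfig Parameters

| Field | Type | Default | Description |
|---|---|---|---|
| modelEnum | DetectorModelEnum | YOLO11N | Object detection model enum |
| modelPath | String | | Path to the model file |
| allowedClasses | `List<String>` | | List of classes allowed in the results |
| topK | int | unlimited | Maximum number of detection results |
| threshold | double | 0.5 | Confidence threshold. Results scoring below this value are filtered out; higher values are stricter, lower values more permissive |
| maxBox | int | 8400 | An integer from 0 to 8400. A key performance parameter: it caps the number of candidate bounding boxes handled during post-processing, which speeds up inference. Values below 1000 are not recommended |
| device | DeviceEnum | CPU | Device to run on; CPU and GPU are supported |
| gpuId | int | 0 | GPU device ID; takes effect only when device is GPU |
| predictorPoolSize | int | number of CPU cores | Size of the model predictor thread pool |
| customParams | `Map<String, Object>` | none | Model-specific settings (parsed dynamically by model type) |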
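To make the interplay of allowedClasses, threshold, and topK concrete, here is a minimal plain-Java sketch of that kind of result filtering. It does not call SmartJavaAI: the `Detection` class and the `filter` method are hypothetical names, and the order of the steps is an assumption rather than the library's documented behavior.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class DetectionFilterSketch {
    // Hypothetical stand-in for one detection result (not a SmartJavaAI type).
    static final class Detection {
        final String className;
        final double score;

        Detection(String className, double score) {
            this.className = className;
            this.score = score;
        }

        double score() { return score; }
    }

    /**
     * Keeps detections whose class is allowed and whose score reaches the threshold,
     * then returns at most topK of them, highest score first.
     * Assumptions: an empty allowedClasses list allows every class; topK <= 0 means no limit.
     */
    static List<Detection> filter(List<Detection> input, List<String> allowedClasses,
                                  double threshold, int topK) {
        return input.stream()
                .filter(d -> allowedClasses.isEmpty() || allowedClasses.contains(d.className))
                .filter(d -> d.score >= threshold)
                .sorted(Comparator.comparingDouble(Detection::score).reversed())
                .limit(topK > 0 ? topK : Long.MAX_VALUE)
                .collect(Collectors.toList());
    }
}
```

Raising the threshold or lowering topK can only shrink the result list, which is why both are useful for suppressing noisy detections.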

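customParams accepts the following model-specific parameters:

| Field | Type | Default | Description |
|---|---|---|---|
| width | int | 640 | Width the image is resized to during preprocessing |
| height | int | 640 | Height the image is resized to during preprocessing |
| nmsThreshold | float | 0.5f | Non-maximum suppression (NMS) threshold; controls how aggressively overlapping boxes are removed |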
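The nmsThreshold value is easier to reason about with the underlying computation in view. The following is a minimal plain-Java sketch of greedy non-maximum suppression based on intersection-over-union (IoU). The `Box` class and the `nms` method are hypothetical, and this is the textbook algorithm, not necessarily how SmartJavaAI implements it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NmsSketch {
    // Hypothetical box type: top-left corner (x, y), size (w, h), and a confidence score.
    static final class Box {
        final double x, y, w, h, score;

        Box(double x, double y, double w, double h, double score) {
            this.x = x; this.y = y; this.w = w; this.h = h; this.score = score;
        }

        double score() { return score; }
    }

    // Intersection-over-union of two boxes, in the range [0, 1].
    static double iou(Box a, Box b) {
        double ix = Math.max(0, Math.min(a.x + a.w, b.x + b.w) - Math.max(a.x, b.x));
        double iy = Math.max(0, Math.min(a.y + a.h, b.y + b.h) - Math.max(a.y, b.y));
        double inter = ix * iy;
        double union = a.w * a.h + b.w * b.h - inter;
        return union <= 0 ? 0 : inter / union;
    }

    // Greedy NMS: visit boxes from highest to lowest score and drop any box
    // whose IoU with an already kept box exceeds nmsThreshold.
    static List<Box> nms(List<Box> boxes, double nmsThreshold) {
        List<Box> sorted = new ArrayList<>(boxes);
        sorted.sort(Comparator.comparingDouble(Box::score).reversed());
        List<Box> kept = new ArrayList<>();
        for (Box candidate : sorted) {
            boolean suppressed = false;
            for (Box k : kept) {
                if (iou(candidate, k) > nmsThreshold) {
                    suppressed = true;
                    break;
                }
            }
            if (!suppressed) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

In this sketch a lower threshold removes more boxes, because smaller overlaps are already enough to suppress a candidate.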

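# Supported Object Detection Models

YOLO series

| Model | Engine | Description | Source |
|---|---|---|---|
| YOLOV12 | OnnxRuntime | Among the most popular object detection models | GitHub |
| YOLOV11 | OnnxRuntime | Among the most popular object detection models | GitHub |
| YOLOV8 | OnnxRuntime | Among the most popular object detection models | GitHub |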

TensorFlow series

Only the following TensorFlow object detection models have been tested.

| Model | Engine | Description | Source |
|---|---|---|---|
| EfficientDet | TensorFlow | TensorFlow object detection | GitHub |
| SSD MobileNet V2 | TensorFlow | TensorFlow object detection | GitHub |
| Faster RCNN Inception Resnet V2 | TensorFlow | TensorFlow object detection | GitHub |

For more TensorFlow object detection models, see the TensorFlow object detection model list.

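SSD series

| Model | Engine | Backbone | Input size | Training dataset | Accuracy (mAP) | Inference speed | Suitable for |
|---|---|---|---|---|---|---|---|
| SSD_300_RESNET50 | PyTorch | ResNet-50 | 300×300 | COCO | Medium | Fast | Moderate accuracy requirements |
| SSD_512_RESNET50_V1_VOC | PyTorch | ResNet-50 | 512×512 | Pascal VOC | Slightly higher | Medium | Accuracy-first scenarios that can tolerate somewhat lower speed |
| SSD_512_VGG16_ATROUS_COCO | MXNet | VGG-16 | 512×512 | COCO | Fairly high | Medium | General-purpose use; some improvement on small objects |
| SSD_300_VGG16_ATROUS_VOC | MXNet | VGG-16 | 300×300 | Pascal VOC | Above medium | Fast | Tasks similar to the VOC dataset; resource-constrained environments |
| SSD_512_MOBILENET1_VOC | MXNet | MobileNet-1.0 | 512×512 | Pascal VOC | Medium | Fast | Embedded and mobile devices with very limited compute and memory |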

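# DetectorModel API Methods

# Object Detection

// Recommended
DetectionResponse detect(Image image);
// The overloads below will be removed in a future release
DetectionResponse detect(String imagePath);
DetectionResponse detect(BufferedImage image);
DetectionResponse detect(byte[] imageData);

About the Image class:

All of these interfaces use an Image class to represent an image. See the Image class documentation for details.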

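# DetectionResponse Fields

The response is not actually returned as JSON; the layout below is used only to explain the fields.

{
  "detectionInfoList": [ // list of detection results
    {
      "detectionRectangle": { // bounding rectangle
        "height": 174, // rectangle height
        "width": 147, // rectangle width
        "x": 275, // x coordinate of the top-left corner
        "y": 143 // y coordinate of the top-left corner
      },
      "objectDetInfo": { // object detection details
        "className": "person" // class name
      },
      "score": 0.8118719 // confidence score of the detection
    }
  ]
}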
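Because the rectangle is given as a top-left corner plus a width and height, downstream code often needs the opposite corner or the center point. A minimal sketch of that arithmetic follows. The `Rect` class is a hypothetical mirror of the detectionRectangle fields, not a SmartJavaAI type, and it assumes pixel coordinates with the origin at the top-left of the image.

```java
final class RectMathSketch {
    // Hypothetical mirror of the detectionRectangle fields shown above.
    static final class Rect {
        final int x, y, width, height;

        Rect(int x, int y, int width, int height) {
            this.x = x; this.y = y; this.width = width; this.height = height;
        }

        int right()      { return x + width; }        // x of the right edge
        int bottom()     { return y + height; }       // y of the bottom edge
        double centerX() { return x + width / 2.0; }  // horizontal center
        double centerY() { return y + height / 2.0; } // vertical center
        int area()       { return width * height; }   // area in square pixels
    }

    public static void main(String[] args) {
        // Values taken from the field example above.
        Rect r = new Rect(275, 143, 147, 174);
        System.out.println("bottom-right corner: (" + r.right() + ", " + r.bottom() + ")");
        System.out.println("center: (" + r.centerX() + ", " + r.centerY() + ")");
    }
}
```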

# Detecting and Drawing Results

void detectAndDraw(String imagePath, String outputPath);
Image detectAndDraw(Image image);

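Notes:

1. The model directory must contain synset.txt (the class-label file); otherwise the program cannot run. The model download on the network drive already includes this file.

# Detecting with Your Own Trained Model

Prepare two files before calling the API:

(1) The model file in ONNX format

(2) The synset.txt class-label file, placed in the same directory as the model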

DetectorModelConfig config = new DetectorModelConfig();
config.setModelEnum(DetectorModelEnum.YOLOV12_CUSTOM); // custom YOLOV12 model
// Model path; change this to the path of your own model
config.setModelPath("/Users/xxx/Documents/develop/fire_model/best.onnx");
DetectorModel detectorModel = ObjectDetectionModelFactory.getInstance().getModel(config);
// Image path; change this to the path of your own image
detectorModel.detectAndDraw("/Users/xxx/Downloads/test.jpg","output/test_detected.jpg");

Example synset.txt contents:

Each line is one class. The file must match the classes used during training.

fire
smoke
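For illustration, a class file of this shape can be read as a plain list in which the position of a line is the class index. The sketch below is plain Java and independent of SmartJavaAI: `SynsetSketch` and its methods are hypothetical names, and trimming whitespace and skipping blank lines are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

final class SynsetSketch {
    // Turns raw lines into class names; index i in the result is class id i.
    static List<String> parse(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    // Reads a synset.txt file as UTF-8 and parses it.
    static List<String> read(Path synsetFile) {
        try {
            return parse(Files.readAllLines(synsetFile, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Comparing the size of this list with the number of classes the model was trained on is a quick sanity check before running detection.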

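Notes:

1. For the training procedure, see the model training tutorial.

2. Custom models work with the other detection methods as well; this example uses only detectAndDraw.

Tutorial: training your own object detection model on a custom dataset

# Video Stream Object Detection

StreamDetector wraps the real-time object detection logic for video streams, cameras, and video files. It accepts an object detection model, so developers can add video object detection quickly.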

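# 📦 Features

- Supports multiple video source types:
  - Video streams: RTSP, RTMP, HTTP, and others
  - Cameras: local cameras
  - Video files: mp4, avi, and other formats
- Configurable frame-interval detection to reduce the compute load
- Configurable interval for repeated detections of the same target, to avoid overly frequent callbacks
- Callback interface for receiving detection results in real time
- Disconnect detection and end-of-stream notification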
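Disconnect detection is described in the usage tips as an empty-frame counter that fires once a threshold is exceeded. The following plain-Java sketch shows one way such a counter could work. The class name, the threshold parameter, and the fire-once behavior are all assumptions, since the page does not document the internal implementation.

```java
final class EmptyFrameCounterSketch {
    private final int maxConsecutiveEmpty; // hypothetical threshold
    private int consecutiveEmpty = 0;

    EmptyFrameCounterSketch(int maxConsecutiveEmpty) {
        this.maxConsecutiveEmpty = maxConsecutiveEmpty;
    }

    /**
     * Records the outcome of one frame grab.
     * Returns true only at the moment the run of empty frames first exceeds
     * the threshold, which is when a disconnect would be reported.
     */
    boolean onFrame(boolean frameIsEmpty) {
        if (!frameIsEmpty) {
            consecutiveEmpty = 0; // any real frame resets the run
            return false;
        }
        consecutiveEmpty++;
        return consecutiveEmpty == maxConsecutiveEmpty + 1;
    }
}
```

Counting consecutive empty frames, rather than reacting to a single one, tolerates brief network hiccups without reporting a disconnect.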

# Quick Start

# 1. Create a `StreamDetector`

StreamDetector detector = new StreamDetector.Builder()
        // Video source type: STREAM (video stream), CAMERA (camera), or FILE (video file)
        .sourceType(VideoSourceType.STREAM)
        // Stream URL; rtsp, rtmp, and http are supported
        .streamUrl("rtsp://username:password@ip:port/Streaming/Channels/101")
        // Run detection once every N frames (tune to the model's inference speed)
        .frameDetectionInterval(10)
        // Object detection model to use
        .detectorModel(getModel())
        // Callback listener: fires when objects are detected
        .listener(new StreamDetectionListener() {
            @Override
            public void onObjectDetected(List<DetectionInfo> detectionInfoList, Image image) {
                // Time-consuming work is best moved to a separate thread
                log.info("Number of objects detected: {}", detectionInfoList.size());
            }

            @Override
            public void onStreamEnded() {
                log.info("Video stream detection finished");
            }

            @Override
            public void onStreamDisconnected() {
                log.info("Video stream disconnected");
            }
        })
        // Minimum interval (seconds) before the same target is reported again
        .repeatGap(5)
        .build();

# 2. Start Detection

detector.startDetection();

# 3. Stop Detection

detector.stopDetection();

# 4. Release Resources

StreamDetector implements AutoCloseable. Release its resources with try-with-resources or when the application shuts down:

detector.close();
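As a reminder of how try-with-resources behaves with any AutoCloseable, here is a minimal sketch. `FakeDetector` is a hypothetical stand-in that only records the calls it receives; it is not StreamDetector and says nothing about whether the real startDetection() blocks.

```java
import java.util.ArrayList;
import java.util.List;

final class TryWithResourcesSketch {
    // Hypothetical stand-in for a detector; it only logs the calls made on it.
    static final class FakeDetector implements AutoCloseable {
        private final List<String> calls;

        FakeDetector(List<String> calls) { this.calls = calls; }

        void startDetection() { calls.add("start"); }
        void stopDetection()  { calls.add("stop"); }

        @Override
        public void close() { calls.add("close"); }
    }

    // Returns the order in which the methods were invoked.
    static List<String> run() {
        List<String> calls = new ArrayList<>();
        try (FakeDetector detector = new FakeDetector(calls)) {
            detector.startDetection();
            detector.stopDetection();
        } // close() runs here automatically, even if the body throws
        return calls;
    }
}
```

The same pattern applies to any resource that implements AutoCloseable: the explicit close() call disappears, and cleanup cannot be skipped by an early return or an exception.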

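# ⚙️ Common Configuration Options

| Parameter | Default | Description |
|---|---|---|
| `sourceType` | STREAM | Video source type: STREAM (stream), CAMERA (camera), or FILE (video file) |
| `streamUrl` | - | Stream URL or file path; required only in `STREAM` or `FILE` mode |
| `cameraIndex` | 0 | Camera index; takes effect only in `CAMERA` mode |
| `frameDetectionInterval` | 1 | Run detection once every N frames; tune to the model's inference performance |
| `repeatGap` | 5 seconds | Minimum interval before the same target triggers the callback again, to avoid repeated callbacks |
| `listener` | - | Detection callback interface; required |
| `detectorModel` | - | Detection model; required |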
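The effect of `repeatGap` can be pictured as a per-target cooldown. The sketch below is a plain-Java illustration under a loud assumption: targets are identified by a string key such as the class name. The page does not say how StreamDetector decides that two detections are the same target, and `RepeatGapSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

final class RepeatGapSketch {
    private final long gapMillis;
    // Last time (in milliseconds) a callback fired for each key.
    private final Map<String, Long> lastFired = new HashMap<>();

    RepeatGapSketch(long gapSeconds) {
        this.gapMillis = gapSeconds * 1000;
    }

    /**
     * Returns true if a callback for this key should fire at nowMillis,
     * i.e. the key has never fired or its last firing is at least one gap old.
     */
    boolean shouldFire(String key, long nowMillis) {
        Long last = lastFired.get(key);
        if (last != null && nowMillis - last < gapMillis) {
            return false; // still inside the cooldown window
        }
        lastFired.put(key, nowMillis);
        return true;
    }
}
```

Passing the clock value in as a parameter, instead of reading System.currentTimeMillis() inside the method, keeps the logic easy to test.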

# Callback Interface

public interface StreamDetectionListener {
    /**
     * Called when objects are detected
     * @param detectionInfoList object detection results
     * @param image the current frame
     */
    void onObjectDetected(List<DetectionInfo> detectionInfoList, Image image);

    /** The video stream ended normally */
    void onStreamEnded();

    /** The video stream disconnected (abnormal) */
    void onStreamDisconnected();
}
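Since the callback runs on the detection path, slow work is usually handed off to a thread pool. The following plain-Java sketch shows that handoff with a method shaped like onObjectDetected. It is not wired to SmartJavaAI: class names stand in for DetectionInfo, and the pool size and the `handle` method are hypothetical.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class AsyncCallbackSketch {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    final AtomicInteger processed = new AtomicInteger();

    // Shaped like onObjectDetected, with class names standing in for DetectionInfo.
    void onObjectDetected(List<String> classNames) {
        // Copy the data, queue the work, and return at once so detection is not blocked.
        List<String> snapshot = List.copyOf(classNames);
        workers.submit(() -> handle(snapshot));
    }

    private void handle(List<String> classNames) {
        // Slow work (database writes, HTTP calls, saving images) would go here.
        processed.addAndGet(classNames.size());
    }

    // Stops accepting work and waits up to timeoutSeconds for queued tasks to finish.
    boolean shutdown(long timeoutSeconds) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A bounded pool keeps a burst of detections from spawning unlimited threads; shutting it down alongside the detector avoids leaking worker threads.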

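# 💡 Usage Tips

- Run slow work asynchronously: avoid doing time-consuming work directly in the callback methods; submit it to a thread pool instead.
- Choose the frame interval sensibly:
  - Set it based on the video frame rate and the model's inference speed.
  - Example: at 30 FPS with roughly 300 ms per inference, the model can handle at most about 3 frames per second, so set frameDetectionInterval = 10. Detection then runs on 3 frames per second, which still covers the whole video while avoiding dropped frames.
  - As a formula: frameDetectionInterval ≈ frame rate ÷ (1000 / inference time in ms). For the example above this gives 30 ÷ 3.33 ≈ 9; rounding the throughput down to 3 whole inferences per second, as the example does, yields the more conservative value 10.
- Disconnect detection: an empty-frame counter is built in; once the count exceeds a threshold, onStreamDisconnected is called.
- Repeat-detection filtering: use repeatGap to keep the same target from triggering callbacks too often.
- What to do after a video source finishes:
  - Video file (FILE): once detection completes, call startNextVideo(videoPath) to continue with the next video file.
  - Video stream (STREAM) / camera (CAMERA): after a disconnect, first call close() to release resources, then create a new StreamDetector instance to reconnect.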
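The frame-interval arithmetic from the tips above can be written down as a small helper. This is a sketch in plain Java with hypothetical method names. It shows both the unrounded formula and the more conservative variant that first rounds throughput down to whole inferences per second, which is what reproduces the value 10 in the 30 FPS / 300 ms example. Both methods assume inferenceMs is greater than 0.

```java
final class FrameIntervalSketch {
    // Unrounded formula, rounded up: ceil(fps * inferenceMs / 1000), at least 1.
    static int exactInterval(int fps, int inferenceMs) {
        return Math.max(1, (fps * inferenceMs + 999) / 1000);
    }

    // Conservative variant: round throughput down to whole inferences per second first.
    static int conservativeInterval(int fps, int inferenceMs) {
        int inferencesPerSecond = 1000 / inferenceMs; // integer division rounds down
        if (inferencesPerSecond == 0) {
            // The model needs more than a second per inference; fall back to the exact formula.
            return exactInterval(fps, inferenceMs);
        }
        // Ceiling division: frames per second divided by inferences per second.
        return Math.max(1, (fps + inferencesPerSecond - 1) / inferencesPerSecond);
    }

    public static void main(String[] args) {
        System.out.println(exactInterval(30, 300));        // 9
        System.out.println(conservativeInterval(30, 300)); // 10
    }
}
```

The conservative value leaves some headroom for jitter in inference time, at the cost of checking slightly fewer frames.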

# Complete Example Code

Example code

# Offline Use

For offline use, see the offline usage documentation.
