EVO ONO Model Acceleration#

Introduction#

FastDeploy is an all-scenario, easy-to-use, flexible, and highly efficient AI inference deployment toolkit that supports cloud, edge, and on-device deployment. It provides out-of-the-box deployment for 160+ text, vision, speech, and cross-modal models, with end-to-end inference performance optimization. It covers dozens of task scenarios such as object detection, OCR, face recognition, portrait matting, multi-object tracking, NLP, Stable Diffusion text-to-image generation, and TTS, meeting developers' multi-scenario, multi-hardware, and multi-platform production deployment needs.

Deployment#

This document uses an NVIDIA Jetson NX as the test platform; the steps are similar on other Jetson platforms.

Before deploying, TensorRT and CUDA must already be installed on the device (they are normally included with JetPack); refer to the official NVIDIA documentation for installation instructions.
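
A quick way to confirm TensorRT is present (a minimal check, assuming the JetPack image ships the TensorRT Python bindings, i.e. python3-libnvinfer):

# Prints the TensorRT version if the JetPack Python bindings are installed
import tensorrt
print("TensorRT version:", tensorrt.__version__)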

Note: all of the following steps are performed on the device.

Download the Source Code#

The commit used in this document is 9689bf5fce94cba1aa732a30793916cde43b4ba1:

git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy && git checkout 9689bf5fce94cba1aa732a30793916cde43b4ba1 && cd ..

Configure & Build#

Download the Paddle Inference C++ library matching your JetPack version from the link below and extract it into a directory at the same level as FastDeploy:

https://www.paddlepaddle.org.cn/inference/v2.4/guides/install/download_lib.html#c

C++ Deployment#

cd FastDeploy
mkdir build && cd build 

cmake .. -DBUILD_ON_JETSON=ON \
         -DENABLE_TRT_BACKEND=ON \
         -DENABLE_VISION=ON \
         -DENABLE_TEXT=ON \
         -DENABLE_PADDLE_BACKEND=ON -DPADDLEINFERENCE_DIRECTORY=${PWD}/../paddle_inference_install_dir \
         -DCMAKE_INSTALL_PREFIX=${PWD}/installed_fastdeploy
# build
make -j4
# install
make install

Python Deployment#

cd FastDeploy/python
export BUILD_ON_JETSON=ON
export ENABLE_VISION=ON

# ENABLE_PADDLE_BACKEND & PADDLEINFERENCE_DIRECTORY are optional
export ENABLE_PADDLE_BACKEND=ON
export PADDLEINFERENCE_DIRECTORY=/Download/paddle_inference_jetson

python setup.py build
python setup.py bdist_wheel
pip3 install dist/fastdeploy_gpu_python-0.0.0-cp36-cp36m-linux_aarch64.whl
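
After installation, a quick import check confirms the wheel is usable (the version string may simply read 0.0.0 for a locally built wheel):

import fastdeploy
print(fastdeploy.__version__)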

Model Inference Tests#

Preparation#

Source the environment script from the installed SDK so that the FastDeploy and third-party libraries can be found at runtime:

cd installed_fastdeploy && source fastdeploy_init.sh

YOLOV5#

The model and test image can be downloaded following the instructions in the example's README.

Note: the Paddle format model is used here; the ONNX model does not work properly on this setup.

python#

# Run inference on the GPU with TensorRT
python infer.py --model yolov5s_infer --image 000000014439.jpg --device gpu --use_trt True
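
Under the hood, infer.py builds a RuntimeOption with the TensorRT backend and runs the detector through the FastDeploy Python API. A minimal sketch of the equivalent calls (the model/params file names inside yolov5s_infer are assumed to follow the standard Paddle export layout):

import cv2
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu()
option.use_trt_backend()

# Load the exported Paddle model (file names assumed: model.pdmodel / model.pdiparams)
model = fd.vision.detection.YOLOv5(
    "yolov5s_infer/model.pdmodel",
    "yolov5s_infer/model.pdiparams",
    runtime_option=option,
    model_format=fd.ModelFormat.PADDLE)

im = cv2.imread("000000014439.jpg")
result = model.predict(im)
print(result)  # prints a DetectionResult like the one shown below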

c++#

Build

cd examples/vision/detection/yolov5/cpp/
# build the test code
mkdir build && cd build

cmake .. -DFASTDEPLOY_INSTALL_DIR=../../../../../
make

Run

./infer_demo yolov5s_infer 000000014439.jpg 3

The result is as follows:#

DetectionResult: [xmin, ymin, xmax, ymax, score, label_id]
104.669334,46.087219, 127.901680, 94.499176, 0.856947, 0
157.721054,81.559006, 198.065811, 167.143372, 0.854261, 0
378.751434,40.270676, 395.942841, 83.398239, 0.830164, 0
267.751617,82.657181, 298.819305, 170.920868, 0.828556, 0
503.086304,112.830048, 592.226196, 276.320038, 0.783002, 0
362.641418,57.056015, 382.677429, 114.379959, 0.777170, 0
582.803772,112.910446, 613.015442, 201.026489, 0.766368, 0
327.868683,38.758179, 346.338654, 79.359100, 0.758027, 0
414.737396,89.913757, 504.320709, 285.370178, 0.717283, 0
186.462357,45.325516, 199.852371, 61.371841, 0.545427, 0
2.531250,151.546875, 38.812500, 173.625000, 0.537435, 24
351.980316,44.242645, 367.510651, 95.554779, 0.515579, 0
168.954941,47.224487, 178.287460, 60.941376, 0.496691, 0
163.140625,86.109375, 403.484375, 342.812500, 0.449054, 33
58.218750,153.437500, 102.375000, 174.234375, 0.394597, 24
71.406250,122.562500, 101.718750, 155.343750, 0.358583, 56
24.718750,117.343750, 59.531250, 152.937500, 0.317215, 24
65.265625,134.703125, 87.390625, 153.843750, 0.297299, 24
3.765625,134.781250, 41.828125, 153.421875, 0.269623, 24
465.328796,14.773834, 472.708252, 34.129822, 0.265332, 0

YOLOV7#

The model and test image can be downloaded following the instructions in the example's README.

python#

Run

python infer.py --model yolov7.onnx --image 000000014439.jpg --device gpu --use_trt True

c++#

Build

cd examples/vision/detection/yolov7/cpp/
# build the test code
mkdir build && cd build

cmake .. -DFASTDEPLOY_INSTALL_DIR=../../../../../
make

Run

./infer_demo yolov7.onnx 000000014439.jpg 2
# using the Paddle model
./infer_demo yolov7_infer 000000014439.jpg 3

The result is as follows:

DetectionResult: [xmin, ymin, xmax, ymax, score, label_id]
267.634705,88.168289, 298.606628, 169.180908, 0.894812, 0
414.397614,87.049408, 505.852631, 285.574371, 0.892970, 0
504.979309,112.990097, 594.087708, 271.906555, 0.887881, 0
103.902000,45.589203, 127.782860, 93.685974, 0.885139, 0
349.076599,43.947617, 366.672729, 97.737900, 0.860225, 0
164.047516,81.687592, 198.846161, 165.900269, 0.856663, 0
363.723083,58.837402, 381.935852, 114.391418, 0.852691, 0
327.998444,38.783173, 347.394501, 80.100067, 0.850448, 0
379.129486,39.812363, 395.328766, 84.123154, 0.831067, 0
162.015625,81.828125, 609.968750, 342.750000, 0.823380, 33
581.765564,113.130280, 612.558289, 193.391388, 0.762796, 0
26.318138,117.764587, 64.740952, 153.909332, 0.751199, 0
2.953125,150.718750, 38.109375, 172.796875, 0.629195, 24
75.281250,121.968750, 106.593750, 156.250000, 0.617437, 56
169.509720,47.386780, 178.708725, 61.201477, 0.532830, 0
64.968750,135.203125, 84.312500, 154.921875, 0.516204, 24
187.483490,44.773804, 199.905548, 61.227783, 0.512023, 0
100.765625,152.078125, 119.953125, 168.484375, 0.425604, 24
0.039804,125.777222, 8.523237, 171.827026, 0.398506, 0
279.960571,81.092270, 296.699524, 110.481308, 0.319030, 0
464.447540,15.560196, 471.513275, 33.871872, 0.282174, 0
396.453125,168.343750, 617.531250, 202.843750, 0.260488, 33

resnet#

Build:

cd examples/vision/classification/resnet/cpp
# build the test code
mkdir build && cd build
cmake .. -DFASTDEPLOY_INSTALL_DIR=../../../../../
make

Run the test:

# using the ONNX model
./infer_demo resnet50.onnx ILSVRC2012_val_00000010.jpeg 2

The result is as follows:

ClassifyResult( label_ids: 332,  scores: 0.825349,  )
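
The same classification can also be driven from Python; a minimal sketch, assuming the Python binding fastdeploy.vision.classification.ResNet mirrors the C++ class used by infer_demo:

import cv2
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu()
option.use_trt_backend()

# ONNX is the default model format for this class, so no params file is needed
model = fd.vision.classification.ResNet("resnet50.onnx", runtime_option=option)

im = cv2.imread("ILSVRC2012_val_00000010.jpeg")
result = model.predict(im)
print(result)  # ClassifyResult with the top ImageNet label id and score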

Optimization Tips#

The first run of a demo takes a long time to load because the TensorRT engine is built from scratch; serializing the engine to a cache file reduces the load time on subsequent runs:

int main(int argc, char* argv[]) {
  google::ParseCommandLineFlags(&argc, &argv, true);
  auto option = fastdeploy::RuntimeOption();
  // modify the option: cache the serialized TensorRT engine to cut load time
  option.trt_option.serialize_file = "./picodet.trt";
  if (!CreateRuntimeOption(&option)) {
    PrintUsage();
    return -1;
  }

  auto model = fastdeploy::vision::headpose::FSANet(FLAGS_model, "", option);
  if (!model.Initialized()) {
    std::cerr << "Failed to initialize." << std::endl;
    return -1;
  }
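
The same engine caching can be configured from Python; a minimal sketch, assuming the Python RuntimeOption exposes the same trt_option.serialize_file field as the C++ code above (the cache file name is arbitrary):

import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu()
option.use_trt_backend()
# On the first run the built TensorRT engine is written to this file;
# later runs load it instead of rebuilding, which shortens startup considerably.
option.trt_option.serialize_file = "./yolov5s.trt"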