Performance Data

See benchmark_tools for how to run these benchmarks; the one-click benchmark is the recommended approach.

ARM Test Environment

  • Test models
    • fp32 models
      • mobilenet_v1
      • mobilenet_v2
      • squeezenet_v1.1
      • mnasnet
      • shufflenet_v2
    • int8 models
      • mobilenet_v1
      • mobilenet_v2
  • Test devices (Android NDK: ndk-r17c)
    • Snapdragon 855
      • xiaomi mi9, snapdragon 855 (enable sdot instruction)
      • 4xA76(1@2.84GHz + 3@2.4GHz) + 4xA55@1.78GHz
    • Snapdragon 845
      • xiaomi mi8, snapdragon 845
      • 2.8GHz (4 big cores), 1.7GHz (4 little cores)
    • Snapdragon 835
      • xiaomi mix2, snapdragon 835
      • 2.45GHz (4 big cores), 1.9GHz (4 little cores)
    • Kirin 970
      • HUAWEI Mate10
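  • Test notes
    • branch: release/v2.6.0
    • warmup=10, repeats=30; the reported value is the average time, in ms
    • When the thread count is 1, DeviceInfo::Global().SetRunMode is set to LITE_POWER_HIGH; otherwise it is set to LITE_POWER_NO_BIND
    • The model input image has dimensions {1, 3, 224, 224}, and every element of the input is set to 1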
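The measurement procedure described in the test notes can be sketched as follows. This is a minimal illustration in Python, not the code used by benchmark_tools: `run_once`, `average_latency_ms`, `make_input`, and `pick_power_mode` are hypothetical names, and the power modes are represented only by their names as strings.

```python
import time

WARMUP = 10   # untimed warmup runs, as stated in the test notes
REPEATS = 30  # timed runs whose mean is reported


def pick_power_mode(threads):
    """Return the name of the power mode the notes describe for a thread count."""
    return "LITE_POWER_HIGH" if threads == 1 else "LITE_POWER_NO_BIND"


def make_input(shape=(1, 3, 224, 224)):
    """Build a flat input buffer in which every element is 1.0."""
    size = 1
    for dim in shape:
        size *= dim
    return [1.0] * size


def average_latency_ms(run_once, warmup=WARMUP, repeats=REPEATS):
    """Call run_once untimed `warmup` times, then return the mean of `repeats` timed calls in ms."""
    for _ in range(warmup):
        run_once()
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        total += time.perf_counter() - start
    return total / repeats * 1000.0


if __name__ == "__main__":
    data = make_input()
    # Stand-in workload; a real benchmark would invoke the model predictor here.
    latency = average_latency_ms(lambda: sum(data))
    print(pick_power_mode(1), f"{latency:.2f} ms")
```

Discarding the warmup runs keeps one-time costs such as cache warming out of the reported average.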

ARM Test Data

fp32 model test data

paddlepaddle model

Snapdragon 855      armv7   armv7   armv7   armv8   armv8   armv8
threads num         1       2       4       1       2       4
mobilenet_v1        35.11   20.67   11.83   30.56   18.59   10.44
mobilenet_v2        26.36   15.83   9.29    21.64   13.25   7.95
shufflenet_v2       4.56    3.14    2.35    4.07    2.89    2.28
squeezenet_v1.1     21.27   13.55   8.49    18.05   11.51   7.83
mnasnet             21.40   13.18   7.63    18.84   11.40   6.80

Snapdragon 845      armv7   armv7   armv7   armv8   armv8   armv8
threads num         1       2       4       1       2       4
mobilenet_v1        65.56   37.17   19.65   63.23   32.98   17.68
mobilenet_v2        45.89   25.20   14.39   41.03   22.94   12.98
shufflenet_v2       7.31    4.66    3.27    7.08    4.71    3.41
squeezenet_v1.1     36.98   22.53   13.45   34.27   20.96   12.60
mnasnet             39.85   23.64   12.25   37.81   20.70   11.81

Snapdragon 835      armv7   armv7   armv7   armv8   armv8   armv8
threads num         1       2       4       1       2       4
mobilenet_v1        92.77   51.56   30.14   87.46   48.02   26.42
mobilenet_v2        65.78   36.52   22.34   58.31   33.04   19.87
shufflenet_v2       10.39   6.26    4.46    9.72    6.19    4.41
squeezenet_v1.1     53.59   33.16   20.13   51.56   31.81   19.10
mnasnet             57.44   32.62   19.47   54.99   30.69   17.98
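One way to read the fp32 tables is to derive the multi-thread speedup and parallel efficiency from the latencies. The sketch below does this for the Snapdragon 855 armv8 columns of the PaddlePaddle fp32 models, with values copied from the table; the helper name `scaling` is an illustrative choice, not part of any benchmark tool.

```python
# Snapdragon 855, armv8, PaddlePaddle fp32 models: latency in ms at 1, 2 and 4 threads
# (values copied from the table above).
LATENCY_MS = {
    "mobilenet_v1": (30.56, 18.59, 10.44),
    "mobilenet_v2": (21.64, 13.25, 7.95),
    "shufflenet_v2": (4.07, 2.89, 2.28),
    "squeezenet_v1.1": (18.05, 11.51, 7.83),
    "mnasnet": (18.84, 11.40, 6.80),
}
THREADS = (1, 2, 4)


def scaling(latencies, threads=THREADS):
    """Return (threads, speedup, efficiency) tuples relative to the first latency."""
    base = latencies[0]
    return [(n, base / t, base / t / n) for n, t in zip(threads, latencies)]


if __name__ == "__main__":
    for model, latencies in LATENCY_MS.items():
        for n, speedup, efficiency in scaling(latencies):
            print(f"{model:16s} threads={n} speedup={speedup:.2f} efficiency={efficiency:.2f}")
```

In these columns shufflenet_v2 gains the least from extra threads (about 1.8x at 4 threads, versus about 2.9x for mobilenet_v1), which fits its already small single-thread latency.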

caffe model

Snapdragon 855      armv7   armv7   armv7   armv8   armv8   armv8
threads num         1       2       4       1       2       4
mobilenet_v1        32.38   18.65   10.69   30.75   18.11   9.88
mobilenet_v2        29.45   17.86   10.81   26.61   16.26   9.67
shufflenet_v2       5.04    3.14    2.20    4.09    2.85    2.25

Snapdragon 845      armv7   armv7   armv7   armv8   armv8   armv8
threads num         1       2       4       1       2       4
mobilenet_v1        65.26   35.19   19.11   61.42   33.15   17.48
mobilenet_v2        55.59   31.31   17.68   51.54   29.69   16.00
shufflenet_v2       7.42    4.73    3.33    7.18    4.75    3.39

Snapdragon 835      armv7   armv7   armv7   armv8   armv8   armv8
threads num         1       2       4       1       2       4
mobilenet_v1        95.38   52.16   30.37   92.10   46.71   26.31
mobilenet_v2        82.89   45.49   28.14   74.91   41.88   25.25
shufflenet_v2       10.25   6.36    4.42    9.68    6.20    4.42

int8 quantized model test data

Snapdragon 855      armv7   armv7   armv7   armv8   armv8   armv8
threads num         1       2       4       1       2       4
mobilenet_v1        37.18   21.71   11.16   14.41   8.34    4.37
mobilenet_v2        27.95   16.57   8.97    13.68   8.16    4.67

Snapdragon 835      armv7   armv7   armv7   armv8   armv8   armv8
threads num         1       2       4       1       2       4
mobilenet_v1        61.63   32.60   16.49   57.36   29.74   15.50
mobilenet_v2        47.13   25.62   13.56   41.87   22.42   11.72

Kirin 970           armv7   armv7   armv7   armv8   armv8   armv8
threads num         1       2       4       1       2       4
mobilenet_v1        63.13   32.63   16.85   58.92   29.96   15.42
mobilenet_v2        48.60   25.43   13.76   43.06   22.10   12.09
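To see how the int8 results relate to the fp32 ones, the ratio fp32 latency / int8 latency can be computed per configuration. The sketch below does this for Snapdragon 855, assuming the PaddlePaddle fp32 table is the appropriate baseline for the int8 models; `int8_speedup` is a hypothetical helper name.

```python
# Snapdragon 855 latencies in ms at 1, 2 and 4 threads, copied from the tables above.
# Assumption: the PaddlePaddle fp32 table is used as the baseline.
FP32_MS = {
    ("mobilenet_v1", "armv7"): (35.11, 20.67, 11.83),
    ("mobilenet_v1", "armv8"): (30.56, 18.59, 10.44),
    ("mobilenet_v2", "armv7"): (26.36, 15.83, 9.29),
    ("mobilenet_v2", "armv8"): (21.64, 13.25, 7.95),
}
INT8_MS = {
    ("mobilenet_v1", "armv7"): (37.18, 21.71, 11.16),
    ("mobilenet_v1", "armv8"): (14.41, 8.34, 4.37),
    ("mobilenet_v2", "armv7"): (27.95, 16.57, 8.97),
    ("mobilenet_v2", "armv8"): (13.68, 8.16, 4.67),
}


def int8_speedup(key):
    """Return fp32/int8 latency ratios at 1, 2 and 4 threads for a (model, arch) key."""
    return tuple(f / q for f, q in zip(FP32_MS[key], INT8_MS[key]))


if __name__ == "__main__":
    for key in FP32_MS:
        ratios = ", ".join(f"{r:.2f}" for r in int8_speedup(key))
        print(f"{key[0]} {key[1]}: {ratios}")
```

With these numbers the int8 gain on Snapdragon 855 shows up on armv8 (about 2.1x for single-thread mobilenet_v1), while the armv7 ratios stay close to 1.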

Huawei Kirin NPU Test Environment

  • Test models
    • fp32 models
      • mobilenet_v1
      • mobilenet_v2
      • squeezenet_v1.1
      • mnasnet
  • Test devices (Android NDK: ndk-r17c)
    • Kirin 810
      • HUAWEI Nova5, Kirin 810
      • 2xCortex A76 2.27GHz + 6xCortex A55 1.88GHz
    • Kirin 990
      • HUAWEI Mate 30, Kirin 990
      • 2 x Cortex-A76 Based 2.86 GHz + 2 x Cortex-A76 Based 2.09 GHz + 4 x Cortex-A55 1.86 GHz
    • Kirin 990 5G
      • HUAWEI P40, Kirin 990 5G
      • 2 x Cortex-A76 Based 2.86GHz + 2 x Cortex-A76 Based 2.36GHz + 4 x Cortex-A55 1.95GHz
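  • HiAI DDK version: 310 or 320
  • Test notes
    • branch: release/v2.6.1
    • warmup=10, repeats=30; the reported value is the average time, in ms
    • The thread count is 1, and DeviceInfo::Global().SetRunMode is set to LITE_POWER_HIGH
    • The model input image has dimensions {1, 3, 224, 224}, and every element of the input is set to 1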

Huawei Kirin NPU Test Data

paddlepaddle model

  • ddk 310
Kirin               810      810      990      990      990 5G   990 5G
                    cpu(ms)  npu(ms)  cpu(ms)  npu(ms)  cpu(ms)  npu(ms)
mobilenet_v1        41.20    12.76    31.91    4.07     33.97    3.20
mobilenet_v2        29.57    12.12    22.47    5.61     23.17    3.51
squeezenet          23.96    9.04     17.79    3.82     18.65    3.01
mnasnet             26.47    13.62    19.54    5.17     20.34    3.32
  • ddk 320
Kirin               990      990      990 5G   990 5G
                    cpu(ms)  npu(ms)  cpu(ms)  npu(ms)
ssd_mobilenetv1     65.67    18.21    71.8     16.6

Note: the NPU time reported for ssd_mobilenetv1 is the total time of a run scheduled across both the NPU and the CPU.
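The NPU benefit in the ddk 310 table can be expressed as the ratio cpu(ms) / npu(ms) for each model and SoC. The following is a minimal sketch with the values copied from that table; `npu_speedup` is a hypothetical helper name.

```python
# ddk 310: (cpu_ms, npu_ms) per SoC, copied from the table above.
DDK310_MS = {
    "mobilenet_v1": {"Kirin 810": (41.20, 12.76), "Kirin 990": (31.91, 4.07), "Kirin 990 5G": (33.97, 3.20)},
    "mobilenet_v2": {"Kirin 810": (29.57, 12.12), "Kirin 990": (22.47, 5.61), "Kirin 990 5G": (23.17, 3.51)},
    "squeezenet": {"Kirin 810": (23.96, 9.04), "Kirin 990": (17.79, 3.82), "Kirin 990 5G": (18.65, 3.01)},
    "mnasnet": {"Kirin 810": (26.47, 13.62), "Kirin 990": (19.54, 5.17), "Kirin 990 5G": (20.34, 3.32)},
}


def npu_speedup(cpu_ms, npu_ms):
    """Return how many times faster the NPU run is than the CPU run."""
    return cpu_ms / npu_ms


if __name__ == "__main__":
    for model, per_soc in DDK310_MS.items():
        for soc, (cpu_ms, npu_ms) in per_soc.items():
            print(f"{model:13s} {soc:13s} {npu_speedup(cpu_ms, npu_ms):.1f}x")
```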