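Last year I installed PaddleOCR 2.5. It worked normally, and I documented the installation steps. Today I handed over that installation document. The installation itself succeeded, but OCR recognition did not work and errors were reported. Trying a different server did not help either. I suspect the versions of some related modules are what break recognition.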

This time I'm documenting the installation of the latest PaddleOCR release, 2.6, along with the problems I ran into. Let's get started!

A hands-on record of deploying the CPU version of PaddleOCR with PaddleHub.

- PaddleOCR: release/2.6 branch
- pip: 23.1.2
- PaddlePaddle 2.4.2
- PaddleHub 2.1.0

I. Installing Python

Install Python 3.9.12. There are plenty of tutorials online, so I'll skip this step.

II. Installing PaddlePaddle and PaddleHub (the pretrained-model toolkit)


```shell
# Using the Tsinghua or Aliyun mirror makes downloads much faster
python -m pip install -i https://pypi.tuna.tsinghua.edu.cn/simple paddlepaddle

# When this finishes, pip suggests upgrading itself, so go ahead and upgrade
python.exe -m pip install -i https://mirrors.aliyun.com/pypi/simple --upgrade pip

pip install -i https://mirrors.aliyun.com/pypi/simple --default-timeout=100 paddlehub
```
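The problems in this article came from dependency versions, so it can help to compare what is actually installed against the versions listed at the top. The following is a minimal sketch and is not part of PaddleOCR. `EXPECTED`, `find_mismatches` and `installed_versions` are hypothetical names. The expected versions are simply the ones this article used. The numpy 1.23.3 pin comes from the fix in Section VI, so run the check only after completing the whole setup. The script uses nothing beyond the standard library's `importlib.metadata`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions used in this article (version list at the top + numpy fix in Section VI).
# Adjust these if you deliberately run other versions.
EXPECTED = {
    "paddlepaddle": "2.4.2",
    "paddlehub": "2.1.0",
    "numpy": "1.23.3",
}


def find_mismatches(installed, expected=EXPECTED):
    """Return (package, installed_version, expected_version) for every mismatch.

    `installed` maps package name -> version string, or None if not installed.
    """
    mismatches = []
    for name, want in expected.items():
        have = installed.get(name)
        if have != want:
            mismatches.append((name, have, want))
    return mismatches


def installed_versions(names):
    """Look up installed distribution versions; None means not installed."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    problems = find_mismatches(installed_versions(EXPECTED))
    for name, have, want in problems:
        print(f"{name}: installed {have}, this article used {want}")
    sys.exit(1 if problems else 0)
```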

III. PaddleOCR

1. Clone the code

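```shell
# Clone the release/2.6 branch into the directory name used in the later steps
git clone -b release/2.6 https://github.com/PaddlePaddle/PaddleOCR PaddleOCR-release-2.6
```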

2. Create a folder named `inference` in the repository root

3. Install the dependencies


```shell
cd PaddleOCR-release-2.6/
pip install -i https://mirrors.aliyun.com/pypi/simple --default-timeout=100 -r requirements.txt
```

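Note: read Section VI of this article before running this command. I have already hit those pitfalls, and the fixes are written up there!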

IV. Service deployment with PaddleHub Serving

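The hubserving deployment directory contains six service packages:

- text detection
- text direction classification
- text recognition
- the three-stage pipeline of text detection, direction classification and recognition
- table recognition
- PP-Structure

Choose the package you need, then install and start it. The directory structure is as follows: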


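```text
deploy/hubserving/
  └─  ocr_cls           text direction classification service package
  └─  ocr_det           text detection service package
  └─  ocr_rec           text recognition service package
  └─  ocr_system        detection + direction classification + recognition pipeline service package
  └─  structure_table   table recognition service package
  └─  structure_system  PP-Structure service package
```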

Each service package contains four files. The directory of the pipeline service package (`ocr_system`) looks like this:


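```text
deploy/hubserving/ocr_system/
  └─  __init__.py    empty file, required
  └─  config.json    configuration file, optional; passed as an argument when starting the service from a config
  └─  module.py      main module, required; contains the complete service logic
  └─  params.py      parameter file, required; contains model paths, pre- and post-processing parameters, etc.
```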

This article uses `ocr_system`.

1. Add the inference models

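Prepare the inference models before installing a service module. You can download them from the "PP-OCR series model list (updating)" page in the official documentation.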

Put the inference models into the `inference` folder in the repository root. PP-OCRv3 models are used by default, and the default model paths are:


```text
Detection model:                    ./inference/ch_PP-OCRv3_det_infer/
Recognition model:                  ./inference/ch_PP-OCRv3_rec_infer/
Direction classifier:               ./inference/ch_ppocr_mobile_v2.0_cls_infer/
Table structure recognition model:  ./inference/en_ppocr_mobile_v2.0_table_structure_infer/
```

The model paths can be viewed and changed in `deploy\hubserving\ocr_system\params.py`.
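Before you install the service module, check that the model folders are in place. A missing folder otherwise produces a confusing startup error. The sketch below makes two assumptions. First, you run it from the PaddleOCR root. Second, each exported inference model folder contains `inference.pdmodel` and `inference.pdiparams`. That layout is typical for Paddle inference models, but verify it against your download. The script checks only the three models that `ocr_system` uses. `missing_model_files` and the constant names are hypothetical. If you changed the paths in `params.py`, adjust the list to match.

```python
import os
import sys

# Default model folders listed above, relative to the PaddleOCR root.
# ocr_system needs det, rec and cls; the table model is only for structure_table.
OCR_SYSTEM_MODELS = [
    "ch_PP-OCRv3_det_infer",
    "ch_PP-OCRv3_rec_infer",
    "ch_ppocr_mobile_v2.0_cls_infer",
]

# Assumption: every exported Paddle inference model folder contains these files.
REQUIRED_FILES = ["inference.pdmodel", "inference.pdiparams"]


def missing_model_files(inference_dir="inference", model_dirs=OCR_SYSTEM_MODELS):
    """Return the paths of expected model files that do not exist."""
    missing = []
    for model in model_dirs:
        for fname in REQUIRED_FILES:
            path = os.path.join(inference_dir, model, fname)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Run from the PaddleOCR root directory.
    missing = missing_model_files()
    for path in missing:
        print("missing:", path)
    sys.exit(1 if missing else 0)
```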

2. Install the service module

PaddleOCR provides six service modules, listed in the directory structure above. Install the ones you need.

On Windows, where the folder separator is `\`, the installation looks like this:


```shell
# Install the detection + direction classification + recognition pipeline service module:
hub install deploy\hubserving\ocr_system\
```

3. Start the service

3.1. Start from the command line (CPU only; this article uses CPU)

Start command:

```shell
hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
                  --port XXXX \
                  --use_multiprocess \
                  --workers XXXX
```

Parameters:

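| Parameter | Purpose |
| --- | --- |
| `--modules`/`-m` | Modules preinstalled for PaddleHub Serving, listed as multiple `Module==Version` key-value pairs. If no version is given, the latest version is used. |
| `--port`/`-p` | Service port; default 8866 |
| `--use_multiprocess` | Enables concurrent (multi-process) mode; the default is single-process. Recommended on multi-core CPU machines. Windows supports only single-process mode. |
| `--workers` | Number of concurrent workers in concurrent mode; defaults to `2*cpu_count-1`, where `cpu_count` is the number of CPU cores |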
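The hypothetical helper below illustrates the parameter rules in the table. It assembles the `hub serving start` argument list in Python and enforces the single-process rule on Windows. The helper uses only the flags from the table. It passes each `Module==Version` pair as a separate argument after `--modules`, which is an assumption about how PaddleHub parses that list. `build_serving_command` and `default_workers` are illustrative names, not PaddleHub APIs.

```python
import os
import subprocess


def default_workers(cpu_count=None):
    """Default --workers value from the table above: 2 * cpu_count - 1."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return 2 * cpu_count - 1


def build_serving_command(modules, port=8866, use_multiprocess=False,
                          workers=None, is_windows=None):
    """Hypothetical helper: build an argv list for `hub serving start`.

    `modules` maps module name -> version (None means "use the latest").
    """
    if is_windows is None:
        is_windows = os.name == "nt"
    if use_multiprocess and is_windows:
        raise ValueError("Windows only supports single-process mode")
    cmd = ["hub", "serving", "start", "--modules"]
    # Assumption: each Module==Version pair is passed as its own argument.
    for name, version in modules.items():
        cmd.append(f"{name}=={version}" if version else name)
    cmd += ["--port", str(port)]
    if use_multiprocess:
        cmd.append("--use_multiprocess")
        cmd += ["--workers", str(workers if workers is not None else default_workers())]
    return cmd


if __name__ == "__main__":
    # Same as the quick start below, with the default port written out explicitly.
    subprocess.run(build_serving_command({"ocr_system": None}), check=True)
```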

**Quick start of the pipeline service:** `hub serving start -m ocr_system`

This completes the deployment of a service API on the default port 8866.
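Startup can take a while, especially model loading. A script that calls the service immediately may therefore fail. The minimal sketch below waits until something is listening on the port, using only the standard library. `wait_for_port` is a hypothetical helper. A successful connection only shows that the port is open. It does not prove that the OCR module is ready to answer.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8866, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Only checks that something is listening; it does not call the OCR API.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    print("service is up" if wait_for_port() else "service did not come up")
```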

3.2. Start with a configuration file (CPU and GPU supported)

Start command:

```shell
hub serving start -c config.json
```

Here, `config.json` has the following format:

```json
{
    "modules_info": {
        "ocr_system": {
            "init_args": {
                "version": "1.0.0",
                "use_gpu": true
            },
            "predict_args": {
            }
        }
    },
    "port": 8868,
    "use_multiprocess": false,
    "workers": 2
}
```

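The configurable parameters in `init_args` match the interface of the `_initialize` function in `module.py`. When `use_gpu` is `true`, the service starts on the GPU.
The configurable parameters in `predict_args` match the interface of the `predict` function in `module.py`.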

Notes:

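- When the service is started from a configuration file, all other parameters are ignored.
- For GPU prediction (`use_gpu` set to `true`), set the `CUDA_VISIBLE_DEVICES` environment variable before starting the service, e.g. `export CUDA_VISIBLE_DEVICES=0`. CPU-only setups do not need it.
- `use_gpu` and `use_multiprocess` cannot both be `true`.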

For example, to start the pipeline service on GPU card 3:

```shell
export CUDA_VISIBLE_DEVICES=3
hub serving start -c deploy/hubserving/ocr_system/config.json
```
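As a sketch, you can also generate the same `config.json` structure from Python. The hypothetical `make_ocr_system_config` defaults to `use_gpu=False` because this article runs on CPU. Following the notes above, it rejects `use_gpu` combined with `use_multiprocess`. The field names are copied from the example config. Any other `init_args` or `predict_args` you add would have to match `module.py`.

```python
import json


def make_ocr_system_config(use_gpu=False, port=8868, use_multiprocess=False,
                           workers=2, version="1.0.0"):
    """Hypothetical helper: build a dict in the config.json format shown above."""
    # Rule from the notes above: use_gpu and use_multiprocess cannot both be true.
    if use_gpu and use_multiprocess:
        raise ValueError("use_gpu and use_multiprocess cannot both be true")
    return {
        "modules_info": {
            "ocr_system": {
                "init_args": {"version": version, "use_gpu": use_gpu},
                "predict_args": {},
            }
        },
        "port": port,
        "use_multiprocess": use_multiprocess,
        "workers": workers,
    }


if __name__ == "__main__":
    # Writes config.json to the current directory; then run:
    #   hub serving start -c config.json
    with open("config.json", "w", encoding="utf-8") as f:
        json.dump(make_ocr_system_config(), f, indent=4)
```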

V. Text recognition test

Go to the `PaddleOCR\tools` directory. For simplicity, put an image named `1.jpg` into that directory, then run:


```shell
python test_hubserving.py http://127.0.0.1:8866/predict/ocr_system 1.jpg
```
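To call the service from your own code instead of `test_hubserving.py`, you could use a minimal standard-library client like the sketch below. It assumes the request body is `{"images": [<base64 of the image file>]}`, sent as JSON. It also assumes that a successful response has `status` `"000"` and holds a list of `{"text": ...}` items per image. Both formats are assumptions based on how PaddleHub Serving modules are usually called, so check `module.py` and `test_hubserving.py` in your checkout. The failure shape (`status` `"101"` with a `msg`) matches the error shown in Section VI. All function names are hypothetical.

```python
import base64
import json
import sys
import urllib.request

# Assumed endpoint: the ocr_system module started above on the default port.
DEFAULT_URL = "http://127.0.0.1:8866/predict/ocr_system"


def build_payload(image_bytes):
    """Request body {"images": [<base64 of the raw image file>]} (assumed format)."""
    return {"images": [base64.b64encode(image_bytes).decode("utf8")]}


def extract_texts(response):
    """Return the recognized strings for the first image; raise on a non-"000" status.

    Assumed success shape: {"status": "000", "msg": "", "results": [[{"text": ...}, ...]]}.
    """
    if response.get("status") != "000":
        raise RuntimeError(f"status {response.get('status')}: {response.get('msg')}")
    return [item["text"] for item in response["results"][0]]


def predict(image_path, url=DEFAULT_URL, timeout=60):
    """POST one image file to the service and return the decoded JSON response."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    image = sys.argv[1] if len(sys.argv) > 1 else "1.jpg"
    for line in extract_texts(predict(image)):
        print(line)
```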

VI. Key problems

1. numpy

The text recognition test returned the following error:


```text
{'msg': "module 'numpy' has no attribute 'int'.\nnp.int was a deprecated alias for the builtin int. To avoid
this error in existing code, use int by itself. Doing this will not modify any behavior and is safe. When 
replacing np.int, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to 
review your current use, check the release note link for additional information.\nThe aliases was originally
deprecated in NumPy 1.20; for more details and guidance see the original release note at:\n    
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations", 'results': '', 'status': '101'}
```

Solution:

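Downgrade numpy. In `requirements.txt`, change the `numpy` entry to `numpy==1.23.3`. NumPy 1.20 deprecated the `np.int` alias, and NumPy 1.24 removed it. A 1.23.x release therefore still provides the alias.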
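Editing `requirements.txt` by hand works fine. For repeatable setups, a small script can apply the same pin. The minimal sketch below uses hypothetical names. `pin_numpy` rewrites any `numpy` line to `numpy==1.23.3`, or appends one if none exists. `has_np_int_alias` encodes the fact that NumPy 1.24 removed `np.int`. Run the script from the PaddleOCR root before installing the requirements.

```python
import re
import sys


def pin_numpy(requirements_text, version="1.23.3"):
    """Return requirements text with the numpy entry pinned to numpy==version.

    Appends the pin if the text contains no numpy line at all.
    """
    out, found = [], False
    for line in requirements_text.splitlines():
        # Package name = text before the first version/marker/extras character.
        name = re.split(r"[<>=!~;\[\s]", line.strip(), maxsplit=1)[0]
        if name.lower() == "numpy":
            out.append(f"numpy=={version}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"numpy=={version}")
    return "\n".join(out) + "\n"


def has_np_int_alias(numpy_version):
    """np.int was removed in NumPy 1.24, so only versions below 1.24 still have it."""
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) < (1, 24)


if __name__ == "__main__":
    # Run from the PaddleOCR root before `pip install -r requirements.txt`.
    path = sys.argv[1] if len(sys.argv) > 1 else "requirements.txt"
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(pin_numpy(text))
```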

2. Polygon3

A normal installation fails with the following error:


```text
error: subprocess-exited-with-error
python setup.py bdist_wheel did not run successfully.
exit code: 1
[14 lines of output]
    Using NumPy extension!
    running bdist_wheel
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-cpython-39
    creating build\lib.win-amd64-cpython-39\Polygon
    copying Polygon\Io.py -> build\lib.win-amd64-cpython-39\Polygon
    copying Polygon\shapes.py -> build\lib.win-amd64-cpython-39\Polygon
    copying Polygon\utils.py -> build\lib.win-amd64-cpython-39\Polygon
    copying Polygon\__init__.py -> build\lib.win-amd64-cpython-39\Polygon
    running build_ext
    building 'Polygon.cPolygon' extension
    error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
    [end of output]
```