Implementing ID Card Text Recognition with PaddleOCR

I. Introduction

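I haven't updated this blog in a while. During my recent internship I worked on an OCR project, found it quite interesting, and came across a very handy OCR library: PaddleOCR, developed by Baidu, whose recognition accuracy is comparable to commercial offerings. So this post involves little image processing; it simply puts PaddleOCR to use.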

II. Using PaddleOCR

III. Recognition Approach

The approach is simple: consolidate the data returned by the recognizer.

(1) Recognize the text directly

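import cv2
import paddlehub as hub

# Images to run prediction on (reversePath and frontPath are image file paths)
test_img_path = [reversePath, frontPath]

# Load the pretrained mobile model
ocr = hub.Module(name="chinese_ocr_db_crnn_mobile")

# Read each image as an ndarray
np_images = [cv2.imread(image_path) for image_path in test_img_path]

# Run detection and recognition
results = ocr.recognize_text(
    images=np_images,         # image data; ndarray.shape is [H, W, C], BGR format
    use_gpu=True,             # whether to use the GPU; if so, set the CUDA_VISIBLE_DEVICES environment variable first
    output_dir='ocr_result',  # directory where result images are saved; defaults to ocr_result
    visualization=True,       # whether to save the recognition result as an image file
    box_thresh=0.5,           # confidence threshold for detected text boxes
    text_thresh=0.5)          # confidence threshold for recognized Chinese text

# Collect the text data
resultStr = ''
for result in results:
    data = result['data']
    save_path = result['save_path']
    for information in data:
        resultStr = resultStr + information['text']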

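The code block above is essentially the core of the solution. Here reversePath and frontPath are the paths to the back-side and front-side photos respectively. After loading the pretrained model, the recognize_text function scans out the text in the images; the result is shown in the figure below:

We then concatenate the recognized data directly into one long string, so resultStr comes out as: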

姓名充伊性别女民族汉出生1947年6月11日住址四川省成都市武侯区益州大道中段722号复城国际公民身份号码513701************居民身份证签发机关四川省成都市锦江分局有效期限2012.01.26-2032.01.21
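
The concatenation step can be tried without running the model. The sketch below assumes each result is a dict whose 'data' entry is a list of dicts carrying a 'text' key, which is the shape the loop above relies on. The helper name join_texts and the sample list are made up for illustration.

```python
def join_texts(results):
    """Concatenate every recognized text fragment, in order, into one string."""
    return "".join(
        item["text"]
        for result in results
        for item in result.get("data", [])
    )


# Hypothetical stand-in for the recognizer's output (not real OCR results).
sample_results = [
    {"save_path": "", "data": [{"text": "姓名充伊"}, {"text": "性别女"}]},
    {"save_path": "", "data": [{"text": "有效期限2012.01.26-2032.01.21"}]},
]

joined = join_texts(sample_results)
```

Using str.join avoids rebuilding the string on every iteration, although for two images the difference is negligible.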

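(2) Filter the recognized data

Before extracting the fields we first delete what we do not need: stray whitespace, and the odd extra symbols that OCR sometimes produces when it misreads the image.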

def removeSpace(long_str):
    # Remove whitespace: split on any whitespace and join the pieces back together
    noneSpaceStr = ''.join(long_str.split())
    return noneSpaceStr

def removePunctuation(noneSpaceStr):
    # Remove punctuation (ASCII and full-width)
    punctuation = r"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~“”？，！『【】（）、。：；’‘……￥·"""
    # Map each punctuation character to the empty string
    dicts = {i: '' for i in punctuation}
    punc_table = str.maketrans(dicts)
    nonePunctuationStr = noneSpaceStr.translate(punc_table)
    return nonePunctuationStr
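
The two cleaning steps can also be combined into a single helper. This is only a sketch: the name clean_text and the exact punctuation set are assumptions chosen for illustration, and the set is not exhaustive.

```python
import string

# ASCII punctuation plus full-width marks commonly seen in Chinese text.
# This set is an illustrative assumption, not an exhaustive list.
PUNCTUATION = string.punctuation + "“”‘’？，！『』【】（）、。：；…￥·"

# The third argument of str.maketrans lists characters to delete.
_DELETE_TABLE = str.maketrans("", "", PUNCTUATION)


def clean_text(raw):
    """Remove all whitespace, then all punctuation, from raw."""
    no_space = "".join(raw.split())
    return no_space.translate(_DELETE_TABLE)


cleaned = clean_text("姓名 充伊，有效期限 2012.01.26-2032.01.21")
```

Note that the dots and the hyphen in the validity period are removed as well, which is why the later extraction step slices the dates by fixed positions.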

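(3) Extract the data

We now have clean data. The extraction idea is to find the index of each keyword (姓名, 性别, 出生, and so on) in the text, then slice the string at those positions and reassemble the pieces. The implementation follows; it returns a dictionary: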

def findResult(nonePunctuationStr):
    # Keywords printed on the ID card; they also serve as the result keys
    name = "姓名"
    sex = "性别"
    race = "民族"
    birth = "出生"
    address = "住址"
    idCardNumber = "公民身份号码"
    issuedBy = '签发机关'
    validDate = '有效期限'
    validDateStart = '有效期开始时间'
    validDateEnd = '有效期结束时间'

    # Position of each keyword in the cleaned text
    indexName = nonePunctuationStr.find(name)
    indexSex = nonePunctuationStr.find(sex)
    indexRace = nonePunctuationStr.find(race)
    indexBirth = nonePunctuationStr.find(birth)
    indexAddress = nonePunctuationStr.find(address)
    indexIdCardNumber = nonePunctuationStr.find(idCardNumber)
    indexIssuedBy = nonePunctuationStr.find(issuedBy)
    indexValidDate = nonePunctuationStr.find(validDate)

    # Slice each value out from after its keyword
    numberName = nonePunctuationStr[indexName+2:indexSex]
    numberSex = nonePunctuationStr[indexSex+2:indexSex+3]
    # Slice up to the next keyword so multi-character ethnic group names are kept
    numberRace = nonePunctuationStr[indexRace+2:indexBirth]
    numberBirth = nonePunctuationStr[indexBirth+2:indexAddress]
    numberAddress = nonePunctuationStr[indexAddress+2:indexIdCardNumber]
    # The ID number is 18 characters long
    numberIdCardNumber = nonePunctuationStr[indexIdCardNumber+6:indexIdCardNumber+24]
    strIssuedBy = nonePunctuationStr[indexIssuedBy+4:indexValidDate]

    # With punctuation removed the validity period is 16 digits: YYYYMMDDYYYYMMDD
    strDate = nonePunctuationStr[indexValidDate+4:]
    strValidDateStart = strDate[0:4] + "." + strDate[4:6] + "." + strDate[6:8]
    strValidDateEnd = strDate[8:12] + "." + strDate[12:14] + "." + strDate[14:16]

    reverseDict = {
        name: numberName,
        sex: numberSex,
        race: numberRace,
        birth: numberBirth,
        address: numberAddress,
        idCardNumber: numberIdCardNumber,
        issuedBy: strIssuedBy,
        validDateStart: strValidDateStart,
        validDateEnd: strValidDateEnd,
    }
    return reverseDict

Recognition is now complete. The final result is:

{'姓名':'充伊', '性别':'女', '民族':'汉', '出生':'1947年6月11日', '住址':'四川省成都市武侯区益州大道中段722号复城国际', '公民身份号码':'513701************', '签发机关':'四川省成都市锦江分局', '有效期开始时间':'2012.01.26', '有效期结束时间':'2032.01.21'}
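
One thing to watch with this approach is that str.find returns -1 when a keyword is missing (for example when OCR misreads a label), and the slices then silently produce wrong values. The sketch below is one possible way to guard against that, not the method used above: it locates whichever keywords are present and slices each value up to the next located keyword. The function name extract_fields, the keyword list, and the dummy ID number are assumptions for illustration.

```python
def extract_fields(text, keywords):
    """Slice text into {keyword: value}, skipping keywords that are absent."""
    # Record (position, keyword) for each keyword that occurs in the text.
    found = sorted(
        (text.find(key), key) for key in keywords if text.find(key) != -1
    )
    fields = {}
    for i, (pos, key) in enumerate(found):
        start = pos + len(key)
        # A value runs until the next located keyword, or to the end of text.
        end = found[i + 1][0] if i + 1 < len(found) else len(text)
        fields[key] = text[start:end]
    return fields


KEYWORDS = ["姓名", "性别", "民族", "出生", "住址", "公民身份号码", "签发机关", "有效期限"]

# Cleaned sample text in the style of the walkthrough (the ID number is a dummy).
sample = (
    "姓名充伊性别女民族汉出生1947年6月11日住址四川省成都市武侯区"
    "公民身份号码513701000000000000居民身份证"
    "签发机关四川省成都市锦江分局有效期限2012012620320121"
)

fields = extract_fields(sample, KEYWORDS)
# The ID number is 18 characters, so trim the card caption that follows it.
fields["公民身份号码"] = fields["公民身份号码"][:18]
```

This still assumes each keyword appears only once and never inside a value such as the address, so it remains a heuristic.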

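IV. Summary and Caveats

This project is actually quite simple. I had previously been doing it with hand-written image-recognition techniques; that worked, but the accuracy was too low, until I found this gem of an OCR library. You may still run into a few problems, such as: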

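1. To use the GPU, first make sure the CUDA environment is configured. Then set the use_gpu argument of recognize_text to True, and add the following code before recognition to select the GPU:

# Configure the GPU
import os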

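2. If you get an error about a missing file such as cudnn7.dll, search for that dependency, download it, and place it in the bin directory of your CUDA installation.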
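
For item 1, the comment on use_gpu in the recognition code mentions the CUDA_VISIBLE_DEVICES environment variable. A minimal sketch of setting it from Python follows. The device index "0" is an assumption (the first GPU on the machine), so adjust it to your own setup.

```python
import os

# Assumption: expose only the first GPU (device index "0") to this process.
# Set this before the framework initializes CUDA, i.e. before loading the
# OCR module and calling the recognizer.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

Setting the variable inside the script only affects the current process and anything it launches.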
