Python character statistics for 《三国演义》 (Romance of the Three Kingdoms), top 20: using Python to count how often characters appear in the novel...
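1. Install the required third-party libraries
jieba (an excellent third-party library for Chinese word segmentation)
pyecharts (an excellent data visualization library)
Download link for 《三国演义》.txt (extraction code: kist)
Installing the libraries with PyCharm
Open PyCharm and choose Settings under the [File] menu.
The following page appears.
Click the [+] on the right to open the next page, search for the library you want at the top of that page, and install it.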
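To confirm from Python that the installation worked, a minimal sketch like the one below can report which packages are still missing. The helper name `missing_packages` is hypothetical and not part of the original article; it relies only on the standard-library function `importlib.util.find_spec`.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # jieba and pyecharts are the two libraries this article relies on.
    missing = missing_packages(["jieba", "pyecharts"])
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If a name is reported as missing, install it through the PyCharm dialog described above and run the check again.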
2. Write the code
import jieba  # Chinese word segmentation library

print("Top 10 words by number of appearances:")
txt = open('三国演义.txt', 'r', encoding='gb18030').read()
words = jieba.lcut(txt)  # segment the whole text into a list of words
counts = {}
for word in words:
    if len(word) == 1:
        continue  # skip single-character tokens
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"  # merge different names for the same person
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first
for i in range(10):
    word, count = items[i]
    print("{}:{}".format(word, count))  # print the top 10
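The result is shown in the figure below:
You can see that many of these words are not character names, so we need to remove them. The modified code is as follows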
import jieba  # Chinese word segmentation library

print("Top 10 characters by number of appearances:")
txt = open('三国演义.txt', 'r', encoding='gb18030').read()
# Words to exclude, collected by running the program several times
remove = {"将军", "却说", "不能", "后主", "上马", "不知", "天子", "大叫", "众将", "不可",
          "主公", "蜀兵", "只见", "如何", "商议", "都督", "一人", "汉中", "人马",
          "陛下", "魏兵", "天下", "今日", "左右", "东吴", "于是", "荆州", "如此",
          "大喜", "引兵", "次日", "军士", "军马", "二人", "不敢"}
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"  # merge different names for the same person
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in remove:
    counts.pop(word, None)  # drop excluded words; no error if a word is absent
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(10):
    word, count = items[i]
    print("{}:{}".format(word, count))  # print the top 10
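The output is shown in the figure below
You can see that the list now contains only character names
To export the data, use the following code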
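The alias merging and filtering steps can also be written with a lookup table and `collections.Counter`, which avoids the long `elif` chain. This is only a sketch of that alternative, not the article's own code: the names `ALIASES` and `count_characters` are hypothetical, and a small hand-made token list stands in for the output of `jieba.lcut(txt)` so the example runs without the novel file.

```python
from collections import Counter

# Hypothetical lookup table: alternative name -> canonical name.
ALIASES = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}


def count_characters(words, remove=frozenset(), top_n=10):
    """Count words, merging aliases and skipping single characters and excluded words."""
    counts = Counter()
    for word in words:
        if len(word) == 1:
            continue  # single-character tokens are never names here
        name = ALIASES.get(word, word)  # map an alias to its canonical name
        if name in remove:
            continue  # excluded non-name word
        counts[name] += 1
    return counts.most_common(top_n)


# A tiny hand-made token list standing in for jieba.lcut(txt).
sample = ["玄德", "曰", "孔明", "诸葛亮", "将军", "刘备", "丞相", "孔明曰"]
print(count_characters(sample, remove={"将军"}))
```

Adding another alias then only requires a new entry in the table, and `most_common` replaces the manual sort.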
import jieba  # Chinese word segmentation library

print("Top 10 characters by number of appearances:")
txt = open('三国演义.txt', 'r', encoding='gb18030').read()
# Words to exclude, collected by running the program several times
remove = {"将军", "却说", "不能", "后主", "上马", "不知", "天子", "大叫", "众将", "不可",
          "主公", "蜀兵", "只见", "如何", "商议", "都督", "一人", "汉中", "人马",
          "陛下", "魏兵", "天下", "今日", "左右", "东吴", "于是", "荆州", "如此",
          "大喜", "引兵", "次日", "军士", "军马", "二人", "不敢"}
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"  # merge different names for the same person
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in remove:
    counts.pop(word, None)  # drop excluded words; no error if a word is absent
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
# Export the data ("w" overwrites the file, so rerunning does not append duplicates)
fo = open("三国人物出场次数.txt", "w", encoding='utf-8')
for i in range(10):
    word, count = items[i]
    word = str(word)
    count = str(count)
    fo.write(word)
    fo.write(':')  # separate name and count with a colon
    fo.write(count)
    fo.write('\n')  # newline
fo.close()  # close the file
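Now run the program to check that the export works. The result is shown in the figure below.
You can see that a file named 三国人物出场次数.txt has been created, and its contents are the data we just computed.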
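The export can also be written with `with` blocks, which close the file even if an error occurs, and checked by reading the file straight back. The sketch below assumes the same `name:count` line format; the helper names `write_counts` and `read_counts` are hypothetical, and the counts in the demo are made up rather than taken from the novel.

```python
import os
import tempfile


def write_counts(path, items):
    """Write (name, count) pairs to `path`, one 'name:count' per line."""
    with open(path, "w", encoding="utf-8") as fo:
        for name, count in items:
            fo.write("{}:{}\n".format(name, count))


def read_counts(path):
    """Read 'name:count' lines back into a dict, keeping the file order."""
    result = {}
    with open(path, "r", encoding="utf-8") as fr:
        for line in fr:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            name, count = line.rsplit(":", 1)
            result[name] = int(count)
    return result


# Round trip through a temporary file, using made-up counts.
demo_items = [("曹操", 3), ("孔明", 2)]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "counts.txt")
    write_counts(demo_path, demo_items)
    print(read_counts(demo_path))
```

Converting the count to `int` while reading keeps later numeric sorting and plotting from operating on strings.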
3. Data visualization
Visualization needs data first, so we convert the data we just exported into a dictionary. The code is as follows
# Convert the data in the text file into a dictionary
fr = open('三国人物出场次数.txt', 'r', encoding='utf-8')
dic = {}
keys = []  # records the order in which the names were read
for line in fr:
    v = line.strip().split(':')
    dic[v[0]] = v[1]
    keys.append(v[0])
fr.close()
print(dic)
The output is as follows
Plotting with pyecharts
First import the modules
from pyecharts import options as opts
from pyecharts.charts import Bar
The code is as follows
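# Plot
list1 = list(dic.keys())    # character names for the x-axis
list2 = list(dic.values())  # values from the dictionary, used as the plot data
c = (
    Bar()
    .add_xaxis(list1)
    .add_yaxis("人物出场次数", list2)  # series name: "character appearances"
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
    )
    .render("人物出场次数可视化图.html")
)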
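After running the program, a file named 人物出场次数可视化图.html appears in the directory, as shown in the figure below
Open it in a browser to see the data presented as a chart.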
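Because the values read from the text file are strings, it can help to convert them to integers and fix the bar order before plotting. A minimal sketch, assuming a hypothetical helper `to_axes` and made-up counts:

```python
def to_axes(dic):
    """Split a name -> count mapping into x and y lists, largest count first."""
    pairs = sorted(dic.items(), key=lambda kv: int(kv[1]), reverse=True)
    names = [name for name, _ in pairs]
    values = [int(value) for _, value in pairs]
    return names, values


# Made-up counts; the values are strings, as they are when read from the text file.
demo = {"刘备": "5", "曹操": "12", "孔明": "7"}
list1, list2 = to_axes(demo)
print(list1)
print(list2)
```

Sorting on `int(kv[1])` matters here: compared as strings, "5" would rank above "12".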
4. Full code
# Python code for counting character appearances in 《三国演义》:
import jieba  # Chinese word segmentation library
from pyecharts import options as opts
from pyecharts.charts import Bar

print("Top 10 characters by number of appearances:")
txt = open('三国演义.txt', 'r', encoding='gb18030').read()
# Words to exclude, collected by running the program several times
remove = {"将军", "却说", "不能", "后主", "上马", "不知", "天子", "大叫", "众将", "不可",
          "主公", "蜀兵", "只见", "如何", "商议", "都督", "一人", "汉中", "人马",
          "陛下", "魏兵", "天下", "今日", "左右", "东吴", "于是", "荆州", "如此",
          "大喜", "引兵", "次日", "军士", "军马", "二人", "不敢"}
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":