Text Analysis Example: QQ Chat Log Analysis - 慧智精品网


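This post analyzes QQ chat logs. The group produces a lot of messages every day, so the data used is the chat log for the whole of February. The analysis produces three results: the 15 people who sent the most messages, the hour of the 24-hour day in which the most people post, and the hot words extracted from the message text.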

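To count how often each QQ user posts, pay attention to the format of the chat log that QQ exports: each message is preceded by a header of the form 【date and time (year, month, day, hour, minute, second) followed by QQ account information】, so the log has to be parsed. The message content has to be parsed as well.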
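A minimal sketch of that parsing step, assuming each header line in the exported .txt looks like `2017-02-01 15:23:45 Nickname(12345678)` (date, time, then the sender). The exact layout varies between QQ versions, so the regex and the helper names `parse_header` and `parse_log` are hypothetical. Each header yields three fields (date, time, user), which is the shape the code later in this post checks with `len(userInfo) == 3`.

```python
import re

# Hypothetical header layout: "YYYY-MM-DD H:MM:SS sender"
HEADER_RE = re.compile(r'^(\d{4}-\d{2}-\d{2}) (\d{1,2}:\d{2}:\d{2}) (.+)$')


def parse_header(line):
    """Return (date, time, user) for a message header line, else None."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None  # not a header, so a message body line
    return match.groups()


def parse_log(lines):
    """Split the exported log into header triples and message body lines."""
    headers, messages = [], []
    for line in lines:
        header = parse_header(line)
        if header is not None:
            headers.append(header)
        elif line.strip():
            messages.append(line.strip())
    return headers, messages
```

A body line that itself starts with a date and time would be misread as a header, and a multi-line message is kept as several body lines; both are acceptable for counting posts and words.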

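I won't walk through the approach in detail; I'll only post the results and part of the code, which should make things clear at a glance.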


Number of QQ messages in each hour of the 24-hour day

As the chart shows, people are most active between 3 pm and 5 pm every day.
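To read the busiest hour off the data directly, one option is to fill all 24 hours, so that silent hours show up as 0 instead of disappearing from the chart, and then take the maximum. A sketch, assuming the time field has already been split off as a string such as `15:23:45`; `hourly_histogram` and `busiest_hour` are hypothetical helper names.

```python
from collections import Counter


def hourly_histogram(times):
    """Count messages per hour 0..23 from 'H:MM:SS' strings."""
    # int() makes hour 9 sort before hour 10
    counts = Counter(int(t.split(':')[0]) for t in times)
    # Counter returns 0 for missing keys, so every hour gets a value
    return [counts[hour] for hour in range(24)]


def busiest_hour(histogram):
    """Hour with the most messages (the earliest one on a tie)."""
    return max(range(24), key=lambda hour: histogram[hour])


times = ["15:01:00", "15:40:12", "16:05:33", "9:12:00"]
hist = hourly_histogram(times)
print(busiest_hour(hist))  # 15
```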

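The remaining task is to analyze the topics under discussion. The messages first have to be segmented into words and the stop words removed; the words are then counted by frequency, which gives the result below.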
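The segmentation, stop-word and counting steps can be sketched as one function. The tokenizer is passed in so the sketch runs on its own: with jieba installed, `jieba.lcut` fits (it returns a list, whereas `jieba.cut` returns a generator). The name `top_words` and the stop-word set in the example are assumptions; the original loads its stop words from `./data/stopword.json`, whose layout the post does not show.

```python
from collections import Counter


def top_words(messages, tokenize, stopwords, n=20):
    """Return the n most frequent words, skipping stop words and
    single-character tokens (the original also drops length-1 words)."""
    counts = Counter()
    for message in messages:
        for word in tokenize(message):
            word = word.strip()
            if len(word) > 1 and word not in stopwords:
                counts[word] += 1
    return counts.most_common(n)


# With jieba: top_words(messages, jieba.lcut, stopwords)
# A whitespace tokenizer keeps this example dependency-free:
msgs = ["java spring spring boot", "the spring release", "java is ok"]
print(top_words(msgs, str.split, {"the", "is", "ok"}, n=2))
# [('spring', 3), ('java', 2)]
```

Using a `set` for the stop words keeps each membership test constant-time, which matters once a month of messages is segmented.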

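The first column is the word and the second is the number of times it occurred during the period. All in all, ours still comes across as a fairly technical group. Part of the related code: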

```python
import json
from collections import Counter

import jieba
import matplotlib.font_manager
import matplotlib.pyplot as plt
import numpy as np

# The author's own data-loading modules (not shown in the post)
import LoadMessageInfo
import LoadUserInfo


def userProcess():
    """Horizontal bar chart of the 15 users who posted the most."""
    userArray = []
    contentArray = LoadUserInfo.loadUser()
    for userInfo in contentArray:
        # A header line has three fields: date, time, user
        if len(userInfo) == 3:
            userArray.append(userInfo[2])
    print(len(userArray))

    userGroupInfo = Counter(userArray).most_common(15)
    userNameLabel = [name for name, _ in userGroupInfo]
    postMessageNum = [count for _, count in userGroupInfo]

    # Font that can render Chinese; the font file path is cut off in the original
    zh_font = matplotlib.font_manager.FontProperties(fname='C:\\Windows\\')
    ypos = np.arange(len(userNameLabel))
    plt.barh(ypos, postMessageNum, align='center', alpha=0.4)
    # Show the user names on the y axis
    plt.yticks(ypos, userNameLabel, fontproperties=zh_font)
    plt.xlabel('发贴数量', fontproperties=zh_font)  # "number of posts"
    # "the 15 members of java-Endless Space(4881914) who posted most"
    plt.title('java-Endless Space(4881914)发贴最多的15个人', fontproperties=zh_font)
    plt.show()


def hourProcess():
    """Bar chart of the number of messages posted in each hour of the day."""
    hourArray = []
    contentArray = LoadUserInfo.loadUser()
    for userInfo in contentArray:
        if len(userInfo) == 3:
            messageDate = userInfo[1]  # the time field, "H:MM:SS"
            # Keep the hour as an int so that 9 sorts before 10
            hourArray.append(int(messageDate.split(':')[0]))
    print(len(hourArray))

    hour_counts = Counter(hourArray)
    # Sort the counts by hour
    sortByHour = sorted(hour_counts.items())
    print(sortByHour)

    postMessageLabel = [hour for hour, _ in sortByHour]
    postMessageNum = [count for _, count in sortByHour]
    print(postMessageLabel)
    print(postMessageNum)

    # Draw the per-hour bar chart
    ind = np.arange(len(postMessageNum))  # x locations of the bars
    width = 0.35                          # width of the bars
    fig, ax = plt.subplots()
    rects = ax.bar(ind, postMessageNum, width, color='r')

    # Labels, title and axis ticks
    ax.set_ylabel('message number')
    ax.set_title('QQ message number of hour, total message (' + str(len(hourArray)) + ')')
    ax.set_xticks(ind)
    ax.set_xticklabels(postMessageLabel)

    def autolabel(rects):
        # Write the count on top of each bar
        for rect in rects:
            height = rect.get_height()
            ax.text(rect.get_x() + rect.get_width() / 2., height,
                    '%d' % int(height), ha='center', va='bottom')

    autolabel(rects)
    plt.show()


# Chinese word segmentation of the fourth column of the imported file,
# i.e. processing of the messages the users sent
def messageProcess():
    wordArray = []
    contentArray = LoadMessageInfo.loadMessage()
    print("processing original data ........")
    for messageInfo in contentArray:
        word_list = jieba.cut(messageInfo, cut_all=False)  # precise mode
        for word in word_list:
            # Drop short words, i.e. those only one character long
            if len(word) > 1:
                wordArray.append(word)

    print("remove stop word data ........")
    with open('./data/stopword.json', 'r', encoding='utf8') as jsonResource:
        stopwords = json.load(jsonResource)
```
