Meituan Xi'an Food Listings Crawler (Revised) (Python) -- 慧智精品网


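# Meituan food listings
# -*- coding: UTF-8 -*-
import requests
import time
from bs4 import BeautifulSoup
import json
import csv
import random

# Write the scraped data to the target CSV file.
with open(r'C:\Users\Hanju\Desktop\美团西安美食.csv', "w", newline='', encoding='UTF-8') as csvfile:
    writer = csv.writer(csvfile)
    # Column layout requested by my boss: site name, category, merchant name, address.
    writer.writerow(['网站名', '品类', '商家名称', '地址'])
    # Full URL assumed; the post shows it as 'xa.meituan/meishi/'.
    target = 'https://xa.meituan.com/meishi/'
    head = {}
    head['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
    req = requests.get(url=target, headers=head)
    html = req.text
    bf = BeautifulSoup(html, 'lxml')
    # The page data lives in the 15th <script> tag.
    tag = bf.find_all('script')[14]
    # Drop the 27 leading and 10 trailing characters that wrap the JSON payload.
    text = str(tag)[27:-10]
    data = json.loads(text)
    print(data)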

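The output is shown below. It contains everything I need to scrape: the cuisine categories, the area IDs, and the names and addresses of the merchants on this page.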

{'$meta': {'knbJS': '//s0.meituan/bs/knb/v1.3.19/knb.js', 'adunionJS': '//h5.dianping/app/adu-track/adunion-track.js', 'uuid': '5282ca77-bfd1-484e-84
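The fixed index [14] and the slice [27:-10] only work as long as the page layout stays exactly the same. A minimal sketch of a less position-dependent approach is to search for a marker string and let the JSON decoder find the end of the object. The marker 'window._appState', the sample HTML, and the helper name extract_embedded_json are assumptions for illustration; the post does not show the raw script tag.

```python
import json


def extract_embedded_json(html: str, marker: str) -> dict:
    """Return the first JSON object that follows `marker` in `html`."""
    start = html.index(marker) + len(marker)
    # Jump to the opening brace of the object after the marker.
    brace = html.index('{', start)
    # raw_decode parses one JSON value and ignores whatever follows it.
    obj, _end = json.JSONDecoder().raw_decode(html, brace)
    return obj


# Hypothetical page fragment, not copied from the real site.
sample_html = (
    '<html><script>var x = 1;</script>'
    '<script>window._appState = {"cates": [{"id": 393}], "n": 2};</script></html>'
)

state = extract_embedded_json(sample_html, 'window._appState')
print(state['cates'][0]['id'])
```

Because the decoder stops at the end of the object, the trailing ";</script>" does not need to be counted and sliced off by hand.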

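Next, extract the cuisine category IDs and the area IDs. The code above has already parsed the text of the script tag as JSON, so data is an ordinary Python dict and the rest is straightforward.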

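ClassList = []
AreaList = []
# result is expected to be the dict that holds the 'cates' and 'areas' lists;
# the post does not show where it is assigned from data.
for item in result['cates']:
    ClassList.append(item['id'])
for item in result['areas']:
    for subarea in item['subAreas']:
        AreaList.append(subarea['id'])
print(ClassList)
print(AreaList)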
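The same traversal can be tried without touching the network. The sketch below runs it on a small hand-made dict; the sample values and the helper name collect_ids are hypothetical, and the use of .get() to tolerate an area without 'subAreas' is an addition, not something the original code does.

```python
def collect_ids(result: dict):
    """Collect cuisine IDs and sub-area IDs from a filter dict."""
    class_ids = [cate['id'] for cate in result.get('cates', [])]
    area_ids = [
        sub['id']
        for area in result.get('areas', [])
        # An area may have no 'subAreas' key or an empty list.
        for sub in (area.get('subAreas') or [])
    ]
    return class_ids, area_ids


# Hand-made sample shaped like the structure the loops above expect.
sample = {
    'cates': [{'id': 393}, {'id': 11}],
    'areas': [
        {'id': 1, 'subAreas': [{'id': 113}, {'id': 6835}]},
        {'id': 2, 'subAreas': []},
        {'id': 3},
    ],
}

class_ids, area_ids = collect_ids(sample)
print(class_ids)
print(area_ids)
```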

Running the script against the live page yields all the cuisine IDs and area IDs:

[393, 11, 17, 40, 36, 28, 35, 395, 54, 20003, 55, 56, 20004, 57, 400, 58, 41, 59, 60, 62, 63, 217, 227, 228, 229, 232, 233, 24]

[113, 6835, 7137, 900, 8976, 897, 898, 899, 908, 7402, 7404, 8974, 8975, 9012, 15634, 15639, 15642, 15643, 15664, 15667, 15784, 15785, 116, 907, 910, 109

Next, take one cuisine ID and one area ID, combine them into a URL, and try parsing a single page first. The code is as follows:

# Meituan food listings
# -*- coding: UTF-8 -*-
import requests
import time
from bs4 import BeautifulSoup
import json
import csv
import random

with open(r'C:\Users\Hanju\Desktop\美团西安美食.csv', "w", newline='', encoding='UTF-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['网站名', '品类', '商家名称', '地址'])
    # Full URL assumed; the post shows it as 'xa.meituan/meishi/c11b113/pn2/'.
    target = 'https://xa.meituan.com/meishi/c11b113/pn2/'
    #ClassList=[393, 11, 17, 40, 36, 28, 35, 395, 54, 20003, 55, 56, 20004, 57, 400, 58, 41, 59, 60, 62, 63, 217, 227, 228, 229, 232, 233, 24]
    #AreaList=[113, 6835, 7137, 900, 8976, 897, 898, 899, 908, 7402, 7404, 8974, 8975, 9012, 15634, 15639, 15642, 15643, 15664, 15667, 15784, 15785, 116, 9
    head = {}
    head['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
    # Placeholder value: use the third-to-last parameter of your cookie.
    head['authorization'] = 'Client-Id'
    req = requests.get(url=target, headers=head)
    html = req.text
    bf = BeautifulSoup(html, 'lxml')
    tag = bf.find_all('script')[14]
    data = json.loads(str(tag)[27:-10])
    result = data['poiLists']['poiInfos']
    for item in result:
        print(item['title'])    # merchant name
        print(item['address'])  # merchant address

The output is:

DQ（立丰广场店）
碑林区金花南路6号立丰国际购物广场1层（近味千拉面）
伯爵工房（草场坡店）
碑林区长安路街道长安三号小区6号楼201室
女皇初吻甜品工作室
碑林区南门外宏信国际花园5号楼2单元704室（南门世纪金花）
绿梦（金花路理工大店）
碑林区东二环路西安理工大学东门斜对面
玛格丽特蛋糕店（太乙路店）
碑林区太乙路立交曼城国际2号楼2单元1003
甜咪much cake（南稍门店）
碑林区南稍门十字东南角地铁D口旁舌尖上的南门商场二楼北区
麦之香蛋糕（友谊东路店）
碑林区测绘东路李家分村内
分享时光（中贸广场店）
碑林区长安北路中贸广场4号楼三层
皇家happy（友谊东路店）
碑林区友谊东路小雁塔10号
安旗（经二路店）
碑林区经二路南口
稻香村（南二环店）
碑林区南二环太白立交桥人人乐超市三层
菲滋洛伊
碑林区景观路就掌灯35号
古早味
碑林区体育馆南路与体育馆东路交口东行10米路北
黑面蔡（中贸广场店）
碑林区中贸广场一层
麦德房（端履门店）
碑林区端履门路北卧龙大厦1楼（西安高级中学西侧）
左右奶茶
碑林区爱学路秦川小学正对面
酒心甜品（民生百货店）
碑林区东大街骡马市1号民生百货6层
ccake蛋糕（交大乐居场店）
碑林区乐居场47号
千家粗粮王（金花南路店）
碑林区金花南路105号立丰国际向北100米
慕尼黑森林
碑林区建工路与公园南路交叉口的东北角（公园南路口东北角）
爺茶（南稍门店）
碑林区长安北路113号（南稍门十字东南角、舌尖上的南门商场负一楼B1-10号乡村基对面）
青果青橙速饮（圣荣广场店）
碑林区长乐西路圣荣服饰广场二楼家年华小吃城
金皇冠
碑林区新科路53号（西安东方中学南约50米）
米旗（李家村店）
碑林区雁塔北路58号秋林商厦1楼（近秋林公司）
Cake Talk 心语蛋糕房（交大店）
碑林区东二环北沙坡街兰蒂斯城1期4号楼内
SPRCOFFEE（李家村店）
碑林区雁塔北路52号（李家村万达广场北侧煤研宾馆一层）
大卡司（太白印象城店）
碑林区二环南路西段155号太白印象城3层
丝特巴瑞烘焙工坊
碑林区陕西省西安市金水路6号（交大南门对面）
麦哆滋烘焙工坊（咸宁路店）
碑林区咸宁路中段往东300米（八十三中学对面）
等一个人咖啡馆
碑林区鼓楼对面竹笆市银泰百货地下停车场往西50米(鼓楼)
蜜東都可茶饮
碑林区开元商城后门解放市场哆啦星球门口
暮光森林
碑林区骡马市兴正元广场30202号（兴正元广场环郎汉堡王旁扶手电梯上2楼）

Inspecting the Network tab in the browser's developer tools reveals how the URLs are composed:
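Judging from the URLs that appear in this post (for example the path c11b113/pn2/), the pattern is c<cuisine ID>b<area ID>/pn<page number>/ appended to the listing root. A minimal sketch of that composition follows; build_url is a hypothetical helper, and the base URL with scheme and full domain is an assumption.

```python
# Assumed listing root; the post shows it as 'xa.meituan/meishi/'.
BASE_URL = 'https://xa.meituan.com/meishi/'


def build_url(class_id: int, area_id: int, page: int, base: str = BASE_URL) -> str:
    """Compose a listing URL from a cuisine ID, an area ID and a page number."""
    return f'{base}c{class_id}b{area_id}/pn{page}/'


print(build_url(11, 113, 2))
```

Keeping the composition in one function means a change in the URL scheme only has to be fixed in one place.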


Then combine the two sets of IDs obtained above to build the URLs. The complete code is as follows:

# Meituan food listings
# -*- coding: UTF-8 -*-
import requests
import time
from bs4 import BeautifulSoup
import json
import csv
import random

with open(r'C:\Users\Hanju\Desktop\美团西安美食.csv', "w", newline='', encoding='UTF-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['网站名', '品类', '商家名称', '地址'])
    # Full URL assumed; the post shows it as 'xa.meituan/meishi/'.
    target = 'https://xa.meituan.com/meishi/'
    ClassList = [393, 11, 17, 40, 36, 28, 35, 395, 54, 20003, 55, 56, 20004, 57, 400, 58, 41, 59, 60, 62, 63, 217, 227, 228, 229, 232, 233, 24]
    # The post cuts this list off after "116, 90"; only the IDs that appear complete are kept.
    AreaList = [113, 6835, 7137, 900, 8976, 897, 898, 899, 908, 7402, 7404, 8974, 8975, 9012, 15634, 15639, 15642, 15643, 15664, 15667, 15784, 15785, 116]
    for class_ in ClassList:
        for area in AreaList:
            for i in range(1, 51):
                url = target + 'c' + str(class_) + 'b' + str(area) + '/pn' + str(i) + '/'
                head = {}
                head['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
                # Placeholder value: use the third-to-last parameter of your cookie.
                head['authorization'] = 'client-id'
                req = requests.get(url=url, headers=head)
                html = req.text
                bf = BeautifulSoup(html, 'lxml')
                tag = bf.find_all('script')[14]
                data = json.loads(str(tag)[27:-10])
                print(data)
                result = data['poiLists']['poiInfos']
                if result:
                    print(url)
                    for item in result:
                        Info_List = []
                        Info_List.append('美团')
                        Info_List.append('美食')
                        Info_List.append(item['title'])
                        Info_List.append(item['address'])
                        writer.writerow(Info_List)
                    # Pause 2 to 4 seconds between pages.
                    time.sleep(random.choice(range(2, 5)))
                else:
                    # No merchant list on this page: stop paging this combination.
                    break
print('Done')

Part of the output:

xa.meituan/meishi/c393b113/pn1/
xa.meituan/meishi/c393b113/pn2/
xa.meituan/meishi/c393b113/pn3/
xa.meituan/meishi/c393b113/pn4/
xa.meituan/meishi/c393b113/pn5/
xa.meituan/meishi/c393b113/pn6/
xa.meituan/meishi/c393b113/pn7/
xa.meituan/meishi/c393b113/pn8/
xa.meituan/meishi/c393b113/pn9/
xa.meituan/meishi/c393b113/pn10/
xa.meituan/meishi/c393b113/pn11/
xa.meituan/meishi/c393b113/pn12/
xa.meituan/meishi/c393b113/pn13/
xa.meituan/meishi/c393b113/pn14/
xa.meituan/meishi/c393b113/pn15/
xa.meituan/meishi/c393b113/pn16/
xa.meituan/meishi/c393b113/pn17/
xa.meituan/meishi/c393b113/pn18/
xa.meituan/meishi/c393b113/pn19/
xa.meituan/meishi/c393b113/pn20/
xa.meituan/meishi/c393b113/pn21/
xa.meituan/meishi/c393b113/pn22/
xa.meituan/meishi/c393b113/pn23/
xa.meituan/meishi/c393b113/pn24/
xa.meituan/meishi/c393b113/pn25/
xa.meituan/meishi/c393b113/pn26/
xa.meituan/meishi/c393b113/pn27/
xa.meituan/meishi/c393b113/pn28/
xa.meituan/meishi/c393b113/pn29/
xa.meituan/meishi/c393b113/pn30/
xa.meituan/meishi/c393b113/pn31/
xa.meituan/meishi/c393b113/pn32/
xa.meituan/meishi/c393b6835/pn1/
xa.meituan/meishi/c393b6835/pn2/
xa.meituan/meishi/c393b7137/pn1/
xa.meituan/meishi/c393b7137/pn2/
xa.meituan/meishi/c393b897/pn1/
xa.meituan/meishi/c393b897/pn2/
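The output stops at pn32 for area 113 and at pn2 for the other areas, well short of the 50-page limit, which is the effect of leaving the page loop as soon as a page has no merchant list. A minimal sketch of that pattern with the HTTP request replaced by a stub; crawl_pages, fake_fetch and the fake data are hypothetical and only illustrate the control flow.

```python
from typing import Callable, Iterator, List


def crawl_pages(fetch_page: Callable[[int], List[dict]],
                max_pages: int = 50) -> Iterator[dict]:
    """Yield listings page by page and stop at the first empty page."""
    for page in range(1, max_pages + 1):
        listings = fetch_page(page)
        if not listings:
            break  # nothing on this page, so later pages are skipped
        yield from listings


# Stub standing in for the real request: three pages of data, then nothing.
fake_site = {
    1: [{'title': 'a'}],
    2: [{'title': 'b'}, {'title': 'c'}],
    3: [{'title': 'd'}],
}
calls: List[int] = []


def fake_fetch(page: int) -> List[dict]:
    calls.append(page)
    return fake_site.get(page, [])


titles = [item['title'] for item in crawl_pages(fake_fetch)]
print(titles)  # ['a', 'b', 'c', 'd']
print(calls)   # [1, 2, 3, 4]
```

Only four requests are made instead of fifty: pages 1 to 3 return data and page 4 is the empty page that ends the loop.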

The rows written to the file look like this (again, only an excerpt):
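The file excerpt itself is not reproduced in this copy of the post. As a rough sketch of what the four-column layout produces, the snippet below writes two of the merchants printed earlier to an in-memory buffer instead of a file and reads them back; using io.StringIO is an assumption made only so the example runs without touching the disk.

```python
import csv
import io

# Two (name, address) pairs taken from the printed output earlier in the post.
merchants = [
    ('DQ（立丰广场店）', '碑林区金花南路6号立丰国际购物广场1层（近味千拉面）'),
    ('伯爵工房（草场坡店）', '碑林区长安路街道长安三号小区6号楼201室'),
]

buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(['网站名', '品类', '商家名称', '地址'])
for title, address in merchants:
    # Same row layout as the crawler: site, category, merchant name, address.
    writer.writerow(['美团', '美食', title, address])

# Read the rows back to check the layout.
buffer.seek(0)
parsed = list(csv.reader(buffer))
for row in parsed:
    print(row)
```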
