[Course Selection Script] Selecting (Grabbing) Courses with a Python Web Crawler (Updated to v1.0.8)
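0x00 Preface
Every course-selection period feels like going to war.
Everyone has courses they want, but there are only so many seats.
So everyone brings their own tricks: some use JS, some use the Chrome console.
Life is short, I use Python.
(Last Update: 2020/09/22, version v1.0.8)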
0x01 Dependencies
Python 3.x
A browser, if you want to view the saved HTML results
beautifulsoup4>=4.6.0
bs4>=0.0.1
configparser>=3.5.0
lxml>=3.7.3
requests>=2.13.0
tqdm>=4.11.2
0x02 Usage
Get the program
You can simply git clone the latest version of the program:
$ git clone github/okcd00/CDSelector.git
$ cd CDSelector
$ vim config      # edit your login information
$ vim courseid    # edit your course selection
$ python CDSelector.py
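Edit the file config
[info]
username = [your SEP login account name, usually an email address]
password = [your password; do not wrap it in double quotes]
runtime  = [how many seconds to wait between selection attempts]
[action]
debug = false [debug mode prints a lot of intermediate variables; set it to false to save resources]
enroll = true [in polling mode, retry in an endless loop; I can't think of a case where you would set this to false]
evaluate = true [verify whether the selection succeeded; recommended]
select_bat = false [batch selection, for special cases such as English B, where the courses cannot be selected individually and the form can only be submitted when both "Listening & Speaking" and "Reading & Writing" are selected together]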
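To make the role of each key concrete, here is a minimal sketch of how a config like the one above could be read with RawConfigParser. The sample values and the load_settings helper are assumptions made for illustration; the section and key names are the ones listed above.

```python
from configparser import RawConfigParser

# Hypothetical sample values. The bracketed explanations from the template
# above are replaced by real values, without the brackets.
SAMPLE_CONFIG = """
[info]
username = someone@example.com
password = not-a-real-password
runtime  = 5

[action]
debug = false
enroll = true
evaluate = true
select_bat = false
"""


def load_settings(text):
    """Parse the config text and return the settings as a plain dict."""
    cf = RawConfigParser()
    cf.read_string(text)
    return {
        'username': cf.get('info', 'username'),
        'password': cf.get('info', 'password'),
        'runtime': cf.getint('info', 'runtime'),
        'debug': cf.getboolean('action', 'debug'),
        'enroll': cf.getboolean('action', 'enroll'),
        'evaluate': cf.getboolean('action', 'evaluate'),
        'select_bat': cf.getboolean('action', 'select_bat'),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings['runtime'], settings['debug'])
```

Note that a value such as "false [some explanation]" left in the file would make getboolean raise a ValueError, so the bracketed text has to be removed entirely when you fill in the file.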
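Edit the file courseid
Write one course per line, as in the examples below: each line holds the college name plus the course code.
In this update of the course-selection system (that is, v1.0.8), course codes were decoupled from college codes, so you now have to add the college name to courseid by hand. The word "学院" (college) may be omitted, but the first two characters must be correct.
计算机:081203M04003H
公管学院:030100M01004H
In particular, if you need to take a course as a degree course, append on at the end, also separated by a colon:
计算机学院:081203M04003H:on
公管:030100M01004H:on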
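The line format above can be sketched as a small parser. This is only an illustration: parse_course_line is a hypothetical helper, and DEPT_IDS holds just two entries copied from the college-ID table in the single-file listing at the end of this post.

```python
# Two entries copied from the college-ID table in the listing
# (first two characters of the college name -> 3-digit ID).
DEPT_IDS = {'计算': '951', '公管': '945'}


def parse_course_line(line):
    """Split 'college:course_id[:on]' into (dept_id, course_id, is_degree)."""
    parts = line.strip().replace(' ', '').split(':')
    # Only the first two characters of the college name are used for lookup,
    # which is why "学院" may be omitted. Unknown prefixes give None.
    dept_id = DEPT_IDS.get(parts[0][:2])
    course_id = parts[1]
    is_degree = len(parts) == 3 and parts[2] == 'on'
    return dept_id, course_id, is_degree


print(parse_course_line('计算机:081203M04003H'))
print(parse_course_line('公管:030100M01004H:on'))
```

Because only the two-character prefix is looked up, "计算机" and "计算机学院" resolve to the same college ID.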
Then run CDSelector.py
python filename.py is the usual way to run Python code. If, after installing Python, you find that double-clicking my CDSelector.py runs it directly, that has the same effect.
$ python CDSelector.py
Debug Mode: True
Login success
Enrolling start
> Course Selection is unreachable or not started. <1134> Thu Jun 01 08:43:42 2017
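The retry behaviour behind that last log line can be sketched as a simple polling loop. This is a guess at the general shape, not the program's actual loop: poll and its parameters are hypothetical, and reading the number in angle brackets as an attempt counter is an assumption.

```python
import time


def poll(attempt, interval, max_attempts=None, sleep=time.sleep):
    """Call attempt() every `interval` seconds until it returns True.

    Returns the number of attempts made, or None if max_attempts ran out.
    With max_attempts=None the loop runs until attempt() succeeds.
    """
    count = 0
    while max_attempts is None or count < max_attempts:
        count += 1
        if attempt():
            return count
        print('> not available yet <{}> {}'.format(count, time.asctime()))
        sleep(interval)
    return None


# A fake attempt that fails twice and then succeeds.
results = iter([False, False, True])
tries = poll(lambda: next(results), interval=0, max_attempts=10)
print('succeeded after', tries, 'attempts')
```

In this sketch, interval plays the role of runtime from the config file, and leaving max_attempts at None corresponds to the endless retry described for enroll = true.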
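If you see ImportError: xxx, a Python package is missing. Install it with the command below; pip is a tool that ships with the Python installation, so there is nothing extra to download.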
$ pip install xxx
Of course, if you are a little more familiar with Python, you can also install all the dependencies in one go:
$ pip install -
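If you would rather find out what is missing before the first ImportError, a small check like the following can help. It is a sketch: missing_modules is a hypothetical helper, and REQUIRED lists the import names of the packages from section 0x01 (beautifulsoup4 is imported as bs4).

```python
import importlib.util

# Import names for the dependencies listed in 0x01.
REQUIRED = ['bs4', 'configparser', 'lxml', 'requests', 'tqdm']


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules(REQUIRED))
```

Every name it prints can then be installed with pip install, keeping in mind that bs4 is provided by the beautifulsoup4 package.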
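0x03 Source Code
The code is a lot longer than it used to be, and pasting all of it here would hurt readability, so it has been moved to the end of this post. For the latest detailed code, see the Github repository listed under 0xFE.
v1.0.0: the web-access part follows scusjs's implementation, and the feature enhancements draw on zoecur.
v1.0.7: thanks to bobo334334 for providing error samples and to xzqforever for providing a test account.
(Updated: 2017/09/07) Adjusted for minor parameter changes in the course-selection system, which had prevented courses from some colleges from being selected.
v1.0.8: thanks to daiiwei for providing a test account.
(Updated: 2020/09/22) The SEP course-selection system changed a lot this time; it was a major overhaul. Version v1.0.8
Updated the college ID dictionary: IDs changed from the first two digits of the course code to arbitrary 3-digit integers
"403 Forbidden" now occurs more often; added several kinds of headers to prevent 403
"Session expired, please log in again" now occurs more often; switched to a Cookie mode to keep the login state
Improved log output and added offline copies of key pages. When the course-selection system is flooded with traffic, this gives you a lightweight local view: with the js/css preset in the repository, only the page source has to be loaded, which is enough for a fairly lightweight visual check.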
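The three v1.0.8 ideas above (varied headers, a Cookie header, and offline page copies) can be sketched with a few small helpers. The helper names are hypothetical, and so is the JSESSIONID cookie; picking a User-Agent at random is also an assumption, since the listing below simply uses one fixed entry of its header_store list.

```python
import random

# Two of the User-Agent strings from the header_store list in the listing.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0",
]


def build_headers(cookies, user_agents=USER_AGENTS, rng=random):
    """Build request headers with one User-Agent and an explicit Cookie header."""
    return {
        'User-Agent': rng.choice(user_agents),
        'Cookie': '; '.join('{}={}'.format(k, v) for k, v in cookies.items()),
    }


def localize_static_paths(html):
    """Turn absolute /static links into relative ones so that a saved page
    picks up the js/css files stored next to it."""
    html = html.replace('href="/static', 'href="static')
    return html.replace('src="/static', 'src="static')


headers = build_headers({'sepuser': 'abc', 'JSESSIONID': 'xyz'})
page = localize_static_paths('<link href="/static/a.css"><script src="/static/b.js">')
print(headers['Cookie'])
print(page)
```

Rewriting the paths only matters for the saved copy; the live requests still go to the real site.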
0xFE Where to Get It
Github: github/okcd00/CDSelector
Release: github/okcd00/CDSelector/releases
Documentation: blog.csdn/okcd00/article/details/72827861
Acknowledgements:
Mailto: zoecur@icloud
Mailto: scusjs@foxmail
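0xFF Single-File Code Listing (404 lines)
Some readers prefer to get everything they need on one page and would rather not jump to Github (which has also been slow to load lately).
To look after those who like my earlier single-file style as well, I'll paste it here.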
# coding: utf-8
# =====================================================
#  Copyright (C) 2016-2021 All rights reserved.
#
#  filename : CDSelector.py
#  author  : okcd00 / okcd00@qq
#  date    : 2020-09-22
#  desc    : UCAS Course_Selection Program
# =====================================================
import re
import os
import sys
import time
import requests
from bs4 import BeautifulSoup
from configparser import RawConfigParser
# College ID -> first two characters of the college name
index_course = {
    '910': u'数学', '911': u'物理', '957': u'天文', '912': u'化学', '928': u'材料',
    '913': u'生命', '914': u'地球', '921': u'资环', '951': u'计算', '952': u'电子',
    '958': u'工程', '917': u'经管', '945': u'公管', '927': u'人文', '964': u'马克',
    '915': u'外语', '954': u'中丹', '955': u'国际', '959': u'存济', '946': u'体育',
    '961': u'微电', '962': u'未来', '963': u'网络', '968': u'心理', '969': u'人工',
    '970': u'纳米', '971': u'艺术', '972': u'光电', '967': u'创新', '973': u'核学',
    '974': u'现代', '975': u'化学', '976': u'海洋', '977': u'航空', '979': u'杭州'
}
# Reverse lookup: college name prefix -> college ID
dept_ids_dict = dict([(v, k) for k, v in index_course.items()])
# Pool of User-Agent strings
header_store = [
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/537.75.14",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)",
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11',
    'Opera/9.25 (Windows NT 5.1; U; en)',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12',
    'Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9',
    "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 (KHTML, like Gecko) Ubuntu/11.04 Chromium/16.0.912.77 Chrome/16.0.912.77 Safari/535.7",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36",
    "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0 ",
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
]
class UCASEvaluate:
    def __init__(self):
        self.__read_from_course_id('./courseid')

        cf = RawConfigParser()
        cf.read('config')  # load the config file edited in section 0x02
        self.username = cf.get('info', 'username')
        self.password = cf.get('info', 'password')
        self.runtime = cf.getint('info', 'runtime')
        self.debug = cf.getboolean('action', 'debug')
        self.evaluate = cf.getboolean('action', 'evaluate')
        self.select_bat = cf.getboolean('action', 'select_bat')
        self.watch_logo = cf.getboolean('action', 'watch_logo')

        self.loginPage = 'sep.ucas.ac'
        self.loginUrl = self.loginPage + '/slogin'
        self.selectCourseUrl = 'jwjz.ucas.ac/Student/DesktopModules/Course/SelectCourse.aspx'

        self.headers = {
            'Host': 'jwxk.ucas.ac',
            'Connection': 'keep-alive',
            # 'Pragma': 'no-cache',
            # 'Cache-Control': 'no-cache',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': header_store[-5],
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7',
        }
        # self.headers = None

        # One session for all requests, so cookies persist between them
        self.s = requests.Session()
        self.s.get(self.loginPage, headers=self.headers)
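
    def dump_check(self, response, page_name='check'):
        # In debug mode, save an offline copy of the page, with /static links
        # made relative so the js/css shipped in the repository is used.
        if self.debug:
            with open('./{}.html'.format(page_name), 'wb+') as f:
                text = response.text.replace('href="/static', 'href="static')
                text = text.replace('src="/static', 'src="static')
                f.write(text.encode('utf-8'))

    def dump_here(self, response):
        self.dump_check(response, 'here')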

    @staticmethod
    def show_http_request(url, data):
        # Render the request as url?key=value&... for logging
        request_str = '{}'.format(url)
        if data is not None:
            request_str += '?'
            request_str += '&'.join(['{}={}'.format(k, v) for k, v in data.items()])
        return request_str

    def show_response(self, response, url="", data=None, description=""):
        if 200 <= int(response.status_code) < 300:
            status_str = "Link Success"
        else:
            status_str = "Link failed with code {}".format(int(response.status_code))
        print('[{}] {}'.format(description, status_str))

        if self.debug:
            print("\tReq as {}".format(self.show_http_request(url, data)))
            print("\tView as {}".format(response.url))
            print("\tCookie: {}".format(self.s.cookies.get_dict()))

    def update_headers_with_cookie(self):
        # Copy the session cookies into an explicit Cookie header
        self.headers.update({'Cookie': ';'.join(['{}={}'.format(k, v) for k, v in self.s.cookies.items()])})

    def session_get(self, url, data=None, desc=""):
        response = self.s.get(
            url=url, data=data, headers=self.headers)
        self.show_response(
            response, url, data, description=desc)
        self.update_headers_with_cookie()
        return response

    def session_post(self, url, data=None, desc=""):
        response = self.s.post(
            url=url, data=data, headers=self.headers)
        self.show_response(
            response, url, data, description=desc)
        self.update_headers_with_cookie()
        return response

    def login(self):
        post_data = {
            'userName': self.username,
            'pwd': self.password,
            'sb': 'sb'
        }
        response = self.s.post(
            self.loginUrl, data=post_data, headers=self.headers)
        self.show_response(response, self.loginUrl, post_data, 'Login')
        # A successful login sets the 'sepuser' cookie
        if 'sepuser' in self.s.cookies.get_dict():
            return True
        return False

    @staticmethod
    def get_message(restext):
        css_soup = BeautifulSoup(restext, 'html.parser')
        text = css_soup.select('#main-content > div > div.m-cbox.m-lgray > -body > div')[0].text
        return "".join(line.strip() for line in text.split('\n'))

    def __read_from_course_id(self, filename):
        print('[Loading CourseID]')
        with open(filename, 'rb') as courses_file:
            for line in courses_file.readlines():
                if isinstance(line, bytes):
                    line = line.decode('utf-8')
                # Each line is 'college:course_id' or 'college:course_id:on'
                line = line.strip().replace(' ', '').split(':')
                course_dept = dept_ids_dict.get(line[0][:2])
                print(line[1], line[0][:2], 'ID:', course_dept)
                course_id = line[1]
                is_degree = False
                if len(line) == 3 and line[2] == 'on':
                    is_degree = True
        print("")

    def enrollCourses(self):
        response = self.session_get(
            self.courseSystem, desc='SEP AppStore')
        soup = BeautifulSoup(response.text, 'html.parser')
        identity = re.findall(r'"jwxk.ucas.ac/login\?Identity=(.*)&roleId=[0-9]{2,4}"',
                              str(soup))[0]
        print("[Obtain Identity]", identity)
        try:
