A Code Walkthrough of Simple Simulated Login with Python Scrapy


2021-12-14 00:19 | 小妮浅浅 | Python

This article walks through the code for a simple simulated login with Python Scrapy. Readers who are interested can use it for study and reference.

1. The requests module: request the page with the cookies attached directly.

Find the login URL, send a POST request, and store the returned cookies.
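As a rough illustration of both ideas in this item, the sketch below uses a `requests.Session`, which stores cookies from responses and sends them again on later requests. The function names, and any URLs or form fields passed in, are assumptions made for illustration and are not part of the original article.

```python
import requests


def session_with_cookies(cookies):
    """Return a Session that sends the given cookies with every request."""
    session = requests.Session()
    # Cookies copied from a logged-in browser can be attached directly.
    session.cookies.update(cookies)
    return session


def login_and_fetch(session, login_url, form_data, target_url):
    """POST the login form, then reuse the stored cookies to fetch target_url."""
    login_response = session.post(login_url, data=form_data)
    login_response.raise_for_status()
    # The session keeps any cookies set by the login response,
    # so this request is sent with them.
    return session.get(target_url)
```

Calling `login_and_fetch(requests.Session(), login_url, form_data, target_url)` would only work on sites that accept a plain form POST. Pages that embed hidden tokens in the form need those values included in `form_data` as well.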

2. selenium (the browser handles cookies automatically).

Locate the relevant input elements, type in the text, and click the login button.

3. Scrapy, carrying the cookies directly.

Find the login URL, send a POST request, and store the returned cookies.


    # -*- coding: utf-8 -*-
    import re

    import scrapy


    class GithubLoginSpider(scrapy.Spider):
        name = 'github_login'
        allowed_domains = ['github.com']
        start_urls = ['https://github.com/login']

        def parse(self, response):
            # Read the hidden form fields from the login page, then send a POST request to obtain cookies.
            authenticity_token = response.xpath('//input[@name="authenticity_token"]/@value').extract_first()
            utf8 = response.xpath('//input[@name="utf8"]/@value').extract_first()
            commit = response.xpath('//input[@name="commit"]/@value').extract_first()
            form_data = {
                'login': 'pengjunlee@163.com',
                'password': '123456',
                'webauthn-support': 'supported',
                'authenticity_token': authenticity_token,
                'utf8': utf8,
                'commit': commit,
            }
            # Drop fields that were not found on the page; FormRequest cannot encode None values.
            form_data = {key: value for key, value in form_data.items() if value is not None}
            yield scrapy.FormRequest("https://github.com/session", formdata=form_data, callback=self.after_login)

        def after_login(self, response):
            # Check whether the login succeeded by searching for text shown on the logged-in page.
            print(re.findall('Learn Git and GitHub without any code!', response.text))

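Further notes:

In the spider below, parse_login is the method named as the callback to run after the form has been submitted. Its job is to verify whether the login succeeded. Here we simply search the response for the text "Welcome Liu"; finding it shows that the login worked.

That part is easy to follow. The key line is yield from super().start_requests(): once the login has succeeded, the spider requests the addresses in start_urls, carrying the cookies obtained from the login.

This way, the responses fetched after a successful login can be handled directly in parse.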


    # -*- coding: utf-8 -*-
    import scrapy
    from scrapy import FormRequest


    class ExampleLoginSpider(scrapy.Spider):
        name = "login_"
        allowed_domains = ["example.webscraping.com"]
        start_urls = ['http://example.webscraping.com/user/profile']
        login_url = 'http://example.webscraping.com/places/default/user/login'

        def start_requests(self):
            # Fetch the login page first instead of going straight to start_urls.
            yield scrapy.Request(self.login_url, callback=self.login)

        def login(self, response):
            # from_response pre-fills the fields found in the page's form; only the credentials are supplied here.
            formdata = {'email': 'liushuo@webscraping.com', 'password': '12345678'}
            yield FormRequest.from_response(response, formdata=formdata, callback=self.parse_login)

        def parse_login(self, response):
            # "Welcome Liu" is taken as proof of login; only then request start_urls with the session cookies.
            if 'Welcome Liu' in response.text:
                yield from super().start_requests()

        def parse(self, response):
            # Handles the start_urls responses fetched after a successful login.
            print(response.text)