
Setting Up a Generic Crawler Structure in Python 3 with requests and BeautifulSoup

admin · 2020-02-14 · 290 views · 0 comments

The code is as follows:

# This code was written for Python 3.7. It sets up a generic crawler
# structure; just pass in the target URL.
# 1. Import the required modules
import requests
from bs4 import BeautifulSoup
import random

# 2. Define the crawler function
def gethtml(url):  # fetch the page source at url
    try:
        agent1 = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"  # agent1 request header
        agent2 = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"  # agent2 request header
        agent3 = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"  # agent3 request header
        list1 = [agent1, agent2, agent3]  # pool of user-agent strings
        agent = random.choice(list1)  # pick one user agent at random
        headers = {"User-Agent": agent}  # request headers carrying the chosen agent
        response = requests.get(url, headers=headers, timeout=1)  # request url with a 1-second timeout
        response.encoding = response.apparent_encoding  # infer the encoding from the page content
        html = response.text  # decoded page source
        soup = BeautifulSoup(html, "html.parser")  # parse the page with the standard parser
        print(soup)  # print the parsed source
    except Exception:  # on any error
        print("request failed")  # report the failure

# 3. Call the function, requesting the Taobao homepage
if __name__ == '__main__':
    url = "https://www.taobao.com"  # target the Taobao homepage
    gethtml(url)  # run gethtml to fetch the source
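The function above prints the entire parsed page, but the generic structure becomes useful once you pull specific elements out of the soup instead. A minimal sketch using BeautifulSoup's find_all (the extract_links helper and the sample string are illustrative, not part of the original post):

```python
from bs4 import BeautifulSoup

def extract_links(html):
    """Parse html and return the non-empty href values of all <a> tags."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]

# Works on any fetched page source, e.g. the html variable inside gethtml
sample = '<a href="https://www.taobao.com">Taobao</a><a name="anchor">no link</a>'
print(extract_links(sample))  # ['https://www.taobao.com']
```

The same pattern (find_all with an attribute filter, or soup.select with a CSS selector) can extract titles, prices, or any other element, which is what makes this skeleton reusable across sites.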

The result of running the code is shown in the figure below:
