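Introduction

I recently stumbled on a fun website stuffed with "simp quotes", the kind of hopelessly devoted one-liners you can't stop reading. As a crawler enthusiast I couldn't let a chance like this slip by, so let's get straight to showing off!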

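Writing the class

This crawler runs on coroutines, so many requests can be in flight at once instead of queuing up one after another, which keeps it fast. We start by creating a crawler class and adding the methods we need to it.

Fetching the page

First up is fetching the page, which is where the coroutines come in. The code: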

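# Coroutine that fetches one page
async def page_get(self):
    # Open an aiohttp client session
    async with aiohttp.ClientSession() as session:
        # Request the page through the session, sending our headers
        async with session.get(url=self.url, headers=self.headers) as resp:
            html = await resp.text()
            # Hand the HTML to the parsing function
            result = self.page_parse(html)
            # Append to the file, skipping sentences we already have
            with open('tiangou.txt', mode='a+', encoding='utf-8') as f:
                f.seek(0)  # 'a+' opens at the end of the file; rewind before reading
                if result + '\n' not in f.readlines():
                    f.write(result + '\n')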
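
The de-duplication step when writing the file is worth poking at on its own. A file opened with mode 'a+' starts with its position at the end, so reading it straight away returns nothing and every sentence looks new; you have to rewind first. Here is a small sketch of that check that you can run against a throwaway temp file. The name append_unique is made up for illustration and is not part of the crawler.

```python
import os
import tempfile


def append_unique(path, line):
    """Append line to the file at path unless it is already there.

    Returns True if the line was written, False if it was a duplicate.
    (Hypothetical helper, not part of the original crawler.)
    """
    entry = line + '\n'
    with open(path, mode='a+', encoding='utf-8') as f:
        f.seek(0)                  # 'a+' starts at the end; rewind to read
        if entry in f.readlines():
            return False
        f.write(entry)             # append mode always writes at the end
        return True


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
    print(append_unique(path, 'hello'))   # first call writes the line
    print(append_unique(path, 'hello'))   # second call skips it
```

Rereading the whole file on every write is fine for a few hundred sentences. For a much bigger haul, keeping the seen sentences in a set would probably be the better choice.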

Parsing the page

This function uses BeautifulSoup to parse the page and returns the text of its article element. The code:

def page_parse(self, html):
    soup = BeautifulSoup(html, 'html.parser')
    # The sentence lives in the page's <article> element
    result = soup.find('article')
    return result.text
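
If you want to see what "the text of the article element" means without installing anything, here is a rough stand-in built on the standard library's HTMLParser. To be clear, this is not what the crawler uses (that is BeautifulSoup); it is only a dependency-free sketch of the same idea. The names ArticleText and article_text, and the sample HTML, are made up for illustration.

```python
from html.parser import HTMLParser


class ArticleText(HTMLParser):
    """Collect the text inside the first <article> element (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # > 0 while we are inside <article>
        self.done = False    # True once the first <article> has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == 'article' and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'article' and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def article_text(html):
    """Return the stripped text of the first <article>, or None if there is none."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return ''.join(parser.chunks).strip() or None


if __name__ == '__main__':
    sample = '<html><body><h1>title</h1><article> hello <b>world</b></article></body></html>'
    print(article_text(sample))
```

Unlike page_parse above, this sketch returns None when the page has no article element instead of raising, which is handy if the site ever serves an error page.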

The class's main method

The main method starts the event loop and hands it the tasks. The num attribute on the class decides how many sentences we fetch.

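def run(self):
    async def crawl_all():
        # Schedule self.num fetches and wait for all of them to finish
        await asyncio.gather(*(self.page_get() for _ in range(self.num)))

    try:
        # asyncio.run creates the event loop and closes it afterwards
        asyncio.run(crawl_all())
    except Exception as e:
        print(e)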
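
The scheduling step can be tried in isolation, without touching the network. The sketch below swaps the real request for asyncio.sleep and adds a semaphore so that only a handful of coroutines run at once. The semaphore is my own addition, not something the crawler above does, and fake_fetch, crawl_all and max_concurrency are made-up names.

```python
import asyncio


async def fake_fetch(i, limit):
    """Stand-in for page_get: sleeps instead of making an HTTP request."""
    async with limit:              # at most max_concurrency coroutines get past here at once
        await asyncio.sleep(0.01)
        return f'sentence {i}'


async def crawl_all(num, max_concurrency=10):
    # Create the semaphore inside the running loop
    limit = asyncio.Semaphore(max_concurrency)
    # return_exceptions=True keeps one failed task from discarding the rest
    return await asyncio.gather(
        *(fake_fetch(i, limit) for i in range(num)),
        return_exceptions=True,
    )


if __name__ == '__main__':
    results = asyncio.run(crawl_all(5))
    print(results)
```

asyncio.gather returns results in the order the coroutines were passed in, whichever one finishes first. Capping concurrency is also a little kinder to the site than firing off a hundred requests in the same instant.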

Full code and a demo run

Putting it all together, the full code looks like this:

import asyncio
from random import choice

import aiohttp
from bs4 import BeautifulSoup

HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 '
                  'Safari/537.36 '
}
URL = 'https://www.nihaowua.com/dog.html'


class crawl_dog():
    def __init__(self, url, headers, num=100):
        self.url = url
        self.headers = headers
        self.num = num  # how many sentences to request

    async def page_get(self):
        async with aiohttp.ClientSession() as session:
            # Send our browser-like headers with the request
            async with session.get(url=self.url, headers=self.headers) as resp:
                html = await resp.text()
                result = self.page_parse(html)
                with open('tiangou.txt', mode='a+', encoding='utf-8') as f:
                    f.seek(0)  # 'a+' opens at the end of the file; rewind before reading
                    if result + '\n' not in f.readlines():
                        f.write(result + '\n')

    def page_parse(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        result = soup.find('article')
        return result.text

    def run(self):
        async def crawl_all():
            # Schedule self.num fetches and wait for all of them to finish
            await asyncio.gather(*(self.page_get() for _ in range(self.num)))

        try:
            asyncio.run(crawl_all())
        except Exception as e:
            print(e)


if __name__ == '__main__':
    crawl = crawl_dog(URL, HEADERS)
    crawl.run()
    # Pick one of the saved sentences at random
    with open('tiangou.txt', mode='r', encoding='utf-8') as f:
        result = f.readlines()
    print(choice(result))

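In the main block we create the class and call its run method to do the crawling. Let's draw a few at random for a taste:

I don't dare hope for much. I only hope that when you kiss him, you remember to put on the lip balm I gave you…
I'm so clumsy with words. With other people I can talk up a storm, as if my mouth had been blessed, but with you nothing ever comes out right. I check Weibo many times a day, and you sit firmly at the top of my most-visited list. Whenever I find something fun I want to share it with you, just to make you smile. I see your gloom and your unhappiness and it worries me sick; I want to quietly stay by your side and cheer you up. It's almost dawn, and I've finished another pack of cigarettes. You are my loneliest secret. Could you look down at me once in a while?
I asked what you were up to, and you asked whether I could stop bothering you. You actually asked for my opinion. Such a good attitude. I really like you!

Mom will never have to worry about me running out of simp quotes again!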