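Use urllib.request to fetch a page's HTML, then use BeautifulSoup to extract data from it; together they make a simple scraper. Here getone.find_all collects the a.mnav tags (the <a> elements with class mnav), as shown in the figure.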
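from urllib.request import urlopen
from bs4 import BeautifulSoup
# Download the Baidu home page
html = urlopen('http://www.baidu.com')
# Parse the raw bytes with Python's built-in html.parser backend
getone = BeautifulSoup(html.read(), 'html.parser')
# A string passed as the second argument filters on CSS class: every <a class="mnav">
test_list = getone.find_all('a', 'mnav')
for test in test_list:
    print(test.get_text())  # the link's visible text
html.close()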
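To make clear what find_all('a', 'mnav') matches, here is a rough standard-library-only sketch of the same extraction built on html.parser.HTMLParser. This is a substitute technique, not how BeautifulSoup works internally. It runs against a hypothetical inline HTML snippet rather than the live page, so it works offline; the class name MnavLinkParser and the sample markup are made up for illustration. As with BeautifulSoup's class filter, a tag matches when mnav is one of its classes, so class="mnav c-font" counts too.

```python
from html.parser import HTMLParser


class MnavLinkParser(HTMLParser):
    """Hypothetical helper: collect the text of every <a> whose class list contains 'mnav'."""

    def __init__(self):
        super().__init__()
        self.texts = []        # stripped text of each matching link, in document order
        self._in_mnav = False  # True while inside a matching <a>
        self._buf = []         # text fragments of the current matching link

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs; value may be None
            classes = (dict(attrs).get('class') or '').split()
            if 'mnav' in classes:
                self._in_mnav = True
                self._buf = []

    def handle_endtag(self, tag):
        if tag == 'a' and self._in_mnav:
            self.texts.append(''.join(self._buf).strip())
            self._in_mnav = False

    def handle_data(self, data):
        if self._in_mnav:
            self._buf.append(data)


# Hypothetical markup standing in for the downloaded page
SAMPLE_HTML = """
<div>
  <a href="/news" class="mnav">News</a>
  <a href="/hao123" class="mnav">hao123</a>
  <a href="/map" class="mnav c-font">Map</a>
  <a href="/more" class="bri">More</a>
</div>
"""

parser = MnavLinkParser()
parser.feed(SAMPLE_HTML)
parser.close()
for text in parser.texts:
    print(text)
```

On this sample it prints News, hao123 and Map, and skips the link whose only class is bri. The BeautifulSoup version stays much shorter, which is why it is the usual choice.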
