Classic Python Web Scraper Examples (Code Samples)

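A basic Python scraper works like this: first import the HTTP library and send a request to get a response object; then set the response encoding and print the status code; finally print the fetched content with print(response.text).
The examples below all use the requests library: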
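1. Fetch the Baidu home page and print its contents
# Example 1: fetch the Baidu home page
import requests  # import the HTTP library; without it the request functions cannot be called
response = requests.get("http://www.baidu.com")  # send the request and get a response object
response.encoding = response.apparent_encoding  # use the encoding detected from the content
print("Status code: " + str(response.status_code))  # print the status code
print(response.text)  # print the fetched content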
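A slightly more defensive variant of this first example might look like the sketch below. The helper name `fetch_text` and the 10-second default timeout are assumptions made for illustration and are not part of the original example; `timeout` and `raise_for_status()` are standard requests features that keep the script from hanging for a long time or silently printing an error page.

```python
import requests


def fetch_text(url, timeout=10.0):
    """Fetch a page and return its decoded text (hypothetical helper)."""
    # Without a timeout, a request to an unresponsive server may hang for a long time.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of continuing silently.
    response.raise_for_status()
    # Use the encoding guessed from the body, as in the example above.
    response.encoding = response.apparent_encoding
    return response.text


# Usage (performs a real network request):
# print(fetch_text("http://www.baidu.com"))
```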
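2. Common methods: a GET example (an example with parameters follows below)
# Example 2: the GET method
import requests  # import the HTTP library first; without it the request functions cannot be called
response = requests.get("http://httpbin.org/get")  # send a GET request
print(response.status_code)  # status code
print(response.text)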
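3. Common methods: a POST example (an example with parameters follows below)
# Example 3: the POST method
import requests  # import the HTTP library first; without it the request functions cannot be called
response = requests.post("http://httpbin.org/post")  # send a POST request
print(response.status_code)  # status code
print(response.text)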
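4. A PUT example
# Example 4: the PUT method
import requests  # import the HTTP library first; without it the request functions cannot be called
response = requests.put("http://httpbin.org/put")  # send a PUT request
print(response.status_code)  # status code
print(response.text)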
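Examples 2 to 4 differ only in the HTTP method. As a rough sketch of that idea, the snippet below builds (but does not send) one request per method with `requests.Request(...).prepare()` and prints the method and URL each would use. The variable names are illustrative and not taken from the original.

```python
import requests

# Build one request per HTTP method without sending anything over the network.
prepared_requests = [
    requests.Request(method, "http://httpbin.org/" + method.lower()).prepare()
    for method in ("GET", "POST", "PUT")
]

for prepared in prepared_requests:
    # Each PreparedRequest records the method and the final URL.
    print(prepared.method, prepared.url)
```

To actually send one of them, `requests.request("PUT", "http://httpbin.org/put")` does the same thing as `requests.put("http://httpbin.org/put")`.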
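5. Common methods: GET with parameters (1)
To pass several parameters, join them with the & character, as follows:
# Example 5: GET with parameters in the URL
import requests  # import the HTTP library first; without it the request functions cannot be called
response = requests.get("http://httpbin.org/get?name=hezhi&age=20")  # GET with parameters
print(response.status_code)  # status code
print(response.text)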
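6. Common methods: GET with parameters (2)
The params argument accepts a dictionary, so several parameters can be passed at once.
# Example 6: GET with parameters in a dictionary
import requests  # import the HTTP library first; without it the request functions cannot be called
data = {
    "name": "hezhi",
    "age": 20
}
response = requests.get("http://httpbin.org/get", params=data)  # GET with parameters
print(response.status_code)  # status code
print(response.text)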
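The two ways of passing GET parameters should produce the same URL. One way to check this without sending anything is to prepare the request and inspect it, as in this small sketch (the variable name `prepared` is illustrative):

```python
import requests

data = {"name": "hezhi", "age": 20}

# Prepare the request from example 6 without sending it.
prepared = requests.Request("GET", "http://httpbin.org/get", params=data).prepare()

# The dictionary has been encoded into the query string,
# giving the same URL that example 5 writes out by hand.
print(prepared.url)
```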
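7. Common methods: POST with parameters (it looks a lot like the previous example)
# Example 7: POST with parameters
import requests  # import the HTTP library first; without it the request functions cannot be called
data = {
    "name": "hezhi",
    "age": 20
}
response = requests.post("http://httpbin.org/post", data=data)  # POST the dictionary as form data (params= would put it in the URL query string instead)
print(response.status_code)  # status code
print(response.text)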
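For a POST request it matters whether the dictionary is passed as `params` or as `data`. The sketch below prepares both variants without sending them, so the difference can be inspected locally. The names `with_params` and `with_data` are illustrative.

```python
import requests

data = {"name": "hezhi", "age": 20}
url = "http://httpbin.org/post"

# params= puts the values in the URL query string and leaves the body empty.
with_params = requests.Request("POST", url, params=data).prepare()
# data= sends the values as a form-encoded request body.
with_data = requests.Request("POST", url, data=data).prepare()

print(with_params.url, with_params.body)
print(with_data.url, with_data.body, with_data.headers["Content-Type"])
```

httpbin echoes query-string values under "args" and form values under "form", so the two variants show up in different places in the response.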
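8. Getting past anti-scraping checks, using Zhihu as an example
# Example 8: setting request headers
import requests  # import the HTTP library first; without it the request functions cannot be called
response = requests.get("http://www.zhihu.com")  # first request to Zhihu, without custom headers
print("First request, no custom headers, status code: " + str(response.status_code))  # without headers the page may not be served normally and the status code may not be 200
# The request below differs only in the User-Agent field and can be served normally
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
}  # set the headers to imitate a browser
response = requests.get("http://www.zhihu.com", headers=headers)  # GET request with the headers argument
print(response.status_code)  # 200 means the request succeeded
print(response.text)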
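The reason the first request can be rejected is that requests identifies itself with its own User-Agent by default. The sketch below prints that default and shows one possible way to set the browser-like header once on a `requests.Session` so it applies to every request made through it. Using a session here is an assumption added for illustration, not something the original example does, and nothing is actually sent.

```python
import requests

# The User-Agent that requests sends when none is set, e.g. "python-requests/<version>".
print(requests.utils.default_user_agent())

session = requests.Session()
# Headers set on a session are merged into every request made through it.
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
})

# Prepare (but do not send) a request to inspect the headers it would carry.
prepared = session.prepare_request(requests.Request("GET", "http://www.zhihu.com"))
print(prepared.headers["User-Agent"])
```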
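9. Fetch a page and save it locally
Because of the directory layout, a folder named 爬虫 was created on drive D to hold the saved files.
Pay attention to the encoding setting when saving the file.
# Example 9: fetch an HTML page and save it
import requests
url = "http://www.baidu.com"
response = requests.get(url)
response.encoding = "utf-8"  # set the encoding used to decode the response
print("Type of response: " + str(type(response)))
print("Status code: " + str(response.status_code))
print("Headers: " + str(response.headers))
print("Response body:")
print(response.text)
# Save the file
file = open("D:\\爬虫\\baidu.html", "w", encoding="utf-8")  # "w" creates the file if it does not exist; "wb" is not needed because the content is text, not binary (backslashes are doubled so "\b" is not read as an escape)
file.write(response.text)
file.close()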
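The save step fails if the target folder does not exist, and the file stays open if `write()` raises. A possible way to handle both is sketched below using only the standard library. The helper name `save_text` is hypothetical and not part of the original example.

```python
from pathlib import Path


def save_text(text, path):
    """Write text to path as UTF-8, creating missing folders (hypothetical helper)."""
    target = Path(path)
    # Create the parent folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # The with-block closes the file even if write() raises an exception.
    with target.open("w", encoding="utf-8") as file:
        file.write(text)
    return target
```

With the response from the example above, it could be called as `save_text(response.text, "D:\\爬虫\\baidu.html")`.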
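10. Fetch an image and save it locally
# Example 10: save the Baidu logo locally
import requests  # import the HTTP library first; without it the request functions cannot be called
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif")  # GET request that returns the image
file = open("D:\\爬虫\\baidu_logo.gif", "wb")  # "wb" opens the file in binary mode for writing only
file.write(response.content)  # write the raw bytes to the file
file.close()  # close the file; afterwards, check the folder to confirm the image was saved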
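If the server answers with an error page instead of the image, the code above still writes those bytes into a .gif file. One optional safeguard, sketched here under the assumption that the download is expected to be a GIF, is to check the file signature before saving. The helper name `save_gif` is hypothetical.

```python
from pathlib import Path

# Every GIF file starts with one of these two six-byte signatures.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def save_gif(content, path):
    """Write bytes to path if they look like a GIF (hypothetical helper)."""
    if not content.startswith(GIF_SIGNATURES):
        raise ValueError("content does not start with a GIF signature")
    # write_bytes opens the file in binary mode, writes, and closes it.
    Path(path).write_bytes(content)
    return len(content)
```

With the response from the example above, it could be called as `save_gif(response.content, "D:\\爬虫\\baidu_logo.gif")`.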
