美图云海网站建设,新征程启航

为企业提供网站建设、域名注册、服务器等服务

Python常用爬虫代码总结方便查询-创新互联

beautifulsoup解析页面

创新互联公司主营南岔网站建设的网络公司,主营网站建设方案,重庆APP软件开发,南岔h5微信小程序搭建,南岔网站营销推广欢迎南岔等地区企业咨询
from bs4 import BeautifulSoup
soup = BeautifulSoup(htmltxt, "lxml")
# 三种装载器
soup = BeautifulSoup("

", "html.parser") ### 只有起始标签的会自动补全,只有结束标签的会自动忽略 ### 结果为:
soup = BeautifulSoup("

", "lxml") ### 结果为:
soup = BeautifulSoup("

", "html5lib") ### html5lib则出现一般的标签都会自动补全 ### 结果为:

# 根据标签名、id、class、属性等查找标签 ### 根据class、id、以及属性alog-action的值和标签类别查询 soup.find("a",class_="title",id="t1",attrs={"alog-action": "qb-ask-uname"})) ### 查询标签内某属性的值 pubtime = soup.find("meta",attrs={"itemprop":"datePublished"}).attrs['content'] ### 获取所有class为title的标签 for i in soup.find_all(class_="title"): print(i.get_text()) ### 获取特定数量的class为title的标签 for i in soup.find_all(class_="title",limit = 2): print(i.get_text()) ### 获取文本内容时可以指定不同标签之间的分隔符,也可以选择是否去掉前后的空白。 soup = BeautifulSoup('

The Dormouses story

The Dormouses story

', "html5lib") soup.find(class_="title").get_text("|", strip=True) #结果为:The Dormouses story|The Dormouses story ### 获取class为title的p标签的id soup.find(class_="title").get("id") ### 对class名称正则: soup.find_all(class_=re.compile("tit")) ### recursive参数,recursive=False时,只find当前标签的第一级子标签的数据 soup = BeautifulSoup('abc','lxml') soup.html.find_all("title", recursive=False)</pre> <br> 文章标题:Python常用爬虫代码总结方便查询-创新互联 <br> 文章来源:<a href="http://tjjierui.cn/article/hoihh.html">http://tjjierui.cn/article/hoihh.html</a> </div> </div> <div class="othernews"> <h3>其他资讯</h3> <div class="othernews_list"> <ul> <li> <a href="/article/giesjc.html">硬件基本部件的计算机为什么称为第四代计算机</a> </li><li> <a href="/article/gieedh.html">学习jQuery中的noConflict()用法</a> </li><li> <a href="/article/gieedj.html">Python支持哪些图形界面的第三方库</a> </li><li> <a href="/article/gieeso.html">Oracle数据库中怎么创建字段约束</a> </li><li> <a href="/article/gieeod.html">centos7硬盘三步操作(分区、格式化、挂载)</a> </li> </ul> </div> </div> </div> </div> <div class="footer"> <div class="footer_content"> <div class="footer_content_top clear"> <div class="content_top_share fl"> <div><img src="/Public/Home/img/logo.png"></div> <div class="top_share_content"> <dd>分享至:</dd> <dt class="bdsharebuttonbox clear" id="share"> <a href="#" class="bds_tsina iconfont fl" data-cmd="tsina" title="分享到新浪微博"></a> <a href="#" class="bds_sqq iconfont fl" data-cmd="sqq" title="分享到QQ好友"></a> <a href="#" class="bds_weixin iconfont fl" data-cmd="weixin" title="分享到微信"></a> <a href="#" class="bds_weixin iconfont fl" data-cmd="tieba" title="分享到贴吧"></a> </dt> <script>window._bd_share_config={"common":{"bdSnsKey":{},"bdText":"","bdMini":"2","bdMiniList":false,"bdPic":"","bdStyle":"0","bdSize":"16"},"share":{}};with(document)0[(getElementsByTagName('head')[0]||body).appendChild(createElement('script')).src='http://bdimg.share.baidu.com/static/api/js/share.js?v=89860593.js?cdnversion='+~(-new Date()/36e5)];</script> </div> </div> <div class="content_top_left fl clear"> <div class="top_left_list fl"> <dd><a href="/about/">关于我们</a></dd> <dt> <a href="/about/#gsjj">公司简介</a> <a href="/about/#fzlc">发展历程</a> </dt> </div> <div class="top_left_list fl"> <dd><a href="/service/">服务项目</a></dd> <dt> <a href="/service/">高端网站建设</a> <a href="/miniprogram/">小程序开发</a> <a href="/service/app.html">APP开发</a> <a href="/service/yingxiao.html">网络营销</a> </dt> </div> <div class="top_left_list fl"> <dd><a href="/jianzhan/">建站知识</a></dd> <dt> <a href="/jianzhan/">行业新闻</a> <a href="/jianzhan/">建站学堂</a> <a href="/jianzhan/">常见问题</a> </dt> </div> <div class="top_left_list fl"> <dd><a href="/contact/">联系我们</a></dd> <dt> <a href="/contact/#lxwm">公司地址</a> <a href="/contact/#rczp">人才招聘</a> </dt> </div> </div> <div class="content_top_right addressR fr"> <div class="top_right_title addressf_title"> <a href="javascript:;" class="on">成都</a> <a href="javascript:;">什邡</a> </div> <div class="top_right_content addressf"> <div class="right_content_li on"> <div class="right_content_list clear"> <dd class="fl iconfont"></dd> <dt class="fl">电话:028-86922220</dt> </div> <div class="right_content_list clear"> <dd class="fl iconfont"></dd> <dt class="fl">地址:成都市太升南路288号锦天国际A幢1002号</dt> </div> </div> <div class="right_content_li"> <div class="right_content_list clear"> <dd class="fl iconfont"></dd> <dt class="fl">电话:028-86922220</dt> </div> <div class="right_content_list clear"> <dd class="fl iconfont"></dd> <dt class="fl">地址:德阳市什邡市顺河路物华南苑B区</dt> </div> </div> </div> </div> </div> <div class="link"> 友情链接: <a href="http://www.cxhljz.cn/" title="成都网站设计" target="_blank">成都网站设计</a>   <a href="http://m.cdcxhl.cn/qiye/ " title="企业网站建设" target="_blank">企业网站建设</a>   <a href="http://chengdu.cdcxhl.com/" title="成都网站制作" target="_blank">成都网站制作</a>   <a href="http://www.dlganxi.com/" title="公司网站维护" target="_blank">公司网站维护</a>   <a href="http://www.cdxwcx.cn/tuoguan/meishan.html" title="眉山托管服务器" target="_blank">眉山托管服务器</a>   <a href="http://www.cdcxycgs.com/" title="成都发电机租赁" target="_blank">成都发电机租赁</a>   <a href="http://www.cdxwcx.cn/seo/" title="网络营销推广" target="_blank">网络营销推广</a>   <a href="http://www.cdxwcx.cn/tuoguan/" title="成都主机托管" target="_blank">成都主机托管</a>   <a href="http://www.scdkjj.com/" title="scdkjj.com" target="_blank">scdkjj.com</a>   <a href="https://www.cdcxhl.cn/ " title="云虚拟主机" target="_blank">云虚拟主机</a>    </div> </div> <div class="footer_content_copyright clear">版权所有:青羊区美图云海设计工作室(个体工商户) <a href="http://beian.miit.gov.cn/" rel="nofollow" target="_blank">蜀ICP备2025119305号-5</a> </div> </div> <!--浮窗--> <div class="FloatingWindow clear"> <a href="tencent://message/?uin=1683211881&Site=&Menu=yes" class="FloatingWindow_list fr"> <div class="FloatingWindow_list_title"> <dd class="iconfont"></dd> <dt><span>在线</span>咨询</dt> </div> </a> <a href="javascript:;" class="FloatingWindow_list fr"> <div class="FloatingWindow_list_title"> <dd class="iconfont"></dd> <dt>服务热线</dt> </div> <div class="FloatingWindow_list_down fadeInRight animated">服务热线:028-86922220</div> </a> <a href="javascript:;" class="FloatingWindow_list fr STop"> <div class="FloatingWindow_list_title"> <dd class="iconfont"></dd> <dt>TOP</dt> </div> </a> </div> <script src="/Public/Home/js/jquery-1.8.3.min.js"></script> <script src="/Public/Home/js/comm.js"></script> <script src="/Public/Home/js/wow.js"></script> <script src="/Public/Home/js/common.js"></script> </body> </html> <script> $(".cont img").each(function(){ var src = $(this).attr("src"); //获取图片地址 var str=new RegExp("http"); var result=str.test(src); if(result==false){ var url = "https://www.cdcxhl.com"+src; //绝对路径 $(this).attr("src",url); } }); window.onload=function(){ document.oncontextmenu=function(){ return false; } } </script>