六狼论坛 (Six Wolves Forum)


Parsing Baidu's “今日热门搜索排行榜” (Today's Hot Search Ranking) with jsoup


Posted by the thread starter on 2013-2-7 20:11:18
Baidu “今日热门搜索排行榜” (Today's Hot Search Ranking)


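The JavaBean entity class needs little explanation:

```java
import java.io.Serializable;

public class HotSearch implements Serializable {

    private static final long serialVersionUID = 1L;

    private String rank;                     // rank
    private String keyword;                  // keyword
    private String detailLink;               // keyword detail-page URL
    private String trend;                    // trend: "rise" or "down"
    private String todaySearchCount;         // searches today
    private String lastSevenDaysSearchCount; // searches over the last seven days
    private String newsLink;                 // news search URL
    private String postBarLink;              // Baidu Tieba (post bar) URL

    // Standard accessors, used by the parser
    public String getRank() { return rank; }
    public void setRank(String rank) { this.rank = rank; }
    public String getKeyword() { return keyword; }
    public void setKeyword(String keyword) { this.keyword = keyword; }
    public String getDetailLink() { return detailLink; }
    public void setDetailLink(String detailLink) { this.detailLink = detailLink; }
    public String getTrend() { return trend; }
    public void setTrend(String trend) { this.trend = trend; }
    public String getTodaySearchCount() { return todaySearchCount; }
    public void setTodaySearchCount(String todaySearchCount) { this.todaySearchCount = todaySearchCount; }
    public String getLastSevenDaysSearchCount() { return lastSevenDaysSearchCount; }
    public void setLastSevenDaysSearchCount(String lastSevenDaysSearchCount) { this.lastSevenDaysSearchCount = lastSevenDaysSearchCount; }
    public String getNewsLink() { return newsLink; }
    public void setNewsLink(String newsLink) { this.newsLink = newsLink; }
    public String getPostBarLink() { return postBarLink; }
    public void setPostBarLink(String postBarLink) { this.postBarLink = postBarLink; }

    @Override
    public String toString() {
        return rank + ":" + keyword;
    }
}
```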

The main part is here:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Class name is illustrative; the post shows only main() and sysPrint()
public class BaiduHotSearch {

    public static void main(String[] args) throws IOException {
        String connectUrl = "http://top.baidu.com/buzz.php?p=top_keyword";
        Document doc = Jsoup.connect(connectUrl).get();
        System.out.println(doc.title());

        // The ranking lives in the table with cellspacing="0"; walk its rows
        Elements cellspacingEles = doc.select("table[cellspacing=0]");
        Elements tbodyEles = cellspacingEles.select("tbody");
        Elements trEles = tbodyEles.select("tr");

        List<HotSearch> hotSearchList = new ArrayList<HotSearch>();
        for (Element trEle : trEles) {
            // Skip the header row (class="th")
            if (trEle.attr("class").equals("th")) {
                continue;
            }
            HotSearch hotSearch = new HotSearch();
            hotSearch.setRank(trEle.select("th").text());

            // Links in a row: [0] keyword detail, [2] news, [3] Tieba
            Elements aEles = trEle.select("a");
            hotSearch.setKeyword(aEles.get(0).text());
            hotSearch.setDetailLink("http://top.baidu.com/" + aEles.get(0).attr("href"));

            // The trend is encoded in the class of the first span that has one
            Element trendSpan = trEle.select("span[class]").first();
            String spanAttrClass = (trendSpan == null) ? "" : trendSpan.attr("class");
            if ("trend down".equals(spanAttrClass)) {
                hotSearch.setTrend("down");
            } else if ("trend rise".equals(spanAttrClass)) {
                hotSearch.setTrend("rise");
            }

            // td[3] = searches today, td[4] = searches over the last seven days
            Elements tdEles = trEle.select("td");
            hotSearch.setTodaySearchCount(tdEles.get(3).text());
            hotSearch.setLastSevenDaysSearchCount(tdEles.get(4).text());
            hotSearch.setNewsLink(aEles.get(2).attr("href"));
            hotSearch.setPostBarLink(aEles.get(3).attr("href"));
            hotSearchList.add(hotSearch);
        }
        sysPrint(hotSearchList);
    }

    private static void sysPrint(List<HotSearch> hotSearchList) {
        if (hotSearchList == null || hotSearchList.isEmpty()) {
            return;
        }
        for (HotSearch hotSearch : hotSearchList) {
            // Chinese labels, as printed in the original output
            StringBuilder sb = new StringBuilder();
            sb.append("[排名:").append(hotSearch.getRank())
              .append(",关键词:").append(hotSearch.getKeyword())
              .append(",详细链接:").append(hotSearch.getDetailLink())
              .append(",趋势:").append(hotSearch.getTrend())
              .append(",今日搜索").append(hotSearch.getTodaySearchCount())
              .append(",最近七日").append(hotSearch.getLastSevenDaysSearchCount())
              .append(",新闻链接:").append(hotSearch.getNewsLink())
              .append(",贴吧链接:").append(hotSearch.getPostBarLink())
              .append("]");
            System.out.println(sb);
        }
    }
}
```
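The trend is read from the class attribute of the first span that has one, by exact comparison against "trend down" and "trend rise". A slightly more tolerant variant splits the attribute into whitespace-separated tokens, so an extra class or a different token order would still match. This is a plain-Java sketch, assuming (as the original comparison implies) that the attribute carries the token trend plus either down or rise; TrendParser and trendOf are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: maps the class attribute of the trend <span>
// (e.g. "trend down") to the value stored via HotSearch.setTrend.
public class TrendParser {

    public static String trendOf(String classAttr) {
        if (classAttr == null) {
            return null; // no class attribute: trend unknown
        }
        // Split on runs of whitespace so "down trend" or "trend  down" also match
        Set<String> tokens = new HashSet<>(Arrays.asList(classAttr.trim().split("\\s+")));
        if (!tokens.contains("trend")) {
            return null; // not a trend marker at all
        }
        if (tokens.contains("down")) {
            return "down";
        }
        if (tokens.contains("rise")) {
            return "rise";
        }
        return null; // trend marker without a recognized direction
    }

    public static void main(String[] args) {
        System.out.println(trendOf("trend down")); // down
        System.out.println(trendOf("rise trend")); // rise
        System.out.println(trendOf("trend"));      // null
    }
}
```

Inside the parsing loop this would replace the if/else with hotSearch.setTrend(TrendParser.trendOf(spanAttrClass)), leaving the trend null for unrecognized classes just as the original does. jsoup's Element.hasClass(String) performs a similar per-token check directly on an element.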
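Running the program prints the following (partial output). The labels are: 排名 rank, 关键词 keyword, 详细链接 detail link, 趋势 trend, 今日搜索 searches today, 最近七日 last seven days, 新闻链接 news link, 贴吧链接 Tieba link.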
```
今日热门搜索排行榜--百度搜索风云榜
[排名:1,关键词:淘宝网,详细链接:http://top.baidu.com/detail.php?b=2&w=%CC%D4%B1%A6%CD%F8,趋势:down,今日搜索1015022,最近七日6938243,新闻链接:http://news.baidu.com/ns?tn=news&from=news&cl=2&rn=20&ct=0&word=%CC%D4%B1%A6%CD%F8,贴吧链接:http://tieba.baidu.com/f?kw=%CC%D4%B1%A6%CD%F8]
[排名:2,关键词:民兵葛二蛋,详细链接:http://top.baidu.com/detail.php?b=2&w=%C3%F1%B1%F8%B8%F0%B6%FE%B5%B0,趋势:rise,今日搜索489299,最近七日1980288,新闻链接:http://news.baidu.com/ns?tn=news&from=news&cl=2&rn=20&ct=0&word=%C3%F1%B1%F8%B8%F0%B6%FE%B5%B0,贴吧链接:http://tieba.baidu.com/f?kw=%C3%F1%B1%F8%B8%F0%B6%FE%B5%B0]
[排名:3,关键词:NBA,详细链接:http://top.baidu.com/detail.php?b=2&w=nba,趋势:rise,今日搜索471288,最近七日3004468,新闻链接:http://news.baidu.com/ns?tn=news&from=news&cl=2&rn=20&ct=0&word=nba,贴吧链接:http://tieba.baidu.com/f?kw=nba]
[排名:4,关键词:qq空间,详细链接:http://top.baidu.com/detail.php?b=2&w=qq%BF%D5%BC%E4,趋势:down,今日搜索394228,最近七日2619448,新闻链接:http://news.baidu.com/ns?tn=news&from=news&cl=2&rn=20&ct=0&word=qq%BF%D5%BC%E4,贴吧链接:http://tieba.baidu.com/f?kw=qq%BF%D5%BC%E4]
[排名:5,关键词:泰囧,详细链接:http://top.baidu.com/detail.php?b=2&w=%CC%A9%87%E5,趋势:down,今日搜索388803,最近七日2161700,新闻链接:http://news.baidu.com/ns?tn=news&from=news&cl=2&rn=20&ct=0&word=%CC%A9%87%E5,贴吧链接:http://tieba.baidu.com/f?kw=%CC%A9%87%E5]
[排名:6,关键词:京东商城,详细链接:http://top.baidu.com/detail.php?b=2&w=%BE%A9%B6%AB%C9%CC%B3%C7,趋势:down,今日搜索365121,最近七日2492075,新闻链接:http://news.baidu.com/ns?tn=news&from=news&cl=2&rn=20&ct=0&word=%BE%A9%B6%AB%C9%CC%B3%C7,贴吧链接:http://tieba.baidu.com/f?kw=%BE%A9%B6%AB%C9%CC%B3%C7]
[排名:7,关键词:优酷,详细链接:http://top.baidu.com/detail.php?b=2&w=%D3%C5%BF%E1,趋势:down,今日搜索335415,最近七日2500742,新闻链接:http://news.baidu.com/ns?tn=news&from=news&cl=2&rn=20&ct=0&word=%D3%C5%BF%E1,贴吧链接:http://tieba.baidu.com/f?kw=%D3%C5%BF%E1]
[排名:8,关键词:隋唐英雄,详细链接:http://top.baidu.com/detail.php?b=2&w=%CB%E5%CC%C6%D3%A2%D0%DB,趋势:rise,今日搜索332192,最近七日2403616,新闻链接:http://news.baidu.com/ns?tn=news&from=news&cl=2&rn=20&ct=0&word=%CB%E5%CC%C6%D3%A2%D0%DB,贴吧链接:http://tieba.baidu.com/f?kw=%CB%E5%CC%C6%D3%A2%D0%DB]
[排名:9,关键词:新浪微博,详细链接:http://top.baidu.com/detail.php?b=2&w=%D0%C2%C0%CB%CE%A2%B2%A9,趋势:down,今日搜索303496,最近七日2018841,新闻链接:http://news.baidu.com/ns?tn=news&from=news&cl=2&rn=20&ct=0&word=%D0%C2%C0%CB%CE%A2%B2%A9,贴吧链接:http://tieba.baidu.com/f?kw=%D0%C2%C0%CB%CE%A2%B2%A9]
[排名:10,关键词:163,详细链接:http://top.baidu.com/detail.php?b=2&w=163,趋势:down,今日搜索270041,最近七日1612133,新闻链接:http://news.baidu.com/ns?tn=news&from=news&cl=2&rn=20&ct=0&word=163,贴吧链接:http://tieba.baidu.com/f?kw=163]
```
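The keyword parameters in these links (w=, word=, kw=) are percent-encoded GBK bytes rather than UTF-8: %CC%D4%B1%A6%CD%F8, for example, is 淘宝网 in GBK. To recover a readable keyword from a stored link, a plain-Java sketch using only java.net.URLDecoder might look like this. KeywordDecoder, decodeGbk and queryParam are hypothetical names, and the query-string split is deliberately naive (it assumes no fragment and takes the first match of a repeated parameter).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper for reading keywords back out of the scraped links.
public class KeywordDecoder {

    // Decodes a GBK percent-encoded string such as "%CC%D4%B1%A6%CD%F8".
    public static String decodeGbk(String encoded) {
        try {
            return URLDecoder.decode(encoded, "GBK");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("GBK charset not available", e);
        }
    }

    // Returns the raw (still encoded) value of a query parameter, or null if absent.
    public static String queryParam(String url, String name) {
        int q = url.indexOf('?');
        if (q < 0) {
            return null; // no query string
        }
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String detail = "http://top.baidu.com/detail.php?b=2&w=%CC%D4%B1%A6%CD%F8";
        // Decodes to 淘宝网 (console display depends on the terminal encoding)
        System.out.println(decodeGbk(queryParam(detail, "w")));
    }
}
```

GBK is not one of the charsets every JVM must support, but standard JDK builds include it. On Java 10 and later, URLDecoder.decode(String, Charset) avoids the checked exception.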