jsoup selector overview [repost]
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

Elements links = doc.select("a[href]"); // a with href
Elements pngs = doc.select("img[src$=.png]"); // img with src ending .png
Element masthead = doc.select("div.masthead").first(); // div with class=masthead
Elements resultLinks = doc.select("h3.r > a"); // a that is a direct child of h3 with class=r

Below is the list of selectors that jsoup supports:
Selector overview
tagname: find elements by tag, e.g. a
ns|tag: find elements by tag in a namespace, e.g. fb|name finds <fb:name> elements
#id: find elements by ID, e.g. #logo
.class: find elements by class name, e.g. .masthead
[attr]: elements with attribute, e.g. [href]
[^attr]: elements with an attribute name prefix, e.g. [^data-] finds elements with HTML5 dataset attributes
[attr=value]: elements with attribute value, e.g. [width=500]
[attr^=value], [attr$=value], [attr*=value]: elements with attributes that start with, end with, or contain the value, e.g. [href*=/path/]
[attr~=regex]: elements that have the attribute and whose value matches the supplied regular expression; e.g. img[src~=(?i)\.(png|jpe?g)]
*: all elements, e.g. *
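The basic and attribute selectors can be tried against a small document. The following is a minimal sketch, not part of the original post: the class name BasicSelectorSketch, the count helper, and the sample markup are all invented for illustration, and it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class BasicSelectorSketch {
    // Hypothetical sample markup, invented for this illustration.
    static final String HTML =
          "<div id='logo' class='masthead'><img src='logo.PNG' width='500'></div>"
        + "<p data-id='7'><a href='/path/page.html'>one</a> <a name='top'>two</a></p>"
        + "<img src='photo.jpeg'><img src='notes.txt'>";

    // Parses the sample markup and counts the elements matched by a selector.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        String[] selectors = {
            "a",                            // by tag
            "#logo",                        // by ID
            ".masthead",                    // by class name
            "[href]",                       // has the attribute
            "[^data-]",                     // attribute name starts with data-
            "[width=500]",                  // exact attribute value
            "[href^=/path]",                // value starts with
            "[src$=.jpeg]",                 // value ends with
            "[href*=/path/]",               // value contains
            "img[src~=(?i)\\.(png|jpe?g)]"  // value matches a regular expression
        };
        for (String selector : selectors) {
            System.out.println(selector + " -> " + count(selector));
        }
    }
}
```

Note that the backslash in the regular expression has to be doubled inside a Java string literal; the selector that jsoup receives is img[src~=(?i)\.(png|jpe?g)].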
Selector combinations
el#id: elements with ID, e.g. div#logo
el.class: elements with class, e.g. div.masthead
el[attr]: elements with attribute, e.g. a[href]
Any combination, e.g. a[href].highlight
ancestor child: child elements that descend from ancestor at any depth, e.g. .body p finds p elements anywhere under a block with class "body"
parent > child: child elements that descend directly from parent, e.g. div.content > p finds p elements that are direct children of div.content; and body > * finds the direct children of the body tag
siblingA + siblingB: finds sibling B element immediately preceded by sibling A, e.g. div.head + div
siblingA ~ siblingX: finds sibling X element preceded by sibling A, e.g. h1 ~ p
el, el, el: group multiple selectors, find unique elements that match any of the selectors; e.g. div.masthead, div.logo
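The difference between the descendant, child, and sibling combinators is easiest to see side by side. The following is a rough sketch under stated assumptions: the class CombinatorSketch, the texts helper, and the markup are hypothetical and only serve this illustration; jsoup is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class CombinatorSketch {
    // Hypothetical sample markup, invented for this illustration.
    static final String HTML =
          "<div class='body'><div class='content'>"
        + "<p>direct</p><section><p>nested</p></section>"
        + "</div></div>"
        + "<h1>Title</h1><p>first</p>"
        + "<div class='head'>head</div><div>after head</div><p>second</p>";

    // Returns the text of every element matched by the selector.
    static List<String> texts(String selector) {
        Document doc = Jsoup.parse(HTML);
        List<String> result = new ArrayList<>();
        for (Element el : doc.select(selector)) {
            result.add(el.text());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(texts(".body p"));         // any depth: [direct, nested]
        System.out.println(texts("div.content > p")); // direct children only: [direct]
        System.out.println(texts("div.head + div"));  // immediately following sibling: [after head]
        System.out.println(texts("h1 ~ p"));          // any following sibling: [first, second]
        System.out.println(texts("div.head, h1"));    // union of both selectors
    }
}
```

The p elements inside div.body are not matched by h1 ~ p, because they are not siblings of the h1.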
Pseudo selectors
el:lt(n): find elements whose sibling index (i.e. its position in the DOM tree relative to its parent) is less than n; e.g. td:lt(3)
el:gt(n): find elements whose sibling index is greater than n; e.g. div p:gt(2)
el:eq(n): find elements whose sibling index is equal to n; e.g. form input:eq(1)
el:has(selector): find elements that contain elements matching the selector; e.g. div:has(p)
el:contains(text): find elements that contain the given text. The search is case-insensitive; e.g. p:contains(jsoup)
el:matches(regex): find elements whose text matches the specified regular expression; e.g. div:matches((?i)login).
Note that all of the above indexed pseudo-selectors are 0-based, that is, the first element is at index 0, the second at 1, etc.
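The 0-based indexing and the text-matching pseudo selectors can be checked with a short experiment. This is only a sketch: PseudoSelectorSketch, its texts helper, and the sample table are hypothetical names and data made up for this example, and jsoup is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class PseudoSelectorSketch {
    // Hypothetical sample markup, invented for this illustration.
    static final String HTML =
          "<table><tr><td>a</td><td>b</td><td>c</td><td>d</td></tr></table>"
        + "<div><p>Learn JSOUP today</p></div>"
        + "<div>Please Login here</div>";

    // Returns the text of every element matched by the selector.
    static List<String> texts(String selector) {
        Document doc = Jsoup.parse(HTML);
        List<String> result = new ArrayList<>();
        for (Element el : doc.select(selector)) {
            result.add(el.text());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(texts("td:lt(3)"));               // sibling index 0, 1, 2: [a, b, c]
        System.out.println(texts("td:gt(2)"));               // sibling index 3: [d]
        System.out.println(texts("td:eq(1)"));               // second cell: [b]
        System.out.println(texts("div:has(p)"));             // div containing a p
        System.out.println(texts("p:contains(jsoup)"));      // case-insensitive text search
        System.out.println(texts("div:matches((?i)login)")); // regular expression on the text
    }
}
```

Here td:eq(1) returns the second cell, because the first cell has index 0.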
Source: http://www.oschina.net/bbs/thread/10224