六狼论坛

Java: filtering HTML elements with a regular expression

Original poster | Posted 2013-2-7 19:13:15
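// Required imports (at the top of the enclosing class file):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Strips all HTML elements from a string.
 * For example: <a href="www.sohu.com/test">hello!</a>
 * The filtered result is: hello!
 * Note: this method removes the text between "<" and ">", then decodes
 * a few common entities. Empty brackets such as "<>" or "< >" are kept.
 *
 * @param element the HTML fragment to filter; may be null
 * @return the text without HTML elements, or the input unchanged if it is null or blank
 */
public static String getTxtWithoutHTMLElement(String element)
{
    // Simpler alternative that does not preserve whitespace-only brackets such as "< >":
    // String reg = "<[^<>]+>";
    // return element.replaceAll(reg, "");

    if (null == element || "".equals(element.trim()))
    {
        return element;
    }
    // "<", then any run of characters other than "<" or ">", then ">".
    // (The earlier class [^<|^>] also excluded "|" and "^", so tags containing them were skipped.)
    Pattern pattern = Pattern.compile("<[^<>]*>");
    Matcher matcher = pattern.matcher(element);
    StringBuffer txt = new StringBuffer();
    while (matcher.find())
    {
        String group = matcher.group();
        if (group.matches("<[\\s]*>"))
        {
            // Empty or whitespace-only brackets are not tags: keep them.
            matcher.appendReplacement(txt, group);
        }
        else
        {
            // A real tag: drop it.
            matcher.appendReplacement(txt, "");
        }
    }
    matcher.appendTail(txt);
    // Decode "&amp;" last so that input like "&amp;lt;" is not decoded twice.
    replaceEntities(txt, "&lt;", "<");
    replaceEntities(txt, "&gt;", ">");
    replaceEntities(txt, "&quot;", "\"");
    replaceEntities(txt, "&nbsp;", "");
    replaceEntities(txt, "&amp;", "&");
    return txt.toString();
}

// Replaces every occurrence of entity in txt with replace, in place.
private static void replaceEntities(StringBuffer txt, String entity, String replace)
{
    int pos = 0;
    while (-1 != (pos = txt.indexOf(entity, pos)))
    {
        txt.replace(pos, pos + entity.length(), replace);
        // Continue after the replacement so replaced text is never rescanned.
        pos += replace.length();
    }
}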
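For comparison, here is a minimal self-contained sketch of the two approaches mentioned in the code above: the one-line replaceAll variant and the find/appendReplacement loop that keeps empty brackets. The class name HtmlStripDemo and the method names stripAll and stripKeepingEmpty are hypothetical and not part of the original post; entity decoding is left out to keep it short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
final class HtmlStripDemo {
    // "<", any run of characters other than "<" or ">", then ">".
    private static final Pattern TAG = Pattern.compile("<[^<>]*>");

    /** Simple variant: removes every match, including "<>" and "< >". */
    public static String stripAll(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    /** Loop variant: removes tags but keeps empty brackets such as "<>" or "< >". */
    public static String stripKeepingEmpty(String html) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = m.group();
            boolean empty = tag.matches("<\\s*>");
            // quoteReplacement guards against "$" or "\" in the kept text.
            m.appendReplacement(out, empty ? Matcher.quoteReplacement(tag) : "");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripAll("<p>< >test"));           // test
        System.out.println(stripKeepingEmpty("<p>< >test"));  // < >test
        System.out.println(stripKeepingEmpty("<p>t<<e>st"));  // t<st
    }
}
```

The loop variant costs a few more lines, but it is what allows inputs like "<>" and "< >" to survive, which the test cases in this post expect.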
 
Here are the test cases:
public void testGetTxtWithoutHTMLElement()
{
    // Well-formed and unclosed tags are removed.
    assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<a href='a/test'>test</a>"));
    assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<a href='a/test'>test"));
    assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<input type='text'>test</input>"));
    assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<p>test"));
    assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<table><tr><td>test</td></tr></table>"));

    // A stray "<" or ">" that is not part of a tag is kept.
    assertEquals("te<st", ExcelHssfView.getTxtWithoutHTMLElement("<p>te<st"));
    assertEquals("te>st", ExcelHssfView.getTxtWithoutHTMLElement("<p>te>st"));
    assertEquals("tst", ExcelHssfView.getTxtWithoutHTMLElement("<p>t<e>st"));
    assertEquals("t<st", ExcelHssfView.getTxtWithoutHTMLElement("<p>t<<e>st"));

    // Empty or whitespace-only brackets are kept.
    assertEquals("<>test", ExcelHssfView.getTxtWithoutHTMLElement("<p><>test"));
    assertEquals("< >test", ExcelHssfView.getTxtWithoutHTMLElement("<p>< >test"));
    assertEquals("<<>test", ExcelHssfView.getTxtWithoutHTMLElement("<p><<>test"));

    // The "&nbsp;" entity is dropped.
    assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<table><tr><td>&nbsp;test</td></tr></table>"));
}