Lucene: handling Chinese PDFs with xpdf

Posted by the original poster on 2013-1-16 16:43:41
1、A simple way to handle Chinese PDFs is xpdf:
http://www.foolabs.com/xpdf/home.html
2、
Xpdf is an open source viewer for Portable Document Format (PDF) files. (These are sometimes also called 'Acrobat' files, from the name of Adobe's PDF software.) The Xpdf project also includes a PDF text extractor, PDF-to-PostScript converter, and various other utilities.
Xpdf runs under the X Window System on UNIX, VMS, and OS/2. The non-X components (pdftops, pdftotext, etc.) also run on Win32 systems and should run on pretty much any system with a decent C++ compiler.
Precompiled binaries are available for the following machines:
3、Convert the PDF document to TXT with the pdftotext program that ships with Xpdf. It is a standalone program, independent of Lucene.
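Since pdftotext is a separate command-line program, the conversion in step 3 can be scripted before the text is handed to the indexer. Below is a minimal sketch, assuming pdftotext is installed and on the PATH. The function names build_pdftotext_command and pdf_to_text are made up for this illustration and are not part of Xpdf or Lucene. The -enc UTF-8 option asks pdftotext to write UTF-8 output; depending on the install, Chinese text may also need Xpdf's Chinese language support package set up in xpdfrc, which this sketch does not cover.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, encoding="UTF-8"):
    """Return the argument list for one pdftotext run (hypothetical helper)."""
    # Usage per the pdftotext man page: pdftotext [options] PDF-file [text-file]
    return ["pdftotext", "-enc", encoding, str(pdf_path), str(txt_path)]


def pdf_to_text(pdf_path, txt_path=None):
    """Convert a PDF to a plain-text file and return the output path."""
    pdf_path = Path(pdf_path)
    # Default to the same file name with a .txt extension.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    # Assumption: the Xpdf command-line tools are installed and on the PATH.
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return txt_path
```

Calling pdf_to_text("manual.pdf") would write manual.txt next to the PDF; the resulting TXT file can then be indexed by Lucene like any other plain-text document.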