Using Sphinx full-text indexing in phpcms [performance testing in progress]

Posted by the thread starter on 2016-2-25 15:24:15
Sphinx is a full-text search engine. The latest stable release is 0.9.9-release.
Sphinx features
    * high indexing speed (up to 10 MB/sec on modern CPUs);
    * high search speed (avg query is under 0.1 sec on 2-4 GB text collections);
    * high scalability (up to 100 GB of text, up to 100 M documents on a single CPU);
    * ....
English documentation: http://www.sphinxsearch.com/docs/manual-0.9.9.html

I. First, install Sphinx on the server
Installing Sphinx on Windows
    1. Download the package with MySQL support: http://www.sphinxsearch.com/downloads/sphinx-0.9.9-win32.zip
    2. Unzip sphinx-0.9.9-win32.zip to D:\sphinx
    3. Install the Sphinx service by running this command at the command prompt:
       D:\sphinx\bin\searchd --install --config D:\sphinx\sphinx.conf --servicename SphinxSearch
    English reference: http://www.sphinxsearch.com/docs ... #installing-windows

Installing Sphinx on a Linux server
    1. Download the source package: http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
    2. Unpack, build, and install it:
    $ tar xzvf sphinx-0.9.9.tar.gz
    $ cd sphinx-0.9.9
    $ ./configure --prefix=/usr/local/sphinx --with-mysql=/usr/local/mysql
    $ make
    $ make install
sphinx.conf sample
    source main
    {
        type = mysql

        sql_host = 10.228.129.199 # host address
        sql_user = admin # user name
        sql_pass = admin # password
        sql_db = demo # database name
        sql_port = 3306 # port, default is 3306

        sql_query_pre = SET NAMES utf8
        sql_query_pre = REPLACE INTO phpcms_counter SELECT 1, MAX(searchid) FROM phpcms_search
        sql_query = SELECT searchid, type, data FROM phpcms_search \
            WHERE searchid>=$start AND searchid<=$end
        sql_query_range = SELECT 1,max_doc_id FROM phpcms_counter WHERE counter_id=1
        sql_range_step = 5000
        sql_query_info = SELECT * FROM phpcms_search WHERE searchid=$id
    }

    source delta : main
    {
        sql_query_pre = SET NAMES utf8
        sql_query = SELECT searchid, type, data FROM phpcms_search \
            WHERE searchid >( SELECT max_doc_id FROM phpcms_counter WHERE counter_id=1 )
    }

    index main
    {
        source = main
        # directory where the index is stored
        path = D:\sphinx\data\main # main index path
        # encoding
        charset_type = utf-8
        # character table for utf-8
        charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
        # simple tokenization; only 0 and 1 are supported; set to 1 to search Chinese
        ngram_len = 1
        # characters to tokenize; needed to search Chinese
        ngram_chars = U+3000..U+2FA1F
    }

    index delta : main
    {
        source = delta
        path = D:\sphinx\data\delta # delta index path (think of it as a secondary index for now)
    }

    indexer
    {
        mem_limit = 128M # memory used for indexing
    }

    searchd
    {
        port = 9312
        log = D:\sphinx\data\phpcms\searchd.log # service log path
        query_log = D:\sphinx\data\phpcms\query.log # query log path
        read_timeout = 5
        max_children = 30
        pid_file = D:\sphinx\data\phpcms\searchd.pid
        max_matches = 1000
        seamless_rotate = 0
        preopen_indexes = 0
        unlink_old = 1
    }
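To make the ranged-query settings in the sample concrete, here is a small Python sketch of how sql_query_range and sql_range_step split one large fetch into chunks. It only mimics the stepping arithmetic described in the Sphinx manual. The function name range_steps and the ID values are made up for illustration and are not part of Sphinx.

```python
def range_steps(min_id, max_id, step):
    """Yield (start, end) pairs, inclusive on both ends, the way the
    $start/$end macros in sql_query are filled in for each chunk."""
    start = min_id
    while start <= max_id:
        end = min(start + step - 1, max_id)
        yield start, end
        start = end + 1


if __name__ == "__main__":
    # Assumed values: sql_query_range returned (1, 12345), sql_range_step = 5000.
    for start, end in range_steps(1, 12345, 5000):
        print(f"WHERE searchid>={start} AND searchid<={end}")
```

With these assumed values the query would run three times, so no single SELECT has to return the whole table at once.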
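II. Upgrade the phpcms search module
    Download the upgrade package and overwrite the search module directory with it.
    Download: search.zip (16.39 KB)
    Then go to the admin backend and configure full-text search.
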
Create the data table
    CREATE TABLE `phpcms_counter` (
        `counter_id` INT(11) NOT NULL,
        `max_doc_id` INT(11) NOT NULL,
        PRIMARY KEY (`counter_id`)
    ) ENGINE=MYISAM DEFAULT CHARSET=gbk;
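The counter table is what ties the main and delta sources together: the main source records the highest searchid it has seen, and the delta source picks up only rows above that mark. The following Python sketch replays the sample's queries against an in-memory SQLite database to show the split. SQLite stands in for MySQL here, the tables are reduced to the columns the queries use, and the helper names main_ids and delta_ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in tables: only the columns used by the sample queries are modelled.
cur.execute("CREATE TABLE phpcms_search (searchid INTEGER PRIMARY KEY, type TEXT, data TEXT)")
cur.execute("CREATE TABLE phpcms_counter (counter_id INTEGER PRIMARY KEY, max_doc_id INTEGER NOT NULL)")
cur.executemany("INSERT INTO phpcms_search VALUES (?, ?, ?)",
                [(i, "news", f"doc {i}") for i in range(1, 4)])


def main_ids(cur):
    """Replay the main source: update the counter, then fetch the range."""
    cur.execute("REPLACE INTO phpcms_counter SELECT 1, MAX(searchid) FROM phpcms_search")
    start, end = cur.execute(
        "SELECT 1, max_doc_id FROM phpcms_counter WHERE counter_id=1").fetchone()
    rows = cur.execute(
        "SELECT searchid FROM phpcms_search WHERE searchid>=? AND searchid<=? ORDER BY searchid",
        (start, end))
    return [r[0] for r in rows]


def delta_ids(cur):
    """Replay the delta source: only rows added after the last main run."""
    rows = cur.execute(
        "SELECT searchid FROM phpcms_search "
        "WHERE searchid > (SELECT max_doc_id FROM phpcms_counter WHERE counter_id=1) "
        "ORDER BY searchid")
    return [r[0] for r in rows]


main_doc_ids = main_ids(cur)  # main index is built over rows 1..3
cur.executemany("INSERT INTO phpcms_search VALUES (?, ?, ?)",
                [(4, "news", "doc 4"), (5, "news", "doc 5")])
delta_doc_ids = delta_ids(cur)  # rows added since the main build
print("main:", main_doc_ids, "delta:", delta_doc_ids)
```

Note that in this model the counter only moves when the main source's pre-query runs, so the delta set keeps growing until the main source is indexed again.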
III. Set up scheduled tasks to update the index
1. On Windows
Create two scheduled tasks:
# at 4 AM, merge the indexes by running merge.bat
# at all other times, update the delta index every minute by running delta.bat
merge.bat
    @ECHO off
    echo indexing, window will close when complete
    D:\sphinx\bin\indexer.exe --config D:\sphinx\sphinx.conf --merge main delta --rotate
delta.bat
    @ECHO off
    echo indexing, window will close when complete
    D:\sphinx\bin\indexer.exe --config D:\sphinx\sphinx.conf delta --rotate
2. On Linux, edit the scheduled tasks with crontab -e
    # Merge the indexes at 4 AM; update the delta index every minute during hours 0-3 and 6-23
    # (no delta run is scheduled during hours 4 and 5)
    * 0-3 * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf delta --rotate
    * 6-23 * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf delta --rotate
    0 4 * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --merge main delta --rotate
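As a quick way to read the three crontab lines, the Python sketch below maps a time of day to the job that cron would start at that minute. It is only a model of the schedule as written in the post; the function name scheduled_job is hypothetical.

```python
def scheduled_job(hour, minute):
    """Return which indexer job the three crontab lines start at the
    given time of day, or None if nothing is scheduled."""
    if hour == 4 and minute == 0:
        return "merge"  # 0 4 * * *
    if 0 <= hour <= 3 or 6 <= hour <= 23:
        return "delta"  # * 0-3 * * *  and  * 6-23 * * *
    return None  # hours 4 and 5, apart from the 04:00 merge


if __name__ == "__main__":
    delta_runs = sum(scheduled_job(h, m) == "delta"
                     for h in range(24) for m in range(60))
    print("delta runs per day:", delta_runs)
```

The two-hour gap means new rows added between 04:00 and 05:59 are not searchable until the first delta run at 06:00.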
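Note: back up your files before upgrading to avoid accidents.

All paths and permissions must match the server the application runs on.
For example, sphinx.conf needs the following configured:
sql_host
sql_user
sql_pass
sql_db
sql_port
The phpcms table prefix, which is phpcms_ in the sample
The index path, e.g. D:\sphinx\data\delta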


sphinx.conf sample using coreseek Chinese word segmentation
Chinese reference: http://www.coreseek.cn/products-install/
Installation steps:
Follow the installation steps in the Chinese reference above. Installation has succeeded once you complete its part 3, "coreseek Chinese full-text search test".
coreseek.conf sample:
    source main
    {
        type = mysql
        sql_host = 10.228.129.199 # host address
        sql_user = admin # user name
        sql_pass = admin # password
        sql_db = demo # database name
        sql_port = 3306 # port, default is 3306
        sql_query_pre = SET NAMES utf8
        sql_query_pre = REPLACE INTO phpcms_counter SELECT 1, MAX(searchid) FROM phpcms_search
        sql_query = SELECT searchid, type, data FROM phpcms_search \
            WHERE searchid>=$start AND searchid<=$end
        sql_query_range = SELECT 1,max_doc_id FROM phpcms_counter WHERE counter_id=1
        sql_range_step = 5000
        sql_query_info = SELECT * FROM phpcms_search WHERE searchid=$id
    }
    source delta : main
    {
        sql_query_pre = SET NAMES utf8
        sql_query = SELECT searchid, type, data FROM phpcms_search \
            WHERE searchid >( SELECT max_doc_id FROM phpcms_counter WHERE counter_id=1 )
    }
    index main
    {
        source = main
        # directory where the index is stored
        path = D:\sphinx\data\main # main index path
        # Version without word segmentation; for details see http://www.coreseek.cn/products-install/ngram_len_cjk/
        # encoding
        #charset_type = zh_cn.utf-8
        # character table for utf-8
        #charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
        # simple tokenization; only 0 and 1 are supported; set to 1 to search Chinese
        #ngram_len = 1
        # characters to tokenize; to search Chinese, uncomment this line
        #ngram_chars = U+3000..U+2FA1F
        # Version with word segmentation; for details see http://www.coreseek.cn/products-install/ngram_len_cjk/
        charset_dictpath=D:\sphinx\etc
        # encoding
        charset_type = zh_cn.utf-8
        # character table for zh_cn.utf-8
        #charset_table =
        ngram_len = 0
        #ngram_chars =
    }
    index delta : main
    {
        source = delta
        path = D:\sphinx\data\delta # delta index path (think of it as a secondary index for now)
    }
    indexer
    {
        mem_limit = 128M # memory used for indexing
    }
    searchd
    {
        port = 9312
        log = D:\sphinx\data\phpcms\searchd.log # service log path
        query_log = D:\sphinx\data\phpcms\query.log # query log path
        read_timeout = 5
        max_children = 30
        pid_file = D:\sphinx\data\phpcms\searchd.pid
        max_matches = 1000
        seamless_rotate = 0
        preopen_indexes = 0
        unlink_old = 1
    }

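The two samples differ mainly in how Chinese text is split: the plain Sphinx settings (ngram_len = 1 with ngram_chars) index each CJK character as its own keyword, while the coreseek settings (ngram_len = 0 with charset_dictpath) hand the text to a dictionary-based segmenter. The Python sketch below is a rough model of the first, single-character mode only. The function name tokenize is hypothetical, it handles just the ASCII part of charset_table, and it is not how Sphinx or coreseek is implemented.

```python
def tokenize(text, ngram_lo=0x3000, ngram_hi=0x2FA1F):
    """Rough model of ngram_len = 1: every character in the ngram_chars
    range becomes its own token; ASCII letters, digits and '_' are
    grouped into lowercased words; anything else is a separator."""
    tokens, word = [], []

    def flush():
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if ngram_lo <= ord(ch) <= ngram_hi:
            flush()
            tokens.append(ch)  # one token per CJK character
        elif ch.isascii() and (ch.isalnum() or ch == "_"):
            word.append(ch.lower())  # A..Z->a..z folding from charset_table
        else:
            flush()
    flush()
    return tokens


if __name__ == "__main__":
    print(tokenize("phpcms 全文索引 Sphinx"))
```

In this model a query for a two-character word matches on its individual characters, which is why a dictionary-based segmenter can give more precise results for Chinese.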
http://bbs.phpcms.cn/thread-149380-1-1.html