Reading Notes on Natural Language Processing with Python: Chapter 7

2010Freeze · Posted on 2012-12-13 21:25:22


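<div id="cnblogs_post_body">Chapter 7: Extracting Information from Text
Three opening questions:
1. How can we build a system that extracts structured data from unstructured text?
2. What are some robust methods for identifying the entities and relationships described in a text?
3. Which corpora are appropriate for this work, and how do we use them to train and evaluate our models?
Overall pipeline:
sentence segmentation -> tokenization -> part-of-speech tagging -> chunking -> entity recognition -> information extraction -> querying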
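To make the last stage of this pipeline concrete: once relations have been extracted as (entity, relation, entity) tuples, querying them is plain Python. The sketch below is only an illustration. The tuples are made-up sample data rather than output from a real extractor, and `entities_in` is a hypothetical helper name.

```python
# Illustrative (entity, relation, entity) tuples; made up for this sketch,
# not the output of a real extractor.
relations = [
    ("Omnicom", "IN", "New York"),
    ("DDB Needham", "IN", "New York"),
    ("BBDO South", "IN", "Atlanta"),
    ("Georgia-Pacific", "IN", "Atlanta"),
]

def entities_in(relations, place):
    """Return every first entity whose 'IN' relation points at `place`."""
    return [e1 for (e1, rel, e2) in relations if rel == "IN" and e2 == place]

print(entities_in(relations, "Atlanta"))  # ['BBDO South', 'Georgia-Pacific']
```

Everything before this stage exists to turn free text into tuples of this shape.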
Start with sentence segmentation -> tokenization -> part-of-speech tagging:
<div class="cnblogs_code">
import nltk
import re
import pprint

def ie_preprocess(document):
    # Split the raw text into sentences
    sentences = nltk.sent_tokenize(document)
    # Split each sentence into word tokens
    sentences = [nltk.word_tokenize(sent) for sent in sentences]
    # Tag each token with its part of speech
    sentences = [nltk.pos_tag(sent) for sent in sentences]
    return sentences

if __name__ == '__main__':
    mystr = 'My name is freeze, i like coding. I am in hangzhou now. I am from USTC.'
    # print() with one argument works on both Python 2 and Python 3
    print(ie_preprocess(mystr))
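The next stage in the pipeline is chunking. A minimal sketch, assuming NLTK 3 (where tree nodes expose `label()`): it uses a hand-tagged sentence so that no tagger model is needed, and a single noun-phrase rule. The grammar and the sample sentence are illustrative choices, not something taken from these notes.

```python
import nltk

# Hand-tagged sentence, so this step needs no tokenizer or tagger data.
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# NP rule: an optional determiner, any number of adjectives, then a noun.
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(sentence)

# Collect the words of every NP subtree (the root S node is filtered out).
noun_phrases = [" ".join(word for word, tag in subtree.leaves())
                for subtree in tree.subtrees()
                if subtree.label() == "NP"]
print(noun_phrases)  # ['the little yellow dog', 'the cat']
```

To chunk real text, each tagged sentence returned by `ie_preprocess` could be passed to `chunker.parse` in the same way.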
