
Koala++'s blog

Relevance Feedback and Query Expansion [1]

2009-12-12 18:11:32 | Category: Search Engines


Excerpted and translated from "Introduction to IR"

In most collections, the same concept may be referred to using different words. This issue, known as synonymy, has an impact on the recall of most information retrieval systems. For example, you would want a search for aircraft to match plane (but only for references to an airplane, not a woodworking plane), and for a search on thermodynamics to match references to heat in appropriate discussions. Users often attempt to address this problem themselves by manually refining a query, as was discussed in Section 1.4; in this chapter we discuss ways in which a system can help with query refinement, either fully automatically or with the user in the loop. The methods for tackling this problem split into two major classes: global methods and local methods. Global methods are techniques for expanding or reformulating query terms independent of the query and results returned from it, so that changes in the query wording will cause the new query to match other semantically similar terms. Global methods include:

 

• Query expansion/reformulation with a thesaurus or WordNet (Section 9.2.2)

• Query expansion via automatic thesaurus generation (Section 9.2.3)

• Techniques like spelling correction (discussed in Chapter 3)
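
As a rough illustration of the first global method, the sketch below expands a query from a tiny hand-built thesaurus. The `THESAURUS` dictionary and the `expand_query` helper are hypothetical names invented for this example; a real system might draw synonyms from WordNet or from an automatically generated thesaurus instead.

```python
# Hypothetical hand-built thesaurus (an assumption for illustration only).
THESAURUS = {
    "aircraft": ["plane", "airplane"],
    "thermodynamics": ["heat"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Return the query terms plus their thesaurus synonyms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        # Keep the original term first, then append any synonyms listed for it.
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("aircraft engine"))
# ['aircraft', 'plane', 'airplane', 'engine']
```

The expansion never looks at retrieved documents, which is what makes it a global method. It also cannot tell an airplane from a woodworking plane, so added synonyms may improve recall while hurting precision.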

 

Local methods adjust a query relative to the documents that initially appear to match the query. The basic methods here are:

 

• Relevance feedback (Section 9.1)

• Pseudo relevance feedback, also known as blind relevance feedback (Section 9.1.6)

• (Global) indirect relevance feedback (Section 9.1.7)
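
To make the idea of a local method concrete, here is a minimal sketch of pseudo relevance feedback: the top-ranked documents from an initial search are simply assumed to be relevant, and their most frequent new terms are added to the query. The scoring function, the helper names, and the parameters `k` and `n_new_terms` are all assumptions made for this example, not something the text prescribes.

```python
from collections import Counter


def score(query_terms, doc):
    """Toy score: how often the query terms occur in the document."""
    counts = Counter(doc.lower().split())
    return sum(counts[t] for t in query_terms)


def pseudo_relevance_feedback(query_terms, docs, k=2, n_new_terms=2):
    """Assume the top-k documents are relevant and add their most frequent new terms."""
    ranked = sorted(docs, key=lambda d: score(query_terms, d), reverse=True)
    pool = Counter()
    for doc in ranked[:k]:
        # Count only terms that are not already part of the query.
        pool.update(t for t in doc.lower().split() if t not in query_terms)
    new_terms = [t for t, _ in pool.most_common(n_new_terms)]
    return list(query_terms) + new_terms


docs = [
    "aircraft jet engine jet engine",
    "aircraft jet wing",
    "woodworking plane tool",
]
print(pseudo_relevance_feedback(["aircraft"], docs))
```

No user judgment is involved, so the result depends entirely on how good the initial ranking is.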

 

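In most collections, the same concept can be expressed with different words. This is called synonymy, and it affects the recall of most IR systems. For example, a search for aircraft should match plane (meaning an airplane, not a woodworking plane), and a search for thermodynamics should match heat where the discussion is appropriate. This chapter discusses how a system can help refine a query, either fully automatically or with the user in the loop. Approaches to the problem fall into two broad classes: global methods and local methods. Global methods expand or reformulate the query terms independently of the query and of the results it returns, so that the reworded query matches other semantically related terms. Global methods include: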

• Query expansion/reformulation with a thesaurus or WordNet

• Query expansion via automatic thesaurus generation

• Techniques like spelling correction

Local methods adjust a query based on the documents that initially match it. The basic methods are:

• Relevance feedback

• Pseudo relevance feedback, also known as blind relevance feedback

• (Global) indirect relevance feedback

 

9.1 Relevance feedback and pseudo relevance feedback

The idea of relevance feedback (RF) is to involve the user in the retrieval process so as to improve the final result set. In particular, the user gives feedback on the relevance of documents in an initial set of results. The basic procedure is:

 

• The user issues a (short, simple) query.

• The system returns an initial set of retrieval results.

• The user marks some returned documents as relevant or nonrelevant.

• The system computes a better representation of the information need based on the user feedback.

• The system displays a revised set of retrieval results.

 

Relevance feedback can go through one or more iterations of this sort. The process exploits the idea that it may be difficult to formulate a good query when you don’t know the collection well, but it is easy to judge particular documents, and so it makes sense to engage in iterative query refinement of this sort. In such a scenario, relevance feedback can also be effective in tracking a user’s evolving information need: seeing some documents may lead users to refine their understanding of the information they are seeking.
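
The step where the system "computes a better representation of the information need" is not spelled out in this excerpt. One classic way to do it is the Rocchio update, sketched below as an illustration rather than as the method the text has in mind: the query vector is moved toward the centroid of the documents marked relevant and away from the centroid of those marked nonrelevant. Raw term counts stand in for proper term weights, and the weights `alpha`, `beta`, and `gamma` are assumed values chosen for the example.

```python
from collections import Counter


def to_vector(text):
    """Represent text as raw term counts (a stand-in for tf-idf weights)."""
    return dict(Counter(text.lower().split()))


def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a revised query vector built from the user's relevance judgments."""
    terms = set(query_vec)
    for doc in relevant + nonrelevant:
        terms |= set(doc)

    new_query = {}
    for term in terms:
        weight = alpha * query_vec.get(term, 0)
        if relevant:
            weight += beta * sum(d.get(term, 0) for d in relevant) / len(relevant)
        if nonrelevant:
            weight -= gamma * sum(d.get(term, 0) for d in nonrelevant) / len(nonrelevant)
        # Terms whose weight is not positive are dropped from the revised query.
        if weight > 0:
            new_query[term] = weight
    return new_query


query = to_vector("plane")
relevant = [to_vector("airplane plane flight")]       # marked relevant by the user
nonrelevant = [to_vector("woodworking plane tool")]   # marked nonrelevant by the user
print(rocchio(query, relevant, nonrelevant))
```

Feeding the revised vector back into the search and collecting new judgments gives one iteration of the loop described above.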

 

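The idea of relevance feedback (RF) is to have the user take part in the retrieval process in order to improve the final result set. Specifically, the user gives feedback on the relevance of documents in an initial result set. The basic steps are: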

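• The user issues a (short, simple) query.

• The system returns an initial set of retrieval results.

• The user marks some of the returned documents as relevant or nonrelevant.

• The system computes a better representation of the information need based on the user's feedback.

• The system displays a revised set of retrieval results.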

 

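Relevance feedback can go through one or more such iterations. The process rests on the observation that formulating a good query is hard when you do not know the collection well, whereas judging whether a particular document is what you need is easy, so iterative query refinement of this kind is reasonable. In this setting, relevance feedback can also track a user's evolving information need: seeing some documents may lead users to refine their understanding of the information they are looking for.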

[Figure: relevance feedback example, panels (a) and (b)]

(a) shows the initial search results; the user marked several results as relevant, outlined in green. (b) shows the revised result set.
 