
Koala++'s blog

Computational Advertising, RTB

 
 
 


 
 

Advantages and Limitations of Query Logs

2010-01-01 17:14:09 | Category: Search Engines

Excerpted from Section 3.3 of <Query logs alone are not enough>.

        

The main advantage of query logs is that they have a diversity of tasks, queries, and user experiences that is difficult, if not impossible, to duplicate in any other data source. However, this diversity is completely unlabeled: the researcher may know that the user did a query, but the researcher does not know what the user meant by it, nor does the researcher know whether the user was happy with the result.

Another advantage of logs is that they measure users in the wild, so assuming that a change is ready for live traffic, it is often possible to test experimental conditions to observe the impact on users (assuming proper experimental design and infrastructure).
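As a rough illustration of what "testing experimental conditions" on logged traffic can look like, here is a minimal sketch that compares click-through rates between a control and an experiment bucket with a two-proportion z-test. The excerpt does not describe any particular statistical method, so the test itself, the function name `two_proportion_z`, and all the counts are assumptions made up for this example.

```python
import math

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Return the z-statistic for the difference in click rate (B minus A)."""
    rate_a = clicks_a / n_a
    rate_b = clicks_b / n_b
    # Pooled rate under the null hypothesis that both buckets behave the same.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (rate_b - rate_a) / std_err

# Hypothetical counts: 1000 logged queries per bucket.
z = two_proportion_z(clicks_a=100, n_a=1000, clicks_b=130, n_b=1000)
print(f"z = {z:.2f}")
```

A large absolute z only says the two buckets differ in the logs. It still says nothing about why users behaved differently, which is the limitation the rest of the excerpt keeps returning to.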

While logs are easy to collect in bulk and contain a wealth of observational data, there are many limitations and biases to consider.

First and foremost, logs can only measure the how and the what, rather than the why. For example, if we have a sequence of queries, we only know the sequence of queries, but we have no evidence of why the user is typing in that particular sequence. We might be able to infer the user goal, but it is really only a guess -- we are inherently making up a story. The query sequence does not provide complete information about the goal of a user. While work has been done to connect loggable behavior with task classification, the success rate for such models, even for very general task types, is quite low. Lee et al. [9] showed that it was possible to classify many logged tasks into two general categories, Informational and Navigational, using both user behavior and data about the web pages viewed. However, even for their sample of the 50 most popular queries, the classification could only be done for 20 that had sufficient data, leaving the majority of tasks unclassified.

Second, logs are completely unlabeled except for the presence or absence of an event. For example, it is possible to determine whether an ad was clicked, but not necessarily why the ad was clicked or whether the user found the ad useful. Fox et al. [4] inferred session-level satisfaction based on the user behaviors during a search, and were able to do significantly better than baseline: up to 74% accuracy versus a baseline of ~56%. However, this model used explicit judgments of the quality of individual results provided by the user, and without those, the accuracy dropped to 60%. Even in their reduced model, one of the major factors – session exit status – would not normally be observed by a search engine without some browser-side instrumentation.

Next, logs can only measure the system being logged: Nielsen's study found that only 20% of a user's time is spent searching, which means that logs from a search engine are missing at least 80% of a user's online activity, more if a user uses multiple search engines [11]. Logs can only capture the specific system: if users use Google for one type of search and Yahoo for another, then the logs from Yahoo will only contain the second type. Even within the framework of a single search engine, logs record only measurable properties: depending on the system set-up, some things may not be loggable [16]. Similarly, logs cannot capture any social interaction, such as people talking to one another; this consideration is important when testing a user-visible change, for example: the full effect may not be felt until a critical mass has been reached.

Fourth, query logs are noisy, since they include everything, including robots, spam, data outages, recording errors, etc. Using intelligent filters can reduce that noise, but filters are often based on a set of heuristics, with varying degrees of accuracy and stringency (it is often a question of over-filtering vs. under-filtering).
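To make the over-filtering vs. under-filtering trade-off concrete, here is a minimal sketch of one possible heuristic: drop every cookie that issued more than some number of queries. The excerpt does not say which heuristics real filters use, so the function `filter_robot_traffic`, the `max_queries_per_cookie` threshold, and the toy log are all hypothetical.

```python
from collections import Counter

def filter_robot_traffic(records, max_queries_per_cookie):
    """Keep only records from cookies at or below the query-count threshold."""
    counts = Counter(cookie for cookie, _ in records)
    return [
        (cookie, query)
        for cookie, query in records
        if counts[cookie] <= max_queries_per_cookie
    ]

# Hypothetical toy log of (cookie_id, query) pairs.
toy_log = (
    [("bot-1", f"scrape {i}") for i in range(50)]        # automated traffic
    + [("power-user", f"recipe {i}") for i in range(8)]  # heavy but human
    + [("casual", "weather"), ("casual", "news")]        # light human use
)

strict = filter_robot_traffic(toy_log, max_queries_per_cookie=5)
moderate = filter_robot_traffic(toy_log, max_queries_per_cookie=10)
lenient = filter_robot_traffic(toy_log, max_queries_per_cookie=100)

print(len(strict), len(moderate), len(lenient))
```

The strict threshold throws away the heavy human user along with the robot (over-filtering), the lenient one lets the robot through (under-filtering), and only the middle setting happens to separate them in this toy data. On real logs there is no label telling you which threshold was right.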

Finally, logs don’t necessarily allow long-term studies of a single user. The goals of these studies are often to determine whether a particular change leads to a poorer user experience and users searching less or coming to the site less often, or whether users learn to trigger a particular feature. Even in these studies, it is still the aggregate behavior that researchers are often interested in, rather than what individual users do (e.g., how many users learned to trigger a particular feature). However, due to concerns regarding respecting users' privacy, sites will usually log a cookie, with no personally identifying information, as a proxy for a user. Cookies, however, are a very imprecise approximation of users: people can clear their cookies, software that people don't even know they have installed can automatically clear cookies, one person may use multiple machines (e.g., one at work, one at home), multiple people may use the same machine (e.g., a shared computer at home), and people may share cookies to try out different experimental conditions [23]. One study from Jupiter cited a weekly cookie churn rate of 30% [21], and another survey found that 40% of users voluntarily clean their cookies weekly [22]. While both numbers are from surveys and have a self-reporting bias, evidence suggests that cookie churn is significant enough to impede these aggregate long-term studies.
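A quick back-of-the-envelope sketch shows why a 30% weekly churn rate is so damaging for long-term studies. It assumes, purely for illustration, that every cookie independently has the same chance of disappearing each week; the excerpt makes no such claim, and real churn is likely concentrated in a subset of users. The function name `surviving_fraction` is made up for this example.

```python
def surviving_fraction(weekly_churn, weeks):
    """Fraction of the original cookies still present after the given weeks,
    assuming a constant, independent weekly churn rate (a simplification)."""
    return (1.0 - weekly_churn) ** weeks

# 0.30 is the weekly churn figure cited in the excerpt.
for weeks in (1, 4, 12, 26):
    fraction = surviving_fraction(0.30, weeks)
    print(f"after {weeks:2d} weeks: {fraction:.4f} of cookies remain")
```

Under this assumption fewer than a quarter of the cookies survive a single month, so any cohort tracked by cookie shrinks very quickly.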

Overall, logs are useful for analyses where a large amount of data is needed, and for testing the impact of changes, especially ones that only impact a small proportion of queries. However, it must be emphasized that logs can only ever reveal correlations and not causality, however tempting inferring causality may be.

 

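Koala++'s note: In this piece, I feel the Google researchers look quite deep and quite far. Some of the limitations they mention I used to consider either entirely self-evident or not limitations at all, so it seems I still have a long way to go.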
