
Koala++'s blog

Computational Advertising, RTB





2010-01-01 17:14:09 | Category: Search Engines

         Selected translation from Section 3.3 of "Query logs alone are not enough".


The main advantage of query logs is that they have a diversity of tasks, queries, and user experiences that is difficult, if not impossible, to duplicate in any other data source. However, this diversity is completely unlabeled: the researcher may know that the user did a query, but the researcher does not know what the user meant by it, nor does the researcher know whether the user was happy with the result.


         Another advantage of logs is that they measure users in the wild, so assuming that a change is ready for live traffic, it is often possible to test experimental conditions to observe the impact on users (assuming proper experimental design and infrastructure).

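         Translation: Another benefit of logs is that they measure users in a real environment (in the wild), so once a change is ready to go live, it is often possible to test experimental conditions and observe their effect on users (given suitable experimental design and infrastructure).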
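One common piece of the "infrastructure" mentioned above is a stable way to split live traffic into experimental conditions. The excerpt does not describe how this is done; the following is only a minimal sketch, assuming users are identified by a cookie string and that hashing the cookie together with an experiment name is an acceptable way to assign buckets. The function name, the experiment name and the 5% treatment share are all hypothetical.

```python
import hashlib


def assign_bucket(cookie_id: str, experiment: str, treatment_share: float = 0.05) -> str:
    """Deterministically map a cookie to 'treatment' or 'control'.

    Hypothetical helper: the same cookie always lands in the same bucket
    for a given experiment, so its logged behavior can be compared over time.
    """
    digest = hashlib.sha256(f"{experiment}:{cookie_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if fraction < treatment_share else "control"


# The assignment is stable across calls for the same cookie and experiment.
print(assign_bucket("cookie-123", "new-ranking"))
```

Because the assignment depends only on the cookie and the experiment name, no per-user state has to be stored; the trade-off is that a user who clears cookies is treated as a new user.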

         While logs are easy to collect in bulk and contain a wealth of observational data, there are many limitations and biases to consider.


First and foremost, logs can only measure the how and the what, rather than the why. For example, if we have a sequence of queries, we only know the sequence of queries, but we have no evidence of why the user is typing in that particular sequence. We might be able to infer the user goal, but it is really only a guess -- we are inherently making up a story. The query sequence does not provide complete information about the goal of a user. While work has been done to connect loggable behavior with task classification, the success rate for such models, even for very general task types, is quite low. Lee et al. [9] showed that it was possible to classify many logged tasks into two general categories, Informational and Navigational, using both user behavior and data about the web pages viewed. However, even for their sample of the 50 most popular queries, the classification could only be done for 20 that had sufficient data, leaving the majority of tasks unclassified.

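         Translation: First, and most important, logs can measure only the "how" and the "what", not the "why". For example, given a sequence of queries, we know only the order of the queries; we have no evidence of why the user typed that particular sequence. We may be able to infer the user's goal, but that is only a guess: in essence we are making up a story. A query sequence does not give complete information about a user's goal. Work has been done to link loggable behavior to task classification, but the success rate of such models is very low, even for very general task types. Lee et al. showed that many logged tasks can be classified into two general categories, informational and navigational, using user behavior and the pages viewed. Yet even in their sample of the 50 most common queries, only the 20 with sufficient data could be classified, and most tasks remained unclassified.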

Second, logs are completely unlabeled except for the presence or absence of an event. For example, it is possible to determine whether an ad was clicked, but not necessarily why the ad was clicked or whether the user found the ad useful. Fox et al. [4] inferred session-level satisfaction based on the user behaviors during a search, and were able to do significantly better than baseline: up to 74% accuracy versus a baseline of ~56%. However, this model used explicit judgments of the quality of individual results provided by the user, and without those, the accuracy dropped to 60%. Even in their reduced model, one of the major factors, session exit status, would not normally be observed by a search engine without some browser-side instrumentation.

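         Translation: Second, apart from whether an event occurred, logs are completely unlabeled. For example, one can determine whether an ad was clicked, but not why it was clicked or whether the user found it useful. Fox et al. inferred session-level satisfaction from user behavior during search and did much better than the baseline: up to 74% accuracy against a baseline of about 56%. However, that model used explicit user-provided judgments of the quality of each result; without them, accuracy fell to 60%. Even in the reduced model, one important factor, session exit status, generally cannot be observed without browser-side instrumentation.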

Next, logs can only measure the system being logged: Nielsen's study found that only 20% of a user's time is spent searching, which means that logs from a search engine are missing at least 80% of a user's online activity, more if a user uses multiple search engines [11]. Logs can only capture the specific system: if users use Google for one type of search and Yahoo for another, then the logs from Yahoo will only contain the second type. Even within the framework of a single search engine, logs record only measurable properties: depending on the system set-up, some things may not be loggable [16]. Similarly, logs cannot capture any social interaction, such as people talking to one another; this consideration is important when testing a user-visible change, for example: the full effect may not be felt until a critical mass has been reached.


Fourth, query logs are noisy, since they include everything, including robots, spam, data outages, recording errors, etc. Using intelligent filters can reduce that noise, but filters are often based on a set of heuristics, with varying degrees of accuracy and stringency (it is often a question of over-filtering vs. under-filtering).
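To make the over-filtering vs. under-filtering trade-off concrete, here is a minimal sketch of one possible heuristic: dropping every cookie that issues more than a fixed number of queries. This is an illustration only, not a filter described in the paper; the record layout (cookie, query), the thresholds and the log data are all made up.

```python
from collections import Counter


def filter_by_query_volume(records, max_queries_per_cookie):
    """Keep only records whose cookie issued at most the given number of queries.

    Hypothetical heuristic: very high query volume is treated as a sign of a robot.
    """
    counts = Counter(cookie for cookie, _query in records)
    return [(cookie, query) for cookie, query in records
            if counts[cookie] <= max_queries_per_cookie]


# Made-up log: one robot, one heavy human user, one casual user.
records = (
    [("robot", f"scrape {i}") for i in range(500)]
    + [("heavy_user", f"query {i}") for i in range(40)]
    + [("casual_user", f"query {i}") for i in range(3)]
)

# A strict threshold removes the robot but also the heavy human user (over-filtering).
strict = filter_by_query_volume(records, max_queries_per_cookie=20)
# A lax threshold keeps every human but also lets the robot through (under-filtering).
lax = filter_by_query_volume(records, max_queries_per_cookie=1000)

print(len(strict), len(lax))  # prints "3 543"
```

In this toy data a threshold of 100 happens to separate the robot from both humans, but in a real log no single cut-off is guaranteed to do so, which is why such filters vary in accuracy and stringency.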


Finally, logs don’t necessarily allow long-term studies of a single user. The goals of these studies are often to determine whether a particular change leads to a poorer user experience and users searching less or coming to the site less often, or whether users learn to trigger a particular feature. Even in these studies, it is still the aggregate behavior that researchers are often interested in, rather than what individual users do (e.g., how many users learned to trigger a particular feature). However, due to concerns regarding respecting users' privacy, sites will usually log a cookie, with no personally identifying information, as a proxy for a user. Cookies, however, are a very imprecise approximation of users: people can clear their cookies, software that people don't even know they have installed can automatically clear cookies, one person may use multiple machines (e.g., one at work, one at home), multiple people may use the same machine (e.g., a shared computer at home), and people may share cookies to try out different experimental conditions [23]. One study from Jupiter cited a weekly cookie churn rate of 30% [21], and another survey found that 40% of users voluntarily clean their cookies weekly [22]. While both numbers are from surveys and have a self-reporting bias, evidence suggests that cookie churn is significant enough to impede these aggregate long-term studies.
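A rough back-of-the-envelope calculation shows why churn at this level hurts long-term studies. The sketch below assumes, purely for illustration, that the 30% weekly churn figure applies uniformly and independently every week; the cited surveys do not establish that, and the function name is hypothetical.

```python
def surviving_fraction(weekly_churn: float, weeks: int) -> float:
    """Fraction of the original cookies still present after the given number of weeks.

    Assumes a constant, independent weekly churn rate (a simplification).
    """
    return (1.0 - weekly_churn) ** weeks


# With 30% weekly churn, about 24% of cookies remain after four weeks (0.7 ** 4 = 0.2401).
for weeks in (1, 4, 12):
    print(weeks, round(surviving_fraction(0.30, weeks), 4))
```

Under that assumption, fewer than 2% of the original cookies would still be around after twelve weeks, so a cohort tracked by cookie shrinks quickly.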


Overall, logs are useful for analyses where a large amount of data is needed, and for testing the impact of changes, especially ones that only impact a small proportion of queries. However, it must be emphasized that logs can only ever reveal correlations and not causality, however tempting inferring causality may be.



