注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

Koala++'s blog

计算广告学 RTB

 
 
 

日志

 
 

Lucene 2.4.1 源代码分析 DefaultSkipListReader   

2009-09-24 09:34:24|  分类: Lucene |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |

       DefaultSkipListWriter相似,DefaultSkipListReader继承自MultiLevelSkipListReaderMultiLevelSkipListReaer比起MultiLevelSkipListWriter要难一些,主要也是因为MultiLevelSkipListWriter太过简单。这两者不完全就是一个简单的相反过程,看代码量也可以看出来,先看一下它的成员变量。

// the maximum number of skip levels possible for this index

// 对于这个index最大的skip level

private int maxNumberOfSkipLevels;

 

// number of levels in this skip list

// 在这个skip list的级数

private int numberOfSkipLevels;

 

// Expert: defines the number of top skip levels to buffer in memory.

// Reducing this number results in less memory usage, but possibly

// slower performance due to more random I/Os.

// Please notice that the space each level occupies is limited by

// the skipInterval. The top level can not contain more than

// skipLevel entries, the second top level can not contain more

// than skipLevel^2 entries and so forth.

// Expert: 定义缓存到内存的前面的skip数量

// 变小这个值可以节省内存,但是可能可能性能会变差,因为会有更多random I/Os

// 请注意,对于每一个的空间是受限于skipIntervaltop level不能包含skipLevel

// 以上的entriessecond top level不能包含skipLevel^2的入口

// 以此类推

private int numberOfLevelsToBuffer = 1;

 

private int docCount;

 

private boolean haveSkipped;

 

// skipStream for each level

// 对于每一级的skipStream

private IndexInput[] skipStream;

 

// the start pointer of each skip level

// 每一级的开始指针

private long skipPointer[];

 

// skipInterval of each level

// 每一级的skipInterval

private int skipInterval[];

 

// number of docs skipped per level

// 对于每级跳过的doc数量

private int[] numSkipped;

 

// doc id of current skip entry per level

// 每级当前skip entrydoc id

private int[] skipDoc;

 

// doc id of last read skip entry with docId <= target

// 上次读取的skip entry中的doc id,有docId <= target

private int lastDoc;

 

// child pointer of current skip entry per level

// 每级当前skip entry的子指针

private long[] childPointer;

 

// childPointer of last read skip entry with docId <= target

// 上次读的skip entry的子指针,有docId <= target

private long lastChildPointer;

 

private boolean inputIsBuffered;

       看一下它的构造函数:

public MultiLevelSkipListReader(IndexInput skipStream, int maxSkipLevels, int skipInterval) {

    this.skipStream = new IndexInput[maxSkipLevels];

    this.skipPointer = new long[maxSkipLevels];

    this.childPointer = new long[maxSkipLevels];

    this.numSkipped = new int[maxSkipLevels];

    this.maxNumberOfSkipLevels = maxSkipLevels;

    this.skipInterval = new int[maxSkipLevels];

    this.skipStream[0] = skipStream;

    this.inputIsBuffered = (skipStream instanceof BufferedIndexInput);

    this.skipInterval[0] = skipInterval;

    for (int i = 1; i < maxSkipLevels; i++) {

       // cache skip intervals

       this.skipInterval[i] = this.skipInterval[i - 1] * skipInterval;

    }

    skipDoc = new int[maxSkipLevels];

}

       这里要注意的有:skipStream是直接指向skipStream,因为它是第一级,inputIsBuffered是判断skipStream是不是一个BufferedIndexInput,而skipInterval的每级是skipInterval^n

/** Loads the skip levels  */

private void loadSkipLevels() throws IOException {

    numberOfSkipLevels = docCount == 0 ? 0 : (int) Math.floor(Math

           .log(docCount) / Math.log(skipInterval[0]));

    if (numberOfSkipLevels > maxNumberOfSkipLevels) {

       numberOfSkipLevels = maxNumberOfSkipLevels;

    }

 

    skipStream[0].seek(skipPointer[0]);

 

    int toBuffer = numberOfLevelsToBuffer;

 

    for (int i = numberOfSkipLevels - 1; i > 0; i--) {

       // the length of the current level

       long length = skipStream[0].readVLong();

 

       // the start pointer of the current level

       skipPointer[i] = skipStream[0].getFilePointer();

       if (toBuffer > 0) {

           // buffer this level

           skipStream[i] = new SkipBuffer(skipStream[0], (int) length);

           toBuffer--;

       } else {

           // clone this stream, it is already at the start of

// the current level

           skipStream[i] = (IndexInput) skipStream[0].clone();

           if (inputIsBuffered && length <

              BufferedIndexInput.BUFFER_SIZE) {

              ((BufferedIndexInput) skipStream[i])

                     .setBufferSize((int) length);

           }

 

           // move base stream beyond the current level

           skipStream[0].seek(skipStream[0].getFilePointer()

+ length);

       }

    }

 

    // use base stream for the lowest level

    skipPointer[0] = skipStream[0].getFilePointer();

}

       loadSkipLevels开始先计算numOfSkipLevels公式还是以skipInterval为底的docCount的对数,如果大于maxNumberOfSkipLevels,就赋以maxNumberOfSkipLevelsskipStream[0]指向skipPoint[0],其实应该是什么也没做,它本来就在那,toBuffer为缓存的level数。这里从numberOfSkipLevels-1开始读是因为level=0要特殊处理,或者说写的时候就是这样写的。先读入长度,然后读入当前级别的开始指针。再下面如果还可以缓存这个level那么就将这个级别写入skipStream[i]。如果不能缓存,就克隆这个stream,并且如果输入是缓存那么就设置空间大小,在后面将 skipStream指向下一个level的位置。而skipPointer[0]就是在最后的位置,因为它没有写入长度。

private boolean loadNextSkip(int level) throws IOException {

    // we have to skip, the target document is greater than the current

    // skip list entry       

    setLastSkipData(level);

 

    numSkipped[level] += skipInterval[level];

 

    if (numSkipped[level] > docCount) {

       // this skip list is exhausted

       skipDoc[level] = Integer.MAX_VALUE;

       if (numberOfSkipLevels > level)

           numberOfSkipLevels = level;

       return false;

    }

 

    // read next skip entry

    skipDoc[level] += readSkipData(level, skipStream[level]);

 

    if (level != 0) {

       // read the child pointer if we are not on the leaf level

       childPointer[level] = skipStream[level].readVLong()

              + skipPointer[level - 1];

    }

 

    return true;

}

       设置上次Skip数据,这个没什么:

protected void setLastSkipData(int level) {

    super.setLastSkipData(level);

    lastFreqPointer = freqPointer[level];

    lastProxPointer = proxPointer[level];

    lastPayloadLength = payloadLength[level];

}

protected void setLastSkipData(int level) {

    lastDoc = skipDoc[level];

    lastChildPointer = childPointer[level];

}

       DefaultSkipListReadersetLastSkipData一起列出来。把这一级的数据记录一下。numSkipped[level]level这一级跳过了多少个doc。如果numSkipped[level]>docCount表示跳到了尽头不用再跳了,将skipDoc[level]设为Integer.MAX_VALUE表示不可能了,最后修改numberOfSkipLevelslevel。如果没有发生那么就差skipDoc中读入这级的skip docs。如果不是最后一级,那么childPointer将它下一级的指针位置读到。

/** Seeks the skip entry on the given level */

protected void seekChild(int level) throws IOException {

    skipStream[level].seek(lastChildPointer);

    numSkipped[level] = numSkipped[level + 1] - skipInterval[level + 1];

    skipDoc[level] = lastDoc;

    if (level > 0) {

       childPointer[level] = skipStream[level].readVLong()

              + skipPointer[level - 1];

    }

}

       skipStream指向上次记录的child PointernumSkipped[level]值是numSkipped[level+1]也就是它的上一级减去skipInteral[level+1]表示它的间隔,这个是因为上一次在loadNextSkip中多加了一次skipInterval[level](也就是这里的skipInterval[level+1]),numSkipped[level]是把上一级所跳过的直接得到了,这个值会一直传到numSkipped[0],也就是为什么在skipTo中最后会返回numSkipped[0]-skipInterval[0]-1。而childPointer[level]的值也要更新,就是它下一级的指针位置加上这一级的长度。

/** Skips entries to the first beyond the current whose document number

 * is greater than or equal to <i>target</i>. Returns the current doc count.

 */

int skipTo(int target) throws IOException {

    if (!haveSkipped) {

       // first time, load skip levels

       loadSkipLevels();

       haveSkipped = true;

    }

 

    // walk up the levels until highest level is found that has a skip

    // for this target

    int level = 0;

    while (level < numberOfSkipLevels - 1 && target > skipDoc[level + 1])

 {

       level++;

    }

 

    while (level >= 0) {

       if (target > skipDoc[level]) {

           if (!loadNextSkip(level)) {

              continue;

           }

       } else {

           // no more skips on this level, go down one level

           if (level > 0

                  && lastChildPointer > skipStream[level - 1]

                         .getFilePointer()) {

              seekChild(level - 1);

           }

           level--;

       }

    }

 

    return numSkipped[0] - skipInterval[0] - 1;

}

       如果!haveSkipped那么就读入Skip levels。现在确定target是在哪个level上,while里看起来有点奇怪,前面已经得到了它的level了,为什么还要读下一级呢?因为level还要受限于maxNumberOfSkipLevels的限制,所以不一定得到的是最终的level。这种情况下就要一直读到那需要的地方,最后它的skipDoc[level]会设置成Integer.MAX_VALUE,并返回false。它来完成全部的读取是每次读取最大level看还可不可以再以这个levelinterval读,比如skipInterval5,一共有4级,525125625,那么就先每次加625看会不会超过,如果超过了,说明要用125去读,则用seekChild降到125这个级别,以此类推。而else则会到下一级中去找到相应的入口,再交由loadNextSkip读,返回的值是一共跳过了多少,numSkipped[0]其实是一个累加值,而skipInterval[0]是它最后一次多加的一次。

protected int readSkipData(int level, IndexInput skipStream)

       throws IOException {

    int delta;

    if (currentFieldStoresPayloads) {

       // the current field stores payloads.

       // if the doc delta is odd then we have

       // to read the current payload length

       // because it differs from the length of the

       // previous payload

       delta = skipStream.readVInt();

       if ((delta & 1) != 0) {

           payloadLength[level] = skipStream.readVInt();

       }

       delta >>>= 1;

    } else {

       delta = skipStream.readVInt();

    }

    freqPointer[level] += skipStream.readVInt();

    proxPointer[level] += skipStream.readVInt();

 

    return delta;

}

       这个相对好理解一些,如果保存payloads那么读出它的delta,如果是最右一位是1,则说明与上次的payload长度不同,如果为0,当然是相同了,右移一位把它与上次的差值得到,然后读出freqPointer[level]proxPointer的差值。

       它在SegmentTermDocs中的skipTo中被下面的代码调用:

if (df >= skipInterval) { // optimized case

    if (skipListReader == null)

       skipListReader = new DefaultSkipListReader(

              (IndexInput) freqStream.clone(), maxSkipLevels,

              skipInterval); // lazily clone

 

    if (!haveSkipped) { // lazily initialize skip stream

       skipListReader.init(skipPointer, freqBasePointer,

              proxBasePointer, df, currentFieldStoresPayloads);

       haveSkipped = true;

    }

 

    int newCount = skipListReader.skipTo(target);

    if (newCount > count) {

       freqStream.seek(skipListReader.getFreqPointer());

       skipProx(skipListReader.getProxPointer(), skipListReader

              .getPayloadLength());

 

       doc = skipListReader.getDoc();

       count = newCount;

    }

}

       构造函数和init函数都没什么,一大堆初始化,而看这个函数中,skipTo是返回跳了多少,这里会与count相比,count是预读了多少,如果大于它,就要重新移动freqStreamProxStream的位置。

 

 

 

 

 

 

  评论这张
 
阅读(1348)| 评论(0)
推荐 转载

历史上的今天

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2017