
Koala++'s blog

Computational Advertising, RTB


Lucene Source Code Analysis [7]

2009-07-03 12:43:21 | Category: Lucene


// sort postingTable into an array
Posting[] postings = sortPostingTable();

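    Picking up the code we did not finish last time: the line above sorts postingTable and returns the result as an array. This part is cleverly designed too. While the document is being inverted, the postings live in a Hashtable so that each term can be found quickly. Only at the end are they copied into an array, sorted once, and returned.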

private final Posting[] sortPostingTable() {
    // copy postingTable into an array
    Posting[] array = new Posting[postingTable.size()];
    Enumeration postings = postingTable.elements();
    for (int i = 0; postings.hasMoreElements(); i++)
        array[i] = (Posting) postings.nextElement();

    // sort the array
    quickSort(array, 0, array.length - 1);

    return array;
}
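The pattern is easy to try in isolation: look terms up in a hash table while inverting, then copy the values out and sort exactly once. Below is a minimal sketch, not Lucene's code. `SimplePosting`, `invert` and the Lomuto-partition `quickSort` are hypothetical stand-ins for Lucene's `Posting`, the inversion step and `DocumentWriter.quickSort`. Postings are ordered by a plain term string rather than by a `Term` object.

```java
import java.util.Enumeration;
import java.util.Hashtable;

public class SortPostingTableSketch {

    // Hypothetical stand-in for Lucene's Posting: a term plus its frequency.
    static final class SimplePosting {
        final String term;
        int freq;
        SimplePosting(String term) { this.term = term; this.freq = 1; }
    }

    // Inversion step: a Hashtable gives fast lookup of an existing posting per token.
    static Hashtable<String, SimplePosting> invert(String[] tokens) {
        Hashtable<String, SimplePosting> table = new Hashtable<>();
        for (String t : tokens) {
            SimplePosting p = table.get(t);
            if (p == null) table.put(t, new SimplePosting(t));
            else p.freq++;
        }
        return table;
    }

    // Copy the table's values into an array, then sort once by term.
    static SimplePosting[] sortPostingTable(Hashtable<String, SimplePosting> table) {
        SimplePosting[] array = new SimplePosting[table.size()];
        Enumeration<SimplePosting> e = table.elements();
        for (int i = 0; e.hasMoreElements(); i++)
            array[i] = e.nextElement();
        quickSort(array, 0, array.length - 1);
        return array;
    }

    // Plain recursive quicksort (Lomuto partition, last element as pivot).
    static void quickSort(SimplePosting[] a, int lo, int hi) {
        if (lo >= hi) return;
        String pivot = a[hi].term;
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j].term.compareTo(pivot) < 0) swap(a, i++, j);
        }
        swap(a, i, hi); // pivot lands in its final position i
        quickSort(a, lo, i - 1);
        quickSort(a, i + 1, hi);
    }

    static void swap(SimplePosting[] a, int i, int j) {
        SimplePosting tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }

    public static void main(String[] args) {
        String[] tokens = {"lucene", "index", "lucene", "term", "index", "lucene"};
        for (SimplePosting p : sortPostingTable(invert(tokens)))
            System.out.println(p.term + " " + p.freq);
        // prints: index 2, lucene 3, term 1 (one per line)
    }
}
```

Sorting once at the end costs O(n log n) over the distinct terms only. Keeping a sorted structure during inversion would pay for ordering on every insertion.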

    sortPostingTable calls the static function quickSort, which sorts the postings by their terms. The compareTo method that quickSort relies on (Term.compareTo) is:

public final int compareTo(Term other) {
    if (field == other.field) // fields are interned
        return text.compareTo(other.text);
    else
        return field.compareTo(other.field);
}
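The `==` test on `field` is safe only because field names are interned, as the comment says. Two equal names are then the same `String` object, so a reference check is enough to decide whether to fall through to comparing `text`. Below is a minimal sketch of the same ordering: field first, then text. `SimpleTerm` is a hypothetical stand-in for Lucene's `Term`. It calls `String.intern()` in its constructor to mimic the interning.

```java
import java.util.Arrays;

public class TermOrderSketch {

    // Hypothetical stand-in for Lucene's Term: (field, text) with an interned field name.
    static final class SimpleTerm implements Comparable<SimpleTerm> {
        final String field;
        final String text;

        SimpleTerm(String field, String text) {
            this.field = field.intern(); // equal field names now share one String instance
            this.text = text;
        }

        @Override
        public int compareTo(SimpleTerm other) {
            if (field == other.field)            // reference check is valid after intern()
                return text.compareTo(other.text);
            return field.compareTo(other.field); // different fields: order by field name
        }

        @Override
        public String toString() { return field + ":" + text; }
    }

    public static void main(String[] args) {
        SimpleTerm[] terms = {
            new SimpleTerm("title", "lucene"),
            new SimpleTerm("body", "zebra"),
            // new String(...) is a distinct object; intern() maps it back to the literal "title"
            new SimpleTerm(new String("title"), "apache"),
            new SimpleTerm("body", "apple"),
        };
        Arrays.sort(terms);
        System.out.println(Arrays.toString(terms));
        // prints [body:apple, body:zebra, title:apache, title:lucene]
    }
}
```

All terms of one field end up contiguous in the sorted array. writePostings depends on this when it detects a field change with `currentField != termField`.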

    Now let's look at the last two calls:

// write postings
writePostings(postings, segment);

// write norms of indexed fields
writeNorms(segment);

    Here is the code of writePostings. I have cut the cleanup code that follows the try block because it is long:

private final void writePostings(Posting[] postings, String segment)
        throws IOException {
    IndexOutput freq = null, prox = null;
    TermInfosWriter tis = null;
    TermVectorsWriter termVectorWriter = null;
    try {
        // open files for inverse index storage
        freq = directory.createOutput(segment + ".frq");
        prox = directory.createOutput(segment + ".prx");
        tis = new TermInfosWriter(directory, segment, fieldInfos,
                termIndexInterval);
        TermInfo ti = new TermInfo();
        String currentField = null;

        for (int i = 0; i < postings.length; i++) {
            Posting posting = postings[i];

            // add an entry to the dictionary
            // with pointers to prox and freq files
            ti.set(1, freq.getFilePointer(), prox.getFilePointer(), -1);
            tis.add(posting.term, ti);

            // add an entry to the freq file
            int postingFreq = posting.freq;
            if (postingFreq == 1) // optimize freq=1
                freq.writeVInt(1); // set low bit of doc num.
            else {
                freq.writeVInt(0); // the document number
                freq.writeVInt(postingFreq); // frequency in doc
            }

            int lastPosition = 0; // write positions
            int[] positions = posting.positions;
            for (int j = 0; j < postingFreq; j++) { // use delta-encoding
                int position = positions[j];
                prox.writeVInt(position - lastPosition);
                lastPosition = position;
            }

            // check to see if we switched to a new field
            String termField = posting.term.field();
            if (currentField != termField) {
                // changing field - see if there is something to save
                currentField = termField;
                FieldInfo fi = fieldInfos.fieldInfo(currentField);
                if (fi.storeTermVector) {
                    if (termVectorWriter == null) {
                        termVectorWriter = new TermVectorsWriter(directory,
                                segment, fieldInfos);
                        termVectorWriter.openDocument();
                    }
                    termVectorWriter.openField(currentField);
                } else if (termVectorWriter != null) {
                    termVectorWriter.closeField();
                }
            }
            if (termVectorWriter != null && termVectorWriter.isFieldOpen()) {
                termVectorWriter.addTerm(posting.term.text(), postingFreq,
                        posting.positions, posting.offsets);
            }
        }
        if (termVectorWriter != null)
            termVectorWriter.closeDocument();
    } finally {
        // omitted for brevity: closes freq, prox, tis and termVectorWriter
    }
}

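    Here we meet two more new files, .frq and .prx. For each term, .frq stores the documents that contain it and its frequency in each of them. .prx stores the positions at which the term occurs.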
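Both files are written with writeVInt, Lucene's variable-length integer format. It uses seven data bits per byte, low-order bits first, with the high bit set on every byte except the last. In .frq the document number is shifted left by one bit. A low bit of 1 means "frequency is 1" and the frequency is omitted; otherwise the frequency follows. The segment written here holds a single document numbered 0, so `writeVInt(1)` is `(0 << 1) | 1` and `writeVInt(0)` is `0 << 1`, matching the comments in the code. In .prx each position is stored as the gap from the previous one.

The sketch below reproduces these encodings under simplifying assumptions. `PostingEncodingSketch`, `writeFreqEntry` and `writePositions` are hypothetical names, and a `ByteArrayOutputStream` stands in for Lucene's `IndexOutput`.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class PostingEncodingSketch {

    // VInt in the style of writeVInt: 7 bits per byte, low-order group first,
    // high bit set on every byte except the last one.
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // One .frq entry: doc delta shifted left by 1; low bit 1 means freq == 1 (omitted),
    // otherwise the frequency follows as its own VInt.
    static void writeFreqEntry(ByteArrayOutputStream freq, int docDelta, int termFreq) {
        if (termFreq == 1) {
            writeVInt(freq, (docDelta << 1) | 1);
        } else {
            writeVInt(freq, docDelta << 1);
            writeVInt(freq, termFreq);
        }
    }

    // .prx entries for one document: each position stored as a delta from the previous one.
    static void writePositions(ByteArrayOutputStream prox, int[] positions) {
        int lastPosition = 0;
        for (int position : positions) {
            writeVInt(prox, position - lastPosition);
            lastPosition = position;
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream vint = new ByteArrayOutputStream();
        writeVInt(vint, 300);
        System.out.println(Arrays.toString(vint.toByteArray())); // [-84, 2]

        // A term occurring 3 times in document 0 at positions 2, 7 and 20.
        ByteArrayOutputStream frq = new ByteArrayOutputStream();
        ByteArrayOutputStream prx = new ByteArrayOutputStream();
        writeFreqEntry(frq, 0, 3);
        writePositions(prx, new int[] {2, 7, 20});
        System.out.println(Arrays.toString(frq.toByteArray())); // [0, 3]
        System.out.println(Arrays.toString(prx.toByteArray())); // [2, 5, 13]
    }
}
```

Delta encoding keeps the numbers small, and small numbers fit in a single VInt byte. That is why positions 2, 7, 20 cost three bytes here.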

    Now look at the constructors of TermInfosWriter:

TermInfosWriter(Directory directory, String segment, FieldInfos fis,
        int interval) throws IOException {
    initialize(directory, segment, fis, interval, false);
    other = new TermInfosWriter(directory, segment, fis, interval, true);
    other.other = this;
}

private TermInfosWriter(Directory directory, String segment,
        FieldInfos fis, int interval, boolean isIndex) throws IOException {
    initialize(directory, segment, fis, interval, isIndex);
}

private void initialize(Directory directory, String segment,
        FieldInfos fis, int interval, boolean isi) throws IOException {
    indexInterval = interval;
    fieldInfos = fis;
    isIndex = isi;
    output = directory.createOutput(segment + (isIndex ? ".tii" : ".tis"));
    output.writeInt(FORMAT); // write format
    output.writeLong(0); // leave space for size
    output.writeInt(indexInterval); // write indexInterval
    output.writeInt(skipInterval); // write skipInterval
}

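    As we can see, the public constructor calls initialize with isIndex set to false, which creates the .tis file. It then invokes the private constructor below with isIndex set to true, which creates the .tii file. That second writer is stored in the field other, and other.other points back to the first writer.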
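Why two files? .tis holds every term together with its TermInfo. .tii is a sparse index that records only about every indexInterval-th term, so it is small enough to keep in memory. In Lucene of this era, the reader binary-searches the in-memory .tii entries, seeks into .tis, and scans forward at most indexInterval terms.

The sketch below shows that idea with in-memory lists instead of files. It is a simplification, not the real TermInfosWriter/TermInfosReader logic: `TermIndexSketch` and its fields are hypothetical, and which entry gets sampled and what pointers are stored differ in details.

```java
import java.util.ArrayList;
import java.util.List;

public class TermIndexSketch {

    final int indexInterval;
    final List<String> dictionary = new ArrayList<>();    // stands in for the full .tis term list
    final List<String> indexTerms = new ArrayList<>();    // stands in for the sampled .tii terms
    final List<Integer> indexOffsets = new ArrayList<>(); // where each sampled term sits in dictionary

    TermIndexSketch(int indexInterval) { this.indexInterval = indexInterval; }

    // Terms must arrive in sorted order, as they do from the sorted postings array.
    void add(String term) {
        if (dictionary.size() % indexInterval == 0) { // sample every indexInterval-th term
            indexTerms.add(term);
            indexOffsets.add(dictionary.size());
        }
        dictionary.add(term);
    }

    // Returns the term's position in the dictionary, or -1 if it is absent.
    int lookup(String term) {
        // binary search the small index for the last sampled term <= term
        int lo = 0, hi = indexTerms.size() - 1, slot = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexTerms.get(mid).compareTo(term) <= 0) { slot = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        if (slot < 0) return -1; // term sorts before every dictionary entry
        // scan forward at most indexInterval entries of the full dictionary
        int start = indexOffsets.get(slot);
        int end = Math.min(start + indexInterval, dictionary.size());
        for (int i = start; i < end; i++) {
            int c = dictionary.get(i).compareTo(term);
            if (c == 0) return i;
            if (c > 0) break;
        }
        return -1;
    }

    public static void main(String[] args) {
        TermIndexSketch idx = new TermIndexSketch(4);
        String[] sorted = {"apache", "bm25", "doc", "field", "index",
                           "lucene", "norm", "posting", "segment", "term"};
        for (String t : sorted) idx.add(t);
        System.out.println(idx.indexTerms);      // [apache, index, segment]
        System.out.println(idx.lookup("norm"));  // 6
        System.out.println(idx.lookup("query")); // -1
    }
}
```

A larger indexInterval shrinks the in-memory index but lengthens the forward scan on each lookup. A smaller one does the opposite.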

 

 

 

 

 
