Weka Development [27] - SMO Source Code Analysis [2]

2009-09-19 22:24:28 | Category: Machine Learning


We will skip the code that initializes the error cache and the kernels (the RBF kernel and the polynomial kernel) for now.

// Build up I1 and I4
for (int i = 0; i < m_class.length; i++) {
    if (m_class[i] == 1) {
        m_I1.insert(i);
    } else {
        m_I4.insert(i);
    }
}

This initializes m_I1 and m_I4; the definitions of these index sets are on page 639 of Keerthi's paper.
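For reference, the five index sets from page 639 of Keerthi's paper are (my transcription):

$$I_0 = \{i : 0 < \alpha_i < C\},\quad I_1 = \{i : y_i = 1,\ \alpha_i = 0\},\quad I_2 = \{i : y_i = -1,\ \alpha_i = C\},$$
$$I_3 = \{i : y_i = 1,\ \alpha_i = C\},\quad I_4 = \{i : y_i = -1,\ \alpha_i = 0\}.$$

Since all alphas start at 0, every sample initially belongs to I1 or I4 according to its class, which is exactly what the loop above does.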

// Loop to find all the support vectors
int numChanged = 0;
boolean examineAll = true;
while ((numChanged > 0) || examineAll) {
    numChanged = 0;
    if (examineAll) {
        for (int i = 0; i < m_alpha.length; i++) {
            if (examineExample(i)) {
                numChanged++;
            }
        }
    } else {
        // This code implements Modification 1 from Keerthi et al.'s paper
        for (int i = 0; i < m_alpha.length; i++) {
            if ((m_alpha[i] > 0)
                    && (m_alpha[i] < m_C * m_data.instance(i).weight())) {
                if (examineExample(i)) {
                    numChanged++;
                }

                // Is optimality on unbound vectors obtained?
                if (m_bUp > m_bLow - 2 * m_tol) {
                    numChanged = 0;
                    break;
                }
            }
        }
    }

    if (examineAll) {
        examineAll = false;
    } else if (numChanged == 0) {
        examineAll = true;
    }
}

This code is essentially the same as the pseudocode on pages 52 and 53 of Platt's paper; only the commented part is Keerthi's improvement. See page 47 of Platt's paper, where the original text reads: "To speed training, the outer loop does not always iterate through the entire training set. After one pass through the training set, the outer loop iterates over only those examples whose Lagrange multipliers are neither 0 nor C. Again, each example is checked against the KKT conditions, and violating examples are eligible for immediate optimization and update. The outer loop makes repeated passes over the non-bound examples until all of the non-bound examples obey the KKT conditions within ε. The outer loop then iterates over the entire set again. The outer loop keeps alternating between single passes over the entire training set and multiple passes over the non-bound subset until the entire training set obeys the KKT conditions within ε. At that point, the algorithm terminates."

The examineAll flag here is what the paper describes as deciding whether to iterate through the entire training set. The first if inside the else branch matches Platt's pseudocode: it selects the non-bound samples, i.e. those whose Lagrange multiplier is neither 0 nor C, and examineExample then checks whether such a sample violates the KKT conditions. The comment "Is optimality on unbound vectors obtained?" corresponds to equation (2.8) in Keerthi's paper. The numChanged > 0 condition in the while means that as long as something changed, we keep going; once the entire training set obeys the KKT conditions within ε, the algorithm terminates.
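Spelled out, the termination test on the unbound vectors (a transcription of the code's condition, with τ standing for m_tol) is

$$b_{up} > b_{low} - 2\tau,$$

i.e. optimality is declared once no pair violates the KKT conditions by more than the tolerance.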

Next comes the code of examineExample:

double y2, alph2, F2;
int i1 = -1;

y2 = m_class[i2];
alph2 = m_alpha[i2];
if (m_I0.contains(i2)) {
    F2 = m_errors[i2];
} else {
    F2 = SVMOutput(i2, m_data.instance(i2)) + m_b - y2;
    m_errors[i2] = F2;

    // Update thresholds
    if ((m_I1.contains(i2) || m_I2.contains(i2)) && (F2 < m_bUp)) {
        m_bUp = F2;
        m_iUp = i2;
    } else if ((m_I3.contains(i2) || m_I4.contains(i2))
            && (F2 > m_bLow)) {
        m_bLow = F2;
        m_iLow = i2;
    }
}

y2 is the class of the i2-th sample, and alph2 is the alpha value of the i2-th sample. If i2 is a non-bound point (it is in I0), its F2 is already in the error cache and can simply be read out. The else branch handles the points that are not in I0, i.e. the bound points, which of course are not in the error cache, so F2 is computed with SVMOutput (we will look at that code shortly). The if and else if that follow implement equation (2.4) in Keerthi's paper, updating the corresponding thresholds.
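For context, Keerthi's F_i (page 639) is the SVM output without the threshold, minus the target:

$$F_i = \sum_j \alpha_j y_j\, k(x_j, x_i) - y_i.$$

Since SVMOutput (next listing) returns the output with the threshold b already subtracted, the code adds m_b back to cancel it, leaving exactly this F_i.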

We will not look at the RBF-kernel branch of SVMOutput, only the code in the linear if branch:

double result = 0;

// Is the machine linear?
if (!m_useRBF && m_exponent == 1.0) {

    // Is weight vector stored in sparse format?
    if (m_sparseWeights == null) {
        int n1 = inst.numValues();
        for (int p = 0; p < n1; p++) {
            if (inst.index(p) != m_classIndex) {
                result += m_weights[inst.index(p)] * inst.valueSparse(p);
            }
        }
    } else {
        int n1 = inst.numValues();
        int n2 = m_sparseWeights.length;
        for (int p1 = 0, p2 = 0; p1 < n1 && p2 < n2;) {
            int ind1 = inst.index(p1);
            int ind2 = m_sparseIndices[p2];
            if (ind1 == ind2) {
                if (ind1 != m_classIndex) {
                    result += inst.valueSparse(p1) * m_sparseWeights[p2];
                }
                p1++;
                p2++;
            } else if (ind1 > ind2) {
                p2++;
            } else {
                p1++;
            }
        }
    }
}
result -= m_b;

return result;

If you do not want to study the compressed storage of sparse vectors, the first branch is all you need: its for loop is simply the dot product of w and x, and at the end the threshold is subtracted. This is also why m_b had to be added back when computing F2 earlier: SVMOutput returns the output with the threshold already subtracted, while Keerthi's F (page 639 of his paper) contains no threshold.
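The else branch is a classic two-pointer merge over two sorted index arrays. Here is a minimal standalone sketch of the same idea, with hypothetical parameter names rather than Weka's API:

// Dot product of two sparse vectors stored as parallel
// (sorted index, value) arrays.
static double sparseDot(int[] idx1, double[] val1, int[] idx2, double[] val2) {
    double result = 0;
    int p1 = 0, p2 = 0;
    while (p1 < idx1.length && p2 < idx2.length) {
        if (idx1[p1] == idx2[p2]) {        // indices match: multiply, advance both
            result += val1[p1++] * val2[p2++];
        } else if (idx1[p1] > idx2[p2]) {  // second vector lags: advance it
            p2++;
        } else {                           // first vector lags: advance it
            p1++;
        }
    }
    return result;
}

Because both index arrays are sorted, this runs in O(n1 + n2) rather than O(n1 * n2).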

// Check optimality using current bLow and bUp and, if
// violated, find an index i1 to do joint optimization
// with i2...
boolean optimal = true;
if (m_I0.contains(i2) || m_I1.contains(i2) || m_I2.contains(i2)) {
    if (m_bLow - F2 > 2 * m_tol) {
        optimal = false;
        i1 = m_iLow;
    }
}
if (m_I0.contains(i2) || m_I3.contains(i2) || m_I4.contains(i2)) {
    if (F2 - m_bUp > 2 * m_tol) {
        optimal = false;
        i1 = m_iUp;
    }
}
if (optimal) {
    return false;
}

As the comment says, optimality is checked using the current bLow and bUp, and if it is violated, an index i1 is found to do joint optimization with i2. From section 2.5 of Keerthi's paper, m_bUp is the minimum of F_i over I0 ∪ I1 ∪ I2 and m_bLow is the maximum of F_i over I0 ∪ I3 ∪ I4, so if the optimality conditions hold, neither of the two if bodies should fire; if one fires, optimality is violated, and if neither does, the sample is already optimal and false is returned. i1 is set to m_iLow or m_iUp because that choice makes the fastest progress.
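Written out, the two checks above are (again with τ for m_tol):

$$i_2 \in I_0 \cup I_1 \cup I_2:\quad b_{low} - F_2 > 2\tau \;\Rightarrow\; \text{pair } i_2 \text{ with } i_{low},$$
$$i_2 \in I_0 \cup I_3 \cup I_4:\quad F_2 - b_{up} > 2\tau \;\Rightarrow\; \text{pair } i_2 \text{ with } i_{up}.$$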

// For i2 unbound choose the better i1...
if (m_I0.contains(i2)) {
    if (m_bLow - F2 > F2 - m_bUp) {
        i1 = m_iLow;
    } else {
        i1 = m_iUp;
    }
}
if (i1 == -1) {
    throw new Exception("This should never happen!");
}
return takeStep(i1, i2, F2);

When i2 is in I0, F2 takes part both in the minimum that defines m_bUp and in the maximum that defines m_bLow, so m_bLow - F2 >= 0 and F2 - m_bUp >= 0 always hold, and since the optimality check above already failed, at least one of the two exceeds 2 * m_tol. Taking the larger of the two therefore picks the i1 whose pairing with i2 violates optimality the most.

Next is the code of takeStep:

double C1 = m_C * m_data.instance(i1).weight();
double C2 = m_C * m_data.instance(i2).weight();

// Don't do anything if the two instances are the same
if (i1 == i2) {
    return false;
}

// Initialize variables
alph1 = m_alpha[i1];
alph2 = m_alpha[i2];
y1 = m_class[i1];
y2 = m_class[i2];
F1 = m_errors[i1];
s = y1 * y2;

One thing can be seen here: why it was said earlier that a sample's weight must not be set to 0, since the penalty parameter is the constant m_C multiplied by the sample weight. The comment below states that nothing is done if the two instances are the same. Then the variables are initialized; the meaning of s = y1 * y2 is explained in the caption of the figure at the bottom of page 45 of Platt's paper.

// Find the constraints on a2
if (y1 != y2) {
    L = Math.max(0, alph2 - alph1);
    H = Math.min(C2, C1 + alph2 - alph1);
} else {
    L = Math.max(0, alph1 + alph2 - C1);
    H = Math.min(C2, alph1 + alph2);
}
if (L >= H) {
    return false;
}

These are Platt's equations (12.3) and (12.4). At first these equations actually confused me a little; if they unfortunately confuse you too, drop perpendiculars from the point onto the x axis and the y axis and everything becomes clear.
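With both instance weights equal to 1, so that C1 = C2 = C, the bounds in the code reduce to Platt's (12.3) and (12.4):

$$y_1 \ne y_2:\quad L = \max(0,\ \alpha_2 - \alpha_1),\qquad H = \min(C,\ C + \alpha_2 - \alpha_1),$$
$$y_1 = y_2:\quad L = \max(0,\ \alpha_1 + \alpha_2 - C),\qquad H = \min(C,\ \alpha_1 + \alpha_2).$$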

// Compute second derivative of objective function
k11 = m_kernel.eval(i1, i1, m_data.instance(i1));
k12 = m_kernel.eval(i1, i2, m_data.instance(i1));
k22 = m_kernel.eval(i2, i2, m_data.instance(i2));
eta = 2 * k12 - k11 - k22;

This code is Platt's equation (12.5); we will come back to m_kernel.eval later.
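eta here is the second derivative of the dual objective along the equality-constraint line:

$$\eta = 2k(x_1, x_2) - k(x_1, x_1) - k(x_2, x_2) = -\lVert \phi(x_1) - \phi(x_2) \rVert^2 \le 0,$$

so for a positive semidefinite kernel there is normally a maximum along the constraint direction, which is exactly what the eta < 0 test below checks.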

// Check if second derivative is negative
if (eta < 0) {

    // Compute unconstrained maximum
    a2 = alph2 - y2 * (F1 - F2) / eta;

    // Compute constrained maximum
    if (a2 < L) {
        a2 = L;
    } else if (a2 > H) {
        a2 = H;
    }
}

Note this passage in Platt's paper: "Under normal circumstances, there will be a maximum along the direction of the linear equality constraint, and eta will be less than zero. In this case, SMO computes the maximum along the direction of the constraint." What follows there are equations (12.6) and (12.7), which are exactly what the code above implements.
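In the code's sign convention for eta, equations (12.6) and (12.7) read:

$$\alpha_2^{new} = \alpha_2 - \frac{y_2 (F_1 - F_2)}{\eta},$$
$$\alpha_2^{new,clipped} = \begin{cases} H, & \alpha_2^{new} \ge H,\\ \alpha_2^{new}, & L < \alpha_2^{new} < H,\\ L, & \alpha_2^{new} \le L. \end{cases}$$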