O(1) 的小乐

Kullback–Leibler divergence (KL divergence)

In probability theory and information theory, the Kullback–Leibler divergence (also information divergence, information gain, relative entropy, or KLIC) is a non-symmetric measure of the difference between two probability distributions P and Q. KL measures the expected number of extra bits required to code samples from P when using a code based on Q, rather than using a code based on P. Typically P represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution. The measure Q typically represents a theory, model, description, or approximation of P.

Although it is often intuited as a distance metric, the KL divergence is not a true metric – for example, the KL from P to Q is not necessarily the same as the KL from Q to P.

KL divergence is a special case of a broader class of divergences called f-divergences. Originally introduced by Solomon Kullback and Richard Leibler in 1951 as the directed divergence between two distributions, it is not the same as a divergence in calculus. However, the KL divergence can be derived from the Bregman divergence.

 

 

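Note that P usually denotes the data set we already have, and Q denotes the theoretical result. The physical meaning of KL divergence is therefore the number of extra bits needed when samples from P are encoded with a code based on Q, compared with encoding them with a code based on P.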
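The extra-bits reading can be checked numerically: the KL divergence in bits equals the cross-entropy of P under Q minus the entropy of P. The following is a minimal sketch, assuming discrete distributions given as lists, base-2 logarithms, and Q(i) > 0 wherever P(i) > 0. The function names and the example distributions are illustrative and not from the original post.

```python
import math

def entropy_bits(p):
    """Average code length (bits) when samples from p use a code built for p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """Average code length (bits) when samples from p use a code built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_bits(p, q):
    """D_KL(P||Q) in bits; terms with P(i) = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions (assumed, not from the post).
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

extra_bits = cross_entropy_bits(p, q) - entropy_bits(p)
print(extra_bits, kl_bits(p, q))
```

Both printed numbers agree, which is the "extra bits" interpretation in code form.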

 

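KL divergence is also called KL distance by some, but it is not a distance in the strict sense: it does not satisfy the triangle inequality.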

 

KL divergence is not symmetric. If a symmetric version is wanted, one option is:

Ds(p1, p2) = [D(p1, p2) + D(p2, p1)] / 2
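A small sketch of the asymmetry and of the averaged form Ds, assuming discrete distributions with strictly positive entries and natural logarithms. The names `kl` and `symmetric_kl` and the two example distributions are hypothetical.

```python
import math

def kl(p, q):
    """D(p, q) in nats for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p1, p2):
    """Ds(p1, p2) = [D(p1, p2) + D(p2, p1)] / 2."""
    return 0.5 * (kl(p1, p2) + kl(p2, p1))

# Illustrative distributions (assumed, not from the post).
p1 = [0.9, 0.1]
p2 = [0.5, 0.5]

forward = kl(p1, p2)    # D(p1, p2)
backward = kl(p2, p1)   # D(p2, p1), generally a different number
sym = symmetric_kl(p1, p2)
print(forward, backward, sym)
```

Swapping the arguments of `kl` changes the result, while `symmetric_kl` gives the same value in either order.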

 

Below are the discrete and continuous definitions of KL divergence.

D_{\mathrm{KL}}(P\|Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}

D_{\mathrm{KL}}(P\|Q) = \int_{-\infty}^\infty p(x) \log \frac{p(x)}{q(x)} \, dx

Note that p(x) and q(x) are the PDFs of the two random variables P and Q, and that D(P||Q) is a number, not a function. See the figure below.
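To illustrate that the continuous definition yields a single number, here is a rough sketch that integrates p(x) log(p(x)/q(x)) numerically for two Gaussians and compares the result with the standard closed form for the KL divergence between two normal distributions. The midpoint rule, the integration range, the parameter values, and all function names are assumptions made for this example.

```python
import math

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def kl_numeric(p, q, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of p(x) * log(p(x) / q(x))."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        px, qx = p(x), q(x)
        if px > 0.0 and qx > 0.0:  # skip points where a density underflows to 0
            total += px * math.log(px / qx) * dx
    return total

def kl_gaussians(m1, s1, m2, s2):
    """Closed form for D_KL(N(m1, s1^2) || N(m2, s2^2)) in nats."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5

# Illustrative parameters (assumed): P = N(0, 1), Q = N(1, 2^2).
numeric = kl_numeric(lambda x: gaussian_pdf(x, 0.0, 1.0),
                     lambda x: gaussian_pdf(x, 1.0, 2.0),
                     -12.0, 12.0)
exact = kl_gaussians(0.0, 1.0, 1.0, 2.0)
print(numeric, exact)
```

The integrand is the "area to be integrated"; the integral itself collapses to one scalar.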

 

Note: the KL area to be integrated.

[Figure: KL-Gauss-Example.png]

 

A very powerful property of KL divergence:

The Kullback–Leibler divergence is always non-negative,

D_{\mathrm{KL}}(P\|Q) \geq 0,

a result known as Gibbs' inequality, with DKL(P||Q) zero if and only if P = Q.
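One standard way to see why this holds in the discrete case uses the elementary bound ln x ≤ x − 1, with equality only at x = 1. This is a sketch with natural logarithms, not a derivation taken from the original post; other logarithm bases only rescale the result by a positive constant.

```latex
-D_{\mathrm{KL}}(P\|Q)
  = \sum_{i:\,P(i)>0} P(i) \ln \frac{Q(i)}{P(i)}
  \leq \sum_{i:\,P(i)>0} P(i) \left( \frac{Q(i)}{P(i)} - 1 \right)
  = \sum_{i:\,P(i)>0} Q(i) - 1
  \leq 0
```

Both inequalities are tight exactly when Q(i) = P(i) for every i with P(i) > 0, which is the case P = Q.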

 

When computing KL divergence, be aware that on sparse data sets the computation often runs into a zero denominator.
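One common workaround is to add a small constant to every probability and renormalize before computing the divergence. The sketch below assumes this additive smoothing. The helper `smooth`, the constant `1e-10`, and the example distributions are hypothetical choices, and the resulting value depends strongly on the constant picked.

```python
import math

def kl(p, q):
    """D_KL(P||Q) in nats; fails when Q(i) = 0 for some i with P(i) > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def smooth(p, eps=1e-10):
    """Add eps to every entry, then renormalize so the entries sum to 1."""
    shifted = [pi + eps for pi in p]
    total = sum(shifted)
    return [s / total for s in shifted]

# Sparse example (assumed): q assigns zero probability to an event p can produce.
p = [0.5, 0.5, 0.0]
q = [0.5, 0.0, 0.5]

try:
    kl(p, q)
    failed = False
except ZeroDivisionError:
    failed = True  # the zero-denominator case described above

q_smoothed = smooth(q)
value = kl(p, q_smoothed)  # finite, but large and sensitive to eps
print(failed, value)
```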

 

 

The MATLAB function KLDIV computes the KL divergence between two distributions.

Description

KLDIV Kullback-Leibler or Jensen-Shannon divergence between two distributions.

KLDIV(X,P1,P2) returns the Kullback-Leibler divergence between two distributions specified over the M variable values in vector X. P1 is a length-M vector of probabilities representing distribution 1, and P2 is a length-M vector of probabilities representing distribution 2. Thus, the probability of value X(i) is P1(i) for distribution 1 and P2(i) for distribution 2. The Kullback-Leibler divergence is given by:

   KL(P1(x),P2(x)) = sum[P1(x).log(P1(x)/P2(x))]

If X contains duplicate values, there will be a warning message, and these values will be treated as distinct values. (I.e., the actual values do not enter into the computation, but the probabilities for the two duplicate values will be considered as probabilities corresponding to two unique values.) The elements of probability vectors P1 and P2 must each sum to 1 +/- .00001.

A "log of zero" warning will be thrown for zero-valued probabilities. Handle this however you wish. Adding 'eps' or some other small value to all probabilities seems reasonable. (Renormalize if necessary.)

KLDIV(X,P1,P2,'sym') returns a symmetric variant of the Kullback-Leibler divergence, given by [KL(P1,P2)+KL(P2,P1)]/2. See Johnson and Sinanovic (2001).

KLDIV(X,P1,P2,'js') returns the Jensen-Shannon divergence, given by [KL(P1,Q)+KL(P2,Q)]/2, where Q = (P1+P2)/2. See the Wikipedia article for "Kullback–Leibler divergence". This is equal to 1/2 the so-called "Jeffrey divergence." See Rubner et al. (2000).
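The Jensen-Shannon formula quoted above can be sketched in Python as follows. This is not the KLDIV source code; it assumes discrete distributions as lists and base-2 logarithms, and the function names and example inputs are hypothetical. Because Q is the average of P1 and P2, Q(i) > 0 wherever either input is positive, so no zero denominators arise.

```python
import math

def kl_bits(p, q):
    """D_KL(P||Q) in bits; terms with P(i) = 0 are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_bits(p1, p2):
    """[KL(P1,Q) + KL(P2,Q)] / 2 with Q = (P1 + P2) / 2."""
    q = [0.5 * (a + b) for a, b in zip(p1, p2)]
    return 0.5 * (kl_bits(p1, q) + kl_bits(p2, q))

# Illustrative inputs (assumed): two distributions with disjoint support.
p1 = [1.0, 0.0]
p2 = [0.0, 1.0]

js = js_bits(p1, p2)
print(js)
```

With base-2 logarithms and disjoint supports, the divergence reaches its maximum of 1 bit, whereas the plain KL divergence would be undefined for these inputs.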

EXAMPLE: Let the event set and probability sets be as follows:
   X = [1 2 3 3 4]';
   P1 = ones(5,1)/5;
   P2 = [0 0 .5 .2 .3]' + eps;
Note that the event set here has duplicate values (two 3's). These will be treated as DISTINCT events by KLDIV. If you want these to be treated as the SAME event, you will need to collapse their probabilities together before running KLDIV. One way to do this is to use UNIQUE to find the set of unique events, and then iterate over that set, summing probabilities for each instance of each unique event. Here, we just leave the duplicate values to be treated independently (the default):
   KL = kldiv(X,P1,P2);
   KL =
        19.4899

Note also that we avoided the log-of-zero warning by adding 'eps' to all probability values in P2. We didn't need to renormalize because we're still within the sum-to-one tolerance.
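The example value can be reproduced outside MATLAB. The sketch below assumes that the divergence is computed with base-2 logarithms (this assumption is consistent with the printed result) and that duplicate events are treated as distinct, so the vector X does not enter the computation. The variable names are illustrative.

```python
import math
import sys

# Machine epsilon for doubles (2^-52), the same value as MATLAB's eps.
eps = sys.float_info.epsilon

p1 = [1.0 / 5.0] * 5
p2 = [v + eps for v in (0.0, 0.0, 0.5, 0.2, 0.3)]

# sum[P1(x) * log2(P1(x) / P2(x))]; base 2 is an assumption, see above.
kl_example = sum(a * math.log2(a / b) for a, b in zip(p1, p2))
print(round(kl_example, 4))
```

Almost all of the value comes from the two events where P2 is only `eps`, which shows how strongly the result depends on the constant used to avoid log-of-zero.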

REFERENCES:
1) Cover, T.M. and J.A. Thomas. "Elements of Information Theory," Wiley, 1991.
2) Johnson, D.H. and S. Sinanovic. "Symmetrizing the Kullback-Leibler distance." IEEE Transactions on Information Theory (Submitted).
3) Rubner, Y., Tomasi, C., and Guibas, L. J., 2000. "The Earth Mover's distance as a metric for image retrieval." International Journal of Computer Vision, 40(2): 99-121.
4) "Kullback–Leibler divergence." Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

Posted on 2010-10-16 15:04 by Sosi. Category: Taps in Research

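Comments

# re: Kullback–Leibler divergence (KL divergence) 2010-11-30 16:17 tintin0324

Blogger, my research requires an understanding of KL distance, and I have a few questions I would like to ask. How can I contact you?

# re: Kullback–Leibler divergence (KL divergence) 2010-12-05 22:37 Sosi

@tintin0324
KL distance itself is simple: it is defined exactly as above, and its meaning is as described above. If you want to go deeper, you can read the related literature.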
