function rate = KNN(Train_data,Train_label,Test_data,Test_label,k,Distance_mark)

% K-Nearest-Neighbor classifier (K-NN classifier)
%Input:
%     Train_data, Test_data are the training data set and the test data
%     set, respectively. (Each row is a data point.)
%     Train_label, Test_label are column vectors. They are the labels of the
%     training data set and the test data set, respectively.
%     k is the number of nearest neighbors.
%     Distance_mark           :   ['Euclidean', 'L2' | 'L1' | 'Cos']
%     'Cos' represents the cosine distance.
%Output:
%     rate: Accuracy of the K-NN classifier
%
%    Examples:
%      
% %Classification problem with three classes
% A = rand(50,300);
% B = rand(50,300)+2;
% C = rand(50,300)+3;
% % label vector for the three classes
% gnd = [ones(300,1);2*ones(300,1);3*ones(300,1)];
% fea = [A B C]';
% trainIdx = [1:150,301:450,601:750]';
% testIdx = [151:300,451:600,751:900]';
% fea_Train = fea(trainIdx,:);
% gnd_Train = gnd(trainIdx);
% fea_Test = fea(testIdx,:);
% gnd_Test = gnd(testIdx);
% rate = KNN(fea_Train,gnd_Train,fea_Test,gnd_Test,1)
%
%
%
%Reference:
%
% If you use my MATLAB code, we would appreciate it very much if you could cite the following paper:
% Jie Gui, Tongliang Liu, Dacheng Tao, Zhenan Sun, Tieniu Tan, "Representative Vector Machines: A Unified
% Framework for Classical Classifiers", IEEE Transactions on Cybernetics.
    
%This code was written by Gui Jie on the evening of 2009/03/11.
%If you find any bugs in the code, feel free to contact me.
if nargin < 5
    error('Not enough arguments!');
elseif nargin < 6
    Distance_mark='L2';
end
 
[n, dim]   = size(Test_data);     % number of test samples and feature dimension
train_num  = size(Train_data, 1); % number of training samples
% Normalize each feature to have zero mean and unit variance.
% If you need the following four lines, you can uncomment them.
% M        = mean(Train_data); % mean & std of the training data set
% S        = std(Train_data);
% Train_data = (Train_data - ones(train_num, 1) * M)./(ones(train_num, 1) * S); % normalize training data set
% Test_data            = (Test_data-ones(n,1)*M)./(ones(n,1)*S); % normalize data
U        = unique(Train_label); % class labels
nclasses = length(U);%number of classes
Result  = zeros(n, 1);
Count   = zeros(nclasses, 1);
dist=zeros(train_num,1);
for i = 1:n
    % compute distances between test data and all training data and
    % sort them
    test=Test_data(i,:);
    for j=1:train_num
        train = Train_data(j,:);
        V = test - train;
        switch Distance_mark
            case {'Euclidean', 'L2'}
                dist(j,1) = norm(V,2); % Euclidean (L2) distance
            case 'L1'
                dist(j,1) = norm(V,1); % L1 distance
            case 'Cos'
                dist(j,1) = acos(test*train'/(norm(test,2)*norm(train,2))); % cosine distance (angle in [0, pi])
            otherwise
                dist(j,1) = norm(V,2); % default: Euclidean distance
        end
    end
    [Dummy Inds] = sort(dist);
    % compute the class labels of the k nearest samples
    Count(:) = 0;
    for j = 1:k
        ind        = find(Train_label(Inds(j)) == U); % index in U of the label of the j-th nearest neighbor
        Count(ind) = Count(ind) + 1;
    end % Count(c) is the number of the k nearest neighbors that belong to class U(c)
    
    % determine the class of the data sample
    [dummy ind] = max(Count);
    Result(i)   = U(ind);
end
correctnumbers=length(find(Result==Test_label));
rate=correctnumbers/n;






-------------------------------------------------- End of code --------------------------------------------------

The difference between cosine distance and cosine similarity

Cosine similarity is the basic measure most commonly used in traditional document classification to measure the distance between documents. It measures the distance between two d-dimensional vectors by the difference in their angles; the resulting value lies between 0 and 1: the closer the angle between the two vectors, the closer the cosine similarity is to 1, and otherwise it approaches 0. Suppose there are two points a = [a1, a2, ..., ad] and b = [b1, b2, ..., bd] in d-dimensional space; then their cosine similarity can be expressed as:
cosineSimilarity(a,b) = dot(a,b) / (norm(a)*norm(b))   [I think this should be called cosineSimilarity rather than cosineDistance: the larger the similarity, the smaller the distance should be. For example, when the angle between a and b is 0 they are most similar, the similarity is largest, and the distance is smallest.]
dot(a,b) denotes the inner product of a and b. Since the vector inner product is defined as a·b = |a| × |b| × cosθ (in general θ ∈ [0, π], http://baike.baidu.com/view/1485493.htm), the value so defined is not confined to 0 ~ 1 but lies between -1 and 1. There are two ways to turn it into a distance:
(1) My code is correct: it uses acos to convert this cosine into an angle in [0, π]. The value need not be restricted to 0 ~ 1; my code maps it to [0, π], where a larger value means a larger distance;
(2) cosineDistance(a,b) = 1 - cosineSimilarity(a,b) = 1 - dot(a,b) / (norm(a)*norm(b)). The range of cosineDistance is then [0, 2].
Example:
a=[1 1 1]; b=[1 0 0];
cosineDistance = dot(a,b) / (norm(a)*norm(b))
cosineDistance =
    0.5774   [http://neural.cs.nthu.edu.tw/jang/books/dcpr/doc/02%E8%B7%9D%E9%9B%A2%E8%88%87%E7%9B%B8%E4%BC%BC%E5%BA%A6.pdf , already saved locally as 距离与相似度.pdf (Distance and Similarity)]
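As a minimal MATLAB sketch of the two conversions in (1) and (2) above, applied to the same vectors a and b (the variable names here are only illustrative):

a = [1 1 1]; b = [1 0 0];
cosSim  = dot(a,b) / (norm(a)*norm(b));   % cosine similarity, about 0.5774
angDist = acos(cosSim);                   % way (1): angle in [0, pi], about 0.9553
subDist = 1 - cosSim;                     % way (2): 1 - similarity, in [0, 2], about 0.4226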
"KDD17_Linearized GMM Kernels"的公式1是cosine的等价表示形式;“Min-Max Kernels 1503.01737”的公式1是Min-Max Kernel的标准定义

Page 3 of "Minimal Local Reconstruction Error Measure Based Discriminant Feature Extraction and Classification":
For robustness, before classification, we generally need to normalize the feature vectors so that each feature vector has length 1, that is, x -> x / ||x||.
The normalized Euclidean distance is equivalent to the cosine distance.
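A quick numerical check of that equivalence (a minimal sketch, not from the paper): for unit-length vectors the squared Euclidean distance equals 2 - 2*cosineSimilarity, so the two distances order neighbors identically.

x = rand(1,5); y = rand(1,5);          % arbitrary feature vectors
xn = x / norm(x); yn = y / norm(y);    % normalize each to unit length
d_euc2 = norm(xn - yn)^2;              % squared Euclidean distance after normalization
cosSim = dot(xn, yn);                  % cosine similarity (the norms are now 1)
disp(d_euc2 - (2 - 2*cosSim))          % zero up to floating-point error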
(1) My junior colleague Lin Zhu pointed out that replacing the loop with a precomputed distance matrix saves time, because MATLAB loops are slow; however, for large sample sets the loop is still necessary, otherwise the computation runs out of memory (see the vectorized sketch below). I recall that Jinsong also provided a KNN code in his course, and his was likewise implemented with loops. MATLAB has a built-in function knnclassify; the code SPP_1NN.m from the paper "Sparsity preserving projections" uses that function. On ASLAN, my KNN and knnclassify give exactly the same recognition rate.
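A hedged sketch of the vectorized alternative, assuming the full test-by-train matrix fits in memory (pdist2 requires the Statistics Toolbox; the bsxfun line is a toolbox-free Euclidean equivalent; D, Inds, nearestLabels are illustrative names, the rest follow the code above):

% Squared distances ||x||^2 + ||y||^2 - 2*x*y', clipped at 0 for numerical safety
D = sqrt(max(bsxfun(@plus, sum(Test_data.^2, 2), sum(Train_data.^2, 2)') ...
             - 2 * (Test_data * Train_data'), 0));
% With the Statistics Toolbox the same matrix is simply: D = pdist2(Test_data, Train_data);
[~, Inds] = sort(D, 2);                                 % sort neighbors for every test point at once
nearestLabels = reshape(Train_label(Inds(:, 1:k)), [], k); % n-by-k matrix of neighbor labels
Result = mode(nearestLabels, 2);                        % majority vote per row (ties go to the smaller label)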
(2) An extremely important point: the fourth-to-last line of the program must not be written as Result(i) = ind; that happens to work for data sets such as Yale whose labels are simply 1, 2, 3, ..., but it breaks for binary classification with labels 1 and -1. SRC_QC and SRC_QC2 have the same issue: their third-to-last line must not use Result(i) = index but Result(i) = classLabel(index). Originally only that one place was fixed; in fact line 50 of SRC_QC2 and line 42 of SRC_QC also need ii changed to classLabel(ii). It was precisely this bug that produced the wrong result of a 50% error rate with zero variance for SRC on ASLAN. The corrected SRC_QC2 and SRC_QC programs are in the ASLAN directory.
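A minimal illustration of this pitfall with hypothetical labels; U, Count and ind follow the naming in the code above:

U = unique([1; -1; 1; -1]);    % U = [-1; 1] for a binary problem labeled 1 / -1
Count = [5; 3];                % suppose 5 of the k neighbors belong to class U(1) = -1
[~, ind] = max(Count);         % ind = 1 is an index into U, not a class label
% Wrong: Result(i) = ind;      stores 1, i.e. the *other* class
% Right: Result(i) = U(ind);   stores -1, the actual winning label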