http://en.wikipedia.org/wiki/Subgradient_method
Classical subgradient rules

Let f:\mathbb{R}^n \to \mathbb{R} be a convex function with domain \mathbb{R}^n. A classical subgradient method iterates

x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)},

where g^{(k)} denotes a subgradient of f at x^{(k)}. If f is differentiable, then its only subgradient is the gradient vector \nabla f itself. It may happen that -g^{(k)} is not a descent direction for f at x^{(k)}. We therefore maintain a running value f_{\rm{best}} that keeps track of the lowest objective function value found so far, i.e.

f_{\rm{best}}^{(k)} = \min\{f_{\rm{best}}^{(k-1)} , f(x^{(k)}) \}.
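
To make the update concrete, here is a minimal Python sketch of the classical subgradient method with the f_{\rm{best}} bookkeeping, using f(x) = \|x\|_1 as an illustrative non-differentiable objective; the diminishing step size \alpha_k = 1/(k+1) and the starting point are assumptions for this example, not choices prescribed above.

import numpy as np

def subgradient_l1(x):
    # One valid subgradient of ||x||_1: sign(x_i) where x_i != 0,
    # and 0 (any value in [-1, 1] would do) where x_i == 0.
    return np.sign(x)

x = np.array([3.0, -2.0, 1.5])            # x^{(0)}, an arbitrary starting point
f_best = np.linalg.norm(x, 1)             # f_best^{(0)} = f(x^{(0)})

for k in range(200):
    g = subgradient_l1(x)                 # g^{(k)}, a subgradient at x^{(k)}
    alpha = 1.0 / (k + 1)                 # diminishing step size (an assumption)
    x = x - alpha * g                     # x^{(k+1)} = x^{(k)} - alpha_k g^{(k)}
    f_best = min(f_best, np.linalg.norm(x, 1))   # keep the lowest value so far

print("f_best after 200 iterations:", f_best)

Because -g^{(k)} need not be a descent direction, f(x^{(k)}) can increase on some iterations; only f_{\rm{best}} is guaranteed to be non-increasing.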
The figure below is from: http://www.stanford.edu/class/ee364b/notes/subgradients_notes.pdf
Example 2: the SVM cost function is the hinge loss, whose derivative does not exist at the kink point (1, 0); there we take some value between the two one-sided derivatives (i.e., an element of the subdifferential), but how exactly should it be chosen? Mingming Gong said that apparently either this PDF or http://www.stanford.edu/class/ee364b/lectures/subgrad_method_slides.pdf covers this. Mingming Gong asked Tianyi which is better, the subgradient method or a smooth approximation. The conclusion: it depends. The subgradient method solves the original problem, while the smooth approximation does not; one essentially approximates the gradient, the other approximates the function, so it is hard to say which is better.
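
As a rough illustration of the two options compared above, the sketch below implements (a) a subgradient of the hinge loss h(z) = max(0, 1 - z), where any value in [-1, 0] is valid at the kink z = 1, and (b) the gradient of one possible smooth surrogate, \mu \log(1 + \exp((1 - z)/\mu)); the choice of surrogate and the temperature mu are assumptions for illustration, not necessarily what the Stanford slides use.

import numpy as np

def hinge_subgradient(z, at_kink=-0.5):
    # Subdifferential of h(z) = max(0, 1 - z): {-1} for z < 1, {0} for z > 1,
    # and the whole interval [-1, 0] at the kink z = 1; 'at_kink' picks one element.
    if z < 1.0:
        return -1.0
    if z > 1.0:
        return 0.0
    return at_kink

def smooth_hinge_gradient(z, mu=0.1):
    # Derivative of the smooth surrogate mu * log(1 + exp((1 - z) / mu)),
    # which approximates h but changes the objective being minimized.
    return -1.0 / (1.0 + np.exp(-(1.0 - z) / mu))

for z in (0.5, 1.0, 1.5):
    print(z, hinge_subgradient(z), smooth_hinge_gradient(z))

Using hinge_subgradient keeps the original non-smooth objective; using smooth_hinge_gradient optimizes a nearby but different function, which matches the "approximate the gradient" versus "approximate the function" distinction above.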