Alexandre Passos, a PhD student in machine learning in ... (more)
27 upvotes by Quora User, Olivier Grisel, Quora User, (more)
It is very hard to characterize correctly. First, there are two complexities involved: at training time and at test time. For linear SVMs, at training time you must estimate the vector w and bias b by solving a quadratic problem, and at test time prediction is linear in the number of features and constant in the size of the training data. For kernel SVMs, at training time you must select the support vectors and at test time your complexity is linear on the number of the support vectors (which can be lower bounded by training set size * training set error rate) and linear on the number of features (since most kernels only compute a dos product; this will vary for graph kernels, string kernels, etc).

Solving the quadratic problem and choosing the support vectors is generally hard. In general, just testing that you have an optimal solution to the SVM problem involves of the order of n² dot products, while solving the quadratic problem directly involves inverting the kernel matrix, which has complexity on the order of n³ (where n is the size of your training set). More references for this in . However, one hardly ever needs to estimate the optimal solution; and the training time for a linear SVM to reach a certain level of generalization error actually decreases as training set size increases . In general, this depends a lot on what techniques you're using, but expect to have training times of the order of n² for all but state-of-the-art linear SVMs or approximate solvers

Olivier Grisel, Contributor to the scikit-learn proje... (more)
12 upvotes by Alexandre Passos, Quora User, Sébastien Royer, (more)
I agree with Alexandre. As a rule of thumb I use: O(n_samples^2 *n_features) for RBF kernel using SMO solver and n_sample * n_features for linear SVMs such as liblinear also noting that strong regularization (low C) makes it faster to converge too.

Diego Cantor, PhD student, musician, cook, geek.
1 upvote by Rahul K Mishra.
It's O(max(n,d)  min (n,d)^2), where n is the number of points and d is the number of dimensions, according to:

Chapelle, Olivier. "Training a support vector machine in the primal." Neural Computation 19.5 (2007): 1155-1178.