by Herb Sutter
A tale of two similar but different tools
Herb is a bestselling author and consultant on software development
topics, and a software architect at Microsoft.
What does the volatile keyword mean? How should you use it?
Confusingly, there are two common answers, because depending on the
language you use, volatile supports one or the other of two different
programming techniques: lock-free programming, and dealing with
'unusual' memory. (See Figure 1.)
Figure 1: A tale of two requirements.
Adding to the confusion, these two different uses have overlapping
requirements and impose overlapping restrictions, which makes them
appear more similar than they are. Let's define and understand them
clearly, and see how to spell them correctly in C, C++, Java and C# --
and not always as volatile.
Table 1: Comparing the overlapping but different requirements.
Case 1: Ordered Atomic Variables for Lock-Free Coding
Lock-free programming is all about doing
communication and synchronization between threads using lower-level tools than
mutex locks. In the past and even today, however, these tools are all over the
map. In rough historical order, they include explicit fences/barriers (e.g.,
Linux's mb()), order-inducing special API calls (e.g., Windows'
InterlockedExchange), and various flavors of special atomic types. Many of
these tools are tedious and/or difficult, and their wide variety means that
lock-free code ends up being written differently in different environments.
The last few years, however, have seen a
major convergence across hardware and software vendors: The computing industry
is coalescing around sequentially consistent ordered atomic variables as the
default or only way to write lock-free code using the major languages and OS
platforms. In a nutshell, ordered atomic variables are safe to read and write
on multiple threads at the same time without doing any explicit locking because
they provide two guarantees: their reads and writes are guaranteed to be
executed in the order they appear in your program's source code; and each read
or write is guaranteed to be atomic, all-or-nothing. They also have special
operations such as compareAndSet that are guaranteed to be executed atomically.
See the references at the end of this article for further details about ordered atomic variables and how to use them.
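The compareAndSet-style loop these guarantees enable can be sketched as follows, using C++ atomic<T> syntax as it later shipped in C++11's std::atomic; the thread and iteration counts are arbitrary:

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> counter(0); // ordered atomic: reads/writes are atomic and ordered

// compareAndSet-style update: retry until our read-modify-write wins.
void add_one_cas() {
    int expected = counter.load();
    // compare_exchange_weak is the C++ spelling of compareAndSet;
    // on failure it reloads 'expected' with the fresh value, so the
    // loop body stays simple.
    while (!counter.compare_exchange_weak(expected, expected + 1)) {
        // another thread interleaved; try again with the updated 'expected'
    }
}

int run_cas_demo(int threads, int iters) {
    counter.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([iters]{ for (int i = 0; i < iters; ++i) add_one_cas(); });
    for (auto& th : pool) th.join();
    return counter.load(); // no torn or lost updates: threads * iters
}
```

Because every read and write of counter is atomic and ordered, no update is ever lost, no matter how the threads interleave.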
Ordered atomic variables are available in
Java, C# and other .NET languages, and the forthcoming ISO C++ Standard, but
under different names:
Java provides ordered atomics under the volatile keyword (e.g., volatile int),
and solidified this support in Java 5 (2004). Java additionally provides a few
named types in java.util.concurrent.atomic, such as AtomicLongArray, that you
can use for the same purpose.
.NET mostly added them in Visual Studio 2005, also under the volatile keyword
(e.g., volatile int). These are suitable for nearly all lock-free code uses,
except for rare examples similar to Dekker's algorithm. .NET is fixing these
remaining corner cases in Visual Studio 2010, which is in early beta as of this writing.
ISO C++ added them to the C++0x draft Standard in 2007, under the templated
name atomic<T> (e.g., atomic<int>). They started to become available beginning
in 2008 in the Boost project and other implementations. The ISO C++
atomics library also provides a C-compatible way to spell those types and their
operations (e.g., atomic_int), and these are likely to be adopted by
ISO C in the future.
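As a quick sketch of those two C++ spellings side by side, written against std::atomic as it later shipped in C++11:

```cpp
#include <atomic>

std::atomic<int> a(0); // the templated spelling, atomic<T>
std::atomic_int  b(0); // the C-compatible alias for atomic<int>

int spelling_demo() {
    a.store(1);             // sequentially consistent by default
    b.store(2);
    return a.load() + b.load();
}
```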
A Word About Optimization
We're going to look at how ordered atomics
restrict the optimizations that compilers, CPUs, cache effects, and other parts
of your execution environment might perform. So let's first briefly review some
basic rules of optimization.
The most fundamental rule of optimization
in pretty much any language is this: Optimizations that rearrange ('transform')
your code's execution are always legal if they don't change the meaning of the program,
so that the program can't tell the difference between executing the original
code and the transformed code. In some languages, this is also known as the 'as
if' rule -- which gets its name from the fact that the transformed code has the
same observable effects 'as if' the original source code had been executed as written.
This rule cuts two ways: First, an
optimization must never make it possible to get a result that wasn't possible
before, or break any guarantees that the original code was allowed to rely on,
including language semantics. If we produce an impossible result, after all,
the program and the user certainly can tell the difference, and it's not just
'as if' we'd executed the original untransformed code.
Second, optimizations are permitted to
reduce the set of possible executions. For example, an optimization might make
some potential (but not guaranteed) interleavings never actually happen. This
is okay, because the program couldn't rely on them happening anyway.
Ordered Atomics and Optimization
Using ordered atomic variables restricts
the kinds of optimizations your compiler and processor and cache system can do.
 There are two kinds of optimizations to consider:
Optimizations on the ordered atomic reads and writes themselves.
Optimizations on nearby ordinary reads and writes.
First, all of the ordered atomic reads and
writes on a given thread must execute exactly in source code order, because
that's one of the fundamental guarantees of ordered atomic variables. However,
we can still perform some optimizations, in particular, optimizations that have
the same effect as if this thread just always executed so quickly that another
thread didn't happen to ever interleave at certain points.
For instance, consider this code, where a
is an ordered atomic variable:
a = 1; // A
a = 2; // B
Is it legal for a compiler, processor,
cache, or other part of the execution environment to transform the above code
into the following, eliminating the redundant write in line A?
// A': OK: eliminate line A entirely
a = 2; // B
The answer is, 'Yes.' This is legal because
the program can't tell the difference; it's as if this thread always ran so
fast that no other thread accessing a concurrently ever got to interleave
between lines A and B to see the intermediate value. 
Similarly, if a is an ordered atomic
variable and local is an unshared local variable, it is legal to transform
a = 1;     // C: write to a
local = a; // D: read from a

into

a = 1;     // C: write to a
local = 1; // D': OK, apply "constant propagation"
which eliminates the read from a. Even if
another thread is concurrently trying to write to a, it's as if this thread
always ran so fast that the other thread never got to interleave between lines
C and D to change the value before we can read our own back into local.
Second, nearby ordinary reads and writes
can still be reordered around ordered atomic reads and writes, but subject to
some restrictions. In particular, as described in "Apply Critical Sections
Consistently" (see the references), ordinary reads and writes can't move
upward across (from after to before) an ordered atomic read,
and can't move downward across (from before to after) an ordered atomic write.
In brief, such movement could take them out of a critical section of code, and you can
write programs that can tell the difference.
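One way to see why those two rules matter is a small spinlock sketch (the names here are invented, and std::atomic is used for the atomic<T> syntax): if an ordinary access could move upward across the acquiring atomic read, or downward across the releasing atomic write, it would escape the critical section.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<bool> locked(false); // ordered atomic used as a tiny spinlock
int shared_data = 0;             // ordinary variable guarded by the lock

void locked_increment() {
    // Acquire: an atomic read-modify-write. Ordinary reads and writes
    // that appear after it may not be moved to before it.
    while (locked.exchange(true)) { /* spin */ }

    shared_data += 1; // ordinary accesses, pinned inside the critical section

    // Release: an atomic write. Ordinary reads and writes that appear
    // before it may not be moved to after it.
    locked.store(false);
}

int run_spin_demo(int threads, int iters) {
    shared_data = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([iters]{ for (int i = 0; i < iters; ++i) locked_increment(); });
    for (auto& th : pool) th.join();
    return shared_data; // threads * iters if all accesses stayed inside the lock
}
```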
That's it for lock-free programming and
ordered atomics. What about the other case that some "volatiles" are used for?
Case 2: Semantic-Free Variables for "Unusual" Memory
The second requirement is to deal with
"unusual" memory that goes beyond a given language's memory model,
where the compiler must assume that the variable can change value at any time
and/or that reads and writes may have unknowable semantics and consequences.
Classic examples include:
Hardware registers, part 1: Asynchronous changes. For example, consider a
memory location M on a custom board that is connected to an instrument that
directly writes to M. Unlike ordinary memory that is only modified by the
program itself, the value stored in M can change at any time even if no program
thread writes to it; therefore, the compiler cannot make any assumptions that
the value will be stable.
Hardware registers, part 2: Semantics. For example, consider a memory location
M on a custom board where writing to that location always automatically
increments by one. Unlike an ordinary RAM memory location, the compiler can't
even assume that performing a write to M and then immediately following it with
a read from M will necessarily read the same value that was written.
Memory having more than one address. If a given memory location is accessible
using two different addresses A1 and A2, a compiler or processor may not be
able to know that writing to location A1 can change the value at location A2. Any
optimization that assumes a write to A1 does not change the value of A2 will
break the program, and must be prevented.
Variables in such memory locations are
unoptimizable variables, because the compiler can't safely make any assumptions
about them at all. Put another way, the compiler needs to be told that such a
variable doesn't participate in the normal type system, even if it appears to
have a given type. For example, if memory location M or A1/A2 in the
aforementioned examples are typed as an "int" in the program, what
does that really mean? It means at most that it has the size and layout of an
int, but it cannot mean that it behaves like an int -- after all, ints don't
autoincrement themselves when you write to them, or mysteriously change their values
when you write to what looks like a different variable at some other address.
We need a way to turn off all optimizations
on their reads and writes. ISO C and C++ have a portable, standard way to tell
the compiler that this is such a special variable that it must not optimize:
the volatile keyword (e.g., volatile int).
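For instance, a polling loop over a hypothetical status register might look like this; the register address and bit layout are invented, and a plain variable stands in for the device so the sketch can run on an ordinary machine:

```cpp
#include <cstdint>

// On a real board this would be the device's mapped address, e.g.:
//   volatile std::uint32_t* const status =
//       reinterpret_cast<volatile std::uint32_t*>(0x40000000);
// Here a plain variable stands in so the sketch is runnable.
static std::uint32_t fake_register = 0;
volatile std::uint32_t* const status = &fake_register;

std::uint32_t poll_until_ready() {
    // Because *status is volatile, the compiler must re-read it on every
    // iteration; it cannot hoist the load out of the loop or assume the
    // value is stable.
    while ((*status & 0x1u) == 0) {
        *status = *status | 0x1u; // simulates the device setting the ready bit
    }
    return *status;
}
```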
Java and .NET have no comparable concept.
Managed environments are supposed to know the full semantics of the program
they execute, after all, so it's not surprising that they don't support memory
with "unknowable" semantics. But both Java and .NET do provide escape
hatches to leave the managed environment and invoke native code: Java provides
the Java Native Interface (JNI) and .NET provides Platform Invoke (P/Invoke).
The JNI specification (see the references) is silent about volatile, however, and doesn't
mention either Java volatile or C/C++ volatile at all; similarly, the P/Invoke
documentation doesn't mention interactions with .NET volatile or C/C++
volatile. So to correctly access an unoptimizable memory location in either Java
or .NET, you must write C/C++ functions that use C/C++ volatile to do the
needed work on behalf of their managed calling code, make sure they entirely
encapsulate and hide the volatile memory (i.e., neither take nor return
anything volatile), and call those functions through JNI and P/Invoke.
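A sketch of that encapsulation on the native side might look as follows; the register and function names are hypothetical, and a plain variable again stands in for the device:

```cpp
#include <cstdint>

static std::uint32_t fake_device = 0; // stand-in for the mapped register
static volatile std::uint32_t* const device_reg = &fake_device;

extern "C" {
    // The interface takes and returns only ordinary types, so the
    // volatile qualifier never leaks across the JNI / P/Invoke boundary.
    std::uint32_t read_device(void)    { return *device_reg; }
    void write_device(std::uint32_t v) { *device_reg = v; }
}
```

On the managed side these would be declared as native methods (Java) or DllImport entry points (.NET); the managed code never sees a volatile type.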
Unoptimizable Variables and (Not) Optimization
All reads and writes of unoptimizable
variables on a given thread must execute exactly as written; no optimizations
are allowed at all, because the compiler can't know the full semantics of the
variable and when and how its value can change. This is a stronger statement
than for ordered atomics, which only need to execute in source code order.
Consider again the two transformations we
considered before, but this time replacing the ordered atomic variable a with
the unoptimizable (C/C++ volatile) variable v:
v = 1; // A
v = 2; // B
Is it legal to transform this as follows to
remove the apparently redundant write in line A?
// A': invalid, cannot eliminate write
v = 2; // B
The answer is no, because the compiler
cannot possibly know that eliminating line A's write to v won't change the
meaning of the program. For example, v could be a location accessed by custom
hardware that expects to see the value 1 before the value 2 and won't work correctly otherwise.
Similarly, if v is an unoptimizable
variable and local is an unshared local variable, it is not legal to transform

v = 1;     // C: write to v
local = v; // D: read from v

into

v = 1;     // C: write to v
local = 1; // D': invalid, cannot do "constant propagation"
to eliminate the read from v. For example,
v could be a hardware address that automatically increments itself every time
it's written, so that writing 1 will yield the value 2 on the next read.
Second, what about nearby ordinary reads
and writes -- can those still be reordered around unoptimizable reads and
writes? Today, there is no practical portable answer because C/C++ compiler
implementations vary widely and aren't likely to converge anytime soon. For example,
one interpretation of the C++ Standard holds that ordinary reads can move
freely in either direction across a C/C++ volatile read or write, but that an
ordinary write cannot move at all across a C/C++ volatile read or write --
which would make C/C++ volatile both less restrictive and more restrictive,
respectively, than an ordered atomic. Some compiler vendors support that
interpretation; others don't optimize across volatile reads or writes at all;
and still others have their own preferred semantics.
To safely write lock-free code that
communicates between threads without using locks, prefer to use ordered atomic
variables: Java/.NET volatile, C++0x atomic<T>, and C-compatible atomic_int and its relatives.
To safely communicate with special hardware
or other memory that has unusual semantics, use unoptimizable variables: ISO
C/C++ volatile. Remember that reads and writes of these variables are not
necessarily atomic, however.
Finally, to express a variable that both
has unusual semantics and has any or all of the atomicity and/or ordering
guarantees needed for lock-free coding, only the ISO C++0x draft Standard
provides a direct way to spell it: volatile atomic<T>.
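In std::atomic as it later shipped, this combination works because the atomic operations are also volatile-qualified; a minimal sketch:

```cpp
#include <atomic>

// Both unoptimizable (volatile) and an ordered atomic (atomic<T>).
volatile std::atomic<int> vshared(0);

int combined_demo() {
    vshared.store(1);      // atomic, ordered, and not optimizable away
    return vshared.load();
}
```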
H. Sutter. "Writing Lock-Free Code: A Corrected Queue" (DDJ, October 2008).
See www.boost.org.
H. Sutter. "Apply Critical Sections Consistently" (DDJ, November 2007).
A common objection at this point is:
"In the original code, it was possible for another thread to see the
intermediate value, but that is impossible in the transformed code. Isn't that
a change in observable behavior?" The answer is, "No," because
the program was never guaranteed to ever actually interleave just in time to
see that value; it was already a legal outcome for this thread to just always
run so fast that the interleaving happened to never manifest. Again, what this
optimization does is reduce the set of possible executions, which is always legal.
S. Liang. Java Native Interface: Programmer's Guide and Specification (Prentice Hall, 1999).