Discussion:
[concurrency-interest] Relativity of guarantees provided by volatile
Marko Topolnik
2012-08-17 21:24:07 UTC
Permalink
Consider the following synchronization order of a program execution involving a total of two threads, R and W:

- thread R begins;

- thread R reads a volatile int sharedVar several times. Each time it reads the value 0;

- thread R completes;

- thread W begins;

- thread W writes the sharedVar several times. Each time it writes the value 1;

- thread W completes.

Now consider the wall-clock timing of the events:

- thread R reads 0 at t = {1, 4, 7, 10};
- thread W writes 1 at t = {0, 3, 6, 9}.

As far as the Java Memory Model is concerned, there is no contradiction between the synchronization order and the wall-clock times, as the JMM is wall-clock agnostic. However, I have yet to meet a single Java professional who wouldn't at least be very surprised to hear that the specification allows this.

I understand that the SMP architecture that dominates the world of computing today practically never takes these liberties and makes volatile writes visible almost instantaneously. This may change at any time, however, especially with the advent of the massively parallel architectures that seem to be the future. For example, an optimization technique may choose to chunk many volatile writes together and make them visible in a single bulk operation. This can be safely done as long as there are no intervening read-y actions (targets of the synchronizes-with edges as defined by JLS/JSE7 17.4.4).
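The scenario above can be sketched as follows (the class and thread bodies are my own illustration, not from the original post; the surprising outcome, R reading 0 on every iteration despite wall-clock-earlier writes, is merely permitted by the JMM and cannot be forced):

```java
public class VolatileRelativity {
    static volatile int sharedVar = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread r = new Thread(() -> {
            for (int i = 0; i < 4; i++) {
                // The JMM allows every one of these reads to return 0 even
                // if the writes below are earlier in wall-clock time: the
                // synchronization order need not agree with the wall clock.
                System.out.println("R read " + sharedVar);
            }
        });
        Thread w = new Thread(() -> {
            for (int i = 0; i < 4; i++) {
                sharedVar = 1; // volatile write
            }
        });
        r.start();
        w.start();
        r.join();
        w.join();
        // Thread.join() does establish happens-before, so this final read
        // is guaranteed to see the writer's last value.
        System.out.println("final: " + sharedVar);
    }
}
```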

Now, my questions are:

1. Is there a loophole in my reasoning?

2. If there is no loophole, is there anything to worry about, given that practically 100% of developers out there treat as guaranteed something that isn't?


-Marko
Vitaly Davidovich
2012-08-17 22:44:31 UTC
Permalink
That's an interesting question. I think you mean whether JVMs (at least in
the context of this mailing list) will attempt to perform such
optimizations? The CPU doesn't know what a volatile variable is - it's a
language construct, of course.

There's been some similar discussion here in the past. One example is
whether a write is ever guaranteed to be observed - think cooperatively
scheduled system. Or perhaps the store buffer is so large that the write
simply lingers there indefinitely; I think most people today assume that
this doesn't happen and that it ultimately gets drained (without explicit
fences, that is).

I will say that it's been stated all over the web that if you write a
volatile value in thread A and then read that value in thread B, and those
things happen in that chronological order, one will see the write from
A immediately (assuming no other writers of that memory). Technically, the
JMM doesn't require that in this case - it only imposes order with respect
to other operations, not timeliness. However, I don't see how JVM vendors
could break this behavior, even if doing so conforms to today's JMM spec -
there's just too much potential pain to customers.

Hardware vendors have the same issue - too much change that's negatively
observed by software isn't going to fly. They're much more likely to add
new optional instructions than to modify existing behavior.
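A minimal sketch of the idiom described here (my own example): thread A performs a volatile write and thread B spins on a volatile read. On today's JVMs the loop terminates almost immediately, but the JMM itself only orders the operations; it does not bound when B observes the write:

```java
public class FlagVisibility {
    static volatile boolean flag = false;

    public static void main(String[] args) throws InterruptedException {
        Thread b = new Thread(() -> {
            // Spins until the volatile write becomes visible. Every JVM in
            // practice makes this prompt, but the JMM specifies ordering,
            // not a wall-clock deadline for visibility.
            while (!flag) { }
            System.out.println("observed");
        });
        b.start();
        flag = true; // the volatile write in "thread A"
        b.join();
    }
}
```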

Sent from my phone
_______________________________________________
Concurrency-interest mailing list
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Nathan Reynolds
2012-08-17 22:46:43 UTC
Permalink
Post by Marko Topolnik
I have yet to meet a single Java professional who wouldn't at least
be very surprised to hear that the specification allows this.

Add me to the list of surprised.

Here's an excerpt from the specification.

A write to a volatile variable (§8.3.1.4
<http://docs.oracle.com/javase/specs/jls/se5.0/html/classes.html#36930>)
/v/ synchronizes-with all subsequent reads of /v/ by any thread (where
subsequent is defined according to the synchronization order).

http://docs.oracle.com/javase/specs/jls/se5.0/html/memory.html#17.4.4

Doesn't this mean that R will always read 1 since the read occurs after
the write?

Nathan Reynolds
<http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> |
Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
Vitaly Davidovich
2012-08-17 23:01:27 UTC
Permalink
Is it ambiguous whether synchronizes-with implies timeliness? I know we
treat it as such, as Marko mentioned, but is that mentioned somewhere in the
JMM?

Sent from my phone
David Holmes
2012-08-17 22:58:56 UTC
Permalink
Hi Marko,

I think the "surprise" is only in the way you formulated this. Said another
way, a write takes a finite amount of time from when the instruction starts
to execute to when the store is actually available for a read to see.
(Similarly a read takes a finite amount of time.) So depending on those two
times a read and write that happen "around the same time" may appear to have
occurred in either order. But when you program with threads you never know
the relative interleavings (or should never assume) so it makes no
difference how the program perceives the order compared to how some external
observer might perceive it.

As for your optimization to "chunk" volatile writes, I don't see a problem
here if you are basically asking if given:

w = 1; // w is volatile
w = 2;
w = 3;

that this could be reduced to the last write alone? I see no reason why not.
Without some additional coordination between a reader thread and the writer
thread, reading w==3 is a legitimate outcome. If you are thinking about how
the hardware might chunk things then that is a different matter. We have to
use the hardware in a way that complies with the memory model - if the
hardware can't comply then you can't run Java on it.
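A sketch of why that reduction would be unobservable without extra coordination (example mine): a reader sampling w may legitimately see any of 0, 1, 2, or 3, and its only hard guarantee is that successive volatile reads of the same variable never move backwards in the synchronization order:

```java
public class ChunkedWrites {
    static volatile int w = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            int first = w;   // may observe any of 0, 1, 2, 3
            int second = w;  // may also observe any of them...
            // ...except one older than `first`: volatile reads of the same
            // variable are totally ordered and cannot run backwards.
            if (second < first) throw new AssertionError("JMM violation");
            System.out.println("reader saw " + first + " then " + second);
        });
        reader.start();
        w = 1; // a conforming JVM could make only the last of these
        w = 2; // three writes visible, as long as no reader provably
        w = 3; // observed an intermediate value
        reader.join();
        System.out.println("writer done, w=" + w);
    }
}
```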

David Holmes
Yuval Shavit
2012-08-17 23:10:15 UTC
Permalink
The more I think about Marko's problem, the more it bugs me. I don't think
it's that the three writes can be reduced to the last one -- it's that
there's no requirement for this to ever be seen. That is, given threads t1
and t2, such that by clock time:

(A) t1: write to volatileField
(B) t2: read from volatileField

The JLS states that if A happens-before B, then B must see A. It defines
hb(A, B) via synchronizes-with(A, B), which is true if t2 is subsequent to
t1. But "subsequent" isn't defined in terms of clock time. It's left
undefined in JLS 17, except twice in 17.4.4, where it's defined as
"according to the synchronization order" - which seems like a tautology!

In other words, I think it comes down to the definition of "subsequent,"
which is undefined. There's nothing preventing a JVM from deciding that
even though A happened before B in clock time, A is subsequent to B.
Vitaly Davidovich
2012-08-17 23:11:59 UTC
Permalink
I don't think the writes to w can be reduced to just the last one as it
would violate the JMM. R may only see the last one due to interleaving
though. Not sure if that's what you meant.

Sent from my phone
David Holmes
2012-08-17 23:03:48 UTC
Permalink
Nathan,

Is there a synchronization order here that indicates that the read occurs
after the write? "subsequent" is not defined by wall-clock time as
externally observed.

And to batch responses :) Hotspot's implementation requires that all writes
will become visible without need for explicit barriers/fences to force that.

David
Nathan Reynolds
2012-08-17 23:25:17 UTC
Permalink
Hmm... I am getting tripped up on JMM vs Intel memory model vs HotSpot
implementation.

Dictionary.com says subsequent = occurring or coming later or after;
following in order or succession. Given that each read happens 1 unit
of time after a write, I would think that the read is "subsequent"
to the write. In other words, once the write happens, any read
operation which starts execution after that point in time will be
subsequent to the write and hence will see the effects of the write.

Nathan Reynolds
<http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> |
Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
Kirk Pepperdine
2012-08-18 15:10:58 UTC
Permalink
Post by David Holmes
Nathan,
Is there a synchronization order here that indicates that the read occurs after the write? "subsequent" is not defined by wall-clock time as externally observed.
subsequent |ˈsəbsəkwənt|
adjective
coming after something in time;

What we have is an intractable problem that is as old as stone tablets. At the hardware level, no matter how small the time quantum is, as long as instructions take more than one quantum and we have multiple hardware threads, there will be a gap in time in which there is a non-zero probability that a write hasn't finished when a read starts. In that gap, no one can provide any guarantee as to what value will be read. At that point all you can say is that any calculation you perform is speculative, and as with any speculative calculation, you're going to have to make assumptions and then, before you do anything else, revalidate those assumptions. In this case the assumption is that you're working with a stable value as defined by the JMM. This problem will only get worse as we add more hardware threads. The only positive is that, as we gradually move to more hardware threads, we're able to find and correct the most probable failure conditions, eliminating them before we get to the next round.

Regards,
Kirk
David Holmes
2012-08-17 23:16:34 UTC
Permalink
I think this is looking at things the wrong way. IF t2 is subsequent to t1
THEN the JMM says the read must see the write. The JMM doesn't, and can't,
define "subsequent" in an abstract sense.

David
-----Original Message-----
From: Yuval Shavit [mailto:***@akiban.com]
Sent: Saturday, 18 August 2012 9:10 AM
To: ***@ieee.org
Cc: Marko Topolnik; concurrency-***@cs.oswego.edu
Subject: Re: [concurrency-interest] Relativity of guarantees provided by
volatile


The more I think about Marko's problem, the more it bugs me. I don't think
it's that the three writes can be reduced to the last one -- it's that
there's no requirement for this to ever be seen. That is, given threads t1
and t2, such that by clock time:


(A) t1: write to volatileField
(B) t2: read from volatileField


The JLS states that if A happens-before B, then B must see A. It defines
hb(A, B) as synchronized-with(A, B), which is true if t2 is subsequent to
t1. But "subsequent" isn't defined as clock time. It's left undefined in JLS
17, except for twice in 17.4.4 in which it's defined as "according to the
synchronization order" -- which seems like a tautology!


In other words, I think it comes down to the definition of "subsequent,"
which is undefined. There's nothing preventing a JVM from deciding that even
though A happened before B in clock time, A is subsequent to B.


On Fri, Aug 17, 2012 at 6:58 PM, David Holmes <***@aapt.net.au>
wrote:

Hi Marko,

I think the "surprise" is only in the way you formulated this. Said
another
way a write takes a finite amount of time from when the instruction
starts
to execute to when the store is actually available for a read to see.
(Similarly a read takes a finite amount of time.) So depending on those
two
times a read and write that happen "around the same time" may appear to
have
occurred in either order. But when you program with threads you never
know
the relative interleavings (or should never assume) so it makes no
difference how the program perceives the order compared to how some
external
observer might perceive it.

As for your optimization to "chunk" volatile writes, I don't see a
problem
here if you are basically asking if given:

w = 1; // w is volatile
w = 2;
w = 3;

that this could be reduced to the last write alone? I see no reason why
not.
Without some additional coordination between a reader thread and the
writer
thread, reading w==3 is a legitimate outcome. If you are thinking about
how
the hardware might chunk things then that is a different matter. We have
to
use the hardware in a way that complies with the memory model - if the
hardware can't comply then you can't run Java on it.

David Holmes
_______________________________________________
Concurrency-interest mailing list
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
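David's three-writes example can be written out as a compilable sketch (class and variable names here are invented, not from the thread). The point it illustrates: without extra coordination, a racing reader can legally observe any of the values, so a JVM that collapsed the first two writes would still produce a permitted execution.

```java
// Sketch only: a reader cannot prove that the intermediate volatile
// writes w = 1 and w = 2 ever happened, so collapsing them to the
// final write would still yield a legal execution.
public class CollapsedWrites {
    static volatile int w = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            w = 1; // a racing reader may or may not observe this value
            w = 2; // ... or this one
            w = 3; // only this final value is guaranteed to remain
        });
        Thread reader = new Thread(() -> {
            int r = w; // any of 0, 1, 2, 3 is a legal result
            System.out.println("reader saw " + r);
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        // After both threads terminate, w == 3 is the only possible value.
        System.out.println("final value: " + w);
    }
}
```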
David Holmes
2012-08-17 23:19:43 UTC
How does it violate the JMM? There is nothing to establish that any read has
to have occurred prior to w=3. An external observer may say "hey if we'd
actually written w=1 at this point then the read would see 1" but that is
irrelevant. The program cannot tell that the other writes did not occur.

David
-----Original Message-----
From: Vitaly Davidovich [mailto:***@gmail.com]
Sent: Saturday, 18 August 2012 9:12 AM
To: ***@ieee.org
Cc: Marko Topolnik; concurrency-***@cs.oswego.edu
Subject: Re: [concurrency-interest] Relativity of guarantees provided by
volatile


I don't think the writes to w can be reduced to just the last one as it
would violate the JMM. R may only see the last one due to interleaving
though. Not sure if that's what you meant.

Sent from my phone

On Aug 17, 2012 7:03 PM, "David Holmes" <***@aapt.net.au> wrote:

Hi Marko,

I think the "surprise" is only in the way you formulated this. Said
another
way a write takes a finite amount of time from when the instruction
starts
to execute to when the store is actually available for a read to see.
(Similarly a read takes a finite amount of time.) So depending on those
two
times a read and write that happen "around the same time" may appear to
have
occurred in either order. But when you program with threads you never
know
the relative interleavings (or should never assume) so it makes no
difference how the program perceives the order compared to how some
external
observer might perceive it.

As for your optimization to "chunk" volatile writes, I don't see a
problem
here if you are basically asking if given:

w = 1; // w is volatile
w = 2;
w = 3;

that this could be reduced to the last write alone? I see no reason why
not.
Without some additional coordination between a reader thread and the
writer
thread, reading w==3 is a legitimate outcome. If you are thinking about
how
the hardware might chunk things then that is a different matter. We have
to
use the hardware in a way that complies with the memory model - if the
hardware can't comply then you can't run Java on it.

David Holmes
------------
-----Original Message-----
Topolnik
Sent: Saturday, 18 August 2012 7:24 AM
Subject: [concurrency-interest] Relativity of guarantees provided by
volatile
Consider the following synchronization order of a program
- thread R begins;
- thread R reads a volatile int sharedVar several times. Each
time it reads the value 0;
- thread R completes;
- thread W begins;
- thread W writes the sharedVar several times. Each time it
writes the value 1;
- thread W completes.
- thread R reads 0 at t = {1, 4, 7, 10};
- thread W writes 1 at t = {0, 3, 6, 9}.
As far as the Java Memory Model is concerned, there is no
contradiction between the synchronization order and the
wall-clock times, as the JMM is wall-clock agnostic. However, I
have yet to meet a single Java professional who wouldn't at least
be very surprised to hear that the specification allows this.
I understand that the SMP architecture that dominates the world
of computing today practically never takes these liberties and
makes the volatile writes visible almost instantaneously. This
may change at any time, however, especially with the advent of
massively parrallel architectures that seem to be the future. For
example, an optimization technique may choose to chunk many
volatile writes together and make them visible in a single bulk
operation. This can be safely done as long as there are no
intervening read-y actions (targets of the synchronizes-with
edges as defined by JLS/JSE7 17.4.4).
1. Is there a loophole in my reasoning?
2. If there is no loophole, is there anything to worry about,
given that practically 100% developers out there consider as
guaranteed something that isn't?
-Marko
_______________________________________________
Concurrency-interest mailing list
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
_______________________________________________
Concurrency-interest mailing list
Concurrency-***@cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Vitaly Davidovich
2012-08-17 23:21:44 UTC
Two volatile writes emit a store-store barrier in between, which to me
means they cannot be collapsed and must be made visible in that order (on
non-TSO hardware this would require a h/w fence). In other words, I don't
think the compiler can remove the redundant stores as if this were a
non-volatile field, where that's a perfectly valid (and good) optimization.

Sent from my phone
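For readers unfamiliar with the barrier vocabulary Vitaly is using, here is a rough sketch of what a conservative lowering of two volatile stores looks like, spelled out with the explicit fence methods Java 9 added to VarHandle. The fence placement loosely follows the JSR-133 Cookbook recipe; it is an illustration of the intuition, not an account of what any actual JIT emits, and the class, field, and method names are invented.

```java
import java.lang.invoke.VarHandle;

public class TwoVolatileWrites {
    static int plainW; // plain field standing in for the volatile one

    // Roughly what "w = 1; w = 2;" lowers to in barrier terms.
    // Whether a JIT may nevertheless collapse the two stores is
    // exactly the question being debated in this thread.
    static void writeTwice() {
        VarHandle.releaseFence();    // barriers preceding a volatile store
        plainW = 1;
        VarHandle.storeStoreFence(); // earlier stores may not reorder past this
        plainW = 2;
        VarHandle.fullFence();       // a volatile store is also followed
                                     // by a store-load barrier
    }

    public static void main(String[] args) {
        writeTwice();
        System.out.println(plainW); // prints 2
    }
}
```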
Yuval Shavit
2012-08-17 23:30:25 UTC
Sure, but it could decide that the execution order is [w1, w2, w3, r]. In
fact, as far as we know, the thread may have been scheduled such that that
was the clock-time ordering, too.
Vitaly Davidovich
2012-08-17 23:41:20 UTC
David and you may be right on the theoretical aspect. In practice, I can't
fathom how a JVM could do this type of analysis. That's an issue I have
with the JMM's wording of happens-before -- it doesn't seem to translate
to reality.

Sent from my phone
Yuval Shavit
2012-08-17 23:48:36 UTC
(Resending -- I'd accidentally sent this only to Nathan)

So, Nathan, you're essentially saying that "subsequent" is defined in terms
of clock time. If we take that as the definition, then I agree that it
works as we'd all expect. Furthermore, I agree that that's the intuitive
definition, as well as the de-facto one. But it doesn't seem to be the one
explicitly defined by the JLS. (And if it is the one implicitly defined by
the JLS, I think there's a serious bug waiting for any distributed-node JVM whose
nodes are traveling away from each other at relativistic speeds! ... though
I'm no physics expert.)

That said, 17.4.3 does imply that the reads will be viewable in a
wall-clock-sequential way, albeit informally:

Sequential consistency is a very strong guarantee that is made about
visibility and ordering in an execution of a program. Within a sequentially
consistent execution, there is a total order over all individual actions
(such as reads and writes) which is consistent with the order of the
program, and each individual action is atomic and is immediately visible to
every thread.

(emphasis on "is immediately visible")
Zhong Yu
2012-08-18 01:33:35 UTC
Consider this physical model:

Each thread is a person Tx.

There's a person V managing all variables.

To make a write, Tx sends a paper mail to V. No return mail is
required, therefore a write is very short.

To make a read, Tx sends a paper mail to V, and waits for return mail.

The synchronization order is the order in which the mails are received by V.

This seems to be a valid JMM model.

--

Now suppose thread R is very close to V, therefore reads are also very
short. (it's easier to talk about temporal order between short
actions) Suppose thread W is very far away from V.

To realize the OP's example, consider the numbers to be in hours, and the
route W -> V to take 48 hours.

On Monday, W writes v=1; the mail reaches V on Wednesday. On Tuesday, R
reads v=0. So the write is after the read in synchronization order, yet
the write is before the read in temporal order (even accounting for
relativity - all persons are on Earth).

Therefore sync order doesn't have to be consistent with temporal order.
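The postal model above can be sketched as a tiny deterministic simulation (all names and numbers are illustrative): each operation carries a wall-clock issue time and a transit delay, and the synchronization order is the arrival order at the variable manager V.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PostalModel {
    // An operation "mailed" to the variable manager V. Times are in hours.
    record Op(String thread, String kind, int value, int issuedAt, int transit) {
        int arrivesAt() { return issuedAt + transit; }
    }

    public static void main(String[] args) {
        List<Op> mail = new ArrayList<>();
        // W is far from V: 48-hour transit. It writes 1 on Monday (t=0).
        mail.add(new Op("W", "write", 1, 0, 48));
        // R is next to V: instant transit. It reads on Tuesday (t=24).
        mail.add(new Op("R", "read", 0, 24, 0));

        // Synchronization order = order of arrival at V.
        mail.sort(Comparator.comparingInt(Op::arrivesAt));
        int v = 0; // the variable managed by V
        for (Op op : mail) {
            if (op.kind().equals("write")) v = op.value();
            else System.out.println("R reads v=" + v + " at arrival t=" + op.arrivesAt());
        }
        // The read arrives at t=24, before the write's arrival at t=48,
        // so it legally returns 0 even though the write was *issued*
        // 24 hours earlier in wall-clock time.
    }
}
```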
Zhong Yu
2012-08-18 01:42:54 UTC
On the other hand, if w is before r in sync order there's a causal
relationship, so it seems necessary that the beginning of w be before
the end of r in temporal order. So there may be some constraints
between the two orders.
Marko Topolnik
2012-08-18 13:02:29 UTC
The major point to note is that the JLS **does not** enforce sequential consistency! It even spells it out directly below your quote:

"If we were to use sequential consistency as our memory model, many of the compiler and processor optimizations that we have discussed would be illegal."

The whole model of happens-before revolves around making sequentially INCONSISTENT executions APPEAR to be consistent, as observed by all executing threads, thus allowing all the optimizations that are discussed on this mailing list.


-Marko
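Marko's point that the JMM permits sequentially inconsistent executions is usually illustrated with the standard store-buffering litmus test. The sketch below is that textbook example, not code from the thread; with plain (non-volatile) fields, the outcome r1 == 0 && r2 == 0 is permitted even though no interleaving of the two threads could produce it.

```java
// Store-buffering litmus test: with non-volatile fields the JMM allows
// the sequentially inconsistent result (r1, r2) == (0, 0).
public class StoreBuffering {
    static int x = 0, y = 0;
    static int r1, r2;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // (0,0), (0,1), (1,0) and (1,1) are all legal results;
        // (0,0) is the one sequential consistency would forbid.
        System.out.println("r1=" + r1 + " r2=" + r2);
    }
}
```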
Wolfgang Baltes
2012-08-18 15:48:17 UTC
The memory model is just that: a model. It is not a hardware spec, nor a
prediction of what will happen in which order in the future. It allows one
to interpret observations and draw some conclusions.

The simple programming model that all programmers assume is program
order. For example, in the following lines, everyone can expect x to
hold the value 1 and y the value 2.

int a = 1;
int b = 2;
int y = b;
int x = a;

However, we also know that optimizations can happen. So for example,
there is no guarantee that a is set to 1 before b is set to 2. The only
guarantee we have is that - once a write operation has appeared in
program code - a (subsequent in program order) read operation is
guaranteed to succeed with the expected value, independent of any
optimization. In this example, we do not know when a is set to 1, except
we know that whenever a is read after the assignment instruction in
program order, a will have a value of 1. For example, an optimization
could consist of reordering the write to a after y is assigned the value
of b. This reordering would not be observable in this thread, and
therefore does not change the reasoning of what the program
accomplishes. The program order guarantee is there to allow the
programmer to reason about a program in simple terms, such as "I put
instructions in this order to achieve this result" and the program order
rule guarantees that the result can be observed, no matter which
optimizations are involved.

The memory model is nothing more than a similar guarantee regarding
program order when multiple threads are involved. The memory model does
not say what will happen at some time in the future. It only allows one to
draw conclusions about what happened in the past >if< certain observations
are made.

For example,

Thread A:
int a = 1;
volatile int b = 2;
int y = b;
int x = a;

Thread B:
int y = b; // volatile read of b
int x = a;

If we apply the program order rule for single threads, we cannot be sure
which value is assigned to x; is a set to 1 or does it have the value 0?
Using the memory model, we expand the program order rules by the
volatile rule: when thread B performs a volatile read, then there are
certain guarantees as follows:
- When thread B reads the value of b and it finds 0, then the conclusion
is that thread A has not reached its volatile write to b. Nothing else
can be concluded.
- If thread B reads a value of 2 for b, then we are allowed to conclude
that instructions of thread A that appear before the volatile write in
program order have "happened-before". This rule allows us to conclude
that >if< we observe a value of 2 for b, then we are sure to observe a
value of 1 for variable a. (There is nothing in the memory model about
>when exactly< the value 1 has to be written to a, just that we can
count on the observation.)

We have the JVM's guarantee that these conclusions are permitted,
despite any optimizations that are ongoing. This makes concurrent
programming almost as easy to reason about as single threaded programming.

However, in the example given, the memory model does not >predict< in
any way whatsoever when thread B will reach the volatile read
instruction relative in time to the volatile write in thread A.

Note also that the memory model does not deal with the time it takes
to perform operations. For the examples above, it doesn't
matter when assignment operations start, it only matters when they
finish. And for reads, it only matters when they start. This is why the
memory model uses the concept (and terminology) of synchronization: a
volatile write is considered to synchronize-with a subsequent volatile
read >if< the read observes the result of the write.

Wolfgang.
Post by Boehm, Hans
That said, 17.4.3 does imply that the reads will be viewable in a wall-clock-sequential way, albeit informally
Sequential consistency is a very strong guarantee that is made about visibility and ordering in an execution of a program. Within a sequentially consistent execution, there is a total order over all individual actions (such as reads and writes) which is consistent with the order of the program, and each individual action is atomic and is immediately visible to every thread.
(emphasis on "is immediately visible")
"If we were to use sequential consistency as our memory model, many of the compiler and processor optimizations that we have discussed would be illegal."
The whole model of happens-before revolves around making sequentially INCONSISTENT executions APPEAR to be consistent, as observed by all executing threads, thus allowing all the optimizations that are discussed on this mailing list.
-Marko
_______________________________________________
Concurrency-interest mailing list
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Boehm, Hans
2012-08-20 18:46:35 UTC
Permalink
From: Marko Topolnik
Post by Yuval Shavit
That said, 17.4.3 does imply that the reads will be viewable in a
wall-clock-sequential way, albeit informally
Post by Yuval Shavit
Sequential consistency is a very strong guarantee that is made
about visibility and ordering in an execution of a program. Within a
sequentially consistent execution, there is a total order over all
individual actions (such as reads and writes) which is consistent with
the order of the program, and each individual action is atomic and is
immediately visible to every thread.
Post by Yuval Shavit
(emphasis on "is immediately visible")
The major point to note is that the JLS **does not** enforce sequential consistency.
"If we were to use sequential consistency as our memory model, many of
the compiler and processor optimizations that we have discussed would
be illegal."
The whole model of happens-before revolves around making sequentially
INCONSISTENT executions APPEAR to be consistent, as observed by all
executing threads, thus allowing all the optimizations that are
discussed on this mailing list.
It is intended to promise sequential consistency in the absence of data races. And the program Marko posted originally was data-race-free. However, I don't believe there is a guarantee that the total ordering implied by sequential consistency is consistent with any means of retrieving the current time. At least I haven't found such a claim. Time retrieved on different processors may be out of sync, and has no bearing on synchronization order. That by itself can explain the behavior of the program that started this discussion.

If we changed the specification of currentTimeMillis() or the like so that it was guaranteed to access a volatile variable regularly updated by a "timer" thread, then the originally proposed behavior would no longer be allowed, since it would not be compatible with any synchronization order for both the original and timer variables. The problem is that timers don't appear to play by the same rules as variables. Whether or not they should is unclear to me.

Hans
Boehm, Hans
2012-08-20 21:20:24 UTC
Permalink
In the meantime I have read the JMM again very carefully. These are the
Sequential consistency is an idealization that is not guaranteed in any
way. Real implementations virtually never produce sequentially
consistent executions.
Sequential consistency is not normally defined to have anything to do with timing. Thus I don't understand the above statement.
There is a guarantee that a properly synchronized program will APPEAR
to execute with sequential consistency. This appearance is, however,
limited only to the observed ordering of actions and nothing else. In
particular, no time ordering is guaranteed, promised, or even implied.
Right. But time ordering is not really observable. Values returned by currentTimeMillis() and the like are observable, but the specification is, AFAICT, unclear about the extent to which those values have to reflect any kind of real "time ordering". I suspect that, at least on some machines, they use per-processor timers that are not perfectly synchronized.
Post by Boehm, Hans
If we changed the specification of currentTimeMillis() or the like so
that it was guaranteed to access a volatile variable regularly updated
by a "timer" thread, then the originally proposed behavior would no
longer be allowed, since it would be compatible with any
synchronization order for both the original and timer variables. The
problem is that timers don't appear to play by the same rules as
variables. Whether or not they should is unclear to me.
It would be very bad for performance if they played by those rules, I think:
- a hard guarantee of proper happens-before ordering of actions on it;
- a soft **promise** of timely publishing of write actions.
So the volatile concept is by its nature much more about ordering than
about timing.
Maybe. Having a thread in the kernel update the time values and having it play by Java volatile rules when writing the time value actually doesn't sound that expensive for a low resolution timer. You will probably get a cache miss on the timer location from every core after every update, so it is likely to be a problem for high resolution timers or very high core counts.
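The timer thread Hans describes can be sketched in plain Java (the
class and all names in it are mine, purely illustrative): a daemon
thread periodically performs a volatile write of the current time, so
every read of the clock is a volatile read that synchronizes-with the
most recent published tick.

```java
// A low-resolution clock published through a volatile field, so that
// reading the time participates in the synchronization order.
public class VolatileClock {
    private volatile long nowMillis = System.currentTimeMillis();

    public VolatileClock(long resolutionMillis) {
        Thread ticker = new Thread(() -> {
            while (true) {
                nowMillis = System.currentTimeMillis(); // volatile write: a new tick
                try {
                    Thread.sleep(resolutionMillis);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        ticker.setDaemon(true);
        ticker.start();
    }

    public long currentTimeMillis() {
        return nowMillis; // volatile read: synchronizes-with the latest tick
    }
}
```

As the paragraph above notes, every core takes a cache miss on
nowMillis after each tick, which is cheap at low resolution but a
problem for high-resolution timers or very high core counts.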
Post by Boehm, Hans
Post by Marko Topolnik
This a very nice catch. Isn't in fact any read in a data race with
all
Post by Boehm, Hans
Post by Marko Topolnik
writes by another thread following the write it observed? That would
make it impossible to write a program free of data races. In other
words this sounds like a loophole in the definition of a data race.
That's a bug in the wording of the spec. I have a vague recollection
that it was pointed out before a couple of years ago, possibly even on
this list. It's one of these things that should really be fixed, but
is waiting for a resolution of the harder Java memory model issues.
If you could remember any detail that would help me find this in the
archives, I would be very grateful. Just now I am composing another
question on StackOverflow that deals with this particular definition in
the JLS. I have already concluded beyond doubt that the wording is off,
however it is not at all clear how to fix it. The problem is with the
word "program" in that definition because a program is not the entity
that contains data races, it is its execution.
I wasn't immediately able to find it with a mail search. I'll let you know if I do. Sorry. Unfortunately, I had a mail accident around January 2010, and it's hard for me to search earlier mail. It may also have been on the memory model list. Maybe someone else remembers?

Hans
Marko Topolnik
2012-08-21 08:12:27 UTC
Permalink
Post by Boehm, Hans
In the meantime I have read the JMM again very carefully. These are the
Sequential consistency is an idealization that is not guaranteed in any
way. Real implementations virtually never produce sequentially
consistent executions.
Sequential consistency is not normally defined to have anything to do with timing. Thus I don't understand the above statement.
The JLS uses this phrase: "each individual action is atomic and is immediately visible to every thread." That "immediately" sounds timing-related, but on a second reading it need not be: it just implies visibility to all the following actions in the execution order, and the execution order itself is still not required to be consistent with wall-clock time.
Post by Boehm, Hans
There is a guarantee that a properly synchronized program will APPEAR
to execute with sequential consistency. This appearance is, however,
limited only to the observed ordering of actions and nothing else. In
particular, no time ordering is guaranteed, promised, or even implied.
Right. But time ordering is not really observable. Values returned by currentTimeMillis() and the like are observable, but the specification is, AFAICT, unclear about the extent to which those values have to reflect any kind of real "time ordering". I suspect that, at least on some machines, they use per-processors timers that are not perfectly synchronized.
Let's say that they could in theory be reflecting the exact time. We don't have to descend into the gory details of specific CPUs. Imagine a CPU specifically built to allow precise wall-clock time observation.
Post by Boehm, Hans
If you could remember any detail that would help me find this in the
archives, I would be very grateful. Just now I am composing another
question on StackOverflow that deals with this particular definition in
the JLS. I have already concluded beyond doubt that the wording is off,
however it is not at all clear how to fix it. The problem is with the
word "program" in that definition because a program is not the entity
that contains data races, it is its execution.
I wasn't immediately able to find it with a mail search. I'll let you know if I do. Sorry. Unfortunately, I had a mail accident around January 2010, and it's hard for me to search earlier mail. It may also have been on the memory model list. Maybe someone else remembers?
In the meantime I was successful in my own search. It is indeed on the Memory Model list:

http://www.cs.umd.edu/~pugh/java/memoryModel/archive/2483.html

It says

"a data race should be defined as conflicting actions on *non-volatile* variables
that are not ordered by happens-before".

Only with that addition is the loophole closed; even the original paper by Jeremy Manson and Bill Pugh doesn't close it. The need for the special case stems from the fact that volatile actions are asymmetric, as opposed to synchronized blocks, in terms of the pairing between acquire and release actions.
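To make the loophole concrete, here is a hypothetical program (class
and field names are mine) that is obviously properly synchronized, yet
under the uncorrected wording "contains a data race" whenever the read
returns 0: the read and the write conflict, and a volatile read that
does not observe a given write is not ordered with it by happens-before.

```java
// v is volatile, yet a read that returns 0 has no synchronizes-with edge
// from the write of 1, so the two conflicting accesses are unordered by
// happens-before -- a "data race" under the uncorrected definition.
public class RaceWordingLoophole {
    static volatile int v = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> System.out.println("reader saw " + v));
        Thread writer = new Thread(() -> v = 1);
        reader.start();
        writer.start();
        reader.join();
        writer.join();
    }
}
```

Excluding volatile variables from the definition, as proposed above,
makes this program data-race-free regardless of which value is read.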

-Marko
Boehm, Hans
2012-08-21 18:03:06 UTC
Permalink
From: Marko Topolnik
Post by Boehm, Hans
Right. But time ordering is not really observable. Values returned
by currentTimeMillis() and the like are observable, but the
specification is, AFAICT, unclear about the extent to which those
values have to reflect any kind of real "time ordering". I suspect
that, at least on some machines, they use per-processors timers that
are not perfectly synchronized.
Let's say that they could in theory be reflecting the exact time. We
don't have to descend into the gory details of specific CPUs. Imagine a
CPU specifically built to allow precise wall-clock time observation.
Then I think it depends on how the CPU makes that time available to clients.
If it does it in a way that's equivalent to regularly updating a volatile
variable that can be read by all threads, then timing should work as expected.
I think the current specification for the timing functions is effectively
much weaker, because it's silent on the issues.

I think that if we agree this is a problem, then it could be fixed by
updating the specifications for the timing functions, without touching the
memory model per se. I'm not sure I understand the implementation
implications of that, so I'm neither advocating it, nor lobbying against it.
The C++ specification is somewhere in the middle, and I remain a bit nervous
about implementability there.

Hans
Marko Topolnik
2012-08-21 19:13:16 UTC
Permalink
Post by Boehm, Hans
Post by Marko Topolnik
Let's say that they could in theory be reflecting the exact time. We
don't have to descend into the gory details of specific CPUs. Imagine a
CPU specifically built to allow precise wall-clock time observation.
Then I think it depends on how the CPU makes that time available to clients.
If it does it in a way that's equivalent to regularly updating a volatile
variable that can be read by all threads, then timing should work as expected.
I think the current specification for the timing functions is effectively
much weaker, because it's silent on the issues.
I think that if we agree this is a problem, then it could be fixed by
updating the specifications for the timing functions, without touching the
memory model per se. I'm not sure I understand the implementation
implications of that, so I'm neither advocating it, nor lobbying against it.
The C++ specification is somewhere in the middle, and I remain a bit nervous
about implementability there.
I feel that this issue is not about actually calling any timing functions, but about the expectations on the timeliness of the visibility of volatile writes. Currently the programmer is left with a very vague and uncertain "general expectation" that volatile writes will be made visible "as soon as possible". It reminds me of the contract for Runtime.gc().
Zhong Yu
2012-08-21 21:33:14 UTC
Permalink
Post by Marko Topolnik
Post by Boehm, Hans
Post by Marko Topolnik
Let's say that they could in theory be reflecting the exact time. We
don't have to descend into the gory details of specific CPUs. Imagine a
CPU specifically built to allow precise wall-clock time observation.
Then I think it depends on how the CPU makes that time available to clients.
If it does it in a way that's equivalent to regularly updating a volatile
variable that can be read by all threads, then timing should work as expected.
I think the current specification for the timing functions is effectively
much weaker, because it's silent on the issues.
I think that if we agree this is a problem, then it could be fixed by
updating the specifications for the timing functions, without touching the
memory model per se. I'm not sure I understand the implementation
implications of that, so I'm neither advocating it, nor lobbying against it.
The C++ specification is somewhere in the middle, and I remain a bit nervous
about implementability there.
I feel that this issue is not about actually calling any timing functions, but about the expectations on the timeliness of the visibility of volatile writes. Currently the programmer is left with a very vague and uncertain "general expectation" that volatile writes will be made visible "as soon as possible".
Similar to your concern that consecutive volatile writes can be
compressed into the last write, it also seems true that consecutive
volatile reads can be compressed into the first read - exactly the
kind of optimization that C's volatile is meant to disable. It's
inconceivable that any JVM would do such an optimization on volatile
reads; it would break lots of programs, e.g. busy waits.

Two reasonable physical constraints can prevent these oddities:

1. every synchronization action takes non-zero time;

2. an upper limit on the discrepancy between synchronization order and
temporal order: if A is before B in synchronization order, A cannot
occur indefinitely later than B; A must occur before (B.time + T),
where T is the upper limit.
Marko Topolnik
2012-08-21 21:46:19 UTC
Permalink
Post by Zhong Yu
Similar to your concern that consecutive volatile writes can be
compressed into the last write, it also seems true that consecutive
volatile reads can be compressed into the first read - exactly the
kind of optimization to be disabled by C's volatile. It's
inconceivable that any JVM will do such optimization on volatile
reads, it'll break lots of programs, e.g. busy waits.
Actually there is already a provision in the JMM (in the original paper, at least) that prevents a busy-waiting loop whose condition involves a volatile read from forever reading a stale value. There can be only a finite number of such read actions. But this is small consolation, really. Most code DOES care about the difference between "eternity minus one" and "right now".

-Marko
Boehm, Hans
2012-08-21 23:40:38 UTC
Permalink
Post by Marko Topolnik
Post by Zhong Yu
Similar to your concern that consecutive volatile writes can be
compressed into the last write, it also seems true that consecutive
volatile reads can be compressed into the first read - exactly the
kind of optimization to be disabled by C's volatile. It's
inconceivable that any JVM will do such optimization on volatile
reads, it'll break lots of programs, e.g. busy waits.
Actually there is already a provision in the JMM (in the original
paper, at least) that prevents a busy-waiting loop whose condition
involves a volatile read to forever read a stale value. There can be
only a finite number of such read actions. But, this is a small
consolation, really. Most code DOES make a difference between "eternity
minus one" and "right now".
My recollection is that this is only sort of/mostly true. If your entire program consists of

Thread 1:
while (!flag) {}

Thread 2:
flag = true;

There is intentionally no requirement that thread 2 ever be scheduled. If it's not, that looks a lot like flag having been read once. If, on the other hand, thread 2 sets flag and then prints "Hello", and you see the "Hello", then I believe you are correct that thread 1 must terminate.
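Hans's program, made runnable (a minimal sketch following his
description; the class name is mine): once thread 2 has printed
"Hello", the finite-staleness provision means thread 1's busy wait must
eventually observe true and terminate.

```java
// Thread 1 busy-waits on a volatile flag; thread 2 sets it and prints
// "Hello". The loop's volatile reads cannot be collapsed into a single
// read of a forever-stale 'false'.
public class FlagExample {
    static volatile boolean flag = false;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            while (!flag) { } // busy wait on the volatile flag
        });
        Thread t2 = new Thread(() -> {
            flag = true;
            System.out.println("Hello");
        });
        t1.start();
        t2.start();
        t2.join();
        t1.join(); // must return: t1 cannot read a stale 'false' forever
    }
}
```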

My own feeling is that (under the right safety conditions), it's OK to collapse a bounded, "short" sequence of volatile reads into one, but not an unbounded number. Java "volatile" is not "C volatile". C "volatile" is neither intended for, nor safe for, synchronization. Java "volatile" is not intended for device register access, or for use with setjmp, or ...

Hans
Zhong Yu
2012-08-22 01:04:14 UTC
Permalink
Post by Boehm, Hans
Post by Marko Topolnik
Actually there is already a provision in the JMM (in the original
paper, at least) that prevents a busy-waiting loop whose condition
involves a volatile read to forever read a stale value. There can be
only a finite number of such read actions.
My own feeling is that (under the right safety conditions), it's OK to collapse a bounded, "short" sequence of volatile reads into one, but not an unbounded number.
You guys make sense. For every action, there are only a finite number
of actions that can be ordered before it.
Marko Topolnik
2012-08-22 06:39:14 UTC
Permalink
Post by Boehm, Hans
Post by Marko Topolnik
Post by Zhong Yu
Similar to your concern that consecutive volatile writes can be
compressed into the last write, it also seems true that consecutive
volatile reads can be compressed into the first read - exactly the
kind of optimization to be disabled by C's volatile. It's
inconceivable that any JVM will do such optimization on volatile
reads, it'll break lots of programs, e.g. busy waits.
Actually there is already a provision in the JMM (in the original
paper, at least) that prevents a busy-waiting loop whose condition
involves a volatile read to forever read a stale value. There can be
only a finite number of such read actions. But, this is a small
consolation, really. Most code DOES make a difference between "eternity
minus one" and "right now".
My recollection is that this is only sort of/mostly of true. If your entire program consists of
while (!flag) {}
flag = true;
There is intentionally no requirement that thread 2 ever be scheduled. If it's not, that looks a lot like flag having been read once. If, on the other hand, thread 2 sets flag and then prints "Hello", and you see the "Hello", then I believe you are correct that thread 1 must terminate.
If thread 2 never gets scheduled then the value of the flag is not stale (that's what I said---"read a stale value"). This JMM provision precludes bunching together an infinite number of busy-wait flag checks, all reading the stale false value.

-Marko
Vitaly Davidovich
2012-08-22 13:01:23 UTC
Permalink
If the JMM ever goes through a revision, it would be useful to state some
basic assumptions such as threads get a chance to run on a processor so
that "perverse" scenarios don't muddy the waters.

Also, as I mentioned earlier, I have an issue with the wording of a
happens-before edge. For example, thread A writes volatile memory and
thread B reads that memory; but if there's no read, the JMM says nothing
about what should happen to the write. I also don't understand how a
normal (i.e. non-experimental/academic) JVM could even do this type of
analysis consistently (e.g. when the volatile field is public/protected)
without sacrificing performance.

I understand it's an abstract model and shouldn't be clouded with impl
details of JVM/os/hardware, but I don't know if that's a good or bad
thing. It's bad enough that folks on this list have ambiguity over it, but
what's a compiler writer to do, for example? Maybe the JMM can be lowered
into something a bit more concrete?

Sent from my phone
Post by Boehm, Hans
Post by Boehm, Hans
Post by Marko Topolnik
Post by Zhong Yu
Similar to your concern that consecutive volatile writes can be
compressed into the last write, it also seems true that consecutive
volatile reads can be compressed into the first read - exactly the
kind of optimization to be disabled by C's volatile. It's
inconceivable that any JVM will do such optimization on volatile
reads, it'll break lots of programs, e.g. busy waits.
Actually there is already a provision in the JMM (in the original
paper, at least) that prevents a busy-waiting loop whose condition
involves a volatile read to forever read a stale value. There can be
only a finite number of such read actions. But, this is a small
consolation, really. Most code DOES make a difference between "eternity
minus one" and "right now".
My recollection is that this is only sort of/mostly of true. If your
entire program consists of
Post by Boehm, Hans
while (!flag) {}
flag = true;
There is intentionally no requirement that thread 2 ever be scheduled.
If it's not, that looks a lot like flag having been read once. If, on the
other hand, thread 2 sets flag and then prints "Hello", and you see the
"Hello", then I believe you are correct that thread 1 must terminate.
If thread 2 never gets scheduled then the value of the flag is not stale
(that's what I said---"read a stale value"). This JMM provision precludes
bunching together an infinite amount of busy-wait flag checks, all reading
the stale false value.
-Marko
Boehm, Hans
2012-08-23 03:05:52 UTC
Permalink
Progress guarantees were discussed during both the original JSR133 effort, and again in connection with the C++ memory model. It is really hard to specify anything that is universally acceptable, and is reasonably easy to write down. What about a special purpose embedded JVM that runs stored procedures in a database? Do we want to require it to have a preemptive scheduler? What does it mean to get a "chance to run" in the presence of standard library functions that acquire locks under the covers? Can another thread calling those same functions repeatedly prevent it from making progress? If so, do we now have to specify which locks are acquired by each library function? What does it mean for a JVM to provide a progress guarantee when the hardware specifications don't (e.g. because they don't promise that store buffers drain in a bounded amount of time)?

In the case of C++11, we ended up giving a vague guideline along the lines mentioned below, so that implementers knew what to strive for, but we ended up not making it binding on implementations, in part because we couldn't make it precise.

As far as the JMM itself is concerned, it's important to remember that hardware memory models vary a lot, and at least the 2005 variants were often less well defined, and even less comprehensible than the language level models. I think that defining a memory model in terms of hardware translations, for example, is a completely lost cause. We need to keep things at an abstract level.

Hans

From: Vitaly Davidovich [mailto:***@gmail.com]
Sent: Wednesday, August 22, 2012 6:01 AM
To: Marko Topolnik
Cc: concurrency-***@cs.oswego.edu; Boehm, Hans
Subject: Re: [concurrency-interest] Relativity of guarantees provided by volatile


If the JMM ever goes through a revision, it would be useful to state some basic assumptions such as threads get a chance to run on a processor so that "perverse" scenarios don't muddy the waters.

Also, as I mentioned earlier, I have an issue with the wording of a happens-before edge. For example, thread A writes volatile memory and thread B reads that memory; but if there's no read, the JMM says nothing about what should happen to the write. I also don't understand how a normal (i.e. non experimental/academic) JVM could even do this type of analysis consistently (e.g. the volatile field is public/protected) if we're not sacrificing performance.

I understand it's an abstract model and shouldn't be clouded with impl details of JVM/os/hardware, but I don't know if that's a good or bad thing. It's bad enough that folks on this list have ambiguity over it, but what's a compiler writer to do, for example? Maybe the JMM can be lowered into something a bit more concrete?

Sent from my phone
Post by Boehm, Hans
Post by Marko Topolnik
Post by Zhong Yu
Similar to your concern that consecutive volatile writes can be
compressed into the last write, it also seems true that consecutive
volatile reads can be compressed into the first read - exactly the
kind of optimization to be disabled by C's volatile. It's
inconceivable that any JVM will do such optimization on volatile
reads, it'll break lots of programs, e.g. busy waits.
Actually there is already a provision in the JMM (in the original
paper, at least) that prevents a busy-waiting loop whose condition
involves a volatile read to forever read a stale value. There can be
only a finite number of such read actions. But, this is a small
consolation, really. Most code DOES make a difference between "eternity
minus one" and "right now".
My recollection is that this is only sort of/mostly of true. If your entire program consists of
while (!flag) {}
flag = true;
There is intentionally no requirement that thread 2 ever be scheduled. If it's not, that looks a lot like flag having been read once. If, on the other hand, thread 2 sets flag and then prints "Hello", and you see the "Hello", then I believe you are correct that thread 1 must terminate.
If thread 2 never gets scheduled then the value of the flag is not stale (that's what I said---"read a stale value"). This JMM provision precludes bunching together an infinite amount of busy-wait flag checks, all reading the stale false value.

-Marko
Zhong Yu
2012-08-22 17:18:04 UTC
Permalink
Post by Marko Topolnik
Post by Boehm, Hans
Post by Marko Topolnik
Post by Zhong Yu
Similar to your concern that consecutive volatile writes can be
compressed into the last write, it also seems true that consecutive
volatile reads can be compressed into the first read - exactly the
kind of optimization to be disabled by C's volatile. It's
inconceivable that any JVM will do such optimization on volatile
reads, it'll break lots of programs, e.g. busy waits.
Actually there is already a provision in the JMM (in the original
paper, at least) that prevents a busy-waiting loop whose condition
involves a volatile read to forever read a stale value. There can be
only a finite number of such read actions. But, this is a small
consolation, really. Most code DOES make a difference between "eternity
minus one" and "right now".
My recollection is that this is only sort of/mostly of true. If your entire program consists of
while (!flag) {}
flag = true;
There is intentionally no requirement that thread 2 ever be scheduled. If it's not, that looks a lot like flag having been read once. If, on the other hand, thread 2 sets flag and then prints "Hello", and you see the "Hello", then I believe you are correct that thread 1 must terminate.
If thread 2 never gets scheduled then the value of the flag is not stale (that's what I said---"read a stale value"). This JMM provision
Can you give the reference to this provision? (I didn't see an
explicit one; though it's quite natural, since the set of actions
should be countable)

precludes bunching together an infinite amount of busy-wait flag
checks, all reading the stale false value.
Post by Marko Topolnik
-Marko
Gregg Wonderly
2012-08-22 18:15:03 UTC
Permalink
Post by Zhong Yu
Post by Marko Topolnik
Post by Boehm, Hans
Post by Marko Topolnik
Post by Zhong Yu
Similar to your concern that consecutive volatile writes can be
compressed into the last write, it also seems true that consecutive
volatile reads can be compressed into the first read - exactly the
kind of optimization to be disabled by C's volatile. It's
inconceivable that any JVM will do such optimization on volatile
reads, it'll break lots of programs, e.g. busy waits.
Actually there is already a provision in the JMM (in the original
paper, at least) that prevents a busy-waiting loop whose condition
involves a volatile read to forever read a stale value. There can be
only a finite number of such read actions. But, this is a small
consolation, really. Most code DOES make a difference between "eternity
minus one" and "right now".
My recollection is that this is only sort of/mostly of true. If your entire program consists of
while (!flag) {}
flag = true;
There is intentionally no requirement that thread 2 ever be scheduled. If it's not, that looks a lot like flag having been read once. If, on the other hand, thread 2 sets flag and then prints "Hello", and you see the "Hello", then I believe you are correct that thread 1 must terminate.
If thread 2 never gets scheduled then the value of the flag is not stale (that's what I said---"read a stale value"). This JMM provision
Can you give the reference to this provision? (I didn't see an
explicit one; though it's quite natural, since the set of actions
should be countable)
precludes bunching together an infinite amount of busy-wait flag
checks, all reading the stale false value.
Java provides no guarantee that the host operating system will
actually schedule any thread other than the current one with any
reliability. It just states that when a thread is running, this is the
behavior you will perceive, based on the JLS and JMM.

So whether Thread.yield() must be used is unspecified. But there were
some early JVMs which did require the use of Thread.yield() because the
thread scheduling and/or OS were not preemptive.

Gregg
√iktor Ҡlang
2012-08-22 19:35:57 UTC
Permalink
Post by Gregg Wonderly
Post by Zhong Yu
Post by Marko Topolnik
Post by Boehm, Hans
Post by Zhong Yu
Similar to your concern that consecutive volatile writes can be
Post by Zhong Yu
compressed into the last write, it also seems true that consecutive
volatile reads can be compressed into the first read - exactly the
kind of optimization to be disabled by C's volatile. It's
inconceivable that any JVM will do such optimization on volatile
reads, it'll break lots of programs, e.g. busy waits.
Actually there is already a provision in the JMM (in the original
paper, at least) that prevents a busy-waiting loop whose condition
involves a volatile read to forever read a stale value. There can be
only a finite number of such read actions. But, this is a small
consolation, really. Most code DOES make a difference between "eternity
minus one" and "right now".
My recollection is that this is only sort of/mostly of true. If your
entire program consists of
while (!flag) {}
flag = true;
There is intentionally no requirement that thread 2 ever be scheduled.
If it's not, that looks a lot like flag having been read once. If, on the
other hand, thread 2 sets flag and then prints "Hello", and you see the
"Hello", then I believe you are correct that thread 1 must terminate.
If thread 2 never gets scheduled then the value of the flag is not stale
(that's what I said---"read a stale value"). This JMM provision
Can you give the reference to this provision? (I didn't see an
explicit one; though it's quite natural, since the set of actions
should be countable)
precludes bunching together an infinite amount of busy-wait flag
checks, all reading the stale false value.
Java provides for no guarantees that the host operating system will
actually, effectively (for you), schedule any thread other than the current
one, with any reliability. It just states when a thread is running, this
is the behavior you will perceive based on the JLS and JMM.
Also, there is no way for the current thread to know how much of its CPU
quantum it has left, so one can never assume when execution will be done.
Post by Gregg Wonderly
So, Thread.yield()'s necessity of use is unspecified. But there were some
early JVMs which did require the use of Thread.yield because the thread
scheduling and/or os were not preemptive.
Gregg
_______________________________________________
Concurrency-interest mailing list
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
--
Viktor Klang

Akka Tech Lead
Typesafe <http://www.typesafe.com/> - The software stack for applications
that scale

Twitter: @viktorklang
Marko Topolnik
2012-08-22 21:51:45 UTC
Permalink
Post by Zhong Yu
Post by Marko Topolnik
Post by Boehm, Hans
Post by Marko Topolnik
Post by Zhong Yu
Similar to your concern that consecutive volatile writes can be
compressed into the last write, it also seems true that consecutive
volatile reads can be compressed into the first read - exactly the
kind of optimization to be disabled by C's volatile. It's
inconceivable that any JVM will do such optimization on volatile
reads, it'll break lots of programs, e.g. busy waits.
Actually there is already a provision in the JMM (in the original
paper, at least) that prevents a busy-waiting loop whose condition
involves a volatile read to forever read a stale value. There can be
only a finite number of such read actions. But, this is a small
consolation, really. Most code DOES make a difference between "eternity
minus one" and "right now".
My recollection is that this is only sort of/mostly true. If your entire program consists of
while (!flag) {}
flag = true;
There is intentionally no requirement that thread 2 ever be scheduled. If it's not, that looks a lot like flag having been read once. If, on the other hand, thread 2 sets flag and then prints "Hello", and you see the "Hello", then I believe you are correct that thread 1 must terminate.
If thread 2 never gets scheduled then the value of the flag is not stale (that's what I said---"read a stale value"). This JMM provision
Can you give the reference to this provision? (I didn't see an
explicit one; though it's quite natural, since the set of actions
should be countable)
precludes bunching together an infinite amount of busy-wait flag
checks, all reading the stale false value.
This would be it:

http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.9

However, it is much clearer if you refer to Manson, Pugh, "The Java Memory Model", Section 7, and especially Figure 9.

http://rsim.cs.illinois.edu/Pubs/popl05.pdf


-Marko
Marko Topolnik
2012-08-22 13:24:16 UTC
Permalink
Post by Boehm, Hans
Post by Marko Topolnik
Let's say that they could in theory be reflecting the exact time. We
don't have to descend into the gory details of specific CPUs. Imagine a
CPU specifically built to allow precise wall-clock time observation.
Then I think it depends on how the CPU makes that time available to clients.
If it does it in a way that's equivalent to regularly updating a volatile
variable that can be read by all threads, then timing should work as expected.
I think the current specification for the timing functions is effectively
much weaker, because it's silent on the issues.
I think that if we agree this is a problem, then it could be fixed by
updating the specifications for the timing functions, without touching the
memory model per se. I'm not sure I understand the implementation
implications of that, so I'm neither advocating it, nor lobbying against it.
The C++ specification is somewhere in the middle, and I remain a bit nervous
about implementability there.
I have thought about this approach more carefully and the funny thing is, it still guarantees nothing. Please consider the following happens-before graph:


Thread R /--> Rr0 --> Rr1
-----------+--------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -+--> Ww1
/ / ------/
| | /
Thread T Wt0 -----------> Wt1


We have the Timer thread T, which writes the volatile variable "currentTime". Both threads R and W read that var. The rest is the same as in the opening post: W writes sharedVar, R reads sharedVar.

Analysis:

- T writes to currentTime with actions Wt0 and Wt1;

- W observes action Wt0 with the action Rw0;

- W writes to sharedVar with the action Ww0;

- W observes Wt1 with the action Rw1;

- R observes Wt1 with the action Rr0;

- R observes Ww0 with the action Rr1.

Conclusions:

- R first observes Wt1, then Ww0;

- W first commits Ww0, then observes Wt1.


Since there are no cycles in the graph, there is no contradiction.

-Marko
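Marko's graph can be sketched as code (a hypothetical class; the diagram's action labels appear only as comments). The point is purely structural: each clock observation is an ordinary volatile read, so nothing ties it to physical wall-clock time.

```java
// Hypothetical sketch of the three threads in the graph above.
// currentTime stands in for the volatile written by timer thread T;
// sharedVar is written by W and read by R. Diagram labels in comments.
public class TimedRace {
    static volatile long currentTime = 0; // written by T (Wt0, Wt1)
    static volatile int sharedVar = 0;    // written by W, read by R

    // W: observe the clock, write sharedVar, observe the clock again
    static long[] stampedWrite(int value) {
        long before = currentTime;  // Rw0
        sharedVar = value;          // Ww0
        long after = currentTime;   // Rw1
        return new long[] { before, after };
    }

    // R: observe the clock, then read sharedVar
    static long[] stampedRead() {
        long t = currentTime;       // Rr0
        int v = sharedVar;          // Rr1
        return new long[] { t, v };
    }

    public static void main(String[] args) {
        // Single-threaded replay of one possible interleaving, for clarity;
        // in the scenario under discussion T, W and R run concurrently.
        currentTime = 1;                      // Wt0
        long[] w = stampedWrite(1);
        currentTime = 2;                      // Wt1
        long[] r = stampedRead();
        System.out.println("W stamped " + w[0] + ".." + w[1]
                + ", R saw t=" + r[0] + " v=" + r[1]);
    }
}
```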
Zhong Yu
2012-08-22 17:00:06 UTC
Permalink
Post by Marko Topolnik
Post by Boehm, Hans
Post by Marko Topolnik
Let's say that they could in theory be reflecting the exact time. We
don't have to descend into the gory details of specific CPUs. Imagine a
CPU specifically built to allow precise wall-clock time observation.
Then I think it depends on how the CPU makes that time available to clients.
If it does it in a way that's equivalent to regularly updating a volatile
variable that can be read by all threads, then timing should work as expected.
I think the current specification for the timing functions is effectively
much weaker, because it's silent on the issues.
I think that if we agree this is a problem, then it could be fixed by
updating the specifications for the timing functions, without touching the
memory model per se. I'm not sure I understand the implementation
implications of that, so I'm neither advocating it, nor lobbying against it.
The C++ specification is somewhere in the middle, and I remain a bit nervous
about implementability there.
Thread R /--> Rr0 --> Rr1
-----------+--------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -+--> Ww1
/ / ------/
| | /
Thread T Wt0 -----------> Wt1
We have the Timer thread T, which writes the volatile variable "currentTime". Both threads R and W read that var. The rest is the same as in the opening post: W writes sharedVar, R reads sharedVar.
- T writes to currentTime with actions Wt0 and Wt1;
- W observes action Wt0 with the action Rw0;
- W writes to sharedVar with the action Ww0;
- W observes Wt1 with the action Rw1;
- R observes Wt1 with the action Rr0;
- R observes Ww0 with the action Rr1.
- R first observes Wt1, then Ww0;
- W first commits Ww0, then observes Wt1.
Since there are no cycles in the graph, there is no contradiction.
Suppose W writes v=1, then observes t=1; R observes t=2, then reads v.
The last read cannot see v=0.

Therefore if R/W actions are sandwiched with reading the timing
variable, we will not detect any apparent timing paradoxes.

Since we are no longer talking about physical time (outside the JMM),
but just a variable (within the JMM), the JMM guarantees that any
execution of the program appears to be sequentially consistent.
√iktor Ҡlang
2012-08-22 19:40:30 UTC
Permalink
Post by Zhong Yu
Post by Marko Topolnik
Post by Boehm, Hans
Post by Marko Topolnik
Let's say that they could in theory be reflecting the exact time. We
don't have to descend into the gory details of specific CPUs. Imagine a
CPU specifically built to allow precise wall-clock time observation.
Then I think it depends on how the CPU makes that time available to clients.
If it does it in a way that's equivalent to regularly updating a volatile
variable that can be read by all threads, then timing should work as expected.
I think the current specification for the timing functions is effectively
much weaker, because it's silent on the issues.
I think that if we agree this is a problem, then it could be fixed by
updating the specifications for the timing functions, without touching the
memory model per se. I'm not sure I understand the implementation
implications of that, so I'm neither advocating it, nor lobbying against it.
The C++ specification is somewhere in the middle, and I remain a bit nervous
about implementability there.
I have thought about this approach more carefully and the funny thing
is, it still guarantees nothing. Please consider the following
happens-before graph:
Thread R /--> Rr0 --> Rr1
-----------+--------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -+--> Ww1
/ / ------/
| | /
Thread T Wt0 -----------> Wt1
We have the Timer thread T, which writes the volatile variable
"currentTime". Both threads R and W read that var. The rest is the same as
in the opening post: W writes sharedVar, R reads sharedVar.
- T writes to currentTime with actions Wt0 and Wt1;
- W observes action Wt0 with the action Rw0;
- W writes to sharedVar with the action Ww0;
- W observes Wt1 with the action Rw1;
- R observes Wt1 with the action Rr0;
- R observes Ww0 with the action Rr1.
- R first observes Wt1, then Ww0;
- W first commits Ww0, then observes Wt1.
Since there are no cycles in the graph, there is no contradiction.
Suppose W writes v=1, then observes t=1; R observes t=2, then reads v.
The last read cannot see v=0.
Therefore if R/W actions are sandwiched with reading the timing
variable, we will not detect any apparent timing paradoxes.
Since we are not talking about physical time (beyond JMM) any more,
instead just a variable (within JMM), JMM guarantees that any
execution of the program appears to be sequentially consistent.
So the flush of any write to an address only has to be done just prior to a
read from that address by some thread other than the writer.
So given a program that has only one thread, flushes could effectively be
elided.

Cheers,
√
--
Viktor Klang

Akka Tech Lead
Typesafe <http://www.typesafe.com/> - The software stack for applications
that scale

Twitter: @viktorklang
Vitaly Davidovich
2012-08-22 19:57:21 UTC
Permalink
A single threaded program may still migrate across processors, so the OS will
do the flushing anyway.

Also, draining the store buffer is not always an expensive operation. The real
scalability issue is sharing memory at all between lots of cores; it doesn't
have to be volatile to cause problems.

Sent from my phone
Post by √iktor Ҡlang
Post by Zhong Yu
Post by Marko Topolnik
Post by Boehm, Hans
Post by Marko Topolnik
Let's say that they could in theory be reflecting the exact time. We
don't have to descend into the gory details of specific CPUs. Imagine a
CPU specifically built to allow precise wall-clock time observation.
Then I think it depends on how the CPU makes that time available to clients.
If it does it in a way that's equivalent to regularly updating a volatile
variable that can be read by all threads, then timing should work as expected.
I think the current specification for the timing functions is effectively
much weaker, because it's silent on the issues.
I think that if we agree this is a problem, then it could be fixed by
updating the specifications for the timing functions, without touching the
memory model per se. I'm not sure I understand the implementation
implications of that, so I'm neither advocating it, nor lobbying against it.
The C++ specification is somewhere in the middle, and I remain a bit nervous
about implementability there.
I have thought about this approach more carefully and the funny thing
is, it still guarantees nothing. Please consider the following
happens-before graph:
Thread R /--> Rr0 --> Rr1
-----------+--------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -+--> Ww1
/ / ------/
| | /
Thread T Wt0 -----------> Wt1
We have the Timer thread T, which writes the volatile variable
"currentTime". Both threads R and W read that var. The rest is the same as
in the opening post: W writes sharedVar, R reads sharedVar.
- T writes to currentTime with actions Wt0 and Wt1;
- W observes action Wt0 with the action Rw0;
- W writes to sharedVar with the action Ww0;
- W observes Wt1 with the action Rw1;
- R observes Wt1 with the action Rr0;
- R observes Ww0 with the action Rr1.
- R first observes Wt1, then Ww0;
- W first commits Ww0, then observes Wt1.
Since there are no cycles in the graph, there is no contradiction.
Suppose W writes v=1, then observes t=1; R observes t=2, then reads v.
The last read cannot see v=0.
Therefore if R/W actions are sandwiched with reading the timing
variable, we will not detect any apparent timing paradoxes.
Since we are not talking about physical time (beyond JMM) any more,
instead just a variable (within JMM), JMM guarantees that any
execution of the program appears to be sequentially consistent.
So flush of any write to an address only has to be done just prior to any
read from that address by some other thread than the writer.
So given a program that only has one thread; flushes could effectively be
elided.
Cheers,
√
--
Viktor Klang
Akka Tech Lead
Typesafe <http://www.typesafe.com/> - The software stack for applications
that scale
√iktor Ҡlang
2012-08-22 20:08:02 UTC
Permalink
Post by Vitaly Davidovich
A single threaded program may still migrate across processors, so OS will
do the flushing anyway.
Yes of course. Core migration will have to include packing the thread's
belongings and shipping them over to the new core.
Post by Vitaly Davidovich
Also, draining store buffer is not always an expensive operation. Really
the scalability issue is sharing memory at all between lots of cores,
doesn't have to be volatile to cause problems.
Sent from my phone
Post by √iktor Ҡlang
Post by Zhong Yu
Post by Marko Topolnik
Post by Boehm, Hans
Post by Marko Topolnik
Let's say that they could in theory be reflecting the exact time. We
don't have to descend into the gory details of specific CPUs. Imagine a
CPU specifically built to allow precise wall-clock time observation.
Then I think it depends on how the CPU makes that time available to clients.
If it does it in a way that's equivalent to regularly updating a volatile
variable that can be read by all threads, then timing should work as expected.
I think the current specification for the timing functions is effectively
much weaker, because it's silent on the issues.
I think that if we agree this is a problem, then it could be fixed by
updating the specifications for the timing functions, without touching the
memory model per se. I'm not sure I understand the implementation
implications of that, so I'm neither advocating it, nor lobbying against it.
The C++ specification is somewhere in the middle, and I remain a bit nervous
about implementability there.
I have thought about this approach more carefully and the funny thing
is, it still guarantees nothing. Please consider the following
happens-before graph:
Thread R /--> Rr0 --> Rr1
-----------+--------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -+--> Ww1
/ / ------/
| | /
Thread T Wt0 -----------> Wt1
We have the Timer thread T, which writes the volatile variable
"currentTime". Both threads R and W read that var. The rest is the same as
in the opening post: W writes sharedVar, R reads sharedVar.
- T writes to currentTime with actions Wt0 and Wt1;
- W observes action Wt0 with the action Rw0;
- W writes to sharedVar with the action Ww0;
- W observes Wt1 with the action Rw1;
- R observes Wt1 with the action Rr0;
- R observes Ww0 with the action Rr1.
- R first observes Wt1, then Ww0;
- W first commits Ww0, then observes Wt1.
Since there are no cycles in the graph, there is no contradiction.
Suppose W writes v=1, then observes t=1; R observes t=2, then reads v.
The last read cannot see v=0.
Therefore if R/W actions are sandwiched with reading the timing
variable, we will not detect any apparent timing paradoxes.
Since we are not talking about physical time (beyond JMM) any more,
instead just a variable (within JMM), JMM guarantees that any
execution of the program appears to be sequentially consistent.
So flush of any write to an address only has to be done just prior to any
read from that address by some other thread than the writer.
So given a program that only has one thread; flushes could effectively be
elided.
Cheers,
√
--
Viktor Klang
Akka Tech Lead
Typesafe <http://www.typesafe.com/> - The software stack for
applications that scale
--
Viktor Klang

Akka Tech Lead
Typesafe <http://www.typesafe.com/> - The software stack for applications
that scale

Twitter: @viktorklang
Marko Topolnik
2012-08-22 21:44:07 UTC
Permalink
Post by Zhong Yu
Post by Marko Topolnik
Thread R /--> Rr0 --> Rr1
-----------+--------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -+--> Ww1
/ / ------/
| | /
Thread T Wt0 -----------> Wt1
Suppose W writes v=1, then observes t=1; R observes t=2, then read v.
The last read cannot see v=0.
Therefore if R/W actions are sandwiched with reading the timing
variable, we will not detect any apparent timing paradoxes.
Since we are not talking about physical time (beyond JMM) any more,
instead just a variable (within JMM), JMM guarantees that any
execution of the program appears to be sequentially consistent.
I was trying to form a minimal example to demonstrate the point, but it seems it was still lacking. Consider the following graph; now there are three wall-clock writes involved:


Thread R /--> Rr0 --> Rr1
------------------+---------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -> Ww1 |
/ / |
| | |
Thread T Wt0 -----------> Wt1 ------------> Wt2


Say Wt0 writes t = 0 ms, Wt1 writes t = 3 ms, Wt2 writes t = 6 ms.

Also say Ww0 writes zero and Ww1 writes one.

Thread W says "I wrote zero at t = 0 ms and one at t = 3 ms."

Thread R says "I read zero at t = 6 ms."

Note also the general pattern: there are no constraints imposed by the arrows from T to R on the arrows from W to R.

-Marko
Zhong Yu
2012-08-22 23:17:39 UTC
Permalink
Post by Marko Topolnik
Post by Zhong Yu
Post by Marko Topolnik
Thread R /--> Rr0 --> Rr1
-----------+--------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -+--> Ww1
/ / ------/
| | /
Thread T Wt0 -----------> Wt1
Suppose W writes v=1, then observes t=1; R observes t=2, then read v.
The last read cannot see v=0.
Therefore if R/W actions are sandwiched with reading the timing
variable, we will not detect any apparent timing paradoxes.
Since we are not talking about physical time (beyond JMM) any more,
instead just a variable (within JMM), JMM guarantees that any
execution of the program appears to be sequentially consistent.
Thread R /--> Rr0 --> Rr1
------------------+---------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -> Ww1 |
/ / |
| | |
Thread T Wt0 -----------> Wt1 ------------> Wt2
Say Wt0 writes t = 0 ms, Wt1 writes t = 3 ms, Wt2 writes t = 6 ms.
Also say Ww0 writes zero and Ww1 writes one.
Thread W says "I wrote zero at t = 0 ms and one at t = 3 ms."
W cannot say that, it's more like W wrote one after seeing t=3. That
is not inconsistent with R reading zero after seeing t=6.

Note that the clock precision is not good here. For example, there is
no paradox in "W wrote one before seeing t=6" and "R read zero after
seeing t=6", because "t=6" can be seen in an extended time window.

A real paradox would be "W wrote one before seeing t=3" and "R read
zero after seeing t=6", a timing error greater than clock precision.

JMM does not allow that paradox. If Rw2 sees t=3, Wt1<Rw2<Wt2 in sync
order, therefore Ww1<Rw2 <Wt2<Rr0<Rr1, therefore Ww0<Ww1<Rr1, that
forbids Rr1 seeing Ww0.
Post by Marko Topolnik
Thread R says "I read zero at t = 6 ms."
Note also the general pattern: there are no constraints imposed by the arrows from T to R on the arrows from W to R.
You missed the arrows from a read to the write that comes after:)
Post by Marko Topolnik
-Marko
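Zhong Yu's derivation above chains synchronization-order edges by transitivity. A tiny (hypothetical) reachability checker makes the chain mechanical: feed in the edges he lists, and verify that Ww0 precedes Rr1, which by well-formedness (JLS 17.4.7) forbids Rr1 from seeing Ww0.

```java
import java.util.*;

// Hypothetical toy checker for the derivation in the post: build the
// synchronization-order edges listed there and verify by graph
// reachability that Ww0 precedes Rr1 in the synchronization order.
public class SyncOrderCheck {
    static Map<String, Set<String>> edges = new HashMap<>();

    static void edge(String a, String b) {
        edges.computeIfAbsent(a, k -> new HashSet<>()).add(b);
    }

    // true iff b is reachable from a via the recorded edges
    static boolean precedes(String a, String b) {
        Deque<String> stack = new ArrayDeque<>(List.of(a));
        Set<String> seen = new HashSet<>();
        while (!stack.isEmpty()) {
            String cur = stack.pop();
            if (cur.equals(b)) return true;
            if (seen.add(cur))
                stack.addAll(edges.getOrDefault(cur, Set.of()));
        }
        return false;
    }

    public static void main(String[] args) {
        // Edges from the post: Ww0 < Ww1 (program order), Ww1 < Rw2,
        // Rw2 < Wt2, Wt2 < Rr0, Rr0 < Rr1.
        edge("Ww0", "Ww1"); edge("Ww1", "Rw2"); edge("Rw2", "Wt2");
        edge("Wt2", "Rr0"); edge("Rr0", "Rr1");
        // Ww0 precedes Rr1, so Rr1 cannot see Ww0 (JLS 17.4.7).
        System.out.println(precedes("Ww0", "Rr1"));
    }
}
```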
Marko Topolnik
2012-08-23 06:02:05 UTC
Permalink
Post by Zhong Yu
Post by Marko Topolnik
Post by Zhong Yu
Post by Marko Topolnik
Thread R /--> Rr0 --> Rr1
-----------+--------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -+--> Ww1
/ / ------/
| | /
Thread T Wt0 -----------> Wt1
Suppose W writes v=1, then observes t=1; R observes t=2, then read v.
The last read cannot see v=0.
Therefore if R/W actions are sandwiched with reading the timing
variable, we will not detect any apparent timing paradoxes.
Since we are not talking about physical time (beyond JMM) any more,
instead just a variable (within JMM), JMM guarantees that any
execution of the program appears to be sequentially consistent.
Thread R /--> Rr0 --> Rr1
------------------+---------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -> Ww1 |
/ / |
| | |
Thread T Wt0 -----------> Wt1 ------------> Wt2
Say Wt0 writes t = 0 ms, Wt1 writes t = 3 ms, Wt2 writes t = 6 ms.
Also say Ww0 writes zero and Ww1 writes one.
Thread W says "I wrote zero at t = 0 ms and one at t = 3 ms."
W cannot say that, it's more like W wrote one after seeing t=3. That
is not inconsistent with R reading zero after seeing t=6.
Note that the clock precision is not good here. For example, there is
no paradox in "W wrote one before seeing t=6" and "R read zero after
seeing t=6", because "t=6" can be seen in an extended time window.
A real paradox would be "W wrote one before seeing t=3" and "R read
zero after seeing t=6", a timing error greater than clock precision.
JMM does not allow that paradox. If Rw2 sees t=3, Wt1<Rw2<Wt2 in sync
order, therefore Ww1<Rw2 <Wt2<Rr0<Rr1, therefore Ww0<Ww1<Rr1, that
forbids Rr1 seeing Ww0.
This would be the situation you describe:

/--> Rrt6 --/-> Rrt9 --> Rv0
---------------------------+------------+--------/
/ | ------/
Wv0 ---> Rwt3 -> Wv1 --> Rwt6 | /
/ / --| /
| | / /
T3 ------------> T6 ----> T9


Notice I've fixed the naming somewhat.

T3, T6, T9 -- writes to currentTime
Rwt3, Rwt6 -- reads of currentTime by the writing thread
Rrt6, Rrt9 -- reads of currentTime by the reading thread
Wv0, Wv1 -- writes to the sharedVar
Rv0 -- read of the sharedVar

initially t = 0 ms;
T3 writes t = 3 ms;
T6 writes t = 6 ms;
T9 writes t = 9 ms.

The program outputs

Writing 0 at t = 0 ms
Writing 1 at t = 3 ms
Reading 0 at t = 9 ms
Post by Zhong Yu
Post by Marko Topolnik
Thread R says "I read zero at t = 6 ms."
Note also the general pattern: there are no constraints imposed by the arrows from T to R on the arrows from W to R.
You missed the arrows from a read to the write that comes after:)
Not sure what you mean, but there is no synchronizes-with edge going from a read to a write. No arrows extend from one thread reading to another thread writing.

-Marko
Andreas Lochbihler
2012-08-23 08:01:12 UTC
Permalink
Dear Marko,
/--> Rrt6 --/-> Rrt9 --> Rv0
---------------------------+------------+--------/
/ | ------/
Wv0 ---> Rwt3 -> Wv1 --> Rwt6 | /
/ / --| /
| | / /
T3 ------------> T6 ----> T9
Notice I've fixed the naming somewhat.
T3, T6, T9 -- writes to currentTime
Rwt3, Rwt6 -- reads of currentTime by the writing thread
Rrt6, Rrt9 -- reads of currentTime by the reading thread
Wv0, Wv1 -- writes to the sharedVar
Rv0 -- read of the sharedVar
initially t = 0 ms;
T3 writes t = 3 ms;
T6 writes t = 6 ms;
T9 writes t = 9 ms.
The program outputs
Writing 0 at t = 0 ms
Writing 1 at t = 3 ms
Reading 0 at t = 9 ms
The JMM does not allow this output if sharedVar and currentTime are volatile, as
the following reasoning shows: all actions in your example are synchronisation
actions, i.e., they must be totally ordered by the synchronisation order, but
no synchronisation order consistent with this output exists.

Assume for the sake of contradiction there was such a synchronisation order <so.
1. Wv0 <so Wv1 and Wv1 <so Rwt6 and Rrt9 <so Rv0 hold, because synchronisation
order is consistent with program order.
2. Not T9 <so Rwt6, because otherwise, T9 would be an intervening write between
T6 and Rwt6 since T6 <so T9 which contradicts well-formedness (JLS 17.4.7.5).
3. Hence Rwt6 <so T9 by totality of the synchronisation order.
4. T9 <so Rrt9, because Rrt9 sees T9, i.e., well-formedness forbids Rrt9 <so T9,
and <so is total.
5. Wv1 <so Rrt9 by transitivity of 1., 3., 4.
6. Hence, Wv1 is an intervening write between Wv0 and Rv0 in the synchronisation
order (since Wv0 <so Wv1 from 1.), which contradicts well-formedness (JLS 17.4.7.5).

Andreas
--
Karlsruher Institut für Technologie
IPD Snelting

Andreas Lochbihler
wissenschaftlicher Mitarbeiter
Am Fasanengarten 5, Geb. 50.34, Raum 025
76131 Karlsruhe

Telefon: +49 721 608-47399
Fax: +49 721 608-48457
E-Mail: ***@kit.edu
http://pp.info.uni-karlsruhe.de
KIT - Universität des Landes Baden-Württemberg und nationales Forschungszentrum
in der Helmholtz-Gemeinschaft
Marko Topolnik
2012-08-23 09:19:06 UTC
Permalink
Post by Andreas Lochbihler
Dear Marko,
/--> Rrt6 --/-> Rrt9 --> Rv0
---------------------------+------------+--------/
/ | ------/
Wv0 ---> Rwt3 -> Wv1 --> Rwt6 | /
/ / --| /
| | / /
T3 ------------> T6 ----> T9
Notice I've fixed the naming somewhat.
T3, T6, T9 -- writes to currentTime
Rwt0, Rwt1 -- reads of currentTime by writing thread
Rrt1, Rrt2 -- reads of currentTime by reading thread
Wv0, Wv1 -- writes to the sharedVar
Rv0 -- read of the sharedVar
initially t = 0 ms;
T3 writes t = 3 ms;
T6 writes t = 6 ms;
T9 writes t = 9 ms.
The program outputs
Writing 0 at t = 0 ms
Writing 1 at t = 3 ms
Reading 0 at t = 9 ms
The JMM does not allow this output if sharedVar and currentTime are volatile, as the following reasoning shows: All actions in your example are synchronisation actions, i.e., they must be totally ordered by the synchronisation order, but there is none.
Assume for the sake of contradiction there was such a synchronisation order <so.
1. Wv0 <so Wv1 and Wv1 <so Rwt6 and Rrt9 <so Rv0 hold, because synchronisation order is consistent with program order.
2. Not T9 <so Rwt6, because otherwise, T9 would be an intervening write between T6 and Rwt6 since T6 <so T9 which contradicts well-formedness (JLS 17.4.7.5).
3. Hence Rwt6 <so T9 by totality of the synchronisation order.
4. T9 <so Rrt9, because Rrt9 sees T9, i.e., well-formedness forbids Rrt9 <so T9, and <so is total.
5. Wv1 <so Rrt9 by transitivity of 1., 3., 4.
6. Hence, Wv1 is an intervening write between Wv0 and Rv0 in the synchronisation order (since Wv0 <so Wv1 from 1.), which contradicts well-formedness (JLS 17.4.7.5).
Andreas
You are right, my example disregards the constraints imposed by the total synchronization order.

-Marko

Boehm, Hans
2012-08-22 23:46:45 UTC
Permalink
-----Original Message-----
Sent: Wednesday, August 22, 2012 2:44 PM
To: Zhong Yu
Subject: Re: [concurrency-interest] Relativity of guarantees provided
by volatile
Post by Zhong Yu
Post by Marko Topolnik
Thread R /--> Rr0 --> Rr1
-----------+--------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -+--> Ww1
/ / ------/
| | /
Thread T Wt0 -----------> Wt1
Suppose W writes v=1, then observes t=1; R observes t=2, then read v.
The last read cannot see v=0.
Therefore if R/W actions are sandwiched with reading the timing
variable, we will not detect any apparent timing paradoxes.
Since we are not talking about physical time (beyond JMM) any more,
instead just a variable (within JMM), JMM guarantees that any
execution of the program appears to be sequentially consistent.
I was trying to form a minimal example to demonstrate the point, it
seems still lacking. Consider the following graph, now there's three
Thread R /--> Rr0 --> Rr1
------------------+---------/
/ |
Thread W --> Rw0 -> Ww0 ---> Rw1 -> Ww1 |
/ / |
| | |
Thread T Wt0 -----------> Wt1 ------------> Wt2
Say Wt0 writes t = 0 ms, Wt1 writes t = 3 ms, Wt2 writes t = 6 ms.
Also say Ww0 writes zero and Ww1 writes one.
Thread W says "I wrote zero at t = 0 ms and one at t = 3 ms."
Thread R says "I read zero at t = 6 ms."
Note also the general pattern: there are no constraints imposed by the
arrows from T to R on the arrows from W to R.
-Marko
I'm having a bit of trouble interpreting your notation. Rw1 refers to a timer read? Why is this not exactly what you expect if Ww1 actually happened at 7 ms due to some scheduling delay? Is it strange just because of the way you drew the diagram? It seems to me that to be problematic, thread W would have to read the timer AFTER the second write, but that's no longer sequentially consistent.


Hans
David Holmes
2012-08-17 23:28:55 UTC
Permalink
Well we just have to agree to disagree. If nothing in the program order or
synchronization order establishes that a read must happen between a pair of
those writes, then the read need not happen until after the last write. As I
said externally you may be able to see that the optimization was performed
but the program can not tell - and that is what counts.

David
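David's point can be sketched minimally (hypothetical class and field names): the JMM's synchronization order keeps the three volatile writes in order, but with no read ordered between them by the program, an execution in which a reader observes only the final value is perfectly legal.

```java
// Hypothetical sketch: three back-to-back volatile writes. They occur in
// this order in the synchronization order, but if no read is ordered
// between them, an execution where another thread observes only the
// final value (or the initial 0) is a legitimate outcome under the JMM.
public class CoalescedWrites {
    static volatile int w = 0;

    static void burst() {
        w = 1; // a reader thread never scheduled here...
        w = 2; // ...or here...
        w = 3; // ...can only ever observe 0 or 3
    }

    public static void main(String[] args) {
        burst();
        System.out.println("a reader now sees w = " + w);
    }
}
```

Externally one may be able to tell whether the intermediate stores were actually emitted, but as David says, the program itself cannot, and that is what counts.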
-----Original Message-----
From: Vitaly Davidovich [mailto:***@gmail.com]
Sent: Saturday, 18 August 2012 9:22 AM
To: ***@ieee.org
Cc: Marko Topolnik; concurrency-***@cs.oswego.edu
Subject: RE: [concurrency-interest] Relativity of guarantees provided by
volatile


Two volatile writes emit a store-store barrier in between, which to me
means they cannot be collapsed and must be made visible in that order (on
non-TSO this would require a h/w fence). In other words, I don't think
compiler can remove the redundant stores as if this was a non-volatile
field, where it's a perfectly valid (and good) optimization.

Sent from my phone

On Aug 17, 2012 7:18 PM, "David Holmes" <***@aapt.net.au> wrote:

How does it violate the JMM? There is nothing to establish that any read
has to have occurred prior to w=3. An external observer may say "hey if we'd
actually written w=1 at this point then the read would see 1" but that is
irrelevant. The program can not tell the other writes did not occur.

David
-----Original Message-----
From: Vitaly Davidovich [mailto:***@gmail.com]
Sent: Saturday, 18 August 2012 9:12 AM
To: ***@ieee.org
Cc: Marko Topolnik; concurrency-***@cs.oswego.edu
Subject: Re: [concurrency-interest] Relativity of guarantees provided
by volatile


I don't think the writes to w can be reduced to just the last one as
it would violate the JMM. R may only see the last one due to interleaving
though. Not sure if that's what you meant.

Sent from my phone

On Aug 17, 2012 7:03 PM, "David Holmes" <***@aapt.net.au>
wrote:

Hi Marko,

I think the "surprise" is only in the way you formulated this. Said
another
way a write takes a finite amount of time from when the instruction
starts
to execute to when the store is actually available for a read to
see.
(Similarly a read takes a finite amount of time.) So depending on
those two
times a read and write that happen "around the same time" may appear
to have
occurred in either order. But when you program with threads you
never know
the relative interleavings (or should never assume) so it makes no
difference how the program perceives the order compared to how some
external
observer might perceive it.

As for your optimization to "chunk" volatile writes, I don't see a
problem
here if you are basically asking if given:

w = 1; // w is volatile
w = 2;
w = 3;

that this could be reduced to the last write alone? I see no reason
why not.
Without some additional coordination between a reader thread and the
writer
thread, reading w==3 is a legitimate outcome. If you are thinking
about how
the hardware might chunk things then that is a different matter. We
have to
use the hardware in a way that complies with the memory model - if
the
hardware can't comply then you can't run Java on it.

David Holmes
------------
-----Original Message-----
Marko
Topolnik
Sent: Saturday, 18 August 2012 7:24 AM
Subject: [concurrency-interest] Relativity of guarantees provided
by
volatile
Consider the following synchronization order of a program
- thread R begins;
- thread R reads a volatile int sharedVar several times. Each
time it reads the value 0;
- thread R completes;
- thread W begins;
- thread W writes the sharedVar several times. Each time it
writes the value 1;
- thread W completes.
- thread R reads 0 at t = {1, 4, 7, 10};
- thread W writes 1 at t = {0, 3, 6, 9}.
As far as the Java Memory Model is concerned, there is no
contradiction between the synchronization order and the
wall-clock times, as the JMM is wall-clock agnostic. However, I
have yet to meet a single Java professional who wouldn't at least
be very surprised to hear that the specification allows this.
I understand that the SMP architecture that dominates the world
of computing today practically never takes these liberties and
makes the volatile writes visible almost instantaneously. This
may change at any time, however, especially with the advent of
massively parallel architectures that seem to be the future. For
example, an optimization technique may choose to chunk many
volatile writes together and make them visible in a single bulk
operation. This can be safely done as long as there are no
intervening read-y actions (targets of the synchronizes-with
edges as defined by JLS/JSE7 17.4.4).
1. Is there a loophole in my reasoning?
2. If there is no loophole, is there anything to worry about,
given that practically 100% of developers out there consider
guaranteed something that isn't?
-Marko
_______________________________________________
Concurrency-interest mailing list
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
_______________________________________________
Concurrency-interest mailing list
Concurrency-***@cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
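[Editor's note: David's point that the writes w=1; w=2; w=3 could collapse to the last one can be sketched directly. In the illustrative program below (not from the thread), a polling reader may legitimately record only [0, 3]; the one thing the JMM does guarantee is that the distinct values it records never regress, since volatile writes are totally ordered.]

```java
import java.util.ArrayList;
import java.util.List;

public class CollapsedWrites {
    static volatile int w = 0;

    // Poll w for ~50 ms and record each distinct value in observation order.
    static List<Integer> observe() throws InterruptedException {
        List<Integer> seen = new ArrayList<>();
        Thread reader = new Thread(() -> {
            int last = Integer.MIN_VALUE;
            long deadline = System.nanoTime() + 50_000_000L;
            while (System.nanoTime() < deadline) {
                int v = w;                          // volatile read
                if (v != last) { seen.add(v); last = v; }
            }
        });
        reader.start();
        w = 1;   // the reader may miss any of these intermediate values...
        w = 2;
        w = 3;   // ...but it can never observe them out of order
        reader.join();  // join happens-before: safe to read `seen` here
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observe());  // e.g. [0, 1, 2, 3], or just [3]
    }
}
```

Whether the intermediate values are missed by scheduling or by an optimizer collapsing the stores, the program cannot tell the difference, which is exactly David's argument.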
David Holmes
2012-08-17 23:34:21 UTC
Permalink
"In other words, once the write happens, then any read operation which
starts execution after that point in time will be subsequent to the write
and hence will see the effects of the write."

EXACTLY! Once the write has happened the read is subsequent. But that isn't
the same as saying "once the write has started". The thing is, does "coming
after" mean "starting after" or "completing after"? I would argue completing
after. In fact I seem to recall that the full formal model for the JMM
assumed that instructions executed instantaneously, so that this kind of
discussion would not arise. If this discussion were on the JMM list where it
belongs someone may have already pointed that out. :) I may be
mis-remembering of course.

David

-----Original Message-----
From: Nathan Reynolds [mailto:***@oracle.com]
Sent: Saturday, 18 August 2012 9:25 AM
To: ***@ieee.org
Cc: David Holmes; concurrency-***@cs.oswego.edu
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile


Hmm... I am getting tripped up on JMM vs Intel memory model vs HotSpot
implementation.

Dictionary.com says subsequent = occurring or coming later or after;
following in order or succession. Given that each read happens 1 unit of
time after every write, I would think that the read is "subsequent" to the
write. In other words, once the write happens, then any read operation
which starts execution after that point in time will be subsequent to the
write and hence will see the effects of the write.


Nathan Reynolds | Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering | Server Technology

On 8/17/2012 4:03 PM, David Holmes wrote:

Nathan,

Is there a synchronization order here that indicates that the read
occurs after the write? "subsequent" is not defined by wall-clock time as
externally observed.

And to batch responses :) Hotspot's implementation requires that all
writes will become visible without need for explicit barriers/fences to
force that.

David
-----Original Message-----
From: concurrency-interest-***@cs.oswego.edu
[mailto:concurrency-interest-***@cs.oswego.edu]On Behalf Of Nathan
Reynolds
Sent: Saturday, 18 August 2012 8:47 AM
To: concurrency-***@cs.oswego.edu
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
Post by Marko Topolnik
I have yet to meet a single Java professional who wouldn't at least
be very surprised to hear that the specification allows this.

Add me to the list of surprised.

Here's an excerpt from the specification.

A write to a volatile variable (§8.3.1.4) v synchronizes-with all
subsequent reads of v by any thread (where subsequent is defined according
to the synchronization order).

http://docs.oracle.com/javase/specs/jls/se5.0/html/memory.html#17.4.4

Doesn't this mean that R will always read 1 since the read occurs
after the write?


Nathan Reynolds | Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering | Server Technology

On 8/17/2012 2:24 PM, Marko Topolnik wrote:

Consider the following synchronization order of a program execution
involving a total of two threads, R and W:

- thread R begins;

- thread R reads a volatile int sharedVar several times. Each time it reads
the value 0;

- thread R completes;

- thread W begins;

- thread W writes the sharedVar several times. Each time it writes the value
1;

- thread W completes.

Now consider the wall-clock timing of the events:

- thread R reads 0 at t = {1, 4, 7, 10};
- thread W writes 1 at t = {0, 3, 6, 9}.

As far as the Java Memory Model is concerned, there is no contradiction
between the synchronization order and the wall-clock times, as the JMM is
wall-clock agnostic. However, I have yet to meet a single Java professional
who wouldn't at least be very surprised to hear that the specification
allows this.

I understand that the SMP architecture that dominates the world of computing
today practically never takes these liberties and makes the volatile writes
visible almost instantaneously. This may change at any time, however,
especially with the advent of massively parallel architectures that seem to
be the future. For example, an optimization technique may choose to chunk
many volatile writes together and make them visible in a single bulk
operation. This can be safely done as long as there are no intervening
read-y actions (targets of the synchronizes-with edges as defined by
JLS/JSE7 17.4.4).

Now, my questions are:

1. Is there a loophole in my reasoning?

2. If there is no loophole, is there anything to worry about, given that
practically 100% of developers out there consider guaranteed something that
isn't?


-Marko
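[Editor's note: the spec clause Nathan quotes (17.4.4, a volatile write synchronizes-with subsequent reads) is what underwrites the standard flag-publication idiom. A minimal sketch with illustrative class and field names: the volatile write to flag synchronizes-with the later volatile read that observes true, so the preceding plain write to data is guaranteed visible.]

```java
public class Publication {
    static int data;               // plain (non-volatile) field
    static volatile boolean flag;  // volatile guard

    static int readAfterFlag() throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;    // happens-before the volatile write below (program order)
            flag = true;  // volatile write: synchronizes-with later reads of flag
        });
        writer.start();
        while (!flag) { } // spin until the volatile read observes true
        // The read of flag==true is subsequent in the synchronization order,
        // so data == 42 is guaranteed visible here.
        int seen = data;
        writer.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(readAfterFlag()); // always prints 42
    }
}
```

Note this guarantee is conditional: it tells you what the reader sees *if* its read falls after the write in the synchronization order; it does not, by itself, bound when that happens in wall-clock terms, which is the crux of the thread.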
Zhong Yu
2012-08-17 23:41:51 UTC
Permalink
Actually we need to reverse engineer a synchronization order, from
values written and read. This is not always possible. For example,
thread W writes 1 then writes 2, and thread R reads 2 then reads 1.
(all volatile) There is no way to concoct a synchronization order
consistent with other JMM requirements. Therefore the example is
impossible under JMM.

Is OP's example possible under the JMM? I think so: we can construct a
physical model that conforms to JMM but with a slow signal speed (say
data are delivered by USPS), and OP's example can be realized on it.

Zhong Yu
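[Editor's note: Zhong Yu's reverse-engineering argument can be made mechanical for a single volatile variable: the read values must map, in order, onto a non-decreasing sequence of states of the write sequence, or no legal synchronization order exists. A sketch of such a checker, with illustrative names:]

```java
public class SyncOrderCheck {
    // State p of the variable: initial value for p == 0, else writes[p - 1].
    static int stateAt(int[] writes, int initial, int p) {
        return p == 0 ? initial : writes[p - 1];
    }

    // Can some synchronization order explain these reads? Each read must
    // match a state at or after the state the previous read matched
    // (greedy earliest match is optimal for a single variable).
    static boolean consistent(int[] writes, int[] reads, int initial) {
        int n = writes.length;
        int pos = 0;  // index into the state sequence 0..n
        for (int r : reads) {
            int p = pos;
            while (p <= n && stateAt(writes, initial, p) != r) p++;
            if (p > n) return false;  // no state can explain this read
            pos = p;
        }
        return true;
    }

    public static void main(String[] args) {
        // W writes 1 then 2; R reads 2 then 1: impossible under the JMM.
        System.out.println(consistent(new int[]{1, 2}, new int[]{2, 1}, 0)); // false
        // Marko's example: every read of 0 can precede every write of 1.
        System.out.println(consistent(new int[]{1, 1, 1, 1}, new int[]{0, 0, 0, 0}, 0)); // true
    }
}
```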
David Holmes
2012-08-18 00:03:20 UTC
Permalink
Ah I knew there was something about "immediate" in there somewhere :) So if
instructions were immediate (ie they completed as soon as they started) then
wall clock observations would be consistent with actual executions. But on
real systems instructions are not immediate so you have to take the
completion of the instruction as the point at which to make a wall clock
observation.

Aside: as well as being atomic and immediate you also need to preclude two
instructions from executing simultaneously :)

David (now departing to enjoy the weekend :) )
-----Original Message-----
From: Yuval Shavit [mailto:***@akiban.com]
Sent: Saturday, 18 August 2012 9:49 AM
To: Vitaly Davidovich
Cc: ***@ieee.org; concurrency-***@cs.oswego.edu
Subject: Re: [concurrency-interest] Relativity of guarantees provided by
volatile


(Resending -- I'd accidentally sent this only to Nathan)


So, Nathan, you're essentially saying that "subsequent" is defined in
terms of clock time. If we take that as the definition, then I agree that it
works as we'd all expect. Furthermore, I agree that that's the intuitive
definition, as well as the de-facto one. But it doesn't seem to be the one
explicitly defined by the JLS. (And if it is the one implicitly defined by
the JLS, I there's a serious bug waiting for any distributed-node JVM whose
nodes are traveling away from each other at relativistic speeds! ... though
I'm no physics expert.)


That said, 17.4.3 does imply that the reads will be viewable in a
wall-clock-sequential way, albeit informally


Sequential consistency is a very strong guarantee that is made about
visibility and ordering in an execution of a program. Within a sequentially
consistent execution, there is a total order over all individual actions
(such as reads and writes) which is consistent with the order of the
program, and each individual action is atomic and is immediately visible to
every thread.


(emphasis on "is immediately visible")


On Fri, Aug 17, 2012 at 7:41 PM, Vitaly Davidovich <***@gmail.com>
wrote:

David and you may be right in the theoretical aspect. In practice, I
can't fathom how a JVM can do this type of analysis. That's an issue that I
have with JMM's wording of happens-before -- it doesn't translate to
reality, it seems like.

Sent from my phone

On Aug 17, 2012 7:30 PM, "Yuval Shavit" <***@akiban.com> wrote:

Sure, but it could decide that the execution order is [w1, w2, w3, r].
In fact, as far as we know, the thread may have been scheduled such that
that was the clock-time ordering, too.


On Fri, Aug 17, 2012 at 7:21 PM, Vitaly Davidovich <***@gmail.com>
wrote:

Two volatile writes emit a store-store barrier in between, which to
me means they cannot be collapsed and must be made visible in that order (on
non-TSO this would require a h/w fence). In other words, I don't think
compiler can remove the redundant stores as if this was a non-volatile
field, where it's a perfectly valid (and good) optimization.

Sent from my phone

On Aug 17, 2012 7:18 PM, "David Holmes" <***@aapt.net.au>
wrote:

How does it violate the JMM? There is nothing to establish that
any read has to have occurred prior to w=3. An external observer may say
"hey if we'd actually written w=1 at this point then the read would see 1"
but that is irrelevant. The program can not tell the other writes did not
occur.

David
-----Original Message-----
From: Vitaly Davidovich [mailto:***@gmail.com]
Sent: Saturday, 18 August 2012 9:12 AM
To: ***@ieee.org
Cc: Marko Topolnik; concurrency-***@cs.oswego.edu
Subject: Re: [concurrency-interest] Relativity of guarantees
provided by volatile


I don't think the writes to w can be reduced to just the last
one as it would violate the JMM. R may only see the last one due to
interleaving though. Not sure if that's what you meant.

Sent from my phone

On Aug 17, 2012 7:03 PM, "David Holmes"
<***@aapt.net.au> wrote:

Hi Marko,

I think the "surprise" is only in the way you formulated this.
Said another
way a write takes a finite amount of time from when the
instruction starts
to execute to when the store is actually available for a read
to see.
(Similarly a read takes a finite amount of time.) So depending
on those two
times a read and write that happen "around the same time" may
appear to have
occurred in either order. But when you program with threads
you never know
the relative interleavings (or should never assume) so it
makes no
difference how the program perceives the order compared to how
some external
observer might perceive it.

As for your optimization to "chunk" volatile writes, I don't
see a problem
here if you are basically asking if given:

w = 1; // w is volatile
w = 2;
w = 3;

that this could be reduced to the last write alone? I see no
reason why not.
Without some additional coordination between a reader thread
and the writer
thread, reading w==3 is a legitimate outcome. If you are
thinking about how
the hardware might chunk things then that is a different
matter. We have to
use the hardware in a way that complies with the memory
model - if the
hardware can't comply then you can't run Java on it.

David Holmes
------------
-----Original Message-----
Of Marko
Topolnik
Sent: Saturday, 18 August 2012 7:24 AM
Subject: [concurrency-interest] Relativity of guarantees
provided by
volatile
Consider the following synchronization order of a program
- thread R begins;
- thread R reads a volatile int sharedVar several times.
Each
time it reads the value 0;
- thread R completes;
- thread W begins;
- thread W writes the sharedVar several times. Each time it
writes the value 1;
- thread W completes.
- thread R reads 0 at t = {1, 4, 7, 10};
- thread W writes 1 at t = {0, 3, 6, 9}.
As far as the Java Memory Model is concerned, there is no
contradiction between the synchronization order and the
wall-clock times, as the JMM is wall-clock agnostic.
However, I
have yet to meet a single Java professional who wouldn't at
least
be very surprised to hear that the specification allows
this.
I understand that the SMP architecture that dominates the
world
of computing today practically never takes these liberties
and
makes the volatile writes visible almost instantaneously.
This
may change at any time, however, especially with the advent
of
massively parallel architectures that seem to be the
future. For
example, an optimization technique may choose to chunk many
volatile writes together and make them visible in a single
bulk
operation. This can be safely done as long as there are no
intervening read-y actions (targets of the synchronizes-with
edges as defined by JLS/JSE7 17.4.4).
1. Is there a loophole in my reasoning?
2. If there is no loophole, is there anything to worry
about,
given that practically 100% of developers out there consider
guaranteed something that isn't?
-Marko
_______________________________________________
Concurrency-interest mailing list
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
_______________________________________________
Concurrency-interest mailing list
Concurrency-***@cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest


_______________________________________________
Concurrency-interest mailing list
Concurrency-***@cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Boehm, Hans
2012-08-18 01:21:46 UTC
Permalink
My reading is that "subsequent" in chapter 17 of the spec refers to synchronization order, except in one case in which it refers to a hypothetical sequentially consistent memory model.

I agree that the originally described behavior is allowed. There's another complication here that can explain this behavior (as can delayed stores). AFAIK, the spec does not clearly require that "wall clock time" is perfectly synchronized across threads, and implementations may, for good reason, fail to ensure that, particularly in this case. We had this discussion in connection with C++11, and that spec does require that a specific clock is consistent with happens-before. But even that spec, which is strict enough to raise some controversy, isn't sufficient here, since there is no happens-before relation between the threads.

Hans

From: concurrency-interest-***@cs.oswego.edu [mailto:concurrency-interest-***@cs.oswego.edu] On Behalf Of David Holmes
Sent: Friday, August 17, 2012 5:03 PM
To: Yuval Shavit; Vitaly Davidovich
Cc: concurrency-***@cs.oswego.edu
Subject: Re: [concurrency-interest] Relativity of guarantees provided by volatile

Ah I knew there was something about "immediate" in there somewhere :) So if instructions were immediate (ie they completed as soon as they started) then wall clock observations would be consistent with actual executions. But on real systems instructions are not immediate so you have to take the completion of the instruction as the point at which to make a wall clock observation.

Aside: as well as being atomic and immediate you also need to preclude two instructions from executing simultaneously :)

David (now departing to enjoy the weekend :) )
-----Original Message-----
From: Yuval Shavit [mailto:***@akiban.com]
Sent: Saturday, 18 August 2012 9:49 AM
To: Vitaly Davidovich
Cc: ***@ieee.org; concurrency-***@cs.oswego.edu
Subject: Re: [concurrency-interest] Relativity of guarantees provided by volatile
(Resending -- I'd accidentally sent this only to Nathan)
So, Nathan, you're essentially saying that "subsequent" is defined in terms of clock time. If we take that as the definition, then I agree that it works as we'd all expect. Furthermore, I agree that that's the intuitive definition, as well as the de-facto one. But it doesn't seem to be the one explicitly defined by the JLS. (And if it is the one implicitly defined by the JLS, I think there's a serious bug waiting for any distributed-node JVM whose nodes are traveling away from each other at relativistic speeds! ... though I'm no physics expert.)

That said, 17.4.3 does imply that the reads will be viewable in a wall-clock-sequential way, albeit informally

Sequential consistency is a very strong guarantee that is made about visibility and ordering in an execution of a program. Within a sequentially consistent execution, there is a total order over all individual actions (such as reads and writes) which is consistent with the order of the program, and each individual action is atomic and is immediately visible to every thread.

(emphasis on "is immediately visible")

On Fri, Aug 17, 2012 at 7:41 PM, Vitaly Davidovich <***@gmail.com<mailto:***@gmail.com>> wrote:

David and you may be right in the theoretical aspect. In practice, I can't fathom how a JVM can do this type of analysis. That's an issue that I have with JMM's wording of happens-before -- it doesn't translate to reality, it seems like.

Sent from my phone
On Aug 17, 2012 7:30 PM, "Yuval Shavit" <***@akiban.com<mailto:***@akiban.com>> wrote:
Sure, but it could decide that the execution order is [w1, w2, w3, r]. In fact, as far as we know, the thread may have been scheduled such that that was the clock-time ordering, too.
On Fri, Aug 17, 2012 at 7:21 PM, Vitaly Davidovich <***@gmail.com<mailto:***@gmail.com>> wrote:

Two volatile writes emit a store-store barrier in between, which to me means they cannot be collapsed and must be made visible in that order (on non-TSO this would require a h/w fence). In other words, I don't think compiler can remove the redundant stores as if this was a non-volatile field, where it's a perfectly valid (and good) optimization.

Sent from my phone
On Aug 17, 2012 7:18 PM, "David Holmes" <***@aapt.net.au<mailto:***@aapt.net.au>> wrote:
How does it violate the JMM? There is nothing to establish that any read has to have occurred prior to w=3. An external observer may say "hey if we'd actually written w=1 at this point then the read would see 1" but that is irrelevant. The program can not tell the other writes did not occur.

David
-----Original Message-----
From: Vitaly Davidovich [mailto:***@gmail.com<mailto:***@gmail.com>]
Sent: Saturday, 18 August 2012 9:12 AM
To: ***@ieee.org<mailto:***@ieee.org>
Cc: Marko Topolnik; concurrency-***@cs.oswego.edu<mailto:concurrency-***@cs.oswego.edu>
Subject: Re: [concurrency-interest] Relativity of guarantees provided by volatile

I don't think the writes to w can be reduced to just the last one as it would violate the JMM. R may only see the last one due to interleaving though. Not sure if that's what you meant.

Sent from my phone
On Aug 17, 2012 7:03 PM, "David Holmes" <***@aapt.net.au<mailto:***@aapt.net.au>> wrote:
Hi Marko,

I think the "surprise" is only in the way you formulated this. Said another
way a write takes a finite amount of time from when the instruction starts
to execute to when the store is actually available for a read to see.
(Similarly a read takes a finite amount of time.) So depending on those two
times a read and write that happen "around the same time" may appear to have
occurred in either order. But when you program with threads you never know
the relative interleavings (or should never assume) so it makes no
difference how the program perceives the order compared to how some external
observer might perceive it.

As for your optimization to "chunk" volatile writes, I don't see a problem
here if you are basically asking if given:

w = 1; // w is volatile
w = 2;
w = 3;

that this could be reduced to the last write alone? I see no reason why not.
Without some additional coordination between a reader thread and the writer
thread, reading w==3 is a legitimate outcome. If you are thinking about how
the hardware might chunk things then that is a different matter. We have to
use the hardware in a way that complies with the memory model - if the
hardware can't comply then you can't run Java on it.

David Holmes
------------
-----Original Message-----
Topolnik
Sent: Saturday, 18 August 2012 7:24 AM
Subject: [concurrency-interest] Relativity of guarantees provided by
volatile
Consider the following synchronization order of a program
- thread R begins;
- thread R reads a volatile int sharedVar several times. Each
time it reads the value 0;
- thread R completes;
- thread W begins;
- thread W writes the sharedVar several times. Each time it
writes the value 1;
- thread W completes.
- thread R reads 0 at t = {1, 4, 7, 10};
- thread W writes 1 at t = {0, 3, 6, 9}.
As far as the Java Memory Model is concerned, there is no
contradiction between the synchronization order and the
wall-clock times, as the JMM is wall-clock agnostic. However, I
have yet to meet a single Java professional who wouldn't at least
be very surprised to hear that the specification allows this.
I understand that the SMP architecture that dominates the world
of computing today practically never takes these liberties and
makes the volatile writes visible almost instantaneously. This
may change at any time, however, especially with the advent of
massively parallel architectures that seem to be the future. For
example, an optimization technique may choose to chunk many
volatile writes together and make them visible in a single bulk
operation. This can be safely done as long as there are no
intervening read-y actions (targets of the synchronizes-with
edges as defined by JLS/JSE7 17.4.4).
1. Is there a loophole in my reasoning?
2. If there is no loophole, is there anything to worry about,
given that practically 100% of developers out there consider
guaranteed something that isn't?
-Marko
_______________________________________________
Concurrency-interest mailing list
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Yuval Shavit
2012-08-18 01:39:50 UTC
Permalink
I was thinking about this on my way home, and I think Marko's initial
proposition is correct -- but that we shouldn't concern ourselves with it.

I still think it comes down to the meaning of "subsequent," but I think
wall-clock is not a good definition. In a multi-core machine, it's
conceivable that two actions would happen close enough that it's impossible
to tell which one happened first (by wall clock time). Forcing a consensus
about which came first would be expensive, and it would probably be
prohibitive on a many-core machine -- let alone a multi-node JVM (I was
only partially joking about that before).

So, what happens if we have some loose definition of "subsequent," probably
with transitivity and perhaps some other basic properties? If we do that,
then Marko's initial proposition holds. For instance, the JVM would be
allowed to have a while(someVolatileBool) thread go on forever, having been
determined to be subsequent to all other actions.

But there are lots of things which the JVM is allowed to take its time on,
but shouldn't. I haven't read anything in the JLS that requires "int i = 1"
to complete before the heat death of the universe, and yet we would surely
file a bug if it were not nearly instantaneous.

I would argue that subsequent-ness is the same. There's nothing to prevent
the JVM from delaying actions in some horrid way (w.r.t. "subsequent" and
thus visibility), but we would surely file a bug if it were not extremely
quick on a single-node machine, and appropriately quick on a distributed
JVM.
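[Editor's note: Yuval's while(someVolatileBool) scenario, sketched below with illustrative names. The spec puts no wall-clock bound on when the spinning worker sees the volatile write, but any practical JVM makes it visible almost immediately, which is exactly the "we'd file a bug anyway" point.]

```java
public class StopFlag {
    static volatile boolean running = true;

    // Returns true if a spinning worker observed the volatile write
    // within the given timeout.
    static boolean stopsWithin(long timeoutMillis) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) { }  // volatile read each iteration; cannot be hoisted
        });
        worker.start();
        Thread.sleep(10);        // let the worker spin briefly
        running = false;         // volatile write; no wall-clock guarantee in the spec
        worker.join(timeoutMillis);
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stopsWithin(1000)); // true on real JVMs
    }
}
```

Had `running` been a plain field, the JIT could legally hoist the read and spin forever; with volatile, the loop is obliged to re-read, and only the wall-clock promptness is left to quality of implementation.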

On Aug 17, 2012, at 9:24 PM, "Boehm, Hans" <***@hp.com> wrote:

My reading is that “subsequent” in chapter 17 of the spec refers to
synchronization order, except in one case in which it refers to a
hypothetical sequentially consistent memory model.



I agree that the originally described behavior is allowed. There’s
another complication here that can explain this behavior (as can delayed
stores). AFAIK, the spec does not clearly require that “wall clock time”
is perfectly synchronized across threads, and implementations may, for good
reason, fail to ensure that, particularly in this case. We had this
discussion in connection with C++11, and that spec does require that a
specific clock is consistent with happens-before. But even that spec,
which is strict enough to raise some controversy, isn’t sufficient here,
since there is no happens-before relation between the threads.



Hans



*From:* concurrency-interest-***@cs.oswego.edu [mailto:
concurrency-interest-***@cs.oswego.edu] *On Behalf Of *David Holmes
*Sent:* Friday, August 17, 2012 5:03 PM
*To:* Yuval Shavit; Vitaly Davidovich
*Cc:* concurrency-***@cs.oswego.edu
*Subject:* Re: [concurrency-interest] Relativity of guarantees provided by
volatile



Ah I knew there was something about "immediate" in there somewhere :) So if
instructions were immediate (ie they completed as soon as they started)
then wall clock observations would be consistent with actual executions.
But on real systems instructions are not immediate so you have to take the
completion of the instruction as the point at which to make a wall clock
observation.



Aside: as well as being atomic and immediate you also need to preclude two
instructions from executing simultaneously :)



David (now departing to enjoy the weekend :) )

-----Original Message-----
*From:* Yuval Shavit [mailto:***@akiban.com]
*Sent:* Saturday, 18 August 2012 9:49 AM
*To:* Vitaly Davidovich
*Cc:* ***@ieee.org; concurrency-***@cs.oswego.edu
*Subject:* Re: [concurrency-interest] Relativity of guarantees provided by
volatile

(Resending -- I'd accidentally sent this only to Nathan)

So, Nathan, you're essentially saying that "subsequent" is defined in terms
of clock time. If we take that as the definition, then I agree that it
works as we'd all expect. Furthermore, I agree that that's the intuitive
definition, as well as the de-facto one. But it doesn't seem to be the one
explicitly defined by the JLS. (And if it is the one implicitly defined by
the JLS, I think there's a serious bug waiting for any distributed-node JVM whose
nodes are traveling away from each other at relativistic speeds! ... though
I'm no physics expert.)



That said, 17.4.3 does imply that the reads will be viewable in a
wall-clock-sequential way, albeit informally



Sequential consistency is a very strong guarantee that is made about
visibility and ordering in an execution of a program. Within a sequentially
consistent execution, there is a total order over all individual actions
(such as reads and writes) which is consistent with the order of the
program, and each individual action is atomic and is immediately visible to
every thread.



(emphasis on "is immediately visible")



On Fri, Aug 17, 2012 at 7:41 PM, Vitaly Davidovich <***@gmail.com>
wrote:

David and you may be right in the theoretical aspect. In practice, I can't
fathom how a JVM can do this type of analysis. That's an issue that I have
with JMM's wording of happens-before -- it doesn't translate to reality, it
seems like.

Sent from my phone

On Aug 17, 2012 7:30 PM, "Yuval Shavit" <***@akiban.com> wrote:

Sure, but it could decide that the execution order is [w1, w2, w3, r]. In
fact, as far as we know, the thread may have been scheduled such that that
was the clock-time ordering, too.

On Fri, Aug 17, 2012 at 7:21 PM, Vitaly Davidovich <***@gmail.com>
wrote:

Two volatile writes emit a store-store barrier in between, which to me
means they cannot be collapsed and must be made visible in that order (on
non-TSO this would require a h/w fence). In other words, I don't think
compiler can remove the redundant stores as if this was a non-volatile
field, where it's a perfectly valid (and good) optimization.

Sent from my phone

On Aug 17, 2012 7:18 PM, "David Holmes" <***@aapt.net.au> wrote:

How does it violate the JMM? There is nothing to establish that any read
has to have occurred prior to w=3. An external observer may say "hey if
we'd actually written w=1 at this point then the read would see 1" but that
is irrelevant. The program can not tell the other writes did not occur.



David

-----Original Message-----
*From:* Vitaly Davidovich [mailto:***@gmail.com]
*Sent:* Saturday, 18 August 2012 9:12 AM
*To:* ***@ieee.org
*Cc:* Marko Topolnik; concurrency-***@cs.oswego.edu
*Subject:* Re: [concurrency-interest] Relativity of guarantees provided by
volatile

I don't think the writes to w can be reduced to just the last one as it
would violate the JMM. R may only see the last one due to interleaving
though. Not sure if that's what you meant.

Sent from my phone

On Aug 17, 2012 7:03 PM, "David Holmes" <***@aapt.net.au> wrote:

Hi Marko,

I think the "surprise" is only in the way you formulated this. Said another
way a write takes a finite amount of time from when the instruction starts
to execute to when the store is actually available for a read to see.
(Similarly a read takes a finite amount of time.) So depending on those two
times a read and write that happen "around the same time" may appear to have
occurred in either order. But when you program with threads you never know
the relative interleavings (or should never assume) so it makes no
difference how the program perceives the order compared to how some external
observer might perceive it.

As for your optimization to "chunk" volatile writes, I don't see a problem
here if you are basically asking if given:

w = 1; // w is volatile
w = 2;
w = 3;

that this could be reduced to the last write alone? I see no reason why not.
Without some additional coordination between a reader thread and the writer
thread, reading w==3 is a legitimate outcome. If you are thinking about how
the hardware might chunk things then that is a different matter. We have to
use the hardware in a way that complies with the memory model - if the
hardware can't comply then you can't run Java on it.
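David's three consecutive writes can be put into a runnable sketch (the class and method names here are mine, not from the thread). With no coordination beyond joining the writer, the only value the reader is entitled to is the last one; the intermediate writes of 1 and 2 need never be visible to anyone:

```java
// Sketch of the w = 1; w = 2; w = 3; example. The reader joins the writer,
// so the JMM guarantees it sees the final write, w == 3. Nothing obliges
// any thread to ever observe the intermediate values 1 and 2.
public class ChunkedWrites {
    static volatile int w = 0;

    static int runOnce() throws InterruptedException {
        Thread writer = new Thread(() -> { w = 1; w = 2; w = 3; });
        writer.start();
        writer.join();   // happens-before: all of the writer's actions precede this read
        return w;        // guaranteed 3
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnce());
    }
}
```

Whether a JVM or the hardware actually collapses the three stores is a separate question, as David notes; the sketch only shows that observing nothing but w == 3 is a legal outcome.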

David Holmes
------------
-----Original Message-----
Topolnik
Sent: Saturday, 18 August 2012 7:24 AM
Subject: [concurrency-interest] Relativity of guarantees provided by
volatile
Consider the following synchronization order of a program
execution involving a total of two threads, R and W:
- thread R begins;
- thread R reads a volatile int sharedVar several times. Each
time it reads the value 0;
- thread R completes;
- thread W begins;
- thread W writes the sharedVar several times. Each time it
writes the value 1;
- thread W completes.
Now consider the wall-clock timing of the events:
- thread R reads 0 at t = {1, 4, 7, 10};
- thread W writes 1 at t = {0, 3, 6, 9}.
As far as the Java Memory Model is concerned, there is no
contradiction between the synchronization order and the
wall-clock times, as the JMM is wall-clock agnostic. However, I
have yet to meet a single Java professional who wouldn't at least
be very surprised to hear that the specification allows this.
I understand that the SMP architecture that dominates the world
of computing today practically never takes these liberties and
makes the volatile writes visible almost instantaneously. This
may change at any time, however, especially with the advent of
massively parallel architectures that seem to be the future. For
example, an optimization technique may choose to chunk many
volatile writes together and make them visible in a single bulk
operation. This can be safely done as long as there are no
intervening read-y actions (targets of the synchronizes-with
edges as defined by JLS/JSE7 17.4.4).
Now, my questions are:
1. Is there a loophole in my reasoning?
2. If there is no loophole, is there anything to worry about,
given that practically 100% of developers out there consider as
guaranteed something that isn't?
-Marko
_______________________________________________
Concurrency-interest mailing list
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Marko Topolnik
2012-08-18 07:35:12 UTC
Permalink
In this e-mail I'm chunking my responses to several people due to the timezone differences... actually an excellent example to demonstrate my point.

I claim to have woken up ten minutes ago and read all the replies in this thread. I immediately started writing up my answer and will soon send it.

I could also have been following the thread through the night, writing up replies to several people, but in the end decided to chunk them together in a single post. Nobody can prove it, though.

In the end, all your posts happen-before this reply, but in the wall-clock sense parts of my reply may have been written 4 hours ago. Perhaps I just delayed its visibility to you until now.


Nathan, the JLS is very careful to define what it means by "subsequent", for example 17.4.4: "where 'subsequent' is defined according to the synchronization order". Basically, if you take any real execution, note the wall-clock ordering of actions, and deduce the synchronization ordering from it, you can hold that sync order fixed and construct executions with wall-clock times shifted any way you like and the JLS will not give you a frown. Even the physical logic of cause-precedes-effect can be broken, the JLS won't care as long as there is a single, consistent synchronization order to it all.

Vitaly, a JVM could easily make sure that chunking is valid at least in the simple case where it can prove there's a section of code that makes a sequence of write-y actions with no intervening read-y action. I define write-y and read-y as, respectively, the origin and the target of a synchronizes-with edge as defined by the JLS. Of course, if you imagine a very specific hardware architecture, this may turn out not to be practical or not to yield any performance advantage. But this discussion is not as specific as that.

Yuval, the existing semantics of volatile are very useful as they are, and when you add the informal promise of best effort towards timeliness, it's all you can realistically demand from an implementation. However, what is troubling is the belief of practically every developer out there that there's a hard realtime GUARANTEE of the instantaneous visibility of volatile writes. The architectures of today prove them right, but if that changes, for all we know many existing programs may start to break. See by analogy what happened when early Java developers assumed guarantees of in-order observance of inter-thread actions and were suddenly proven wrong by the next-generation, aggressively optimized JVMs. In fact, a great many devs even today assume such guarantees and keep writing faulty code.

For reference, you can also have a look at my answer on StackOverflow containing several complete Java examples that help make this discussion more concrete:

http://stackoverflow.com/a/11765634/1103872


-Marko
I was thinking about this on my way home, and I think Marko's initial proposition is correct -- but that we shouldn't concern ourselves with it.
I still think it comes down to the meaning of "subsequent," but I think wall-clock is not a good definition. In a multi-core machine, it's conceivable that two actions would happen close enough that it's impossible to tell which one happened first (by wall clock time). Forcing a consensus about which came first would be expensive, and it would probably be prohibitive on a many-core machine -- let alone a multi-node JVM (I was only partially joking about that before).
So, what happens if we have some loose definition of "subsequent," probably with transitivity and perhaps some other basic properties? If we do that, then Marko's initial proposition holds. For instance, the JVM would be allowed to have a while(someVolatileBool) thread go on forever, having been determined to be subsequent to all other actions.
But there are lots of things which the JVM is allowed to take its time on, but shouldn't. I haven't read anything in the JLS that requires "int i = 1" to complete before the heat death of the universe, and yet we would surely file a bug if it were not nearly instantaneous.
I would argue that subsequent-ness is the same. There's nothing in the JLS to prevent the JVM from delaying actions in some horrid way (w.r.t. "subsequent" and thus visibility), but we would surely file a bug if it were not extremely quick on a single-node machine, and appropriately quick on a distributed JVM.
My reading is that “subsequent” in chapter 17 of the spec refers to synchronization order, except in one case in which it refers to a hypothetical sequentially consistent memory model.
I agree that the originally described behavior is allowed. There’s another complication here that can explain this behavior (as can delayed stores). AFAIK, the spec does not clearly require that “wall clock time” is perfectly synchronized across threads, and implementations may, for good reason, fail to ensure that, particularly in this case. We had this discussion in connection with C++11, and that spec does require that a specific clock is consistent with happens-before. But even that spec, which is strict enough to raise some controversy, isn’t sufficient here, since there is no happens-before relation between the threads.
Hans
Sent: Friday, August 17, 2012 5:03 PM
To: Yuval Shavit; Vitaly Davidovich
Subject: Re: [concurrency-interest] Relativity of guarantees provided by volatile
Ah I knew there was something about "immediate" in there somewhere :) So if instructions were immediate (ie they completed as soon as they started) then wall clock observations would be consistent with actual executions. But on real systems instructions are not immediate so you have to take the completion of the instruction as the point at which to make a wall clock observation.
Aside: as well as being atomic and immediate you also need to preclude two instructions from executing simultaneously :)
David (now departing to enjoy the weekend :) )
-----Original Message-----
Sent: Saturday, 18 August 2012 9:49 AM
To: Vitaly Davidovich
Subject: Re: [concurrency-interest] Relativity of guarantees provided by volatile
(Resending -- I'd accidentally sent this only to Nathan)
So, Nathan, you're essentially saying that "subsequent" is defined in terms of clock time. If we take that as the definition, then I agree that it works as we'd all expect. Furthermore, I agree that that's the intuitive definition, as well as the de-facto one. But it doesn't seem to be the one explicitly defined by the JLS. (And if it is the one implicitly defined by the JLS, I think there's a serious bug waiting for any distributed-node JVM whose nodes are traveling away from each other at relativistic speeds! ... though I'm no physics expert.)
That said, 17.4.3 does imply that the reads will be viewable in a wall-clock-sequential way, albeit informally
Sequential consistency is a very strong guarantee that is made about visibility and ordering in an execution of a program. Within a sequentially consistent execution, there is a total order over all individual actions (such as reads and writes) which is consistent with the order of the program, and each individual action is atomic and is immediately visible to every thread.
(emphasis on "is immediately visible")
David and you may be right in the theoretical aspect. In practice, I can't fathom how a JVM can do this type of analysis. That's an issue that I have with JMM's wording of happens-before -- it doesn't translate to reality, it seems like.
Sent from my phone
Sure, but it could decide that the execution order is [w1, w2, w3, r]. In fact, as far as we know, the thread may have been scheduled such that that was the clock-time ordering, too.
Two volatile writes emit a store-store barrier in between, which to me means they cannot be collapsed and must be made visible in that order (on non-TSO this would require a h/w fence). In other words, I don't think the compiler can remove the redundant stores as if this were a non-volatile field, where it's a perfectly valid (and good) optimization.
Sent from my phone
Yuval Shavit
2012-08-18 08:24:12 UTC
Permalink
Post by Marko Topolnik
However, what is troubling is the belief of practically every developer
out there that there's a hard realtime GUARANTEE of the instantaneous
visibility of volatile writes.
Do they? I certainly believe that the read is going to see the write very
quickly, but a hard, realtime guarantee? Between the JIT, GC and other apps
that may hog CPU, I don't even depend on a realtime guarantee between "int
i = 5" and "i++".

With no evidence to back my claim, I suspect that most uses of volatiles
are for simple isDone checks or double-checked locking. The standard
patterns for both of those don't depend on a hard, realtime guarantee of
visibility. They both depend on the write being visible very quickly for
optimal performance (in the isDone case, for responsiveness of the cancel;
and in the DCL case, so that we don't need to lock the monitor), but
neither depends on it for correctness.
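Yuval's description of the typical isDone use can be sketched as follows (the names here are illustrative, not from the thread). Correctness needs only eventual visibility of the volatile write; timeliness merely determines how long the loop spins:

```java
// Sketch of the standard volatile cancellation-flag pattern. If the write
// to done took longer to become visible, awaitDone() would simply spin
// longer -- a responsiveness cost, not a correctness bug.
public class Cancellation {
    private volatile boolean done = false;

    void cancel() { done = true; }       // volatile write

    boolean awaitDone() {
        while (!done) { /* spin; volatile read each iteration */ }
        return true;                     // returns once the write is visible
    }

    public static void main(String[] args) throws InterruptedException {
        Cancellation task = new Cancellation();
        Thread worker = new Thread(task::awaitDone);
        worker.start();
        Thread.sleep(50);  // the worker may legally spin this whole time
        task.cancel();
        worker.join();     // terminates: the write is eventually visible
        System.out.println("cancelled");
    }
}
```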

If my baseless claim is right, then when we have a zillion cores per
computer in a few years, I think you might see cancellations taking a few
milliseconds longer, and DCLs will needlessly grab a few more monitors. But
not many correctness issues.

There are of course more complex uses of volatiles, such as happens-before
piggybacking for lock-free algorithms. Those are easy to get wrong even
without delays to read actions, and hopefully the people who use volatiles
for those are very advanced programmers who know that "instantaneous
guarantee" and "multithreaded correctness" don't work well together. I
don't think you'd be able to exploit this so that two threads can both
remove the same element from a correct, lockless queue -- that would create
an inconsistent ordering where one is not allowed. And if the delay in
sequential ordering means that the queue looks empty forever to a sole
consumer thread, I would say we'd be within our rights to log a bug, JLS be
damned -- just as we would be if "int i = 5" took forever.
Marko Topolnik
2012-08-18 08:44:44 UTC
Permalink
Post by Marko Topolnik
However, what is troubling is the belief of practically every developer out there that there's a hard realtime GUARANTEE of the instantaneous visibility of volatile writes.
Do they? I certainly believe that the read is going to see the write very quickly, but a hard, realtime guarantee? Between the JIT, GC and other apps that may hog CPU, I don't even depend on a realtime guarantee between "int i = 5" and "i++".
Yes, my wording was too strong. I didn't mean "hard realtime", but a hard guarantee, as opposed to a soft promise of best effort. The volatile modifier gives you two things: a hard guarantee of proper observed ordering of actions, and a *hint* towards the timely publishing of a write. This is where confusion enters---take this typical argument: "Without the volatile modifier the reading thread is not guaranteed to ever observe a write to the var". Well guess what, with the modifier it still isn't *guaranteed* to ever observe it. This fact is very counterintuitive and many people would even religiously oppose it. I cannot escape the troubling feeling this gives me---a developer should have the right intuition about his code and shouldn't shake his head in disbelief when shown any piece of code and the output that code may legally produce. Somewhere down the line this must matter.
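The two halves of this distinction show up in a small safe-publication sketch (the class and field names are mine). The ordering half is a hard guarantee: once the volatile flag is observed, the preceding plain write must also be observed. The timing half is not: nothing bounds how long the spin loop may run before the flag becomes visible:

```java
// Safe publication through a volatile flag. The guarantee of proper
// observed ordering is hard: seeing ready == true implies seeing data == 42.
// *When* ready becomes visible is the best-effort part.
public class Publication {
    static int data = 0;                     // plain, non-volatile field
    static volatile boolean ready = false;

    static int readAfterPublish() throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;      // plain write...
            ready = true;   // ...ordered before this volatile write
        });
        writer.start();
        while (!ready) { /* may spin for an unspecified time */ }
        int seen = data;    // guaranteed 42 once ready has been observed
        writer.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(readAfterPublish());
    }
}
```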

-Marko
Yuval Shavit
2012-08-18 09:10:57 UTC
Permalink
So, I agree with you that people's wording assumes some sort of timely
visibility. And you're right that the reader isn't *guaranteed* to ever
observe the write, in the same way that I'm right that ++i on a local var
isn't *guaranteed* to take fewer than 7 years. I think a JVM which
exploited either non-guarantee would be up for a critical-importance
performance bug.

I also agree that the expectation of blink-of-an-eye coherency is one that
could well be challenged in the future. Taking my example of a
distributed-node JVM, if one node is in Australia and the other is in NYC,
and the volatile read is in a lockless queue whose HB edge also pulls in all the
data in some huge element in the queue (an in-memory cache of a DB table,
say) -- then I could imagine the reader chugging along for a few seconds
pretending not to see the write, until finally it and all its implications
are sync'ed across the network, at which point it finally sees the write
and everything implied by it (ie, the huge element). That may well surprise
some programmers and generate bug reports like "why is my write to a
one-byte boolean taking 15 seconds to be seen?" -- and when those questions
crop up on SO, people on this list are going to reap some major karma
points. ;-)

In summary:
1) I agree that there's no guarantee for the write to ever be seen
2) I agree that in the future, it may take some noticeable time for the
write to be seen
3) I contend that if point 1 were exploited to the point of the write
not being seen in a reasonable timeframe (where "reasonable" depends on the
platform, includes some wiggle room, etc), that this would be a performance
(not correctness) bug against the JVM

Good night!
Zhong Yu
2012-08-18 20:38:07 UTC
Permalink
My reading is that “subsequent” in chapter 17 of the spec refers to
synchronization order, except in one case in which it refers to a
hypothetical sequentially consistent memory model.
I agree that the originally described behavior is allowed. There’s another
complication here that can explain this behavior (as can delayed stores).
AFAIK, the spec does not clearly require that “wall clock time” is
perfectly synchronized across threads, and implementations may, for good
reason, fail to ensure that, particularly in this case. We had this
discussion in connection with C++11, and that spec does require that a
specific clock is consistent with happens-before. But even that spec, which
is strict enough to raise some controversy, isn’t sufficient here, since
there is no happens-before relation between the threads.
It seems imperative that, on any physical model of JMM, among
synchronization actions, if a happens-before b, then there's a causal
relationship from a to b, and it's necessary that
a.startTime<b.endTime (on any observer's clock). In that sense, the
term happens-before is literal after all?

In OP's example, there are so(r_i, w_j), but there's no hb(r_i, w_j),
therefore there's no temporal constraints, w_j can occur before r_i.
If there were something that established hb(r_i, w_j), then the
example is impossible under JMM.

Interestingly, JMM also defines r_i and w_j to be in a data race, even
though they are all volatile.

Could it be true that, if an execution is free of data race, it is
temporally consistent, i.e. its synchronization order is consistent
with temporal order?
Marko Topolnik
2012-08-18 21:50:52 UTC
Permalink
Post by Zhong Yu
It seems imperative that, on any physical model of JMM, among
synchronization actions, if a happens-before b, then there's a causal
relationship from a to b, and it's necessary that
a.startTime<b.endTime (on any observer's clock). In that sense, the
term happens-before is literal after all?
Although admittedly exotic, theoretically the JVM might _predict_ what a future action will write and let that value be read by another thread ahead of time, creating a happens-before that travels back through time without disrupting the JMM's stipulations.
Post by Zhong Yu
In OP's example, there are so(r_i, w_j), but there's no hb(r_i, w_j),
therefore there's no temporal constraints, w_j can occur before r_i.
If there were something that established hb(r_i, w_j), then the
example is impossible under JMM.
Interestingly, JMM also defines r_i and w_j to be in a data race, even
though they are all volatile.
This is a very nice catch. Isn't in fact any read in a data race with all writes by another thread following the write it observed? That would make it impossible to write a program free of data races. In other words this sounds like a loophole in the definition of a data race.
Post by Zhong Yu
Could it be true that, if an execution is free of data race, it is
temporally consistent, i.e. its synchronization order is consistent
with temporal order.
I think that any synchronization order is temporally consistent, save for perversions like the one I mention above. Whether there are also data races shouldn't matter as they are not included in the synchronization order, anyway.

-Marko
Zhong Yu
2012-08-18 22:23:11 UTC
Permalink
Post by Marko Topolnik
Post by Zhong Yu
It seems imperative that, on any physical model of JMM, among
synchronization actions, if a happens-before b, then there's a causal
relationship from a to b, and it's necessary that
a.startTime<b.endTime (on any observer's clock). In that sense, the
term happens-before is literal after all?
Although admittedly exotic, theoretically the JVM might _predict_ what a future action will write and let that value be read by another thread ahead of time, creating a happens-before that travels back through time without disrupting the JMM's stipulations.
This is true. Probably more constraints on the physical model are
needed, e.g. a thread has no knowledge about the global program.
Boehm, Hans
2012-08-20 18:53:12 UTC
Permalink
Post by Marko Topolnik
Post by Zhong Yu
Interestingly, JMM also defines r_i and w_j to be in a data race,
even
Post by Zhong Yu
though they are all volatile.
This is a very nice catch. Isn't, in fact, any read in a data race with
every write by another thread that follows the write it observed? That
would make it impossible to write a program free of data races. In other
words, this sounds like a loophole in the definition of a data race.
That's a bug in the wording of the spec. I have a vague recollection that it was pointed out a couple of years ago, possibly even on this list. It's one of those things that should really be fixed, but is waiting for a resolution of the harder Java memory model issues.

Hans
David Holmes
2012-08-18 01:55:03 UTC
Permalink
As I keep saying, for this to "make sense" you have to make temporal
measurements when an action completes.

David
-----Original Message-----
Sent: Saturday, 18 August 2012 11:34 AM
To: Yuval Shavit
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
Each thread is a person Tx.
There's a person V managing all variables.
To make a write, Tx sends a paper mail to V. No return mail is
required, therefore a write is very short.
To make a read, Tx sends a paper mail to V, and waits for return mail.
The synchronization order is the order the mails received by V.
This seems to be a valid JMM model.
--
Now suppose thread R is very close to V, therefore reads are also very
short. (it's easier to talk about temporal order between short
actions) Suppose thread W is very far away from V.
To realize OP's example, consider the numbers are in hours, and route
W -> V takes 48 hours.
On Monday, W writes v=1, it reaches V on Wednesday. On Tuesday R reads
v=0. So the write is after read in sync order, yet write is before
read in temporal order(even considering relativity - all persons are
on Earth)
Therefore sync order doesn't have to be consistent with temporal order.
_______________________________________________
Concurrency-interest mailing list
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Zhong Yu
2012-08-18 02:20:31 UTC
Permalink
In the model, a write completes as soon as the mail is sent, the same
thread can immediately make another write. A read completes when the
return mail is received; the reading thread is suspended during the
wait.

The example is designed so that reads are short too, so we can treat
actions as points in time, to order them easily.
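The paper-mail model lends itself to a tiny arithmetic simulation, using the quoted numbers (hour-granularity timestamps, 48-hour W -> V transit): synchronization order is decided by arrival at V, temporal order by when each action completes at its own thread.

```java
// Toy simulation of the paper-mail model: sync order = arrival time at V.
// W's writes complete locally at hours 0,3,6,9 but arrive at V 48 hours
// later; R sits next to V, so its reads complete at hours 1,4,7,10.
public class MailboxModel {
    public static void main(String[] args) {
        int transitFromW = 48; // hours for mail W -> V (the thread's assumption)
        int[] writeDone = {0, 3, 6, 9};  // when each write completes at W
        int[] readDone  = {1, 4, 7, 10}; // when each read completes at R

        // Arrival times at V decide the synchronization order.
        for (int i = 0; i < 4; i++) {
            int writeArrives = writeDone[i] + transitFromW;
            boolean readBeforeWriteInSyncOrder = readDone[i] < writeArrives;
            boolean writeBeforeReadInWallClock = writeDone[i] < readDone[i];
            System.out.println("pair " + i + ": sync order read<write="
                + readBeforeWriteInSyncOrder
                + ", wall clock write<read=" + writeBeforeReadInWallClock);
        }
    }
}
```

Every pair comes out read<write in sync order yet write<read in wall-clock order, which is exactly the inversion the model is meant to exhibit.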
Post by David Holmes
As I keep saying, for this to "make sense" you have to make temporal
measurements when an action completes.
David
-----Original Message-----
Sent: Saturday, 18 August 2012 11:34 AM
To: Yuval Shavit
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
Each thread is a person Tx.
There's a person V managing all variables.
To make a write, Tx sends a paper mail to V. No return mail is
required, therefore a write is very short.
To make a read, Tx sends a paper mail to V, and waits for return mail.
The synchronization order is the order the mails received by V.
This seems to be a valid JMM model.
--
Now suppose thread R is very close to V, therefore reads are also very
short. (it's easier to talk about temporal order between short
actions) Suppose thread W is very far away from V.
To realize OP's example, consider the numbers are in hours, and route
W -> V takes 48 hours.
On Monday, W writes v=1, it reaches V on Wednesday. On Tuesday R reads
v=0. So the write is after read in sync order, yet write is before
read in temporal order(even considering relativity - all persons are
on Earth)
Therefore sync order doesn't have to be consistent with temporal order.
David Holmes
2012-08-18 02:29:39 UTC
Permalink
Well that's your choice, but it makes more sense temporally to consider
completion when the actual result of the action is available, in my
opinion.

David
-----Original Message-----
Sent: Saturday, 18 August 2012 12:21 PM
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
In the model, a write completes as soon as the mail is sent, the same
thread can immediately make another write. A read completes when the
return mail is received; the reading thread is suspended during the
wait.
The example is designed so that reads are short too, so we can treat
actions as points in time, to order them easily.
On Fri, Aug 17, 2012 at 8:55 PM, David Holmes
Post by David Holmes
As I keep saying, for this to "make sense" you have to make temporal
measurements when an action completes.
David
-----Original Message-----
Of Zhong Yu
Post by David Holmes
Sent: Saturday, 18 August 2012 11:34 AM
To: Yuval Shavit
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
Each thread is a person Tx.
There's a person V managing all variables.
To make a write, Tx sends a paper mail to V. No return mail is
required, therefore a write is very short.
To make a read, Tx sends a paper mail to V, and waits for return mail.
The synchronization order is the order the mails received by V.
This seems to be a valid JMM model.
--
Now suppose thread R is very close to V, therefore reads are also very
short. (it's easier to talk about temporal order between short
actions) Suppose thread W is very far away from V.
To realize OP's example, consider the numbers are in hours, and route
W -> V takes 48 hours.
On Monday, W writes v=1, it reaches V on Wednesday. On Tuesday R reads
v=0. So the write is after read in sync order, yet write is before
read in temporal order(even considering relativity - all persons are
on Earth)
Therefore sync order doesn't have to be consistent with temporal order.
Zhong Yu
2012-08-18 04:41:53 UTC
Permalink
Post by David Holmes
Well that's your choice, but it makes more sense temporally to consider
completion when the actual result of the action is available, in my
opinion.
We can consider another model:

A write must wait for confirmation mail from V.

V receives a write Wi, updates its state to Vi, sends mails containing
Vi to all threads. The mail is also the confirmation mail to the
writer.

Threads locally cache the last Vi they receive. A read checks the
locally cached Vi only, therefore reads are very short.

Sync order can be defined mainly based on the 'i's of actions. This
also appears to be a valid JMM model.

Then we move V close to W, far away from R. On day#1, W writes v=1 to
V. That change reaches R on day#3. On day#2 R reads v=0. Once again,
we have read<write in sync order, yet write<read in temporal order.
Post by David Holmes
David
-----Original Message-----
Sent: Saturday, 18 August 2012 12:21 PM
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
In the model, a write completes as soon as the mail is sent, the same
thread can immediately make another write. A read completes when the
return mail is received; the reading thread is suspended during the
wait.
The example is designed so that reads are short too, so we can treat
actions as points in time, to order them easily.
On Fri, Aug 17, 2012 at 8:55 PM, David Holmes
Post by David Holmes
As I keep saying, for this to "make sense" you have to make temporal
measurements when an action completes.
David
-----Original Message-----
Of Zhong Yu
Post by David Holmes
Sent: Saturday, 18 August 2012 11:34 AM
To: Yuval Shavit
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
Each thread is a person Tx.
There's a person V managing all variables.
To make a write, Tx sends a paper mail to V. No return mail is
required, therefore a write is very short.
To make a read, Tx sends a paper mail to V, and waits for return mail.
The synchronization order is the order the mails received by V.
This seems to be a valid JMM model.
--
Now suppose thread R is very close to V, therefore reads are also very
short. (it's easier to talk about temporal order between short
actions) Suppose thread W is very far away from V.
To realize OP's example, consider the numbers are in hours, and route
W -> V takes 48 hours.
On Monday, W writes v=1, it reaches V on Wednesday. On Tuesday R reads
v=0. So the write is after read in sync order, yet write is before
read in temporal order(even considering relativity - all persons are
on Earth)
Therefore sync order doesn't have to be consistent with temporal order.
Yuval Shavit
2012-08-18 05:00:23 UTC
Permalink
Post by Zhong Yu
Post by David Holmes
Well that's your choice, but it makes more sense temporally to consider
completion when the actual result of the action is available, in my
opinion.
A write must wait for confirmation mail from V.
V receives a write Wi, updates its state to Vi, sends mails containing
Vi to all threads. The mail is also the confirmation mail to the
writer.
Threads locally cache the last Vi it receives. A read checks the
locally cached Vi only, therefore reads are very short.
Sync order can be defined mainly based on the 'i's of actions. This
also appears to be a valid JMM model.
Then we move V close to W, far away from R. On day#1, W writes v=1 to
V. That change reaches R on day#3. On day#2 R reads v=0. Once again,
we have read<write in sync order, yet write<read in temporal order.
Except that in this metaphor, if v is volatile, wouldn't R's read of v
on day 2 require sending V a note requesting v's value -- at which
point V would know about the letter from W on day 1?

This model has V being the sole timekeeper. An action A' is subsequent
to another action A iff V processes A before A'. V seems like quite
the bottleneck here. In particular, two sets of actions, each with HB
relations within them but with no HBs between them, will both have to
synchronize on the same V.
Post by Zhong Yu
Post by David Holmes
David
-----Original Message-----
Sent: Saturday, 18 August 2012 12:21 PM
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
In the model, a write completes as soon as the mail is sent, the same
thread can immediately make another write. A read completes when the
return mail is received; the reading thread is suspended during the
wait.
The example is designed so that reads are short too, so we can treat
actions as points in time, to order them easily.
On Fri, Aug 17, 2012 at 8:55 PM, David Holmes
Post by David Holmes
As I keep saying, for this to "make sense" you have to make temporal
measurements when an action completes.
David
-----Original Message-----
Of Zhong Yu
Post by David Holmes
Sent: Saturday, 18 August 2012 11:34 AM
To: Yuval Shavit
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
Each thread is a person Tx.
There's a person V managing all variables.
To make a write, Tx sends a paper mail to V. No return mail is
required, therefore a write is very short.
To make a read, Tx sends a paper mail to V, and waits for return mail.
The synchronization order is the order the mails received by V.
This seems to be a valid JMM model.
--
Now suppose thread R is very close to V, therefore reads are also very
short. (it's easier to talk about temporal order between short
actions) Suppose thread W is very far away from V.
To realize OP's example, consider the numbers are in hours, and route
W -> V takes 48 hours.
On Monday, W writes v=1, it reaches V on Wednesday. On Tuesday R reads
v=0. So the write is after read in sync order, yet write is before
read in temporal order(even considering relativity - all persons are
on Earth)
Therefore sync order doesn't have to be consistent with temporal order.
Zhong Yu
2012-08-18 16:33:25 UTC
Permalink
Post by Yuval Shavit
Post by Zhong Yu
Post by David Holmes
Well that's your choice, but it makes more sense temporally to consider
completetion when the actual result of the action is available, in my
opinion.
A write must wait for confirmation mail from V.
V receives a write Wi, updates its state to Vi, sends mails containing
Vi to all threads. The mail is also the confirmation mail to the
writer.
Threads locally cache the last Vi it receives. A read checks the
locally cached Vi only, therefore reads are very short.
Sync order can be defined mainly based on the 'i's of actions. This
also appears to be a valid JMM model.
Then we move V close to W, far away from R. On day#1, W writes v=1 to
V. That change reaches R on day#3. On day#2 R reads v=0. Once again,
we have read<write in sync order, yet write<read in temporal order.
Except that in this metaphor, if v is volatile, wouldn't R's read of v
on day 2 require sending V a note requesting v's value -- at which
point V would know about the letter from W on day 1?
In my 2nd model, read only consults the local cache, so it's very fast.
Post by Yuval Shavit
This model has V being the sole timekeeper. An action A' is subsequent
to another action A iff V processes A before A'. V seems like quite
the bottleneck here. In particular, two sets of actions, each with HB
relations within them but with no HBs between them, will both have to
synchronize on the same V.
I'm not trying to design a realistic model. I'm seeking a simple
physical model that conforms to the JMM and demonstrates that the OP's
example can be realized on it; therefore the OP's example is legal under
the JMM, and, in general, sync order and temporal order can differ.

However, the discrepancy shrinks as the message transmission speed
approaches the speed of light, so we don't really need to worry too
much about it on real computer hardware.
Post by Yuval Shavit
Post by Zhong Yu
Post by David Holmes
David
-----Original Message-----
Sent: Saturday, 18 August 2012 12:21 PM
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
In the model, a write completes as soon as the mail is sent, the same
thread can immediately make another write. A read completes when the
return mail is received; the reading thread is suspended during the
wait.
The example is designed so that reads are short too, so we can treat
actions as points in time, to order them easily.
On Fri, Aug 17, 2012 at 8:55 PM, David Holmes
Post by David Holmes
As I keep saying, for this to "make sense" you have to make temporal
measurements when an action completes.
David
-----Original Message-----
Of Zhong Yu
Post by David Holmes
Sent: Saturday, 18 August 2012 11:34 AM
To: Yuval Shavit
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
Each thread is a person Tx.
There's a person V managing all variables.
To make a write, Tx sends a paper mail to V. No return mail is
required, therefore a write is very short.
To make a read, Tx sends a paper mail to V, and waits for return mail.
The synchronization order is the order the mails received by V.
This seems to be a valid JMM model.
--
Now suppose thread R is very close to V, therefore reads are also very
short. (it's easier to talk about temporal order between short
actions) Suppose thread W is very far away from V.
To realize OP's example, consider the numbers are in hours, and route
W -> V takes 48 hours.
On Monday, W writes v=1, it reaches V on Wednesday. On Tuesday R reads
v=0. So the write is after read in sync order, yet write is before
read in temporal order(even considering relativity - all persons are
on Earth)
Therefore sync order doesn't have to be consistent with temporal order.
David Holmes
2012-08-18 09:27:53 UTC
Permalink
I think this is getting a little extreme and esoteric. There is a basic
requirement that the JVM implement the Java language which means that a
putfield (for example) must update the field (a volatile one of course) -
which is a memory location. Similarly when the processor is requested to
write a value to a memory location then the value must get written to
memory. There are no time guarantees expressed about these actions but they
must happen for the JVM and the processor to be deemed to be working
correctly.

The actual latencies are a quality of implementation issue.

David
-----Original Message-----
Topolnik
Sent: Saturday, 18 August 2012 6:45 PM
To: Yuval Shavit
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
On Sat, Aug 18, 2012 at 3:35 AM, Marko Topolnik
However, what is troubling is the belief of practically every
developer out there that there's a hard realtime GUARANTEE of the
instantaneous visibility of volatile writes.
Do they? I certainly believe that the read is going to see the
write very quickly, but a hard, realtime guarantee? Between the
JIT, GC and other apps that may hog CPU, I don't even depend on a
realtime guarantee between "int i = 5" and "i++".
Yes, my wording was too strong. I didn't mean "hard realtime",
but a hard guarantee, as opposed to a soft promise of best
effort. The volatile modifier gives you two things: a hard
guarantee of proper observed ordering of actions, and a *hint*
towards the timely publishing of a write. This is where confusion
enters---take this typical argument: "Without the volatile
modifier the reading thread is not guaranteed to ever observe a
write to the var". Well guess what, with the modifier it still
isn't *guaranteed* to ever observe it. This fact is very
counterintuitive and many people would even religiously oppose
it. I cannot escape the troubling feeling this gives me---a
developer should have the right intuition about his code and
shouldn't shake his head in disbelief when shown any piece of
code and the output that code may legally produce. Somewhere down
the line this must matter.
-Marko
Marko Topolnik
2012-08-18 10:48:07 UTC
Permalink
Yes, we all agree on the fundamental points. I raised this question primarily to discuss the disconnect between the public perception and the actual specification. What is generally overlooked is that a write action does not correspond to a single physical event, but to two: (write done, write becomes visible). Since the JMM's formal model is timeless, it doesn't have to bother with the distinction, but the name "happens-before" creates a powerful illusion that time is involved in the provided guarantees.

As I said, volatile has two disparate aspects:

1. A guarantee of proper action ordering;

2. A best-effort promise of the action being published in a timely fashion.

Future software projects may put increasing stress on this distinction, however, and demand different combinations than the one volatile gives. A case in point is the lazySet method of the AtomicXxx classes. The way I understand it, it gives all the guarantees of a volatile write, just without the promise of timeliness. Theoretically, a JVM that implemented volatile exactly as lazySet would still comply with the formal model. Whether it would completely fail at being useful is not clear-cut, though, and could depend on many subtle factors and trade-offs.
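A minimal sketch of lazySet, available on the AtomicXxx classes since Java 6 (class and variable names here are illustrative): within the writing thread it behaves like an ordered store, while its publication to other threads may be deferred.

```java
import java.util.concurrent.atomic.AtomicInteger;

// lazySet writes with the ordering guarantees of a volatile write
// minus the promise of timely visibility: earlier writes cannot be
// reordered past it, but its publication to other threads may lag.
public class LazySetSketch {
    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(0);
        int[] plainData = new int[1];

        plainData[0] = 42;  // ordinary write
        counter.lazySet(1); // ordered after the write above, published lazily

        // Within the writing thread, program order still applies:
        System.out.println("counter=" + counter.get() + ", data=" + plainData[0]);
    }
}
```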
Post by David Holmes
I think this is getting a little extreme and esoteric. There is a basic
requirement that the JVM implement the Java language which means that a
putfield (for example) must update the field (a volatile one of course) -
which is a memory location. Similarly when the processor is requested to
write a value to a memory location then the value must get written to
memory. There are no time guarantees expressed about these actions but they
must happen for the JVM and the processor to be deemed to be working
correctly.
The actual latencies are a quality of implementation issue.
David
-----Original Message-----
Topolnik
Sent: Saturday, 18 August 2012 6:45 PM
To: Yuval Shavit
Subject: Re: [concurrency-interest] Relativity of guarantees provided
byvolatile
On Sat, Aug 18, 2012 at 3:35 AM, Marko Topolnik
However, what is troubling is the belief of practically every
developer out there that there's a hard realtime GUARANTEE of the
instantaneous visibility of volatile writes.
Do they? I certainly believe that the read is going to see the
write very quickly, but a hard, realtime guarantee? Between the
JIT, GC and other apps that may hog CPU, I don't even depend on a
realtime guarantee between "int i = 5" and "i++".
Yes, my wording was too strong. I didn't mean "hard realtime",
but a hard guarantee, as opposed to a soft promise of best
effort. The volatile modifier gives you two things: a hard
guarantee of proper observed ordering of actions, and a *hint*
towards the timely publishing of a write. This is where confusion
enters---take this typical argument: "Without the volatile
modifier the reading thread is not guaranteed to ever observe a
write to the var". Well guess what, with the modifier it still
isn't *guaranteed* to ever observe it. This fact is very
counterintuitive and many people would even religiously oppose
it. I cannot escape the troubling feeling this gives me---a
developer should have the right intuition about his code and
shouldn't shake his head in disbelief when shown any piece of
code and the output that code may legally produce. Somewhere down
the line this must matter.
-Marko
Jan Nielsen
2012-08-18 16:13:36 UTC
Permalink
Post by Marko Topolnik
2. If there is no loophole, is there anything to worry about, given that
practically 100% developers out there consider as guaranteed something that
isn't?
I'm part of the 99%. When I read: "A field may be declared volatile, in
which case the Java Memory Model ensures that all threads see a consistent
value for the variable" [1] I understand this to mean that for a volatile
variable 's' updated in one thread and read from another, these scenarios
are possible:

Thread0
0: volatile int s=0;

Thread0 Thread1
-----------------
1: s=1
2: s==1
3: s=2
4: s==2
5: s=3
6: s==3
-----------------
1: s==1
2: s==1
3: s==1
4: s=1
5: s=2
6: s=3
-----------------
1: s=1
2: s=2
3: s=3
4: s==3
5: s==3
6: s==3

[1] http://docs.oracle.com/javase/specs/jls/se7/html/jls-8.html#jls-8.3.1.4
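The third scenario above does hold unconditionally once a happens-before edge orders all the writes before the read; a minimal sketch using Thread.join (class name illustrative):

```java
// Sketch of the third scenario: all writes complete before any read.
// Thread.join establishes a happens-before edge, so after join() the
// main thread is guaranteed to see the final value 3 - by the JMM,
// not merely by timing.
public class ScenarioSketch {
    static volatile int s = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            s = 1;
            s = 2;
            s = 3;
        });
        writer.start();
        writer.join(); // happens-before: writer's writes are visible below
        System.out.println("s==" + s);
    }
}
```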
Marko Topolnik
2012-08-18 18:11:23 UTC
Permalink
Jan,

the only thing you need to notice is that the rows in your table are sequenced by the synchronization order, which has nothing (at least formally) to do with the wall-clock ordering of the events. In particular, you show a write as a single event when in fact it corresponds to two events separated in time: the moment Thread0 writes a value and the later moment when Thread1 observes that it was written. That later moment can be arbitrarily postponed as long as the overall happens-before order is maintained. This can in the end create the behavior I demonstrate in my original post.

That is why the JMM **does not** guarantee that Thread1 will observe at t=3 a write by Thread0 at t=0. It only guarantees that **when** it observes it, it will definitely also observe any writes preceding it, so it will see the actions of Thread0 as if they happened in program order.

-Marko
Post by Marko Topolnik
2. If there is no loophole, is there anything to worry about, given that practically 100% developers out there consider as guaranteed something that isn't?
Thread0
0: volatile int s=0;
Thread0 Thread1
-----------------
1: s=1
2: s==1
3: s=2
4: s==2
5: s=3
6: s==3
-----------------
1: s==1
2: s==1
3: s==1
4: s=1
5: s=2
6: s=3
-----------------
1: s=1
2: s=2
3: s=3
4: s==3
5: s==3
6: s==3
[1] http://docs.oracle.com/javase/specs/jls/se7/html/jls-8.html#jls-8.3.1.4