Commit 7073bc66 authored by Linus Torvalds

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this cycle are:

   - the combination of tree geometry-initialization simplifications and
     OS-jitter-reduction changes to expedited grace periods.  These two
     are stacked due to the large number of conflicts that would
     otherwise result.

   - privatize smp_mb__after_unlock_lock().

     This commit moves the definition of smp_mb__after_unlock_lock() to
     kernel/rcu/tree.h, in recognition of the fact that RCU is the only
     thing using this, that nothing else is likely to use it, and that
     it is likely to go away completely.

   - documentation updates.

   - torture-test updates.

   - misc fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  rcu,locking: Privatize smp_mb__after_unlock_lock()
  rcu: Silence lockdep false positive for expedited grace periods
  rcu: Don't disable CPU hotplug during OOM notifiers
  scripts: Make checkpatch.pl warn on expedited RCU grace periods
  rcu: Update MAINTAINERS entry
  rcu: Clarify CONFIG_RCU_EQS_DEBUG help text
  rcu: Fix backwards RCU_LOCKDEP_WARN() in synchronize_rcu_tasks()
  rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
  rcu: Make rcu_is_watching() really notrace
  cpu: Wait for RCU grace periods concurrently
  rcu: Create a synchronize_rcu_mult()
  rcu: Fix obsolete priority-boosting comment
  rcu: Use WRITE_ONCE in RCU_INIT_POINTER
  rcu: Hide RCU_NOCB_CPU behind RCU_EXPERT
  rcu: Add RCU-sched flavors of get-state and cond-sync
  rcu: Add fastpath bypassing funnel locking
  rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS
  rcu: Pull out wait_event*() condition into helper function
  documentation: Describe new expedited stall warnings
  rcu: Add stall warnings to synchronize_sched_expedited()
  ...
parents d4c90396 f612a7b1
@@ -28,7 +28,7 @@ o	You must use one of the rcu_dereference() family of primitives
 o	Avoid cancellation when using the "+" and "-" infix arithmetic
 	operators.  For example, for a given variable "x", avoid
 	"(x-x)".  There are similar arithmetic pitfalls from other
-	arithmetic operatiors, such as "(x*0)", "(x/(x+1))" or "(x%1)".
+	arithmetic operators, such as "(x*0)", "(x/(x+1))" or "(x%1)".
 	The compiler is within its rights to substitute zero for all of
 	these expressions, so that subsequent accesses no longer depend
 	on the rcu_dereference(), again possibly resulting in bugs due
......
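To make the arithmetic-cancellation pitfall above concrete, here is a minimal illustrative sketch (hypothetical struct, globals, and table; not taken from the kernel): because the index expression cancels to a compile-time constant, the access it feeds no longer depends on the pointer returned by rcu_dereference().

	struct foo {
		int val;
	};
	struct foo __rcu *gp;		/* hypothetical RCU-protected pointer */
	int table[16];			/* hypothetical lookup table */

	int buggy_reader(void)
	{
		struct foo *p;
		int idx, ret;

		rcu_read_lock();
		p = rcu_dereference(gp);
		idx = (int)((uintptr_t)p % 1);	/* always 0: compiler substitutes the constant */
		/*
		 * The table access was meant to be ordered after the load of
		 * gp via its dependency on "p", but the cancelled expression
		 * carries no such dependency, so that ordering is lost.
		 */
		ret = table[idx] + p->val;
		rcu_read_unlock();
		return ret;
	}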
@@ -26,12 +26,6 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
 	Stall-warning messages may be enabled and disabled completely via
 	/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
 
-CONFIG_RCU_CPU_STALL_INFO
-
-	This kernel configuration parameter causes the stall warning to
-	print out additional per-CPU diagnostic information, including
-	information on scheduling-clock ticks and RCU's idle-CPU tracking.
-
 RCU_STALL_DELAY_DELTA
 
 	Although the lockdep facility is extremely useful, it does add
@@ -101,15 +95,13 @@ interact.  Please note that it is not possible to entirely eliminate this
 sort of false positive without resorting to things like stop_machine(),
 which is overkill for this sort of problem.
 
-If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set,
-more information is printed with the stall-warning message, for example:
+Recent kernels will print a long form of the stall-warning message:
 
 	INFO: rcu_preempt detected stall on CPU
 	0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543
 	(t=65000 jiffies)
 
-In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is
-printed:
+In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed:
 
 	INFO: rcu_preempt detected stall on CPU
 	0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 nonlazy_posted: 25 .D
@@ -171,6 +163,23 @@ message will be about three times the interval between the beginning
 of the stall and the first message.
 
+Stall Warnings for Expedited Grace Periods
+
+If an expedited grace period detects a stall, it will place a message
+like the following in dmesg:
+
+	INFO: rcu_sched detected expedited stalls on CPUs: { 1 2 6 } 26009 jiffies s: 1043
+
+This indicates that CPUs 1, 2, and 6 have failed to respond to a
+reschedule IPI, that the expedited grace period has been going on for
+26,009 jiffies, and that the expedited grace-period sequence counter is
+1043.  The fact that this last value is odd indicates that an expedited
+grace period is in flight.
+
+It is entirely possible to see stall warnings from normal and from
+expedited grace periods at about the same time from the same run.
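The odd/even convention mentioned above can be illustrated with a small sketch (hypothetical counter and helpers, not the kernel's actual implementation): the counter is incremented once when an expedited grace period starts and once when it completes, so an odd value means a grace period is in flight.

	static unsigned long exp_seq;			/* hypothetical sequence counter */

	static bool exp_gp_in_flight(void)
	{
		return READ_ONCE(exp_seq) & 0x1;	/* odd means in flight */
	}

	static void exp_gp_start(void)
	{
		WRITE_ONCE(exp_seq, exp_seq + 1);	/* counter becomes odd */
		smp_mb();				/* order counter update before GP work */
	}

	static void exp_gp_end(void)
	{
		smp_mb();				/* order GP work before counter update */
		WRITE_ONCE(exp_seq, exp_seq + 1);	/* counter becomes even */
	}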
What Causes RCU CPU Stall Warnings?

So your kernel printed an RCU CPU stall warning.  The next question is
......
@@ -237,42 +237,26 @@ o	"ktl" is the low-order 16 bits (in hexadecimal) of the count of
 
 The output of "cat rcu/rcu_preempt/rcuexp" looks as follows:
 
-s=21872 d=21872 w=0 tf=0 wd1=0 wd2=0 n=0 sc=21872 dt=21872 dl=0 dx=21872
+s=21872 wd0=0 wd1=0 wd2=0 wd3=5 n=0 enq=0 sc=21872
 
 These fields are as follows:
 
-o	"s" is the starting sequence number.
+o	"s" is the sequence number, with an odd number indicating that
+	an expedited grace period is in progress.
 
-o	"d" is the ending sequence number.  When the starting and ending
-	numbers differ, there is an expedited grace period in progress.
-
-o	"w" is the number of times that the sequence numbers have been
-	in danger of wrapping.
-
-o	"tf" is the number of times that contention has resulted in a
-	failure to begin an expedited grace period.
-
-o	"wd1" and "wd2" are the number of times that an attempt to
-	start an expedited grace period found that someone else had
-	completed an expedited grace period that satisfies the
+o	"wd0", "wd1", "wd2", and "wd3" are the number of times that an
+	attempt to start an expedited grace period found that someone
+	else had completed an expedited grace period that satisfies the
 	attempted request.  "Our work is done."
 
-o	"n" is number of times that contention was so great that
-	the request was demoted from an expedited grace period to
-	a normal grace period.
+o	"n" is number of times that a concurrent CPU-hotplug operation
+	forced a fallback to a normal grace period.
+
+o	"enq" is the number of quiescent states still outstanding.
 
 o	"sc" is the number of times that the attempt to start a
 	new expedited grace period succeeded.
 
-o	"dt" is the number of times that we attempted to update
-	the "d" counter.
-
-o	"dl" is the number of times that we failed to update the "d"
-	counter.
-
-o	"dx" is the number of times that we succeeded in updating
-	the "d" counter.
-
 The output of "cat rcu/rcu_preempt/rcugp" looks as follows:
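The "wd" counters above reflect the "our work is done" optimization: a caller snapshots the sequence counter before trying to start an expedited grace period, and if the counter has already advanced past the snapshot, some other caller's grace period has done the required work.  A hedged sketch of that check (hypothetical counter and helper names, not the kernel's actual code):

	static unsigned long exp_seq;		/* hypothetical sequence counter */

	static unsigned long exp_seq_snap(void)
	{
		/* Round up to the end of the next full expedited grace period. */
		return (READ_ONCE(exp_seq) + 3) & ~0x1UL;
	}

	static bool exp_seq_done(unsigned long snap)
	{
		/* ULONG_CMP_GE() is the kernel's wrap-safe counter comparison. */
		return ULONG_CMP_GE(READ_ONCE(exp_seq), snap);
	}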
......
...@@ -883,7 +883,7 @@ All: lockdep-checked RCU-protected pointer access ...@@ -883,7 +883,7 @@ All: lockdep-checked RCU-protected pointer access
rcu_access_pointer rcu_access_pointer
rcu_dereference_raw rcu_dereference_raw
rcu_lockdep_assert RCU_LOCKDEP_WARN
rcu_sleep_check rcu_sleep_check
RCU_NONIDLE RCU_NONIDLE
......
@@ -3137,22 +3137,35 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			in a given burst of a callback-flood test.
 
 	rcutorture.fqs_duration= [KNL]
-			Set duration of force_quiescent_state bursts.
+			Set duration of force_quiescent_state bursts
+			in microseconds.
 
 	rcutorture.fqs_holdoff= [KNL]
-			Set holdoff time within force_quiescent_state bursts.
+			Set holdoff time within force_quiescent_state bursts
+			in microseconds.
 
 	rcutorture.fqs_stutter= [KNL]
-			Set wait time between force_quiescent_state bursts.
+			Set wait time between force_quiescent_state bursts
+			in seconds.
+
+	rcutorture.gp_cond= [KNL]
+			Use conditional/asynchronous update-side
+			primitives, if available.
 
 	rcutorture.gp_exp= [KNL]
-			Use expedited update-side primitives.
+			Use expedited update-side primitives, if available.
 
 	rcutorture.gp_normal= [KNL]
-			Use normal (non-expedited) update-side primitives.
-			If both gp_exp and gp_normal are set, do both.
-			If neither gp_exp nor gp_normal are set, still
-			do both.
+			Use normal (non-expedited) asynchronous
+			update-side primitives, if available.
+
+	rcutorture.gp_sync= [KNL]
+			Use normal (non-expedited) synchronous
+			update-side primitives, if available.  If all
+			of rcutorture.gp_cond=, rcutorture.gp_exp=,
+			rcutorture.gp_normal=, and rcutorture.gp_sync=
+			are zero, rcutorture acts as if is interpreted
+			they are all non-zero.
 
 	rcutorture.n_barrier_cbs= [KNL]
 			Set callbacks/threads for rcu_barrier() testing.
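The interaction of the four rcutorture.gp_*= parameters described above amounts to the following selection rule (illustrative sketch with hypothetical variable names, not rcutorture's actual code): if none of them is set, behave as though all of them were.

	static int gp_cond, gp_exp, gp_normal, gp_sync;	/* module parameters */

	static void choose_gp_primitives(bool *use_cond, bool *use_exp,
					 bool *use_normal, bool *use_sync)
	{
		if (!gp_cond && !gp_exp && !gp_normal && !gp_sync) {
			/* Nothing selected: exercise every available flavor. */
			*use_cond = *use_exp = *use_normal = *use_sync = true;
			return;
		}
		*use_cond = gp_cond;
		*use_exp = gp_exp;
		*use_normal = gp_normal;
		*use_sync = gp_sync;
	}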
...@@ -3179,9 +3192,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted. ...@@ -3179,9 +3192,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Set time (s) between CPU-hotplug operations, or Set time (s) between CPU-hotplug operations, or
zero to disable CPU-hotplug testing. zero to disable CPU-hotplug testing.
rcutorture.torture_runnable= [BOOT]
Start rcutorture running at boot time.
rcutorture.shuffle_interval= [KNL] rcutorture.shuffle_interval= [KNL]
Set task-shuffle interval (s). Shuffling tasks Set task-shuffle interval (s). Shuffling tasks
allows some CPUs to go into dyntick-idle mode allows some CPUs to go into dyntick-idle mode
...@@ -3222,6 +3232,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted. ...@@ -3222,6 +3232,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Test RCU's dyntick-idle handling. See also the Test RCU's dyntick-idle handling. See also the
rcutorture.shuffle_interval parameter. rcutorture.shuffle_interval parameter.
rcutorture.torture_runnable= [BOOT]
Start rcutorture running at boot time.
rcutorture.torture_type= [KNL] rcutorture.torture_type= [KNL]
Specify the RCU implementation to test. Specify the RCU implementation to test.
......
...@@ -194,22 +194,22 @@ There are some minimal guarantees that may be expected of a CPU: ...@@ -194,22 +194,22 @@ There are some minimal guarantees that may be expected of a CPU:
(*) On any given CPU, dependent memory accesses will be issued in order, with (*) On any given CPU, dependent memory accesses will be issued in order, with
respect to itself. This means that for: respect to itself. This means that for:
ACCESS_ONCE(Q) = P; smp_read_barrier_depends(); D = ACCESS_ONCE(*Q); WRITE_ONCE(Q, P); smp_read_barrier_depends(); D = READ_ONCE(*Q);
the CPU will issue the following memory operations: the CPU will issue the following memory operations:
Q = LOAD P, D = LOAD *Q Q = LOAD P, D = LOAD *Q
and always in that order. On most systems, smp_read_barrier_depends() and always in that order. On most systems, smp_read_barrier_depends()
-    does nothing, but it is required for DEC Alpha.  The ACCESS_ONCE()
-    is required to prevent compiler mischief.  Please note that you
-    should normally use something like rcu_dereference() instead of
-    open-coding smp_read_barrier_depends().
+    does nothing, but it is required for DEC Alpha.  The READ_ONCE()
+    and WRITE_ONCE() are required to prevent compiler mischief.  Please
+    note that you should normally use something like rcu_dereference()
+    instead of open-coding smp_read_barrier_depends().
(*) Overlapping loads and stores within a particular CPU will appear to be (*) Overlapping loads and stores within a particular CPU will appear to be
ordered within that CPU. This means that for: ordered within that CPU. This means that for:
a = ACCESS_ONCE(*X); ACCESS_ONCE(*X) = b; a = READ_ONCE(*X); WRITE_ONCE(*X, b);
the CPU will only issue the following sequence of memory operations: the CPU will only issue the following sequence of memory operations:
...@@ -217,7 +217,7 @@ There are some minimal guarantees that may be expected of a CPU: ...@@ -217,7 +217,7 @@ There are some minimal guarantees that may be expected of a CPU:
And for: And for:
ACCESS_ONCE(*X) = c; d = ACCESS_ONCE(*X); WRITE_ONCE(*X, c); d = READ_ONCE(*X);
the CPU will only issue: the CPU will only issue:
...@@ -228,11 +228,11 @@ There are some minimal guarantees that may be expected of a CPU: ...@@ -228,11 +228,11 @@ There are some minimal guarantees that may be expected of a CPU:
And there are a number of things that _must_ or _must_not_ be assumed: And there are a number of things that _must_ or _must_not_ be assumed:
-(*) It _must_not_ be assumed that the compiler will do what you want with
-    memory references that are not protected by ACCESS_ONCE().  Without
-    ACCESS_ONCE(), the compiler is within its rights to do all sorts
-    of "creative" transformations, which are covered in the Compiler
-    Barrier section.
+(*) It _must_not_ be assumed that the compiler will do what you want
+    with memory references that are not protected by READ_ONCE() and
+    WRITE_ONCE().  Without them, the compiler is within its rights to
+    do all sorts of "creative" transformations, which are covered in
+    the Compiler Barrier section.
(*) It _must_not_ be assumed that independent loads and stores will be issued (*) It _must_not_ be assumed that independent loads and stores will be issued
in the order given. This means that for: in the order given. This means that for:
...@@ -520,8 +520,8 @@ following sequence of events: ...@@ -520,8 +520,8 @@ following sequence of events:
{ A == 1, B == 2, C = 3, P == &A, Q == &C } { A == 1, B == 2, C = 3, P == &A, Q == &C }
B = 4; B = 4;
<write barrier> <write barrier>
ACCESS_ONCE(P) = &B WRITE_ONCE(P, &B)
Q = ACCESS_ONCE(P); Q = READ_ONCE(P);
D = *Q; D = *Q;
There's a clear data dependency here, and it would seem that by the end of the There's a clear data dependency here, and it would seem that by the end of the
...@@ -547,8 +547,8 @@ between the address load and the data load: ...@@ -547,8 +547,8 @@ between the address load and the data load:
{ A == 1, B == 2, C = 3, P == &A, Q == &C } { A == 1, B == 2, C = 3, P == &A, Q == &C }
B = 4; B = 4;
<write barrier> <write barrier>
ACCESS_ONCE(P) = &B WRITE_ONCE(P, &B);
Q = ACCESS_ONCE(P); Q = READ_ONCE(P);
<data dependency barrier> <data dependency barrier>
D = *Q; D = *Q;
...@@ -574,8 +574,8 @@ access: ...@@ -574,8 +574,8 @@ access:
{ M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 } { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
M[1] = 4; M[1] = 4;
<write barrier> <write barrier>
ACCESS_ONCE(P) = 1 WRITE_ONCE(P, 1);
Q = ACCESS_ONCE(P); Q = READ_ONCE(P);
<data dependency barrier> <data dependency barrier>
D = M[Q]; D = M[Q];
...@@ -596,10 +596,10 @@ A load-load control dependency requires a full read memory barrier, not ...@@ -596,10 +596,10 @@ A load-load control dependency requires a full read memory barrier, not
simply a data dependency barrier to make it work correctly. Consider the simply a data dependency barrier to make it work correctly. Consider the
following bit of code: following bit of code:
q = ACCESS_ONCE(a); q = READ_ONCE(a);
if (q) { if (q) {
<data dependency barrier> /* BUG: No data dependency!!! */ <data dependency barrier> /* BUG: No data dependency!!! */
p = ACCESS_ONCE(b); p = READ_ONCE(b);
} }
This will not have the desired effect because there is no actual data This will not have the desired effect because there is no actual data
...@@ -608,10 +608,10 @@ by attempting to predict the outcome in advance, so that other CPUs see ...@@ -608,10 +608,10 @@ by attempting to predict the outcome in advance, so that other CPUs see
the load from b as having happened before the load from a. In such a the load from b as having happened before the load from a. In such a
case what's actually required is: case what's actually required is:
q = ACCESS_ONCE(a); q = READ_ONCE(a);
if (q) { if (q) {
<read barrier> <read barrier>
p = ACCESS_ONCE(b); p = READ_ONCE(b);
} }
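A concrete form of the corrected snippet above, assuming hypothetical shared variables 'a' and 'b' and a placeholder do_something_with(): a conditional by itself does not order one load against a later load, so an explicit read barrier is needed.

	int a, b;		/* hypothetical shared variables */

	void reader(void)
	{
		int q, p;

		q = READ_ONCE(a);
		if (q) {
			smp_rmb();	/* order the load of a before the load of b */
			p = READ_ONCE(b);
			do_something_with(p);
		}
	}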
However, stores are not speculated. This means that ordering -is- provided However, stores are not speculated. This means that ordering -is- provided
...@@ -619,7 +619,7 @@ for load-store control dependencies, as in the following example: ...@@ -619,7 +619,7 @@ for load-store control dependencies, as in the following example:
q = READ_ONCE_CTRL(a); q = READ_ONCE_CTRL(a);
if (q) { if (q) {
ACCESS_ONCE(b) = p; WRITE_ONCE(b, p);
} }
Control dependencies pair normally with other types of barriers. That Control dependencies pair normally with other types of barriers. That
...@@ -647,11 +647,11 @@ branches of the "if" statement as follows: ...@@ -647,11 +647,11 @@ branches of the "if" statement as follows:
q = READ_ONCE_CTRL(a); q = READ_ONCE_CTRL(a);
if (q) { if (q) {
barrier(); barrier();
ACCESS_ONCE(b) = p; WRITE_ONCE(b, p);
do_something(); do_something();
} else { } else {
barrier(); barrier();
ACCESS_ONCE(b) = p; WRITE_ONCE(b, p);
do_something_else(); do_something_else();
} }
...@@ -660,12 +660,12 @@ optimization levels: ...@@ -660,12 +660,12 @@ optimization levels:
q = READ_ONCE_CTRL(a); q = READ_ONCE_CTRL(a);
barrier(); barrier();
ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */ WRITE_ONCE(b, p); /* BUG: No ordering vs. load from a!!! */
if (q) { if (q) {
/* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */ /* WRITE_ONCE(b, p); -- moved up, BUG!!! */
do_something(); do_something();
} else { } else {
/* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */ /* WRITE_ONCE(b, p); -- moved up, BUG!!! */
do_something_else(); do_something_else();
} }
...@@ -676,7 +676,7 @@ assembly code even after all compiler optimizations have been applied. ...@@ -676,7 +676,7 @@ assembly code even after all compiler optimizations have been applied.
Therefore, if you need ordering in this example, you need explicit Therefore, if you need ordering in this example, you need explicit
memory barriers, for example, smp_store_release(): memory barriers, for example, smp_store_release():
q = ACCESS_ONCE(a); q = READ_ONCE(a);
if (q) { if (q) {
smp_store_release(&b, p); smp_store_release(&b, p);
do_something(); do_something();
...@@ -690,10 +690,10 @@ ordering is guaranteed only when the stores differ, for example: ...@@ -690,10 +690,10 @@ ordering is guaranteed only when the stores differ, for example:
q = READ_ONCE_CTRL(a); q = READ_ONCE_CTRL(a);
if (q) { if (q) {
ACCESS_ONCE(b) = p; WRITE_ONCE(b, p);
do_something(); do_something();
} else { } else {
ACCESS_ONCE(b) = r; WRITE_ONCE(b, r);
do_something_else(); do_something_else();
} }
...@@ -706,10 +706,10 @@ the needed conditional. For example: ...@@ -706,10 +706,10 @@ the needed conditional. For example:
q = READ_ONCE_CTRL(a); q = READ_ONCE_CTRL(a);
if (q % MAX) { if (q % MAX) {
ACCESS_ONCE(b) = p; WRITE_ONCE(b, p);
do_something(); do_something();
} else { } else {
ACCESS_ONCE(b) = r; WRITE_ONCE(b, r);
do_something_else(); do_something_else();
} }
...@@ -718,7 +718,7 @@ equal to zero, in which case the compiler is within its rights to ...@@ -718,7 +718,7 @@ equal to zero, in which case the compiler is within its rights to
transform the above code into the following: transform the above code into the following:
q = READ_ONCE_CTRL(a); q = READ_ONCE_CTRL(a);
ACCESS_ONCE(b) = p; WRITE_ONCE(b, p);
do_something_else(); do_something_else();
Given this transformation, the CPU is not required to respect the ordering Given this transformation, the CPU is not required to respect the ordering
...@@ -731,10 +731,10 @@ one, perhaps as follows: ...@@ -731,10 +731,10 @@ one, perhaps as follows:
q = READ_ONCE_CTRL(a); q = READ_ONCE_CTRL(a);
BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
if (q % MAX) { if (q % MAX) {
ACCESS_ONCE(b) = p; WRITE_ONCE(b, p);
do_something(); do_something();
} else { } else {
ACCESS_ONCE(b) = r; WRITE_ONCE(b, r);
do_something_else(); do_something_else();
} }
...@@ -746,18 +746,18 @@ You must also be careful not to rely too much on boolean short-circuit ...@@ -746,18 +746,18 @@ You must also be careful not to rely too much on boolean short-circuit
evaluation. Consider this example: evaluation. Consider this example:
q = READ_ONCE_CTRL(a); q = READ_ONCE_CTRL(a);
if (a || 1 > 0) if (q || 1 > 0)
ACCESS_ONCE(b) = 1; WRITE_ONCE(b, 1);
Because the first condition cannot fault and the second condition is Because the first condition cannot fault and the second condition is
always true, the compiler can transform this example as following, always true, the compiler can transform this example as following,
defeating control dependency: defeating control dependency:
q = READ_ONCE_CTRL(a); q = READ_ONCE_CTRL(a);
ACCESS_ONCE(b) = 1; WRITE_ONCE(b, 1);
This example underscores the need to ensure that the compiler cannot This example underscores the need to ensure that the compiler cannot
out-guess your code. More generally, although ACCESS_ONCE() does force out-guess your code. More generally, although READ_ONCE() does force
the compiler to actually emit code for a given load, it does not force the compiler to actually emit code for a given load, it does not force
the compiler to use the results. the compiler to use the results.
...@@ -769,7 +769,7 @@ x and y both being zero: ...@@ -769,7 +769,7 @@ x and y both being zero:
======================= ======================= ======================= =======================
r1 = READ_ONCE_CTRL(x); r2 = READ_ONCE_CTRL(y); r1 = READ_ONCE_CTRL(x); r2 = READ_ONCE_CTRL(y);
if (r1 > 0) if (r2 > 0) if (r1 > 0) if (r2 > 0)
ACCESS_ONCE(y) = 1; ACCESS_ONCE(x) = 1; WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
assert(!(r1 == 1 && r2 == 1)); assert(!(r1 == 1 && r2 == 1));
...@@ -779,7 +779,7 @@ then adding the following CPU would guarantee a related assertion: ...@@ -779,7 +779,7 @@ then adding the following CPU would guarantee a related assertion:
CPU 2 CPU 2
===================== =====================
ACCESS_ONCE(x) = 2; WRITE_ONCE(x, 2);
assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */ assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */
@@ -798,8 +798,7 @@ In summary:
 
 (*) Control dependencies must be headed by READ_ONCE_CTRL().
     Or, as a much less preferable alternative, interpose
-    be headed by READ_ONCE() or an ACCESS_ONCE() read and must
-    have smp_read_barrier_depends() between this read and the
+    smp_read_barrier_depends() between a READ_ONCE() and the
     control-dependent write.
 
 (*) Control dependencies can order prior loads against later stores.
@@ -815,15 +814,16 @@ In summary:
 
 (*) Control dependencies require at least one run-time conditional
     between the prior load and the subsequent store, and this
-    conditional must involve the prior load.  If the compiler
-    is able to optimize the conditional away, it will have also
-    optimized away the ordering.  Careful use of ACCESS_ONCE() can
-    help to preserve the needed conditional.
+    conditional must involve the prior load.  If the compiler is able
+    to optimize the conditional away, it will have also optimized
+    away the ordering.  Careful use of READ_ONCE_CTRL() READ_ONCE(),
+    and WRITE_ONCE() can help to preserve the needed conditional.
 
 (*) Control dependencies require that the compiler avoid reordering the
-    dependency into nonexistence.  Careful use of ACCESS_ONCE() or
-    barrier() can help to preserve your control dependency.  Please
-    see the Compiler Barrier section for more information.
+    dependency into nonexistence.  Careful use of READ_ONCE_CTRL()
+    or smp_read_barrier_depends() can help to preserve your control
+    dependency.  Please see the Compiler Barrier section for more
+    information.
(*) Control dependencies pair normally with other types of barriers. (*) Control dependencies pair normally with other types of barriers.
...@@ -848,11 +848,11 @@ barrier, an acquire barrier, a release barrier, or a general barrier: ...@@ -848,11 +848,11 @@ barrier, an acquire barrier, a release barrier, or a general barrier:
CPU 1 CPU 2 CPU 1 CPU 2
=============== =============== =============== ===============
ACCESS_ONCE(a) = 1; WRITE_ONCE(a, 1);
<write barrier> <write barrier>
ACCESS_ONCE(b) = 2; x = ACCESS_ONCE(b); WRITE_ONCE(b, 2); x = READ_ONCE(b);
<read barrier> <read barrier>
y = ACCESS_ONCE(a); y = READ_ONCE(a);
Or: Or:
...@@ -860,7 +860,7 @@ Or: ...@@ -860,7 +860,7 @@ Or:
=============== =============================== =============== ===============================
a = 1; a = 1;
<write barrier> <write barrier>
ACCESS_ONCE(b) = &a; x = ACCESS_ONCE(b); WRITE_ONCE(b, &a); x = READ_ONCE(b);
<data dependency barrier> <data dependency barrier>
y = *x; y = *x;
...@@ -868,11 +868,11 @@ Or even: ...@@ -868,11 +868,11 @@ Or even:
CPU 1 CPU 2 CPU 1 CPU 2
=============== =============================== =============== ===============================
r1 = ACCESS_ONCE(y); r1 = READ_ONCE(y);
<general barrier> <general barrier>
ACCESS_ONCE(y) = 1; if (r2 = ACCESS_ONCE(x)) { WRITE_ONCE(y, 1); if (r2 = READ_ONCE(x)) {
<implicit control dependency> <implicit control dependency>
ACCESS_ONCE(y) = 1; WRITE_ONCE(y, 1);
} }
assert(r1 == 0 || r2 == 0); assert(r1 == 0 || r2 == 0);
...@@ -886,11 +886,11 @@ versa: ...@@ -886,11 +886,11 @@ versa:
CPU 1 CPU 2 CPU 1 CPU 2
=================== =================== =================== ===================
ACCESS_ONCE(a) = 1; }---- --->{ v = ACCESS_ONCE(c); WRITE_ONCE(a, 1); }---- --->{ v = READ_ONCE(c);
ACCESS_ONCE(b) = 2; } \ / { w = ACCESS_ONCE(d); WRITE_ONCE(b, 2); } \ / { w = READ_ONCE(d);
<write barrier> \ <read barrier> <write barrier> \ <read barrier>
ACCESS_ONCE(c) = 3; } / \ { x = ACCESS_ONCE(a); WRITE_ONCE(c, 3); } / \ { x = READ_ONCE(a);
ACCESS_ONCE(d) = 4; }---- --->{ y = ACCESS_ONCE(b); WRITE_ONCE(d, 4); }---- --->{ y = READ_ONCE(b);
EXAMPLES OF MEMORY BARRIER SEQUENCES EXAMPLES OF MEMORY BARRIER SEQUENCES
...@@ -1340,10 +1340,10 @@ compiler from moving the memory accesses either side of it to the other side: ...@@ -1340,10 +1340,10 @@ compiler from moving the memory accesses either side of it to the other side:
barrier(); barrier();
-This is a general barrier -- there are no read-read or write-write variants
-of barrier().  However, ACCESS_ONCE() can be thought of as a weak form
-for barrier() that affects only the specific accesses flagged by the
-ACCESS_ONCE().
+This is a general barrier -- there are no read-read or write-write
+variants of barrier().  However, READ_ONCE() and WRITE_ONCE() can be
+thought of as weak forms of barrier() that affect only the specific
+accesses flagged by the READ_ONCE() or WRITE_ONCE().
The barrier() function has the following effects: The barrier() function has the following effects:
...@@ -1355,9 +1355,10 @@ The barrier() function has the following effects: ...@@ -1355,9 +1355,10 @@ The barrier() function has the following effects:
(*) Within a loop, forces the compiler to load the variables used (*) Within a loop, forces the compiler to load the variables used
in that loop's conditional on each pass through that loop. in that loop's conditional on each pass through that loop.
-The ACCESS_ONCE() function can prevent any number of optimizations that,
-while perfectly safe in single-threaded code, can be fatal in concurrent
-code.  Here are some examples of these sorts of optimizations:
+The READ_ONCE() and WRITE_ONCE() functions can prevent any number of
+optimizations that, while perfectly safe in single-threaded code, can
+be fatal in concurrent code.  Here are some examples of these sorts
+of optimizations:
(*) The compiler is within its rights to reorder loads and stores (*) The compiler is within its rights to reorder loads and stores
to the same variable, and in some cases, the CPU is within its to the same variable, and in some cases, the CPU is within its
...@@ -1370,11 +1371,11 @@ code. Here are some examples of these sorts of optimizations: ...@@ -1370,11 +1371,11 @@ code. Here are some examples of these sorts of optimizations:
Might result in an older value of x stored in a[1] than in a[0]. Might result in an older value of x stored in a[1] than in a[0].
Prevent both the compiler and the CPU from doing this as follows: Prevent both the compiler and the CPU from doing this as follows:
a[0] = ACCESS_ONCE(x); a[0] = READ_ONCE(x);
a[1] = ACCESS_ONCE(x); a[1] = READ_ONCE(x);
In short, ACCESS_ONCE() provides cache coherence for accesses from In short, READ_ONCE() and WRITE_ONCE() provide cache coherence for
multiple CPUs to a single variable. accesses from multiple CPUs to a single variable.
(*) The compiler is within its rights to merge successive loads from (*) The compiler is within its rights to merge successive loads from
the same variable. Such merging can cause the compiler to "optimize" the same variable. Such merging can cause the compiler to "optimize"
...@@ -1391,9 +1392,9 @@ code. Here are some examples of these sorts of optimizations: ...@@ -1391,9 +1392,9 @@ code. Here are some examples of these sorts of optimizations:
for (;;) for (;;)
do_something_with(tmp); do_something_with(tmp);
Use ACCESS_ONCE() to prevent the compiler from doing this to you: Use READ_ONCE() to prevent the compiler from doing this to you:
while (tmp = ACCESS_ONCE(a)) while (tmp = READ_ONCE(a))
do_something_with(tmp); do_something_with(tmp);
(*) The compiler is within its rights to reload a variable, for example, (*) The compiler is within its rights to reload a variable, for example,
...@@ -1415,9 +1416,9 @@ code. Here are some examples of these sorts of optimizations: ...@@ -1415,9 +1416,9 @@ code. Here are some examples of these sorts of optimizations:
a was modified by some other CPU between the "while" statement and a was modified by some other CPU between the "while" statement and
the call to do_something_with(). the call to do_something_with().
Again, use ACCESS_ONCE() to prevent the compiler from doing this: Again, use READ_ONCE() to prevent the compiler from doing this:
while (tmp = ACCESS_ONCE(a)) while (tmp = READ_ONCE(a))
do_something_with(tmp); do_something_with(tmp);
Note that if the compiler runs short of registers, it might save Note that if the compiler runs short of registers, it might save
...@@ -1437,21 +1438,21 @@ code. Here are some examples of these sorts of optimizations: ...@@ -1437,21 +1438,21 @@ code. Here are some examples of these sorts of optimizations:
do { } while (0); do { } while (0);
-     This transformation is a win for single-threaded code because it gets
-     rid of a load and a branch.  The problem is that the compiler will
-     carry out its proof assuming that the current CPU is the only one
-     updating variable 'a'.  If variable 'a' is shared, then the compiler's
-     proof will be erroneous.  Use ACCESS_ONCE() to tell the compiler
-     that it doesn't know as much as it thinks it does:
+     This transformation is a win for single-threaded code because it
+     gets rid of a load and a branch.  The problem is that the compiler
+     will carry out its proof assuming that the current CPU is the only
+     one updating variable 'a'.  If variable 'a' is shared, then the
+     compiler's proof will be erroneous.  Use READ_ONCE() to tell the
+     compiler that it doesn't know as much as it thinks it does:
 
-	while (tmp = ACCESS_ONCE(a))
+	while (tmp = READ_ONCE(a))
 		do_something_with(tmp);
 
      But please note that the compiler is also closely watching what you
-     do with the value after the ACCESS_ONCE().  For example, suppose you
+     do with the value after the READ_ONCE().  For example, suppose you
      do the following and MAX is a preprocessor macro with the value 1:
 
-	while ((tmp = ACCESS_ONCE(a)) % MAX)
+	while ((tmp = READ_ONCE(a)) % MAX)
 		do_something_with(tmp);
Then the compiler knows that the result of the "%" operator applied Then the compiler knows that the result of the "%" operator applied
...@@ -1475,12 +1476,12 @@ code. Here are some examples of these sorts of optimizations: ...@@ -1475,12 +1476,12 @@ code. Here are some examples of these sorts of optimizations:
surprise if some other CPU might have stored to variable 'a' in the surprise if some other CPU might have stored to variable 'a' in the
meantime. meantime.
Use ACCESS_ONCE() to prevent the compiler from making this sort of Use WRITE_ONCE() to prevent the compiler from making this sort of
wrong guess: wrong guess:
ACCESS_ONCE(a) = 0; WRITE_ONCE(a, 0);
/* Code that does not store to variable a. */ /* Code that does not store to variable a. */
ACCESS_ONCE(a) = 0; WRITE_ONCE(a, 0);
(*) The compiler is within its rights to reorder memory accesses unless (*) The compiler is within its rights to reorder memory accesses unless
you tell it not to. For example, consider the following interaction you tell it not to. For example, consider the following interaction
...@@ -1509,40 +1510,43 @@ code. Here are some examples of these sorts of optimizations: ...@@ -1509,40 +1510,43 @@ code. Here are some examples of these sorts of optimizations:
} }
     If the interrupt occurs between these two statement, then
-    interrupt_handler() might be passed a garbled msg.  Use ACCESS_ONCE()
+    interrupt_handler() might be passed a garbled msg.  Use WRITE_ONCE()
     to prevent this as follows:
 
 	void process_level(void)
 	{
-		ACCESS_ONCE(msg) = get_message();
-		ACCESS_ONCE(flag) = true;
+		WRITE_ONCE(msg, get_message());
+		WRITE_ONCE(flag, true);
 	}
 
 	void interrupt_handler(void)
 	{
-		if (ACCESS_ONCE(flag))
-			process_message(ACCESS_ONCE(msg));
+		if (READ_ONCE(flag))
+			process_message(READ_ONCE(msg));
 	}
 
-    Note that the ACCESS_ONCE() wrappers in interrupt_handler()
-    are needed if this interrupt handler can itself be interrupted
-    by something that also accesses 'flag' and 'msg', for example,
-    a nested interrupt or an NMI.  Otherwise, ACCESS_ONCE() is not
-    needed in interrupt_handler() other than for documentation purposes.
-    (Note also that nested interrupts do not typically occur in modern
-    Linux kernels, in fact, if an interrupt handler returns with
-    interrupts enabled, you will get a WARN_ONCE() splat.)
-
-    You should assume that the compiler can move ACCESS_ONCE() past
-    code not containing ACCESS_ONCE(), barrier(), or similar primitives.
-
-    This effect could also be achieved using barrier(), but ACCESS_ONCE()
-    is more selective:  With ACCESS_ONCE(), the compiler need only forget
-    the contents of the indicated memory locations, while with barrier()
-    the compiler must discard the value of all memory locations that
-    it has currented cached in any machine registers.  Of course,
-    the compiler must also respect the order in which the ACCESS_ONCE()s
-    occur, though the CPU of course need not do so.
+    Note that the READ_ONCE() and WRITE_ONCE() wrappers in
+    interrupt_handler() are needed if this interrupt handler can itself
+    be interrupted by something that also accesses 'flag' and 'msg',
+    for example, a nested interrupt or an NMI.  Otherwise, READ_ONCE()
+    and WRITE_ONCE() are not needed in interrupt_handler() other than
+    for documentation purposes.  (Note also that nested interrupts
+    do not typically occur in modern Linux kernels, in fact, if an
+    interrupt handler returns with interrupts enabled, you will get a
+    WARN_ONCE() splat.)
+
+    You should assume that the compiler can move READ_ONCE() and
+    WRITE_ONCE() past code not containing READ_ONCE(), WRITE_ONCE(),
+    barrier(), or similar primitives.
+
+    This effect could also be achieved using barrier(), but READ_ONCE()
+    and WRITE_ONCE() are more selective:  With READ_ONCE() and
+    WRITE_ONCE(), the compiler need only forget the contents of the
+    indicated memory locations, while with barrier() the compiler must
+    discard the value of all memory locations that it has currented
+    cached in any machine registers.  Of course, the compiler must also
+    respect the order in which the READ_ONCE()s and WRITE_ONCE()s occur,
+    though the CPU of course need not do so.
(*) The compiler is within its rights to invent stores to a variable, (*) The compiler is within its rights to invent stores to a variable,
as in the following example: as in the following example:
...@@ -1562,16 +1566,16 @@ code. Here are some examples of these sorts of optimizations: ...@@ -1562,16 +1566,16 @@ code. Here are some examples of these sorts of optimizations:
a branch. Unfortunately, in concurrent code, this optimization a branch. Unfortunately, in concurrent code, this optimization
could cause some other CPU to see a spurious value of 42 -- even could cause some other CPU to see a spurious value of 42 -- even
if variable 'a' was never zero -- when loading variable 'b'. if variable 'a' was never zero -- when loading variable 'b'.
Use ACCESS_ONCE() to prevent this as follows: Use WRITE_ONCE() to prevent this as follows:
if (a) if (a)
ACCESS_ONCE(b) = a; WRITE_ONCE(b, a);
else else
ACCESS_ONCE(b) = 42; WRITE_ONCE(b, 42);
The compiler can also invent loads. These are usually less The compiler can also invent loads. These are usually less
damaging, but they can result in cache-line bouncing and thus in damaging, but they can result in cache-line bouncing and thus in
poor performance and scalability. Use ACCESS_ONCE() to prevent poor performance and scalability. Use READ_ONCE() to prevent
invented loads. invented loads.
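A short sketch of the invented-load hazard just mentioned (hypothetical shared variable 'a' and placeholder do_something_with()): if the compiler runs short of registers, it may reload 'a' for each use, so two tests that were meant to examine a single value can see two different values; READ_ONCE() forces exactly one load.

	int a;			/* hypothetical shared variable */

	void consumer(void)
	{
		int tmp;

		tmp = READ_ONCE(a);		/* exactly one load of 'a' */
		if (tmp > 0 && tmp < 10)	/* both tests use the same value */
			do_something_with(tmp);
	}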
(*) For aligned memory locations whose size allows them to be accessed (*) For aligned memory locations whose size allows them to be accessed
...@@ -1590,9 +1594,9 @@ code. Here are some examples of these sorts of optimizations: ...@@ -1590,9 +1594,9 @@ code. Here are some examples of these sorts of optimizations:
This optimization can therefore be a win in single-threaded code. This optimization can therefore be a win in single-threaded code.
In fact, a recent bug (since fixed) caused GCC to incorrectly use In fact, a recent bug (since fixed) caused GCC to incorrectly use
this optimization in a volatile store. In the absence of such bugs, this optimization in a volatile store. In the absence of such bugs,
use of ACCESS_ONCE() prevents store tearing in the following example: use of WRITE_ONCE() prevents store tearing in the following example:
ACCESS_ONCE(p) = 0x00010002; WRITE_ONCE(p, 0x00010002);
Use of packed structures can also result in load and store tearing, Use of packed structures can also result in load and store tearing,
as in this example: as in this example:
...@@ -1609,22 +1613,23 @@ code. Here are some examples of these sorts of optimizations: ...@@ -1609,22 +1613,23 @@ code. Here are some examples of these sorts of optimizations:
foo2.b = foo1.b; foo2.b = foo1.b;
foo2.c = foo1.c; foo2.c = foo1.c;
-    Because there are no ACCESS_ONCE() wrappers and no volatile markings,
-    the compiler would be well within its rights to implement these three
-    assignment statements as a pair of 32-bit loads followed by a pair
-    of 32-bit stores.  This would result in load tearing on 'foo1.b'
-    and store tearing on 'foo2.b'.  ACCESS_ONCE() again prevents tearing
-    in this example:
+    Because there are no READ_ONCE() or WRITE_ONCE() wrappers and no
+    volatile markings, the compiler would be well within its rights to
+    implement these three assignment statements as a pair of 32-bit
+    loads followed by a pair of 32-bit stores.  This would result in
+    load tearing on 'foo1.b' and store tearing on 'foo2.b'.  READ_ONCE()
+    and WRITE_ONCE() again prevent tearing in this example:
 
 	foo2.a = foo1.a;
-	ACCESS_ONCE(foo2.b) = ACCESS_ONCE(foo1.b);
+	WRITE_ONCE(foo2.b, READ_ONCE(foo1.b));
 	foo2.c = foo1.c;
 
-All that aside, it is never necessary to use ACCESS_ONCE() on a variable
-that has been marked volatile.  For example, because 'jiffies' is marked
-volatile, it is never necessary to say ACCESS_ONCE(jiffies).  The reason
-for this is that ACCESS_ONCE() is implemented as a volatile cast, which
-has no effect when its argument is already marked volatile.
+All that aside, it is never necessary to use READ_ONCE() and
+WRITE_ONCE() on a variable that has been marked volatile.  For example,
+because 'jiffies' is marked volatile, it is never necessary to
+say READ_ONCE(jiffies).  The reason for this is that READ_ONCE() and
+WRITE_ONCE() are implemented as volatile casts, which has no effect when
+its argument is already marked volatile.
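The volatile-cast implementation mentioned above can be sketched as follows (a deliberate simplification: the kernel's real READ_ONCE() and WRITE_ONCE() also handle the different access sizes):

	#define SKETCH_READ_ONCE(x)		(*(volatile typeof(x) *)&(x))
	#define SKETCH_WRITE_ONCE(x, val)	(*(volatile typeof(x) *)&(x) = (val))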
Please note that these compiler barriers have no direct effect on the CPU, Please note that these compiler barriers have no direct effect on the CPU,
which may then reorder things however it wishes. which may then reorder things however it wishes.
...@@ -1646,14 +1651,15 @@ The Linux kernel has eight basic CPU memory barriers: ...@@ -1646,14 +1651,15 @@ The Linux kernel has eight basic CPU memory barriers:
All memory barriers except the data dependency barriers imply a compiler All memory barriers except the data dependency barriers imply a compiler
barrier. Data dependencies do not impose any additional compiler ordering. barrier. Data dependencies do not impose any additional compiler ordering.
-Aside: In the case of data dependencies, the compiler would be expected to
-issue the loads in the correct order (eg. `a[b]` would have to load the value
-of b before loading a[b]), however there is no guarantee in the C specification
-that the compiler may not speculate the value of b (eg. is equal to 1) and load
-a before b (eg. tmp = a[1]; if (b != 1) tmp = a[b]; ).  There is also the
-problem of a compiler reloading b after having loaded a[b], thus having a newer
-copy of b than a[b].  A consensus has not yet been reached about these problems,
-however the ACCESS_ONCE macro is a good place to start looking.
+Aside: In the case of data dependencies, the compiler would be expected
+to issue the loads in the correct order (eg. `a[b]` would have to load
+the value of b before loading a[b]), however there is no guarantee in
+the C specification that the compiler may not speculate the value of b
+(eg. is equal to 1) and load a before b (eg. tmp = a[1]; if (b != 1)
+tmp = a[b]; ).  There is also the problem of a compiler reloading b after
+having loaded a[b], thus having a newer copy of b than a[b].  A consensus
+has not yet been reached about these problems, however the READ_ONCE()
+macro is a good place to start looking.
SMP memory barriers are reduced to compiler barriers on uniprocessor compiled SMP memory barriers are reduced to compiler barriers on uniprocessor compiled
systems because it is assumed that a CPU will appear to be self-consistent, systems because it is assumed that a CPU will appear to be self-consistent,
...@@ -1848,15 +1854,10 @@ RELEASE are to the same lock variable, but only from the perspective of ...@@ -1848,15 +1854,10 @@ RELEASE are to the same lock variable, but only from the perspective of
another CPU not holding that lock. In short, a ACQUIRE followed by an another CPU not holding that lock. In short, a ACQUIRE followed by an
RELEASE may -not- be assumed to be a full memory barrier. RELEASE may -not- be assumed to be a full memory barrier.
-Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
-imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
-pair to produce a full barrier, the ACQUIRE can be followed by an
-smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
-if either (a) the RELEASE and the ACQUIRE are executed by the same
-CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable.
-The smp_mb__after_unlock_lock() primitive is free on many architectures.
-Without smp_mb__after_unlock_lock(), the CPU's execution of the critical
-sections corresponding to the RELEASE and the ACQUIRE can cross, so that:
+Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
+not imply a full memory barrier.  Therefore, the CPU's execution of the
+critical sections corresponding to the RELEASE and the ACQUIRE can cross,
+so that:
*A = a; *A = a;
RELEASE M RELEASE M
...@@ -1894,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding the deadlock. ...@@ -1894,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding the deadlock.
a sleep-unlock race, but the locking primitive needs to resolve a sleep-unlock race, but the locking primitive needs to resolve
such races properly in any case. such races properly in any case.
With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
For example, with the following code, the store to *A will always be
seen by other CPUs before the store to *B:
*A = a;
RELEASE M
ACQUIRE N
smp_mb__after_unlock_lock();
*B = b;
The operations will always occur in one of the following orders:
STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B
If the RELEASE and ACQUIRE were instead both operating on the same lock
variable, only the first of these alternatives can occur. In addition,
the more strongly ordered systems may rule out some of the above orders.
But in any case, as noted earlier, the smp_mb__after_unlock_lock()
ensures that the store to *A will always be seen as happening before
the store to *B.
Locks and semaphores may not provide any guarantee of ordering on UP compiled Locks and semaphores may not provide any guarantee of ordering on UP compiled
systems, and so cannot be counted on in such a situation to actually achieve systems, and so cannot be counted on in such a situation to actually achieve
anything at all - especially with respect to I/O accesses - unless combined anything at all - especially with respect to I/O accesses - unless combined
...@@ -2126,12 +2104,12 @@ three CPUs; then should the following sequence of events occur: ...@@ -2126,12 +2104,12 @@ three CPUs; then should the following sequence of events occur:
CPU 1 CPU 2 CPU 1 CPU 2
=============================== =============================== =============================== ===============================
ACCESS_ONCE(*A) = a; ACCESS_ONCE(*E) = e; WRITE_ONCE(*A, a); WRITE_ONCE(*E, e);
ACQUIRE M ACQUIRE Q ACQUIRE M ACQUIRE Q
ACCESS_ONCE(*B) = b; ACCESS_ONCE(*F) = f; WRITE_ONCE(*B, b); WRITE_ONCE(*F, f);
ACCESS_ONCE(*C) = c; ACCESS_ONCE(*G) = g; WRITE_ONCE(*C, c); WRITE_ONCE(*G, g);
RELEASE M RELEASE Q RELEASE M RELEASE Q
ACCESS_ONCE(*D) = d; ACCESS_ONCE(*H) = h; WRITE_ONCE(*D, d); WRITE_ONCE(*H, h);
Then there is no guarantee as to what order CPU 3 will see the accesses to *A Then there is no guarantee as to what order CPU 3 will see the accesses to *A
through *H occur in, other than the constraints imposed by the separate locks through *H occur in, other than the constraints imposed by the separate locks
...@@ -2147,40 +2125,6 @@ But it won't see any of: ...@@ -2147,40 +2125,6 @@ But it won't see any of:
*E, *F or *G following RELEASE Q *E, *F or *G following RELEASE Q
However, if the following occurs:
CPU 1 CPU 2
=============================== ===============================
ACCESS_ONCE(*A) = a;
ACQUIRE M [1]
ACCESS_ONCE(*B) = b;
ACCESS_ONCE(*C) = c;
RELEASE M [1]
ACCESS_ONCE(*D) = d; ACCESS_ONCE(*E) = e;
ACQUIRE M [2]
smp_mb__after_unlock_lock();
ACCESS_ONCE(*F) = f;
ACCESS_ONCE(*G) = g;
RELEASE M [2]
ACCESS_ONCE(*H) = h;
CPU 3 might see:
*E, ACQUIRE M [1], *C, *B, *A, RELEASE M [1],
ACQUIRE M [2], *H, *F, *G, RELEASE M [2], *D
But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
*B, *C, *D, *F, *G or *H preceding ACQUIRE M [1]
*A, *B or *C following RELEASE M [1]
*F, *G or *H preceding ACQUIRE M [2]
*A, *B, *C, *E, *F or *G following RELEASE M [2]
Note that the smp_mb__after_unlock_lock() is critically important
here: Without it CPU 3 might see some of the above orderings.
Without smp_mb__after_unlock_lock(), the accesses are not guaranteed
to be seen in order unless CPU 3 holds lock M.
ACQUIRES VS I/O ACCESSES ACQUIRES VS I/O ACCESSES
------------------------ ------------------------
...@@ -2881,11 +2825,11 @@ A programmer might take it for granted that the CPU will perform memory ...@@ -2881,11 +2825,11 @@ A programmer might take it for granted that the CPU will perform memory
operations in exactly the order specified, so that if the CPU is, for example, operations in exactly the order specified, so that if the CPU is, for example,
given the following piece of code to execute: given the following piece of code to execute:
a = ACCESS_ONCE(*A); a = READ_ONCE(*A);
ACCESS_ONCE(*B) = b; WRITE_ONCE(*B, b);
c = ACCESS_ONCE(*C); c = READ_ONCE(*C);
d = ACCESS_ONCE(*D); d = READ_ONCE(*D);
ACCESS_ONCE(*E) = e; WRITE_ONCE(*E, e);
they would then expect that the CPU will complete the memory operation for each they would then expect that the CPU will complete the memory operation for each
instruction before moving on to the next one, leading to a definite sequence of instruction before moving on to the next one, leading to a definite sequence of
...@@ -2932,12 +2876,12 @@ However, it is guaranteed that a CPU will be self-consistent: it will see its ...@@ -2932,12 +2876,12 @@ However, it is guaranteed that a CPU will be self-consistent: it will see its
_own_ accesses appear to be correctly ordered, without the need for a memory _own_ accesses appear to be correctly ordered, without the need for a memory
barrier. For instance with the following code: barrier. For instance with the following code:
U = ACCESS_ONCE(*A); U = READ_ONCE(*A);
ACCESS_ONCE(*A) = V; WRITE_ONCE(*A, V);
ACCESS_ONCE(*A) = W; WRITE_ONCE(*A, W);
X = ACCESS_ONCE(*A); X = READ_ONCE(*A);
ACCESS_ONCE(*A) = Y; WRITE_ONCE(*A, Y);
Z = ACCESS_ONCE(*A); Z = READ_ONCE(*A);
and assuming no intervention by an external influence, it can be assumed that and assuming no intervention by an external influence, it can be assumed that
the final result will appear to be: the final result will appear to be:
...@@ -2953,13 +2897,14 @@ accesses: ...@@ -2953,13 +2897,14 @@ accesses:
U=LOAD *A, STORE *A=V, STORE *A=W, X=LOAD *A, STORE *A=Y, Z=LOAD *A U=LOAD *A, STORE *A=V, STORE *A=W, X=LOAD *A, STORE *A=Y, Z=LOAD *A
 in that order, but, without intervention, the sequence may have almost any
-combination of elements combined or discarded, provided the program's view of
-the world remains consistent.  Note that ACCESS_ONCE() is -not- optional
-in the above example, as there are architectures where a given CPU might
-reorder successive loads to the same location.  On such architectures,
-ACCESS_ONCE() does whatever is necessary to prevent this, for example, on
-Itanium the volatile casts used by ACCESS_ONCE() cause GCC to emit the
-special ld.acq and st.rel instructions that prevent such reordering.
+combination of elements combined or discarded, provided the program's view
+of the world remains consistent.  Note that READ_ONCE() and WRITE_ONCE()
+are -not- optional in the above example, as there are architectures
+where a given CPU might reorder successive loads to the same location.
+On such architectures, READ_ONCE() and WRITE_ONCE() do whatever is
+necessary to prevent this, for example, on Itanium the volatile casts
+used by READ_ONCE() and WRITE_ONCE() cause GCC to emit the special ld.acq
+and st.rel instructions (respectively) that prevent such reordering.
The compiler may also combine, discard or defer elements of the sequence before The compiler may also combine, discard or defer elements of the sequence before
the CPU even sees them. the CPU even sees them.
...@@ -2973,13 +2918,14 @@ may be reduced to: ...@@ -2973,13 +2918,14 @@ may be reduced to:
*A = W; *A = W;
since, without either a write barrier or an ACCESS_ONCE(), it can be since, without either a write barrier or an WRITE_ONCE(), it can be
assumed that the effect of the storage of V to *A is lost. Similarly: assumed that the effect of the storage of V to *A is lost. Similarly:
*A = Y; *A = Y;
Z = *A; Z = *A;
may, without a memory barrier or an ACCESS_ONCE(), be reduced to: may, without a memory barrier or an READ_ONCE() and WRITE_ONCE(), be
reduced to:
*A = Y; *A = Y;
Z = Y; Z = Y;
......
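Putting several of the primitives discussed in the memory-barriers.txt changes above together, here is a minimal flag-and-message handoff sketch (hypothetical globals and a placeholder do_something_with()): the release store orders the message store before the flag store, and the acquire load pairs with it on the consumer side.

	int msg;
	int flag;

	void producer(void)
	{
		WRITE_ONCE(msg, 42);
		smp_store_release(&flag, 1);	/* publish: msg is visible before flag */
	}

	void consumer(void)
	{
		if (smp_load_acquire(&flag))	/* pairs with the release store above */
			do_something_with(READ_ONCE(msg));
	}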
...@@ -8518,7 +8518,7 @@ M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> ...@@ -8518,7 +8518,7 @@ M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
M: Josh Triplett <josh@joshtriplett.org> M: Josh Triplett <josh@joshtriplett.org>
R: Steven Rostedt <rostedt@goodmis.org> R: Steven Rostedt <rostedt@goodmis.org>
R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
R: Lai Jiangshan <laijs@cn.fujitsu.com> R: Lai Jiangshan <jiangshanlai@gmail.com>
L: linux-kernel@vger.kernel.org L: linux-kernel@vger.kernel.org
S: Supported S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
...@@ -8545,7 +8545,7 @@ M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> ...@@ -8545,7 +8545,7 @@ M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
M: Josh Triplett <josh@joshtriplett.org> M: Josh Triplett <josh@joshtriplett.org>
R: Steven Rostedt <rostedt@goodmis.org> R: Steven Rostedt <rostedt@goodmis.org>
R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
R: Lai Jiangshan <laijs@cn.fujitsu.com> R: Lai Jiangshan <jiangshanlai@gmail.com>
L: linux-kernel@vger.kernel.org L: linux-kernel@vger.kernel.org
W: http://www.rdrop.com/users/paulmck/RCU/ W: http://www.rdrop.com/users/paulmck/RCU/
S: Supported S: Supported
...@@ -9417,7 +9417,7 @@ F: include/linux/sl?b*.h ...@@ -9417,7 +9417,7 @@ F: include/linux/sl?b*.h
F: mm/sl?b* F: mm/sl?b*
SLEEPABLE READ-COPY UPDATE (SRCU) SLEEPABLE READ-COPY UPDATE (SRCU)
M: Lai Jiangshan <laijs@cn.fujitsu.com> M: Lai Jiangshan <jiangshanlai@gmail.com>
M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
M: Josh Triplett <josh@joshtriplett.org> M: Josh Triplett <josh@joshtriplett.org>
R: Steven Rostedt <rostedt@goodmis.org> R: Steven Rostedt <rostedt@goodmis.org>
......
...@@ -28,8 +28,6 @@ ...@@ -28,8 +28,6 @@
#include <asm/synch.h> #include <asm/synch.h>
#include <asm/ppc-opcode.h> #include <asm/ppc-opcode.h>
#define smp_mb__after_unlock_lock() smp_mb() /* Full ordering for lock. */
#ifdef CONFIG_PPC64 #ifdef CONFIG_PPC64
/* use 0x800000yy when locked, where yy == CPU number */ /* use 0x800000yy when locked, where yy == CPU number */
#ifdef __BIG_ENDIAN__ #ifdef __BIG_ENDIAN__
......
...@@ -54,9 +54,9 @@ static DEFINE_MUTEX(mce_chrdev_read_mutex); ...@@ -54,9 +54,9 @@ static DEFINE_MUTEX(mce_chrdev_read_mutex);
#define rcu_dereference_check_mce(p) \ #define rcu_dereference_check_mce(p) \
({ \ ({ \
rcu_lockdep_assert(rcu_read_lock_sched_held() || \ RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
lockdep_is_held(&mce_chrdev_read_mutex), \ !lockdep_is_held(&mce_chrdev_read_mutex), \
"suspicious rcu_dereference_check_mce() usage"); \ "suspicious rcu_dereference_check_mce() usage"); \
smp_load_acquire(&(p)); \ smp_load_acquire(&(p)); \
}) })
......
...@@ -136,7 +136,7 @@ enum ctx_state ist_enter(struct pt_regs *regs) ...@@ -136,7 +136,7 @@ enum ctx_state ist_enter(struct pt_regs *regs)
preempt_count_add(HARDIRQ_OFFSET); preempt_count_add(HARDIRQ_OFFSET);
/* This code is a bit fragile. Test it. */ /* This code is a bit fragile. Test it. */
rcu_lockdep_assert(rcu_is_watching(), "ist_enter didn't work"); RCU_LOCKDEP_WARN(!rcu_is_watching(), "ist_enter didn't work");
return prev_state; return prev_state;
} }
......
...@@ -110,8 +110,8 @@ static DEFINE_MUTEX(dev_opp_list_lock); ...@@ -110,8 +110,8 @@ static DEFINE_MUTEX(dev_opp_list_lock);
#define opp_rcu_lockdep_assert() \ #define opp_rcu_lockdep_assert() \
do { \ do { \
rcu_lockdep_assert(rcu_read_lock_held() || \ RCU_LOCKDEP_WARN(!rcu_read_lock_held() && \
lockdep_is_held(&dev_opp_list_lock), \ !lockdep_is_held(&dev_opp_list_lock), \
"Missing rcu_read_lock() or " \ "Missing rcu_read_lock() or " \
"dev_opp_list_lock protection"); \ "dev_opp_list_lock protection"); \
} while (0) } while (0)
......
...@@ -86,8 +86,8 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i ...@@ -86,8 +86,8 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
static inline struct file *fcheck_files(struct files_struct *files, unsigned int fd) static inline struct file *fcheck_files(struct files_struct *files, unsigned int fd)
{ {
rcu_lockdep_assert(rcu_read_lock_held() || RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
lockdep_is_held(&files->file_lock), !lockdep_is_held(&files->file_lock),
"suspicious rcu_dereference_check() usage"); "suspicious rcu_dereference_check() usage");
return __fcheck_files(files, fd); return __fcheck_files(files, fd);
} }
......
...@@ -226,6 +226,37 @@ struct rcu_synchronize { ...@@ -226,6 +226,37 @@ struct rcu_synchronize {
}; };
void wakeme_after_rcu(struct rcu_head *head); void wakeme_after_rcu(struct rcu_head *head);
void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
struct rcu_synchronize *rs_array);
#define _wait_rcu_gp(checktiny, ...) \
do { \
call_rcu_func_t __crcu_array[] = { __VA_ARGS__ }; \
const int __n = ARRAY_SIZE(__crcu_array); \
struct rcu_synchronize __rs_array[__n]; \
\
__wait_rcu_gp(checktiny, __n, __crcu_array, __rs_array); \
} while (0)
#define wait_rcu_gp(...) _wait_rcu_gp(false, __VA_ARGS__)
/**
* synchronize_rcu_mult - Wait concurrently for multiple grace periods
* @...: List of call_rcu() functions for the flavors to wait on.
*
* This macro waits concurrently for multiple flavors of RCU grace periods.
* For example, synchronize_rcu_mult(call_rcu, call_rcu_bh) would wait
* on concurrent RCU and RCU-bh grace periods.  Waiting on a given SRCU
* domain requires you to write a wrapper function for that SRCU domain's
* call_srcu() function, supplying the corresponding srcu_struct.
*
* If Tiny RCU, tell _wait_rcu_gp() not to bother waiting for RCU
* or RCU-bh, given that anywhere synchronize_rcu_mult() can be called
* is automatically a grace period.
*/
#define synchronize_rcu_mult(...) \
_wait_rcu_gp(IS_ENABLED(CONFIG_TINY_RCU), __VA_ARGS__)
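A hedged usage sketch for the macro above (identifiers marked hypothetical are not from this diff): waiting for RCU and RCU-sched grace periods at once, and wrapping call_srcu() so an SRCU domain can participate, as the comment describes.

	/* Wait concurrently for an RCU and an RCU-sched grace period. */
	synchronize_rcu_mult(call_rcu, call_rcu_sched);

	/* Hypothetical wrapper supplying the srcu_struct for one SRCU domain. */
	static void call_my_srcu(struct rcu_head *head,
				 void (*func)(struct rcu_head *head))
	{
		call_srcu(&my_srcu_struct, head, func);	/* my_srcu_struct is hypothetical */
	}

	synchronize_rcu_mult(call_rcu, call_my_srcu);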
/** /**
* call_rcu_tasks() - Queue an RCU callback for invocation after a task-based grace period * call_rcu_tasks() - Queue an RCU callback for invocation after a task-based grace period
* @head: structure to be used for queueing the RCU updates. * @head: structure to be used for queueing the RCU updates.
...@@ -309,7 +340,7 @@ static inline void rcu_sysrq_end(void) ...@@ -309,7 +340,7 @@ static inline void rcu_sysrq_end(void)
} }
#endif /* #else #ifdef CONFIG_RCU_STALL_COMMON */ #endif /* #else #ifdef CONFIG_RCU_STALL_COMMON */
#ifdef CONFIG_RCU_USER_QS #ifdef CONFIG_NO_HZ_FULL
void rcu_user_enter(void); void rcu_user_enter(void);
void rcu_user_exit(void); void rcu_user_exit(void);
#else #else
...@@ -317,7 +348,7 @@ static inline void rcu_user_enter(void) { } ...@@ -317,7 +348,7 @@ static inline void rcu_user_enter(void) { }
static inline void rcu_user_exit(void) { } static inline void rcu_user_exit(void) { }
static inline void rcu_user_hooks_switch(struct task_struct *prev, static inline void rcu_user_hooks_switch(struct task_struct *prev,
struct task_struct *next) { } struct task_struct *next) { }
#endif /* CONFIG_RCU_USER_QS */ #endif /* CONFIG_NO_HZ_FULL */
#ifdef CONFIG_RCU_NOCB_CPU #ifdef CONFIG_RCU_NOCB_CPU
void rcu_init_nohz(void); void rcu_init_nohz(void);
...@@ -392,10 +423,6 @@ bool __rcu_is_watching(void); ...@@ -392,10 +423,6 @@ bool __rcu_is_watching(void);
* TREE_RCU and rcu_barrier_() primitives in TINY_RCU. * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
*/ */
typedef void call_rcu_func_t(struct rcu_head *head,
void (*func)(struct rcu_head *head));
void wait_rcu_gp(call_rcu_func_t crf);
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
#include <linux/rcutree.h> #include <linux/rcutree.h>
#elif defined(CONFIG_TINY_RCU) #elif defined(CONFIG_TINY_RCU)
...@@ -469,46 +496,10 @@ int rcu_read_lock_bh_held(void); ...@@ -469,46 +496,10 @@ int rcu_read_lock_bh_held(void);
* If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
* RCU-sched read-side critical section. In absence of * RCU-sched read-side critical section. In absence of
* CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side * CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
* critical section unless it can prove otherwise. Note that disabling * critical section unless it can prove otherwise.
* of preemption (including disabling irqs) counts as an RCU-sched
* read-side critical section. This is useful for debug checks in functions
* that required that they be called within an RCU-sched read-side
* critical section.
*
* Check debug_lockdep_rcu_enabled() to prevent false positives during boot
* and while lockdep is disabled.
*
* Note that if the CPU is in the idle loop from an RCU point of
* view (ie: that we are in the section between rcu_idle_enter() and
* rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
* did an rcu_read_lock(). The reason for this is that RCU ignores CPUs
* that are in such a section, considering these as in extended quiescent
* state, so such a CPU is effectively never in an RCU read-side critical
* section regardless of what RCU primitives it invokes. This state of
* affairs is required --- we need to keep an RCU-free window in idle
* where the CPU may possibly enter into low power mode. This way we can
* notice an extended quiescent state to other CPUs that started a grace
* period. Otherwise we would delay any grace period as long as we run in
* the idle task.
*
* Similarly, we avoid claiming an SRCU read lock held if the current
* CPU is offline.
*/ */
#ifdef CONFIG_PREEMPT_COUNT #ifdef CONFIG_PREEMPT_COUNT
static inline int rcu_read_lock_sched_held(void) int rcu_read_lock_sched_held(void);
{
int lockdep_opinion = 0;
if (!debug_lockdep_rcu_enabled())
return 1;
if (!rcu_is_watching())
return 0;
if (!rcu_lockdep_current_cpu_online())
return 0;
if (debug_locks)
lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
}
#else /* #ifdef CONFIG_PREEMPT_COUNT */ #else /* #ifdef CONFIG_PREEMPT_COUNT */
static inline int rcu_read_lock_sched_held(void) static inline int rcu_read_lock_sched_held(void)
{ {
...@@ -545,6 +536,11 @@ static inline int rcu_read_lock_sched_held(void) ...@@ -545,6 +536,11 @@ static inline int rcu_read_lock_sched_held(void)
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */ #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
/* Deprecate rcu_lockdep_assert(): Use RCU_LOCKDEP_WARN() instead. */
static inline void __attribute((deprecated)) deprecate_rcu_lockdep_assert(void)
{
}
#ifdef CONFIG_PROVE_RCU #ifdef CONFIG_PROVE_RCU
/** /**
...@@ -555,17 +551,32 @@ static inline int rcu_read_lock_sched_held(void) ...@@ -555,17 +551,32 @@ static inline int rcu_read_lock_sched_held(void)
#define rcu_lockdep_assert(c, s) \ #define rcu_lockdep_assert(c, s) \
do { \ do { \
static bool __section(.data.unlikely) __warned; \ static bool __section(.data.unlikely) __warned; \
deprecate_rcu_lockdep_assert(); \
if (debug_lockdep_rcu_enabled() && !__warned && !(c)) { \ if (debug_lockdep_rcu_enabled() && !__warned && !(c)) { \
__warned = true; \ __warned = true; \
lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
} \ } \
} while (0) } while (0)
/**
* RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
* @c: condition to check
* @s: informative message
*/
#define RCU_LOCKDEP_WARN(c, s) \
do { \
static bool __section(.data.unlikely) __warned; \
if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
__warned = true; \
lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
} \
} while (0)
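Note the inverted sense relative to rcu_lockdep_assert(): the new macro fires when the bad condition holds, as every conversion in this series shows. A minimal hedged sketch:

	/* Old style: assert the good condition. */
	rcu_lockdep_assert(rcu_read_lock_held(), "need rcu_read_lock()");

	/* New style: warn when the bad condition is met. */
	RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "need rcu_read_lock()");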
#if defined(CONFIG_PROVE_RCU) && !defined(CONFIG_PREEMPT_RCU) #if defined(CONFIG_PROVE_RCU) && !defined(CONFIG_PREEMPT_RCU)
static inline void rcu_preempt_sleep_check(void) static inline void rcu_preempt_sleep_check(void)
{ {
rcu_lockdep_assert(!lock_is_held(&rcu_lock_map), RCU_LOCKDEP_WARN(lock_is_held(&rcu_lock_map),
"Illegal context switch in RCU read-side critical section"); "Illegal context switch in RCU read-side critical section");
} }
#else /* #ifdef CONFIG_PROVE_RCU */ #else /* #ifdef CONFIG_PROVE_RCU */
static inline void rcu_preempt_sleep_check(void) static inline void rcu_preempt_sleep_check(void)
...@@ -576,15 +587,16 @@ static inline void rcu_preempt_sleep_check(void) ...@@ -576,15 +587,16 @@ static inline void rcu_preempt_sleep_check(void)
#define rcu_sleep_check() \ #define rcu_sleep_check() \
do { \ do { \
rcu_preempt_sleep_check(); \ rcu_preempt_sleep_check(); \
rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map), \ RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map), \
"Illegal context switch in RCU-bh read-side critical section"); \ "Illegal context switch in RCU-bh read-side critical section"); \
rcu_lockdep_assert(!lock_is_held(&rcu_sched_lock_map), \ RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map), \
"Illegal context switch in RCU-sched read-side critical section"); \ "Illegal context switch in RCU-sched read-side critical section"); \
} while (0) } while (0)
#else /* #ifdef CONFIG_PROVE_RCU */ #else /* #ifdef CONFIG_PROVE_RCU */
#define rcu_lockdep_assert(c, s) do { } while (0) #define rcu_lockdep_assert(c, s) deprecate_rcu_lockdep_assert()
#define RCU_LOCKDEP_WARN(c, s) do { } while (0)
#define rcu_sleep_check() do { } while (0) #define rcu_sleep_check() do { } while (0)
#endif /* #else #ifdef CONFIG_PROVE_RCU */ #endif /* #else #ifdef CONFIG_PROVE_RCU */
...@@ -615,13 +627,13 @@ static inline void rcu_preempt_sleep_check(void) ...@@ -615,13 +627,13 @@ static inline void rcu_preempt_sleep_check(void)
({ \ ({ \
/* Dependency order vs. p above. */ \ /* Dependency order vs. p above. */ \
typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \ typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
rcu_lockdep_assert(c, "suspicious rcu_dereference_check() usage"); \ RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \
rcu_dereference_sparse(p, space); \ rcu_dereference_sparse(p, space); \
((typeof(*p) __force __kernel *)(________p1)); \ ((typeof(*p) __force __kernel *)(________p1)); \
}) })
#define __rcu_dereference_protected(p, c, space) \ #define __rcu_dereference_protected(p, c, space) \
({ \ ({ \
rcu_lockdep_assert(c, "suspicious rcu_dereference_protected() usage"); \ RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_protected() usage"); \
rcu_dereference_sparse(p, space); \ rcu_dereference_sparse(p, space); \
((typeof(*p) __force __kernel *)(p)); \ ((typeof(*p) __force __kernel *)(p)); \
}) })
...@@ -845,8 +857,8 @@ static inline void rcu_read_lock(void) ...@@ -845,8 +857,8 @@ static inline void rcu_read_lock(void)
__rcu_read_lock(); __rcu_read_lock();
__acquire(RCU); __acquire(RCU);
rcu_lock_acquire(&rcu_lock_map); rcu_lock_acquire(&rcu_lock_map);
rcu_lockdep_assert(rcu_is_watching(), RCU_LOCKDEP_WARN(!rcu_is_watching(),
"rcu_read_lock() used illegally while idle"); "rcu_read_lock() used illegally while idle");
} }
/* /*
...@@ -896,8 +908,8 @@ static inline void rcu_read_lock(void) ...@@ -896,8 +908,8 @@ static inline void rcu_read_lock(void)
*/ */
static inline void rcu_read_unlock(void) static inline void rcu_read_unlock(void)
{ {
rcu_lockdep_assert(rcu_is_watching(), RCU_LOCKDEP_WARN(!rcu_is_watching(),
"rcu_read_unlock() used illegally while idle"); "rcu_read_unlock() used illegally while idle");
__release(RCU); __release(RCU);
__rcu_read_unlock(); __rcu_read_unlock();
rcu_lock_release(&rcu_lock_map); /* Keep acq info for rls diags. */ rcu_lock_release(&rcu_lock_map); /* Keep acq info for rls diags. */
...@@ -925,8 +937,8 @@ static inline void rcu_read_lock_bh(void) ...@@ -925,8 +937,8 @@ static inline void rcu_read_lock_bh(void)
local_bh_disable(); local_bh_disable();
__acquire(RCU_BH); __acquire(RCU_BH);
rcu_lock_acquire(&rcu_bh_lock_map); rcu_lock_acquire(&rcu_bh_lock_map);
rcu_lockdep_assert(rcu_is_watching(), RCU_LOCKDEP_WARN(!rcu_is_watching(),
"rcu_read_lock_bh() used illegally while idle"); "rcu_read_lock_bh() used illegally while idle");
} }
/* /*
...@@ -936,8 +948,8 @@ static inline void rcu_read_lock_bh(void) ...@@ -936,8 +948,8 @@ static inline void rcu_read_lock_bh(void)
*/ */
static inline void rcu_read_unlock_bh(void) static inline void rcu_read_unlock_bh(void)
{ {
rcu_lockdep_assert(rcu_is_watching(), RCU_LOCKDEP_WARN(!rcu_is_watching(),
"rcu_read_unlock_bh() used illegally while idle"); "rcu_read_unlock_bh() used illegally while idle");
rcu_lock_release(&rcu_bh_lock_map); rcu_lock_release(&rcu_bh_lock_map);
__release(RCU_BH); __release(RCU_BH);
local_bh_enable(); local_bh_enable();
...@@ -961,8 +973,8 @@ static inline void rcu_read_lock_sched(void) ...@@ -961,8 +973,8 @@ static inline void rcu_read_lock_sched(void)
preempt_disable(); preempt_disable();
__acquire(RCU_SCHED); __acquire(RCU_SCHED);
rcu_lock_acquire(&rcu_sched_lock_map); rcu_lock_acquire(&rcu_sched_lock_map);
rcu_lockdep_assert(rcu_is_watching(), RCU_LOCKDEP_WARN(!rcu_is_watching(),
"rcu_read_lock_sched() used illegally while idle"); "rcu_read_lock_sched() used illegally while idle");
} }
/* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */ /* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */
...@@ -979,8 +991,8 @@ static inline notrace void rcu_read_lock_sched_notrace(void) ...@@ -979,8 +991,8 @@ static inline notrace void rcu_read_lock_sched_notrace(void)
*/ */
static inline void rcu_read_unlock_sched(void) static inline void rcu_read_unlock_sched(void)
{ {
rcu_lockdep_assert(rcu_is_watching(), RCU_LOCKDEP_WARN(!rcu_is_watching(),
"rcu_read_unlock_sched() used illegally while idle"); "rcu_read_unlock_sched() used illegally while idle");
rcu_lock_release(&rcu_sched_lock_map); rcu_lock_release(&rcu_sched_lock_map);
__release(RCU_SCHED); __release(RCU_SCHED);
preempt_enable(); preempt_enable();
...@@ -1031,7 +1043,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void) ...@@ -1031,7 +1043,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
#define RCU_INIT_POINTER(p, v) \ #define RCU_INIT_POINTER(p, v) \
do { \ do { \
rcu_dereference_sparse(p, __rcu); \ rcu_dereference_sparse(p, __rcu); \
p = RCU_INITIALIZER(v); \ WRITE_ONCE(p, RCU_INITIALIZER(v)); \
} while (0) } while (0)
/** /**
......
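A hedged sketch of why WRITE_ONCE() suffices in RCU_INIT_POINTER() above: it is meant for assignments that need no ordering, such as pre-publication initialization or NULLing a pointer (the structures below are hypothetical).

	struct foo {				/* hypothetical */
		int a;
		struct bar __rcu *bp;		/* hypothetical RCU-protected pointer */
	};

	static void foo_init(struct foo *p)
	{
		p->a = 1;
		/* Not yet visible to readers, so no smp_store_release() needed. */
		RCU_INIT_POINTER(p->bp, NULL);
	}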
...@@ -37,6 +37,16 @@ static inline void cond_synchronize_rcu(unsigned long oldstate) ...@@ -37,6 +37,16 @@ static inline void cond_synchronize_rcu(unsigned long oldstate)
might_sleep(); might_sleep();
} }
static inline unsigned long get_state_synchronize_sched(void)
{
return 0;
}
static inline void cond_synchronize_sched(unsigned long oldstate)
{
might_sleep();
}
static inline void rcu_barrier_bh(void) static inline void rcu_barrier_bh(void)
{ {
wait_rcu_gp(call_rcu_bh); wait_rcu_gp(call_rcu_bh);
......
...@@ -76,6 +76,8 @@ void rcu_barrier_bh(void); ...@@ -76,6 +76,8 @@ void rcu_barrier_bh(void);
void rcu_barrier_sched(void); void rcu_barrier_sched(void);
unsigned long get_state_synchronize_rcu(void); unsigned long get_state_synchronize_rcu(void);
void cond_synchronize_rcu(unsigned long oldstate); void cond_synchronize_rcu(unsigned long oldstate);
unsigned long get_state_synchronize_sched(void);
void cond_synchronize_sched(unsigned long oldstate);
extern unsigned long rcutorture_testseq; extern unsigned long rcutorture_testseq;
extern unsigned long rcutorture_vernum; extern unsigned long rcutorture_vernum;
......
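A hedged sketch of the new RCU-sched get-state/cond-sync pair declared above, mirroring the existing RCU flavor: snapshot the grace-period state, do unrelated work, then block only if a full grace period has not already elapsed.

	unsigned long oldstate;

	oldstate = get_state_synchronize_sched();
	do_something_else();			/* hypothetical unrelated work */
	cond_synchronize_sched(oldstate);	/* may return without blocking */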
...@@ -130,16 +130,6 @@ do { \ ...@@ -130,16 +130,6 @@ do { \
#define smp_mb__before_spinlock() smp_wmb() #define smp_mb__before_spinlock() smp_wmb()
#endif #endif
/*
* Place this after a lock-acquisition primitive to guarantee that
* an UNLOCK+LOCK pair act as a full barrier. This guarantee applies
* if the UNLOCK and LOCK are executed by the same CPU or if the
* UNLOCK and LOCK operate on the same lock variable.
*/
#ifndef smp_mb__after_unlock_lock
#define smp_mb__after_unlock_lock() do { } while (0)
#endif
/** /**
* raw_spin_unlock_wait - wait until the spinlock gets unlocked * raw_spin_unlock_wait - wait until the spinlock gets unlocked
* @lock: the spinlock in question. * @lock: the spinlock in question.
......
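For reference, a hedged sketch of the pattern the removed comment above describes: placing the barrier right after a lock acquisition so that a preceding UNLOCK plus this LOCK together act as a full memory barrier (when both run on the same CPU or operate on the same lock variable; lock names below are hypothetical).

	spin_unlock(&old_lock);
	spin_lock(&new_lock);
	smp_mb__after_unlock_lock();
	/* Accesses before the unlock are now fully ordered against
	 * accesses after this point. */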
...@@ -212,6 +212,9 @@ struct callback_head { ...@@ -212,6 +212,9 @@ struct callback_head {
}; };
#define rcu_head callback_head #define rcu_head callback_head
typedef void (*rcu_callback_t)(struct rcu_head *head);
typedef void (*call_rcu_func_t)(struct rcu_head *head, rcu_callback_t func);
/* clocksource cycle base type */ /* clocksource cycle base type */
typedef u64 cycle_t; typedef u64 cycle_t;
......
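A hedged sketch of a callback matching the new rcu_callback_t typedef above (struct foo and foo_ptr are hypothetical); call_rcu() itself has the shape named by call_rcu_func_t.

	struct foo {				/* hypothetical */
		struct rcu_head rh;
		int data;
	};

	static void foo_reclaim(struct rcu_head *head)	/* an rcu_callback_t */
	{
		kfree(container_of(head, struct foo, rh));
	}

	/* ... */
	call_rcu(&foo_ptr->rh, foo_reclaim);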
...@@ -661,7 +661,6 @@ TRACE_EVENT(rcu_torture_read, ...@@ -661,7 +661,6 @@ TRACE_EVENT(rcu_torture_read,
* Tracepoint for _rcu_barrier() execution. The string "s" describes * Tracepoint for _rcu_barrier() execution. The string "s" describes
* the _rcu_barrier phase: * the _rcu_barrier phase:
* "Begin": _rcu_barrier() started. * "Begin": _rcu_barrier() started.
* "Check": _rcu_barrier() checking for piggybacking.
* "EarlyExit": _rcu_barrier() piggybacked, thus early exit. * "EarlyExit": _rcu_barrier() piggybacked, thus early exit.
* "Inc1": _rcu_barrier() piggyback check counter incremented. * "Inc1": _rcu_barrier() piggyback check counter incremented.
* "OfflineNoCB": _rcu_barrier() found callback on never-online CPU * "OfflineNoCB": _rcu_barrier() found callback on never-online CPU
......
...@@ -538,15 +538,6 @@ config RCU_STALL_COMMON ...@@ -538,15 +538,6 @@ config RCU_STALL_COMMON
config CONTEXT_TRACKING config CONTEXT_TRACKING
bool bool
config RCU_USER_QS
bool
help
This option sets hooks on kernel / userspace boundaries and
puts RCU in extended quiescent state when the CPU runs in
userspace. It means that when a CPU runs in userspace, it is
excluded from the global RCU state machine and thus doesn't
try to keep the timer tick on for RCU.
config CONTEXT_TRACKING_FORCE config CONTEXT_TRACKING_FORCE
bool "Force context tracking" bool "Force context tracking"
depends on CONTEXT_TRACKING depends on CONTEXT_TRACKING
...@@ -707,6 +698,7 @@ config RCU_BOOST_DELAY ...@@ -707,6 +698,7 @@ config RCU_BOOST_DELAY
config RCU_NOCB_CPU config RCU_NOCB_CPU
bool "Offload RCU callback processing from boot-selected CPUs" bool "Offload RCU callback processing from boot-selected CPUs"
depends on TREE_RCU || PREEMPT_RCU depends on TREE_RCU || PREEMPT_RCU
depends on RCU_EXPERT || NO_HZ_FULL
default n default n
help help
Use this option to reduce OS jitter for aggressive HPC or Use this option to reduce OS jitter for aggressive HPC or
......
...@@ -107,8 +107,8 @@ static DEFINE_SPINLOCK(release_agent_path_lock); ...@@ -107,8 +107,8 @@ static DEFINE_SPINLOCK(release_agent_path_lock);
struct percpu_rw_semaphore cgroup_threadgroup_rwsem; struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
#define cgroup_assert_mutex_or_rcu_locked() \ #define cgroup_assert_mutex_or_rcu_locked() \
rcu_lockdep_assert(rcu_read_lock_held() || \ RCU_LOCKDEP_WARN(!rcu_read_lock_held() && \
lockdep_is_held(&cgroup_mutex), \ !lockdep_is_held(&cgroup_mutex), \
"cgroup_mutex or RCU read lock required"); "cgroup_mutex or RCU read lock required");
/* /*
......