1. 22 Apr, 2013 1 commit
  2. 21 Apr, 2013 1 commit
    • events: Protect access via task_subsys_state_check() · c79aa0d9
      Paul E. McKenney authored
      
      The following RCU splat indicates lack of RCU protection:
      
      [  953.267649] ===============================
      [  953.267652] [ INFO: suspicious RCU usage. ]
      [  953.267657] 3.9.0-0.rc6.git2.4.fc19.ppc64p7 #1 Not tainted
      [  953.267661] -------------------------------
      [  953.267664] include/linux/cgroup.h:534 suspicious rcu_dereference_check() usage!
      [  953.267669]
      [  953.267669] other info that might help us debug this:
      [  953.267669]
      [  953.267675]
      [  953.267675] rcu_scheduler_active = 1, debug_locks = 0
      [  953.267680] 1 lock held by glxgears/1289:
      [  953.267683]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<c00000000027f884>] .prepare_bprm_creds+0x34/0xa0
      [  953.267700]
      [  953.267700] stack backtrace:
      [  953.267704] Call Trace:
      [  953.267709] [c0000001f0d1b6e0] [c000000000016e30] .show_stack+0x130/0x200 (unreliable)
      [  953.267717] [c0000001f0d1b7b0] [c0000000001267f8] .lockdep_rcu_suspicious+0x138/0x180
      [  953.267724] [c0000001f0d1b840] [c0000000001d43a4] .perf_event_comm+0x4c4/0x690
      [  953.267731] [c0000001f0d1b950] [c00000000027f6e4] .set_task_comm+0x84/0x1f0
      [  953.267737] [c0000001f0d1b9f0] [c000000000280414] .setup_new_exec+0x94/0x220
      [  953.267744] [c0000001f0d1ba70] [c0000000002f665c] .load_elf_binary+0x58c/0x19b0
      ...
      
      This commit therefore adds the required RCU read-side critical
      section to perf_event_comm().
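
      The shape of the fix is sketched below. This is only a hedged
      illustration, with the statements inside the critical section
      condensed; the point is that the path reaching task_subsys_state()
      must run under rcu_read_lock():

        /* kernel/events/core.c, condensed sketch */
        void perf_event_comm(struct task_struct *task)
        {
                /* ... build the comm event ... */

                rcu_read_lock();   /* protects the cgroup/css dereference */
                /* iterate the per-task contexts; may reach task_subsys_state() */
                rcu_read_unlock();
        }
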
      Reported-by: Adam Jackson <ajax@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: a.p.zijlstra@chello.nl
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Link: http://lkml.kernel.org/r/20130419190124.GA8638@linux.vnet.ibm.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Tested-by: Gustavo Luiz Duarte <gusld@br.ibm.com>
  3. 18 Apr, 2013 2 commits
  4. 17 Apr, 2013 4 commits
  5. 15 Apr, 2013 4 commits
  6. 14 Apr, 2013 1 commit
  7. 12 Apr, 2013 4 commits
  8. 09 Apr, 2013 1 commit
  9. 08 Apr, 2013 10 commits
    • PM / reboot: call syscore_shutdown() after disable_nonboot_cpus() · 6f389a8f
      Huacai Chen authored
      As commit 40dc166c ("PM / Core: Introduce struct syscore_ops for
      core subsystems PM") says, syscore_ops operations should be carried
      out with only one CPU on-line and interrupts disabled. However,
      since commit f96972f2 ("kernel/sys.c: call disable_nonboot_cpus()
      in kernel_restart()"), syscore_shutdown() is called before
      disable_nonboot_cpus(), which breaks that rule. We have a MIPS
      machine with an 8259A PIC, and an external timer (HPET) is wired to
      that 8259A. Since the 8259A is shut down too early (by
      syscore_shutdown()), disable_nonboot_cpus() runs without timer
      interrupts, so it hangs and the reboot fails. This patch calls
      syscore_shutdown() a little later (after disable_nonboot_cpus()) to
      avoid the reboot failure; this matches what the poweroff path
      already does.
      
      For consistency, add disable_nonboot_cpus() to kernel_halt().
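
      A condensed sketch of the intended ordering in kernel_restart();
      surrounding statements are abbreviated, and kernel_halt() gains the
      same disable_nonboot_cpus() call:

        void kernel_restart(char *cmd)
        {
                kernel_restart_prepare(cmd);
                disable_nonboot_cpus();   /* timer interrupts still work here */
                syscore_shutdown();       /* now only the boot CPU is online */
                /* ... machine_restart(cmd); ... */
        }
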
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • ftrace: Do not call stub functions in control loop · 395b97a3
      Steven Rostedt (Red Hat) authored
      The function tracing control loop used by perf spits out a warning
      if the called function is not a control function. This is because
      the control function references a per cpu allocated data structure
      on struct ftrace_ops that is not allocated for other types of
      functions.
      
      Commit 0a016409 ("ftrace: Optimize the function tracer list loop")
      optimized all function tracing loops for the case of a single
      registered ops. Unfortunately, this allows for a slight race when
      tracing starts or ends, where the stub function might be called
      after the current registered ops is removed. In this case we get
      the following dump:
      
      root# perf stat -e ftrace:function sleep 1
      [   74.339105] WARNING: at include/linux/ftrace.h:209 ftrace_ops_control_func+0xde/0xf0()
      [   74.349522] Hardware name: PRIMERGY RX200 S6
      [   74.357149] Modules linked in: sg igb iTCO_wdt ptp pps_core iTCO_vendor_support i7core_edac dca lpc_ich i2c_i801 coretemp edac_core crc32c_intel mfd_core ghash_clmulni_intel dm_multipath acpi_power_meter pcspk
      r microcode vhost_net tun macvtap macvlan nfsd kvm_intel kvm auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sd_mod crc_t10dif sr_mod cdrom mgag200 i2c_algo_bit drm_kms_helper ttm qla2xxx mptsas ahci drm li
      bahci scsi_transport_sas mptscsih libata scsi_transport_fc i2c_core mptbase scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
      [   74.446233] Pid: 1377, comm: perf Tainted: G        W    3.9.0-rc1 #1
      [   74.453458] Call Trace:
      [   74.456233]  [<ffffffff81062e3f>] warn_slowpath_common+0x7f/0xc0
      [   74.462997]  [<ffffffff810fbc60>] ? rcu_note_context_switch+0xa0/0xa0
      [   74.470272]  [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
      [   74.478117]  [<ffffffff81062e9a>] warn_slowpath_null+0x1a/0x20
      [   74.484681]  [<ffffffff81102ede>] ftrace_ops_control_func+0xde/0xf0
      [   74.491760]  [<ffffffff8162f400>] ftrace_call+0x5/0x2f
      [   74.497511]  [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
      [   74.503486]  [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
      [   74.509500]  [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
      [   74.516088]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.522268]  [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
      [   74.528837]  [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
      [   74.536696]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.542878]  [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
      [   74.548869]  [<ffffffff81105c67>] unregister_ftrace_function+0x27/0x50
      [   74.556243]  [<ffffffff8111eadf>] perf_ftrace_event_register+0x9f/0x140
      [   74.563709]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.569887]  [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
      [   74.575898]  [<ffffffff8111e94e>] perf_trace_destroy+0x2e/0x50
      [   74.582505]  [<ffffffff81127ba9>] tp_perf_event_destroy+0x9/0x10
      [   74.589298]  [<ffffffff811295d0>] free_event+0x70/0x1a0
      [   74.595208]  [<ffffffff8112a579>] perf_event_release_kernel+0x69/0xa0
      [   74.602460]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.608667]  [<ffffffff8112a640>] put_event+0x90/0xc0
      [   74.614373]  [<ffffffff8112a740>] perf_release+0x10/0x20
      [   74.620367]  [<ffffffff811a3044>] __fput+0xf4/0x280
      [   74.625894]  [<ffffffff811a31de>] ____fput+0xe/0x10
      [   74.631387]  [<ffffffff81083697>] task_work_run+0xa7/0xe0
      [   74.637452]  [<ffffffff81014981>] do_notify_resume+0x71/0xb0
      [   74.643843]  [<ffffffff8162fa92>] int_signal+0x12/0x17
      
      To fix this, a new ftrace_ops flag is added that marks the
      ftrace_list_end ftrace_ops stub as just that, a stub. This flag is
      now checked in the control loop and the function is not called if
      the flag is set.
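
      Roughly, this amounts to the sketch below; the flag value and the
      surrounding loop are condensed, so treat the details as an
      approximation of the patch rather than the exact diff:

        /* include/linux/ftrace.h: mark the list-end ops as a stub */
        FTRACE_OPS_FL_STUB = 1 << 6,

        /* ftrace_ops_control_func(), condensed */
        do_for_each_ftrace_op(op, ftrace_control_list) {
                if (!(op->flags & FTRACE_OPS_FL_STUB) &&   /* never call the stub */
                    !ftrace_function_local_disabled(op) &&
                    ftrace_ops_test(op, ip))
                        op->func(ip, parent_ip, op, regs);
        } while_for_each_ftrace_op(op);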
      
      Thanks to Jovi for not just reporting the bug, but also pointing out
      where the bug was in the code.
      
      Link: http://lkml.kernel.org/r/514A8855.7090402@redhat.com
      Link: http://lkml.kernel.org/r/1364377499-1900-15-git-send-email-jovi.zhangwei@huawei.com
      
      Tested-by: WANG Chao <chaowang@redhat.com>
      Reported-by: WANG Chao <chaowang@redhat.com>
      Reported-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Consistently restore trace function on sysctl enabling · 5000c418
      Jan Kiszka authored
      If we reenable ftrace via sysctl, we currently set
      ftrace_trace_function based on the previous simplistic algorithm.
      This is inconsistent with what update_ftrace_function() does, so
      call that helper instead.
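
      A condensed sketch of the enable branch of ftrace_enable_sysctl()
      after the change; locking and the disable path are elided:

        if (ftrace_enabled) {
                /* we are enabling function tracing again */
                ftrace_startup_sysctl();

                /* pick the trace function the same way the
                 * register/unregister paths do */
                update_ftrace_function();
        } else {
                /* ... disable path ... */
        }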
      
      Link: http://lkml.kernel.org/r/5151D26F.1070702@siemens.com
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Fix race with update_max_tr_single and changing tracers · 2930e04d
      Steven Rostedt (Red Hat) authored
      Commit 34600f0e ("tracing: Fix race with max_tr and changing
      tracers") fixed the race between updating the main buffers and
      changing tracers, but left out the same fix for updating just a
      per-cpu buffer.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • sched/cputime: Fix accounting on multi-threaded processes · e614b333
      Stanislaw Gruszka authored
      Recent commit 6fac4829 ("cputime: Use accessors to read task
      cputime stats") introduced a bug where we account the cputime of
      the first thread many times over, instead of the cputimes of all
      the different threads.
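
      The fix boils down to reading each thread's cputime instead of
      repeatedly reading the group leader's; a hedged sketch of the
      corrected loop in thread_group_cputime(), with variable
      declarations abbreviated:

        t = tsk;
        do {
                task_cputime(t, &utime, &stime);   /* was: task_cputime(tsk, ...) */
                times->utime += utime;
                times->stime += stime;
                times->sum_exec_runtime += task_sched_runtime(t);
        } while_each_thread(tsk, t);
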
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20130404085740.GA2495@redhat.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • ftrace: Fix strncpy() use, use strlcpy() instead of strncpy() · 75761cc1
      Chen Gang authored
      
      For a NUL-terminated string we always need to set '\0' at the end.
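
      An illustrative sketch only (buffer and source names are
      placeholders): strncpy() does not terminate the destination when
      the source fills the buffer, while strlcpy() always does.

        char buf[LEN];

        /* before: may leave buf without a trailing '\0' if src is too long */
        strncpy(buf, src, LEN);

        /* after: copy is truncated if needed and always NUL-terminated */
        strlcpy(buf, src, sizeof(buf));
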
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Cc: rostedt@goodmis.org
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/516243B7.9020405@asianux.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Fix strncpy() use, use strlcpy() instead of strncpy() · 67012ab1
      Chen Gang authored
      
      For a NUL-terminated string we always need to set '\0' at the end.
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Cc: rostedt@goodmis.org
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/51624254.30301@asianux.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Fix strncpy() use, always make sure it's NUL terminated · c97847d2
      Chen Gang authored
      
      For a NUL-terminated string, always make sure that there's a '\0'
      at the end.
      
      In our case we need a return value, so keep using strncpy() and
      fix up the tail explicitly.
      
      (strlcpy() returns the size, not the pointer.)
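
      A minimal sketch of that pattern; the destination, source and size
      names are placeholders, not the actual perf code:

        char *ret = strncpy(dst, src, size - 1);

        dst[size - 1] = '\0';   /* make the termination explicit */
        return ret;             /* caller still gets the strncpy() return value */
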
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Cc: a.p.zijlstra@chello.nl <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org <paulus@samba.org>
      Cc: acme@ghostprotocols.net <acme@ghostprotocols.net>
      Link: http://lkml.kernel.org/r/51623E0B.7070101@asianux.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/debug: Fix sd->*_idx limit range avoiding overflow · fd9b86d3
      libin authored
      Commit 201c373e ("sched/debug: Limit sd->*_idx range on sysctl")
      was an incomplete bug fix.
      
      This patch limits the sd->*_idx range to [0, CPU_LOAD_IDX_MAX-1],
      avoiding an array overflow caused by setting sd->*_idx to
      CPU_LOAD_IDX_MAX via sysctl.
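
      A hedged sketch of the resulting bounds used by the
      proc_dointvec_minmax handler for the sd->*_idx entries (the helper
      wiring is omitted):

        static int min_load_idx = 0;
        static int max_load_idx = CPU_LOAD_IDX_MAX - 1;   /* was CPU_LOAD_IDX_MAX */

        /* each sd->*_idx sysctl entry is set up with
         *   entry->extra1 = &min_load_idx;
         *   entry->extra2 = &max_load_idx;
         * so a write can no longer index one past cpu_load[]. */
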
      Signed-off-by: Libin <huawei.libin@huawei.com>
      Cc: <jiang.liu@huawei.com>
      Cc: <guohanjun@huawei.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/51626610.2040607@huawei.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched_clock: Prevent 64bit inatomicity on 32bit systems · a1cbcaa9
      Thomas Gleixner authored
      
      The sched_clock_remote() implementation has the following inatomicity
      problem on 32bit systems when accessing the remote scd->clock, which
      is a 64bit value.
      
      CPU0			CPU1
      
      sched_clock_local()	sched_clock_remote(CPU0)
      ...
      			remote_clock = scd[CPU0]->clock
      			    read_low32bit(scd[CPU0]->clock)
      cmpxchg64(scd->clock,...)
      			    read_high32bit(scd[CPU0]->clock)
      
      While the update of scd->clock is using an atomic64 mechanism, the
      readout on the remote cpu is not, which can cause completely bogus
      readouts.
      
      It is a quite rare problem, because it requires the update to hit the
      narrow race window between the low/high readout and the update must go
      across the 32bit boundary.
      
      The resulting misbehaviour is that CPU1 sees the sched_clock of
      CPU0 ~4 seconds ahead of its own and updates CPU1's sched_clock
      value to this bogus timestamp. Due to the clamping implementation,
      it stays that way for about 4 seconds, until the synchronization
      with CLOCK_MONOTONIC undoes the problem.
      
      The issue is hard to observe, because it might only result in
      slightly less accurate SCHED_OTHER timeslicing behaviour. To create
      observable damage on the realtime scheduling classes, the bogus
      update of CPU1's sched_clock must happen in the context of a
      realtime thread, which then gets charged 4 seconds of RT runtime,
      causing the RT throttler to trigger and prevent scheduling of RT
      tasks for a little less than 4 seconds. So this is quite unlikely
      as well.
      
      The issue was quite hard to decode as the reproduction time is between
      2 days and 3 weeks and intrusive tracing makes it less likely, but the
      following trace recorded with trace_clock=global, which uses
      sched_clock_local(), gave the final hint:
      
        <idle>-0   0d..30 400269.477150: hrtimer_cancel: hrtimer=0xf7061e80
        <idle>-0   0d..30 400269.477151: hrtimer_start:  hrtimer=0xf7061e80 ...
      irq/20-S-587 1d..32 400273.772118: sched_wakeup:   comm= ... target_cpu=0
        <idle>-0   0dN.30 400273.772118: hrtimer_cancel: hrtimer=0xf7061e80
      
      What happens is that CPU0 goes idle and invokes
      sched_clock_idle_sleep_event(), which invokes sched_clock_local(),
      while CPU1 runs a remote wakeup for CPU0 at the same time, which
      invokes sched_clock_remote(). The time jump gets propagated to CPU0
      via sched_clock_remote() and stays stale on both cores for ~4
      seconds.
      
      There are only two other possibilities, which could cause a stale
      sched clock:
      
      1) ktime_get() which reads out CLOCK_MONOTONIC returns a sporadic
         wrong value.
      
      2) sched_clock() which reads the TSC returns a sporadic wrong value.
      
      #1 can be excluded because sched_clock would continue to increase for
         one jiffy and then go stale.
      
      #2 can be excluded because it would not make the clock jump
         forward. It would just result in a stale sched_clock for one jiffy.
      
      After quite some brain twisting and finding the same pattern on other
      traces, sched_clock_remote() remained the only place which could cause
      such a problem and as explained above it's indeed racy on 32bit
      systems.
      
      So while on 64bit systems the readout is atomic, we need to verify the
      remote readout on 32bit machines. We need to protect the local->clock
      readout in sched_clock_remote() on 32bit as well because an NMI could
      hit between the low and the high readout, call sched_clock_local() and
      modify local->clock.
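
      A hedged sketch of the 32bit-safe readout in sched_clock_remote();
      a cmpxchg64() with identical old and new values acts as an atomic
      64bit load, and the retry/update logic around it is elided:

        #if BITS_PER_LONG != 64
                /* 32bit: both readouts must be atomic vs. concurrent updates */
                this_clock   = sched_clock_local(my_scd);
                remote_clock = cmpxchg64(&scd->clock, 0, 0);  /* atomic read */
        #else
                /* 64bit: plain loads of a u64 are already atomic */
                this_clock   = my_scd->clock;
                remote_clock = scd->clock;
        #endif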
      
      Thanks to Siegfried Wulsch for bearing with my debug requests and
      going through the tedious tasks of running a bunch of reproducer
      systems to generate the debug information which let me decode the
      issue.
      Reported-by: Siegfried Wulsch <Siegfried.Wulsch@rovema.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304051544160.21884@ionos
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
  10. 31 Mar, 2013 1 commit
    • Revert "lockdep: check that no locks held at freeze time" · dbf520a9
      Paul Walmsley authored
      This reverts commit 6aa97070.
      
      Commit 6aa97070 ("lockdep: check that no locks held at freeze time")
      causes problems with NFS root filesystems.  The failures were noticed on
      OMAP2 and 3 boards during kernel init:
      
        [ BUG: swapper/0/1 still has locks held! ]
        3.9.0-rc3-00344-ga937536b #1 Not tainted
        -------------------------------------
        1 lock held by swapper/0/1:
         #0:  (&type->s_umount_key#13/1){+.+.+.}, at: [<c011e84c>] sget+0x248/0x574
      
        stack backtrace:
          rpc_wait_bit_killable
          __wait_on_bit
          out_of_line_wait_on_bit
          __rpc_execute
          rpc_run_task
          rpc_call_sync
          nfs_proc_get_root
          nfs_get_root
          nfs_fs_mount_common
          nfs_try_mount
          nfs_fs_mount
          mount_fs
          vfs_kern_mount
          do_mount
          sys_mount
          do_mount_root
          mount_root
          prepare_namespace
          kernel_init_freeable
          kernel_init
      
      Although the rootfs mounts, the system is unstable.  Here's a transcript
      from a PM test:
      
        http://www.pwsan.com/omap/testlogs/test_v3.9-rc3/20130317194234/pm/37xxevm/37xxevm_log.txt
      
      Here's what the test log should look like:
      
        http://www.pwsan.com/omap/testlogs/test_v3.8/20130218214403/pm/37xxevm/37xxevm_log.txt
      
      Mailing list discussion is here:
      
        http://lkml.org/lkml/2013/3/4/221
      
      Deal with this for v3.9 by reverting the problem commit, until folks can
      figure out the right long-term course of action.
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Shawn Guo <shawn.guo@linaro.org>
      Cc: <maciej.rutecki@gmail.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ben Chan <benchan@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 27 Mar, 2013 2 commits
    • userns: Restrict when proc and sysfs can be mounted · 87a8ebd6
      Eric W. Biederman authored
      
      Only allow unprivileged mounts of proc and sysfs if they are already
      mounted when the user namespace is created.
      
      proc and sysfs are interesting because they have content that is
      per namespace, so fresh mounts are needed when new namespaces are
      created, while at the same time proc and sysfs have content that
      is shared between all instances.
      
      Respect the policy of who may see the shared content of proc and sysfs
      by only allowing new mounts if there was an existing mount at the time
      the user namespace was created.
      
      In practice there are only two interesting cases: either proc and
      sysfs are mounted at their usual places, or proc and sysfs are not
      mounted at all (some form of mount namespace jail).
      
      Cc: stable@vger.kernel.org
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • userns: Don't allow creation if the user is chrooted · 3151527e
      Eric W. Biederman authored
      
      Guarantee that the policy of which files may be accessed that is
      established by setting the root directory will not be violated by
      user namespaces, by verifying that the root directory points to the
      root of the mount namespace at the time of user namespace creation.
      
      Changing the root is a privileged operation, and as a matter of policy
      it serves to limit unprivileged processes to files below the current
      root directory.
      
      For reasons of simplicity and comprehensibility the privilege to
      change the root directory is gated solely on the CAP_SYS_CHROOT
      capability in the user namespace.  Therefore when creating a user
      namespace we must ensure that the policy of which files may be
      accessed cannot be violated by changing the root directory.
      
      Anyone who runs a process in a chroot and would like to use user
      namespaces can set up the same view of the filesystem with a mount
      namespace instead.  As a result, this is not a practical limitation
      for using user namespaces.
      
      Cc: stable@vger.kernel.org
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  12. 26 Mar, 2013 2 commits
    • hrtimer: Don't reinitialize a cpu_base lock on CPU_UP · 84cc8fd2
      Michael Bohan authored
      
      The current code makes the assumption that a cpu_base lock won't be
      held if the CPU corresponding to that cpu_base is offline, which isn't
      always true.
      
      If a hrtimer is not queued, then it will not be migrated by
      migrate_hrtimers() when a CPU is offlined. Therefore, the hrtimer's
      cpu_base may still point to a CPU which has subsequently gone offline
      if the timer wasn't enqueued at the time the CPU went down.
      
      Normally this wouldn't be a problem, but a cpu_base's lock is blindly
      reinitialized each time a CPU is brought up. If a CPU is brought
      online during the period that another thread is performing a hrtimer
      operation on a stale hrtimer, then the lock will be reinitialized
      under its feet, and a SPIN_BUG() like the following will be observed:
      
      <0>[   28.082085] BUG: spinlock already unlocked on CPU#0, swapper/0/0
      <0>[   28.087078]  lock: 0xc4780b40, value 0x0 .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
      <4>[   42.451150] [<c0014398>] (unwind_backtrace+0x0/0x120) from [<c0269220>] (do_raw_spin_unlock+0x44/0xdc)
      <4>[   42.460430] [<c0269220>] (do_raw_spin_unlock+0x44/0xdc) from [<c071b5bc>] (_raw_spin_unlock+0x8/0x30)
      <4>[   42.469632] [<c071b5bc>] (_raw_spin_unlock+0x8/0x30) from [<c00a9ce0>] (__hrtimer_start_range_ns+0x1e4/0x4f8)
      <4>[   42.479521] [<c00a9ce0>] (__hrtimer_start_range_ns+0x1e4/0x4f8) from [<c00aa014>] (hrtimer_start+0x20/0x28)
      <4>[   42.489247] [<c00aa014>] (hrtimer_start+0x20/0x28) from [<c00e6190>] (rcu_idle_enter_common+0x1ac/0x320)
      <4>[   42.498709] [<c00e6190>] (rcu_idle_enter_common+0x1ac/0x320) from [<c00e6440>] (rcu_idle_enter+0xa0/0xb8)
      <4>[   42.508259] [<c00e6440>] (rcu_idle_enter+0xa0/0xb8) from [<c000f268>] (cpu_idle+0x24/0xf0)
      <4>[   42.516503] [<c000f268>] (cpu_idle+0x24/0xf0) from [<c06ed3c0>] (rest_init+0x88/0xa0)
      <4>[   42.524319] [<c06ed3c0>] (rest_init+0x88/0xa0) from [<c0c00978>] (start_kernel+0x3d0/0x434)
      
      As an example, this particular crash occurred when hrtimer_start() was
      executed on CPU #0. The code locked the hrtimer's current cpu_base
      corresponding to CPU #1. CPU #0 then tried to switch the hrtimer's
      cpu_base to an optimal CPU which was online. In this case, it selected
      the cpu_base corresponding to CPU #3.
      
      Before it could proceed, CPU #1 came online and reinitialized the
      spinlock corresponding to its cpu_base. Thus now CPU #0 held a lock
      which was reinitialized. When CPU #0 finally ended up unlocking the
      old cpu_base corresponding to CPU #1 so that it could switch to CPU
      #3, we hit this SPIN_BUG() above while in switch_hrtimer_base().
      
      CPU #0                            CPU #1
      ----                              ----
      ...                               <offline>
      hrtimer_start()
      lock_hrtimer_base(base #1)
      ...                               init_hrtimers_cpu()
      switch_hrtimer_base()             ...
      ...                               raw_spin_lock_init(&cpu_base->lock)
      raw_spin_unlock(&cpu_base->lock)  ...
      <spin_bug>
      
      Solve this by statically initializing the lock.
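
      A condensed sketch of the change: the per-cpu base carries a
      build-time initialized lock, and the CPU_UP path no longer
      reinitializes it (other fields elided):

        DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) = {
                .lock = __RAW_SPIN_LOCK_UNLOCKED(hrtimer_bases.lock),
                /* ... clock base setup ... */
        };

        static void init_hrtimers_cpu(int cpu)
        {
                struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);

                /* no raw_spin_lock_init(&cpu_base->lock) here any more */
                /* ... */
        }
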
      Signed-off-by: Michael Bohan <mbohan@codeaurora.org>
      Link: http://lkml.kernel.org/r/1363745965-23475-1-git-send-email-mbohan@codeaurora.org
      
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • pid: Handle the exit of a multi-threaded init. · 751c644b
      Eric W. Biederman authored
      
      When a multi-threaded init exits and the initial thread is not the
      last thread to exit, the initial thread hangs around as a zombie
      until the last thread exits.  In that case zap_pid_ns_processes()
      needs to wait until there are only 2 hashed pids in the pid
      namespace, not one.
      
      v2. Replace thread_pid_vnr(me) == 1 with the test thread_group_leader(me)
          as suggested by Oleg.
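
      Roughly, the waiting condition in zap_pid_ns_processes() becomes
      something like the sketch below; names are condensed from the
      description above and are not the exact diff:

        /* the group leader may linger as a zombie, so it still holds a pid */
        init_pids = thread_group_leader(me) ? 1 : 2;

        for (;;) {
                set_current_state(TASK_UNINTERRUPTIBLE);
                if (pid_ns->nr_hashed == init_pids)
                        break;
                schedule();
        }
        __set_current_state(TASK_RUNNING);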
      
      Cc: stable@vger.kernel.org
      Cc: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Caj Larsson <caj@omnicloud.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  13. 22 Mar, 2013 2 commits
    • poweroff: change orderly_poweroff() to use schedule_work() · 2ca067ef
      Oleg Nesterov authored
      David said:
      
          Commit 6c0c0d4d
      
       ("poweroff: fix bug in orderly_poweroff()")
          apparently fixes one bug in orderly_poweroff(), but introduces
          another.  The comments on orderly_poweroff() claim it can be called
          from any context - and indeed we call it from interrupt context in
          arch/powerpc/platforms/pseries/ras.c for example.  But since that
          commit this is no longer safe, since call_usermodehelper_fns() is not
          safe in interrupt context without the UMH_NO_WAIT option.
      
      orderly_poweroff() can be used from any context but UMH_WAIT_EXEC is
      sleepable.  Move the "force" logic into __orderly_poweroff() and change
      orderly_poweroff() to use the global poweroff_work which simply calls
      __orderly_poweroff().
      
      While at it, remove the unneeded "int argc" and change argv_split() to
      use GFP_KERNEL.
      
      We use the global "bool poweroff_force" to pass the argument; this
      can obviously affect the previous request if it is pending/running.
      So we only allow the "false => true" transition, assuming that the
      pending "true" should succeed anyway.  If schedule_work() fails
      after that, we know that work->func() was not called yet, so it
      must see the new value.
      
      This means that orderly_poweroff() becomes async even if we do not
      run the command, and it always succeeds, since schedule_work() can
      only fail if the work is already pending.  We can export
      __orderly_poweroff() and change the non-atomic callers which want
      the old semantics.
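
      A condensed sketch of the resulting shape; the body of
      __orderly_poweroff() and the error handling are elided:

        static bool poweroff_force;

        static void poweroff_work_func(struct work_struct *work)
        {
                __orderly_poweroff(poweroff_force);
        }
        static DECLARE_WORK(poweroff_work, poweroff_work_func);

        int orderly_poweroff(bool force)
        {
                if (force)      /* only allow the false => true transition */
                        poweroff_force = true;
                schedule_work(&poweroff_work);
                return 0;
        }
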
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Reported-by: David Gibson <david@gibson.dropbear.id.au>
      Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
      Cc: Feng Hong <hongfeng@marvell.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • printk: Provide a wake_up_klogd() off-case · dc72c32e
      Frederic Weisbecker authored
      
      wake_up_klogd() is useless when CONFIG_PRINTK=n because neither
      printk() nor printk_sched() are in use and there are actually no
      waiters on the log_wait waitqueue.  It should be a stub in this
      case for users like bust_spinlocks().
      
      Otherwise this results in this warning when CONFIG_PRINTK=n and
      CONFIG_IRQ_WORK=n:
      
      	kernel/built-in.o In function `wake_up_klogd':
      	(.text.wake_up_klogd+0xb4): undefined reference to `irq_work_queue'
      
      To fix this, provide an off-case for wake_up_klogd() when
      CONFIG_PRINTK=n.
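
      The off-case is the usual header stub pattern, roughly:

        /* include/linux/printk.h */
        #ifdef CONFIG_PRINTK
        void wake_up_klogd(void);
        #else
        static inline void wake_up_klogd(void)
        {
        }
        #endif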
      
      There is much more from console_unlock() and other console-related
      code in printk.c that should be moved under CONFIG_PRINTK.  But for
      now, focus on a minimal fix as we are past the merge window already.
      
      [akpm@linux-foundation.org: include printk.h in bust_spinlocks.c]
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reported-by: James Hogan <james.hogan@imgtec.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 21 Mar, 2013 2 commits
  15. 18 Mar, 2013 2 commits
    • perf: Generate EXIT event only once per task context · d610d98b
      Namhyung Kim authored
      
      perf_event_task_event() iterates the pmu list and generates events
      for each eligible pmu context.  But if the task_event has a
      task_ctx, as in EXIT, it will generate events even though the pmu
      doesn't have an eligible one. Fix it by moving the code to the
      proper places.
      
      Before this patch:
      
        $ perf record -n true
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.006 MB perf.data (~248 samples) ]
      
        $ perf report -D | tail
        Aggregated stats:
                   TOTAL events:         73
                    MMAP events:         67
                    COMM events:          2
                    EXIT events:          4
        cycles stats:
                   TOTAL events:         73
                    MMAP events:         67
                    COMM events:          2
                    EXIT events:          4
      
      After this patch:
      
        $ perf report -D | tail
        Aggregated stats:
                   TOTAL events:         70
                    MMAP events:         67
                    COMM events:          2
                    EXIT events:          1
        cycles stats:
                   TOTAL events:         70
                    MMAP events:         67
                    COMM events:          2
                    EXIT events:          1
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1363332433-7637-1-git-send-email-namhyung@kernel.org
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Reset hwc->last_period on sw clock events · 778141e3
      Namhyung Kim authored
      
      When cpu/task clock events are initialized, their sampling
      frequencies are converted to a fixed period value.  However, this
      missed updating hwc->last_period, which was set to 1 for the
      initial sampling frequency calibration.
      
      Because this hwc->last_period value is used as the period in
      perf_swevent_hrtimer(), every recorded sample will have an
      incorrect period of 1.
      
        $ perf record -e task-clock noploop 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.158 MB perf.data (~6919 samples) ]
      
        $ perf report -n --show-total-period  --stdio
        # Samples: 4K of event 'task-clock'
        # Event count (approx.): 4000
        #
        # Overhead       Samples        Period  Command  Shared Object              Symbol
        # ........  ............  ............  .......  .............  ..................
        #
            99.95%          3998          3998  noploop  noploop        [.] main
             0.03%             1             1  noploop  libc-2.15.so   [.] init_cacheinfo
             0.03%             1             1  noploop  ld-2.15.so     [.] open_verify
      
      Note that it doesn't affect non-sampling events, so perf stat still
      gets the correct value with or without this patch.
      
        $ perf stat -e task-clock noploop 1
      
         Performance counter stats for 'noploop 1':
      
               1000.272525 task-clock                #    1.000 CPUs utilized
      
               1.000560605 seconds time elapsed
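
      The fix is essentially one extra assignment in the software clock
      event initialization; a condensed sketch, based on the description
      above, of the frequency-to-period conversion in
      perf_swevent_init_hrtimer():

        if (event->attr.freq) {
                long freq = event->attr.sample_freq;

                event->attr.sample_period = NSEC_PER_SEC / freq;
                hwc->sample_period = event->attr.sample_period;
                local64_set(&hwc->period_left, hwc->sample_period);
                hwc->last_period = hwc->sample_period;   /* was left at 1 */
                event->attr.freq = 0;
        }
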
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1363574507-18808-1-git-send-email-namhyung@kernel.org
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 15 Mar, 2013 1 commit