1. 13 Dec, 2013 1 commit
    • Jan Beulich's avatar
      procfs: also fix proc_reg_get_unmapped_area() for !MMU case · ae5758a1
      Jan Beulich authored
      Commit fad1a86e ("procfs: call default get_unmapped_area on
      MMU-present architectures"), as its title says, took care of only the
      MMU case, leaving the !MMU side still in the regressed state (returning
      -EIO in all cases where pde->proc_fops->get_unmapped_area is NULL).
      
      From the fad1a86e changelog:
      
       "Commit c4fe2448
      
       ("sparc: fix PCI device proc file mmap(2)") added
        proc_reg_get_unmapped_area in proc_reg_file_ops and
        proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
        get_unmapped_area method is not defined for the target procfs file, which
        causes regression of mmap on /proc/vmcore.
      
        To address this issue, like get_unmapped_area(), call default
        current->mm->get_unmapped_area on MMU-present architectures if
        pde->proc_fops->get_unmapped_area, i.e.  the one in actual file operation
        in the procfs file, is not defined"
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>	[3.12.x]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ae5758a1
  2. 12 Dec, 2013 8 commits
    • Will Deacon's avatar
      dcache: allow word-at-a-time name hashing with big-endian CPUs · a5c21dce
      Will Deacon authored
      
      When explicitly hashing the end of a string with the word-at-a-time
      interface, we have to be careful which end of the word we pick up.
      
      On big-endian CPUs, the upper-bits will contain the data we're after, so
      ensure we generate our masks accordingly (and avoid hashing whatever
      random junk may have been sitting after the string).
      
      This patch adds a new dcache helper, bytemask_from_count, which creates
      a mask appropriate for the CPU endianness.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a5c21dce
    • Dan Carpenter's avatar
      Btrfs: fix access_ok() check in btrfs_ioctl_send() · 700ff4f0
      Dan Carpenter authored
      
      The closing parenthesis is in the wrong place.  We want to check
      "sizeof(*arg->clone_sources) * arg->clone_sources_count" instead of
      "sizeof(*arg->clone_sources * arg->clone_sources_count)".
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarJie Liu <jeff.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      cc: stable@vger.kernel.org
      700ff4f0
    • Wang Shilong's avatar
      Btrfs: make sure we cleanup all reloc roots if error happens · 467bb1d2
      Wang Shilong authored
      
      I hit an oops when merging reloc roots fails, the reason is that
      new reloc roots may be added and we should make sure we cleanup
      all reloc roots.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      467bb1d2
    • Wang Shilong's avatar
      Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation · 66463748
      Wang Shilong authored
      
      Quota tree and UUID Tree is only cowed, they can not be snapshoted.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      66463748
    • Wang Shilong's avatar
      Btrfs: fix an oops when doing balance relocation · c974c464
      Wang Shilong authored
      
      I hit an oops when inserting reloc root into @reloc_root_tree(it can be
      easily triggered when forcing cow for relocation root)
      
      [  866.494539]  [<ffffffffa0499579>] btrfs_init_reloc_root+0x79/0xb0 [btrfs]
      [  866.495321]  [<ffffffffa044c240>] record_root_in_trans+0xb0/0x110 [btrfs]
      [  866.496109]  [<ffffffffa044d758>] btrfs_record_root_in_trans+0x48/0x80 [btrfs]
      [  866.496908]  [<ffffffffa0494da8>] select_reloc_root+0xa8/0x210 [btrfs]
      [  866.497703]  [<ffffffffa0495c8a>] do_relocation+0x16a/0x540 [btrfs]
      
      This is because reloc root inserted into @reloc_root_tree is not within one
      transaction,reloc root may be cowed and root block bytenr will be reused then
      oops happens.We should update reloc root in @reloc_root_tree when cow reloc
      root node, fix it.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Reviewed-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      c974c464
    • Filipe David Borba Manana's avatar
      Btrfs: don't miss skinny extent items on delayed ref head contention · 639eefc8
      Filipe David Borba Manana authored
      Currently extent-tree.c:btrfs_lookup_extent_info() can miss the lookup
      of skinny extent items. This can happen when the execution flow is the
      following:
      
      * We do an extent tree lookup and fail to find a skinny extent item;
      
      * As a result, we attempt to see if a non-skinny extent item exists,
        either by looking at previous item in the leaf or by doing another
        full extent tree search;
      
      * We have a transaction and then we check for a matching delayed ref
        head in the transaction's delayed refs rbtree;
      
      * We find such delayed ref head and then we try to lock it with a
        call to mutex_trylock();
      
      * The lock was contended so we jump to the label "again", which repeats
        the extent tree search but for a non-skinny extent item, because we set
        previously metadata variable to 0 and the search key to look for a
        non-skinny extent-item;
      
      * After the jump (and after releasing the transaction's delayed refs
        lock), a skinny extent item might have been added to the extent tree
        but we will miss it because metadata is set to 0 and the search key
        is set for a non-skinny extent-item.
      
      The fix here is to not reset metadata to 0 and to jump to the initial search
      key setup if the delayed ref head is contended, instead of jumping directly
      to the extent tree search label ("again").
      
      This issue was found while investigating the issue reported at Bugzilla 64961.
      
      David Sterba suspected this function was missing extent items, and that
      this could be caused by the last change to this function, which was made
      in the following patch:
      
          [PATCH] Btrfs: optimize btrfs_lookup_extent_info()
          (commit 74be9510
      
      )
      
      But in fact this issue already existed before, because after failing to find
      a skinny extent item, the code set the search key for a non-skinny extent
      item, and on contention of a matching delayed ref head it would not search
      the extent tree for a skinny extent item anymore.
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      639eefc8
    • David Sterba's avatar
      btrfs: call mnt_drop_write after interrupted subvol deletion · e43f998e
      David Sterba authored
      
      If btrfs_ioctl_snap_destroy blocks on the mutex and the process is
      killed, mnt_write count is unbalanced and leads to unmountable
      filesystem.
      
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      e43f998e
    • Miao Xie's avatar
      Btrfs: don't clear the default compression type · a7e252af
      Miao Xie authored
      
      We met a oops caused by the wrong compression type:
      [  556.512356] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [  556.512370] IP: [<ffffffff811dbaa0>] __list_del_entry+0x1/0x98
      [SNIP]
      [  556.512490]  [<ffffffff811dbb44>] ? list_del+0xd/0x2b
      [  556.512539]  [<ffffffffa05dd5ce>] find_workspace+0x97/0x175 [btrfs]
      [  556.512546]  [<ffffffff813c14b5>] ? _raw_spin_lock+0xe/0x10
      [  556.512576]  [<ffffffffa05de276>] btrfs_compress_pages+0x2d/0xa2 [btrfs]
      [  556.512601]  [<ffffffffa05af060>] compress_file_range.constprop.54+0x1f2/0x4e8 [btrfs]
      [  556.512627]  [<ffffffffa05af388>] async_cow_start+0x32/0x4d [btrfs]
      [  556.512655]  [<ffffffffa05cc7a1>] worker_loop+0x144/0x4c3 [btrfs]
      [  556.512661]  [<ffffffff81059404>] ? finish_task_switch+0x80/0xb8
      [  556.512689]  [<ffffffffa05cc65d>] ? btrfs_queue_worker+0x244/0x244 [btrfs]
      [  556.512695]  [<ffffffff8104fa4e>] kthread+0x8d/0x95
      [  556.512699]  [<ffffffff81050000>] ? bit_waitqueue+0x34/0x7d
      [  556.512704]  [<ffffffff8104f9c1>] ? __kthread_parkme+0x65/0x65
      [  556.512709]  [<ffffffff813c7eec>] ret_from_fork+0x7c/0xb0
      [  556.512713]  [<ffffffff8104f9c1>] ? __kthread_parkme+0x65/0x65
      
      Steps to reproduce:
       # mkfs.btrfs -f <dev>
       # mount -o nodatacow <dev> <mnt>
       # touch <mnt>/<file>
       # chattr =c <mnt>/<file>
       # dd if=/dev/zero of=<mnt>/<file> bs=1M count=10
      
      It is because we cleared the default compression type when setting the
      nodatacow. In fact, we needn't do it because we have used COMPRESS flag to
      indicate if we need compressed the file data or not, needn't use the
      variant -- compress_type -- in btrfs_info to do the same thing, and just
      use it to hold the default compression type. Or we would get a wrong compress
      type for a file whose own compress flag is set but the compress flag of its
      filesystem is not set.
      Reported-by: default avatarTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      a7e252af
  3. 11 Dec, 2013 1 commit
    • Jeff Layton's avatar
      nfsd: when reusing an existing repcache entry, unhash it first · 781c2a5a
      Jeff Layton authored
      
      The DRC code will attempt to reuse an existing, expired cache entry in
      preference to allocating a new one. It'll then search the cache, and if
      it gets a hit it'll then free the cache entry that it was going to
      reuse.
      
      The cache code doesn't unhash the entry that it's going to reuse
      however, so it's possible for it end up designating an entry for reuse
      and then subsequently freeing the same entry after it finds it.  This
      leads it to a later use-after-free situation and usually some list
      corruption warnings or an oops.
      
      Fix this by simply unhashing the entry that we intend to reuse. That
      will mean that it's not findable via a search and should prevent this
      situation from occurring.
      
      Cc: stable@vger.kernel.org # v3.10+
      Reported-by: default avatarChristoph Hellwig <hch@infradead.org>
      Reported-by: default avatarg. artim <gartim@gmail.com>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      781c2a5a
  4. 10 Dec, 2013 3 commits
  5. 06 Dec, 2013 1 commit
  6. 04 Dec, 2013 2 commits
    • Helge Deller's avatar
      nfs: fix do_div() warning by instead using sector_div() · 3873d064
      Helge Deller authored
      
      When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains like
      shown below.  Fix this warning by instead using sector_div() which is provided
      by the kernel.h header file.
      
      fs/nfs/blocklayout/extents.c: In function ‘normalize’:
      include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
      fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’
      nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default]
      fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default]
      include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’
       extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      3873d064
    • Trond Myklebust's avatar
      NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery · f22e5edd
      Trond Myklebust authored
      
      Andy Adamson reports:
      
      The state manager is recovering expired state and recovery OPENs are being
      processed. If kswapd is pruning inodes at the same time, a deadlock can occur
      when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
      resultant layoutreturn gets an error that the state mangager is to handle,
      causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.
      
      At the same time an open is waiting for the inode deletion to complete in
      __wait_on_freeing_inode.
      
      If the open is either the open called by the state manager, or an open from
      the same open owner that is holding the NFSv4 sequence id which causes the
      OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
      then the state is deadlocked with kswapd.
      
      The fix is simply to have layoutreturn ignore all errors except NFS4ERR_DELAY.
      We already know that layouts are dropped on all server reboots, and that
      it has to be coded to deal with the "forgetful client model" that doesn't
      send layoutreturns.
      Reported-by: default avatarAndy Adamson <andros@netapp.com>
      Link: http://lkml.kernel.org/r/1385402270-14284-1-git-send-email-andros@netapp.com
      
      Signed-off-by: default avatarTrond Myklebust <Trond.Myklebust@primarydata.com>
      f22e5edd
  7. 03 Dec, 2013 1 commit
  8. 02 Dec, 2013 1 commit
    • Linus Torvalds's avatar
      vfs: fix subtle use-after-free of pipe_inode_info · b0d8d229
      Linus Torvalds authored
      The pipe code was trying (and failing) to be very careful about freeing
      the pipe info only after the last access, with a pattern like:
      
              spin_lock(&inode->i_lock);
              if (!--pipe->files) {
                      inode->i_pipe = NULL;
                      kill = 1;
              }
              spin_unlock(&inode->i_lock);
              __pipe_unlock(pipe);
              if (kill)
                      free_pipe_info(pipe);
      
      where the final freeing is done last.
      
      HOWEVER.  The above is actually broken, because while the freeing is
      done at the end, if we have two racing processes releasing the pipe
      inode info, the one that *doesn't* free it will decrement the ->files
      count, and unlock the inode i_lock, but then still use the
      "pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".
      
      This is *very* hard to trigger in practice, since the race window is
      very small, and adding debug options seems to just hide it by slowing
      things down.
      
      Simon originally reported this way back in July as an Oops in
      kmem_cache_allocate due to a single bit corruption (due to the final
      "spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
      allocation that had re-used the free'd pipe-info), it's taken this long
      to figure out.
      
      Since the 'pipe->files' accesses aren't even protected by the pipe lock
      (we very much use the inode lock for that), the simple solution is to
      just drop the pipe lock early.  And since there were two users of this
      pattern, create a helper function for it.
      
      Introduced commit ba5bb147
      
       ("pipe: take allocation and freeing of
      pipe_inode_info out of ->i_mutex").
      Reported-by: default avatarSimon Kirby <sim@hostway.ca>
      Reported-by: default avatarIan Applegate <ia@cloudflare.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org   # v3.10+
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b0d8d229
  9. 29 Nov, 2013 1 commit
  10. 28 Nov, 2013 1 commit
  11. 27 Nov, 2013 1 commit
  12. 25 Nov, 2013 1 commit
    • Steve French's avatar
      [CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offload · f19e84df
      Steve French authored
      Change cifs.ko to using CIFS_IOCTL_COPYCHUNK instead
      of BTRFS_IOC_CLONE to avoid confusion about whether
      copy-on-write is required or optional for this operation.
      
      SMB2/SMB3 copyoffload had used the BTRFS_IOC_CLONE ioctl since
      they both speed up copy by offloading the copy rather than
      passing many read and write requests back and forth and both have
      identical syntax (passing file handles), but for SMB2/SMB3
      CopyChunk the server is not required to use copy-on-write
      to make a copy of the file (although some do), and Christoph
      has commented that since CopyChunk does not require
      copy-on-write we should not reuse BTRFS_IOC_CLONE.
      
      This patch renames the ioctl to use a cifs specific IOCTL
      CIFS_IOCTL_COPYCHUNK.  This ioctl is particularly important
      for SMB2/SMB3 since large file copy over the network otherwise
      can be very slow, and with this is often more than 100 times
      faster putting less load on server and client.
      
      Note that if a copy syscall is ever ...
      f19e84df
  13. 24 Nov, 2013 2 commits
  14. 23 Nov, 2013 8 commits
    • Li Wang's avatar
      ceph: allocate non-zero page to fscache in readpage() · ff638b7d
      Li Wang authored
      
      ceph_osdc_readpages() returns number of bytes read, currently,
      the code only allocate full-zero page into fscache, this patch
      fixes this.
      Signed-off-by: default avatarLi Wang <liwang@ubuntukylin.com>
      Reviewed-by: default avatarMilosz Tanski <milosz@adfin.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      ff638b7d
    • Yan, Zheng's avatar
      ceph: wake up 'safe' waiters when unregistering request · fc55d2c9
      Yan, Zheng authored
      
      We also need to wake up 'safe' waiters if error occurs or request
      aborted. Otherwise sync(2)/fsync(2) may hang forever.
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: default avatarSage Weil <sage@inktank.com>
      fc55d2c9
    • Yan, Zheng's avatar
      ceph: cleanup aborted requests when re-sending requests. · eb1b8af3
      Yan, Zheng authored
      
      Aborted requests usually get cleared when the reply is received.
      If MDS crashes, no reply will be received. So we need to cleanup
      aborted requests when re-sending requests.
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: default avatarGreg Farnum <greg@inktank.com>
      Signed-off-by: default avatarSage Weil <sage@inktank.com>
      eb1b8af3
    • Yan, Zheng's avatar
      ceph: handle race between cap reconnect and cap release · 99a9c273
      Yan, Zheng authored
      
      When a cap get released while composing the cap reconnect message.
      We should skip queuing the release message if the cap hasn't been
      added to the cap reconnect message.
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      99a9c273
    • Yan, Zheng's avatar
      ceph: set caps count after composing cap reconnect message · 44c99757
      Yan, Zheng authored
      
      It's possible that some caps get released while composing the cap
      reconnect message.
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      44c99757
    • Yan, Zheng's avatar
      ceph: queue cap release in __ceph_remove_cap() · a096b09a
      Yan, Zheng authored
      
      call __queue_cap_release() in __ceph_remove_cap(), this avoids
      acquiring s_cap_lock twice.
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      a096b09a
    • Tejun Heo's avatar
      sysfs: use a separate locking class for open files depending on mmap · 027a485d
      Tejun Heo authored
      The following two commits implemented mmap support in the regular file
      path and merged bin file support into the regular path.
      
       73d97146 ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c")
       3124eb16
      
       ("sysfs: merge regular and bin file handling")
      
      After the merge, the following commands trigger a spurious lockdep
      warning.  "test-mmap-read" simply mmaps the file and dumps the
      content.
      
        $ cat /sys/block/sda/trace/act_mask
        $ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.12.0-work+ #378 Not tainted
        -------------------------------------------------------
        test-mmap-read/567 is trying to acquire lock:
         (&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
      
        but task is already holding lock:
         (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #3 (&mm->mmap_sem){++++++}:
        ...
        -> #2 (sr_mutex){+.+.+.}:
        ...
        -> #1 (&bdev->bd_mutex){+.+.+.}:
        ...
        -> #0 (&of->mutex){+.+.+.}:
        ...
      
        other info that might help us debug this:
      
        Chain exists of:
         &of->mutex --> sr_mutex --> &mm->mmap_sem
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&mm->mmap_sem);
      				 lock(sr_mutex);
      				 lock(&mm->mmap_sem);
          lock(&of->mutex);
      
         *** DEADLOCK ***
      
        1 lock held by test-mmap-read/567:
         #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
      
        stack backtrace:
        CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
         ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80
         ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208
         0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0
        Call Trace:
         [<ffffffff81611ad2>] dump_stack+0x4e/0x7a
         [<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf
         [<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60
         [<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0
         [<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0
         [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
         [<ffffffff8115d363>] mmap_region+0x3b3/0x5b0
         [<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0
         [<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0
         [<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250
         [<ffffffff81008282>] SyS_mmap+0x22/0x30
         [<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b
      
      This happens because one file nests sr_mutex, which nests mm->mmap_sem
      under it, under of->mutex while mmap implementation naturally nests
      of->mutex under mm->mmap_sem.  The warning is false positive as
      of->mutex is per open-file and the two paths belong to two different
      files.  This warning didn't trigger before regular and bin file
      supports were merged because only bin file supported mmap and the
      other side of locking happened only on regular files which used
      equivalent but separate locking.
      
      It'd be best if we give separate locking classes per file but we can't
      easily do that.  Let's differentiate on ->mmap() for now.  Later we'll
      add explicit file operations struct and can add per-ops lockdep key
      there.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      027a485d
    • Mika Westerberg's avatar
      sysfs: handle duplicate removal attempts in sysfs_remove_group() · 54d71145
      Mika Westerberg authored
      Commit bcdde7e2
      
       (sysfs: make __sysfs_remove_dir() recursive) changed
      the behavior so that directory removals will be done recursively. This
      means that the sysfs group might already be removed if its parent directory
      has been removed.
      
      The current code outputs warnings similar to following log snippet when it
      detects that there is no group for the given kobject:
      
       WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0()
       sysfs group ffffffff81c6f1e0 not found for kobject 'host7'
       Modules linked in:
       CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13
       Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
       Workqueue: kacpi_hotplug acpi_hotplug_work_fn
        0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8
        ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0
        ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48
       Call Trace:
        [<ffffffff817daab1>] dump_stack+0x45/0x56
        [<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0
        [<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50
        [<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70
        [<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0
        [<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50
        [<ffffffff8142a0d0>] device_del+0x40/0x1b0
        [<ffffffff8142a24d>] device_unregister+0xd/0x20
        [<ffffffff8144131a>] scsi_remove_host+0xba/0x110
        [<ffffffff8145f526>] ata_host_detach+0xc6/0x100
        [<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20
        [<ffffffff812e8f48>] pci_device_remove+0x28/0x60
        [<ffffffff8142d854>] __device_release_driver+0x64/0xd0
        [<ffffffff8142d8de>] device_release_driver+0x1e/0x30
        [<ffffffff8142d257>] bus_remove_device+0xf7/0x140
        [<ffffffff8142a1b1>] device_del+0x121/0x1b0
        [<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0
        [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
        [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
        [<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20
        [<ffffffff812fc743>] trim_stale_devices+0x73/0xe0
        [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
        [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
        [<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0
        [<ffffffff812fd90d>] hotplug_event+0xcd/0x160
        [<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60
        [<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22
        [<ffffffff8105cf3a>] process_one_work+0x17a/0x430
        [<ffffffff8105db29>] worker_thread+0x119/0x390
        [<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0
        [<ffffffff81063a5d>] kthread+0xcd/0xf0
        [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
        [<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0
        [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
      
      On this particular machine I see ~16 of these message during Thunderbolt
      hot-unplug.
      
      Fix this in similar way that was done for sysfs_remove_one() by checking
      if the parent directory has already been removed and bailing out early.
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54d71145
  15. 22 Nov, 2013 1 commit
    • Junxiao Bi's avatar
      configfs: fix race between dentry put and lookup · 76ae281f
      Junxiao Bi authored
      A race window in configfs, it starts from one dentry is UNHASHED and end
      before configfs_d_iput is called.  In this window, if a lookup happen,
      since the original dentry was UNHASHED, so a new dentry will be
      allocated, and then in configfs_attach_attr(), sd->s_dentry will be
      updated to the new dentry.  Then in configfs_d_iput(),
      BUG_ON(sd->s_dentry != dentry) will be triggered and system panic.
      
      sys_open:                     sys_close:
       ...                           fput
                                      dput
                                       dentry_kill
                                        __d_drop <--- dentry unhashed here,
                                                 but sd->dentry still point
                                                 to this dentry.
      
       lookup_real
        configfs_lookup
         configfs_attach_attr---> update sd->s_dentry
                                  to new allocated dentry here.
      
                                         d_kill
                                       ...
      76ae281f
  16. 21 Nov, 2013 7 commits