1. 19 Jun, 2017 1 commit
    • Hugh Dickins's avatar
      mm: larger stack guard gap, between vmas · 1be7107f
      Hugh Dickins authored
      
      Stack guard page is a useful feature to reduce a risk of stack smashing
      into a different mapping. We have been using a single page gap which
      is sufficient to prevent having stack adjacent to a different mapping.
      But this seems to be insufficient in the light of the stack usage in
      userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
      used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
      which is 256kB or stack strings with MAX_ARG_STRLEN.
      
      This will become especially dangerous for suid binaries and the default
      no limit for the stack size limit because those applications can be
      tricked to consume a large portion of the stack and a single glibc call
      could jump over the guard page. These attacks are not theoretical,
      unfortunatelly.
      
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      the PAGE_SIZE units) which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somehow inherent, but it should reduce attack space a lot.
      
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      
      Implementation wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      
      Instead of keeping gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and and the vma tree's subtree_gap support for that.
      Original-patch-by: default avatarOleg Nesterov <oleg@redhat.com>
      Original-patch-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1be7107f
  2. 15 Jun, 2017 1 commit
  3. 14 Jun, 2017 2 commits
    • Bart Van Assche's avatar
      block: Fix a blk_exit_rl() regression · dc9edc44
      Bart Van Assche authored
      
      Avoid that the following complaint is reported:
      
       BUG: sleeping function called from invalid context at kernel/workqueue.c:2790
       in_atomic(): 1, irqs_disabled(): 0, pid: 41, name: rcuop/3
       1 lock held by rcuop/3/41:
        #0:  (rcu_callback){......}, at: [<ffffffff8111f9a2>] rcu_nocb_kthread+0x282/0x500
       Call Trace:
        dump_stack+0x86/0xcf
        ___might_sleep+0x174/0x260
        __might_sleep+0x4a/0x80
        flush_work+0x7e/0x2e0
        __cancel_work_timer+0x143/0x1c0
        cancel_work_sync+0x10/0x20
        blk_throtl_exit+0x25/0x60
        blkcg_exit_queue+0x35/0x40
        blk_release_queue+0x42/0x130
        kobject_put+0xa9/0x190
      
      This happens since we invoke callbacks that need to block from the
      queue release handler. Fix this by pushing the final release to
      a workqueue.
      Reported-by: default avatarRoss Zwisler <zwisler@gmail.com>
      Fixes: commit b425e504
      
       ("block: Avoid that blk_exit_rl() triggers a use-after-free")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Tested-by: Ross Zwisler <ross.zwisler@linux.intel...
      dc9edc44
    • Magnus Damm's avatar
      net: update undefined ->ndo_change_mtu() comment · db46a0e1
      Magnus Damm authored
      
      Update ->ndo_change_mtu() callback comment to remove text
      about returning error in case of undefined callback. This
      change makes the comment match the existing code behavior.
      Signed-off-by: default avatarMagnus Damm <damm+renesas@opensource.se>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      db46a0e1
  4. 12 Jun, 2017 2 commits
    • Lv Zheng's avatar
      ACPICA: Tables: Mechanism to handle late stage acpi_get_table() imbalance · 83848fbe
      Lv Zheng authored
      
      Considering this case:
      
       1. A program opens a sysfs table file 65535 times, it can increase
          validation_count and first increment cause the table to be mapped:
      
           validation_count = 65535
      
       2. AML execution causes "Load" to be executed on the same
          table, this time it cannot increase validation_count, so
          validation_count remains:
      
            validation_count = 65535
      
       3. The program closes sysfs table file 65535 times, it can decrease
          validation_count and the last decrement cause the table to be
          unmapped:
      
           validation_count = 0
      
       4. AML code still accessing the loaded table, kernel crash can be
          observed.
      
      To prevent that from happening, add a validation_count threashold.
      When it is reached, the validation_count can no longer be
      incremented/decremented to invalidate the table descriptor (means
      preventing table unmappings)
      
      Note that code added in acpi_tb_put_table() is actually a no-op but
      changes the warning message into a "warn once" one. Lv Zheng.
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      [ rjw: Changelog, comments ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      83848fbe
    • Bart Van Assche's avatar
      configfs: Introduce config_item_get_unless_zero() · 19e72d3a
      Bart Van Assche authored
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      [hch: minor style tweak]
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      19e72d3a
  5. 11 Jun, 2017 1 commit
  6. 09 Jun, 2017 2 commits
  7. 08 Jun, 2017 5 commits
  8. 07 Jun, 2017 1 commit
    • David S. Miller's avatar
      net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller authored
      
      Network devices can allocate reasources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally does also a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cf124db5
  9. 06 Jun, 2017 4 commits
    • Rafael J. Wysocki's avatar
      Revert "ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle" · f3b7eaae
      Rafael J. Wysocki authored
      Revert commit eed4d47e
      
       (ACPI / sleep: Ignore spurious SCI wakeups
      from suspend-to-idle) as it turned out to be premature and triggered
      a number of different issues on various systems.
      
      That includes, but is not limited to, premature suspend-to-RAM aborts
      on Dell XPS 13 (9343) reported by Dominik.
      
      The issue the commit in question attempted to address is real and
      will need to be taken care of going forward, but evidently more work
      is needed for this purpose.
      Reported-by: default avatarDominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f3b7eaae
    • David Rientjes's avatar
      compiler, clang: suppress warning for unused static inline functions · abb2ea7d
      David Rientjes authored
      
      GCC explicitly does not warn for unused static inline functions for
      -Wunused-function.  The manual states:
      
      	Warn whenever a static function is declared but not defined or
      	a non-inline static function is unused.
      
      Clang does warn for static inline functions that are unused.
      
      It turns out that suppressing the warnings avoids potentially complex
      #ifdef directives, which also reduces LOC.
      
      Suppress the warning for clang.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      abb2ea7d
    • Eric Biggers's avatar
      elevator: fix truncation of icq_cache_name · 9bd2bbc0
      Eric Biggers authored
      
      gcc 7.1 reports the following warning:
      
          block/elevator.c: In function ‘elv_register’:
          block/elevator.c:898:5: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
               "%s_io_cq", e->elevator_name);
               ^~~~~~~~~~
          block/elevator.c:897:3: note: ‘snprintf’ output between 7 and 22 bytes into a destination of size 21
             snprintf(e->icq_cache_name, sizeof(e->icq_cache_name),
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
               "%s_io_cq", e->elevator_name);
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      The bug is that the name of the icq_cache is 6 characters longer than
      the elevator name, but only ELV_NAME_MAX + 5 characters were reserved
      for it --- so in the case of a maximum-length elevator name, the 'q'
      character in "_io_cq" would be truncated by snprintf().  Fix it by
      reserving ELV_NAME_MAX + 6 characters instead.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Reviewed-by: default avatarBart Van Assche <Bart.VanAssche@sandisk.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      9bd2bbc0
    • Arnd Bergmann's avatar
      [media] cec-notifier.h: handle unreachable CONFIG_CEC_CORE · ae8eb443
      Arnd Bergmann authored
      
      Fix a link error in this specific combination of config options:
      
      CONFIG_MEDIA_CEC_SUPPORT=y
      CONFIG_CEC_CORE=m
      CONFIG_MEDIA_CEC_NOTIFIER=y
      CONFIG_VIDEO_STI_HDMI_CEC=m
      CONFIG_DRM_STI=y
      
      drivers/gpu/drm/sti/sti_hdmi.o: In function `sti_hdmi_remove':
      sti_hdmi.c:(.text.sti_hdmi_remove+0x10): undefined reference to
      `cec_notifier_set_phys_addr'
      sti_hdmi.c:(.text.sti_hdmi_remove+0x34): undefined reference to
      `cec_notifier_put'
      drivers/gpu/drm/sti/sti_hdmi.o: In function `sti_hdmi_connector_get_modes':
      sti_hdmi.c:(.text.sti_hdmi_connector_get_modes+0x4a): undefined
      reference to `cec_notifier_set_phys_addr_from_edid'
      drivers/gpu/drm/sti/sti_hdmi.o: In function `sti_hdmi_probe':
      sti_hdmi.c:(.text.sti_hdmi_probe+0x204): undefined reference to
      `cec_notifier_get'
      drivers/gpu/drm/sti/sti_hdmi.o: In function `sti_hdmi_connector_detect':
      sti_hdmi.c:(.text.sti_hdmi_connector_detect+0x36): undefined reference
      to `cec_notifier_set_phys_addr'
      drivers/gpu/drm/sti/sti_hdmi.o: In function `sti_hdmi_disable':
      sti_hdmi.c:(.text.sti_hdmi_disable+0xc0): undefined reference to
      `cec_notifier_set_phys_addr'
      
      The version below seems to work, though I don't particularly
      like the IS_REACHABLE() addition since that can be confusing
      to users.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      ae8eb443
  10. 05 Jun, 2017 3 commits
  11. 04 Jun, 2017 1 commit
  12. 02 Jun, 2017 4 commits
    • Michal Hocko's avatar
      mm: consider memblock reservations for deferred memory initialization sizing · 864b9a39
      Michal Hocko authored
      We have seen an early OOM killer invocation on ppc64 systems with
      crashkernel=4096M:
      
      	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
      	kthreadd cpuset=/ mems_allowed=7
      	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
      	Call Trace:
      	  dump_stack+0xb0/0xf0 (unreliable)
      	  dump_header+0xb0/0x258
      	  out_of_memory+0x5f0/0x640
      	  __alloc_pages_nodemask+0xa8c/0xc80
      	  kmem_getpages+0x84/0x1a0
      	  fallback_alloc+0x2a4/0x320
      	  kmem_cache_alloc_node+0xc0/0x2e0
      	  copy_process.isra.25+0x260/0x1b30
      	  _do_fork+0x94/0x470
      	  kernel_thread+0x48/0x60
      	  kthreadd+0x264/0x330
      	  ret_from_kernel_thread+0x5c/0xa4
      
      	Mem-Info:
      	active_anon:0 inactive_anon:0 isolated_anon:0
      	 active_file:0 inactive_file:0 isolated_file:0
      	 unevictable:0 dirty:0 writeback:0 unstable:0
      	 slab_reclaimable:5 slab_unreclaimable:73
      	 mapped:0 shmem:0 pagetables:0 bounce:0
      	 free:0 free_pcp:0 free_cma:0
      	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      	lowmem_reserve[]: 0 0 0 0
      	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
      	0 total pagecache pages
      	0 pages in swap cache
      	Swap cache stats: add 0, delete 0, find 0/0
      	Free swap  = 0kB
      	Total swap = 0kB
      	819200 pages RAM
      	0 pages HighMem/MovableOnly
      	817481 pages reserved
      	0 pages cma reserved
      	0 pages hwpoisoned
      
      the reason is that the managed memory is too low (only 110MB) while the
      rest of the the 50GB is still waiting for the deferred intialization to
      be done.  update_defer_init estimates the initial memoty to initialize
      to 2GB at least but it doesn't consider any memory allocated in that
      range.  In this particular case we've had
      
      	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)
      
      so the low 2GB is mostly depleted.
      
      Fix this by considering memblock allocations in the initial static
      initialization estimation.  Move the max_initialise to
      reset_deferred_meminit and implement a simple memblock_reserved_memory
      helper which iterates all reserved blocks and sums the size of all that
      start below the given address.  The cumulative size is than added on top
      of the initial estimation.  This is still not ideal because
      reset_deferred_meminit doesn't consider holes and so reservation might
      be above the initial estimation whihch we ignore but let's make the
      logic simpler until we really need to handle more complicated cases.
      
      Fixes: 3a80a7fa ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
      Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
      
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Tested-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>	[4.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      864b9a39
    • James Morse's avatar
      mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified · 9a291a7c
      James Morse authored
      KVM uses get_user_pages() to resolve its stage2 faults.  KVM sets the
      FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
      finds a VM_FAULT_HWPOISON.  KVM handles these hwpoison pages as a
      special case.  (check_user_page_hwpoison())
      
      When huge pages are involved, this doesn't work so well.
      get_user_pages() calls follow_hugetlb_page(), which stops early if it
      receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
      -EFAULT to the caller.  The step to map this to -EHWPOISON based on the
      FOLL_ flags is missing.  The hwpoison special case is skipped, and
      -EFAULT is returned to user-space, causing Qemu or kvmtool to exit.
      
      Instead, move this VM_FAULT_ to errno mapping code into a header file
      and use it from faultin_page() and follow_hugetlb_page().
      
      With this, KVM works as expected.
      
      This isn't a problem for arm64 today as we haven't enabled
      MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
      too, so I think this should be a fix.  This doesn't apply earlier than
      stable's v4.11.1 due to all sorts of cleanup.
      
      [james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
      suggested.
        Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.com
      
      Signed-off-by: default avatarJames Morse <james.morse@arm.com>
      Acked-by: default avatarPunit Agrawal <punit.agrawal@arm.com>
      Acked-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[4.11.1+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a291a7c
    • Matthias Kaehlcke's avatar
      frv: declare jiffies to be located in the .data section · 60b0a8c3
      Matthias Kaehlcke authored
      Commit 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with
      ____cacheline_aligned_in_smp") removed a section specification from the
      jiffies declaration that caused conflicts on some platforms.
      
      Unfortunately this change broke the build for frv:
      
        kernel/built-in.o: In function `__do_softirq': (.text+0x6460): relocation truncated to fit: R_FRV_GPREL12 against symbol
            `jiffies' defined in *ABS* section in .tmp_vmlinux1
        kernel/built-in.o: In function `__do_softirq': (.text+0x6574): relocation truncated to fit: R_FRV_GPREL12 against symbol
            `jiffies' defined in *ABS* section in .tmp_vmlinux1
        kernel/built-in.o: In function `pwq_activate_delayed_work': workqueue.c:(.text+0x15b9c): relocation truncated to fit: R_FRV_GPREL12 against
            symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1
        ...
      
      Add __jiffy_arch_data to the declaration of jiffies and use it on frv to
      include the section specification.  For all other platforms
      __jiffy_arch_data (currently) has no effect.
      
      Fixes: 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
      Link: http://lkml.kernel.org/r/20170516221333.177280-1-mka@chromium.org
      
      Signed-off-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Reported-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: default avatarDavid Howells <dhowells@redhat.com>
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      60b0a8c3
    • Michal Hocko's avatar
      include/linux/gfp.h: fix ___GFP_NOLOCKDEP value · 1bde33e0
      Michal Hocko authored
      Igor Stoppa has noticed that __GFP_NOLOCKDEP can use a lower bit.  At
      the time commit 7e784422 ("lockdep: allow to disable reclaim lockup
      detection") was written we still had __GFP_OTHER_NODE but I have removed
      it in commit 41b6167e ("mm: get rid of __GFP_OTHER_NODE") and forgot
      to lower the bit value.
      
      The current value is outside of __GFP_BITS_SHIFT so it cannot be used
      actually.
      
      Fixes: 7e784422
      
       ("lockdep: allow to disable reclaim lockup detection")
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarIgor Stoppa <igor.stoppa@nokia.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1bde33e0
  13. 01 Jun, 2017 3 commits
    • Majd Dibbiny's avatar
      RDMA/SA: Fix kernel panic in CMA request handler flow · d3957b86
      Majd Dibbiny authored
      Commit 9fdca4da (IB/SA: Split struct sa_path_rec based on IB and
      ROCE specific fields) moved the service_id to be specific attribute
      for IB and OPA SA Path Record, and thus wasn't assigned for RoCE.
      
      This caused to the following kernel panic in the CMA request handler flow:
      
      [   27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [   27.074731] IP: __radix_tree_lookup+0x1d/0xe0
      ...
      [   27.075356] Workqueue: ib_cm cm_work_handler [ib_cm]
      [   27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000
      [   27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0
      ...
      [   27.075979] Call Trace:
      [   27.076015]  radix_tree_lookup+0xd/0x10
      [   27.076055]  cma_ps_find+0x59/0x70 [rdma_cm]
      [   27.076097]  cma_id_from_event+0xd2/0x470 [rdma_cm]
      [   27.076144]  ? ib_init_ah_from_path+0x39a/0x590 [ib_core]
      [   27.076193]  cma_req_handler+0x25/0x480 [rdma_cm]
      [   27.076237]  cm_process_work+0x25/0x120 [ib_cm]
      [   27.076280]  ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm]
      [   27.076350]  cm_req_handler+0xb03/0xd40 [ib_cm]
      [   27.076430]  ? sched_clock_cpu+0x11/0xb0
      [   27.076478]  cm_work_handler+0x194/0x1588 [ib_cm]
      [   27.076525]  process_one_work+0x160/0x410
      [   27.076565]  worker_thread+0x137/0x4a0
      [   27.076614]  kthread+0x112/0x150
      [   27.076684]  ? max_active_store+0x60/0x60
      [   27.077642]  ? kthread_park+0x90/0x90
      [   27.078530]  ret_from_fork+0x2c/0x40
      
      This patch moves it back to the common SA Path Record structure
      and removes the redundant setter and getter.
      
      Tested on Connect-IB and Connect-X4 in Infiniband and RoCE respectively.
      
      Fixes: 9fdca4da
      
       (IB/SA: Split struct sa_path_rec based on IB ands
      	ROCE specific fields)
      Signed-off-by: default avatarMajd Dibbiny <majd@mellanox.com>
      Reviewed-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      d3957b86
    • Leon Romanovsky's avatar
      RDMA/netlink: Reduce exposure of RDMA netlink functions · 233c1955
      Leon Romanovsky authored
      
      RDMA netlink is part of ib_core, hence ibnl_chk_listeners(),
      ibnl_init() and ibnl_cleanup() don't need to be published
      in public header file.
      
      Let's remove EXPORT_SYMBOL from ibnl_chk_listeners() and move all these
      functions to private header file.
      
      CC: Yuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: default avatarYuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      233c1955
    • Max Gurtovoy's avatar
      net/mlx5: Define interface bits for fencing UMR wqe · 1410a90a
      Max Gurtovoy authored
      
      HW can implement UMR wqe re-transmission in various ways.
      Thus, add HCA cap to distinguish the needed fence for UMR to make
      sure that the wqe wouldn't fail on mkey checks.
      Signed-off-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Acked-by: default avatarLeon Romanovsky <leon@kernel.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      1410a90a
  14. 31 May, 2017 1 commit
    • Nicholas Bellinger's avatar
      iscsi-target: Fix initial login PDU asynchronous socket close OOPs · 25cdda95
      Nicholas Bellinger authored
      This patch fixes a OOPs originally introduced by:
      
         commit bb048357
      
      
         Author: Nicholas Bellinger <nab@linux-iscsi.org>
         Date:   Thu Sep 5 14:54:04 2013 -0700
      
         iscsi-target: Add sk->sk_state_change to cleanup after TCP failure
      
      which would trigger a NULL pointer dereference when a TCP connection
      was closed asynchronously via iscsi_target_sk_state_change(), but only
      when the initial PDU processing in iscsi_target_do_login() from iscsi_np
      process context was blocked waiting for backend I/O to complete.
      
      To address this issue, this patch makes the following changes.
      
      First, it introduces some common helper functions used for checking
      socket closing state, checking login_flags, and atomically checking
      socket closing state + setting login_flags.
      
      Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
      connection has dropped via iscsi_target_sk_state_change(), but the
      initial PDU processing within iscsi_target_do_login() in iscsi_np
      context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
      but doesn't invoke schedule_delayed_work().
      
      The original NULL pointer dereference case reported by MNC is now handled
      by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
      transitioning to FFP to determine when the socket has already closed,
      or iscsi_target_start_negotiation() if the login needs to exchange
      more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
      closed.  For both of these cases, the cleanup up of remaining connection
      resources will occur in iscsi_target_start_negotiation() from iscsi_np
      process context once the failure is detected.
      
      Finally, to handle to case where iscsi_target_sk_state_change() is
      called after the initial PDU procesing is complete, it now invokes
      conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
      existing iscsi_target_sk_check_close() checks detect connection failure.
      For this case, the cleanup of remaining connection resources will occur
      in iscsi_target_do_login_rx() from delayed workqueue process context
      once the failure is detected.
      Reported-by: default avatarMike Christie <mchristi@redhat.com>
      Reviewed-by: default avatarMike Christie <mchristi@redhat.com>
      Tested-by: default avatarMike Christie <mchristi@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Reported-by: default avatarHannes Reinecke <hare@suse.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Varun Prakash <varun@chelsio.com>
      Cc: <stable@vger.kernel.org> # v3.12+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      25cdda95
  15. 30 May, 2017 1 commit
    • Arnd Bergmann's avatar
      iommu/dma: Fix function declaration · 4b1c8898
      Arnd Bergmann authored
      Newly added code in the ipmmu-vmsa driver showed a small mistake
      in a header file that can't be included by itself without CONFIG_IOMMU_DMA
      enabled:
      
      In file included from drivers/iommu/ipmmu-vmsa.c:13:0:
      include/linux/dma-iommu.h:105:94: error: 'struct device' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
      
      This adds a forward declaration for 'struct device', similar to how
      we treat the other struct types in this case.
      
      Fixes: 3ae47292 ("iommu/ipmmu-vmsa: Add new IOMMU_DOMAIN_DMA ops")
      Fixes: 273df963
      
       ("iommu/dma: Make PCI window reservation generic")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      4b1c8898
  16. 29 May, 2017 2 commits
  17. 26 May, 2017 2 commits
  18. 25 May, 2017 1 commit
  19. 24 May, 2017 3 commits
    • Tahsin Erdogan's avatar
      ext4: fix quota charging for shared xattr blocks · b8cb5a54
      Tahsin Erdogan authored
      
      ext4_xattr_block_set() calls dquot_alloc_block() to charge for an xattr
      block when new references are made. However if dquot_initialize() hasn't
      been called on an inode, request for charging is effectively ignored
      because ext4_inode_info->i_dquot is not initialized yet.
      
      Add dquot_initialize() to call paths that lead to ext4_xattr_block_set().
      Signed-off-by: default avatarTahsin Erdogan <tahsin@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      b8cb5a54
    • Vlad Yasevich's avatar
      vlan: Fix tcp checksum offloads in Q-in-Q vlans · 35d2f80b
      Vlad Yasevich authored
      It appears that TCP checksum offloading has been broken for
      Q-in-Q vlans.  The behavior was execerbated by the
      series
          commit afb0bc97
      
       ("Merge branch 'stacked_vlan_tso'")
      that that enabled accleleration features on stacked vlans.
      
      However, event without that series, it is possible to trigger
      this issue.  It just requires a lot more specialized configuration.
      
      The root cause is the interaction between how
      netdev_intersect_features() works, the features actually set on
      the vlan devices and HW having the ability to run checksum with
      longer headers.
      
      The issue starts when netdev_interesect_features() replaces
      NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
      if the HW advertises IP|IPV6 specific checksums.  This happens
      for tagged and multi-tagged packets.   However, HW that enables
      IP|IPV6 checksum offloading doesn't gurantee that packets with
      arbitrarily long headers can be checksummed.
      
      This patch disables IP|IPV6 checksums on the packet for multi-tagged
      packets.
      
      CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      CC: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      35d2f80b
    • Tejun Heo's avatar
      cpuset: consider dying css as offline · 41c25707
      Tejun Heo authored
      
      In most cases, a cgroup controller don't care about the liftimes of
      cgroups.  For the controller, a css becomes online when ->css_online()
      is called on it and offline when ->css_offline() is called.
      
      However, cpuset is special in that the user interface it exposes cares
      whether certain cgroups exist or not.  Combined with the RCU delay
      between cgroup removal and css offlining, this can lead to user
      visible behavior oddities where operations which should succeed after
      cgroup removals fail for some time period.  The effects of cgroup
      removals are delayed when seen from userland.
      
      This patch adds css_is_dying() which tests whether offline is pending
      and updates is_cpuset_online() so that the function returns false also
      while offline is pending.  This gets rid of the userland visible
      delays.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarDaniel Jordan <daniel.m.jordan@oracle.com>
      Link: http://lkml.kernel.org/r/327ca1f5-7957-fbb9-9e5f-9ba149d40ba2@oracle.com
      Cc: st...
      41c25707