You need to sign in or sign up before continuing.
  1. 15 Jun, 2017 1 commit
  2. 02 Jun, 2017 4 commits
    • Michal Hocko's avatar
      mm: consider memblock reservations for deferred memory initialization sizing · 864b9a39
      Michal Hocko authored
      We have seen an early OOM killer invocation on ppc64 systems with
      crashkernel=4096M:
      
      	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
      	kthreadd cpuset=/ mems_allowed=7
      	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
      	Call Trace:
      	  dump_stack+0xb0/0xf0 (unreliable)
      	  dump_header+0xb0/0x258
      	  out_of_memory+0x5f0/0x640
      	  __alloc_pages_nodemask+0xa8c/0xc80
      	  kmem_getpages+0x84/0x1a0
      	  fallback_alloc+0x2a4/0x320
      	  kmem_cache_alloc_node+0xc0/0x2e0
      	  copy_process.isra.25+0x260/0x1b30
      	  _do_fork+0x94/0x470
      	  kernel_thread+0x48/0x60
      	  kthreadd+0x264/0x330
      	  ret_from_kernel_thread+0x5c/0xa4
      
      	Mem-Info:
      	active_anon:0 inactive_anon:0 isolated_anon:0
      	 active_file:0 inactive_file:0 isolated_file:0
      	 unevictable:0 dirty:0 writeback:0 unstable:0
      	 slab_reclaimable:5 slab_unreclaimable:73
      	 mapped:0 shmem:0 pagetables:0 bounce:0
      	 free:0 free_pcp:0 free_cma:0
      	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      	lowmem_reserve[]: 0 0 0 0
      	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
      	0 total pagecache pages
      	0 pages in swap cache
      	Swap cache stats: add 0, delete 0, find 0/0
      	Free swap  = 0kB
      	Total swap = 0kB
      	819200 pages RAM
      	0 pages HighMem/MovableOnly
      	817481 pages reserved
      	0 pages cma reserved
      	0 pages hwpoisoned
      
      the reason is that the managed memory is too low (only 110MB) while the
      rest of the the 50GB is still waiting for the deferred intialization to
      be done.  update_defer_init estimates the initial memoty to initialize
      to 2GB at least but it doesn't consider any memory allocated in that
      range.  In this particular case we've had
      
      	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)
      
      so the low 2GB is mostly depleted.
      
      Fix this by considering memblock allocations in the initial static
      initialization estimation.  Move the max_initialise to
      reset_deferred_meminit and implement a simple memblock_reserved_memory
      helper which iterates all reserved blocks and sums the size of all that
      start below the given address.  The cumulative size is than added on top
      of the initial estimation.  This is still not ideal because
      reset_deferred_meminit doesn't consider holes and so reservation might
      be above the initial estimation whihch we ignore but let's make the
      logic simpler until we really need to handle more complicated cases.
      
      Fixes: 3a80a7fa ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
      Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
      
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Tested-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>	[4.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      864b9a39
    • James Morse's avatar
      mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified · 9a291a7c
      James Morse authored
      KVM uses get_user_pages() to resolve its stage2 faults.  KVM sets the
      FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
      finds a VM_FAULT_HWPOISON.  KVM handles these hwpoison pages as a
      special case.  (check_user_page_hwpoison())
      
      When huge pages are involved, this doesn't work so well.
      get_user_pages() calls follow_hugetlb_page(), which stops early if it
      receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
      -EFAULT to the caller.  The step to map this to -EHWPOISON based on the
      FOLL_ flags is missing.  The hwpoison special case is skipped, and
      -EFAULT is returned to user-space, causing Qemu or kvmtool to exit.
      
      Instead, move this VM_FAULT_ to errno mapping code into a header file
      and use it from faultin_page() and follow_hugetlb_page().
      
      With this, KVM works as expected.
      
      This isn't a problem for arm64 today as we haven't enabled
      MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
      too, so I think this should be a fix.  This doesn't apply earlier than
      stable's v4.11.1 due to all sorts of cleanup.
      
      [james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
      suggested.
        Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.com
      
      Signed-off-by: default avatarJames Morse <james.morse@arm.com>
      Acked-by: default avatarPunit Agrawal <punit.agrawal@arm.com>
      Acked-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[4.11.1+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a291a7c
    • Matthias Kaehlcke's avatar
      frv: declare jiffies to be located in the .data section · 60b0a8c3
      Matthias Kaehlcke authored
      Commit 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with
      ____cacheline_aligned_in_smp") removed a section specification from the
      jiffies declaration that caused conflicts on some platforms.
      
      Unfortunately this change broke the build for frv:
      
        kernel/built-in.o: In function `__do_softirq': (.text+0x6460): relocation truncated to fit: R_FRV_GPREL12 against symbol
            `jiffies' defined in *ABS* section in .tmp_vmlinux1
        kernel/built-in.o: In function `__do_softirq': (.text+0x6574): relocation truncated to fit: R_FRV_GPREL12 against symbol
            `jiffies' defined in *ABS* section in .tmp_vmlinux1
        kernel/built-in.o: In function `pwq_activate_delayed_work': workqueue.c:(.text+0x15b9c): relocation truncated to fit: R_FRV_GPREL12 against
            symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1
        ...
      
      Add __jiffy_arch_data to the declaration of jiffies and use it on frv to
      include the section specification.  For all other platforms
      __jiffy_arch_data (currently) has no effect.
      
      Fixes: 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
      Link: http://lkml.kernel.org/r/20170516221333.177280-1-mka@chromium.org
      
      Signed-off-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Reported-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: default avatarDavid Howells <dhowells@redhat.com>
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      60b0a8c3
    • Michal Hocko's avatar
      include/linux/gfp.h: fix ___GFP_NOLOCKDEP value · 1bde33e0
      Michal Hocko authored
      Igor Stoppa has noticed that __GFP_NOLOCKDEP can use a lower bit.  At
      the time commit 7e784422 ("lockdep: allow to disable reclaim lockup
      detection") was written we still had __GFP_OTHER_NODE but I have removed
      it in commit 41b6167e ("mm: get rid of __GFP_OTHER_NODE") and forgot
      to lower the bit value.
      
      The current value is outside of __GFP_BITS_SHIFT so it cannot be used
      actually.
      
      Fixes: 7e784422
      
       ("lockdep: allow to disable reclaim lockup detection")
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarIgor Stoppa <igor.stoppa@nokia.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1bde33e0
  3. 01 Jun, 2017 3 commits
    • Majd Dibbiny's avatar
      RDMA/SA: Fix kernel panic in CMA request handler flow · d3957b86
      Majd Dibbiny authored
      Commit 9fdca4da (IB/SA: Split struct sa_path_rec based on IB and
      ROCE specific fields) moved the service_id to be specific attribute
      for IB and OPA SA Path Record, and thus wasn't assigned for RoCE.
      
      This caused to the following kernel panic in the CMA request handler flow:
      
      [   27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [   27.074731] IP: __radix_tree_lookup+0x1d/0xe0
      ...
      [   27.075356] Workqueue: ib_cm cm_work_handler [ib_cm]
      [   27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000
      [   27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0
      ...
      [   27.075979] Call Trace:
      [   27.076015]  radix_tree_lookup+0xd/0x10
      [   27.076055]  cma_ps_find+0x59/0x70 [rdma_cm]
      [   27.076097]  cma_id_from_event+0xd2/0x470 [rdma_cm]
      [   27.076144]  ? ib_init_ah_from_path+0x39a/0x590 [ib_core]
      [   27.076193]  cma_req_handler+0x25/0x480 [rdma_cm]
      [   27.076237]  cm_process_work+0x25/0x120 [ib_cm]
      [   27.076280]  ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm]
      [   27.076350]  cm_req_handler+0xb03/0xd40 [ib_cm]
      [   27.076430]  ? sched_clock_cpu+0x11/0xb0
      [   27.076478]  cm_work_handler+0x194/0x1588 [ib_cm]
      [   27.076525]  process_one_work+0x160/0x410
      [   27.076565]  worker_thread+0x137/0x4a0
      [   27.076614]  kthread+0x112/0x150
      [   27.076684]  ? max_active_store+0x60/0x60
      [   27.077642]  ? kthread_park+0x90/0x90
      [   27.078530]  ret_from_fork+0x2c/0x40
      
      This patch moves it back to the common SA Path Record structure
      and removes the redundant setter and getter.
      
      Tested on Connect-IB and Connect-X4 in Infiniband and RoCE respectively.
      
      Fixes: 9fdca4da
      
       (IB/SA: Split struct sa_path_rec based on IB ands
      	ROCE specific fields)
      Signed-off-by: default avatarMajd Dibbiny <majd@mellanox.com>
      Reviewed-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      d3957b86
    • Leon Romanovsky's avatar
      RDMA/netlink: Reduce exposure of RDMA netlink functions · 233c1955
      Leon Romanovsky authored
      
      RDMA netlink is part of ib_core, hence ibnl_chk_listeners(),
      ibnl_init() and ibnl_cleanup() don't need to be published
      in public header file.
      
      Let's remove EXPORT_SYMBOL from ibnl_chk_listeners() and move all these
      functions to private header file.
      
      CC: Yuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: default avatarYuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      233c1955
    • Max Gurtovoy's avatar
      net/mlx5: Define interface bits for fencing UMR wqe · 1410a90a
      Max Gurtovoy authored
      
      HW can implement UMR wqe re-transmission in various ways.
      Thus, add HCA cap to distinguish the needed fence for UMR to make
      sure that the wqe wouldn't fail on mkey checks.
      Signed-off-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Acked-by: default avatarLeon Romanovsky <leon@kernel.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      1410a90a
  4. 31 May, 2017 1 commit
    • Nicholas Bellinger's avatar
      iscsi-target: Fix initial login PDU asynchronous socket close OOPs · 25cdda95
      Nicholas Bellinger authored
      This patch fixes a OOPs originally introduced by:
      
         commit bb048357
         Author: Nicholas Bellinger <nab@linux-iscsi.org>
         Date:   Thu Sep 5 14:54:04 2013 -0700
      
         iscsi-target: Add sk->sk_state_change to cleanup after TCP failure
      
      which would trigger a NULL pointer dereference when a TCP connection
      was closed asynchronously via iscsi_target_sk_state_change(), but only
      when the initial PDU processing in iscsi_target_do_login() from iscsi_np
      process context was blocked waiting for backend I/O to complete.
      
      To address this issue, this patch makes the following changes.
      
      First, it introduces some common helper functions used for checking
      socket closing state, checking login_flags, and atomically checking
      socket closing state + setting login_flags.
      
      Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
      connection has dropped via iscsi_target_sk_state_change(), but the
      initial PDU processing w...
      25cdda95
  5. 29 May, 2017 2 commits
  6. 26 May, 2017 2 commits
    • Eric Dumazet's avatar
      ipv4: add reference counting to metrics · 3fb07daf
      Eric Dumazet authored
      Andrey Konovalov reported crashes in ipv4_mtu()
      
      I could reproduce the issue with KASAN kernels, between
      10.246.7.151 and 10.246.7.152 :
      
      1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &
      
      2) At the same time run following loop :
      while :
      do
       ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
       ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
      done
      
      Cong Wang attempted to add back rt->fi in commit
      82486aa6 ("ipv4: restore rt->fi for reference counting")
      but this proved to add some issues that were complex to solve.
      
      Instead, I suggested to add a refcount to the metrics themselves,
      being a standalone object (in particular, no reference to other objects)
      
      I tried to make this patch as small as possible to ease its backport,
      instead of being super clean. Note that we believe that only ipv4 dst
      need to take care of the metric refcount. But if this is wrong,
      this patch adds the basic infrastructure to extend this to other
      families.
      
      Many thank...
      3fb07daf
    • Christoph Hellwig's avatar
      PCI/msi: fix the pci_alloc_irq_vectors_affinity stub · 83b4605b
      Christoph Hellwig authored
      
      We need to return an error for any call that asks for MSI / MSI-X
      vectors only, so that non-trivial fallback logic can work properly.
      
      Also valid dev->irq and use the "correct" errno value based on feedback
      from Linus.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reported-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Fixes: aff17164
      
       ("PCI: Provide sensible IRQ vector alloc/free routines")
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      83b4605b
  7. 25 May, 2017 1 commit
  8. 24 May, 2017 1 commit
    • Vlad Yasevich's avatar
      vlan: Fix tcp checksum offloads in Q-in-Q vlans · 35d2f80b
      Vlad Yasevich authored
      It appears that TCP checksum offloading has been broken for
      Q-in-Q vlans.  The behavior was execerbated by the
      series
          commit afb0bc97
      
       ("Merge branch 'stacked_vlan_tso'")
      that that enabled accleleration features on stacked vlans.
      
      However, event without that series, it is possible to trigger
      this issue.  It just requires a lot more specialized configuration.
      
      The root cause is the interaction between how
      netdev_intersect_features() works, the features actually set on
      the vlan devices and HW having the ability to run checksum with
      longer headers.
      
      The issue starts when netdev_interesect_features() replaces
      NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
      if the HW advertises IP|IPV6 specific checksums.  This happens
      for tagged and multi-tagged packets.   However, HW that enables
      IP|IPV6 checksum offloading doesn't gurantee that packets with
      arbitrarily long headers can be checksummed.
      
      This patch disables IP|IPV6 checksums on the packet for multi-tagged
      packets.
      
      CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      CC: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      35d2f80b
  9. 23 May, 2017 8 commits
  10. 22 May, 2017 4 commits
  11. 20 May, 2017 2 commits
  12. 18 May, 2017 6 commits
  13. 17 May, 2017 1 commit
  14. 16 May, 2017 2 commits
  15. 15 May, 2017 2 commits
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: revisit chain/object refcounting from elements · 59105446
      Pablo Neira Ayuso authored
      
      Andreas reports that the following incremental update using our commit
      protocol doesn't work.
      
       # nft -f incremental-update.nft
       delete element ip filter client_to_any { 10.180.86.22 : goto CIn_1 }
       delete chain ip filter CIn_1
       ... Error: Could not process rule: Device or resource busy
      
      The existing code is not well-integrated into the commit phase protocol,
      since element deletions do not result in refcount decrement from the
      preparation phase. This results in bogus EBUSY errors like the one
      above.
      
      Two new functions come with this patch:
      
      * nft_set_elem_activate() function is used from the abort path, to
        restore the set element refcounting on objects that occurred from
        the preparation phase.
      
      * nft_set_elem_deactivate() that is called from nft_del_setelem() to
        decrement set element refcounting on objects from the preparation
        phase in the commit protocol.
      
      The nft_data_uninit() has been renamed to nft_data_release() since this
      function does not uninitialize any data store in the data register,
      instead just releases the references to objects. Moreover, a new
      function nft_data_hold() has been introduced to be used from
      nft_set_elem_activate().
      Reported-by: default avatarAndreas Schultz <aschultz@tpip.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      59105446
    • Willem de Bruijn's avatar
      netfilter: xtables: zero padding in data_to_user · 324318f0
      Willem de Bruijn authored
      When looking up an iptables rule, the iptables binary compares the
      aligned match and target data (XT_ALIGN). In some cases this can
      exceed the actual data size to include padding bytes.
      
      Before commit f77bc5b2 ("iptables: use match, target and data
      copy_to_user helpers") the malloc()ed bytes were overwritten by the
      kernel with kzalloced contents, zeroing the padding and making the
      comparison succeed. After this patch, the kernel copies and clears
      only data, leaving the padding bytes undefined.
      
      Extend the clear operation from data size to aligned data size to
      include the padding bytes, if any.
      
      Padding bytes can be observed in both match and target, and the bug
      triggered, by issuing a rule with match icmp and target ACCEPT:
      
        iptables -t mangle -A INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT
        iptables -t mangle -D INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT
      
      Fixes: f77bc5b2
      
       ("iptables: use match, target and data copy_to_user helpers")
      Reported-by: default avatarPaul Moore <pmoore@redhat.com>
      Reported-by: default avatarRichard Guy Briggs <rgb@redhat.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      324318f0