1. 02 Jun, 2017 1 commit
  2. 31 May, 2017 1 commit
    • Nicholas Bellinger's avatar
      iscsi-target: Fix initial login PDU asynchronous socket close OOPs · 25cdda95
      Nicholas Bellinger authored
      This patch fixes a OOPs originally introduced by:
      
         commit bb048357
      
      
         Author: Nicholas Bellinger <nab@linux-iscsi.org>
         Date:   Thu Sep 5 14:54:04 2013 -0700
      
         iscsi-target: Add sk->sk_state_change to cleanup after TCP failure
      
      which would trigger a NULL pointer dereference when a TCP connection
      was closed asynchronously via iscsi_target_sk_state_change(), but only
      when the initial PDU processing in iscsi_target_do_login() from iscsi_np
      process context was blocked waiting for backend I/O to complete.
      
      To address this issue, this patch makes the following changes.
      
      First, it introduces some common helper functions used for checking
      socket closing state, checking login_flags, and atomically checking
      socket closing state + setting login_flags.
      
      Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
      connection has dropped via iscsi_target_sk_state_change(), but the
      initial PDU processing within iscsi_target_do_login() in iscsi_np
      context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
      but doesn't invoke schedule_delayed_work().
      
      The original NULL pointer dereference case reported by MNC is now handled
      by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
      transitioning to FFP to determine when the socket has already closed,
      or iscsi_target_start_negotiation() if the login needs to exchange
      more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
      closed.  For both of these cases, the cleanup up of remaining connection
      resources will occur in iscsi_target_start_negotiation() from iscsi_np
      process context once the failure is detected.
      
      Finally, to handle to case where iscsi_target_sk_state_change() is
      called after the initial PDU procesing is complete, it now invokes
      conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
      existing iscsi_target_sk_check_close() checks detect connection failure.
      For this case, the cleanup of remaining connection resources will occur
      in iscsi_target_do_login_rx() from delayed workqueue process context
      once the failure is detected.
      Reported-by: default avatarMike Christie <mchristi@redhat.com>
      Reviewed-by: default avatarMike Christie <mchristi@redhat.com>
      Tested-by: default avatarMike Christie <mchristi@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Reported-by: default avatarHannes Reinecke <hare@suse.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Varun Prakash <varun@chelsio.com>
      Cc: <stable@vger.kernel.org> # v3.12+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      25cdda95
  3. 26 May, 2017 2 commits
  4. 25 May, 2017 1 commit
  5. 24 May, 2017 1 commit
    • Vlad Yasevich's avatar
      vlan: Fix tcp checksum offloads in Q-in-Q vlans · 35d2f80b
      Vlad Yasevich authored
      It appears that TCP checksum offloading has been broken for
      Q-in-Q vlans.  The behavior was execerbated by the
      series
          commit afb0bc97
      
       ("Merge branch 'stacked_vlan_tso'")
      that that enabled accleleration features on stacked vlans.
      
      However, event without that series, it is possible to trigger
      this issue.  It just requires a lot more specialized configuration.
      
      The root cause is the interaction between how
      netdev_intersect_features() works, the features actually set on
      the vlan devices and HW having the ability to run checksum with
      longer headers.
      
      The issue starts when netdev_interesect_features() replaces
      NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
      if the HW advertises IP|IPV6 specific checksums.  This happens
      for tagged and multi-tagged packets.   However, HW that enables
      IP|IPV6 checksum offloading doesn't gurantee that packets with
      arbitrarily long headers can be checksummed.
      
      This patch disables IP|IPV6 checksums on the packet for multi-tagged
      packets.
      
      CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      CC: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      35d2f80b
  6. 23 May, 2017 8 commits
  7. 22 May, 2017 4 commits
  8. 20 May, 2017 2 commits
  9. 18 May, 2017 6 commits
  10. 17 May, 2017 1 commit
  11. 16 May, 2017 2 commits
  12. 15 May, 2017 4 commits
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: revisit chain/object refcounting from elements · 59105446
      Pablo Neira Ayuso authored
      
      Andreas reports that the following incremental update using our commit
      protocol doesn't work.
      
       # nft -f incremental-update.nft
       delete element ip filter client_to_any { 10.180.86.22 : goto CIn_1 }
       delete chain ip filter CIn_1
       ... Error: Could not process rule: Device or resource busy
      
      The existing code is not well-integrated into the commit phase protocol,
      since element deletions do not result in refcount decrement from the
      preparation phase. This results in bogus EBUSY errors like the one
      above.
      
      Two new functions come with this patch:
      
      * nft_set_elem_activate() function is used from the abort path, to
        restore the set element refcounting on objects that occurred from
        the preparation phase.
      
      * nft_set_elem_deactivate() that is called from nft_del_setelem() to
        decrement set element refcounting on objects from the preparation
        phase in the commit protocol.
      
      The nft_data_uninit() has been renamed to nft_data_release() since this
      function does not uninitialize any data store in the data register,
      instead just releases the references to objects. Moreover, a new
      function nft_data_hold() has been introduced to be used from
      nft_set_elem_activate().
      Reported-by: default avatarAndreas Schultz <aschultz@tpip.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      59105446
    • Willem de Bruijn's avatar
      netfilter: xtables: zero padding in data_to_user · 324318f0
      Willem de Bruijn authored
      When looking up an iptables rule, the iptables binary compares the
      aligned match and target data (XT_ALIGN). In some cases this can
      exceed the actual data size to include padding bytes.
      
      Before commit f77bc5b2 ("iptables: use match, target and data
      copy_to_user helpers") the malloc()ed bytes were overwritten by the
      kernel with kzalloced contents, zeroing the padding and making the
      comparison succeed. After this patch, the kernel copies and clears
      only data, leaving the padding bytes undefined.
      
      Extend the clear operation from data size to aligned data size to
      include the padding bytes, if any.
      
      Padding bytes can be observed in both match and target, and the bug
      triggered, by issuing a rule with match icmp and target ACCEPT:
      
        iptables -t mangle -A INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT
        iptables -t mangle -D INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT
      
      Fixes: f77bc5b2
      
       ("iptables: use match, target and data copy_to_user helpers")
      Reported-by: default avatarPaul Moore <pmoore@redhat.com>
      Reported-by: default avatarRichard Guy Briggs <rgb@redhat.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      324318f0
    • Liping Zhang's avatar
      netfilter: nfnl_cthelper: reject del request if helper obj is in use · 9338d7b4
      Liping Zhang authored
      
      We can still delete the ct helper even if it is in use, this will cause
      a use-after-free error. In more detail, I mean:
        # nfct helper add ssdp inet udp
        # iptables -t raw -A OUTPUT -p udp -j CT --helper ssdp
        # nfct helper delete ssdp //--> oops, succeed!
        BUG: unable to handle kernel paging request at 000026ca
        IP: 0x26ca
        [...]
        Call Trace:
         ? ipv4_helper+0x62/0x80 [nf_conntrack_ipv4]
         nf_hook_slow+0x21/0xb0
         ip_output+0xe9/0x100
         ? ip_fragment.constprop.54+0xc0/0xc0
         ip_local_out+0x33/0x40
         ip_send_skb+0x16/0x80
         udp_send_skb+0x84/0x240
         udp_sendmsg+0x35d/0xa50
      
      So add reference count to fix this issue, if ct helper is used by
      others, reject the delete request.
      
      Apply this patch:
        # nfct helper delete ssdp
        nfct v1.4.3: netlink error: Device or resource busy
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      9338d7b4
    • Liping Zhang's avatar
      netfilter: introduce nf_conntrack_helper_put helper function · d91fc59c
      Liping Zhang authored
      
      And convert module_put invocation to nf_conntrack_helper_put, this is
      prepared for the followup patch, which will add a refcnt for cthelper,
      so we can reject the deleting request when cthelper is in use.
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      d91fc59c
  13. 14 May, 2017 2 commits
  14. 12 May, 2017 5 commits
    • Ross Zwisler's avatar
      dax: prevent invalidation of mapped DAX entries · 4636e70b
      Ross Zwisler authored
      Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
      v4.
      
      This series fixes data corruption that can happen for DAX mounts when
      page faults race with write(2) and as a result page tables get out of
      sync with block mappings in the filesystem and thus data seen through
      mmap is different from data seen through read(2).
      
      The series passes testing with t_mmap_stale test program from Ross and
      also other mmap related tests on DAX filesystem.
      
      This patch (of 4):
      
      dax_invalidate_mapping_entry() currently removes DAX exceptional entries
      only if they are clean and unlocked.  This is done via:
      
        invalidate_mapping_pages()
          invalidate_exceptional_entry()
            dax_invalidate_mapping_entry()
      
      However, for page cache pages removed in invalidate_mapping_pages()
      there is an additional criteria which is that the page must not be
      mapped.  This is noted in the comments above invalidate_mapping_pages()
      and is checked in invalidate_inode_page().
      
      For DAX entries this means that we can can end up in a situation where a
      DAX exceptional entry, either a huge zero page or a regular DAX entry,
      could end up mapped but without an associated radix tree entry.  This is
      inconsistent with the rest of the DAX code and with what happens in the
      page cache case.
      
      We aren't able to unmap the DAX exceptional entry because according to
      its comments invalidate_mapping_pages() isn't allowed to block, and
      unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.
      
      Since we essentially never have unmapped DAX entries to evict from the
      radix tree, just remove dax_invalidate_mapping_entry().
      
      Fixes: c6dcf52c ("mm: Invalidate DAX radix tree entries only if appropriate")
      Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz
      
      Signed-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reported-by: default avatarJan Kara <jack@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>    [4.10+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4636e70b
    • Michal Hocko's avatar
      mm, vmalloc: fix vmalloc users tracking properly · 8594a21c
      Michal Hocko authored
      Commit 1f5307b1 ("mm, vmalloc: properly track vmalloc users") has
      pulled asm/pgtable.h include dependency to linux/vmalloc.h and that
      turned out to be a bad idea for some architectures.  E.g.  m68k fails
      with
      
         In file included from arch/m68k/include/asm/pgtable_mm.h:145:0,
                          from arch/m68k/include/asm/pgtable.h:4,
                          from include/linux/vmalloc.h:9,
                          from arch/m68k/kernel/module.c:9:
         arch/m68k/include/asm/mcf_pgtable.h: In function 'nocache_page':
      >> arch/m68k/include/asm/mcf_pgtable.h:339:43: error: 'init_mm' undeclared (first use in this function)
          #define pgd_offset_k(address) pgd_offset(&init_mm, address)
      
      as spotted by kernel build bot. nios2 fails for other reason
      
        In file included from include/asm-generic/io.h:767:0,
                         from arch/nios2/include/asm/io.h:61,
                         from include/linux/io.h:25,
                         from arch/nios2/include/asm/pgtable.h:18,
                         from include/linux/mm.h:70,
                         from include/linux/pid_namespace.h:6,
                         from include/linux/ptrace.h:9,
                         from arch/nios2/include/uapi/asm/elf.h:23,
                         from arch/nios2/include/asm/elf.h:22,
                         from include/linux/elf.h:4,
                         from include/linux/module.h:15,
                         from init/main.c:16:
        include/linux/vmalloc.h: In function '__vmalloc_node_flags':
        include/linux/vmalloc.h:99:40: error: 'PAGE_KERNEL' undeclared (first use in this function); did you mean 'GFP_KERNEL'?
      
      which is due to the newly added #include <asm/pgtable.h>, which on nios2
      includes <linux/io.h> and thus <asm/io.h> and <asm-generic/io.h> which
      again includes <linux/vmalloc.h>.
      
      Tweaking that around just turns out a bigger headache than necessary.
      This patch reverts 1f5307b1 and reimplements the original fix in a
      different way.  __vmalloc_node_flags can stay static inline which will
      cover vmalloc* functions.  We only have one external user
      (kvmalloc_node) and we can export __vmalloc_node_flags_caller and
      provide the caller directly.  This is much simpler and it doesn't really
      need any games with header files.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [mhocko@kernel.org: revert old comment]
        Link: http://lkml.kernel.org/r/20170509211054.GB16325@dhcp22.suse.cz
      Fixes: 1f5307b1 ("mm, vmalloc: properly track vmalloc users")
      Link: http://lkml.kernel.org/r/20170509153702.GR6481@dhcp22.suse.cz
      
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Tobias Klauser <tklauser@distanz.ch>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8594a21c
    • Deepa Dinamani's avatar
      time: delete current_fs_time() · 572e0ca9
      Deepa Dinamani authored
      All uses of the current_fs_time() function have been replaced by other
      time interfaces.
      
      And, its use cases can be fulfilled by current_time() or ktime_get_*
      variants.
      
      Link: http://lkml.kernel.org/r/1491613030-11599-13-git-send-email-deepa.kernel@gmail.com
      
      Signed-off-by: default avatarDeepa Dinamani <deepa.kernel@gmail.com>
      Reviewed-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      572e0ca9
    • Daniel Borkmann's avatar
      xdp: refine xdp api with regards to generic xdp · d67b9cd2
      Daniel Borkmann authored
      While working on the iproute2 generic XDP frontend, I noticed that
      as of right now it's possible to have native *and* generic XDP
      programs loaded both at the same time for the case when a driver
      supports native XDP.
      
      The intended model for generic XDP from b5cdae32 ("net: Generic
      XDP") is, however, that only one out of the two can be present at
      once which is also indicated as such in the XDP netlink dump part.
      The main rationale for generic XDP is to ease accessibility (in
      case a driver does not yet have XDP support) and to generically
      provide a semantical model as an example for driver developers
      wanting to add XDP support. The generic XDP option for an XDP
      aware driver can still be useful for comparing and testing both
      implementations.
      
      However, it is not intended to have a second XDP processing stage
      or layer with exactly the same functionality of the first native
      stage. Only reason could be to have a partial fallback for future
      XDP features that are not supported yet in the native implementation
      and we probably also shouldn't strive for such fallback and instead
      encourage native feature support in the first place. Given there's
      currently no such fallback issue or use case, lets not go there yet
      if we don't need to.
      
      Therefore, change semantics for loading XDP and bail out if the
      user tries to load a generic XDP program when a native one is
      present and vice versa. Another alternative to bailing out would
      be to handle the transition from one flavor to another gracefully,
      but that would require to bring the device down, exchange both
      types of programs, and bring it up again in order to avoid a tiny
      window where a packet could hit both hooks. Given this complicates
      the logic for just a debugging feature in the native case, I went
      with the simpler variant.
      
      For the dump, remove IFLA_XDP_FLAGS that was added with b5cdae32
      
      
      and reuse IFLA_XDP_ATTACHED for indicating the mode. Dumping all
      or just a subset of flags that were used for loading the XDP prog
      is suboptimal in the long run since not all flags are useful for
      dumping and if we start to reuse the same flag definitions for
      load and dump, then we'll waste bit space. What we really just
      want is to dump the mode for now.
      
      Current IFLA_XDP_ATTACHED semantics are: nothing was installed (0),
      a program is running at the native driver layer (1). Thus, add a
      mode that says that a program is running at generic XDP layer (2).
      Applications will handle this fine in that older binaries will
      just indicate that something is attached at XDP layer, effectively
      this is similar to IFLA_XDP_FLAGS attr that we would have had
      modulo the redundancy.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d67b9cd2
    • Daniel Borkmann's avatar
      xdp: add flag to enforce driver mode · 0489df9a
      Daniel Borkmann authored
      After commit b5cdae32
      
       ("net: Generic XDP") we automatically fall
      back to a generic XDP variant if the driver does not support native
      XDP. Allow for an option where the user can specify that always the
      native XDP variant should be selected and in case it's not supported
      by a driver, just bail out.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0489df9a