1. 11 Nov, 2018 2 commits
    • act_mirred: clear skb->tstamp on redirect · 7236ead1
      Eric Dumazet authored
      If sch_fq is used at ingress, skbs that might have been
      timestamped by net_timestamp_set() (if a packet capture
      is requesting timestamps) could be delayed by an arbitrary
      amount of time, since the sch_fq time base is MONOTONIC.
      
      Fix this problem by moving code from sch_netem.c to act_mirred.c.
      
      Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix link re-establish failure · 7ab412d3
      Jon Maloy authored
      When a link failure is detected locally, the link is reset, the flag
      link->in_session is set to false, and a RESET_MSG with the 'stopping'
      bit set is sent to the peer.
      
      The purpose of this bit is to inform the peer that this endpoint is
      just going down, and that the peer should handle the reception of this
      particular RESET message as a local failure. This forces the peer to
      accept another RESET or ACTIVATE message from this endpoint before it
      can re-establish the link. This again is necessary to ensure that
      link session numbers are properly exchanged before the link comes up
      again.
      
      If a failure is detected locally at the peer endpoint at the same time,
      the peer will do the same, which is also correct behavior.
      
      However, when receiving such messages, the endpoints will not
      distinguish between 'stopping' RESETs and ordinary ones when it comes
      to updating session numbers. Both endpoints will copy the received
      session number and set their 'in_session' flags to true at the
      reception, while they are still expecting another RESET from the
      peer before they can go ahead and re-establish. This is contradictory,
      since, after applying the validation check referred to below, the
      'in_session' flag will cause rejection of all such messages, and the
      link will never come up again.
      
      We now fix this by not only handling received RESET/STOPPING messages
      as a local failure, but also by omitting to set a new session number
      and the 'in_session' flag in such cases.
      
      Fixes: 7ea817f4 ("tipc: check session number before accepting link protocol messages")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 10 Nov, 2018 2 commits
  3. 09 Nov, 2018 1 commit
  4. 08 Nov, 2018 1 commit
  5. 06 Nov, 2018 5 commits
    • net: bpfilter: fix iptables failure if bpfilter_umh is disabled · 97adadda
      Taehee Yoo authored
      When an iptables command is executed, ip_{set/get}sockopt() tries to
      load bpfilter.ko if bpfilter is enabled. If it cannot find bpfilter.ko,
      the command fails.
      bpfilter.ko is generated only if CONFIG_BPFILTER_UMH is enabled, but
      ip_{set/get}sockopt() only checks CONFIG_BPFILTER.
      So if CONFIG_BPFILTER is enabled and CONFIG_BPFILTER_UMH is disabled,
      the iptables command always fails.
      
      test config:
         CONFIG_BPFILTER=y
         # CONFIG_BPFILTER_UMH is not set
      
      test command:
         %iptables -L
         iptables: No chain/target/match by that name.
      
      Fixes: d2ba09c1 ("net: add skeleton of bpfilter kernel module")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sock_diag: fix autoloading of the raw_diag module · c34c1287
      Andrei Vagin authored
      IPPROTO_RAW isn't registered as an inet protocol, so
      inet_protos[protocol] is always NULL for it.
      
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Fixes: bf2ae2e4 ("sock_diag: request _diag module only when the family or proto has been registered")
      Signed-off-by: Andrei Vagin <avagin@gmail.com>
      Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: netpoll: Enable netconsole IPv6 link local address · d016b4a3
      Matwey V. Kornilov authored
      
      There is no reason to avoid using a link-local source address when
      the remote netconsole IPv6 address is itself link-local.
      
      The patch allows administrators to use IPv6 netconsole without
      explicitly configuring source address:
      
          netconsole=@/,@fe80::5054:ff:fe2f:6012/
      Signed-off-by: Matwey V. Kornilov <matwey@sai.msu.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: properly check return value in inet6_dump_all() · e22d0bfa
      Alexey Kodanev authored
      Make sure we call fib6_dump_end() if it happens that skb->len
      is zero. rtnl_dump_all() can reset cb->args on the next loop
      iteration there.
      
      Fixes: 08e814c9 ("net/ipv6: Bail early if user only wants cloned entries")
      Fixes: ae677bbb ("net: Don't return invalid table id error when dumping all families")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: restore handling of dumpit return value in rtnl_dump_all() · 5e1acb4a
      Alexey Kodanev authored
      For a non-zero return from dumpit() we should break the loop
      in rtnl_dump_all() and return the result. Otherwise, e.g.,
      we could get a memory leak in inet6_dump_fib() [1]. The
      pointer to the struct fib6_walker allocated there (saved
      in cb->args) can be lost when it is reset on the next iteration.
      
      Fix it by partially restoring the previous behavior before
      commit c63586dc ("net: rtnl_dump_all needs to propagate
      error from dumpit function"). The returned error from
      dumpit() is still passed further.
      
      [1]:
      unreferenced object 0xffff88001322a200 (size 96):
        comm "sshd", pid 1484, jiffies 4296032768 (age 1432.542s)
        hex dump (first 32 bytes):
          00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
          18 09 41 36 00 88 ff ff 18 09 41 36 00 88 ff ff  ..A6......A6....
        backtrace:
          [<0000000095846b39>] kmem_cache_alloc_trace+0x151/0x220
          [<000000007d12709f>] inet6_dump_fib+0x68d/0x940
          [<000000002775a316>] rtnl_dump_all+0x1d9/0x2d0
          [<00000000d7cd302b>] netlink_dump+0x945/0x11a0
          [<000000002f43485f>] __netlink_dump_start+0x55d/0x800
          [<00000000f76bbeec>] rtnetlink_rcv_msg+0x4fa/0xa00
          [<000000009b5761f3>] netlink_rcv_skb+0x29c/0x420
          [<0000000087a1dae1>] rtnetlink_rcv+0x15/0x20
          [<00000000691b703b>] netlink_unicast+0x4e3/0x6c0
          [<00000000b5be0204>] netlink_sendmsg+0x7f2/0xba0
          [<0000000096d2aa60>] sock_sendmsg+0xba/0xf0
          [<000000008c1b786f>] __sys_sendto+0x1e4/0x330
          [<0000000019587b3f>] __x64_sys_sendto+0xe1/0x1a0
          [<00000000071f4d56>] do_syscall_64+0x9f/0x300
          [<000000002737577f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
          [<0000000057587684>] 0xffffffffffffffff
      
      Fixes: c63586dc ("net: rtnl_dump_all needs to propagate error from dumpit function")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
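The loop behaviour this commit message describes can be illustrated with a small user-space sketch. It is not the kernel code: `dump_fn`, `dump_all()` and the two stub handlers are hypothetical names, and the real rtnl_dump_all() walks rtnetlink families with a netlink callback rather than a plain array. The sketch only shows the control flow: stop at the first non-zero return and pass that value on, so per-handler state is not reset by moving to the next handler.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-family dumpit() handler. */
typedef int (*dump_fn)(void *state);

/*
 * Call each handler in turn, starting at *idx. Stop at the first
 * non-zero return and hand that value back to the caller, leaving
 * *idx pointing at the handler that returned it.
 */
static int dump_all(const dump_fn *handlers, size_t n, size_t *idx, void *state)
{
    int ret = 0;

    for (; *idx < n; (*idx)++) {
        ret = handlers[*idx](state);
        if (ret)
            break; /* do not move on and reset per-handler state */
    }
    return ret;
}

/* Stub handlers for the example: each counts that it ran. */
static int dump_ok(void *state)
{
    ++*(int *)state;
    return 0;
}

static int dump_stop(void *state)
{
    ++*(int *)state;
    return 7;
}
```

With handlers `{ dump_ok, dump_stop, dump_ok }`, the third handler is never reached and the caller sees the non-zero value from the second one.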
  6. 05 Nov, 2018 2 commits
  7. 04 Nov, 2018 2 commits
  8. 03 Nov, 2018 10 commits
    • net: do not abort bulk send on BQL status · fe60faa5
      Eric Dumazet authored
      
      Before calling dev_hard_start_xmit(), upper layers tried
      to cook an optimal skb list based on the BQL budget.

      The problem is that GSO packets can end up consuming more than
      the BQL budget.
      
      Breaking the loop is not useful, since requeued packets
      are ahead of any packets still in the qdisc.
      
      It is also more expensive, since next TX completion will
      push these packets later, while skbs are not in cpu caches.
      
      It is also a behavior difference with TSO packets, that can
      break the BQL limit by a large amount.
      
      Note that drivers should use __netdev_tx_sent_queue()
      in order to have optimal xmit_more support, and avoid
      useless atomic operations as shown in the following patch.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: conntrack: fix calculation of next bucket number in early_drop · f393808d
      Vasily Khoruzhick authored
      If there's no entry to drop in the bucket that corresponds to the hash,
      early_drop() should look for one in other buckets. But since it increments
      the hash instead of the bucket number, it actually looks in the same bucket 8
      times: hsize is 16k by default (14 bits) and hash is a 32-bit value, so
      reciprocal_scale(hash, hsize) returns the same value for hash..hash+7 in
      most cases.

      Fix it by incrementing the bucket number instead of the hash, and rename _hash
      to bucket to avoid future confusion.
      
      Fixes: 3e86638e ("netfilter: conntrack: consider ct netns in early_drop logic")
      Cc: <stable@vger.kernel.org> # v4.7+
      Signed-off-by: Vasily Khoruzhick <vasilykh@arista.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
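The arithmetic behind "the same bucket 8 times" can be checked in user space. The sketch below re-implements the multiply-and-shift that the kernel's reciprocal_scale() performs; `buckets_by_hash()` and `bucket_by_index()` are hypothetical helpers written for this illustration and are not taken from the patch. With hsize = 16384 (2^14), the result is simply the top 14 bits of the hash, so adding small values to the hash rarely changes it.

```c
#include <stdint.h>

/* Same arithmetic as the kernel helper: map a 32-bit value onto
 * [0, ep_ro) with a multiply and a shift instead of a division. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
    return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Count the distinct buckets visited when the *hash* is incremented
 * on each of 'tries' attempts (the behaviour described as buggy). */
static unsigned int buckets_by_hash(uint32_t hash, uint32_t hsize,
                                    unsigned int tries)
{
    unsigned int distinct = 0;
    uint32_t prev = 0;

    for (unsigned int i = 0; i < tries; i++) {
        uint32_t b = reciprocal_scale(hash + i, hsize);

        if (i == 0 || b != prev)
            distinct++;
        prev = b;
    }
    return distinct;
}

/* Bucket visited on attempt i when the *bucket number* is incremented
 * instead, wrapping around at the end of the table. */
static uint32_t bucket_by_index(uint32_t hash, uint32_t hsize, unsigned int i)
{
    return (reciprocal_scale(hash, hsize) + i) % hsize;
}
```

For example, `buckets_by_hash(0x12345678, 16384, 8)` visits a single bucket, while `bucket_by_index()` moves to a new bucket on every attempt.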
    • netfilter: nft_compat: ebtables 'nat' table is normal chain type · e4844c9c
      Florian Westphal authored
      
      Unlike ip(6)tables, the ebtables nat table has no special properties.
      This bug causes 'ebtables -A' to fail when using a target such as
      'snat' (the ebt_snat target sets '.table = "nat"').  Targets that have
      no table restrictions work fine.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr · 8866df92
      Pablo Neira Ayuso authored
      Otherwise, we hit a NULL pointer dereference, since handlers always assume
      that the default timeout policy is passed.
      
        netlink: 24 bytes leftover after parsing attributes in process `syz-executor2'.
        kasan: CONFIG_KASAN_INLINE enabled
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        general protection fault: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 9575 Comm: syz-executor1 Not tainted 4.19.0+ #312
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:icmp_timeout_obj_to_nlattr+0x77/0x170 net/netfilter/nf_conntrack_proto_icmp.c:297
      
      Fixes: c779e849 ("netfilter: conntrack: remove get_timeout() indirection")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: add nf_{tcp,udp,sctp,icmp,dccp,icmpv6,generic}_pernet() · a95a7774
      Pablo Neira Ayuso authored
      
      Expose these functions to access the conntrack protocol tracker netns
      area; nfnetlink_cttimeout needs this.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ipset: Fix calling ip_set() macro at dumping · 8a02bdd5
      Jozsef Kadlecsik authored
      
      The ip_set() macro is called at dumping either with only
      ip_set_ref_lock held or with no lock/nfnl mutex held. Take this into
      account properly. Also, follow Pablo's suggestion to use
      rcu_dereference_raw(), since ref_netlink protects the set.
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: xt_IDLETIMER: add sysfs filename checking routine · 54451f60
      Taehee Yoo authored
      When an IDLETIMER rule is added, a sysfs file is created under
      /sys/class/xt_idletimer/timers/
      But some label names must not be used, such as
      ".", "..", "power", "uevent", "subsystem", etc.
      So a sysfs filename checking routine is needed.
      
      test commands:
         %iptables -I INPUT -j IDLETIMER --timeout 1 --label "power"
      
      splat looks like:
      [95765.423132] sysfs: cannot create duplicate filename '/devices/virtual/xt_idletimer/timers/power'
      [95765.433418] CPU: 0 PID: 8446 Comm: iptables Not tainted 4.19.0-rc6+ #20
      [95765.449755] Call Trace:
      [95765.449755]  dump_stack+0xc9/0x16b
      [95765.449755]  ? show_regs_print_info+0x5/0x5
      [95765.449755]  sysfs_warn_dup+0x74/0x90
      [95765.449755]  sysfs_add_file_mode_ns+0x352/0x500
      [95765.449755]  sysfs_create_file_ns+0x179/0x270
      [95765.449755]  ? sysfs_add_file_mode_ns+0x500/0x500
      [95765.449755]  ? idletimer_tg_checkentry+0x3e5/0xb1b [xt_IDLETIMER]
      [95765.449755]  ? rcu_read_lock_sched_held+0x114/0x130
      [95765.449755]  ? __kmalloc_track_caller+0x211/0x2b0
      [95765.449755]  ? memcpy+0x34/0x50
      [95765.449755]  idletimer_tg_checkentry+0x4e2/0xb1b [xt_IDLETIMER]
      [ ... ]
      
      Fixes: 0902b469 ("netfilter: xtables: idletimer target implementation")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
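A label check of the kind this commit message calls for might look like the following user-space sketch. It is an illustration only, not the routine added by the patch: `label_is_usable()` is a hypothetical name, the reserved list contains just the names quoted in the message (the message's "etc." is left unexpanded), and rejecting labels that contain '/' is an extra assumption made here because such a label could not be a single file name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if 'label' could be used as a sysfs attribute name
 * without clashing with the names listed in the commit message. */
static bool label_is_usable(const char *label)
{
    /* Names quoted in the commit message as unusable labels. */
    static const char *const reserved[] = {
        ".", "..", "power", "uevent", "subsystem",
    };

    if (label == NULL || label[0] == '\0')
        return false;

    /* Assumption of this sketch: a path separator is never valid. */
    if (strchr(label, '/') != NULL)
        return false;

    for (size_t i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++) {
        if (strcmp(label, reserved[i]) == 0)
            return false;
    }
    return true;
}
```

With this check, the test command's label "power" would be rejected before any sysfs file is created, while an ordinary label such as "wifi0" passes.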
    • rxrpc: Fix lockup due to no error backoff after ack transmit error · c7e86acf
      David Howells authored
      If the network becomes (partially) unavailable, say by disabling IPv6, the
      background ACK transmission routine can get itself into a tizzy by
      proposing immediate ACK retransmission.  Since we're in the call event
      processor, that happens immediately without returning to the workqueue
      manager.
      
      The condition should clear after a while when either the network comes back
      or the call times out.
      
      Fix this by:
      
       (1) When re-proposing an ACK on failed Tx, don't schedule it immediately.
           This will allow a certain amount of time to elapse before we try
           again.
      
       (2) Enforce a return to the workqueue manager after a certain number of
           iterations of the call processing loop.
      
       (3) Add a backoff delay that increases the delay on deferred ACKs by a
           jiffy per failed transmission to a limit of HZ.  The backoff delay is
           cleared on a successful return from kernel_sendmsg().
      
       (4) Cancel calls immediately if the opening sendmsg fails.  The layer
           above can arrange retransmission or rotate to another server.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
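Item (3) of the list above, the backoff that grows by one jiffy per failed transmission up to HZ and is cleared on success, can be sketched in user space as follows. All names are hypothetical (`call_sketch`, `note_tx_result()`, `deferred_ack_delay()`), and `SKETCH_HZ` is an assumed tick rate standing in for the kernel's build-time HZ; this shows the counting rule only, not the rxrpc implementation.

```c
/* Assumed tick rate for the sketch; the kernel's HZ is set at build time. */
#define SKETCH_HZ 250

/* Hypothetical per-call state: extra delay, in jiffies, for deferred ACKs. */
struct call_sketch {
    unsigned int tx_backoff;
};

/* Update the backoff after a send attempt: 'ret' is negative on failure. */
static void note_tx_result(struct call_sketch *call, int ret)
{
    if (ret < 0) {
        if (call->tx_backoff < SKETCH_HZ)
            call->tx_backoff++;   /* one more jiffy, capped at HZ */
    } else {
        call->tx_backoff = 0;     /* success clears the backoff */
    }
}

/* Delay to use for the next deferred ACK: base delay plus the backoff. */
static unsigned long deferred_ack_delay(const struct call_sketch *call,
                                        unsigned long base_delay)
{
    return base_delay + call->tx_backoff;
}
```

After three consecutive failures the next deferred ACK waits three jiffies longer than the base delay, and a single successful send returns it to the base delay.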
    • net/ipv6: Add anycast addresses to a global hashtable · 2384d025
      Jeff Barnhill authored
      
      icmp6_send() function is expensive on systems with a large number of
      interfaces. Every time it’s called, it has to verify that the source
      address does not correspond to an existing anycast address by looping
      through every device and every anycast address on the device.  This can
      result in significant delays for a CPU when there are a large number of
      neighbors and ND timers are frequently timing out and calling
      neigh_invalidate().
      
      Add anycast addresses to a global hashtable to allow quick searching for
      matching anycast addresses.  This is based on inet6_addr_lst in addrconf.c.
      Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: document skb parameter in function 'skb_gso_size_check' · 49682bfa
      Mathieu Malaterre authored
      
      Remove kernel-doc warning:
      
        net/core/skbuff.c:4953: warning: Function parameter or member 'skb' not described in 'skb_gso_size_check'
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 02 Nov, 2018 1 commit
    • iov_iter: Fix 9p virtio breakage · 2cbfdf4d
      Marc Zyngier authored
      When switching to the new iovec accessors, a negation got subtly
      dropped, leading to 9p being remarkably broken (here with kvmtool):
      
      [    7.430941] VFS: Mounted root (9p filesystem) on device 0:15.
      [    7.432080] devtmpfs: mounted
      [    7.432717] Freeing unused kernel memory: 1344K
      [    7.433658] Run /virt/init as init process
        Warning: unable to translate guest address 0x7e00902ff000 to host
        Warning: unable to translate guest address 0x7e00902fefc0 to host
        Warning: unable to translate guest address 0x7e00902ff000 to host
        Warning: unable to translate guest address 0x7e008febef80 to host
        Warning: unable to translate guest address 0x7e008febf000 to host
        Warning: unable to translate guest address 0x7e008febef00 to host
        Warning: unable to translate guest address 0x7e008febf000 to host
      [    7.436376] Kernel panic - not syncing: Requested init /virt/init failed (error -8).
      [    7.437554] CPU: 29 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc8-02267-g00e23707 #291
      [    7.439006] Hardware name: linux,dummy-virt (DT)
      [    7.439902] Call trace:
      [    7.440387]  dump_backtrace+0x0/0x148
      [    7.441104]  show_stack+0x14/0x20
      [    7.441768]  dump_stack+0x90/0xb4
      [    7.442425]  panic+0x120/0x27c
      [    7.443036]  kernel_init+0xa4/0x100
      [    7.443725]  ret_from_fork+0x10/0x18
      [    7.444444] SMP: stopping secondary CPUs
      [    7.445391] Kernel Offset: disabled
      [    7.446169] CPU features: 0x0,23000438
      [    7.446974] Memory Limit: none
      [    7.447645] ---[ end Kernel panic - not syncing: Requested init /virt/init failed (error -8). ]---
      
      Restoring the missing "!" brings the guest back to life.
      
      Fixes: 00e23707 ("iov_iter: Use accessor function")
      Reported-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  10. 01 Nov, 2018 5 commits
    • missing bits of "iov_iter: Separate type from direction and use accessor functions" · 0e9b4a82
      Al Viro authored
      
      sunrpc patches from the nfs tree conflict with the calling convention
      change done in the iov_iter work.  Trivial fixup...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • net: drop skb on failure in ip_check_defrag() · 7de414a9
      Cong Wang authored
      Most callers of pskb_trim_rcsum() simply drop the skb when
      it fails; however, ip_check_defrag() still continues to pass
      the skb up to the stack. This is suspicious.

      In ip_check_defrag(), after we learn the skb is an IP fragment,
      passing the skb to callers makes no sense, because callers expect
      fragments to be defrag'ed on success. So, dropping the skb when we
      can't defrag it is reasonable.

      Note, prior to commit 88078d98, this was not a big problem as
      the checksum would be fixed up anyway. After it, the checksum is not
      correct on failure.
      
      Found this during code review.
      
      Fixes: 88078d98 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • SUNRPC: Use atomic(64)_t for seq_send(64) · c3be6577
      Paul Burton authored
      The seq_send & seq_send64 fields in struct krb5_ctx are used as
      atomically incrementing counters. This is implemented using cmpxchg() &
      cmpxchg64() to implement what amount to custom versions of
      atomic_fetch_inc() & atomic64_fetch_inc().
      
      Besides the duplication, using cmpxchg64() has another major drawback in
      that some 32 bit architectures don't provide it. As such commit
      571ed1fd ("SUNRPC: Replace krb5_seq_lock with a lockless scheme")
      resulted in build failures for some architectures.
      
      Change seq_send to be an atomic_t and seq_send64 to be an atomic64_t,
      then use atomic(64)_* functions to manipulate the values. The atomic64_t
      type & associated functions are provided even on architectures which
      lack real 64 bit atomic memory access via CONFIG_GENERIC_ATOMIC64 which
      uses spinlocks to serialize access. This fixes the build failures for
      architectures lacking cmpxchg64().
      
      A potential alternative that was raised would be to provide cmpxchg64()
      on the 32 bit architectures that currently lack it, using spinlocks.
      However this would provide a version of cmpxchg64() with semantics a
      little different to the implementations on architectures with real 64
      bit atomics - the spinlock-based implementation would only work if all
      access to the memory used with cmpxchg64() is *always* performed using
      cmpxchg64(). That is not currently a requirement for users of
      cmpxchg64(), and making it one seems questionable. As such avoiding
      cmpxchg64() outside of architecture-specific code seems best,
      particularly in cases where atomic64_t seems like a better fit anyway.
      
      The CONFIG_GENERIC_ATOMIC64 implementation of atomic64_* functions will
      use spinlocks & so faces the same issue, but with the key difference
      that the memory backing an atomic64_t ought to always be accessed via
      the atomic64_* functions anyway making the issue moot.
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Fixes: 571ed1fd ("SUNRPC: Replace krb5_seq_lock with a lockless scheme")
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Anna Schumaker <anna.schumaker@netapp.com>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: linux-nfs@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
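The duplication this commit message describes, a compare-and-exchange loop that amounts to a fetch-and-increment, can be shown with a user-space analogue. The sketch below uses C11 `<stdatomic.h>` rather than the kernel's cmpxchg64() and atomic64_t API, and both function names are hypothetical; it only illustrates that the two forms return the same value and leave the counter in the same state.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hand-rolled fetch-and-increment built from a compare-exchange loop,
 * analogous to the open-coded pattern the commit removes. */
static uint64_t fetch_inc_cmpxchg(_Atomic uint64_t *ctr)
{
    uint64_t old = atomic_load(ctr);

    /* On failure 'old' is refreshed with the current value; retry. */
    while (!atomic_compare_exchange_weak(ctr, &old, old + 1))
        ;
    return old;
}

/* The same operation expressed with the dedicated atomic primitive. */
static uint64_t fetch_inc_atomic(_Atomic uint64_t *ctr)
{
    return atomic_fetch_add(ctr, 1);
}
```

Both functions return the value the counter held before the increment, which is why the dedicated primitive can replace the loop without changing callers.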
    • compat: Cleanup in_compat_syscall() callers · 98f76206
      Dmitry Safonov authored
      
      Now that in_compat_syscall() is consistent on all architectures and
      no longer reports true on native i686, the workarounds (ifdeffery and
      helpers) can be removed.
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-efi@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181012134253.23266-3-dima@arista.com
    • openvswitch: Fix push/pop ethernet validation · 46ebe283
      Jaime Caamaño Ruiz authored
      When there are both pop and push ethernet header actions among the
      actions to be applied to a packet, an unexpected EINVAL (Invalid
      argument) error is obtained. This is due to mac_proto not being reset
      correctly when those actions are validated.
      
      Reported-at:
      https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047554.html
      Fixes: 91820da6 ("openvswitch: add Ethernet push and pop actions")
      Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
      Tested-by: Greg Rose <gvrose8192@gmail.com>
      Reviewed-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 31 Oct, 2018 5 commits
    • netfilter: ipset: fix ip_set_list allocation failure · ed956f39
      Andrey Ryabinin authored
      
      ip_set_create() and ip_set_net_init() attempt to allocate physically
      contiguous memory for ip_set_list. If memory is fragmented, the
      allocations could easily fail:
      
              vzctl: page allocation failure: order:7, mode:0xc0d0
      
              Call Trace:
               dump_stack+0x19/0x1b
               warn_alloc_failed+0x110/0x180
               __alloc_pages_nodemask+0x7bf/0xc60
               alloc_pages_current+0x98/0x110
               kmalloc_order+0x18/0x40
               kmalloc_order_trace+0x26/0xa0
               __kmalloc+0x279/0x290
               ip_set_net_init+0x4b/0x90 [ip_set]
               ops_init+0x3b/0xb0
               setup_net+0xbb/0x170
               copy_net_ns+0xf1/0x1c0
               create_new_namespaces+0xf9/0x180
               copy_namespaces+0x8e/0xd0
               copy_process+0xb61/0x1a00
               do_fork+0x91/0x320
      
      Use kvcalloc() to fall back to 0-order allocations if a high-order
      page isn't available.
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net · 886503f3
      Eric Westbrook authored
      Allow /0 as advertised for hash:net,port,net sets.
      
      For "hash:net,port,net", ipset(8) says that "either subnet
      is permitted to be a /0 should you wish to match port
      between all destinations."
      
      Make that statement true.
      
      Before:
      
          # ipset create cidrzero hash:net,port,net
          # ipset add cidrzero 0.0.0.0/0,12345,0.0.0.0/0
          ipset v6.34: The value of the CIDR parameter of the IP address is invalid
      
          # ipset create cidrzero6 hash:net,port,net family inet6
          # ipset add cidrzero6 ::/0,12345,::/0
          ipset v6.34: The value of the CIDR parameter of the IP address is invalid
      
      After:
      
          # ipset create cidrzero hash:net,port,net
          # ipset add cidrzero 0.0.0.0/0,12345,0.0.0.0/0
          # ipset test cidrzero 192.168.205.129,12345,172.16.205.129
          192.168.205.129,tcp:12345,172.16.205.129 is in set cidrzero.
      
          # ipset create cidrzero6 hash:net,port,net family inet6
          # ipset add cidrzero6 ::/0,12345,::/0
          # ipset test cidrzero6 fe80::1,12345,ff00::1
          fe80::1,tcp:12345,ff00::1 is in set cidrzero6.
      
      See also:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=200897
        https://github.com/ewestbrook/linux/commit/df7ff6efb0934ab6acc11f003ff1a7580d6c1d9c
      
      Signed-off-by: Eric Westbrook <linux@westbrook.io>
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ipset: list:set: Decrease refcount synchronously on deletion and replace · 439cd39e
      Stefano Brivio authored
      Commit 45040978 ("netfilter: ipset: Fix set:list type crash
      when flush/dump set in parallel") postponed decreasing set
      reference counters to the RCU callback.
      
      An 'ipset del' command can terminate before the RCU grace period
      has elapsed, and if sets are listed before then, the reference
      counter shown in userspace will be wrong:
      
       # ipset create h hash:ip; ipset create l list:set; ipset add l
       # ipset del l h; ipset list h
       Name: h
       Type: hash:ip
       Revision: 4
       Header: family inet hashsize 1024 maxelem 65536
       Size in memory: 88
       References: 1
       Number of entries: 0
       Members:
       # sleep 1; ipset list h
       Name: h
       Type: hash:ip
       Revision: 4
       Header: family inet hashsize 1024 maxelem 65536
       Size in memory: 88
       References: 0
       Number of entries: 0
       Members:
      
      Fix this by making the reference count update synchronous again.
      
      As a result, when sets are listed, ip_set_name_byindex() might
      now fetch a set whose reference count is already zero. Instead
      of relying on the reference count to protect against concurrent
      set renaming, grab ip_set_ref_lock as reader and copy the name,
      while holding the same lock in ip_set_rename() as writer
      instead.
      Reported-by: Li Shuang <shuali@redhat.com>
      Fixes: 45040978 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ixgbe/ixgbevf: fix XFRM_ALGO dependency · 48e01e00
      Jeff Kirsher authored
      Based on the original work from Arnd Bergmann.
      
      When XFRM_ALGO is not enabled, the new ixgbe IPsec code produces a
      link error:
      
      drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.o: In function `ixgbe_ipsec_vf_add_sa':
      ixgbe_ipsec.c:(.text+0x1266): undefined reference to `xfrm_aead_get_byname'
      
      Simply selecting XFRM_ALGO from here causes circular dependencies, so
      to fix it, we probably want this slightly more complex solution that is
      similar to what other drivers with XFRM offload do:
      
      A separate Kconfig symbol now controls whether we include the IPsec
      offload code. To keep the old behavior, this is left as 'default y'. The
      dependency in XFRM_OFFLOAD still causes a circular dependency but is
      not actually needed because this symbol is not user visible, so removing
      that dependency on top makes it all work.
      
      CC: Arnd Bergmann <arnd@arndb.de>
      CC: Shannon Nelson <shannon.nelson@oracle.com>
      Fixes: eda0333a ("ixgbe: add VF IPsec management")
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      48e01e00
    • mm: remove include/linux/bootmem.h · 57c8a661
      Mike Rapoport authored
      Move remaining definitions and declarations from include/linux/bootmem.h
      into include/linux/memblock.h and remove the redundant header.
      
      The includes were replaced with the semantic patch below, followed by
      semi-automated removal of duplicated '#include <linux/memblock.h>' lines:
      
      @@
      @@
      - #include <linux/bootmem.h>
      + #include <linux/memblock.h>
      
      [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
      [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
      [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
        Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
      Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
      
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      57c8a661
  12. 30 Oct, 2018 4 commits
    • bpf: tcp_bpf_recvmsg should return EAGAIN when nonblocking and no data · 27b31e68
      John Fastabend authored
      We return 0 in the case of a nonblocking socket that has no data
      available. However, this is incorrect and may confuse applications.
      After this patch we do the correct thing and return the error
      EAGAIN.
      
      Quoting the return codes from the recvmsg man page:
      
      EAGAIN or EWOULDBLOCK
       The socket is marked nonblocking and the receive operation would
       block, or a receive timeout had been set and the timeout expired
       before data was received.
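
      What an application expects to see can be illustrated with an ordinary
      socket. The sketch below uses a plain AF_UNIX socketpair rather than a
      sockmap-managed TCP socket, so it only demonstrates the documented
      recv() convention, not the tcp_bpf_recvmsg() code path itself;
      recv_errno_on_empty_socket() is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Try a nonblocking read on a socket that has no data queued.
 * Returns the errno value reported by recv(), 0 if recv() did not
 * fail, or -1 if the socket pair could not be created. */
int recv_errno_on_empty_socket(void)
{
	int sv[2];
	char buf[16];
	ssize_t n;
	int err = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* MSG_DONTWAIT makes this single call nonblocking. */
	n = recv(sv[0], buf, sizeof(buf), MSG_DONTWAIT);
	if (n < 0)
		err = errno;	/* save before close() can overwrite it */

	close(sv[0]);
	close(sv[1]);
	return err;
}
```

      A return value of 0 from recv() conventionally means the peer closed
      the connection, which is why returning 0 for "no data yet" can confuse
      callers.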
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      27b31e68
    • rtnetlink: Disallow FDB configuration for non-Ethernet device · da715775
      Ido Schimmel authored
      When an FDB entry is configured, the address is validated to have the
      length of an Ethernet address, but the device for which the address is
      configured can be of any type.
      
      The above can result in the use of uninitialized memory when the address
      is later compared against existing addresses since 'dev->addr_len' is
      used and it may be greater than ETH_ALEN, as with ip6tnl devices.
      
      Fix this by making sure that FDB entries are only configured for
      Ethernet devices.
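
      The shape of such a check can be sketched as below. This is a
      user-space model under stated assumptions, not the patch itself:
      struct demo_dev, demo_fdb_add_check() and the DEMO_* constants are
      hypothetical, and the numeric values are for illustration only.

```c
#include <errno.h>

/* Illustrative values; the kernel uses ARPHRD_* and ETH_ALEN. */
#define DEMO_ARPHRD_ETHER	1
#define DEMO_ARPHRD_TUNNEL6	769
#define DEMO_ETH_ALEN		6

/* Hypothetical stand-in for the relevant net_device fields. */
struct demo_dev {
	unsigned short type;
	unsigned char addr_len;
};

/* Accept an FDB address only for Ethernet devices, so that a later
 * comparison over dev->addr_len bytes never reads past the
 * ETH_ALEN bytes that were actually supplied. */
int demo_fdb_add_check(const struct demo_dev *dev, int lladdr_len)
{
	if (dev->type != DEMO_ARPHRD_ETHER)
		return -EINVAL;
	if (lladdr_len != DEMO_ETH_ALEN)
		return -EINVAL;
	return 0;
}
```

      Checking the device type up front rules out devices whose addr_len is
      larger than the length of the address that was validated.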
      
      BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
      CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x14b/0x190 lib/dump_stack.c:113
        kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
        __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
        memcmp+0x11d/0x180 lib/string.c:863
        dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
        ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
        rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
        rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
        netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
        rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
        netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
        netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
        netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg net/socket.c:631 [inline]
        ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
        __sys_sendmsg net/socket.c:2152 [inline]
        __do_sys_sendmsg net/socket.c:2161 [inline]
        __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
        do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
        entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x440ee9
      Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9
      RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0
      R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline]
        kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181
        kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91
        kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100
        slab_post_alloc_hook mm/slab.h:446 [inline]
        slab_alloc_node mm/slub.c:2718 [inline]
        __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
        __kmalloc_reserve net/core/skbuff.c:138 [inline]
        __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
        alloc_skb include/linux/skbuff.h:996 [inline]
        netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
        netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg net/socket.c:631 [inline]
        ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
        __sys_sendmsg net/socket.c:2152 [inline]
        __do_sys_sendmsg net/socket.c:2161 [inline]
        __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
        do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
        entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      v2:
      * Make error message more specific (David)
      
      Fixes: 090096bf ("net: generic fdb support for drivers without ndo_fdb_<op>")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      da715775
    • sctp: check policy more carefully when getting pr status · 71335836
      Xin Long authored
      When getting pr_assocstatus and pr_streamstatus by sctp_getsockopt,
      it doesn't correctly process the case when policy is set with
      SCTP_PR_SCTP_ALL | SCTP_PR_SCTP_MASK. It even causes a
      slab-out-of-bounds in sctp_getsockopt_pr_streamstatus().
      
      This patch fixes it by returning -EINVAL for this case.
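
      A validation of this kind might look like the sketch below. The
      DEMO_* bit values and demo_check_pr_policy() are hypothetical and are
      not copied from the SCTP headers or from the patch; rejecting unknown
      bits is an extra assumption beyond what the commit message states.

```c
#include <errno.h>

/* Illustrative bit values only; the real constants are defined in
 * the SCTP uapi header. */
#define DEMO_PR_SCTP_MASK	0x0030u
#define DEMO_PR_SCTP_ALL	0x0080u

/* Return 0 if the policy value is acceptable, -EINVAL otherwise. */
int demo_check_pr_policy(unsigned int policy)
{
	/* Assumption: bits outside the known set are rejected too. */
	if (policy & ~(DEMO_PR_SCTP_MASK | DEMO_PR_SCTP_ALL))
		return -EINVAL;

	/* The case named above: "all" combined with a specific policy. */
	if ((policy & DEMO_PR_SCTP_ALL) && (policy & DEMO_PR_SCTP_MASK))
		return -EINVAL;

	return 0;
}
```

      Rejecting the combination early means the policy value is never used
      to index per-policy counters with an out-of-range index.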
      
      Fixes: 0ac1077e ("sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL")
      Reported-by: syzbot+5da0d0a72a9e7d791748@syzkaller.appspotmail.com
      Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      71335836
    • sctp: clear the transport of some out_chunk_list chunks in sctp_assoc_rm_peer · df132eff
      Xin Long authored
      
      If a transport is removed by asconf while some chunks that reference
      it are still queued on out_chunk_list, a use-after-free occurs later
      when sctp_outq_flush() accesses the transport through these chunks.
      
      This is an old bug. Fix it by clearing the transport of these chunks
      in out_chunk_list when removing a transport in sctp_assoc_rm_peer().
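
      The idea of dropping stale references before the object goes away can
      be sketched with a simple singly linked queue. The types and the
      function demo_clear_transport() are hypothetical simplifications; the
      kernel code uses its own list primitives and structures.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the SCTP structures. */
struct demo_transport {
	int id;
};

struct demo_chunk {
	struct demo_transport *transport;
	struct demo_chunk *next;
};

/* Walk the queued chunks and clear every pointer to the transport
 * that is about to be removed, so no chunk is left holding a
 * dangling reference. Returns the number of chunks updated. */
int demo_clear_transport(struct demo_chunk *head,
			 const struct demo_transport *peer)
{
	int cleared = 0;

	for (struct demo_chunk *ch = head; ch != NULL; ch = ch->next) {
		if (ch->transport == peer) {
			ch->transport = NULL;
			cleared++;
		}
	}
	return cleared;
}
```

      Chunks that point at other transports are left untouched; only the
      references to the removed peer are reset.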
      
      Reported-by: syzbot+56a40ceee5fb35932f4d@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df132eff