- 22 Mar, 2012 11 commits
-
-
Alex Elder authored
The messenger workqueue has no need to be public. So give it static scope. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Encapsulate the operation of adding a new chunk of data to the next open slot in a ceph_connection's out_kvec array. Also add a "reset" operation to make subsequent add operations start at the beginning of the array again. Use these routines throughout, avoiding duplicate code and ensuring all calls are handled consistently. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
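The add/reset pair described above can be sketched in userspace C. This is only an illustration under assumptions: `struct conn_sketch`, `kvec_add()`, `kvec_reset()` and `OUT_KVEC_MAX` are hypothetical stand-ins, not the names or layout used in the actual patch.

```c
#include <assert.h>
#include <stddef.h>

#define OUT_KVEC_MAX 8  /* hypothetical array size, chosen for the sketch */

/* Hypothetical stand-in for the kernel's struct kvec. */
struct kvec_sketch {
    void *iov_base;
    size_t iov_len;
};

/* Hypothetical, cut-down stand-in for the relevant connection fields. */
struct conn_sketch {
    struct kvec_sketch out_kvec[OUT_KVEC_MAX];
    int out_kvec_left;      /* number of slots currently filled */
    size_t out_kvec_bytes;  /* total bytes queued across all slots */
};

/* Make subsequent add operations start at the beginning of the array. */
static void kvec_reset(struct conn_sketch *con)
{
    con->out_kvec_left = 0;
    con->out_kvec_bytes = 0;
}

/* Add one chunk of data to the next open slot. */
static void kvec_add(struct conn_sketch *con, size_t size, void *data)
{
    int index = con->out_kvec_left;

    assert(index < OUT_KVEC_MAX);  /* kernel code would likely use BUG_ON() */
    con->out_kvec[index].iov_base = data;
    con->out_kvec[index].iov_len = size;
    con->out_kvec_left++;
    con->out_kvec_bytes += size;
}
```

Routing every append through one helper keeps the slot counter and the byte total in step, which is the consistency the commit message is after.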
-
Alex Elder authored
One of the arguments to prepare_write_connect() indicates whether it is being called immediately after a call to prepare_write_banner(). Move the prepare_write_banner() call inside prepare_write_connect(), and reinterpret (and rename) the "after_banner" argument so it indicates that prepare_write_connect() should *make* the call rather than should know it has already been made. This was split out from the next patch to highlight this change in logic. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
ceph_parse_options() takes the address of a pointer as an argument and uses it to return the address of an allocated structure if successful. With this interface it is not evident at call sites that the pointer is always initialized. Change the interface to return the address instead (or a pointer-coded error code) to make the validity of the returned pointer obvious. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
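A userspace sketch of the "pointer-coded error" convention follows. The kernel provides ERR_PTR(), IS_ERR() and PTR_ERR() for this. The lowercase helpers, `struct opts_sketch` and `parse_options_sketch()` below are hypothetical imitations written for illustration, not the ceph code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095  /* error codes occupy the top 4095 address values */

/* Userspace imitations of the kernel's ERR_PTR/PTR_ERR/IS_ERR helpers. */
static inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical options structure, standing in for the real one. */
struct opts_sketch {
    int flags;
};

/*
 * Return the allocated structure directly, or an encoded errno.
 * The caller never sees an uninitialized pointer.
 */
static struct opts_sketch *parse_options_sketch(const char *options)
{
    struct opts_sketch *opt;

    if (!options)
        return err_ptr(-EINVAL);

    opt = calloc(1, sizeof(*opt));
    if (!opt)
        return err_ptr(-ENOMEM);

    return opt;
}
```

A call site then reads `opt = parse_options_sketch(str); if (is_err(opt)) return ptr_err(opt);`, so the validity of `opt` is decided by a single test on the returned value.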
-
Alex Elder authored
This fixes some spots where a type cast to (void *) was used as a universal type hiding mechanism. Instead, properly cast the type to the intended target type. Signed-off-by:
Alex Elder <elder@newdream.net> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
This eliminates type casts in some places where they are not required. Signed-off-by:
Alex Elder <elder@newdream.net> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
A spinlock is used to protect a value used for selecting an array index for a string used for formatting a socket address for human consumption. The index is reset to 0 if it ever reaches the maximum index value. Instead, use an ever-increasing atomic variable as a sequence number, and compute the array index by masking off all but the sequence number's lowest bits. Make the number of entries in the array a power of two to allow the use of such a mask (to avoid jumps in the index value when the sequence number wraps). The length of these strings is somewhat arbitrarily set at 60 bytes. The worst-case length of a string produced is 54 bytes, for an IPv6 address that can't be shortened, e.g.: [1234:5678:9abc:def0:1111:2222:123.234.210.100]:32767 Change it so we arbitrarily use 64 bytes instead; if nothing else it will make the array of these line up better in hex dumps. Rename a few things to reinforce the distinction between the number of strings in the array and the length of individual strings. Signed-off-by:
Alex Elder <elder@newdream.net> Signed-off-by:
Sage Weil <sage@newdream.net>
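The lock-free index selection described above might look roughly like this in C11. The macro and function names are illustrative assumptions and the entry count of 32 is arbitrary. Only the 64-byte string length and the power-of-two requirement come from the commit message.

```c
#include <stdatomic.h>

#define ADDR_STR_COUNT_LOG  5                        /* assumed: 32 entries */
#define ADDR_STR_COUNT      (1 << ADDR_STR_COUNT_LOG)
#define ADDR_STR_COUNT_MASK (ADDR_STR_COUNT - 1)
#define MAX_ADDR_STR_LEN    64                       /* per the commit message */

static char addr_str[ADDR_STR_COUNT][MAX_ADDR_STR_LEN];
static atomic_uint addr_str_seq;                     /* ever-increasing sequence */

/* Pick the next string buffer without taking a lock. */
static char *next_addr_str(void)
{
    /* atomic_fetch_add returns the value held before the increment */
    unsigned int seq = atomic_fetch_add(&addr_str_seq, 1);

    /* keep only the lowest bits to get an index in [0, ADDR_STR_COUNT) */
    return addr_str[seq & ADDR_STR_COUNT_MASK];
}
```

Because the unsigned sequence number wraps at a power of two that is a multiple of the entry count, the masked index continues from 31 to 0 across the wrap with no jump.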
-
Alex Elder authored
Rearrange ceph_tcp_connect() a bit, making use of "else" rather than re-testing a value with consecutive "if" statements. Don't record a connection's socket pointer unless the connect operation is successful. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Each messenger allocates a page to be used when writing zeroes out in the event of error or other abnormal condition. Instead, use the kernel ZERO_PAGE() for that purpose. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Xi Wang authored
The existing overflow check (n > ULONG_MAX / b) didn't work, because n = ULONG_MAX / b would both bypass the check and still overflow the allocation size a + n * b. The correct check should be (n > (ULONG_MAX - a) / b). Signed-off-by:
Xi Wang <xi.wang@gmail.com> Signed-off-by:
Sage Weil <sage@newdream.net>
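The difference between the two checks can be shown with a small sketch. The function names are hypothetical. The two expressions are the ones quoted in the commit message, and the `b == 0` guards are an addition made here so the sketch never divides by zero.

```c
#include <limits.h>
#include <stdbool.h>

/* Old check: only guards the product n * b, ignoring the added offset a. */
static bool old_check_rejects(unsigned long n, unsigned long b)
{
    if (b == 0)
        return false;
    return n > ULONG_MAX / b;
}

/* Corrected check: true when a + n * b does not fit in an unsigned long. */
static bool alloc_size_overflows(unsigned long a, unsigned long n,
                                 unsigned long b)
{
    if (b == 0)
        return false;  /* n * b is 0, so the size is just a */
    return n > (ULONG_MAX - a) / b;
}
```

With `a = 16`, `b = 8` and `n = ULONG_MAX / 8`, the old check accepts the request even though `a + n * b` wraps around, while the corrected check rejects it.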
-
Jim Schutt authored
The Ceph messenger would sometimes queue multiple work items to write data to a socket when the socket buffer was full. Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the same way that net/core/stream.c:sk_stream_write_space() does, i.e., clearing it only when sufficient space is available in the socket buffer. Signed-off-by:
Jim Schutt <jaschut@sandia.gov> Reviewed-by:
Alex Elder <elder@dreamhost.com>
-
- 17 Mar, 2012 2 commits
-
-
Pablo Neira Ayuso authored
Kerin Millar reported hardlockups while running `conntrackd -c' in a busy firewall. That system (with several processors) was acting as backup in a primary-backup setup. After several tries, I found a race condition between the deletion operation of ctnetlink and timeout expiration. This patch fixes this problem. Tested-by:
Kerin Millar <kerframil@gmail.com> Reported-by:
Kerin Millar <kerframil@gmail.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
RongQing.Li authored
ip6_mc_find_dev_rcu() is called with rcu_read_lock() held, so there is no need to call dev_hold(). Calling dev_hold() without a corresponding dev_put() leaks a device reference. [ bug introduced in 96b52e61 (ipv6: mcast: RCU conversions) ] Signed-off-by:
RongQing.Li <roy.qing.li@gmail.com> Acked-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 16 Mar, 2012 2 commits
-
-
Eric Dumazet authored
This reverts commit d47a0ac7 (sch_sfq: dont put new flow at the end of flows). As Jesper found out, the patch sounded great but has bad side effects. Under stress, pushing new flows to the front of the queue can prevent old flows from making any progress. Packets can stay in the SFQ queue for an unlimited amount of time. It's possible to add heuristics to limit this problem, but this would add complexity outside of SFQ's scope. A more sensible answer to Dave Taht's concerns (he reported the issue I tried to solve in the original commit) is probably to use a qdisc hierarchy so that high-priority packets don't enter a potentially crowded SFQ qdisc. Reported-by:
Jesper Dangaard Brouer <jdb@comx.dk> Cc: Dave Taht <dave.taht@gmail.com> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Commit 87a11578 (ipv6: Move xfrm_lookup() call down into icmp6_dst_alloc().) forgot to convert one error path, leading to crashes in mld_sendpack(). Many thanks to Dave Jones for providing a very complete bug report. Reported-by:
Dave Jones <davej@redhat.com> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 11 Mar, 2012 1 commit
-
-
Eric Dumazet authored
Commit ea4fc0d6 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit()) added a serious regression in synflood handling. Simon Kirby discovered that a successful connection was delayed by 20 seconds before becoming responsive. In my tests, I discovered that xmit frames were lost, and needed ~4 retransmits and a socket dst rebuild before really being sent. In the case of a syncookie-initiated connection, we use a different path to initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared. As ip_queue_xmit() now depends on the inet flow being set up, fix this by copying the temp flowi4 we use in cookie_v4_check(). Reported-by:
Simon Kirby <sim@netnation.com> Bisected-by:
Simon Kirby <sim@netnation.com> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Tested-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 08 Mar, 2012 4 commits
-
-
Steffen Klassert authored
As we invalidate the inetpeer tree along with the routing cache now, we don't need a genid to reset the redirect handling when the routing cache is flushed. Signed-off-by:
Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Steffen Klassert authored
We initialize the routing metrics with the values cached on the inetpeer in rt_init_metrics(). So if we have the metrics cached on the inetpeer, we ignore the user-configured fib_metrics. To fix this issue, we replace the old tree with a freshly initialized inet_peer_base. The old tree is removed later with a delayed work queue. Signed-off-by:
Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Paulius Zaleckas authored
Now we have: "eth0: link *down*" followed by "br0: port 1(eth0) entered *forwarding* state". br_log_state(p) should be called *after* p->state is set to BR_STATE_DISABLED. Reported-by:
Zilvinas Valinskas <zilvinas@wilibox.com> Signed-off-by:
Paulius Zaleckas <paulius.zaleckas@gmail.com> Acked-by:
Stephen Hemminger <shemminger@vyatta.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Paulius Zaleckas authored
When br_log_state() is reporting state it should say "entered" instead of "entering", since the state has already changed at this point. Signed-off-by:
Paulius Zaleckas <paulius.zaleckas@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 07 Mar, 2012 1 commit
-
-
Jesse Gross authored
When modifying IP addresses or ports on a UDP packet we don't correctly follow the rules for unchecksummed packets. This means that packets without a checksum can be given an incorrect new checksum and packets with a checksum can become marked as unchecksummed. This fixes the code to follow those rules. Signed-off-by:
Jesse Gross <jesse@nicira.com>
-
- 06 Mar, 2012 8 commits
-
-
Ben Pfaff authored
When OVS_VPORT_ATTR_NAME is specified and dp_ifindex is nonzero, the logical behavior would be for the vport name lookup scope to be limited to the specified datapath, but in fact the dp_ifindex value was ignored. This commit causes the search scope to be honored. Signed-off-by:
Ben Pfaff <blp@nicira.com> Signed-off-by:
Jesse Gross <jesse@nicira.com>
-
Li Wei authored
When forwarding is set and a new net device is registered, we need to add this device to the all-router mcast group. Signed-off-by:
Li Wei <lw@cn.fujitsu.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pablo Neira Ayuso authored
If reliable event delivery is enabled and ctnetlink fails to deliver the destroy event in early_drop, the conntrack subsystem cannot drop the candidate flow that was planned to be evicted. Reported-by:
Kerin Millar <kerframil@gmail.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
When net.bridge.bridge-nf-filter-vlan-tagged is 0 (default), vlan packets arriving should not be sent to ip(6)tables by bridge netfilter. However, it turns out that we currently always send VLAN packets to netfilter, if .. a), CONFIG_VLAN_8021Q is enabled ; or b), CONFIG_VLAN_8021Q is not set but rx vlan offload is enabled on the bridge port. This is because bridge netfilter treats skb with skb->protocol == ETH_P_IP{V6} as "non-vlan packet". With rx vlan offload on or CONFIG_VLAN_8021Q=y, the vlan header has already been removed here, and we cannot rely on skb->protocol alone. Fix this by only using skb->protocol if the skb has no vlan tag, or if a vlan tag is present and filter-vlan-tagged bridge netfilter sysctl is enabled. We cannot remove the skb->protocol == htons(ETH_P_8021Q) test because the vlan tag is still around in the CONFIG_VLAN_8021Q=n && "ethtool -K $itf rxvlan off" case. reproducer: iptables -t raw -I PREROUTI...
-
Pablo Neira Ayuso authored
In adf7ff8, an invalid dereference was added in ebt_make_names. CC [M] net/bridge/netfilter/ebtables.o net/bridge/netfilter/ebtables.c: In function `ebt_make_names': net/bridge/netfilter/ebtables.c:1371:20: warning: `t' may be used uninitialized in this function [-Wuninitialized] Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pablo Neira Ayuso authored
Since 7d367e06, ctnetlink_new_conntrack is called without holding the nf_conntrack_lock spinlock. Thus, ctnetlink_parse_nat_setup no longer needs to release that spinlock in the NAT module autoload case. Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Santosh Nayak authored
User-space ebtables expects 32-byte names, but xt_match names use 29 bytes. We have to copy only 29 bytes and then make sure we fill the remaining bytes with zeroes. Signed-off-by:
Santosh Nayak <santoshprasadnayak@gmail.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
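The copy-then-zero-fill step can be sketched in plain C as follows. The constants take their values from the sizes in the commit message, but their names and the helper `copy_name_padded()` are hypothetical. The kernel fix copies to user space rather than using a plain memory copy.

```c
#include <string.h>

#define USER_NAME_LEN   32  /* name length user-space ebtables expects */
#define KERNEL_NAME_LEN 29  /* name length used by xt_match names */

/*
 * Fill a 32-byte destination from a name of at most 29 bytes,
 * making sure every remaining byte is zero.
 */
static void copy_name_padded(char dst[USER_NAME_LEN], const char *src)
{
    /* zero the whole destination first so no stale bytes leak out */
    memset(dst, 0, USER_NAME_LEN);

    /* copy at most 29 bytes; bytes 29..31 stay zero from the memset */
    strncpy(dst, src, KERNEL_NAME_LEN);
}
```

Zeroing first means the result is NUL-terminated even when the source fills all 29 bytes.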
-
Neal Cardwell authored
This commit fixes tcp_shift_skb_data() so that it does not shift SACKed data below snd_una. This fixes an issue whose symptoms exactly match reports showing tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at net/ipv4/tcp_input.c:3418" thread on netdev). Since 2008 (832d11c5) tcp_shift_skb_data() had been shifting SACKed ranges that were below snd_una. It checked that the *end* of the skb it was about to shift from was above snd_una, but did not check that the end of the actual shifted range was above snd_una; this commit adds that check. Shifting SACKed ranges below snd_una is problematic because for such ranges tcp_sacktag_one() short-circuits: it does not declare anything as SACKed and does not increase sacked_out. Before the fixes in commits cc9a672e and daef52ba, shifting SACKed ranges below snd_una happened to work becaus...
-
- 05 Mar, 2012 4 commits
-
-
Ulrich Weber authored
otherwise source IPv6 address of ICMPV6_MGM_QUERY packet might be random junk if IPv6 is disabled on interface or link-local address is not yet ready (DAD). Signed-off-by:
Ulrich Weber <ulrich.weber@sophos.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
nlmsg_parse() might return an error, so test its return value before potential random memory accesses. Errors introduced in commit 115c9b81 (rtnetlink: Fix problem with buffer allocation). Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Cc: Greg Rose <gregory.v.rose@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Joakim Tjernlund authored
commit bridge: send proper message_age in config BPDU added this gem: bpdu.message_age = (jiffies - root->designated_age) p->designated_age = jiffies + bpdu->message_age; Notice how bpdu->message_age is negated when reassigned to bpdu.message_age. This causes message age to decrease breaking the STP protocol. Signed-off-by:
Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Joakim Tjernlund authored
min age increment needs to round up its min age tick for all HZ values to guarantee message age is increasing. Signed-off-by:
Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 03 Mar, 2012 1 commit
-
-
Neal Cardwell authored
In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb to mark the first portion as lost. This is for two primary reasons: (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When doing this, it preserves the sum of their packet counts in order to reflect the real-world dynamics on the wire. But given that skbs can have remainders that do not align to MSS boundaries, this packet count preservation means that for SACKed skbs there is not necessarily a direct linear relationship between tcp_skb_pcount(skb) and skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment off and mark as lost a prefix of length (packets - oldcnt)*mss from SACKed skbs were leading to occasional failures of the WARN_ON(len > skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the recent "crash in tcp_fragment" thread on netdev). (2) there is no real point in fragmenting off part of a SACKed skb and calling tcp_skb_mark_lost()...
-
- 28 Feb, 2012 1 commit
-
-
Neal Cardwell authored
When tcp_shifted_skb() shifts bytes from the skb that is currently pointed to by 'highest_sack' then the increment of TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This implicit advancement, combined with the recent fix to pass the correct SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think that the newly SACKed range was before the tcp_highest_sack_seq(), leading to a call to tcp_update_reordering() with a degree of reordering matching the size of the newly SACKed range (typically just 1 packet, which is a NOP, but potentially larger). This commit fixes this by simply calling tcp_sacktag_one() before the TCP_SKB_CB(skb)->seq advancement that can advance our notion of the highest SACKed sequence. Correspondingly, we can simplify the code a little now that tcp_shifted_skb() should update the lost_cnt_hint in all cases where skb == tp->lost_skb_hint. Signed-off-by:
Neal Cardwell <ncardwell@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 25 Feb, 2012 1 commit
-
-
Florian Westphal authored
We expected 0 if module doesn't exist, which is no longer the case (42046e2e , netfilter: x_tables: return -ENOENT for non-existant matches/targets). Signed-off-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- 24 Feb, 2012 3 commits
-
-
stephen hemminger authored
The original spelling and bad word choice make these comments hard to read. Signed-off-by:
Stephen Hemminger <shemminger@vyatta.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jozsef Kadlecsik authored
Marcell Zambo and Janos Farago noticed and reported that when new conntrack entries are added via netlink and the conntrack table gets full, a soft lockup happens. This is because the nf_conntrack_lock is held while nf_conntrack_alloc is called, which in turn wants to take nf_conntrack_lock while evicting entries from the full table. The patch fixes the soft lockup by limiting the holding of the nf_conntrack_lock to the minimum, where it's absolutely required. This required extending (and thus changing) nf_conntrack_hash_insert so that it makes sure conntrack and ctnetlink do not add the same entry twice to the conntrack table. Signed-off-by:
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
This reverts commit af14cca1. The reverted patch contains a race condition between packets and ctnetlink in the conntrack addition. A new patch to fix this issue follows up. Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- 23 Feb, 2012 1 commit
-
-
Eric Dumazet authored
Niccolò Belli reported IPsec crashes when we handle a frame without a MAC header (ATM in his case). Before copying the MAC header, make sure it is present. Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809 Reported-by:
Niccolò Belli <darkbasic@linuxsystems.it> Tested-by:
Niccolò Belli <darkbasic@linuxsystems.it> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-