- 14 Jan, 2014 3 commits
-
-
Ilya Dryomov authored
The check that makes sure that we have enough memory allocated to read in the entire header of the message in question is currently busted. It compares front_len of the incoming message with iov_len field of ceph_msg::front structure, which is used primarily to indicate the amount of data already read in, and not the size of the allocated buffer. Under certain conditions (e.g. a short read from a socket followed by that socket's shutdown and owning ceph_connection reset) this results in a warning similar to [85688.975866] libceph: get_reply front 198 > preallocated 122 (4#0) and, through another bug, leads to forever hung tasks and forced reboots. Fix this by comparing front_len with front_alloc_len field of struct ceph_msg, which stores the actual size of the buffer. Fixes: http://tracker.ceph.com/issues/5425 Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
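A minimal userspace sketch of the corrected comparison, with a simplified stand-in for the relevant ceph_msg fields (field names mirror the kernel's; everything else here is illustrative, not the actual libceph get_reply() code): the incoming front_len has to be checked against the allocated buffer size, not against iov_len, which only tracks how much has already been read in.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct msg_front {
	size_t iov_len;         /* bytes read in so far, NOT the buffer size */
	size_t front_alloc_len; /* actual size of the allocated buffer */
};

static bool front_fits(const struct msg_front *m, size_t front_len)
{
	/* The buggy form compared against m->iov_len and could pass after a
	 * short read even though the buffer is too small. */
	return front_len <= m->front_alloc_len;
}

int main(void)
{
	struct msg_front m = { .iov_len = 122, .front_alloc_len = 256 };

	printf("front 198 fits: %s\n", front_fits(&m, 198) ? "yes" : "no");
	return 0;
}
```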
-
Ilya Dryomov authored
Rename front local variable to front_len in get_reply() to make its purpose more clear. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Rename front_max field of struct ceph_msg to front_alloc_len to make its purpose more clear. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 31 Dec, 2013 22 commits
-
-
Ilya Dryomov authored
Similar to userspace, don't bail with "parse_ips bad ip ..." if the specified port is 0; instead, use CEPH_MON_PORT (6789, the default monitor port). Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
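A tiny sketch of the fallback, assuming a hypothetical effective_port() helper rather than the kernel's parse_ips():

```c
#include <stdio.h>

#define CEPH_MON_PORT 6789	/* default monitor port */

static int effective_port(int specified_port)
{
	/* Port 0 is no longer treated as a parse error; use the default. */
	return specified_port ? specified_port : CEPH_MON_PORT;
}

int main(void)
{
	printf("10.0.0.1:0    -> port %d\n", effective_port(0));
	printf("10.0.0.1:6800 -> port %d\n", effective_port(6800));
	return 0;
}
```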
-
Ilya Dryomov authored
Reflects ceph.git commit 8b38f10bc2ee3643a33ea5f9545ad5c00e4ac5b4. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Reflects ceph.git commit ea3a0bb8b773360d73b8b77fa32115ef091c9857. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
This allows all of the tunables to be overridden by a specific rule. Reflects ceph.git commits d129e09e57fbc61cfd4f492e3ee77d0750c9d292, 0497db49e5973b50df26251ed0e3f4ac7578e66e. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
The legacy behavior is to make the normal number of tries for the recursive chooseleaf call. The descend_once tunable changed this to making a single try and bail if we get a reject (note that it is impossible to collide in the recursive case). The new set_chooseleaf_tries lets you select the number of recursive chooseleaf attempts for indep mode, or default to 1. Use the same behavior for firstn, except default to total_tries when the legacy tunables are set (for compatibility). This makes the rule step override the (new) default of 1 recursive attempt, keeping behavior consistent with indep mode. Reflects ceph.git commit 685c6950ef3df325ef04ce7c986e36ca2514c5f1. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
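A hedged sketch of the default selection described above (parameter names are illustrative, not the actual CRUSH mapper code): a rule-specified value wins, legacy tunables keep the old total_tries behaviour, and otherwise a single recursive attempt is made.

```c
#include <stdbool.h>
#include <stdio.h>

static int chooseleaf_tries(int rule_tries, bool legacy_tunables,
			    int total_tries)
{
	if (rule_tries > 0)	/* set_chooseleaf_tries step in the rule */
		return rule_tries;
	if (legacy_tunables)	/* compatibility: retry as often as before */
		return total_tries;
	return 1;		/* new default: one recursive attempt */
}

int main(void)
{
	printf("rule override:   %d\n", chooseleaf_tries(5, false, 19));
	printf("legacy tunables: %d\n", chooseleaf_tries(0, true, 19));
	printf("new default:     %d\n", chooseleaf_tries(0, false, 19));
	return 0;
}
```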
-
Ilya Dryomov authored
This aligns the internal identifier names with the user-visible names in the decompiled crush map language. Reflects ceph.git commit caa0e22e15e4226c3671318ba1f61314bf6da2a6. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Since we can specify the recursive retries in a rule, we may as well also specify the non-recursive tries for completeness. Reflects ceph.git commit d1b97462cffccc871914859eaee562f2786abfd1. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Parameterize the attempts for the _firstn choose method, and apply the rule-specified tries count to firstn mode as well. Note that we have slightly different behavior here than with indep: If the firstn value is not specified for firstn, we pass through the normal attempt count. This maintains compatibility with legacy behavior. Note that this is usually *not* actually N^2 work, though, because of the descend_once tunable. However, descend_once is unfortunately *not* the same thing as 1 chooseleaf try because it is only checked on a reject but not on a collision. Sigh. In contrast, for indep, if tries is not specified we default to 1 recursive attempt, because that is simply more sane, and we have the option to do so. The descend_once tunable has no effect for indep. Reflects ceph.git commit 64aeded50d80942d66a5ec7b604ff2fcbf5d7b63. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Explicitly control the number of sample attempts, and allow the number of tries in the recursive call to be explicitly controlled via the rule. This is important because the amount of time we want to spend looking for a solution may be rule dependent (e.g., higher for the wide indep pool than the rep pools). (We should do the same for the other tunables, by the way!) Reflects ceph.git commit c43c893be872f709c787bc57f46c0e97876ff681. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Pass down the parent's 'r' value so that we will sample different values in the recursive call when the parent tries multiple times. This avoids doing useless work (calling multiple times and trying the same values). Reflects ceph.git commit 2731d3030d7a3e80922b7f1b7756f9a4a124bac5. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Pass numrep (the width of the result) separately from the number of results we want *this* iteration. This makes things less awkward when we do a recursive call (for chooseleaf) and want only one item. Reflects ceph.git commit 1b567ee08972f268c11b43fc881e57b5984dd08b. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Now that indep is handled by crush_choose_indep, rename crush_choose to crush_choose_firstn and remove all the conditionals. This ends up stripping out *lots* of code. Note that it *also* makes it obvious that the shenanigans we were playing with r' for uniform buckets were broken for firstn mode. This appears to have happened waaaay back in commit dae8bec9 (or earlier)... 2007. Reflects ceph.git commit 94350996cb2035850bcbece6a77a9b0394177ec9. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Reflects ceph.git commit 4551fee9ad89d0427ed865d766d0d44004d3e3e1. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Reflects ceph.git commit 86e978036a4ecbac4c875e7c00f6c5bbe37282d3. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
For firstn mode, if we fail to make a valid placement choice, we just continue and return a short result to the caller. For indep mode, however, we need to make the position stable, and return an undefined value on failed placements to avoid shifting later results to the left. Reflects ceph.git commit b1d4dd4eb044875874a1d01c01c7d766db5d0a80. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
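The difference can be shown with a small userspace sketch (simplified; CRUSH_ITEM_UNDEF is the upstream marker name, while the helpers and values here are illustrative): firstn compacts the result and may return fewer items, indep keeps every position stable and marks failures.

```c
#include <stdio.h>

#define CRUSH_ITEM_UNDEF 0x7ffffffe	/* placeholder for a failed slot */

/* firstn: skip failures and return a densely packed, possibly short result */
static int fill_firstn(int *out, const int *picked, int n)
{
	int len = 0;

	for (int i = 0; i < n; i++)
		if (picked[i] >= 0)
			out[len++] = picked[i];
	return len;			/* caller sees fewer results */
}

/* indep: keep positions stable and mark failures instead of shifting */
static int fill_indep(int *out, const int *picked, int n)
{
	for (int i = 0; i < n; i++)
		out[i] = picked[i] >= 0 ? picked[i] : CRUSH_ITEM_UNDEF;
	return n;			/* length never shrinks */
}

int main(void)
{
	int picked[4] = { 3, -1, 7, 5 }, out[4];
	int n;

	n = fill_firstn(out, picked, 4);
	printf("firstn: %d results\n", n);

	n = fill_indep(out, picked, 4);
	printf("indep:  %d results, slot 1 = %#x\n", n, (unsigned int)out[1]);
	return 0;
}
```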
-
Ilya Dryomov authored
This is only present to size the temporary scratch arrays that we put on the stack. Let the caller allocate them as they wish and remove the limitation. Reflects ceph.git commit 1cfe140bf2dab99517589a82a916f4c75b9492d1. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Reflects ceph.git commit 3cef755428761f2481b1dd0e0fbd0464ac483fc5. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Reflects ceph.git commit e7d47827f0333c96ad43d257607fb92ed4176550. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Reflects ceph.git commit 43a01c9973c4b83f2eaa98be87429941a227ddde. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Ilya Dryomov authored
Pass the size of the weight vector into crush_do_rule() to ensure that we don't access values past the end. This can happen if the caller misbehaves and passes a weight vector that is smaller than max_devices. Currently the monitor tries to prevent that from happening, but this will gracefully tolerate previous bad osdmaps that got into this state. It's also a bit more defensive. Reflects ceph.git commit 5922e2c2b8335b5e46c9504349c3a55b7434c01a. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
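A sketch of the defensive bound (hypothetical helper; the real change threads weight_max through crush_do_rule() and the bucket-choosing code): items beyond the supplied weight vector are treated as weight 0 rather than read past the end of the array.

```c
#include <stdio.h>

static unsigned int item_weight(const unsigned int *weight, int weight_max,
				int item)
{
	/* Out-of-range items get weight 0 (never chosen) instead of
	 * triggering an out-of-bounds read. */
	if (item < 0 || item >= weight_max)
		return 0;
	return weight[item];
}

int main(void)
{
	unsigned int weight[3] = { 0x10000, 0x10000, 0x8000 };

	printf("item 2 -> %#x\n", item_weight(weight, 3, 2));
	printf("item 7 -> %#x (out of range)\n", item_weight(weight, 3, 7));
	return 0;
}
```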
-
Ilya Dryomov authored
This updates ceph_features.h so that it has all feature bits defined in ceph.git. In the interim since the last update, ceph.git crossed the "32 feature bits" point, and the addition of the 33rd bit wasn't handled correctly. The work-around is squashed into this commit and reflects ceph.git commit 053659d05e0349053ef703b414f44965f368b9f0. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
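A self-contained sketch of why the features fields have to be 64 bits once a 33rd bit exists (the feature names below are made up, not the ceph_features.h defines):

```c
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_FEATURE_1	(1ULL << 1)	/* fits in 32 bits */
#define EXAMPLE_FEATURE_33	(1ULL << 33)	/* needs a 64-bit field */

int main(void)
{
	uint64_t supported = EXAMPLE_FEATURE_1 | EXAMPLE_FEATURE_33;

	/* A u32 features field silently drops every bit above bit 31. */
	uint32_t truncated = (uint32_t)supported;

	printf("u64: %#llx, truncated u32: %#x\n",
	       (unsigned long long)supported, truncated);
	return 0;
}
```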
-
Ilya Dryomov authored
In preparation for ceph_features.h update, change all features fields from unsigned int/u32 to u64. (ceph.git has ~40 feature bits at this point.) Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 13 Dec, 2013 3 commits
-
-
Josh Durgin authored
With the current full handling, there is a race between osds and clients getting the first map marked full. If the osd wins, it will return -ENOSPC to any writes, but the client may already have writes in flight. This results in the client getting the error and propagating it up the stack. For rbd, the block layer turns this into EIO, which can cause corruption in filesystems above it. To avoid this race, osds are being changed to drop writes that came from clients with an osdmap older than the last osdmap marked full. In order for this to work, clients must resend all writes after they encounter a full -> not full transition in the osdmap. osds will wait for an updated map instead of processing a request from a client with a newer map, so resent writes will not be dropped by the osd unless there is another not full -> full transition. This approach requires both osds and clients to be fixed to avoid the race. Old clients talking to osds with this fix may hang instead of returning EIO and potentially corrupting an fs. New clients talking to old osds have the same behavior as before if they encounter this race. Fixes: http://tracker.ceph.com/issues/6938 Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Josh Durgin <josh.durgin@inktank.com>
-
Josh Durgin authored
The PAUSEWR and PAUSERD flags are meant to stop the cluster from processing writes and reads, respectively. The FULL flag is set when the cluster determines that it is out of space, and will no longer process writes. PAUSEWR and PAUSERD are purely client-side settings already implemented in userspace clients. The osd does nothing special with these flags. When the FULL flag is set, however, the osd responds to all writes with -ENOSPC. For cephfs, this makes sense, but for rbd the block layer translates this into EIO. If a cluster goes from full to non-full quickly, a filesystem on top of rbd will not behave well, since some writes succeed while others get EIO. Fix this by blocking any writes when the FULL flag is set in the osd client. This is the same strategy used by userspace, so apply it by default. A follow-on patch makes this configurable. __map_request() is called to re-target osd requests in case the available osds changed. Add a paused field to a ceph_osd_request, and set it whenever an appropriate osd map flag is set. Avoid queueing paused requests in __map_request(), but force them to be resent if they become unpaused. Also subscribe to the next osd map from the monitor if any of these flags are set, so paused requests can be unblocked as soon as possible. Fixes: http://tracker.ceph.com/issues/6079 Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Josh Durgin <josh.durgin@inktank.com>
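A simplified sketch of the pause decision described above (the flag bit values and the helper are illustrative, not the osd_client code): writes are held back when either PAUSEWR or FULL is set, reads only when PAUSERD is set.

```c
#include <stdbool.h>
#include <stdio.h>

#define OSDMAP_PAUSERD	0x01
#define OSDMAP_PAUSEWR	0x02
#define OSDMAP_FULL	0x04

static bool request_paused(unsigned int map_flags, bool is_write)
{
	if (is_write)
		return map_flags & (OSDMAP_PAUSEWR | OSDMAP_FULL);
	return map_flags & OSDMAP_PAUSERD;
}

int main(void)
{
	unsigned int flags = OSDMAP_FULL;

	printf("write while FULL: %s\n",
	       request_paused(flags, true) ? "paused" : "sent");
	printf("read while FULL:  %s\n",
	       request_paused(flags, false) ? "paused" : "sent");
	return 0;
}
```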
-
Li Wang authored
Wake up possible waiters, invoke the callback if any, and unregister the request. Signed-off-by:
Li Wang <liwang@ubuntukylin.com> Signed-off-by:
Yunchuan Wen <yunchuanwen@ubuntukylin.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
- 02 Dec, 2013 1 commit
-
-
fan.du authored
commit a553e4a6 ("[PKTGEN]: IPSEC support") tried to support IPsec ESP transport transformation for pktgen, but actually this doesn't work at all, for two reasons (the original transformed packet has a bad IPv4 checksum value as well as a wrong auth value, as reported by wireshark): - After transformation, the IPv4 header total length needs updating, because the encrypted payload's length is NOT the same as that of the plain text. - After transformation, the IPv4 checksum needs recalculating because the payload has changed. With this patch, and pktgen armed with the configuration below, Wireshark is able to decrypt ESP packets generated by pktgen without any IPv4 checksum or auth value errors. pgset "flag IPSEC" pgset "flows 1" Signed-off-by:
Fan Du <fan.du@windriver.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
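A self-contained userspace sketch of the two fixes: once the ESP transform grows the payload, tot_len must be updated and the header checksum recomputed over the modified header (simplified IPv4 header struct and helper, not the pktgen code).

```c
#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>

struct ipv4_hdr {
	uint8_t  ver_ihl, tos;
	uint16_t tot_len, id, frag_off;
	uint8_t  ttl, protocol;
	uint16_t check;
	uint32_t saddr, daddr;
};

/* Standard ones'-complement sum over the header bytes. */
static uint16_t ip_checksum(const void *buf, size_t len)
{
	const uint16_t *p = buf;
	uint32_t sum = 0;

	for (; len > 1; len -= 2)
		sum += *p++;
	if (len)
		sum += *(const uint8_t *)p;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

int main(void)
{
	struct ipv4_hdr iph = { .ver_ihl = 0x45, .ttl = 64, .protocol = 50 };
	size_t encrypted_payload_len = 120;	/* grew during the ESP transform */

	/* Fix 1: total length must reflect the encrypted payload size. */
	iph.tot_len = htons(sizeof(iph) + encrypted_payload_len);

	/* Fix 2: recompute the header checksum after the header changed. */
	iph.check = 0;
	iph.check = ip_checksum(&iph, sizeof(iph));

	printf("tot_len=%u check=%#x\n", ntohs(iph.tot_len), iph.check);
	return 0;
}
```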
-
- 30 Nov, 2013 6 commits
-
-
stephen hemminger authored
Patch from developers of the alternative loss models, downloaded from: http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG "in case 2, of the switch we change the direction of the inequality to net_random()>clg->a3, because clg->a3 is h in the GE model and when h is 0 all packets will be lost." Signed-off-by:
Stephen Hemminger <stephen@networkplumber.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
stephen hemminger authored
Patch from developers of the alternative loss models, downloaded from: http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG "In the case 1 of the switch statement in the if conditions we need to add clg->a4 to clg->a1, according to the model." Signed-off-by:
Stephen Hemminger <stephen@networkplumber.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
stephen hemminger authored
There is a missing break statement in the Gilbert-Elliott loss model generator which makes the state machine behave incorrectly. Reported-by: Martin Burri <martin.burri@ch.abb.com> Signed-off-by:
Stephen Hemminger <stephen@networkplumber.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
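All three fixes above touch the same loss-model state machine. A self-contained userspace sketch of a two-state Gilbert-Elliott generator shows why the break matters (without it the good-state case falls through into the bad-state case) and uses the corrected "greater than h" comparison, so that h == 0 loses every packet in the bad state. Field and parameter names are illustrative, not netem's clg->a1..a5.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

enum ge_state { GE_GOOD, GE_BAD };

struct ge_model {
	enum ge_state state;
	double p;	/* GOOD -> BAD transition probability */
	double r;	/* BAD -> GOOD transition probability */
	double h;	/* probability of delivering a packet while BAD */
};

static double frand(void)
{
	return rand() / (double)RAND_MAX;
}

static bool ge_lose_packet(struct ge_model *m)
{
	bool lost = false;

	switch (m->state) {
	case GE_GOOD:
		if (frand() < m->p)
			m->state = GE_BAD;
		lost = false;		/* never lose in the good state */
		break;			/* without this we'd fall into GE_BAD */
	case GE_BAD:
		if (frand() < m->r)
			m->state = GE_GOOD;
		lost = frand() > m->h;	/* h == 0: every packet is lost */
		break;
	}
	return lost;
}

int main(void)
{
	struct ge_model m = { GE_GOOD, 0.05, 0.3, 0.0 };
	int lost = 0;

	for (int i = 0; i < 100000; i++)
		lost += ge_lose_packet(&m);
	printf("lost %d of 100000 packets\n", lost);
	return 0;
}
```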
-
Arvid Brodin authored
This implements the rtnl_link_ops fill_info routine for HSR. Signed-off-by:
Arvid Brodin <arvid.brodin@alten.se> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Arvid Brodin authored
Signed-off-by:
Arvid Brodin <arvid.brodin@alten.se> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Hannes Frederic Sowa authored
IPv6 stats are 64 bits and thus are protected with a seqlock. If we don't disable bottom halves here, we could deadlock when a softirq reentrantly updates the same mib. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 29 Nov, 2013 4 commits
-
-
Eric Dumazet authored
In commit c9e90429 ("ipv4: fix possible seqlock deadlock") I left other places where IP_INC_STATS_BH() was improperly used. udp_sendmsg(), ping_v4_sendmsg() and tcp_v4_connect() are called from process context, not from softirq context. This was detected by lockdep seqlock support. Reported-by:
jongman heo <jongman.heo@samsung.com> Fixes: 584bdf8c ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP") Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by:
Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Shawn Landden authored
Commit 35f9c09f (tcp: tcp_sendpages() should call tcp_push() once) added an internal flag MSG_SENDPAGE_NOTLAST, similar to MSG_MORE. algif_hash, algif_skcipher, and udp used MSG_MORE from tcp_sendpages() and need to see the new flag as identical to MSG_MORE. This fixes sendfile() on AF_ALG. v3: also fix udp Cc: Tom Herbert <therbert@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> # 3.4.x + 3.2.x Reported-and-tested-by:
Shawn Landden <shawnlandden@gmail.com> Original-patch: Richard Weinberger <richard@nod.at> Signed-off-by:
Shawn Landden <shawn@churchofgit.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
If kmsg->msg_namelen > sizeof(struct sockaddr_storage), the original code would lead to memory corruption in the kernel if you had audit configured; if you didn't have audit configured it was harmless. There are some programs, such as beta versions of Ruby, which use too large a buffer, and returning an error code breaks them. We should clamp the ->msg_namelen value instead. Fixes: 1661bf36 ("net: heap overflow in __audit_sockaddr()") Reported-by:
Eric Wong <normalperson@yhbt.net> Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Tested-by:
Eric Wong <normalperson@yhbt.net> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
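A minimal sketch of the clamp (userspace illustration, not the actual net/socket.c audit path): at most sizeof(struct sockaddr_storage) bytes are copied, so over-sized but harmless buffers keep working.

```c
#include <stdio.h>
#include <sys/socket.h>

static size_t clamped_namelen(size_t msg_namelen)
{
	/* Clamp instead of returning an error for an over-sized buffer. */
	if (msg_namelen > sizeof(struct sockaddr_storage))
		msg_namelen = sizeof(struct sockaddr_storage);
	return msg_namelen;
}

int main(void)
{
	printf("namelen 16   -> %zu\n", clamped_namelen(16));
	printf("namelen 4096 -> %zu\n", clamped_namelen(4096));
	return 0;
}
```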
-
Veaceslav Falico authored
Currently we're using plain spin_lock() in prb_shutdown_retire_blk_timer(), however the timer might fire right in the middle and thus try to re-acquire the same spinlock, leaving us in an endless loop. To fix that, use spin_lock_bh() to block it. Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.") CC: "David S. Miller" <davem@davemloft.net> CC: Daniel Borkmann <dborkman@redhat.com> CC: Willem de Bruijn <willemb@google.com> CC: Phil Sutter <phil@nwl.cc> CC: Eric Dumazet <edumazet@google.com> Reported-by:
Jan Stancek <jstancek@redhat.com> Tested-by:
Jan Stancek <jstancek@redhat.com> Signed-off-by:
Veaceslav Falico <vfalico@redhat.com> Acked-by:
Daniel Borkmann <dborkman@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
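A kernel-style sketch of the pattern (function and variable names are made up, and this is not the af_packet code): the timer callback runs in softirq context and takes the same lock, so the process-context teardown path must hold it with bottom halves disabled.

```c
#include <linux/spinlock.h>
#include <linux/timer.h>

static void shutdown_retire_timer(spinlock_t *lock,
				  struct timer_list *retire_timer,
				  int *delete_pending)
{
	spin_lock_bh(lock);	/* plain spin_lock() could spin forever if the
				 * timer fires on this CPU and takes the same
				 * lock from softirq context */
	*delete_pending = 1;
	spin_unlock_bh(lock);

	del_timer_sync(retire_timer);	/* wait out a callback already running
					 * on another CPU */
}
```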
-
- 28 Nov, 2013 1 commit
-
-
Baker Zhang authored
Since commit f9242b6b ("inet: Sanitize inet{,6} protocol demux"), there are no longer pretend hash tables for the ipv4 or ipv6 protocol handlers. Signed-off-by:
Baker Zhang <Baker.kernel@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-