- 25 Feb, 2013 3 commits
-
-
Alex Elder authored
Collect the code that tests for and implements a backoff delay for a ceph connection into a new function, ceph_backoff(). Make the debug output messages in that part of the code report things consistently by reporting a message in the socket closed case, and by making the one for PREOPEN state report the connection pointer like the rest. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
Eliminate most of the problems in the libceph code that cause sparse to issue warnings. - Convert functions that are never referenced externally to have static scope. - Pass NULL rather than 0 for a pointer argument in one spot in ceph_monc_delete_snapid() This partially resolves: http://tracker.ceph.com/issues/4184 Reported-by:
Fengguang Wu <fengguang.wu@intel.com> Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
Define and use functions that encapsulate operations performed on a connection's flags. This resolves: http://tracker.ceph.com/issues/4234 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
- 14 Feb, 2013 1 commit
-
-
Alex Elder authored
The ceph messenger has a few spots that are only used when bio messages are supported, and that's only when CONFIG_BLOCK is defined. This surrounds a couple of spots with #ifdef's that would cause a problem if CONFIG_BLOCK were not present in the kernel configuration. This resolves: http://tracker.ceph.com/issues/3976 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
- 28 Dec, 2012 2 commits
-
-
Sage Weil authored
We should not set con->state to CLOSED here; that happens in ceph_fault() in the caller, where it first asserts that the state is not yet CLOSED. Avoids a BUG when the features don't match. Since the fail_protocol() has become a trivial wrapper, replace calls to it with direct calls to reset_connection(). Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Alex Elder authored
A number of assertions in the ceph messenger are implemented with BUG_ON(), killing the system if connection's state doesn't match what's expected. At this point our state model is (evidently) not well understood enough for these assertions to trigger a BUG(). Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...) so we learn about these issues without killing the machine. We now recognize that a connection fault can occur due to a socket closure at any time, regardless of the state of the connection. So there is really nothing we can assert about the state of the connection at that point so eliminate that assertion. Reported-by:
Ugis <ugis22@gmail.com> Tested-by:
Ugis <ugis22@gmail.com> Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 20 Dec, 2012 1 commit
-
-
Alex Elder authored
When a connection's socket disconnects, or if there's a protocol error of some kind on the connection, a fault is signaled and the connection is reset (closed and reopened, basically). We currently get an error message on the log whenever this occurs. A ceph connection will attempt to reestablish a socket connection repeatedly if a fault occurs. This means that these error messages will get repeatedly added to the log, which is undesirable. Change the error message to be a warning, so they don't get logged by default. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 17 Dec, 2012 1 commit
-
-
Alex Elder authored
A connection's socket can close for any reason, independent of the state of the connection (and without irrespective of the connection mutex). As a result, the connectino can be in pretty much any state at the time its socket is closed. Handle those other cases at the top of con_work(). Pull this whole block of code into a separate function to reduce the clutter. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 26 Oct, 2012 1 commit
-
-
Sage Weil authored
The ceph_on_in_msg_alloc() method calls the ->alloc_msg() helper which may return NULL. It also drops con->mutex while it allocates a message, which means that the connection state may change (e.g., get closed). If that happens, we clean up and bail out. Avoid calling ceph_msg_put() on a NULL return value and triggering a crash. This was observed when an ->alloc_msg() call races with a timeout that resends a zillion messages and resets the connection, and ->alloc_msg() returns NULL (because the request was resent to another target). Fixes http://tracker.newdream.net/issues/3342 Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
- 24 Oct, 2012 1 commit
-
-
Sage Weil authored
The ceph_on_in_msg_alloc() method drops con->mutex while it allocates a message. If that races with a timeout that resends a zillion messages and resets the connection, and the ->alloc_msg() method returns a NULL message, it will call ceph_msg_put(NULL) and BUG. Fix by only calling put if msg is non-NULL. Fixes http://tracker.newdream.net/issues/3142 Signed-off-by:
Sage Weil <sage@inktank.com>
-
- 10 Oct, 2012 3 commits
-
-
Alex Elder authored
This patch defines a single function, queue_con_delay() to call queue_delayed_work() for a connection. It basically generalizes what was previously queue_con() by adding the delay argument. queue_con() is now a simple helper that passes 0 for its delay. queue_con_delay() returns 0 if it queued work or an errno if it did not for some reason. If con_work() finds the BACKOFF flag set for a connection, it now calls queue_con_delay() to handle arranging to start again after a delay. Note about connection reference counts: con_work() only ever gets called as a work item function. At the time that work is scheduled, a reference to the connection is acquired, and the corresponding con_work() call is then responsible for dropping that reference before it returns. Previously, the backoff handling inside con_work() silently handed off its reference to delayed work it scheduled. Now that queue_con_delay() is used, a new reference is acquired for the newly-scheduled work, and the original reference is dropped by the con->ops->put() call at the end of the function. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Alex Elder authored
Both ceph_fault() and con_work() include handling for imposing a delay before doing further processing on a faulted connection. The latter is used only if ceph_fault() is unable to. Instead, just let con_work() always be responsible for implementing the delay. After setting up the delay value, set the BACKOFF flag on the connection unconditionally and call queue_con() to ensure con_work() will get called to handle it. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Alex Elder authored
If ceph_fault() is unable to queue work after a delay, it sets the BACKOFF connection flag so con_work() will attempt to do so. In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't result in newly-queued work, it simply ignores this condition and proceeds as if no backoff delay were desired. There are two problems with this--one of which is a bug. The first problem is simply that the intended behavior is to back off, and if we aren't able queue the work item to run after a delay we're not doing that. The only reason queue_delayed_work() won't queue work is if the provided work item is already queued. In the messenger, this means that con_work() is already scheduled to be run again. So if we simply set the BACKOFF flag again when this occurs, we know the next con_work() call will again attempt to hold off activity on the connection until after the delay. The second problem--the bug--is a leak of a reference count. If queue_delayed_work() returns 0 in con_work(), con->ops->put() drops the connection reference held on entry to con_work(). However, processing is (was) allowed to continue, and at the end of the function a second con->ops->put() is called. This patch fixes both problems. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 22 Sep, 2012 1 commit
-
-
Alex Elder authored
In write_partial_msg_pages(), pages need to be kmapped in order to perform a CRC-32c calculation on them. As an artifact of the way this code used to be structured, the kunmap() call was separated from the kmap() call and both were done conditionally. But the conditions under which the kmap() and kunmap() calls were made differed, so there was a chance a kunmap() call would be done on a page that had not been mapped. The symptom of this was tripping a BUG() in kunmap_high() when pkmap_count[nr] became 0. Reported-by:
Bryan K. Wright <bryan@virginia.edu> Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 21 Aug, 2012 1 commit
-
-
Jim Schutt authored
Because the Ceph client messenger uses a non-blocking connect, it is possible for the sending of the client banner to race with the arrival of the banner sent by the peer. When ceph_sock_state_change() notices the connect has completed, it schedules work to process the socket via con_work(). During this time the peer is writing its banner, and arrival of the peer banner races with con_work(). If con_work() calls try_read() before the peer banner arrives, there is nothing for it to do, after which con_work() calls try_write() to send the client's banner. In this case Ceph's protocol negotiation can complete succesfully. The server-side messenger immediately sends its banner and addresses after accepting a connect request, *before* actually attempting to read or verify the banner from the client. As a result, it is possible for the banner from the server to arrive before con_work() calls try_read(). If that happens, try_read() will read the banner and prepare protocol negotiation info via prepare_write_connect(). prepare_write_connect() calls con_out_kvec_reset(), which discards the as-yet-unsent client banner. Next, con_work() calls try_write(), which sends the protocol negotiation info rather than the banner that the peer is expecting. The result is that the peer sees an invalid banner, and the client reports "negotiation failed". Fix this by moving con_out_kvec_reset() out of prepare_write_connect() to its callers at all locations except the one where the banner might still need to be sent. [elder@inktak.com: added note about server-side behavior] Signed-off-by:
Jim Schutt <jaschut@sandia.gov> Reviewed-by:
Alex Elder <elder@inktank.com>
-
- 31 Jul, 2012 19 commits
-
-
Sage Weil authored
We drop the lock when calling the ->alloc_msg() con op, which means we need to (a) not clobber con->in_msg without the mutex held, and (b) we need to verify that we are still in the OPEN state when we retake it to avoid causing any mayhem. If the state does change, -EAGAIN will get us back to con_work() and loop. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
This function's calling convention is very limiting. In particular, we can't return any error other than ENOMEM (and only implicitly), which is a problem (see next patch). Instead, return an normal 0 or error code, and make the skip a pointer output parameter. Drop the useless in_hdr argument (we have the con pointer). Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
The ceph_fault() function takes the con mutex, so we should avoid dropping it before calling it. This fixes a potential race with another thread calling ceph_con_close(), or _open(), or similar (we don't reverify con->state after retaking the lock). Add annotation so that lockdep realizes we will drop the mutex before returning. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
We drop the con mutex when delivering a message. When we retake the lock, we need to verify we are still in the OPEN state before preparing to read the next tag, or else we risk stepping on a connection that has been closed. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
If the connect() call immediately fails such that sock == NULL, we still need con_close_socket() to reset our socket state to CLOSED. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
Rename flags with CON_FLAG prefix, move the definitions into the c file, and (better) document their meaning. Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
Use a simple set of 6 enumerated values for the socket states (CON_STATE_*) and use those instead of the state bits. All of the con->state checks are now under the protection of the con mutex, so this is safe. It also simplifies many of the state checks because we can check for anything other than the expected state instead of various bits for races we can think of. This appears to hold up well to stress testing both with and without socket failure injection on the server side. Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
If we are CLOSED, the socket is closed and we won't get these. Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
It is simpler to do this immediately, since we already hold the con mutex. It also avoids the need to deal with a not-quite-CLOSED socket in con_work. Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
If the state is CLOSED or OPENING, we shouldn't have a socket. Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
Take the con mutex before checking whether the connection is closed to avoid racing with someone else closing it. Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
Avoid dropping and retaking con->mutex in the ceph_con_send() case by leaving locking up to the caller. Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
If we fault on a lossy connection, we should still close the socket immediately, and do so under the con mutex. We should also take the con mutex before printing out the state bits in the debug output. Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
We exponentially back off when we encounter connection errors. If several errors accumulate, we will eventually wait ages before even trying to reconnect. Fix this by resetting the backoff counter after a successful negotiation/ connection with the remote node. Fixes ceph issue #2802. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
Take the con mutex while we are initiating a ceph open. This is necessary because the may have previously been in use and then closed, which could result in a racing workqueue running con_work(). Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
Previously, we were opportunistically initializing the bio_iter if it appeared to be uninitialized in the middle of the read path. The problem is that a sequence like: - start reading message - initialize bio_iter - read half a message - messenger fault, reconnect - restart reading message - ** bio_iter now non-NULL, not reinitialized ** - read past end of bio, crash Instead, initialize the bio_iter unconditionally when we allocate/claim the message for read. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com>
-
Sage Weil authored
Hold the mutex while twiddling all of the state bits to avoid possible races. While we're here, make not of why we cannot close the socket directly. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com>
-
Sage Weil authored
We need to set error_msg to something useful before calling ceph_fault(); do so here for try_{read,write}(). This is more informative than libceph: osd0 192.168.106.220:6801 (null) Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com>
-
- 30 Jul, 2012 2 commits
-
-
Guanjun He authored
Add an atomic variable 'stopping' as flag in struct ceph_messenger, set this flag to 1 in function ceph_destroy_client(), and add the condition code in function ceph_data_ready() to test the flag value, if true(1), just return. Signed-off-by:
Guanjun He <gjhe@suse.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
In ancient times, the messenger could both initiate and accept connections. An artifact if that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retries with the same connect_seq as last time. This bug pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by:
Sage Weil <sage@inktank.com>
-
- 18 Jul, 2012 1 commit
-
-
Sage Weil authored
In ancient times, the messenger could both initiate and accept connections. An artifact if that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retries with the same connect_seq as last time. This bug pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by:
Sage Weil <sage@inktank.com>
-
- 06 Jul, 2012 3 commits
-
-
Sage Weil authored
It is possible to close a socket that is in the OPENING state. For example, it can happen if ceph_con_close() is called on the con before the TCP connection is established. con_work() will come around and shut down the socket. Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
The peer name may change on each open attempt, even when the connection is reused. Signed-off-by:
Sage Weil <sage@inktank.com>
-
Alex Elder authored
Sage liked the state diagram I put in my commit description so I'm putting it in with the code. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-