- 02 May, 2013 20 commits
-
-
Alex Elder authored
Some values printed are not (necessarily) in CPU order. We already have a copy of the converted versions, so use them. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
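The point about printing converted values can be illustrated with a small userspace sketch. It assumes a hypothetical wire header whose length field is stored little-endian; the struct and function names are invented for illustration and are not taken from the libceph code.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical wire header: the field is little-endian on the wire. */
struct demo_wire_hdr {
    uint8_t front_len[4];
};

/* Decode a little-endian 32-bit value into CPU byte order. */
static uint32_t demo_get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Format the converted copy, never the raw wire bytes. */
static int demo_format_front_len(char *buf, size_t len,
                                 const struct demo_wire_hdr *hdr)
{
    uint32_t front_len = demo_get_le32(hdr->front_len); /* CPU order */

    return snprintf(buf, len, "front_len=%u", (unsigned int)front_len);
}
```

Printing the raw field instead would show a byte-swapped number on a big-endian machine, which is the kind of mistake this commit avoids.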
-
Alex Elder authored
This is probably unnecessary but the code read as if it were wrong in read_partial_message(). Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
In ceph_con_in_msg_alloc() it is possible for a connection's alloc_msg method to indicate an incoming message should be skipped. By default, read_partial_message() initializes the skip variable to 0 before it gets provided to ceph_con_in_msg_alloc(). The osd client, mon client, and mds client each supply an alloc_msg method. The mds client always assigns skip to be 0. The other two leave the skip value as-is, or assign it to zero, except: - if no (osd or mon) request having the given tid is found, in which case skip is set to 1 and NULL is returned; or - in the osd client, if the data of the reply message is not adequate to hold the message to be read, it assigns skip value 1 and returns NULL. So the returned message pointer will always be NULL if skip is ever non-zero. Clean up the logic a bit in ceph_con_in_msg_alloc() to make this state of affairs more obvious. Add a comment explaining how a null message pointer can mean either a message that should be skipped or a problem allocating a message. This resolves: http://tracker.ceph.com/issues/4324 Reported-by:
Greg Farnum <greg@inktank.com> Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Greg Farnum <greg@inktank.com>
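The two meanings of a NULL message pointer described above can be sketched as follows. This is a minimal userspace illustration, not the kernel code: demo_alloc_msg(), demo_classify(), and the enum are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_msg {
    int tid;
};

enum demo_alloc_result { DEMO_ALLOC_OK, DEMO_ALLOC_SKIP, DEMO_ALLOC_NOMEM };

/*
 * Hypothetical alloc_msg-style callback: returns NULL with *skip = 1
 * when no request matches the tid, otherwise allocates a message.
 */
static struct demo_msg *demo_alloc_msg(int tid, int known_tid, int *skip)
{
    struct demo_msg *m;

    *skip = 0;
    if (tid != known_tid) {
        *skip = 1;      /* unknown tid: tell the caller to skip */
        return NULL;
    }
    m = malloc(sizeof(*m));
    if (m)
        m->tid = tid;
    return m;           /* NULL here means allocation failure */
}

/* A NULL message means "skip" if skip is set, "no memory" otherwise. */
static enum demo_alloc_result demo_classify(const struct demo_msg *m, int skip)
{
    if (m)
        return DEMO_ALLOC_OK;
    return skip ? DEMO_ALLOC_SKIP : DEMO_ALLOC_NOMEM;
}
```

Because skip is only ever non-zero together with a NULL return, the caller needs to look at skip only after it has seen NULL.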
-
Alex Elder authored
An osd request defines information about where data to be read should be placed as well as where data to write comes from. Currently these are represented by common fields. Keep information about data for writing separate from data to be read by splitting these into data_in and data_out fields. This is the key patch in this whole series, in that it actually identifies which osd requests generate outgoing data and which generate incoming data. It's less obvious (currently) that an osd CALL op generates both outgoing and incoming data; that's the focus of some upcoming work. This resolves: http://tracker.ceph.com/issues/4127 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
An osd request uses either pages or a bio list for its data. Use a union to record information about the two, and add a data type tag to select between them. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
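A tagged union of the kind this commit describes might look roughly like the sketch below. All type and field names here are assumptions made for illustration; the actual osd request structure may be laid out differently.

```c
#include <stddef.h>

/* Hypothetical tag selecting which union member is valid. */
enum demo_data_type { DEMO_DATA_NONE, DEMO_DATA_PAGES, DEMO_DATA_BIO };

struct page;    /* opaque stand-ins for the kernel types */
struct bio;

struct demo_req_data {
    enum demo_data_type type;
    union {
        struct {
            struct page **vec;
            unsigned int count;
        } pages;            /* valid when type == DEMO_DATA_PAGES */
        struct bio *bio;    /* valid when type == DEMO_DATA_BIO */
    } u;
};

static void demo_set_pages(struct demo_req_data *d, struct page **vec,
                           unsigned int count)
{
    d->type = DEMO_DATA_PAGES;
    d->u.pages.vec = vec;
    d->u.pages.count = count;
}

static void demo_set_bio(struct demo_req_data *d, struct bio *bio)
{
    d->type = DEMO_DATA_BIO;
    d->u.bio = bio;
}

/* Only read the pages member after checking the tag. */
static unsigned int demo_page_count(const struct demo_req_data *d)
{
    return d->type == DEMO_DATA_PAGES ? d->u.pages.count : 0;
}
```

Checking the tag before touching a union member is what keeps a bio pointer from being misread as a page array.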
-
Alex Elder authored
Pull the fields in an osd request structure that define the data for the request out into a separate structure. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
Currently ceph_osdc_new_request() assigns an osd request's r_num_pages and r_alignment fields. The only thing it does after that is call ceph_osdc_build_request(), and that doesn't need those fields to be assigned. Move the assignment of those fields out of ceph_osdc_new_request() and into its caller. As a result, the page_align parameter is no longer used, so get rid of it. Note that in ceph_sync_write(), the value for req->r_num_pages had already been calculated earlier (as num_pages, and fortunately it was computed the same way). So don't bother recomputing it, but because it's not needed earlier, move that calculation after the call to ceph_osdc_new_request(). Hold off making the assignment to r_alignment, doing it instead where r_pages and r_num_pages are getting set. Similarly, in start_read(), nr_pages already holds the number of pages in the array (and is calculated the same way), so there's no need to recompute it. Move the assignment of the page alignment down with the others there as well. This and the next few patches are preparation work for: http://tracker.ceph.com/issues/4127 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
The only user of the ceph messenger that doesn't define an alloc_msg method is the mds client. Define one, such that it works just like it did before, and simplify ceph_con_in_msg_alloc() by assuming the alloc_msg method is always present. This and the next patch resolve: http://tracker.ceph.com/issues/4322 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Greg Farnum <greg@inktank.com>
-
Alex Elder authored
In ceph_con_in_msg_alloc(), if no alloc_msg method is defined for a connection a new message is allocated with ceph_msg_new(). Drop the mutex before making this call, and make sure we're still connected when we get it back again. This is preparing for the next patch, which ensures all connections define an alloc_msg method, and then handles them all the same way. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Greg Farnum <greg@inktank.com>
-
Alex Elder authored
The purpose of ceph_calc_object_layout() is to fill in the pool number and seed for a ceph_pg structure provided, based on a given osd map and target object id. Currently that function takes a file layout parameter, but the only thing used out of that is its pool number. Change the function so it takes a pool number rather than the full file layout structure. Only update the ceph_pg if the pool is found in the osd map. Get rid of a few useless lines of code from the function while there. Since the function now very clearly just fills in the ceph_pg structure it's provided, rename it ceph_calc_ceph_pg(). Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
The pagelist_count field is never actually used, so get rid of it. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
The new cases added to osd_req_encode_op() caused a new sparse error, which highlighted an existing problem that had been overlooked since it was originally checked in. When an unsupported opcode is found the destination rather than the source opcode was being used in the error message. The two differ in their byte order, and we want to be using the one in the source. Fix the problem in both spots. Reported-by:
Fengguang Wu <fengguang.wu@intel.com> Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
An osd request marked to linger will be re-submitted in the event a connection to the target osd gets dropped. Currently, if there is a callback function associated with a request it will be called each time a request is submitted--which for lingering requests can be more than once. Change it so a request--including lingering ones--will get completed (from the perspective of the user of the osd client) exactly once. This resolves: http://tracker.ceph.com/issues/3967 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
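One way to get the "completed exactly once" behaviour is a per-request flag that is checked before the callback runs. The sketch below only illustrates that idea; the commit message does not say how the kernel implements it, and every name here is hypothetical.

```c
#include <stdbool.h>

struct demo_linger_req {
    bool linger;            /* request is re-sent after a connection drop */
    bool completed;         /* hypothetical "callback already ran" flag */
    int callback_calls;     /* counts callback invocations, for the demo */
};

static void demo_callback(struct demo_linger_req *req)
{
    req->callback_calls++;
}

/*
 * Called for every reply. A lingering request can be submitted, and so
 * answered, more than once, but its user sees a single completion.
 */
static void demo_handle_reply(struct demo_linger_req *req)
{
    if (req->completed)
        return;
    req->completed = true;
    demo_callback(req);
}
```

A second reply for the same lingering request then returns early instead of invoking the callback again.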
-
Alex Elder authored
The page alignment field for a request is currently set in ceph_osdc_build_request(). It's not needed at that point, nor does either of its callers need that value assigned at any point before they call ceph_osdc_start_request(). So move that assignment into ceph_osdc_start_request(). Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
Use distinct fields for tracking the number of pages in a message's page array and in a message's page list. Currently only one or the other is used at a time, but that will be changing soon. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
The only remaining reason to pass the osd request to calc_layout() is to fill in its r_num_pages and r_page_alignment fields. Once it fills those in, it doesn't do anything more with them. We can therefore move those assignments into the caller, and get rid of the "req" parameter entirely. Note, however, that the only caller is ceph_osdc_new_request(), and that immediately overwrites those fields with values based on its passed-in page offset. So the assignment inside calc_layout() was redundant anyway. This resolves: http://tracker.ceph.com/issues/4262 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
Move the formatting of the object name (oid) to use for an object request into the caller of calc_layout(). This makes the "vino" parameter no longer necessary, so get rid of it. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
Have calc_layout() pass the computed object number back to its caller. (This is a small step to simplify review.) Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
The bio_seg field is used by the ceph messenger in iterating through a bio. It should never have a negative value, so make it an unsigned. (I contemplated making it unsigned short to match the struct bio definition, but it offered no benefit.) Change variables used to hold bio_seg values to all be unsigned as well. Change two variable names in init_bio_iter() to match the convention used everywhere else. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
If an invalid layout is provided to ceph_osdc_new_request(), its call to calc_layout() might return an error. At that point in the function we've already allocated an osd request structure, so we need to free it (drop a reference) in the event such an error occurs. The only other value calc_layout() will return is 0, so make that explicit in the successful case. This resolves: http://tracker.ceph.com/issues/4240 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
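The leak fixed here follows a common pattern: an object is allocated, a later step fails, and the error path must drop the reference. The sketch below shows that pattern in userspace C; the demo_ functions are hypothetical stand-ins, not the osd client's own.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted osd request. */
struct demo_req {
    int refs;
};

static int demo_live_reqs;  /* outstanding allocations, for the demo only */

static struct demo_req *demo_req_alloc(void)
{
    struct demo_req *req = calloc(1, sizeof(*req));

    if (req) {
        req->refs = 1;
        demo_live_reqs++;
    }
    return req;
}

/* Drop a reference; free the request when the last one goes away. */
static void demo_req_put(struct demo_req *req)
{
    if (--req->refs == 0) {
        demo_live_reqs--;
        free(req);
    }
}

/* Hypothetical layout check: returns 0 or a negative errno. */
static int demo_calc_layout(long len)
{
    return len < 0 ? -EINVAL : 0;
}

static struct demo_req *demo_new_request(long len, int *err)
{
    struct demo_req *req = demo_req_alloc();

    if (!req) {
        *err = -ENOMEM;
        return NULL;
    }
    *err = demo_calc_layout(len);
    if (*err) {
        demo_req_put(req);  /* without this the request would leak */
        return NULL;
    }
    return req;
}
```

After a failed demo_new_request() call, demo_live_reqs is back to zero, which is the property the fix restores.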
-
- 11 Mar, 2013 1 commit
-
-
Sage Weil authored
In 4f6a7e5e we effectively dropped support for the legacy encoding for the OSDMap and incremental. However, we didn't fix the decoding for the pgid. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com>
-
- 26 Feb, 2013 6 commits
-
-
Sage Weil authored
The legacy behavior adds the pgid seed and pool together as the input for CRUSH. That is problematic because each pool's PGs end up mapping to the same OSDs: 1.5 == 2.4 == 3.3 == ... Instead, if the HASHPSPOOL flag is set, we hash the ps and pool together and feed that into CRUSH. This ensures that two adjacent pools will map to an independent pseudorandom set of OSDs. Advertise our support for this via a protocol feature flag. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
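The overlap between pools is easy to reproduce. In the sketch below, the legacy input is the plain sum of placement seed and pool, so 1.5, 2.4, and 3.3 all yield 6. The hashed variant uses a generic 32-bit mixing function as a stand-in; it is an assumption for illustration and is not the hash Ceph actually uses.

```c
#include <stdint.h>

/* Legacy behaviour: seed and pool are simply added. */
static uint32_t demo_legacy_input(uint32_t ps, uint32_t pool)
{
    return ps + pool;
}

/* Generic bijective 32-bit mixer (a stand-in, not Ceph's hash). */
static uint32_t demo_mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

/* HASHPSPOOL-style behaviour: hash seed and pool together. */
static uint32_t demo_hashed_input(uint32_t ps, uint32_t pool)
{
    return demo_mix32(ps * 16777619u + pool);
}
```

With the sum, adjacent pools walk through the same sequence of inputs, offset by one; with the hashed input, the three placement groups above get three different values.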
-
Sage Weil authored
Use the new version of the encoding for osd requests and replies. In the process, update the way we are tracking request ops and reply lengths and results in the struct ceph_osd_request. Update the rbd and fs/ceph users appropriately. The main changes are: - we keep pointers into the request memory for fields we need to update each time the request is sent out over the wire - we keep information about the result in an array in the request struct where the users can easily get at it. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
Instead of using the old ceph_object_layout struct, update our internal ceph_calc_object_layout method to use the ceph_pg type. This allows us to pass the full 32-bit precision of the pgid.seed to the callers. It also allows some callers to avoid reaching into the request structures for the struct ceph_object_layout fields. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
Support (and require) the PGID64, PGPOOL3, and OSDENC protocol features. These have been present in ceph.git since v0.42, Feb 2012. Require these features to simplify support; nobody is running older userspace. Note that the new request and reply encoding is still not in place, so the new code is not yet functional. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
Always decode data into our cpu-native ceph_pg type that has the correct field widths. Limit any remaining uses of ceph_pg_v1 to dealing with the legacy protocol. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
Rename the old version of this type to distinguish it from the new version. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
- 25 Feb, 2013 6 commits
-
-
Alex Elder authored
This just converts a manually-implemented loop into a do..while loop in con_work(). It also moves handling of EAGAIN inside the blocks where it's already been determined an error code was returned. Also update a few dout() calls near the affected code for consistency. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
This just rearranges the logic in con_work() a little bit so that a flag is used to indicate a fault has occurred. This allows both the fault and non-fault case to be handled the same way and avoids a couple of nearly consecutive gotos. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
An error occurring on a ceph connection is treated as a fault, causing the connection to be reset. The initial part of this fault handling has to be done while holding the connection mutex, but it must then be dropped for the last part. Separate the part of this fault handling that executes without the lock into its own function, con_fault_finish(). Move the call to this new function, as well as the call that drops the connection mutex, into ceph_fault(). Rename that function con_fault() to reflect that it's only handling the connection part of the fault handling. The motivation for this was a warning from sparse about the locking being done here. Rearranging things this way keeps all the mutex manipulation within ceph_fault(), and this stops sparse from complaining. This partially resolves: http://tracker.ceph.com/issues/4184 Reported-by:
Fengguang Wu <fengguang.wu@intel.com> Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.d...
-
Alex Elder authored
Collect the code that tests for and implements a backoff delay for a ceph connection into a new function, ceph_backoff(). Make the debug output messages in that part of the code report things consistently by reporting a message in the socket closed case, and by making the one for PREOPEN state report the connection pointer like the rest. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
Eliminate most of the problems in the libceph code that cause sparse to issue warnings. - Convert functions that are never referenced externally to have static scope. - Pass NULL rather than 0 for a pointer argument in one spot in ceph_monc_delete_snapid(). This partially resolves: http://tracker.ceph.com/issues/4184 Reported-by:
Fengguang Wu <fengguang.wu@intel.com> Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
Define and use functions that encapsulate operations performed on a connection's flags. This resolves: http://tracker.ceph.com/issues/4234 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
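Encapsulating flag operations usually means a handful of small helpers around a bit field. The sketch below shows the shape such helpers could take; the names and flag values are hypothetical, and the plain bit arithmetic here is not atomic, unlike the bit operations kernel code would normally use.

```c
/* Hypothetical connection with a flags word. */
struct demo_con {
    unsigned long flags;
};

enum demo_con_flag {
    DEMO_FLAG_WRITE_PENDING = 0,
    DEMO_FLAG_SOCK_CLOSED = 1,
};

static void demo_flag_set(struct demo_con *con, enum demo_con_flag f)
{
    con->flags |= 1UL << f;
}

static void demo_flag_clear(struct demo_con *con, enum demo_con_flag f)
{
    con->flags &= ~(1UL << f);
}

static int demo_flag_test(const struct demo_con *con, enum demo_con_flag f)
{
    return (int)((con->flags >> f) & 1UL);
}

/* Return the old value of the flag and clear it (not atomic here). */
static int demo_flag_test_and_clear(struct demo_con *con,
                                    enum demo_con_flag f)
{
    int was_set = demo_flag_test(con, f);

    demo_flag_clear(con, f);
    return was_set;
}
```

Routing every access through helpers like these gives one place to validate flag values or change the underlying representation.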
-
- 20 Feb, 2013 4 commits
-
-
Alex Elder authored
The return values provided for ceph_copy_to_page_vector() and ceph_copy_from_page_vector() serve no purpose, so get rid of them. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
The functions used for working with ceph page vectors are defined with char pointers, but they're really intended to operate on untyped data. Change the types of these function parameters to (void *) to reflect this. (Note that the functions now assume void pointer arithmetic works like arithmetic on char pointers.) Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
Add support for CEPH_OSD_OP_STAT operations in the osd client and in rbd. This operation sends no data to the osd; everything required is encoded in the identity of the target object. The result will be ENOENT if the object doesn't exist. If it does exist and no other error occurs the server returns the size and last modification time of the target object as output data (in little endian format). The size is a 64-bit unsigned integer and the time is a ceph_timespec structure (two unsigned 32-bit integers, representing a seconds and nanoseconds value). This resolves: http://tracker.ceph.com/issues/4007 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
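Decoding the STAT output data described above might look like the following sketch. It assumes the reply is 16 contiguous bytes, with the size first, then seconds, then nanoseconds; that packing and all demo_ names are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU-order result of a STAT reply. */
struct demo_stat {
    uint64_t size;
    uint32_t mtime_sec;
    uint32_t mtime_nsec;
};

static uint32_t demo_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t demo_le64(const uint8_t *p)
{
    return (uint64_t)demo_le32(p) | ((uint64_t)demo_le32(p + 4) << 32);
}

/*
 * Decode size (64-bit) followed by seconds and nanoseconds (32-bit
 * each), all little-endian. Returns 0 on success, -1 if the buffer is
 * too short.
 */
static int demo_decode_stat(const uint8_t *buf, size_t len,
                            struct demo_stat *out)
{
    if (len < 16)
        return -1;
    out->size = demo_le64(buf);
    out->mtime_sec = demo_le32(buf + 8);
    out->mtime_nsec = demo_le32(buf + 12);
    return 0;
}
```

Decoding byte by byte keeps the result correct regardless of the host's own byte order.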
-
Alex Elder authored
Simplify the way the data length recorded in a message header is calculated in ceph_osdc_build_request(). Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
- 18 Feb, 2013 3 commits
-
-
Alex Elder authored
In osd_req_encode_op() there are a few cases that handle osd opcodes that are never used in the kernel. The presence of this code gives the impression it's correct (which really can't be assumed), and may impose some unnecessary restrictions on some upcoming refactoring of this code. So delete this effectively dead code, and report uses of the previously handled cases as unsupported. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
If osd_req_encode_op() is given any opcode it doesn't recognize it reports an error. This patch fleshes out that routine to distinguish between well-defined but unsupported values and values that are simply bogus. This and the next commit are related to: http://tracker.ceph.com/issues/4126 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
Update ceph_osd_op_name() to include the newly-added definitions in "rados.h", and to match its counterpart in the user space code. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-