- 14 May, 2012 1 commit
-
-
Alex Elder authored
From Al Viro <viro@zeniv.linux.org.uk> Al Viro noticed that we were using a non-cpu-encoded value in a switch statement in osd_req_encode_op(). The result would clearly not work correctly on a big-endian machine. Signed-off-by:
Alex Elder <elder@dreamhost.com>
-
- 07 May, 2012 10 commits
-
-
Sage Weil authored
If we get an error code from crush_do_rule(), print an error to the console. Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
Reflects ceph.git commit 46d63d98434b3bc9dad2fc9ab23cbaedc3bcb0e4. Reported-by:
Alexander Lyakas <alex.bolshoy@gmail.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
Fix the node weight lookup for tree buckets by using a correct accessor. Reflects ceph.git commit d287ade5bcbdca82a3aef145b92924cf1e856733. Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
These were used for the ill-fated forcefeed feature. Remove them. Reflects ceph.git commit ebdf80edfecfbd5a842b71fbe5732857994380c1. Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
Remove forcefeed functionality from CRUSH. This is an ugly misfeature that is mostly useless and unused. Remove it. Reflects ceph.git commit ed974b5000f2851207d860a651809af4a1867942. Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com> Conflicts: net/ceph/crush/mapper.c
-
Sage Weil authored
Use a temporary variable here to avoid repeated array lookups and clean up the code a bit. This reflects ceph.git commit 6b5be27634ad307b471a5bf0db85c4f5c834885f. Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
If we get a map that doesn't make sense, error out or ignore the badness instead of BUGging out. This reflects the ceph.git commits 9895f0bff7dc68e9b49b572613d242315fb11b6c and 8ded26472058d5205803f244c2f33cb6cb10de79. Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
This small adjustment reflects a change that was made in ceph.git commit af6a9f30696c900a2a8bd7ae24e8ed15fb4964bb, about 6 months ago. An N-1 search is not exhaustive. Fixed ceph.git bug #1594. Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
Move various types from int -> __u32 (or similar), and add const as appropriate. This reflects changes that have been present in the userland implementation for some time. Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
This was an ill-conceived feature that has been removed from Ceph. Do this gracefully: - reject attempts to specify a preferred_osd via the ioctl - stop exposing this information via virtual xattrs - always fill in -1 for requests, in case we talk to an older server - don't calculate preferred_osd placements/pgids Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
- 22 Mar, 2012 26 commits
-
-
Alex Elder authored
In write_partial_msg_pages(), every case now does an identical call to kmap(page). Instead, just call it once inside the CRC-computing block where it's needed. Move the definition of kaddr inside that block, and make it a (char *) to ensure portable pointer arithmetic. We still don't kunmap() it until after the sendpage() call, in case that also ends up needing to use the mapping. Signed-off-by:
Alex Elder <elder@dreamhost.com> Reviewed-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
In write_partial_msg_pages() there is a local variable used to track the starting offset within a bio segment to use. Its name, "page_shift" defies the Linux convention of using that name for log-base-2(page size). Since it's only used in the bio case rename it "bio_offset". Use it along with the page_pos field to compute the memory offset when computing CRC's in that function. This makes the bio case match the others more closely. Signed-off-by:
Alex Elder <elder@dreamhost.com> Reviewed-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
There's not a lot of benefit to zero_page_address, which basically holds a mapping of the zero page through the life of the messenger module. Even with our own mapping, the sendpage interface where it's used may need to kmap() it again. It's almost certain to be in low memory anyway. So stop treating the zero page specially in write_partial_msg_pages() and just get rid of zero_page_address entirely. Signed-off-by:
Alex Elder <elder@dreamhost.com> Reviewed-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Make ceph_tcp_sendpage() be the only place kernel_sendpage() is used, by using this helper in write_partial_msg_pages(). Signed-off-by:
Alex Elder <elder@dreamhost.com> Reviewed-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
If a message queued for send gets revoked, zeroes are sent over the wire instead of any unsent data. This is done by constructing a message and passing it to kernel_sendmsg() via ceph_tcp_sendmsg(). Since we are already working with a page in this case we can use the sendpage interface instead. Create a new ceph_tcp_sendpage() helper that sets up flags to match the way ceph_tcp_sendmsg() does now. Signed-off-by:
Alex Elder <elder@dreamhost.com> Reviewed-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
CRC's are computed for all messages between ceph entities. The CRC computation for the data portion of message can optionally be disabled using the "nocrc" (common) ceph option. The default is for CRC computation for the data portion to be enabled. Unfortunately, the code that implements this feature interprets the feature flag wrong, meaning that by default the CRC's have *not* been computed (or checked) for the data portion of messages unless the "nocrc" option was supplied. Fix this, in write_partial_msg_pages() and read_partial_message(). Also change the flag variable in write_partial_msg_pages() to be "no_datacrc" to match the usage elsewhere in the file. This fixes http://tracker.newdream.net/issues/2064 Signed-off-by:
Alex Elder <elder@dreamhost.com> Reviewed-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Nothing too big here. - define the size of the buffer used for consuming ignored incoming data using a symbolic constant - simplify the condition determining whether to unmap the page in write_partial_msg_pages(): do it for crc but not if the page is the zero page Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Make a small change in the code that counts down kvecs consumed by a ceph_tcp_sendmsg() call. Same functionality, just blocked out a little differently. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Move blocks of code out of loops in read_partial_message_section() and read_partial_message(). They were only was getting called at the end of the last iteration of the loop anyway. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Calculate CRC in a separate step from rearranging the byte order of the result, to improve clarity and readability. Use offsetof() to determine the number of bytes to include in the CRC calculation. In read_partial_message(), switch which value gets byte-swapped, since the just-computed CRC is already likely to be in a register. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Change the name (and type) of a few CRC-related Boolean local variables so they contain the word "do", to distingish their purpose from variables used for holding an actual CRC value. Note that in the process of doing this I identified a fairly serious logic error in write_partial_msg_pages(): the value of "do_crc" assigned appears to be the opposite of what it should be. No attempt to fix this is made here; this change preserves the erroneous behavior. The problem I found is documented here: http://tracker.newdream.net/issues/2064 Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Many ceph-related Boolean options offer the ability to both enable and disable a feature. For all those that don't offer this, add a new option so that they do. Note that ceph_show_options()--which reports mount options currently in effect--only reports the option if it is different from the default value. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
This gathers a number of very minor changes: - use %hu when formatting the a socket address's address family - null out the ceph_msgr_wq pointer after the queue has been destroyed - drop a needless cast in ceph_write_space() - add a WARN() call in ceph_state_change() in the event an unrecognized socket state is encountered - rearrange the logic in ceph_con_get() and ceph_con_put() so that: - the reference counts are only atomically read once - the values displayed via dout() calls are known to be meaningful at the time they are formatted Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
There is no real need for ceph_tcp_connect() to return the socket pointer it creates, since it already assigns it to con->sock, which is visible to the caller. Instead, have it return an error code, which tidies things up a bit. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Define a helper function to perform various cleanup operations. Use it both in the exit routine and in the init routine in the event of an error. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
The messenger workqueue has no need to be public. So give it static scope. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Encapsulate the operation of adding a new chunk of data to the next open slot in a ceph_connection's out_kvec array. Also add a "reset" operation to make subsequent add operations start at the beginning of the array again. Use these routines throughout, avoiding duplicate code and ensuring all calls are handled consistently. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
One of the arguments to prepare_write_connect() indicates whether it is being called immediately after a call to prepare_write_banner(). Move the prepare_write_banner() call inside prepare_write_connect(), and reinterpret (and rename) the "after_banner" argument so it indicates that prepare_write_connect() should *make* the call rather than should know it has already been made. This was split out from the next patch to highlight this change in logic. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
ceph_parse_options() takes the address of a pointer as an argument and uses it to return the address of an allocated structure if successful. With this interface is not evident at call sites that the pointer is always initialized. Change the interface to return the address instead (or a pointer-coded error code) to make the validity of the returned pointer obvious. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
This fixes some spots where a type cast to (void *) was used as as a universal type hiding mechanism. Instead, properly cast the type to the intended target type. Signed-off-by:
Alex Elder <elder@newdream.net> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
This eliminates type casts in some places where they are not required. Signed-off-by:
Alex Elder <elder@newdream.net> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
A spinlock is used to protect a value used for selecting an array index for a string used for formatting a socket address for human consumption. The index is reset to 0 if it ever reaches the maximum index value. Instead, use an ever-increasing atomic variable as a sequence number, and compute the array index by masking off all but the sequence number's lowest bits. Make the number of entries in the array a power of two to allow the use of such a mask (to avoid jumps in the index value when the sequence number wraps). The length of these strings is somewhat arbitrarily set at 60 bytes. The worst-case length of a string produced is 54 bytes, for an IPv6 address that can't be shortened, e.g.: [1234:5678:9abc:def0:1111:2222:123.234.210.100]:32767 Change it so we arbitrarily use 64 bytes instead; if nothing else it will make the array of these line up better in hex dumps. Rename a few things to reinforce the distinction between the number of strings in the array and the length of individual strings. Signed-off-by:
Alex Elder <elder@newdream.net> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Rearrange ceph_tcp_connect() a bit, making use of "else" rather than re-testing a value with consecutive "if" statements. Don't record a connection's socket pointer unless the connect operation is successful. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Alex Elder authored
Each messenger allocates a page to be used when writing zeroes out in the event of error or other abnormal condition. Instead, use the kernel ZERO_PAGE() for that purpose. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Xi Wang authored
The existing overflow check (n > ULONG_MAX / b) didn't work, because n = ULONG_MAX / b would both bypass the check and still overflow the allocation size a + n * b. The correct check should be (n > (ULONG_MAX - a) / b). Signed-off-by:
Xi Wang <xi.wang@gmail.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Jim Schutt authored
The Ceph messenger would sometimes queue multiple work items to write data to a socket when the socket buffer was full. Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the same way that net/core/stream.c:sk_stream_write_space() does, i.e., clearing it only when sufficient space is available in the socket buffer. Signed-off-by:
Jim Schutt <jaschut@sandia.gov> Reviewed-by:
Alex Elder <elder@dreamhost.com>
-
- 02 Feb, 2012 1 commit
-
-
Sage Weil authored
Initializing debufs under monc->mutex introduces a lock dependency for sb->s_type->i_mutex_key, which (combined with several other dependencies) leads to an annoying lockdep warning. There's no particular reason to do the debugfs setup under this lock, so move it out. It used to be the case that our first monmap could come from the OSD; that is no longer the case with recent servers, so we will reliably set up the client entry during the initial authentication. We don't have to worry about racing with debugfs teardown by ceph_debugfs_client_cleanup() because ceph_destroy_client() calls ceph_msgr_flush() first, which will wait for the message dispatch work to complete (and the debugfs init to complete). Fixes: #1940 Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 10 Jan, 2012 2 commits
-
-
Sage Weil authored
Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
Signed-off-by:
Sage Weil <sage@newdream.net>
-