Commits · f42299e6c3883c69c14079b8c88fe33815b2dcc3 · OpenBMC Firmware / talos-obmc-linux

22 Mar, 2012 19 commits

libceph: small refactor in write_partial_kvec() · f42299e6

Alex Elder authored 13 years ago


Make a small change in the code that counts down kvecs consumed by
a ceph_tcp_sendmsg() call.  Same functionality, just blocked out
a little differently.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

f42299e6

libceph: do crc calculations outside loop · fe3ad593

Alex Elder authored 13 years ago


Move blocks of code out of loops in read_partial_message_section()
and read_partial_message().  They were only was getting called at
the end of the last iteration of the loop anyway.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

fe3ad593

libceph: separate CRC calculation from byte swapping · a9a0c51a

Alex Elder authored 13 years ago


Calculate CRC in a separate step from rearranging the byte order
of the result, to improve clarity and readability.

Use offsetof() to determine the number of bytes to include in the
CRC calculation.

In read_partial_message(), switch which value gets byte-swapped,
since the just-computed CRC is already likely to be in a register.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

a9a0c51a

libceph: use "do" in CRC-related Boolean variables · bca064d2

Alex Elder authored 13 years ago

Change the name (and type) of a few CRC-related Boolean local
variables so they contain the word "do", to distingish their purpose
from variables used for holding an actual CRC value.

Note that in the process of doing this I identified a fairly serious
logic error in write_partial_msg_pages():  the value of "do_crc"
assigned appears to be the opposite of what it should be.  No
attempt to fix this is made here; this change preserves the
erroneous behavior.  The problem I found is documented here:
    http://tracker.newdream.net/issues/2064

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

bca064d2

ceph: ensure Boolean options support both senses · cffaba15

Alex Elder authored 13 years ago


Many ceph-related Boolean options offer the ability to both enable
and disable a feature.  For all those that don't offer this, add
a new option so that they do.

Note that ceph_show_options()--which reports mount options currently
in effect--only reports the option if it is different from the
default value.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

cffaba15

libceph: a few small changes · d3002b97

Alex Elder authored 13 years ago


This gathers a number of very minor changes:
    - use %hu when formatting the a socket address's address family
    - null out the ceph_msgr_wq pointer after the queue has been
      destroyed
    - drop a needless cast in ceph_write_space()
    - add a WARN() call in ceph_state_change() in the event an
      unrecognized socket state is encountered
    - rearrange the logic in ceph_con_get() and ceph_con_put() so
      that:
        - the reference counts are only atomically read once
	- the values displayed via dout() calls are known to
	  be meaningful at the time they are formatted
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

d3002b97

libceph: make ceph_tcp_connect() return int · 41617d0c

Alex Elder authored 13 years ago


There is no real need for ceph_tcp_connect() to return the socket
pointer it creates, since it already assigns it to con->sock, which
is visible to the caller.  Instead, have it return an error code,
which tidies things up a bit.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

41617d0c

libceph: encapsulate some messenger cleanup code · 6173d1f0

Alex Elder authored 13 years ago


Define a helper function to perform various cleanup operations.  Use
it both in the exit routine and in the init routine in the event of
an error.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

6173d1f0

libceph: make ceph_msgr_wq private · e0f43c94

Alex Elder authored 13 years ago


The messenger workqueue has no need to be public.  So give it static
scope.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

e0f43c94

libceph: encapsulate connection kvec operations · 859eb799

Alex Elder authored 13 years ago


Encapsulate the operation of adding a new chunk of data to the next
open slot in a ceph_connection's out_kvec array.  Also add a "reset"
operation to make subsequent add operations start at the beginning
of the array again.

Use these routines throughout, avoiding duplicate code and ensuring
all calls are handled consistently.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

859eb799

libceph: move prepare_write_banner() · 963be4d7

Alex Elder authored 13 years ago


One of the arguments to prepare_write_connect() indicates whether it
is being called immediately after a call to prepare_write_banner().
Move the prepare_write_banner() call inside prepare_write_connect(),
and reinterpret (and rename) the "after_banner" argument so it
indicates that prepare_write_connect() should *make* the call
rather than should know it has already been made.

This was split out from the next patch to highlight this change in
logic.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

963be4d7

rbd: make ceph_parse_options() return a pointer · ee57741c

Alex Elder authored 13 years ago


ceph_parse_options() takes the address of a pointer as an argument
and uses it to return the address of an allocated structure if
successful.  With this interface is not evident at call sites that
the pointer is always initialized.  Change the interface to return
the address instead (or a pointer-coded error code) to make the
validity of the returned pointer obvious.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ee57741c

ceph: eliminate some abusive casts · 99f0f3b2

Alex Elder authored 13 years ago


This fixes some spots where a type cast to (void *) was used as
as a universal type hiding mechanism.  Instead, properly cast the
type to the intended target type.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

99f0f3b2

ceph: eliminate some needless casts · bd406145

Alex Elder authored 13 years ago


This eliminates type casts in some places where they are not
required.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

bd406145

ceph: kill addr_str_lock spinlock; use atomic instead · f64a9317

Alex Elder authored 13 years ago


A spinlock is used to protect a value used for selecting an array
index for a string used for formatting a socket address for human
consumption.  The index is reset to 0 if it ever reaches the maximum
index value.

Instead, use an ever-increasing atomic variable as a sequence
number, and compute the array index by masking off all but the
sequence number's lowest bits.  Make the number of entries in the
array a power of two to allow the use of such a mask (to avoid jumps
in the index value when the sequence number wraps).

The length of these strings is somewhat arbitrarily set at 60 bytes.
The worst-case length of a string produced is 54 bytes, for an IPv6
address that can't be shortened, e.g.:
    [1234:5678:9abc:def0:1111:2222:123.234.210.100]:32767
Change it so we arbitrarily use 64 bytes instead; if nothing else
it will make the array of these line up better in hex dumps.

Rename a few things to reinforce the distinction between the number
of strings in the array and the length of individual strings.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

f64a9317

ceph: make use of "else" where appropriate · a5bc3129

Alex Elder authored 13 years ago


Rearrange ceph_tcp_connect() a bit, making use of "else" rather than
re-testing a value with consecutive "if" statements.  Don't record a
connection's socket pointer unless the connect operation is
successful.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

a5bc3129

ceph: use a shared zero page rather than one per messenger · 57666519

Alex Elder authored 13 years ago


Each messenger allocates a page to be used when writing zeroes
out in the event of error or other abnormal condition.  Instead,
use the kernel ZERO_PAGE() for that purpose.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

57666519

libceph: fix overflow check in crush_decode() · 64486697

Xi Wang authored 13 years ago


The existing overflow check (n > ULONG_MAX / b) didn't work, because
n = ULONG_MAX / b would both bypass the check and still overflow the
allocation size a + n * b.

The correct check should be (n > (ULONG_MAX - a) / b).
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

64486697

net/ceph: Only clear SOCK_NOSPACE when there is sufficient space in the socket buffer · 182fac26

Jim Schutt authored 13 years ago


The Ceph messenger would sometimes queue multiple work items to write
data to a socket when the socket buffer was full.

Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the
same way that net/core/stream.c:sk_stream_write_space() does, i.e.,
clearing it only when sufficient space is available in the socket buffer.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@dreamhost.com>

182fac26

02 Feb, 2012 1 commit

ceph: initialize client debugfs outside of monc->mutex · ab434b60

Sage Weil authored 13 years ago


Initializing debufs under monc->mutex introduces a lock dependency for
sb->s_type->i_mutex_key, which (combined with several other dependencies)
leads to an annoying lockdep warning.  There's no particular reason to do
the debugfs setup under this lock, so move it out.

It used to be the case that our first monmap could come from the OSD; that
is no longer the case with recent servers, so we will reliably set up the
client entry during the initial authentication.

We don't have to worry about racing with debugfs teardown by
ceph_debugfs_client_cleanup() because ceph_destroy_client() calls
ceph_msgr_flush() first, which will wait for the message dispatch work
to complete (and the debugfs init to complete).

Fixes: #1940
Signed-off-by: Sage Weil <sage@newdream.net>

ab434b60

10 Jan, 2012 3 commits

libceph: remove useless return value for osd_client __send_request() · 56e925b6
Sage Weil authored 13 years ago
```
Signed-off-by: Sage Weil <sage@newdream.net>
```
56e925b6
crush: fix force for non-root TAKE · e11b05d3
Sage Weil authored 13 years ago
```
Signed-off-by: Sage Weil <sage@newdream.net>
```
e11b05d3

ceph: Use kmemdup rather than duplicating its implementation · 18648256

Thomas Meyer authored 13 years ago


Use kmemdup rather than duplicating its implementation

The semantic patch that makes this change is available
in scripts/coccinelle/api/memdup.cocci.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Sage Weil <sage@newdream.net>

18648256

12 Dec, 2011 1 commit

crush: fix mapping calculation when force argument doesn't exist · f1932fc1

Sage Weil authored 13 years ago


If the force argument isn't valid, we should continue calculating a
mapping as if it weren't specified.
Signed-off-by: Sage Weil <sage@newdream.net>

f1932fc1

11 Nov, 2011 1 commit

libceph: Allocate larger oid buffer in request msgs · 224736d9

Stratos Psomadakis authored 13 years ago


ceph_osd_request struct allocates a 40-byte buffer for object names.
RBD image names can be up to 96 chars long (100 with the .rbd suffix),
which results in the object name for the image being truncated, and a
subsequent map failure.

Increase the oid buffer in request messages, in order to avoid the
truncation.
Signed-off-by: Stratos Psomadakis <psomas@grnet.gr>
Signed-off-by: Sage Weil <sage@newdream.net>

224736d9

31 Oct, 2011 1 commit

net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules · bc3b2d7f

Paul Gortmaker authored 13 years ago


These files are non modular, but need to export symbols using
the macros now living in export.h -- call out the include so
that things won't break when we remove the implicit presence
of module.h from everywhere.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

bc3b2d7f

25 Oct, 2011 7 commits

libceph: force resend of osd requests if we skip an osdmap · 38d6453c

Sage Weil authored 13 years ago

If we skip over one or more map epochs, we need to resend all osd requests
because it is possible they remapped to other servers and then back.
Signed-off-by: Sage Weil <sage@newdream.net>

38d6453c

ceph: use kernel DNS resolver · ee3b56f2

Noah Watkins authored 13 years ago


Change ceph_parse_ips to take either names given as
IP addresses or standard hostnames (e.g. localhost).
The DNS lookup is done using the dns_resolver facility
similar to its use in AFS, NFS, and CIFS.

This patch defines CONFIG_CEPH_LIB_USE_DNS_RESOLVER
that controls if this feature is on or off.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ee3b56f2

ceph: fix ceph_monc_init memory leak · 49d9224c

Noah Watkins authored 13 years ago


failure clean up does not consider ceph_auth_init.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

49d9224c

libceph: warn on msg allocation failures · f0ed1b7c

Sage Weil authored 13 years ago


Any non-masked msg allocation failure should generate a warning and stack
trace to the console.  All of these need to eventually be replaced by
safe preallocation or msgpools.
Signed-off-by: Sage Weil <sage@newdream.net>

f0ed1b7c

libceph: don't complain on msgpool alloc failures · b61c2763

Sage Weil authored 13 years ago


The pool allocation failures are masked by the pool; there is no need to
spam the console about them.  (That's the whole point of having the pool
in the first place.)

Mark msg allocations whose failure is safely handled as such.
Signed-off-by: Sage Weil <sage@newdream.net>

b61c2763

libceph: always preallocate mon connection · f6a2f5be

Sage Weil authored 13 years ago


Allocate the mon connection on init.  We already reuse it across
reconnects.  Remove now unnecessary (and incomplete) NULL checks.
Signed-off-by: Sage Weil <sage@newdream.net>

f6a2f5be

libceph: create messenger with client · 6ab00d46

Sage Weil authored 13 years ago


This simplifies the init/shutdown paths, and makes client->msgr available
during the rest of the setup process.
Signed-off-by: Sage Weil <sage@newdream.net>

6ab00d46

28 Sep, 2011 2 commits

libceph: fix pg_temp mapping update · 8adc8b3d

Sage Weil authored 13 years ago


The incremental map updates have a record for each pg_temp mapping that is
to be add/updated (len > 0) or removed (len == 0).  The old code was
written as if the updates were a complete enumeration; that was just wrong.
Update the code to remove 0-length entries and drop the rbtree traversal.

This avoids misdirected (and hung) requests that manifest as server
errors like

[WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11
Signed-off-by: Sage Weil <sage@newdream.net>

8adc8b3d

libceph: fix pg_temp mapping calculation · 782e182e

Sage Weil authored 13 years ago


We need to apply the modulo pg_num calculation before looking up a pgid in
the pg_temp mapping rbtree.  This fixes pg_temp mappings, and fixes
(some) misdirected requests that result in messages like

[WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11

on the server and stall make the client block without getting a reply (at
least until the pg_temp mapping goes way, but that can take a long long
time).

Reorder calc_pg_raw() a bit to make more sense.
Signed-off-by: Sage Weil <sage@newdream.net>

782e182e

16 Sep, 2011 3 commits

libceph: fix linger request requeuing · 935b639a

Sage Weil authored 13 years ago


The r_req_lru_item list node moves between several lists, and that cycle
is not directly related (and does not begin) with __register_request().
Initialize it in the request constructor, not __register_request(). This
fixes later badness (below) when OSDs restart underneath an rbd mount.

Crashes we've seen due to this include:

[  213.974288] kernel BUG at net/ceph/messenger.c:2193!

and

[  144.035274] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[  144.035278] IP: [<ffffffffa036c053>] con_work+0x1463/0x2ce0 [libceph]
Signed-off-by: Sage Weil <sage@newdream.net>

935b639a

libceph: fix parse options memory leak · 1cad7893

Noah Watkins authored 13 years ago


ceph_destroy_options does not free opt->mon_addr that
is allocated in ceph_parse_options.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

1cad7893

libceph: initialize ack_stamp to avoid unnecessary connection reset · c0d5f9db

Jim Schutt authored 13 years ago

Commit 4cf9d544

 recorded when an outgoing ceph message was ACKed,
in order to avoid unnecessary connection resets when an OSD is busy.

However, ack_stamp is uninitialized, so there is a window between
when the message is sent and when it is ACKed in which handle_timeout()
interprets the unitialized value as an expired timeout, and resets
the connection unnecessarily.

Close the window by initializing ack_stamp.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>

c0d5f9db

31 Aug, 2011 1 commit

libceph: fix leak of osd structs during shutdown · aca420bc

Sage Weil authored 13 years ago


We want to remove all OSDs, not just those on the idle LRU.
Signed-off-by: Sage Weil <sage@newdream.net>

aca420bc

09 Aug, 2011 1 commit

libceph: fix msgpool · 5185352c

Sage Weil authored 13 years ago


There were several problems here:

 1- we weren't tagging allocations with the pool, so they were never
    returned to the pool.
 2- msgpool_put didn't add back to the mempool, even it were called.
 3- msgpool_release didn't clear the pool pointer, so it would have looped
    had #1 not been broken.

These may or may not have been responsible for #1136 or #1381 (BUG due to
non-empty mempool on umount).  I can't seem to trigger the crash now using
the method I was using before.
Signed-off-by: Sage Weil <sage@newdream.net>

5185352c