Commits · c0d5f9db1c7d1b8a9e2f217706e8ea233bac2754 · OpenBMC Firmware / talos-obmc-linux

16 Sep, 2011 1 commit

libceph: initialize ack_stamp to avoid unnecessary connection reset · c0d5f9db

Jim Schutt authored 13 years ago

Commit 4cf9d544 recorded when an outgoing ceph message was ACKed,
in order to avoid unnecessary connection resets when an OSD is busy.

However, ack_stamp is uninitialized, so there is a window between
when the message is sent and when it is ACKed in which handle_timeout()
interprets the uninitialized value as an expired timeout, and resets
the connection unnecessarily.

Close the window by initializing ack_stamp.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>

c0d5f9db
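
The window described in the commit above can be illustrated with a small userspace sketch. Everything here is hypothetical: `struct msg`, `msg_send()`, `msg_timed_out()` and the tick values are stand-ins, not the kernel's `struct ceph_msg` or `handle_timeout()`. Stamping the message at send time is only one plausible way to initialize the field.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a message; not the kernel's struct ceph_msg. */
struct msg {
    unsigned long ack_stamp;   /* tick at which the peer last ACKed */
};

/* Wraparound-safe "a is before b", in the spirit of time_before(). */
static bool tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Assumption: stamp at send time so the field never holds garbage. */
static void msg_send(struct msg *m, unsigned long now)
{
    m->ack_stamp = now;
}

/* True once at least `timeout` ticks have passed since the stamp. */
static bool msg_timed_out(const struct msg *m, unsigned long now,
                          unsigned long timeout)
{
    return !tick_before(now, m->ack_stamp + timeout);
}
```

With a stale stamp (for example 0 while the clock reads 1005), `msg_timed_out()` reports an expired timeout immediately, which is the spurious reset the commit describes.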

31 Aug, 2011 1 commit

libceph: fix leak of osd structs during shutdown · aca420bc

Sage Weil authored 13 years ago


We want to remove all OSDs, not just those on the idle LRU.
Signed-off-by: Sage Weil <sage@newdream.net>

aca420bc

09 Aug, 2011 1 commit

libceph: fix msgpool · 5185352c

Sage Weil authored 13 years ago


There were several problems here:

 1- we weren't tagging allocations with the pool, so they were never
    returned to the pool.
 2- msgpool_put didn't add back to the mempool, even if it were called.
 3- msgpool_release didn't clear the pool pointer, so it would have looped
    had #1 not been broken.

These may or may not have been responsible for #1136 or #1381 (BUG due to
non-empty mempool on umount).  I can't seem to trigger the crash now using
the method I was using before.
Signed-off-by: Sage Weil <sage@newdream.net>

5185352c

26 Jul, 2011 1 commit

libceph: don't time out osd requests that haven't been received · 4cf9d544

Sage Weil authored 13 years ago

Keep track of when an outgoing message is ACKed (i.e., the server fully
received it and, presumably, queued it for processing). Time out OSD
requests only if it's been too long since they've been received.

This prevents timeouts and connection thrashing when the OSDs are simply
busy and are throttling the requests they read off the network.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

4cf9d544

19 Jul, 2011 1 commit

ceph: fix file mode calculation · 38be7a79

Sage Weil authored 13 years ago


open(2) must always include one of O_RDONLY, O_WRONLY, or O_RDWR.  No need
for any O_APPEND special case.

Passing O_WRONLY|O_RDWR is undefined according to the man page, but the
Linux VFS interprets this as O_RDWR, so we'll do the same.

This fixes open(2) with flags O_RDWR|O_APPEND, which was incorrectly being
translated to readonly.
Reported-by: Fyodor Ustinov <ufm@ufm.su>
Signed-off-by: Sage Weil <sage@newdream.net>

38be7a79
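
The rule in the commit above (look only at the access-mode bits, and treat the undefined O_WRONLY|O_RDWR combination as read-write) can be sketched in userspace C. The `sketch_mode` enum and `flags_to_mode()` are hypothetical names for illustration, not the kernel's CEPH_FILE_MODE_* constants or its actual helper.

```c
#include <fcntl.h>

/* Hypothetical mode values, used only for this illustration. */
enum sketch_mode { SKETCH_MODE_RD, SKETCH_MODE_WR, SKETCH_MODE_RDWR };

/* Derive the file mode from the access-mode bits alone. */
static enum sketch_mode flags_to_mode(int flags)
{
    int acc = flags & O_ACCMODE;

    if (acc == O_RDONLY)
        return SKETCH_MODE_RD;
    if (acc == O_WRONLY)
        return SKETCH_MODE_WR;
    /* O_RDWR, or the undefined O_WRONLY|O_RDWR combination. */
    return SKETCH_MODE_RDWR;
}
```

Because O_APPEND lies outside O_ACCMODE, `flags_to_mode(O_RDWR | O_APPEND)` yields the read-write mode without any special case.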

13 Jun, 2011 1 commit

libceph: fix page calculation for non-page-aligned io · 9bb0ce2b

Sage Weil authored 13 years ago


Set the page count correctly for non-page-aligned IO.  We were already
doing this correctly for alignment, but not the page count.  Fixes
DIRECT_IO writes from unaligned pages.
Signed-off-by: Sage Weil <sage@newdream.net>

9bb0ce2b
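
As an illustration of why the page count depends on the starting offset, here is a minimal sketch of the usual "pages spanned" arithmetic. It assumes 4 KiB pages; `SKETCH_PAGE_SHIFT` and `pages_spanned()` are names invented for this example.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Number of pages touched by the byte range [off, off + len). */
static uint64_t pages_spanned(uint64_t off, uint64_t len)
{
    return ((off + len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT) -
           (off >> SKETCH_PAGE_SHIFT);
}
```

A 4096-byte write starting at offset 0 touches one page, but the same length starting at offset 100 touches two, so a count derived from the length alone comes up short.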

08 Jun, 2011 1 commit

ceph: fix sync vs canceled write · 25845472

Sage Weil authored 13 years ago


If we cancel a write, trigger the safe completions to prevent a sync from
blocking indefinitely in ceph_osdc_sync().
Signed-off-by: Sage Weil <sage@newdream.net>

25845472

24 May, 2011 2 commits

libceph: subscribe to osdmap when cluster is full · cd634fb6

Sage Weil authored 13 years ago

When the cluster is marked full, subscribe to subsequent map updates to
ensure we find out promptly when it is no longer full. This will prevent
us from spewing ENOSPC for (much) longer than necessary.
Signed-off-by: Sage Weil <sage@newdream.net>

cd634fb6

libceph: handle new osdmap down/state change encoding · 7662d8ff

Sage Weil authored 13 years ago


Old incrementals encode a 0 value (nearly always) when an osd goes down.
Change that to allow any state bit(s) to be flipped.  Special case 0 to
mean flip the CEPH_OSD_UP bit to mimic the old behavior.
Signed-off-by: Sage Weil <sage@newdream.net>

7662d8ff
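
The encoding rule in the commit above amounts to an XOR with a fallback. A minimal sketch follows; the bit values and the `apply_state_change()` helper are assumptions made for illustration and are not taken from the kernel headers.

```c
#include <stdint.h>

/* Assumed bit layout, for illustration only. */
#define SKETCH_OSD_EXISTS 1u
#define SKETCH_OSD_UP     2u

/* Apply an incremental state change: flip the given bits, where 0 keeps
 * the old meaning of "flip the UP bit". */
static uint8_t apply_state_change(uint8_t state, uint8_t xorstate)
{
    if (xorstate == 0)
        xorstate = SKETCH_OSD_UP;
    return (uint8_t)(state ^ xorstate);
}
```

An OSD that exists and is up (state 3) goes down with either a 0 or an explicit UP-bit value, so old and new incrementals produce the same result for that case.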

19 May, 2011 8 commits

ceph: check return value for start_request in writepages · 9d6fcb08

Sage Weil authored 13 years ago


Since we pass the nofail arg, we should never get an error; BUG if we do.
(And fix the function to not return an error if __map_request fails.)
Signed-off-by: Sage Weil <sage@newdream.net>

9d6fcb08

libceph: add missing breaks in addr_set_port · a2a79609

Sage Weil authored 13 years ago

Signed-off-by: Sage Weil <sage@newdream.net>

a2a79609

libceph: fix TAG_WAIT case · 04177882

Sage Weil authored 13 years ago


If we get a WAIT as a client something went wrong; error out.  And don't
fall through to an unrelated case.
Signed-off-by: Sage Weil <sage@newdream.net>

04177882

libceph: fix osdmap timestamp assignment · 31456665

Sage Weil authored 13 years ago

Signed-off-by: Sage Weil <sage@newdream.net>

31456665

libceph: use snprintf for unknown addrs · 12a2f643

Sage Weil authored 13 years ago

Signed-off-by: Sage Weil <sage@newdream.net>

12a2f643

libceph: use snprintf for formatting object name · 2dab036b

Sage Weil authored 13 years ago

Signed-off-by: Sage Weil <sage@newdream.net>

2dab036b

libceph: fix uninitialized value when no get_authorizer method is set · e8f54ce1

Sage Weil authored 13 years ago

If there is no get_authorizer method we set the out_kvec to a bogus
pointer. The length is also zero in that case, so it doesn't much matter,
but it's better not to add the empty item in the first place.
Signed-off-by: Sage Weil <sage@newdream.net>

e8f54ce1

libceph: handle connection reopen race with callbacks · 0da5d703

Sage Weil authored 13 years ago


If a connection is closed and/or reopened (ceph_con_close, ceph_con_open)
it can race with a callback.  con_work does various state checks for
closed or reopened sockets at the beginning, but drops con->mutex before
making callbacks.  We need to check for state bit changes after retaking
the lock to ensure we restart con_work and execute those CLOSED/OPENING
tests or else we may end up operating under stale assumptions.

In Jim's case, this was causing 'bad tag' errors.

There are four cases where we re-take the con->mutex inside con_work: catch
them all and return EAGAIN from try_{read,write} so that we can restart
con_work.
Reported-by: Jim Schutt <jaschut@sandia.gov>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>

0da5d703

03 May, 2011 2 commits

libceph: fix ceph_osdc_alloc_request error checks · 4ad12621

Sage Weil authored 13 years ago


ceph_osdc_alloc_request returns NULL on failure.
Signed-off-by: Sage Weil <sage@newdream.net>

4ad12621

libceph: fix ceph_msg_new error path · ca20892d

Henry C Chang authored 13 years ago


If memory allocation failed, calling ceph_msg_put() will cause GPF
since some of ceph_msg variables are not initialized first.

Fix Bug #970.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ca20892d

06 Apr, 2011 1 commit

libceph: fix linger request requeueing · 77f38e0e

Sage Weil authored 14 years ago


Fix the request transition from linger -> normal request.  The key is to
preserve r_osd and requeue on the same OSD.  Reregister as a normal request,
add the request to the proper queues, then unregister the linger.  Fix the
unregister helper to avoid clearing r_osd (and also simplify the parallel
check in __unregister_request()).
Reported-by: Henry Chang <henry.cy.chang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

77f38e0e

31 Mar, 2011 1 commit

Fix common misspellings · 25985edc

Lucas De Marchi authored 14 years ago


Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>

25985edc

29 Mar, 2011 5 commits

libceph: Create a new key type "ceph". · 4b2a58ab

Tommi Virtanen authored 14 years ago


This allows us to use existence of the key type as a feature test,
from userspace.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

4b2a58ab

libceph: Get secret from the kernel keys api when mounting with key=NAME. · e2c3d29b

Tommi Virtanen authored 14 years ago

Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

e2c3d29b

ceph: Move secret key parsing earlier. · 8323c3aa

Tommi Virtanen authored 14 years ago


This makes the base64 logic be contained in mount option parsing,
and prepares us for replacing the homebrew key management with the
kernel key retention service.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

8323c3aa

libceph: fix null dereference when unregistering linger requests · fbdb9190

Sage Weil authored 14 years ago

We should only clear r_osd if we are registered neither as a linger nor as a
regular request. We may unregister as a linger while still registered as
a regular request (e.g., in reset_osd). Incorrectly clearing r_osd there
leads to a null pointer dereference in __send_request.

Also simplify the parallel check in __unregister_request() where we just
removed r_osd_item and know it's empty.
Signed-off-by: Sage Weil <sage@newdream.net>

fbdb9190

ceph: unlock on error in ceph_osdc_start_request() · 234af26f

Dan Carpenter authored 14 years ago


There was a missing unlock on the error path if __map_request() failed.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

234af26f

26 Mar, 2011 1 commit

ceph: fix possible NULL pointer dereference · 6b0ae409

Mariusz Kozlowski authored 14 years ago


This patch fixes 'event_work' dereference before it is checked for NULL.
Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl>
Signed-off-by: Sage Weil <sage@newdream.net>

6b0ae409

25 Mar, 2011 1 commit

ceph: flush msgr_wq during mds_client shutdown · ef550f6f

Sage Weil authored 14 years ago


The release method for mds connections uses a backpointer to the
mds_client, so we need to flush the workqueue of any pending work (and
ceph_connection references) prior to freeing the mds_client.  This fixes
an oops easily triggered under UML by

 while true ; do mount ... ; umount ... ; done

Also fix an outdated comment: the flush in ceph_destroy_client only flushes
OSD connections out.  This bug is basically an artifact of the ceph ->
ceph+libceph conversion.
Signed-off-by: Sage Weil <sage@newdream.net>

ef550f6f

22 Mar, 2011 1 commit

libceph: add lingering request and watch/notify event framework · a40c4f10

Yehuda Sadeh authored 14 years ago


Lingering requests are requests that are sent to the OSD normally but
tracked also after we get a successful request.  This keeps the OSD
connection open and resends the original request if the object moves to
another OSD.  The OSD can then send notification messages back to us
if another client initiates a notify.

This framework will be used by RBD so that the client gets notification
when a snapshot is created by another node or tool.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

a40c4f10

21 Mar, 2011 1 commit

libceph: fix osd request queuing on osdmap updates · 6f6c7006

Sage Weil authored 14 years ago


If we send a request to osd A, and the request's pg remaps to osd B and
then back to A in quick succession, we need to resend the request to A. The
old code was only calling kick_requests after processing all incremental
maps in a message, so it was very possible to not resend a request that
needed to be resent.  This would make the osd eventually time out (at least
with the current default of osd timeouts enabled).

The correct approach is to scan requests on every map incremental.  This
patch refactors the kick code in a few ways:
 - all requests are either on req_lru (in flight), req_unsent (ready to
   send), or req_notarget (currently map to no up osd)
 - mapping always done by map_request (previous map_osds)
 - if the mapping changes, we requeue.  requests are resent only after all
   map incrementals are processed.
 - some osd reset code is moved out of kick_requests into a separate
   function
 - the "kick this osd" functionality is moved to kick_osd_requests, as it
   is unrelated to scanning for request->pg->osd mapping changes
Signed-off-by: Sage Weil <sage@newdream.net>

6f6c7006

15 Mar, 2011 1 commit

libceph: Fix base64-decoding when input ends in newline. · b09734b1

Tommi Virtanen authored 14 years ago


It used to return -EINVAL because it thought the end was not aligned
to 4 bytes.

Clean up superfluous src < end test in if, the while itself guarantees
that.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

b09734b1
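
A decoder that tolerates a trailing newline can be sketched as below. This is a self-contained illustration, not the kernel's decoder: `b64_val()` and `b64_decode()` are hypothetical names, and the handling of `=` is deliberately lenient.

```c
#include <stddef.h>
#include <errno.h>

/* Map one base64 character to its 6-bit value, or -EINVAL. */
static int b64_val(int c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    if (c == '=') return 0;          /* padding contributes zero bits */
    return -EINVAL;
}

/* Decode src[0..len) into dst (at least len / 4 * 3 bytes).
 * Returns the number of bytes written, or -EINVAL. */
static int b64_decode(unsigned char *dst, const char *src, size_t len)
{
    const char *end = src + len;
    int olen = 0;

    while (src < end) {
        int a, b, c, d;

        if (src[0] == '\n') {        /* skip line breaks, even at the end */
            src++;
            continue;
        }
        if (end - src < 4)           /* checked only after the newline skip */
            return -EINVAL;
        a = b64_val(src[0]);
        b = b64_val(src[1]);
        c = b64_val(src[2]);
        d = b64_val(src[3]);
        if (a < 0 || b < 0 || c < 0 || d < 0)
            return -EINVAL;

        dst[olen++] = (unsigned char)((a << 2) | (b >> 4));
        if (src[2] == '=')
            break;
        dst[olen++] = (unsigned char)(((b & 15) << 4) | (c >> 2));
        if (src[3] == '=')
            break;
        dst[olen++] = (unsigned char)(((c & 3) << 6) | d);
        src += 4;
    }
    return olen;
}
```

The key point is the order of the checks: the newline is consumed before the 4-byte length test, so input such as "aGVsbG8h\n" decodes cleanly instead of failing on the leftover byte.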

04 Mar, 2011 3 commits

libceph: fix msgr standby handling · e00de341

Sage Weil authored 14 years ago


The standby logic used to be pretty dependent on the work requeueing
behavior that changed when we switched to WQ_NON_REENTRANT.  It was also
very fragile.

Restructure things so that:
 - We clear WRITE_PENDING when we set STANDBY.  This ensures we will
   requeue work when we wake up later.
 - con_work backs off if STANDBY is set.  There is nothing to do if we are
   in standby.
 - clear_standby() helper is called by both con_send() and con_keepalive(),
   the two actions that can wake us up again.  Move the connect_seq++
   logic here.
Signed-off-by: Sage Weil <sage@newdream.net>

e00de341

libceph: fix msgr keepalive flag · e76661d0

Sage Weil authored 14 years ago


There was some broken keepalive code using a dead variable.  Shift to using
the proper bit flag.
Signed-off-by: Sage Weil <sage@newdream.net>

e76661d0

libceph: fix msgr backoff · 60bf8bf8

Sage Weil authored 14 years ago

With commit f363e45f we replaced a bunch of hacky workqueue mutual
exclusion logic with the WQ_NON_REENTRANT flag.  One piece of fallout is
that the exponential backoff breaks in certain cases:

 * con_work attempts to connect.
 * we get an immediate failure, and the socket state change handler queues
   immediate work.
 * con_work calls con_fault, we decide to back off, but can't queue delayed
   work.

In this case, we add a BACKOFF bit to make con_work reschedule delayed work
next time it runs (which should be immediately).
Signed-off-by: Sage Weil <sage@newdream.net>

60bf8bf8
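
The sequence in the commit above can be modelled with a toy state machine. This is a sketch under stated assumptions: `struct con_sketch`, the fake `queue_delayed()` and the delay constants are all invented here, and the real messenger uses a flag bit and the kernel workqueue API instead.

```c
#include <stdbool.h>

#define SKETCH_BASE_DELAY 1u
#define SKETCH_MAX_DELAY  64u

/* Hypothetical connection state; not the kernel's struct ceph_connection. */
struct con_sketch {
    bool backoff;              /* stand-in for the BACKOFF bit */
    bool work_pending;         /* a work item is already queued */
    unsigned delay;            /* current backoff delay, arbitrary ticks */
    unsigned scheduled_delay;  /* delay handed to the fake workqueue */
};

/* Fake delayed queueing: fails if the work item is already pending. */
static bool queue_delayed(struct con_sketch *c, unsigned delay)
{
    if (c->work_pending)
        return false;
    c->work_pending = true;
    c->scheduled_delay = delay;
    return true;
}

/* On a fault, grow the delay; if we cannot queue, remember to back off. */
static void con_fault_sketch(struct con_sketch *c)
{
    c->delay = c->delay ? c->delay * 2 : SKETCH_BASE_DELAY;
    if (c->delay > SKETCH_MAX_DELAY)
        c->delay = SKETCH_MAX_DELAY;
    if (!queue_delayed(c, c->delay))
        c->backoff = true;
}

/* Returns true if it did real work, false if it only rescheduled itself. */
static bool con_work_sketch(struct con_sketch *c)
{
    c->work_pending = false;   /* this run consumes the pending item */
    if (c->backoff) {
        c->backoff = false;
        queue_delayed(c, c->delay);
        return false;
    }
    return true;
}
```

When the socket handler has already queued immediate work, the fault path cannot schedule the delayed run itself, so it sets the flag and the next (immediate) run converts it into a properly delayed one.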

03 Mar, 2011 2 commits

libceph: retry after authorization failure · 692d20f5

Sage Weil authored 14 years ago

If we mark the connection CLOSED we will give up trying to reconnect to
this server instance. That is appropriate for things like a protocol
version mismatch that won't change until the server is restarted, at which
point we'll get a new addr and reconnect. An authorization failure like
this is probably due to the server not properly rotating its secret keys,
however, and should be treated as transient so that the normal backoff and
retry behavior kicks in.
Signed-off-by: Sage Weil <sage@newdream.net>

692d20f5

libceph: fix handling of short returns from get_user_pages · 38815b78

Sage Weil authored 14 years ago

get_user_pages() can return fewer pages than we ask for. We were returning
a bogus pointer/error code in that case. Instead, loop until we get all
the pages we want or get an error we can return to the caller.
Signed-off-by: Sage Weil <sage@newdream.net>

38815b78
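
The loop described in the commit above can be sketched without the kernel API by abstracting the pinning call behind a callback. `pin_fn`, `pin_all()` and `pin_at_most_two()` are hypothetical; returning -EFAULT on zero progress is an assumption of this sketch, not something the commit states.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical pinning callback: returns the number of pages pinned
 * (possibly fewer than `want`) or a negative errno. */
typedef int (*pin_fn)(size_t first, size_t want, void *ctx);

/* Keep asking until all `total` pages are pinned or an error occurs. */
static int pin_all(pin_fn pin, size_t total, void *ctx)
{
    size_t got = 0;

    while (got < total) {
        int rc = pin(got, total - got, ctx);

        if (rc < 0)
            return rc;          /* real error: hand it to the caller */
        if (rc == 0)
            return -EFAULT;     /* assumption: no progress is an error */
        got += (size_t)rc;
    }
    return 0;
}

/* Example callback that never pins more than two pages per call and
 * counts how often it was invoked through `ctx`. */
static int pin_at_most_two(size_t first, size_t want, void *ctx)
{
    (void)first;
    ++*(int *)ctx;
    return want < 2 ? (int)want : 2;
}
```

With `pin_at_most_two`, pinning five pages takes three calls, and the caller sees either full success or an error code, never a partially filled result presented as complete.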

25 Jan, 2011 2 commits

libceph: fix socket write error handling · 42961d23

Sage Weil authored 14 years ago


Pass errors from writing to the socket up the stack.  If we get -EAGAIN,
return 0 from the helper to simplify the callers' checks.
Signed-off-by: Sage Weil <sage@newdream.net>

42961d23

libceph: fix socket read error handling · 98bdb0aa

Sage Weil authored 14 years ago


If we get EAGAIN when trying to read from the socket, it is not an error.
Return 0 from the helper in this case to simplify the error handling cases
in the caller (indirectly, try_read).

Fix try_read to pass any error to its caller (con_work) instead of almost
always returning 0.  This lets us respond to things like socket
disconnects.
Signed-off-by: Sage Weil <sage@newdream.net>

98bdb0aa
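
The error convention in the commit above (EAGAIN means "nothing to do right now", anything else negative is passed up) can be sketched as follows. `normalize_sock_result()` and `read_step()` are hypothetical helpers for illustration, not the messenger's actual functions.

```c
#include <errno.h>

/* Treat -EAGAIN as "no data yet" rather than as a failure. */
static int normalize_sock_result(int ret)
{
    if (ret == -EAGAIN)
        ret = 0;
    return ret;
}

/* One read step: accumulate bytes on success, propagate real errors. */
static int read_step(int raw, int *bytes_read)
{
    int ret = normalize_sock_result(raw);

    if (ret < 0)
        return ret;             /* e.g. a disconnect reaches the caller */
    *bytes_read += ret;
    return 0;
}
```

Callers then need only a single `ret < 0` check, and an error such as -ECONNRESET is no longer swallowed.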

12 Jan, 2011 2 commits

net/ceph: make ceph_msgr_wq non-reentrant · f363e45f

Tejun Heo authored 14 years ago


ceph messenger code does a rather complex dance around the multithreaded
workqueue to make sure the same work item isn't executed concurrently
on different CPUs.  This restriction can be provided by workqueue with
WQ_NON_REENTRANT.

Make ceph_msgr_wq non-reentrant workqueue with the default concurrency
level and remove the QUEUED/BUSY logic.

* This removes backoff handling in con_work() but it couldn't reliably
  block execution of con_work() to begin with - queue_con() can be
  called after the work started but before BUSY is set.  It seems that
  it was an optimization for a rather cold path and can be safely
  removed.

* The number of concurrent work items is bound by the number of
  connections and connections are independent from each other.  With
  the default concurrency level, different connections will be
  executed independently.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sage Weil <sage@newdream.net>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>

f363e45f

ceph: Always free allocated memory in osdmap_decode() · b0aee351

Jesper Juhl authored 14 years ago


Always free memory allocated to 'pi' in
net/ceph/osdmap.c::osdmap_decode().
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Sage Weil <sage@newdream.net>

b0aee351