- 23 Nov, 2013 1 commit
-
-
Yan, Zheng authored
call __queue_cap_release() in __ceph_remove_cap(), this avoids acquiring s_cap_lock twice. Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 30 Sep, 2013 1 commit
-
-
Yan, Zheng authored
If client has outdated directory fragments information, it may request readdir an non-existent directory fragment. In this case, the MDS finds an approximate directory fragment and sends its contents back to the client. When receiving a reply with fragment that is different than the requested one, the client need to reset the 'readdir offset'. Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 06 Sep, 2013 1 commit
-
-
Yan, Zheng authored
commit 6f60f889 (ceph: fix freeing inode vs removing session caps race) introduced ceph_lookup_inode(). But there is already a ceph_find_inode() which provides similar function. So remove ceph_lookup_inode(), use ceph_find_inode() instead. Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by:
Alex Elder <alex.elder@linary.org> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 10 Aug, 2013 2 commits
-
-
Yan, Zheng authored
remove_session_caps() uses iterate_session_caps() to remove caps, but iterate_session_caps() skips inodes that are being deleted. So session->s_nr_caps can be non-zero after iterate_session_caps() return. We can fix the issue by waiting until deletions are complete. __wait_on_freeing_inode() is designed for the job, but it is not exported, so we use lookup inode function to access it. Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com>
-
Nathaniel Yazdani authored
When register_session() is given an out-of-range argument for mds, ceph_mdsmap_get_addr() will return a null pointer, which would be given to ceph_con_open() & be dereferenced, causing a kernel oops. This fixes bug #4685 in the Ceph bug tracker <http://tracker.ceph.com/issues/4685 >. Signed-off-by:
Nathaniel Yazdani <n1ght.4nd.d4y@gmail.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 05 Jul, 2013 1 commit
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 03 Jul, 2013 3 commits
-
-
majianpeng authored
Signed-off-by:
Jianpeng Ma <majianpeng@gmail.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Yan, Zheng authored
Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Yan, Zheng authored
Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 29 Jun, 2013 1 commit
-
-
Jeff Layton authored
Having a global lock that protects all of this code is a clear scalability problem. Instead of doing that, move most of the code to be protected by the i_lock instead. The exceptions are the global lists that the ->fl_link sits on, and the ->fl_block list. ->fl_link is what connects these structures to the global lists, so we must ensure that we hold those locks when iterating over or updating these lists. Furthermore, sound deadlock detection requires that we hold the blocked_list state steady while checking for loops. We also must ensure that the search and update to the list are atomic. For the checking and insertion side of the blocked_list, push the acquisition of the global lock into __posix_lock_file and ensure that checking and update of the blocked_list is done without dropping the lock in between. On the removal side, when waking up blocked lock waiters, take the global lock before walking the blocked list and dequeue the waiters from the global list prior to removal from the fl_block list. With this, deadlock detection should be race free while we minimize excessive file_lock_lock thrashing. Finally, in order to avoid a lock inversion problem when handling /proc/locks output we must ensure that manipulations of the fl_block list are also protected by the file_lock_lock. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 17 May, 2013 2 commits
-
-
Jim Schutt authored
Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc() while holding a lock, but it's spoiled because ceph_pagelist_addpage() always calls kmap(), which might sleep. Here's the result: [13439.295457] ceph: mds0 reconnect start [13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58 [13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1 . . . [13439.376225] Call Trace: [13439.378757] [<ffffffff81076f4c>] __might_sleep+0xfc/0x110 [13439.384353] [<ffffffffa03f4ce0>] ceph_pagelist_append+0x120/0x1b0 [libceph] [13439.391491] [<ffffffffa0448fe9>] ceph_encode_locks+0x89/0x190 [ceph] [13439.398035] [<ffffffff814ee849>] ? _raw_spin_lock+0x49/0x50 [13439.403775] [<ffffffff811cadf5>] ? lock_flocks+0x15/0x20 [13439.409277] [<ffffffffa045e2af>] encode_caps_cb+0x41f/0x4a0 [ceph] [13439.415622] [<ffffffff81196748>] ? igrab+0x28/0x70 [13439.420610] [<ffffffffa045e9f8>] ? iterate_session_caps+0xe8/0x250 [ceph] [13439.427584] [<ffffffffa045ea25>] iterate_session_caps+0x115/0x250 [ceph] [13439.434499] [<ffffffffa045de90>] ? set_request_path_attr+0x2d0/0x2d0 [ceph] [13439.441646] [<ffffffffa0462888>] send_mds_reconnect+0x238/0x450 [ceph] [13439.448363] [<ffffffffa0464542>] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph] [13439.455250] [<ffffffffa0462e42>] check_new_map+0x352/0x500 [ceph] [13439.461534] [<ffffffffa04631ad>] ceph_mdsc_handle_map+0x1bd/0x260 [ceph] [13439.468432] [<ffffffff814ebc7e>] ? mutex_unlock+0xe/0x10 [13439.473934] [<ffffffffa043c612>] extra_mon_dispatch+0x22/0x30 [ceph] [13439.480464] [<ffffffffa03f6c2c>] dispatch+0xbc/0x110 [libceph] [13439.486492] [<ffffffffa03eec3d>] process_message+0x1ad/0x1d0 [libceph] [13439.493190] [<ffffffffa03f1498>] ? read_partial_message+0x3e8/0x520 [libceph] . . . [13439.587132] ceph: mds0 reconnect success [13490.720032] ceph: mds0 caps stale [13501.235257] ceph: mds0 recovery completed [13501.300419] ceph: mds0 caps renewed Fix it up by encoding locks into a buffer first, and when the number of encoded locks is stable, copy that into a ceph_pagelist. [elder@inktank.com: abbreviated the stack info a bit.] Cc: stable@vger.kernel.org # 3.4+ Signed-off-by:
Jim Schutt <jaschut@sandia.gov> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Jim Schutt authored
In his review, Alex Elder mentioned that he hadn't checked that num_fcntl_locks and num_flock_locks were properly decoded on the server side, from a le32 over-the-wire type to a cpu type. I checked, and AFAICS it is done; those interested can consult Locker::_do_cap_update() in src/mds/Locker.cc and src/include/encoding.h in the Ceph server code (git://github.com/ceph/ceph ). I also checked the server side for flock_len decoding, and I believe that also happens correctly, by virtue of having been declared __le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h. Cc: stable@vger.kernel.org # 3.4+ Signed-off-by:
Jim Schutt <jaschut@sandia.gov> Reviewed-by:
Alex Elder <elder@inktank.com>
-
- 02 May, 2013 16 commits
-
-
Alex Elder authored
Change the names of the functions that put data on a pagelist to reflect that we're adding to whatever's already there rather than just setting it to the one thing. Currently only one data item is ever added to a message, but that's about to change. This resolves: http://tracker.ceph.com/issues/2770 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Sage Weil authored
Use wrapper functions that check whether the auth op exists so that callers do not need a bunch of conditional checks. Simplifies the external interface. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
Currently the messenger calls out to a get_authorizer con op, which will create a new authorizer if it doesn't yet have one. In the meantime, when we rotate our service keys, the authorizer doesn't get updated. Eventually it will be rejected by the server on a new connection attempt and get invalidated, and we will then rebuild a new authorizer, but this is not ideal. Instead, if we do have an authorizer, call a new update_authorizer op that will verify that the current authorizer is using the latest secret. If it is not, we will build a new one that does. This avoids the transient failure. This fixes one of the sorry sequence of events for bug http://tracker.ceph.com/issues/4282 Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Yan, Zheng authored
Current ceph code tracks directory's completeness in two places. ceph_readdir() checks i_release_count to decide if it can set the I_COMPLETE flag in i_ceph_flags. All other places check the I_COMPLETE flag. This indirection introduces locking complexity. This patch adds a new variable i_complete_count to ceph_inode_info. Set i_release_count's value to it when marking a directory complete. By comparing the two variables, we know if a directory is complete Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com>
-
Alex Elder authored
Change it so we only assign outgoing data information for messages if there is outgoing data to send. This then allows us to add a few more (currently commented-out) assertions. This is related to: http://tracker.ceph.com/issues/4284 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Greg Farnum <greg@inktank.com>
-
Alex Elder authored
Define ceph_msg_data_set_pagelist(), ceph_msg_data_set_bio(), and ceph_msg_data_set_trail() to clearly abstract the assignment of the remaining data-related fields in a ceph message structure. Use the new functions in the osd client and mds client. This partially resolves: http://tracker.ceph.com/issues/4263 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
When setting page array information for message data, provide the byte length rather than the page count ceph_msg_data_set_pages(). Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
Define a function ceph_msg_data_set_pages(), which more clearly abstracts the assignment page-related fields for data in a ceph message structure. Use this new function in the osd client and mds client. Ideally, these fields would never be set more than once (with BUG_ON() calls to guarantee that). At the moment though the osd client sets these every time it receives a message, and in the event of a communication problem this can happen more than once. (This will be resolved shortly, but setting up these helpers first makes it all a bit easier to work with.) Rearrange the field order in a ceph_msg structure to group those that are used to define the possible data payloads. This partially resolves: http://tracker.ceph.com/issues/4263 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Alex Elder authored
Currently, incoming mds messages never use page data, which means there is no need to set the page_alignment field in the message. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Greg Farnum <greg@inktank.com>
-
Alex Elder authored
The only user of the ceph messenger that doesn't define an alloc_msg method is the mds client. Define one, such that it works just like it did before, and simplify ceph_con_in_msg_alloc() by assuming the alloc_msg method is always present. This and the next patch resolve: http://tracker.ceph.com/issues/4322 Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Greg Farnum <greg@inktank.com>
-
Alex Elder authored
The pagelist_count field is never actually used, so get rid of it. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
Sage Weil authored
commit 22cddde1 breaks the atomicity of write operation, it also introduces a deadlock between write and truncate. Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by:
Greg Farnum <greg@inktank.com> Conflicts: fs/ceph/addr.c
-
Yan, Zheng authored
commit c6ffe100 moved the flag that tracks if the dcache contents for a directory are complete to dentry. The problem is there are lots of places that use ceph_dir_{set,clear,test}_complete() while holding i_ceph_lock. but ceph_dir_{set,clear,test}_complete() may sleep because they call dput(). This patch basically reverts that commit. For ceph_d_prune(), it's called with both the dentry to prune and the parent dentry are locked. So it's safe to access the parent dentry's d_inode and clear I_COMPLETE flag. Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by:
Greg Farnum <greg@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Yan, Zheng authored
So the client will later send cap release message to MDS Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by:
Greg Farnum <greg@inktank.com>
-
Yan, Zheng authored
commit 6e8575fa makes parse_reply_info_extra() return -EIO for LSSNAP Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by:
Greg Farnum <greg@inktank.com>
-
Alex Elder authored
Use distinct fields for tracking the number of pages in a message's page array and in a message's page list. Currently only one or the other is used at a time, but that will be changing soon. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
- 12 Feb, 2013 1 commit
-
-
Eric W. Biederman authored
Hold the uid and gid for a pending ceph mds request using the types kuid_t and kgid_t. When a request message is finally created convert the kuid_t and kgid_t values into uids and gids in the initial user namespace. Cc: Sage Weil <sage@inktank.com> Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com>
-
- 17 Jan, 2013 1 commit
-
-
Sam Lang authored
The mds now sends back a created inode if the create request performed the create. If the file already existed, no inode is returned in the reply. This allows ceph to set the created flag in atomic_open so that permissions are properly checked in the case that the file wasn't created by the create call to the mds. To ensure compability with previous kernels, a feature for sending back the inode in the create reply was added, so that the mds will only send back the inode if the client indicates it supports the feature. Signed-off-by:
Sam Lang <sam.lang@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 13 Dec, 2012 1 commit
-
-
Yan, Zheng authored
__wake_requests() will enter infinite loop if we use it to wake requests in the session->s_waiting list. __wake_requests() deletes requests from the list and __do_request() adds requests back to the list. Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
- 26 Oct, 2012 1 commit
-
-
David Zafman authored
set_request_path_attr() checks for NULL ptr before calling strlen() This fixes http://tracker.newdream.net/issues/3404 Signed-off-by:
David Zafman <david.zafman@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 01 Oct, 2012 1 commit
-
-
Yan, Zheng authored
When i >= newmap->m_max_mds, ceph_mdsmap_get_addr(newmap, i) return NULL. Passing NULL to memcmp() triggers oops. Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
- 31 Jul, 2012 1 commit
-
-
Sage Weil authored
When we detect a mds session reset, close the old ceph_connection before reopening it. This ensures we clean up the old socket properly and keep the ceph_connection state correct. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com>
-
- 30 Jul, 2012 2 commits
-
-
Sage Weil authored
This is simply cleanup that will keep things more closely synced with the userland code. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com>
-
Sage Weil authored
d_parent is never NULL, and IS_ROOT() is the proper way to check for a (non-self-referential) parent. Reported-by:
Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by:
Sage Weil <sage@inktank.com>
-
- 06 Jul, 2012 1 commit
-
-
Sage Weil authored
The peer name may change on each open attempt, even when the connection is reused. Signed-off-by:
Sage Weil <sage@inktank.com>
-
- 06 Jun, 2012 1 commit
-
-
Alex Elder authored
Move the initialization of a ceph connection's private pointer, operations vector pointer, and peer name information into ceph_con_init(). Rearrange the arguments so the connection pointer is first. Hide the byte-swapping of the peer entity number inside ceph_con_init() Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 01 Jun, 2012 1 commit
-
-
Alex Elder authored
A ceph client has a pointer to a ceph messenger structure in it. There is always exactly one ceph messenger for a ceph client, so there is no need to allocate it separate from the ceph client structure. Switch the ceph_client structure to embed its ceph_messenger structure. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 17 May, 2012 1 commit
-
-
Alex Elder authored
Rather than passing a bunch of arguments to be filled in with the content of the ceph_auth_handshake buffer now returned by the get_authorizer method, just use the returned information in the caller, and drop the unnecessary arguments. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-