1. 06 Jul, 2012 4 commits
  2. 22 Jun, 2012 2 commits
  3. 19 Jun, 2012 1 commit
  4. 15 Jun, 2012 2 commits
  5. 07 Jun, 2012 4 commits
  6. 06 Jun, 2012 11 commits
    • libceph: make ceph_con_revoke_message() a msg op · 8921d114
      Alex Elder authored
      
      ceph_con_revoke_message() is passed both a message and a ceph
      connection.  A ceph_msg allocated for incoming messages on a
      connection always has a pointer to that connection, so there's no
      need to provide the connection when revoking such a message.
      
      Note that the existing logic does not preclude the message supplied
      being a null/bogus message pointer.  The only user of this interface
      is the OSD client, and the only value an osd client passes is a
      request's r_reply field.  That is always non-null (except briefly in
      an error path in ceph_osdc_alloc_request(), and that drops the
      only reference so the request won't ever have a reply to revoke).
      So we can safely assume the passed-in message is non-null, but add a
      BUG_ON() to make it very obvious we are imposing this restriction.
      
      Rename the function ceph_msg_revoke_incoming() to reflect that it is
      really an operation on an incoming message.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <...
      8921d114
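A minimal user-space sketch of the idea in this commit, assuming simplified stand-in structures rather than the real `struct ceph_msg` and `struct ceph_connection`. The names `msg_revoke_incoming`, `in_msg`, and `skipped` are hypothetical; `assert()` stands in for the kernel's `BUG_ON()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct connection;

struct message {
    struct connection *con;   /* connection this incoming message belongs to */
};

struct connection {
    struct message *in_msg;   /* message currently being received, if any */
    int skipped;              /* counts revoked messages whose data is skipped */
};

/* Revoke an incoming message using only the message itself. */
static void msg_revoke_incoming(struct message *msg)
{
    struct connection *con;

    assert(msg != NULL);      /* user-space analogue of BUG_ON(msg == NULL) */

    con = msg->con;
    if (con == NULL)
        return;               /* assumption: already handed off, nothing to do */

    if (con->in_msg == msg) {
        con->in_msg = NULL;   /* stop filling this message */
        con->skipped++;
        msg->con = NULL;      /* the association ends here */
    }
}
```

The caller no longer passes a connection, so it cannot pass the wrong one; the message's own pointer is the single source of truth.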
    • libceph: make ceph_con_revoke() a msg operation · 6740a845
      Alex Elder authored
      
      ceph_con_revoke() is passed both a message and a ceph connection.
      Now that any message associated with a connection holds a pointer
      to that connection, there's no need to provide the connection when
      revoking a message.
      
      This has the added benefit of precluding the possibility of
      providing the wrong connection pointer.  If the message's connection
      pointer is null, it is not being tracked by any connection, so
      revoking it is a no-op.  This is supported as a convenience for
      upper layers, so they can revoke a message that is not actually
      "in flight."
      
      Rename the function ceph_msg_revoke() to reflect that it is really
      an operation on a message, not a connection.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      6740a845
    • libceph: have messages take a connection reference · 92ce034b
      Alex Elder authored
      
      There are essentially two types of ceph messages: incoming and
      outgoing.  Outgoing messages are always allocated via ceph_msg_new(),
      and at the time of their allocation they are not associated with any
      particular connection.  Incoming messages are always allocated via
      ceph_con_in_msg_alloc(), and they are initially associated with the
      connection from which incoming data will be placed into the message.
      
      When an outgoing message gets sent, it becomes associated with a
      connection and remains that way until the message is successfully
      sent.  The association of an incoming message goes away at the point
      it is sent to an upper layer via a con->ops->dispatch method.
      
      This patch implements reference counting for all ceph messages, such
      that every message holds a reference (and a pointer) to a connection
      if and only if it is associated with that connection (as described
      above).
      
      
      For background, here is an explanation of the ceph message
      lifecycle, emphasizing when an association exists between a message
      and a connection.
      
      Outgoing Messages
      An outgoing message is "owned" by its allocator, from the time it is
      allocated in ceph_msg_new() up to the point it gets queued for
      sending in ceph_con_send().  Prior to that point the message's
      msg->con pointer is null; at the point it is queued for sending, that
      pointer is assigned to refer to the connection.  At that
      time the message is inserted into a connection's out_queue list.
      
      When a message on the out_queue list has been sent to the socket
      layer to be put on the wire, it is transferred out of that list and
      into the connection's out_sent list.  At that point it is still owned
      by the connection, and will remain so until an acknowledgement is
      received from the recipient that indicates the message was
      successfully transferred.  When such an acknowledgement is received
      (in process_ack()), the message is removed from its list (in
      ceph_msg_remove()), at which point it is no longer associated with
      the connection.
      
      So basically, any time a message is on one of a connection's lists,
      it is associated with that connection.  Reference counting outgoing
      messages can thus be done at the point a message is added to the
      out_queue (in ceph_con_send()) and the point it is removed from
      either of its two lists (in ceph_msg_remove())--at which point its
      connection pointer becomes null.
      
      Incoming Messages
      When an incoming message on a connection is getting read (in
      read_partial_message()) and there is no message in con->in_msg,
      a new one is allocated using ceph_con_in_msg_alloc().  At that
      point the message is associated with the connection.  Once that
      message has been completely and successfully read, it is passed to
      upper layer code using the connection's con->ops->dispatch method.
      At that point the association between the message and the connection
      no longer exists.
      
      Reference counting of connections for incoming messages can be done
      by taking a reference to the connection when the message gets
      allocated, and releasing that reference when it gets handed off
      using the dispatch method.
      
      We should never fail to get a connection reference for a
      message, since the caller should already hold one.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      92ce034b
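The outgoing half of the lifecycle described above can be sketched in user space as follows. This is a simplified model, not the kernel code: the structures, the plain `int` reference count, and the `queued` counter (standing in for the out_queue and out_sent lists) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified connection: a refcount and a queue length. */
struct connection {
    int refs;      /* references held on the connection */
    int queued;    /* messages on either list (out_queue or out_sent) */
};

struct message {
    struct connection *con;   /* non-NULL iff associated with a connection */
};

static struct connection *con_get(struct connection *con)
{
    con->refs++;
    return con;
}

static void con_put(struct connection *con)
{
    assert(con->refs > 0);
    con->refs--;
}

/* Queue for sending: the association starts, so take a reference. */
static void con_send(struct connection *con, struct message *msg)
{
    assert(msg->con == NULL);
    msg->con = con_get(con);
    con->queued++;
}

/* Removed from its list (for example after an ack): the association ends. */
static void msg_remove(struct message *msg)
{
    struct connection *con = msg->con;

    if (con == NULL)
        return;               /* not on any list */
    con->queued--;
    msg->con = NULL;
    con_put(con);
}
```

The invariant the commit describes is visible here: `msg->con` is non-null exactly while the message holds a reference on the connection.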
    • libceph: have messages point to their connection · 38941f80
      Alex Elder authored
      
      When a ceph message is queued for sending it is placed on a list of
      pending messages (ceph_connection->out_queue).  When it is
      actually sent over the wire, it is moved from that list to
      another (ceph_connection->out_sent).  When acknowledgement for the
      message is received, it is removed from the sent messages list.
      
      During that entire time the message is "in the possession" of a
      single ceph connection.  Keep track of that connection in the
      message.  This will be used in the next patch (and is a helpful
      bit of information for debugging anyway).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      38941f80
    • libceph: tweak ceph_alloc_msg() · 1c20f2d2
      Alex Elder authored
      
      The function ceph_alloc_msg() is only used to allocate a message
      that will be assigned to a connection's in_msg pointer.  Rename the
      function so this implied usage is more clear.
      
      In addition, make that assignment inside the function (again, since
      that's precisely what it's intended to be used for).  This allows us
      to return what is now provided via the passed-in address of a "skip"
      variable.  The return type is now Boolean to be explicit that there
      are only two possible outcomes.
      
      Make sure the result of an ->alloc_msg method call always sets the
      value of *skip properly.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      1c20f2d2
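A rough sketch of the reshaped interface, under the assumption that the Boolean return value means "skip this message". The `max_len` size check is a hypothetical skip criterion, and the structures are simplified stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct message {
    size_t len;
};

struct connection {
    struct message *in_msg;   /* assigned inside the allocator */
    size_t max_len;           /* hypothetical limit used to decide on skipping */
};

/*
 * Allocate the incoming message and assign it to con->in_msg.
 * Returns true if the caller should skip the message (in_msg stays NULL),
 * false otherwise (in_msg is set, or NULL if the allocation failed).
 */
static bool con_in_msg_alloc(struct connection *con, size_t len)
{
    con->in_msg = NULL;
    if (len > con->max_len)
        return true;

    con->in_msg = calloc(1, sizeof(*con->in_msg));
    if (con->in_msg != NULL)
        con->in_msg->len = len;
    return false;
}
```

Returning the skip decision directly removes the out-parameter, and doing the assignment inside the function means no caller can forget it.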
    • libceph: fully initialize connection in con_init() · 1bfd89f4
      Alex Elder authored
      
      Move the initialization of a ceph connection's private pointer,
      operations vector pointer, and peer name information into
      ceph_con_init().  Rearrange the arguments so the connection pointer
      is first.  Hide the byte-swapping of the peer entity number inside
      ceph_con_init().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      1bfd89f4
    • libceph: init monitor connection when opening · 20581c1f
      Alex Elder authored
      
      Hold off initializing a monitor client's connection until just
      before it gets opened for use.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      20581c1f
    • libceph: drop connection refcounting for mon_client · ec87ef43
      Sage Weil authored
      
      All references to the embedded ceph_connection come from the msgr
      workqueue, which is drained prior to mon_client destruction.  That
      means we can ignore con refcounting entirely.
      Signed-off-by: Sage Weil <sage@newdream.net>
      Reviewed-by: Alex Elder <elder@inktank.com>
      ec87ef43
    • libceph: embed ceph connection structure in mon_client · 67130934
      Alex Elder authored
      
      A monitor client has a pointer to a ceph connection structure in it.
      This is the only one of the three ceph client types that does it this
      way; the OSD and MDS clients embed the connection into their main
      structures.  There is always exactly one ceph connection for a
      monitor client, so there is no need to allocate it separately from the
      monitor client structure.
      
      So switch the ceph_mon_client structure to embed its
      ceph_connection structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      67130934
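To illustrate what embedding buys, here is a small sketch with hypothetical stand-in structures. It assumes the owner is recovered from the embedded member by pointer arithmetic, in the style of the kernel's `container_of()`; the actual libceph code may instead rely on the connection's private pointer.

```c
#include <assert.h>
#include <stddef.h>

struct connection {
    void *private_data;
};

/* Hypothetical monitor client with the connection embedded, not pointed to. */
struct mon_client {
    int cur_mon;
    struct connection con;   /* lives and dies with the mon_client */
};

/* Local equivalent of container_of(): member pointer back to its owner. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static struct mon_client *con_to_monc(struct connection *con)
{
    return container_of_sketch(con, struct mon_client, con);
}
```

With the connection embedded there is one allocation and one lifetime to manage, and no separate failure path for allocating the connection.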
    • libceph: use con get/put ops from osd_client · 0d47766f
      Sage Weil authored
      
      There were a few direct calls to ceph_con_{get,put}() instead of the con
      ops from osd_client.c.  This is a bug since those ops aren't defined to
      be ceph_con_get/put.
      
      This breaks refcounting on the ceph_osd structs that contain the
      ceph_connections, and could lead to all manner of strangeness.
      
      The purpose of the ->get and ->put methods in a ceph connection is
      to allow the connection to indicate it has a reference to something
      external to the messaging system, *not* to indicate something
      external has a reference to the connection.
      
      [elder@inktank.com: added that last sentence]
      Signed-off-by: Sage Weil <sage@newdream.net>
      Reviewed-by: Alex Elder <elder@inktank.com>
      0d47766f
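The distinction drawn in the last paragraph can be sketched like this. All names are hypothetical and the structures are heavily simplified; the point is only that the ops adjust the reference count of the object that contains the connection, not a count on the connection itself.

```c
#include <assert.h>
#include <stddef.h>

struct connection;

/* Per-owner operations vector, reduced to the two methods of interest. */
struct connection_ops {
    struct connection *(*get)(struct connection *con);
    void (*put)(struct connection *con);
};

struct connection {
    const struct connection_ops *ops;
    void *private_data;       /* points back at the owning object */
};

/* Hypothetical OSD object that contains its connection. */
struct osd {
    int refs;
    struct connection con;
};

/* The connection takes a reference on its owner, the osd. */
static struct connection *osd_con_get(struct connection *con)
{
    struct osd *osd = con->private_data;

    osd->refs++;
    return con;
}

static void osd_con_put(struct connection *con)
{
    struct osd *osd = con->private_data;

    assert(osd->refs > 0);
    osd->refs--;
}

static const struct connection_ops osd_con_ops = {
    .get = osd_con_get,
    .put = osd_con_put,
};
```

Calling through `con->ops->get` and `con->ops->put` keeps the owner's count accurate; bypassing the ops would leave `osd->refs` out of step with the number of users.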
    • libceph: osd_client: don't drop reply reference too early · ab8cb34a
      Alex Elder authored
      
      In ceph_osdc_release_request(), a reference to the r_reply message
      is dropped.  But just after that, that same message is revoked if it
      was in use to receive an incoming reply.  Reorder these so we are
      sure we hold a reference until we're actually done with the message.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      ab8cb34a
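The ordering problem can be shown with a toy model. The structures and the `freed` flag are assumptions for illustration; in the kernel the final put would actually free the message, which is why revoking afterwards is unsafe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified message: a refcount plus two status flags. */
struct message {
    int refs;
    int revoked;
    int freed;
};

struct request {
    struct message *reply;
};

/* Revoking touches the message, so the caller must still hold a reference. */
static void msg_revoke_incoming(struct message *msg)
{
    assert(msg->refs > 0 && !msg->freed);
    msg->revoked = 1;
}

/* Drop one reference; the last put "frees" the message. */
static void msg_put(struct message *msg)
{
    assert(msg->refs > 0);
    if (--msg->refs == 0)
        msg->freed = 1;
}

/* Release path in the corrected order: revoke first, then put. */
static void release_request(struct request *req)
{
    if (req->reply != NULL) {
        msg_revoke_incoming(req->reply);
        msg_put(req->reply);
        req->reply = NULL;
    }
}
```

Swapping the two calls in `release_request()` would trip the assertion in `msg_revoke_incoming()` whenever the request held the last reference.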
  7. 02 Jun, 2012 1 commit
    • tty: Revert the tty locking series, it needs more work · f309532b
      Linus Torvalds authored
      This reverts the tty layer change to use per-tty locking, because it's
      not correct yet, and fixing it will require some more deep surgery.
      
      The main revert is d29f3ef3 ("tty_lock: Localise the lock"), but
      there are several smaller commits that built upon it, they also get
      reverted here. The list of reverted commits is:
      
        fde86d31 - tty: add lockdep annotations
        8f6576ad - tty: fix ldisc lock inversion trace
        d3ca8b64 - pty: Fix lock inversion
        b1d679af - tty: drop the pty lock during hangup
        abcefe5f - tty/amiserial: Add missing argument for tty_unlock()
        fd11b42e - cris: fix missing tty arg in wait_event_interruptible_tty call
        d29f3ef3 - tty_lock: Localise the lock
      
      The revert had a trivial conflict in the 68360serial.c staging driver
      that got removed in the meantime.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f309532b
  8. 01 Jun, 2012 15 commits
    • tcp: reflect SYN queue_mapping into SYNACK packets · fff32699
      Eric Dumazet authored
      
      While testing how linux behaves on SYNFLOOD attack on multiqueue device
      (ixgbe), I found that SYNACK messages were dropped at Qdisc level
      because we send them all on a single queue.
      
      Obvious choice is to reflect incoming SYN packet @queue_mapping to
      SYNACK packet.
      
      Under stress, my machine could only send 25,000 SYNACKs per second (for
      200,000 incoming SYNs per second). NIC: ixgbe with 16 rx/tx queues.
      
      After patch, not a single SYNACK is dropped.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fff32699
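The idea reduces to copying one field from the request packet to the reply. The sketch below uses a hypothetical `struct packet` in place of the kernel's socket buffer; it shows the principle only, not the patched code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a socket buffer; only the queue index matters. */
struct packet {
    uint16_t queue_mapping;   /* index of the hardware queue */
};

/* Send the SYNACK on the same queue index the SYN arrived on. */
static void synack_reflect_queue(const struct packet *syn,
                                 struct packet *synack)
{
    synack->queue_mapping = syn->queue_mapping;
}
```

Because incoming SYNs are already spread across the receive queues, reflecting the index spreads the replies across the transmit queues instead of funnelling them into one.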
    • tcp: do not create inetpeer on SYNACK message · 7433819a
      Eric Dumazet authored
      Another problem on SYNFLOOD/DDOS attack is the inetpeer cache getting
      larger and larger, using lots of memory and cpu time.
      
      tcp_v4_send_synack()
      ->inet_csk_route_req()
       ->ip_route_output_flow()
        ->rt_set_nexthop()
         ->rt_init_metrics()
          ->inet_getpeer( create = true)
      
      This is a side effect of commit a4daad6b (net: Pre-COW metrics for
      TCP) added in 2.6.39
      
      Possible solution :
      
      Instruct inet_csk_route_req() to remove FLOWI_FLAG_PRECOW_METRICS
      
      Before patch :
      
      # grep peer /proc/slabinfo
      inet_peer_cache   4175430 4175430    192   42    2 : tunables    0    0    0 : slabdata  99415  99415      0
      
      Samples: 41K of event 'cycles', Event count (approx.): 30716565122
      +  20,24%      ksoftirqd/0  [kernel.kallsyms]           [k] inet_getpeer
      +   8,19%      ksoftirqd/0  [kernel.kallsyms]           [k] peer_avl_rebalance.isra.1
      +   4,81%      ksoftirqd/0  [kernel.kallsyms]           [k] sha_transform
      +   3,64%      ksoftirqd/0  [kernel.kallsyms]           [k] fib_table_lookup
      +   2,36%      ksoftirqd/0  [ixgbe]                     [k] ixgbe_poll
      +   2,16%      ksoftirqd/0  [kernel.kallsyms]           [k] __ip_route_output_key
      +   2,11%      ksoftirqd/0  [kernel.kallsyms]           [k] kernel_map_pages
      +   2,11%      ksoftirqd/0  [kernel.kallsyms]           [k] ip_route_input_common
      +   2,01%      ksoftirqd/0  [kernel.kallsyms]           [k] __inet_lookup_established
      +   1,83%      ksoftirqd/0  [kernel.kallsyms]           [k] md5_transform
      +   1,75%      ksoftirqd/0  [kernel.kallsyms]           [k] check_leaf.isra.9
      +   1,49%      ksoftirqd/0  [kernel.kallsyms]           [k] ipt_do_table
      +   1,46%      ksoftirqd/0  [kernel.kallsyms]           [k] hrtimer_interrupt
      +   1,45%      ksoftirqd/0  [kernel.kallsyms]           [k] kmem_cache_alloc
      +   1,29%      ksoftirqd/0  [kernel.kallsyms]           [k] inet_csk_search_req
      +   1,29%      ksoftirqd/0  [kernel.kallsyms]           [k] __netif_receive_skb
      +   1,16%      ksoftirqd/0  [kernel.kallsyms]           [k] copy_user_generic_string
      +   1,15%      ksoftirqd/0  [kernel.kallsyms]           [k] kmem_cache_free
      +   1,02%      ksoftirqd/0  [kernel.kallsyms]           [k] tcp_make_synack
      +   0,93%      ksoftirqd/0  [kernel.kallsyms]           [k] _raw_spin_lock_bh
      +   0,87%      ksoftirqd/0  [kernel.kallsyms]           [k] __call_rcu
      +   0,84%      ksoftirqd/0  [kernel.kallsyms]           [k] rt_garbage_collect
      +   0,84%      ksoftirqd/0  [kernel.kallsyms]           [k] fib_rules_lookup
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7433819a
    • sch_atm.c: get rid of poinless extern · d5836751
      Al Viro authored
      
      sockfd_lookup() is declared in linux/net.h, which is pulled by
      linux/skbuff.h (and needed for a lot of other stuff in sch_atm.c
      anyway).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d5836751
    • libceph: set CLOSED state bit in con_init · a5988c49
      Alex Elder authored
      
      Once a connection is fully initialized, it is really in a CLOSED
      state, so make that explicit by setting the bit in its state field.
      
      It is possible for a connection in NEGOTIATING state to get a
      failure, leading to ceph_fault() and ultimately ceph_con_close().
      Clear that bit if it is set in that case, to reflect that the
      connection truly is closed and is no longer participating in a
      connect sequence.
      
      Issue a warning if ceph_con_open() is called on a connection that
      is not in CLOSED state.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      a5988c49
    • libceph: provide osd number when creating osd · e10006f8
      Alex Elder authored
      
      Pass the osd number to the create_osd() routine, and move the
      initialization of fields that depend on it therein.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      e10006f8
    • libceph: start tracking connection socket state · ce2c8903
      Alex Elder authored
      
      Start explicitly keeping track of the state of a ceph connection's
      socket, separate from the state of the connection itself.  Create
      placeholder functions to encapsulate the state transitions.
      
          --------
          | NEW* |  transient initial state
          --------
              | con_sock_state_init()
              v
          ----------
          | CLOSED |  initialized, but no socket (and no
          ----------  TCP connection)
           ^      \
           |       \ con_sock_state_connecting()
           |        ----------------------
           |                              \
           + con_sock_state_closed()       \
           |\                               \
           | \                               \
           |  -----------                     \
           |  | CLOSING |  socket event;       \
           |  -----------  await close          \
           |       ^                            |
           |       |                            |
           |       + con_sock_state_closing()   |
           |      / \                           |
           |     /   ---------------            |
           |    /                   \           v
           |   /                    --------------
           |  /    -----------------| CONNECTING |  socket created, TCP
           |  |   /                 --------------  connect initiated
           |  |   | con_sock_state_connected()
           |  |   v
          -------------
          | CONNECTED |  TCP connection established
          -------------
      
      Make the socket state an atomic variable, reinforcing that it's a
      distinct transition with no possible "intermediate/both" states.
      This is almost certainly overkill at this point, though the
      transitions into CONNECTED and CLOSING state do get called via
      socket callback (the rest of the transitions occur with the
      connection mutex held).  We can back out the atomicity later.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      ce2c8903
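The state diagram above can be modelled in user space with C11 atomics. This is a sketch under stated assumptions: the enum values and the choice to return whether the previous state was an expected one are illustrative, while the helper names follow the diagram. Each helper swaps in the new state with a single atomic exchange, so no intermediate state is ever observable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum sock_state {
    SOCK_NEW,          /* transient initial state */
    SOCK_CLOSED,       /* initialized, but no socket */
    SOCK_CONNECTING,   /* socket created, TCP connect initiated */
    SOCK_CONNECTED,    /* TCP connection established */
    SOCK_CLOSING       /* socket event; await close */
};

struct connection {
    atomic_int sock_state;
};

/* Each helper returns true if the previous state was one the diagram allows. */
static bool con_sock_state_init(struct connection *con)
{
    return atomic_exchange(&con->sock_state, SOCK_CLOSED) == SOCK_NEW;
}

static bool con_sock_state_connecting(struct connection *con)
{
    return atomic_exchange(&con->sock_state, SOCK_CONNECTING) == SOCK_CLOSED;
}

static bool con_sock_state_connected(struct connection *con)
{
    return atomic_exchange(&con->sock_state, SOCK_CONNECTED) == SOCK_CONNECTING;
}

static bool con_sock_state_closing(struct connection *con)
{
    int old = atomic_exchange(&con->sock_state, SOCK_CLOSING);

    return old == SOCK_CONNECTING || old == SOCK_CONNECTED;
}

static bool con_sock_state_closed(struct connection *con)
{
    int old = atomic_exchange(&con->sock_state, SOCK_CLOSED);

    return old == SOCK_CLOSING || old == SOCK_CONNECTED;
}
```

A caller would initialize the field with `atomic_init(&con.sock_state, SOCK_NEW)` and could log a warning whenever a helper returns false, which flags a transition the diagram does not contain.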
    • libceph: start separating connection flags from state · 928443cd
      Alex Elder authored
      
      A ceph_connection holds a mixture of connection state (as in "state
      machine" state) and connection flags in a single "state" field.  To
      make the distinction more clear, define a new "flags" field and use
      it rather than the "state" field to hold Boolean flag values.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      928443cd
    • libceph: embed ceph messenger structure in ceph_client · 15d9882c
      Alex Elder authored
      
      A ceph client has a pointer to a ceph messenger structure in it.
      There is always exactly one ceph messenger for a ceph client, so
      there is no need to allocate it separately from the ceph client
      structure.
      
      Switch the ceph_client structure to embed its ceph_messenger
      structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      15d9882c
    • libceph: rename kvec_reset and kvec_add functions · e2200423
      Alex Elder authored
      
      The functions ceph_con_out_kvec_reset() and ceph_con_out_kvec_add()
      are entirely private functions, so drop the "ceph_" prefix in their
      name to make them slightly more wieldy.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      e2200423
    • libceph: rename socket callbacks · 327800bd
      Alex Elder authored
      
      Change the names of the three socket callback functions to make it
      more obvious they're specifically associated with a connection's
      socket (not the ceph connection that uses it).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      327800bd
    • libceph: kill bad_proto ceph connection op · 6384bb8b
      Alex Elder authored
      
      No code sets a bad_proto method in its ceph connection operations
      vector, so just get rid of it.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
      6384bb8b
    • libceph: eliminate connection state "DEAD" · e5e372da
      Alex Elder authored
      
      The ceph connection state "DEAD" is never set and is therefore not
      needed.  Eliminate it.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
      e5e372da
    • nfsd4: move rq_flavor into svc_cred · d5497fc6
      J. Bruce Fields authored
      
      Move the rq_flavor into struct svc_cred, and use it in setclientid and
      exchange_id comparisons as well.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      d5497fc6
    • nfsd4: move principal name into svc_cred · 03a4e1f6
      J. Bruce Fields authored
      
      Instead of keeping the principal name associated with a request in a
      structure that's private to auth_gss and using an accessor function,
      move it to svc_cred.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      03a4e1f6
    • svcrpc: fix a comment typo · 3ddbe879
      J. Bruce Fields authored
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      3ddbe879