1. 19 Jan, 2017 2 commits
  2. 18 Jan, 2017 1 commit
  3. 16 Jan, 2017 3 commits
    • Masami Hiramatsu's avatar
      perf probe: Fix to probe on gcc generated functions in modules · 613f050d
      Masami Hiramatsu authored
      
      Fix to probe on gcc generated functions on modules. Since
      probing on a module is based on its symbol name, it should
      be adjusted on actual symbols.
      
      E.g. without this fix, perf probe shows probe definition
      on non-exist symbol as below.
      
        $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -F in_range*
        in_range.isra.12
        $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -D in_range
        p:probe/in_range nf_nat:in_range+0
      
      With this fix, perf probe correctly shows a probe on
      gcc-generated symbol.
      
        $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -D in_range
        p:probe/in_range nf_nat:in_range.isra.12+0
      
      This also fixes same problem on online module as below.
      
        $ perf probe -m i915 -D assert_plane
        p:probe/assert_plane i915:assert_plane.constprop.134+0
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/148411450673.9978.14905987549651656075.stgit@devbox
      
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      613f050d
    • Masami Hiramatsu's avatar
      perf probe: Add error checks to offline probe post-processing · 3e96dac7
      Masami Hiramatsu authored
      
      Add error check codes on post processing and improve it for offline
      probe events as:
      
       - post processing fails if no matched symbol found in map(-ENOENT)
         or strdup() failed(-ENOMEM).
      
       - Even if the symbol name is the same, it updates symbol address
         and offset.
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/148411443738.9978.4617979132625405545.stgit@devbox
      
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3e96dac7
    • Masami Hiramatsu's avatar
      perf probe: Fix to show correct locations for events on modules · d2d4edbe
      Masami Hiramatsu authored
      
      Fix to show correct locations for events on modules by relocating given
      address instead of retrying after failure.
      
      This happens when the module text size is big enough, bigger than
      sh_addr, because the original code retries with given address + sh_addr
      if it failed to find CU DIE at the given address.
      
      Any address smaller than sh_addr always fails and it retries with the
      correct address, but addresses bigger than sh_addr will get a CU DIE
      which is on the given address (not adjusted by sh_addr).
      
      In my environment(x86-64), the sh_addr of ".text" section is 0x10030.
      Since i915 is a huge kernel module, we can see this issue as below.
      
        $ grep "[Tt] .*\[i915\]" /proc/kallsyms | sort | head -n1
        ffffffffc0270000 t i915_switcheroo_can_switch	[i915]
      
      ffffffffc0270000 + 0x10030 = ffffffffc0280030, so we'll check
      symbols cross this boundary.
      
        $ grep "[Tt] .*\[i915\]" /proc/kallsyms | grep -B1 ^ffffffffc028\
        | head -n 2
        ffffffffc027ff80 t haswell_init_clock_gating	[i915]
        ffffffffc0280110 t valleyview_init_clock_gating	[i915]
      
      So setup probes on both function and see what happen.
      
        $ sudo ./perf probe -m i915 -a haswell_init_clock_gating \
              -a valleyview_init_clock_gating
        Added new events:
          probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe:valleyview_init_clock_gating -aR sleep 1
      
        $ sudo ./perf probe -l
          probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
          probe:valleyview_init_clock_gating (on i915_vga_set_decode:4@gpu/drm/i915/i915_drv.c in i915)
      
      As you can see, haswell_init_clock_gating is correctly shown,
      but valleyview_init_clock_gating is not.
      
      With this patch, both events are shown correctly.
      
        $ sudo ./perf probe -l
          probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
      
      Committer notes:
      
      In my case:
      
        # perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating
        Added new events:
          probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)
      
        You can now use it in all perf tools, such as:
      
      	  perf record -e probe:valleyview_init_clock_gating -aR sleep 1
      
        # perf probe -l
          probe:haswell_init_clock_gating (on i915_getparam+432@gpu/drm/i915/i915_drv.c in i915)
          probe:valleyview_init_clock_gating (on __i915_printk+240@gpu/drm/i915/i915_drv.c in i915)
        #
      
        # readelf -SW /lib/modules/4.9.0+/build/vmlinux | egrep -w '.text|Name'
         [Nr] Name   Type      Address          Off    Size   ES Flg Lk Inf Al
         [ 1] .text  PROGBITS  ffffffff81000000 200000 822fd3 00  AX  0   0 4096
        #
      
        So both are b0rked, now with the fix:
      
        # perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating
        Added new events:
          probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)
      
        You can now use it in all perf tools, such as:
      
      	perf record -e probe:valleyview_init_clock_gating -aR sleep 1
      
        # perf probe -l
          probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
        #
      
      Both looks correct.
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/148411436777.9978.1440275861947194930.stgit@devbox
      
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d2d4edbe
  4. 11 Jan, 2017 1 commit
  5. 05 Jan, 2017 4 commits
  6. 04 Jan, 2017 2 commits
  7. 03 Jan, 2017 6 commits
  8. 02 Jan, 2017 1 commit
    • Masami Hiramatsu's avatar
      perf probe: Fix to get correct modname from elf header · 1f2ed153
      Masami Hiramatsu authored
      
      Since 'perf probe' supports cross-arch probes, it is possible to analyze
      different arch kernel image which has different bits-per-long.
      
      In that case, it fails to get the module name because it uses the
      MOD_NAME_OFFSET macro based on the host machine bits-per-long, instead
      of the target arch bits-per-long.
      
      This fixes above issue by changing modname-offset based on the target
      archs bit width. This is ok because linux kernel uses LP64 model on
      64bit arch.
      
      E.g. without this (on x86_64, and target module is arm32):
      
        $ perf probe -m build-arm/fs/configfs/configfs.ko -D configfs_lookup
        p:probe/configfs_lookup :configfs_lookup+0
                                ^-Here is an empty module name.
      
      With this fix, you can see correct module name:
      
        $ perf probe -m build-arm/fs/configfs/configfs.ko -D configfs_lookup
        p:probe/configfs_lookup configfs:configfs_lookup+0
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/148337043836.6752.383495516397005695.stgit@devbox
      
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      1f2ed153
  9. 28 Dec, 2016 1 commit
  10. 27 Dec, 2016 2 commits
  11. 24 Dec, 2016 2 commits
    • Len Brown's avatar
      tools/power turbostat: remove obsolete -M, -m, -C, -c options · 6886fee4
      Len Brown authored
      
      The new --add option has replaced the -M, -m, -C, -c options
      Eg.
      
      -M 0x10 is now --add msr0x10,raw
      -m 0x10 is now --add msr0x10,raw,u32
      -C 0x10 is now --add msr0x10,delta
      -c 0x10 is now --add msr0x10,delta,u32
      
      The --add option can be repeated to add any number of counters,
      while the previous options were limited to adding one of each type.
      
      In addition, the --add option can accept a column label,
      and can also display a counter as a percentage of elapsed cycles.
      
      Eg. --add msr0x3fe,core,percent,MY_CC3
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      6886fee4
    • Len Brown's avatar
      tools/power turbostat: Make extensible via the --add parameter · 388e9c81
      Len Brown authored
      
      Create the "--add" parameter.  This can be used to teach an existing
      turbostat binary about any number of any type of counter.
      
      turbostat(8) details the syntax for --add.
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      388e9c81
  12. 22 Dec, 2016 4 commits
    • Namhyung Kim's avatar
      perf sched timehist: Fix invalid period calculation · bdd75729
      Namhyung Kim authored
      
      When --time option is given with a value outside recorded time, the last
      sample time (tprev) was set to that value and run time calculation might
      be incorrect.  This is a problem of the first samples for each cpus
      since it would skip the runtime update when tprev is 0.  But with --time
      option it had non-zero (which is invalid) value so the calculation is
      also incorrect.
      
      For example, let's see the followging:
      
        $ perf sched timehist
                   time    cpu  task name                       wait time  sch delay   run time
                                [tid/pid]                          (msec)     (msec)     (msec)
        --------------- ------  ------------------------------  ---------  ---------  ---------
            3195.968367 [0003]  <idle>                              0.000      0.000      0.000
            3195.968386 [0002]  Timer[4306/4277]                    0.000      0.000      0.018
            3195.968397 [0002]  Web Content[4277]                   0.000      0.000      0.000
            3195.968595 [0001]  JS Helper[4302/4277]                0.000      0.000      0.000
            3195.969217 [0000]  <idle>                              0.000      0.000      0.621
            3195.969251 [0001]  kworker/1:1H[291]                   0.000      0.000      0.033
      
      The sample starts at 3195.968367 but when I gave a time interval from
      3194 to 3196 (in sec) it will calculate the whole 2 second as runtime.
      In below, 2 cpus accounted it as runtime, other 2 cpus accounted it as
      idle time.
      
      Before:
      
        $ perf sched timehist --time 3194,3196 -s | tail
        Idle stats:
            CPU  0 idle for   1995.991  msec
            CPU  1 idle for     20.793  msec
            CPU  2 idle for     30.191  msec
            CPU  3 idle for   1999.852  msec
      
            Total number of unique tasks: 23
        Total number of context switches: 128
                   Total run time (msec): 3724.940
      
      After:
      
        $ perf sched timehist --time 3194,3196 -s | tail
        Idle stats:
            CPU  0 idle for     10.811  msec
            CPU  1 idle for     20.793  msec
            CPU  2 idle for     30.191  msec
            CPU  3 idle for     18.337  msec
      
            Total number of unique tasks: 23
        Total number of context switches: 128
                   Total run time (msec): 18.139
      
      Committer notes:
      
      Further testing:
      
      Before:
      
        Idle stats:
            CPU  0 idle for    229.785  msec
            CPU  1 idle for    937.944  msec
            CPU  2 idle for    188.931  msec
            CPU  3 idle for    986.185  msec
      
        After:
      
        # perf sched timehist --time 40602,40603 -s | tail
      
        Idle stats:
            CPU  0 idle for    229.785  msec
            CPU  1 idle for    175.407  msec
            CPU  2 idle for    188.931  msec
            CPU  3 idle for    223.657  msec
      
            Total number of unique tasks: 68
        Total number of context switches: 814
                   Total run time (msec): 97.688
      
        # for cpu in `seq 0 3` ; do echo -n "CPU $cpu idle for " ; perf sched timehist --time 40602,40603 | grep "\[000${cpu}\].*\<idle\>" | tr -s ' ' | cut -d' ' -f7 | awk '{entries++ ; s+=$1} END {print s " msec (entries: " entries ")"}' ; done
        CPU 0 idle for 229.721 msec (entries: 123)
        CPU 1 idle for 175.381 msec (entries: 65)
        CPU 2 idle for 188.903 msec (entries: 56)
        CPU 3 idle for 223.61 msec (entries: 102)
      
      Difference due to the idle stats being accounted at nanoseconds precision while
      the <idle> entries in 'perf sched timehist' are trucated at msec.usec.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Fixes: 853b7407 ("perf sched timehist: Add option to specify time window of interest")
      Link: http://lkml.kernel.org/r/20161222060350.17655-2-namhyung@kernel.org
      
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      bdd75729
    • Namhyung Kim's avatar
      perf sched timehist: Remove hardcoded 'comm_width' check at print_summary · 4fa0d1aa
      Namhyung Kim authored
      
      Now that the default 'comm_width' value is 30, no need to check that at
      print_summary,
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
      
      
      [ Split from a larger patch ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4fa0d1aa
    • Namhyung Kim's avatar
      perf sched timehist: Enlarge default 'comm_width' · 9b8087d7
      Namhyung Kim authored
      
      Current default value is 20 but it's easily changed to a bigger value as
      task has a long name and different tid and pid.  And it makes the output
      not aligned.  So change it to have a large value as summary shows.
      
      Committer notes:
      
      Before:
      
        # perf sched record
        ^C
        # perf sched timehist
        <SNIP>
          40602.770537 [0001]  rcuos/2[29]               7.970      0.002      0.020
          40602.771512 [0003]  <idle>                    0.003      0.000      0.986
          40602.771586 [0001]  <idle>                    0.020      0.000      1.049
          40602.771606 [0001]  qemu-system-x86[3593/3510]      0.000      0.002      0.020
          40602.771629 [0003]  qemu-system-x86[3510]           0.000      0.003      0.116
          40602.771776 [0000]  <idle>                          0.001      0.000      1.892
        <SNIP>
      
      After:
      
        # perf sched timehist
        <SNIP>
         40602.770537 [0001]  rcuos/2[29]                         7.970      0.002      0.020
         40602.771512 [0003]  <idle>                              0.003      0.000      0.986
         40602.771586 [0001]  <idle>                              0.020      0.000      1.049
         40602.771606 [0001]  qemu-system-x86[3593/3510]          0.000      0.002      0.020
         40602.771629 [0003]  qemu-system-x86[3510]               0.000      0.003      0.116
        <SNIP>
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
      
      
      [ Split from a larger patch ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9b8087d7
    • Namhyung Kim's avatar
      perf sched timehist: Honour 'comm_width' when aligning the headers · 0e6758e8
      Namhyung Kim authored
      
      Current default value is 20, but that may change in the future, so make
      places where we have 20 hardcoded use 'comm_width'.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
      
      
      [ Split from a larger patch ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0e6758e8
  13. 20 Dec, 2016 5 commits
  14. 17 Dec, 2016 2 commits
    • Daniel Borkmann's avatar
      bpf, test_verifier: fix a test case error result on unprivileged · 0eb6984f
      Daniel Borkmann authored
      Running ./test_verifier as unprivileged lets 1 out of 98 tests fail:
      
        [...]
        #71 unpriv: check that printk is disallowed FAIL
        Unexpected error message!
        0: (7a) *(u64 *)(r10 -8) = 0
        1: (bf) r1 = r10
        2: (07) r1 += -8
        3: (b7) r2 = 8
        4: (bf) r3 = r1
        5: (85) call bpf_trace_printk#6
        unknown func bpf_trace_printk#6
        [...]
      
      The test case is correct, just that the error outcome changed with
      ebb676da ("bpf: Print function name in addition to function id").
      Same as with e00c7b21 ("bpf: fix multiple issues in selftest suite
      and samples") issue 2), so just fix up the function name.
      
      Fixes: ebb676da
      
       ("bpf: Print function name in addition to function id")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0eb6984f
    • Daniel Borkmann's avatar
      bpf: fix regression on verifier pruning wrt map lookups · a08dd0da
      Daniel Borkmann authored
      Commit 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL
      registers") introduced a regression where existing programs stopped
      loading due to reaching the verifier's maximum complexity limit,
      whereas prior to this commit they were loading just fine; the affected
      program has roughly 2k instructions.
      
      What was found is that state pruning couldn't be performed effectively
      anymore due to mismatches of the verifier's register state, in particular
      in the id tracking. It doesn't mean that 57a09bf0 is incorrect per
      se, but rather that verifier needs to perform a lot more work for the
      same program with regards to involved map lookups.
      
      Since commit 57a09bf0 is only about tracking registers with type
      PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers
      until they are promoted through pattern matching with a NULL check to
      either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the
      id becomes irrelevant for the transitioned types.
      
      For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(),
      but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even
      transferred further into other types that don't make use of it. Among
      others, one example is where UNKNOWN_VALUE is set on function call
      return with RET_INTEGER return type.
      
      states_equal() will then fall through the memcmp() on register state;
      note that the second memcmp() uses offsetofend(), so the id is part of
      that since d2a4dd37 ("bpf: fix state equivalence"). But the bisect
      pointed already to 57a09bf0, where we really reach beyond complexity
      limit. What I found was that states_equal() often failed in this
      case due to id mismatches in spilled regs with registers in type
      PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just perform
      a memcmp() on their reg state and don't have any other optimizations
      in place, therefore also id was relevant in this case for making a
      pruning decision.
      
      We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE.
      For the affected program, it resulted in a ~17 fold reduction of
      complexity and let the program load fine again. Selftest suite also
      runs fine. The only other place where env->id_gen is used currently is
      through direct packet access, but for these cases id is long living, thus
      a different scenario.
      
      Also, the current logic in mark_map_regs() is not fully correct when
      marking NULL branch with UNKNOWN_VALUE. We need to cache the destination
      reg's id in any case. Otherwise, once we marked that reg as UNKNOWN_VALUE,
      it's id is reset and any subsequent registers that hold the original id
      and are of type PTR_TO_MAP_VALUE_OR_NULL won't be marked UNKNOWN_VALUE
      anymore, since mark_map_reg() reuses the uncached regs[regno].id that
      was just overridden. Note, we don't need to cache it outside of
      mark_map_regs(), since it's called once on this_branch and the other
      time on other_branch, which are both two independent verifier states.
      A test case for this is added here, too.
      
      Fixes: 57a09bf0
      
       ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a08dd0da
  15. 15 Dec, 2016 4 commits