Use sed:
sed 's/.$//'
Thursday, March 24, 2011
Tuesday, March 15, 2011
Where to find multi-core processor core mapping information
My OS is Fedora 13.
/sys/devices/system/cpu/cpu0/topology/core_siblings_list lists the logical CPU IDs that sit in the same physical package (socket) as cpu0;
/sys/devices/system/cpu/cpu0/topology/thread_siblings_list lists the logical CPU IDs that are multithreading/hyperthreading siblings of cpu0 (i.e. virtual processors that share the same physical core).
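The two sibling files hold a compact CPU list such as 0,8 or 0-3,8-11. A minimal Python sketch for reading them follows. The helper names parse_cpu_list and read_siblings and the sysfs_root parameter are my own and not part of any kernel interface. I am assuming the usual comma-and-range list format.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a CPU list string like '0-3,8-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty file or a trailing comma
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_siblings(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the package-level and core-level siblings of one logical CPU.

    sysfs_root is a parameter only so the function can be pointed at a
    copy of the tree; it is an illustrative choice.
    """
    topology = Path(sysfs_root) / f"cpu{cpu}" / "topology"
    return {
        "core_siblings": parse_cpu_list(
            (topology / "core_siblings_list").read_text()),
        "thread_siblings": parse_cpu_list(
            (topology / "thread_siblings_list").read_text()),
    }
```

Calling read_siblings(0) on a Linux machine returns both lists for cpu0. The CPUs that appear in core_siblings but not in thread_siblings are in the same socket but on other cores.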
/sys/devices/system/cpu/cpu0/cache/index*/ contains all the cache information, one directory per cache. Take my Intel Nehalem Xeon E5520 as an example:
/sys/devices/system/cpu/cpu0/cache/index0/level shows this is an L1 cache;
/sys/devices/system/cpu/cpu0/cache/index0/type shows this is a data cache;
/sys/devices/system/cpu/cpu0/cache/index0/size shows the cache size is 32KB.
Similarly,
/sys/devices/system/cpu/cpu0/cache/index1/ describes the L1 instruction cache;
/sys/devices/system/cpu/cpu0/cache/index2/ describes the L2 unified cache;
/sys/devices/system/cpu/cpu0/cache/index3/ is for the L3 unified shared cache.
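To dump every cache of one CPU in one go, the index directories can be walked with a short script. This is only a sketch: read_cache_info and sysfs_root are names I made up. Besides level, type and size it also reads shared_cpu_list, a file not mentioned above that tells which CPUs share the cache.

```python
from pathlib import Path


def read_cache_info(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Collect level/type/size for each cache index directory of one CPU."""
    cache_dir = Path(sysfs_root) / f"cpu{cpu}" / "cache"
    # Sort numerically so that index10 would come after index2.
    index_dirs = sorted(cache_dir.glob("index[0-9]*"),
                        key=lambda p: int(p.name[len("index"):]))
    caches = []
    for index_dir in index_dirs:
        entry = {"index": index_dir.name}
        for attr in ("level", "type", "size", "shared_cpu_list"):
            attr_file = index_dir / attr
            # Attributes missing on a given kernel are reported as None.
            entry[attr] = (attr_file.read_text().strip()
                           if attr_file.is_file() else None)
        caches.append(entry)
    return caches
```

Each returned entry is a plain dict, so printing the result of read_cache_info(0) gives one line per cache that can be compared against the index0 to index3 description above.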
Wednesday, February 2, 2011
GNU compiler "-ffloat-store" option
-ffloat-store
- Do not store floating point variables in registers, and inhibit other options that might change whether a floating point value is taken from a register or memory.
This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.
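The effect of excess precision can be imitated in Python. This is an analogy, not the real thing: Python cannot reach the x87 registers, so here the Python float (a double) plays the role of the wide register and IEEE single precision plays the role of the declared type. The helper names are mine. Rounding after every step corresponds to storing each intermediate value in a variable, which is what the option forces.

```python
import struct


def store_as_float32(x):
    """Round a double to the nearest IEEE single, like a store to a float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_wide(values):
    """Accumulate in the wider format and round only the final result."""
    acc = 0.0
    for v in values:
        acc += v
    return store_as_float32(acc)


def sum_stored(values):
    """Round to the narrow format after every addition."""
    acc = 0.0
    for v in values:
        acc = store_as_float32(acc + v)
    return acc


values = [16777216.0, 1.0, 1.0]  # 2**24, where single precision spacing is 2
print(sum_wide(values))    # 16777218.0
print(sum_stored(values))  # 16777216.0
```

The two results differ because 16777217 is not representable in single precision, so each stored intermediate rounds back to 16777216. A program that relies on the second behaviour is the kind that needs its intermediates stored.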
Thursday, January 6, 2011
Oprofile performance counter events for Intel Nehalem processor
Some common performance counter events:
Name | Description | Counters usable | Unit mask options |
CPU_CLK_UNHALTED | Clock cycles when not halted | all | |
UNHALTED_REFERENCE_CYCLES | Unhalted reference cycles | 0, 1, 2 | 0x01: No unit mask |
LLC_MISSES | Last level cache demand requests from this core that missed the LLC | all | 0x41: No unit mask |
LLC_REFS | Last level cache demand requests from this core | all | 0x4f: No unit mask |
(Update) LLC_MISSES is not well documented by Intel. It seems to include L2 cache misses. Instead, one can use MEM_LOAD_RETIRED:0x10 to collect the number of retired loads that miss the last level cache. In my measurements, LLC_MISSES was up to ten times larger than MEM_LOAD_RETIRED:0x10.
Other useful metrics:
MEM_INST_RETIRED:0x01, the number of instructions with an architecturally-visible load retired on the architected path;
MEM_LOAD_RETIRED:0x04, llc_unshared_hit, the number of retired loads that hit their own, unshared lines in the LLC cache;
MEM_LOAD_RETIRED:0x08, other_core_l2_hit_hitm, the number of retired loads that hit in a sibling core's L2 (on die core);
MEM_LOAD_RETIRED:0x80, dtlb_miss, the number of retired loads that missed the DTLB;
MEM_UNCORE_RETIRED:0x08, remote_cache_local_home_hit, the number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and hit in a remote socket's cache;
MEM_UNCORE_RETIRED:0x10, remote_dram, the number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and was remotely homed (dram);
MEM_UNCORE_RETIRED:0x20, local_dram, the number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and required a local socket memory reference (dram).
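The event and unit mask pairs above can be turned into opcontrol arguments with a small helper. This is a sketch under assumptions: opcontrol takes events as name:count:unitmask:kernel:user, the sample count of 100000 is an arbitrary value I picked, and I am assuming the unit mask may be written in hex as it is in this post. Check the generated strings against ophelp before relying on them. The function and dictionary names are mine.

```python
# Short names and unit masks as listed in the post above.
EVENTS = {
    "llc_unshared_hit": ("MEM_LOAD_RETIRED", 0x04),
    "other_core_l2_hit_hitm": ("MEM_LOAD_RETIRED", 0x08),
    "dtlb_miss": ("MEM_LOAD_RETIRED", 0x80),
    "remote_cache_local_home_hit": ("MEM_UNCORE_RETIRED", 0x08),
    "remote_dram": ("MEM_UNCORE_RETIRED", 0x10),
    "local_dram": ("MEM_UNCORE_RETIRED", 0x20),
}


def opcontrol_event(event, unit_mask, count=100000, kernel=True, user=True):
    """Format one --event argument as name:count:unitmask:kernel:user.

    count=100000 is a placeholder sampling interval, not a recommendation.
    """
    return (f"--event={event}:{count}:{unit_mask:#04x}:"
            f"{int(kernel)}:{int(user)}")


# Retired loads that missed the LLC, as suggested in the update above.
print(opcontrol_event("MEM_LOAD_RETIRED", 0x10))
# Local versus remote DRAM accesses.
for name in ("local_dram", "remote_dram"):
    print(opcontrol_event(*EVENTS[name]))
```

The printed strings are meant to be pasted after opcontrol on the command line. Keeping the masks in one table avoids retyping the hex values for every run.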