Another widely used technique for avoiding locking is to duplicate information for each CPU. For example, to keep a count of a common condition, you could use a spin lock and a single counter. Nice and simple. If that proves too slow (it usually isn't, but if you have a really big machine to test on and can show that it is), you could instead use a counter for each CPU; then none of them needs an exclusive lock. See DEFINE_PER_CPU(), get_cpu_var() and put_cpu_var() (include/linux/percpu.h).
    
Of particular use for simple per-CPU counters is the local_t type, together with cpu_local_inc() and related functions, which on some architectures are more efficient than the equivalent straightforward code (include/asm/local.h).
    
Note that there is no simple, reliable way to get an exact value of such a counter without introducing more locks. For some uses, this is not a problem.