Monday 8 September 2014

Linux kernel locking mechanisms

The Linux kernel provides the following types of locks:

1. Atomic operations:

They guarantee that simple operations, such as incrementing or decrementing a counter, are performed atomically, without interruption.


It is defined as:
typedef struct { volatile int counter; } atomic_t;

A short example, in which a counter is incremented by 1, is sufficient to demonstrate the need for operations of this kind. On the assembly language level, incrementation is usually performed in three steps:
1. The counter value is copied from memory into a processor register.
2. The value is incremented.
3. The register data are written back to memory.

Let’s say counter = 1 before the operation. Processor 1 and processor 2 may both read the value 1 from memory (step 1) simultaneously, increment it to 2, and write 2 back. One increment is lost, and the counter ends up at 2 instead of 3.
But if the counter is an atomic variable and the increments are performed with atomic operations, its value is guaranteed to be 3.
Atomic operations use special assembly-level locking to prevent other processors from interfering with the current operation.


Atomic variables must be initialized using the ATOMIC_INIT() macro. There are many atomic operations; a few of them are listed below:

atomic_read(atomic_t *v)             
Reads the value of the atomic variable.

atomic_set(atomic_t *v, int i)             
Sets v to i.

atomic_add(int i, atomic_t *v)              
Adds i to v.

atomic_sub(int i, atomic_t *v)               
Subtracts i from v.

atomic_inc(atomic_t *v)                        
Adds 1 to v.

atomic_inc_and_test(atomic_t *v)         
Adds 1 to v. Returns true if the result is 0, otherwise false.

atomic_dec(atomic_t *v)                        
Subtracts 1 from v.

atomic_dec_and_test(atomic_t *v)        
Subtracts 1 from v. Returns true if the result is 0, otherwise false.

NOTE:
Atomic operations apply to integer variables. The kernel provides separate APIs for bit-level operations, such as set_bit(), clear_bit(), and test_bit().

2. Spinlocks:

These locks are used for short-term protection of critical sections.
Spinlocks are implemented by means of the spinlock_t data structure.

Spinlocks are used as follows:

DEFINE_SPINLOCK(lock); 
//for static initialization
spin_lock_init(&lock);                                                   
//for dynamic initialization

......
spin_lock(&lock);
/* Critical section */
spin_unlock(&lock);

The spin_lock() function takes account of two situations:
1. If the lock is not yet held anywhere else in the kernel, it is reserved for the current processor. Other processors may no longer enter the critical section.
2. If the lock is already held by another processor, spin_lock() goes into a busy-wait loop, repeatedly checking whether the lock has been released by spin_unlock(). Once it has been, the lock is acquired and the critical section is entered.


Points to remember while using spinlocks:
* Code protected by a spinlock must not go to sleep.
For example, kmalloc() may sleep when the kernel is short of memory. If the lock holder sleeps, other processors may spin for a long time trying to acquire the lock. Pay close attention to which functions you call inside a spinlocked region.
* A spinlock must not be acquired a second time by its current holder.
For example, if two functions operate with the same spinlock, a function that has already acquired the lock must not call another function that tries to acquire the same lock on the same code path: the processor would wait forever for itself to release it.
* Spinlocks must not be held for long periods, because all processors waiting for the lock to be released are unavailable for other productive tasks.
* If an interrupt arrives on a processor that is inside a spinlocked region and the interrupt handler needs the same lock, deadlock may result: the handler spins waiting for the lock, while the lock holder has been interrupted and cannot continue until the handler finishes.
To avoid this situation, the kernel provides a few other variants of spinlock, mentioned below:

spin_lock_irqsave() and spin_unlock_irqrestore() :
                This version is always safe to use. It disables interrupts on the local processor. Of course, this is fairly slow.
               
spin_lock_bh() and spin_unlock_bh() :
                This version disables software interrupts (bottom halves) only.
  

3. Semaphores:

While waiting for a semaphore to be released, the waiting task is put to sleep until it is woken up; only then does it attempt to acquire the semaphore.

Semaphores that are used in the kernel are defined by the structure mentioned below.

struct semaphore {
        atomic_t count;
        int sleepers;
        wait_queue_head_t wait;
};

count specifies how many processes may be in the critical region protected by the semaphore at the same time. The counter is decremented when a process enters the critical section; when it reaches zero, no other process may enter.

sleepers specifies the number of processes waiting to be allowed to enter the critical region.

wait is used to implement a queue holding the task structures of all processes sleeping on the semaphore.

When an attempt is made to acquire a reserved semaphore with down, the current process is put to sleep and placed on the wait queue associated with the semaphore. At the same time, the process is placed in the TASK_UNINTERRUPTIBLE state and cannot receive signals while waiting to enter the critical region.

If the semaphore is not reserved, the process may immediately continue without being put to sleep and enters the critical region.

Unlike spinlocks, waiting processes go to sleep and are not woken until the semaphore is free; this means that the relevant CPU can perform other tasks in the meantime.

Semaphores are suitable for protecting longer critical sections against parallel access. However, they should not be used for short sections, because putting processes to sleep and waking them up again is very costly.


4. Mutexes

A mutex is a sleeping lock that implements mutual exclusion. 
It behaves like a semaphore with a count of 1, but adds the concept of ownership: only the lock holder may release the lock.



DEFINE_MUTEX(name);
//for static initialization
mutex_init(&mutex);                                                   
//for dynamic initialization

......
mutex_lock(&mutex);
/* Critical section */


mutex_unlock(&mutex);

mutex_lock(struct mutex *) :
- Locks the given mutex; sleeps if the lock is unavailable
mutex_unlock(struct mutex *) :
- Unlocks the given mutex 
mutex_trylock(struct mutex *):
- Tries to acquire the given mutex; returns one if the lock was acquired successfully and zero otherwise

mutex_is_locked (struct mutex *):
- Returns one if the lock is locked and zero otherwise

5. Reader/Writer locks

These allow multiple readers into a critical section at once, but anyone who wants to write must take the lock exclusively. The kernel provides reader/writer versions of both semaphores and spinlocks, known as reader/writer semaphores and reader/writer spinlocks.

This kind of lock is useful for complex data structures such as linked lists, especially when entries are searched without the list itself being changed: with the read lock, many concurrent readers can traverse the list, but anything that modifies the list must take the write lock.


Reader/Writer spinlock can be used as
rwlock_t my_lock = __RW_LOCK_UNLOCKED(my_lock);

unsigned long flags;

read_lock(&my_lock);
//critical section where multiple read is allowed (No write)
read_unlock(&my_lock);

write_lock(&my_lock);
//critical section where the writer has exclusive access (no readers or other writers)
write_unlock(&my_lock);

Note: _trylock and _irqsave variants are also available.

Reader/writer semaphores are used in a similar way. The corresponding data structure is struct rw_semaphore; down_read() and up_read() obtain and release read access to the critical region, while write access is performed with down_write() and up_write().


6. Completion variables 


Completion variables are an easy way to synchronize two tasks in the kernel when one task needs to signal the other that an event has occurred. One task waits on the completion variable while another performs some work.
When the working task has finished, it uses the completion variable to wake up any waiting tasks.

Tasks that want to wait on a given completion variable call wait_for_completion(). After the event has occurred, complete() wakes one waiting task, while complete_all() wakes all of them.

Example:
The vfork() system call uses a completion variable to wake up the parent process when the child process execs or exits.
