Saturday 15 November 2014

                       SAN and NAS


NAS and SAN are both enterprise-level storage solutions for storing data.

NAS: Network attached storage

  • Any server or device that shares its own storage with others on the network and acts as a file server is the simplest form of NAS.
  • NAS uses an Ethernet connection for sharing files over the network. The NAS device has an IP address and is accessible over the network through that address. When you access files on your organisation's file server from a Windows/Linux/Mac system, you are basically using NAS.
  • A client sends a file read/write request to the NAS device/server. This request can be CIFS (from Windows), NFS (from Linux), or simply HTTP (from a web browser) or FTP. The NAS server serves the request.
  • NAS is file-level storage (a file-sharing technology); it hides the actual file system under the hood.
  • NAS is the cheaper technology, and it can be set up without any modification to the existing network because it works on the TCP/IP model.
  • NAS suffers from high latency because the Ethernet network also carries local traffic, which is why NAS cannot be used for high-performance applications.






SAN: Storage area network 

  • SAN is a high-speed network of storage devices that also connects those storage devices with servers. It provides block-level storage that can be accessed by the applications running on any networked servers. SAN storage devices can include tape libraries, and more commonly, disk-based devices, like RAID.
  •  It is a dedicated storage network that provides block-level access to LUNs (the virtual disks provided by the SAN).
  •  When a LUN is mapped to a host (say, a Windows initiator), the host sees the LUN as a new local drive.
  • A SAN is used to transport data between servers and storage resources. SAN technology enables high-speed connectivity from server to storage, storage to storage, or server to server.
  • The purpose of a SAN is to store large amounts of data on a network so that multiple users can share the same storage space simultaneously. SAN uses FC and/or iSCSI as the communication protocol between the SAN device and the host (server).
  • A SAN can also use ATA over Ethernet (AoE), Fibre Channel over Ethernet (FCoE), or ESCON over Fibre Channel as the communication protocol.





  1. A SAN environment is costly to implement.
  2. SAN provides a high-speed storage-sharing system.
  3. SAN increases the network bandwidth and the reliability of data I/O.
  4. SAN is separated from the regular network and can expand its storage capacity.
  5. SAN provides safety by implementing the concept of zoning.

Hybrid SAN + NAS

Many organizations need an environment where a file server has a very large storage space that may not be available as local hard disks.
This requirement can be fulfilled very well by SAN storage, so a NAS device can use the features provided by a SAN.






Other

DAS (direct attached storage): a storage system directly attached to a server or workstation, without a storage network in between.




Monday 29 September 2014

Gdb tutorial

  GDB (GNU Debugger)


Gdb is a debugger for C (and C++). It allows you to do things like run the program up to a certain point, then stop and print the values of certain variables at that point, or step through the program one line at a time and print the value of each variable after executing each line. It uses a command-line interface.

It allows you to inspect what the program is doing at a certain point during execution. Errors like segmentation faults may be easier to find with the help of gdb.


Steps:
1.   Compilation:

For normal compilation without debugging support:
#gcc <flags> <source file> -o <output file>
e.g. #gcc -Wall hello.c -o hello
For compilation with debugging symbols (the -g flag, which gdb needs):
#gcc <flags> -g <source file> -o <output file>
e.g. #gcc -Wall -g hello.c -o hello
NOTE: Use g++ instead of gcc for c++ programs.

    2.  Start gdb

#gdb
After this, the gdb prompt will appear ->  (gdb)

    3.   load program:
# (gdb) file <executable_name>
e.g.  # (gdb) file hello
OR
We can also start gdb and load a program in one command
#gdb hello

   4.   Run program

# (gdb) run
This will run the complete program in one go.
If the program has a runtime problem (like a segmentation fault), gdb will print useful information such as the line number and the parameters of the function that caused the error.

If your program has bugs, you may not want to run it in one go; instead, step through the program a bit at a time until you come upon the error.

Breakpoints:

These can be used to stop a running program at a designated point. The command is "break".
Breakpoint using line number
# (gdb) break hello.c:13
This sets a breakpoint at line 13 in hello.c, so when execution reaches line 13, it stops there and waits for another command.
Breakpoint using function name
 # (gdb) break my_function
Once you have set breakpoints, you can run your program using the "run" command. This time execution should stop at the first breakpoint (unless a fatal error occurs before reaching that point).
                  
Conditional breakpoints:
Used to avoid unnecessary stepping through the program
# (gdb) break hello.c:13 if len >= MAX_LEN
Here the program pauses only if the condition is satisfied.
# (gdb) delete <breakpoint-number>  Deletes a breakpoint.
# (gdb) info breakpoints  Shows information about all set breakpoints.


More useful commands:

#(gdb) continue    Proceed from one breakpoint to the next.
#(gdb) step  Execute the next line of the program and wait for a further command.
#(gdb) next  Works like step, but does not step into subroutines; "next" treats a call as one instruction.
NOTE: Instead of typing "step" or "next" again and again, you can just press ENTER; gdb repeats the last command.

Checking value of variables in program:

#(gdb) print my_variable
Prints the value of the variable named "my_variable" at that moment.
You can also print the value of structure members.
Using watch on variables
Breakpoints work on a line number or function name; a watchpoint acts on a variable. It pauses the program whenever the value of the watched variable is modified. Gdb keeps the watch active within the variable's scope.
# (gdb) watch var

Reference: http://www.gnu.org/software/gdb/



Friday 12 September 2014

Procfs tutorial



Sample code for creating entries inside proc filesystem:

=======================================================================
/*
 * This kernel module creates 3 proc entries: one parent directory and two
 * files under that parent.
 *
 * Note: it uses the legacy create_proc_entry()/read_proc interface, which
 * was removed in Linux 3.10; newer kernels use proc_create() with a
 * file_operations structure instead.
 */

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/proc_fs.h>
#include <asm/uaccess.h>

#define MODULE_NAME "Proc_entry_sample_module"
#define MAX_BUFFER_SIZE 1024
#define PROC_DIR_NAME "Hello_proc_dir"

static char procfs_buffer[MAX_BUFFER_SIZE];
static unsigned long procfs_buffer_size = 0;
static struct proc_dir_entry *dir, *file1, *file2;

int hello_file_read(char *buffer, char **buffer_location, off_t offset,
                    int buffer_length, int *eof, void *data)
{
        int ret;

        printk(KERN_INFO "\nReading procfs entry..");

        if (offset > 0) {
                ret = 0;
                *eof = 1;
        } else {
                memcpy(buffer, procfs_buffer, procfs_buffer_size);
                ret = procfs_buffer_size;
        }
        printk(KERN_INFO "\nret = %d", ret);
        return ret;
}

int hello_file_write(struct file *file, const char *buffer,
                     unsigned long count, void *data)
{
        printk(KERN_INFO "\nWriting to procfs entry..");
        printk(KERN_INFO "count=%lu", count);

        if (count > MAX_BUFFER_SIZE)
                procfs_buffer_size = MAX_BUFFER_SIZE;
        else
                procfs_buffer_size = count;

        if (copy_from_user(procfs_buffer, buffer, procfs_buffer_size))
                return -EFAULT;
        printk(KERN_INFO "\nret = %lu", procfs_buffer_size);
        return procfs_buffer_size;
}

static int __init my_init_function(void)
{
        int ret = 0;

        dir = proc_mkdir(PROC_DIR_NAME, NULL);
        if (dir == NULL) {
                ret = -ENOMEM;
                printk(KERN_ERR "\nDirectory creation failed");
                goto out_err;
        } else {
                printk(KERN_INFO "\nDirectory created");
        }

        file1 = create_proc_entry("file1", 0644, dir);
        if (file1 == NULL) {
                ret = -ENOMEM;
                printk(KERN_ERR "\nfile1 creation failed");
                goto out_err;
        } else {
                printk(KERN_INFO "\nfile1 under dir created");
        }

        file1->read_proc = hello_file_read;
        file1->write_proc = hello_file_write;
        file1->mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;

        file2 = create_proc_entry("file2", 0644, dir);
        if (file2 == NULL) {
                ret = -ENOMEM;
                printk(KERN_ERR "\nfile2 creation failed");
                goto out_err;
        } else {
                printk(KERN_INFO "\nfile2 under dir created");
        }

        file2->read_proc = hello_file_read;
        file2->write_proc = hello_file_write;
        file2->mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;

out:
        return ret;
out_err:
        printk(KERN_ERR "\nModule failed to create proc entries !!");
        goto out;
}

static void __exit my_exit_function(void)
{
        remove_proc_entry("file1", dir);
        remove_proc_entry("file2", dir);
        remove_proc_entry(PROC_DIR_NAME, NULL);
        printk(KERN_INFO "\nAll proc entries of %s kernel module removed",
               MODULE_NAME);
        printk(KERN_INFO "\n%s module unregistered", MODULE_NAME);
}

module_init(my_init_function);
module_exit(my_exit_function);

MODULE_DESCRIPTION("Kernel module for handling procfs entries.");
MODULE_AUTHOR("Narendra Pal Singh");
MODULE_LICENSE("GPL");

========================================================================

Makefile:

obj-m += proc_example.o 

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

========================================================================

Steps to run:

1.  #make
2.  #insmod proc_example.ko
3.  #cd /proc

You will see a directory named "Hello_proc_dir".

4. #cd Hello_proc_dir
Now you can use echo / cat commands on the two proc files, "file1" and "file2".

Procfs filesystem

Procfs  (/proc)

The /proc file system is a pseudo file system that provides an interface to kernel data structures in a form that looks like files and directories on a file system. This provides an easy mechanism for viewing and changing various system attributes.

The contents of /proc files are generally in human-readable text form and can be parsed by shell scripts. A program can simply open and read from, or write to, the desired file. In most cases, a process must be privileged to modify the contents of files in the /proc directory.

 This file system resides under the /proc directory and contains various files that expose kernel information, allowing processes to conveniently read that information, and change it in some cases, using normal file I/O system calls.
The /proc file system is said to be virtual because the files and subdirectories that it contains don’t reside on a disk. Instead, the kernel creates them “on the fly” as processes access them.


Content of /proc filesystem
For each process on the system, the kernel provides a corresponding directory named /proc/PID, where PID is the ID of the process. Within this directory are various files and subdirectories containing information about that process.

For example,
We can obtain information about the init process, which always has the process ID 1, by looking at files under the directory /proc/1.

The major contents of the /proc filesystem are:

Directory            Information exposed by files in this directory
/proc                Various system information
/proc/net            Status information about networking and sockets
/proc/sys/fs         Settings related to file systems
/proc/sys/kernel     Various general kernel settings
/proc/sys/net        Networking and socket settings
/proc/sys/vm         Memory-management settings


(Figure omitted: tree view illustrating the contents of the /proc filesystem.)

Procfs is often accessed from shell scripts using "cat" and "echo". Proc files can also be accessed from a C program using simple file I/O system calls. We should take care of the access permissions of proc entries while accessing them.

We can create our own procfs entries by writing a Linux kernel module; a sample tutorial for this is available in the Procfs tutorial above.

Reference: Linux Programming interface

Wednesday 10 September 2014

Block level storage v/s File level Storage


Block level storage:

In block-level storage, raw volumes of storage are created and each volume is presented as an individual hard drive. These volumes are controlled by the server's operating system and can be formatted with the required file system.

·         This type of storage is deployed in a SAN environment, which is reliable and flexible.
·         When you use a block-based volume, you're basically using a blank hard drive with which you can do anything.
·         Each storage volume can be treated as an independent disk drive controlled by an external server operating system.
·         Each storage volume can be formatted with whatever file system the application requires, such as NTFS, ext4, or VMFS (VMware).
·         Block-level storage uses protocols such as FC, iSCSI, and FCoE for data transfer; SCSI commands act as the communication interface between the initiator and the target.
·         Many applications make use of block-level shared storage, including databases, Microsoft Exchange, the VMware shared file system, and server boot volumes.

File system level storage:

This can be defined as a centralized location for storing files (data). This storage requires a file-level protocol like NFS (Linux and VMware) or SMB/CIFS (Windows).

·         It stores files and folders, and the visibility is the same to the clients accessing it and to the system that stores it.
·         The storage is exposed over a protocol such as NFS or SMB/CIFS, and files are stored and accessed in bulk.
·         Network attached storage (NAS) is usually based on file-level storage.
·         It presents data to end users and applications, typically organized in directories or folders in a hierarchical fashion.
·         A typical application of such storage is mass file storage.

Convergence of block and file storage:

Nowadays, new storage devices include the best of both worlds: block-level and file-level storage capabilities.

Tuesday 9 September 2014

Bitmap operations and APIs

Bitmap operations:


A bitmap can be used as an efficient data structure, where each bit carries one piece of information.
Let's say we have a large stock of items. We can use a long variable to represent the status (used/unused) of all the items, where each bit represents one item:
If a bit is 0 -> the item is free.
If a bit is 1 -> the item is busy.
We can keep a count of used/unused items in a separate variable.

The kernel provides bitwise operations that can be used to manipulate bitmaps. A few of them are listed below:

void set_bit(int pos, volatile unsigned long * addr)
sets the bit at position pos ; counting begins at addr.

int test_bit(int pos , const volatile unsigned long * addr)
checks whether the specified bit is set.

void clear_bit(int pos, volatile unsigned long * addr)
Deletes the bit at position pos (counting begins at addr).

void change_bit(int pos, volatile unsigned long * addr)
Toggle the bit value at position pos (counting begins at addr); in other words, a set bit is unset and vice versa.




clone() in Linux

clone ():

This library function creates a new process.
Depending on the flags passed, it allows the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers.
The main use of clone() is to implement threads: multiple threads of control in a program that run concurrently in a shared memory space.

fork() is also implemented on top of this function.
 
Prototype is:
int __clone(int (*fn) (void *), void *child_stack, int flags, void *arg)
where
fn: A pointer to a function that is called by the new child process at the beginning of its execution. When fn returns, its return value is the exit code of the child process.

child_stack: The location of the stack used by the child process. It usually points to the topmost address of the memory space set up for the child stack (as the stack grows downward).

flags: The low byte of flags contains the number of the termination signal sent to the parent when the child dies.

arg: The arg argument is passed to the fn function.


This function returns:
On success: the thread ID of the child is returned to the calling process.
On failure: no child process is created, -1 is returned, and errno is set appropriately.

The clone() call leaves all memory management up to the programmer. The first thing to do is allocate space for the stack of the new child thread with malloc(). Once that is done, clone() is issued to begin a new context of execution, starting at the beginning of the given function.

Raw clone() system call:

long clone(unsigned long flags, void *child_stack,
                      void *ptid, void *ctid,
                      struct pt_regs *regs);

Here execution in the child continues from the point of the call. As such, the fn and arg arguments of the clone() wrapper function are omitted. Furthermore, the argument order changes.

Another difference in the raw system call is that the child_stack argument may be zero, in which case copy-on-write semantics ensure that the child gets separate copies of stack pages when either process modifies the stack.

Monday 8 September 2014

vfork() in linux

vfork():


This function creates a child process and blocks the parent (calling) process.

It is used to create new processes without copying the page tables of the parent process. The system call avoids the address-space and page-table copying by suspending the parent process until the child terminates or executes a new binary image.

It may be useful in performance-sensitive applications where a child is
created which then immediately issues an execve().


Prototype:
#include <sys/types.h>
#include <unistd.h>

pid_t vfork(void);
The return values are the same as for fork().

Linux kernel locking mechanisms

The Linux kernel provides the following types of locking:

1. Atomic operations:

They guarantee that simple operations like increment and decrement of a counter are performed atomically without any interruption.


It is defined as:
typedef struct { volatile int counter; } atomic_t;

A short example, in which a counter is incremented by 1, is sufficient to demonstrate the need for operations of this kind. At the assembly level, incrementing is usually performed in three steps:
1. The counter value is copied from memory into a processor register.
2. The value is incremented.
3. The register value is written back to memory.

Let's say counter = 1 before the operation. Processor 1 and processor 2 may both read the counter value (step 1) simultaneously, both increment it, and both write 2 back, so the counter ends up at 2 and one increment is lost.
If instead we treat the counter as an atomic variable and perform atomic operations on it, its value will be 3, as expected.
With atomic operations, special assembly-level locking is used to prevent interference from other processors during the operation.


Atomic variables may be initialized using the ATOMIC_INIT() macro only. There are many atomic operations; a few of them are listed below:

atomic_read(atomic_t *v)             
Reads the value of the atomic variable.

atomic_set(atomic_t *v, int i)             
Sets v to i.

atomic_add(int i, atomic_t *v)              
Adds i to v.

atomic_sub(int i, atomic_t *v)               
Subtracts i from v.

atomic_inc(atomic_t *v)                        
Adds 1 to v.

atomic_inc_and_test(atomic_t *v)         
Adds 1 to v. Returns true if the result is 0, otherwise false.

atomic_dec(atomic_t *v)                        
Subtracts 1 from v.

atomic_dec_and_test(atomic_t *v)        
Subtracts 1 from v. Returns true if the result is 0,otherwise false.

NOTE:
Atomic operations are applicable on integer variables. Kernel provides separate APIs for bit level operations. You can find these operations here 

2. Spinlocks:

These locks are used for short term protection of critical section.
Spinlocks are implemented by means of spinlock_t data structure

Spinlocks are used as follows:

DEFINE_SPINLOCK(lock);
//for static initialization (the old SPIN_LOCK_UNLOCKED initializer was removed in 2.6.39)
spinlock_t lock;
spin_lock_init(&lock);
//for dynamic initialization

......
spin_lock(&lock);
/* Critical section */
spin_unlock(&lock);

The spin_lock() function handles two situations:
1. If the lock is not yet held anywhere else in the kernel, it is reserved for the current processor, and other processors may no longer enter the critical section.
2. If the lock is already held by another processor, spin_lock() goes into a busy loop, repeatedly checking whether the lock has been released by spin_unlock(). Once that happens, the lock is acquired and the processor enters the critical section.


Points to remember while using spinlocks:
* Code protected by a spinlock must not go to sleep.
For example, when we call kmalloc() and the kernel is short of memory, the function may sleep. If so, other processors may spin for a long time trying to acquire the lock. Pay full attention to which functions you call inside a spinlocked region.
* A spinlock cannot be acquired more than once by its current holder.
For example, when two functions use the same spinlock, a function that has already acquired the lock must not call another function that tries to acquire the same lock; the processor would wait forever for itself to release it.
* Spinlocks should not be held for long periods, because all processors waiting for the lock release are unavailable for other productive tasks.
* If a processor is inside a spinlocked region and an interrupt arrives on the same processor, a deadlock can occur: the interrupt handler waits for the lock, while the lock holder has been interrupted and will not continue until the interrupt has been processed.
To avoid this situation, the kernel provides other versions of the spinlock API, mentioned below.

spin_lock_irqsave() and spin_unlock_irqrestore():
                This version is always safe to use. It disables interrupts on the local processor; of course, it is also slower.

spin_lock_bh() and spin_unlock_bh():
                This version disables software interrupts (bottom halves) only.
  

3. Semaphores:

While waiting for a semaphore to be released, the kernel puts the caller to sleep until it is woken; only then does it attempt to acquire the semaphore.

Semaphores that are used in the kernel are defined by the structure mentioned below.

struct semaphore {
atomic_t count;
int sleepers;
wait_queue_head_t wait;
};

count: specifies how many processes may be in the critical region protected by the semaphore at the same time. The counter is decremented each time a process enters the critical section; when it reaches zero, no further process may enter.

sleepers: specifies the number of processes waiting to be allowed into the critical region.

wait: implements a queue holding the task structures of all processes sleeping on the semaphore.

When an attempt is made to acquire a reserved semaphore with down, the current process is put to sleep and placed on the wait queue associated with the semaphore. At the same time, the process is placed in the TASK_UNINTERRUPTIBLE state and cannot receive signals while waiting to enter the critical region.

If the semaphore is not reserved, the process may immediately continue without being put to sleep and enters the critical region.

Unlike spinlocks, waiting processes go to sleep and are not woken until the semaphore is free; this means that the relevant CPU can perform other tasks in the meantime.

Semaphores are suitable for protecting longer critical sections against parallel access. However, they should not be used for short sections, because it is very costly to put processes to sleep and wake them up again.


4. Mutexes

A mutex is a sleeping lock that implements mutual exclusion.
It behaves like a semaphore with a count of 1, but it has the concept of ownership: only the lock holder can release the lock.



DEFINE_MUTEX(name);
//for static initialization
mutex_init(&mutex);                                                   
//for dynamic initialization

......
mutex_lock(&mutex);
/* Critical section */


mutex_unlock(&mutex);

mutex_lock(struct mutex *) :
- Locks the given mutex; sleeps if the lock is unavailable
mutex_unlock(struct mutex *) :
- Unlocks the given mutex 
mutex_trylock(struct mutex *):
- Tries to acquire the given mutex; returns one if successful and the lock is acquired and zero otherwise

mutex_is_locked (struct mutex *):
- Returns one if the lock is locked and zero otherwise

5. Reader/writer locks

These allow multiple readers in a critical section of code, but anyone who wants to write must take the lock exclusively. The kernel provides reader/writer versions of semaphores and spinlocks, known as reader/writer semaphores and reader/writer spinlocks.

This kind of lock is useful for complex data structures like linked lists, especially for searching for entries without changing the list itself. Using the read lock, many concurrent readers can traverse the list, but anything that modifies the list must take the write lock.


Reader/Writer spinlock can be used as
rwlock_t my_lock = __RW_LOCK_UNLOCKED(my_lock);

unsigned long flags;

read_lock(&my_lock);
//critical section where multiple read is allowed (No write)
read_unlock(&my_lock);

write_lock(&my_lock);
//critical section where the writer has exclusive access
write_unlock(&my_lock);

Note: _trylock and _irqsave variants are also available.

Read/write semaphores can be used in a similar way. The equivalent data structure is struct rw_semaphore; down_read() and up_read() obtain read access to the critical region, and write access is performed with down_write() and up_write().


6. Completion variables

Completion variables are an easy way to synchronize two tasks in the kernel when one task needs to signal the other that an event has occurred. One task waits on the completion variable while the other performs some work; when that work is done, it uses the completion variable to wake up any waiting tasks.

Tasks that want to wait on a completion variable call wait_for_completion(). After the event has occurred, calling complete() or complete_all() wakes up all waiting tasks.

Example:
The vfork() system call uses a completion variable to wake up the parent process when the child process execs or exits.