Get the most IOPS out of your physical volumes using LVM.

I hope everyone is aware of LVM (Logical Volume Manager), an extremely useful tool for managing storage at various levels. LVM works by layering abstractions on top of physical storage devices, as shown in the illustration below.

Below is a simple diagrammatic representation of LVM:

         sda1  sdb1   (PV:s on partitions or whole disks)
           \    /
            \  /
          vgmysql      (VG)
           / | \
         /   |   \
      data  log  tmp  (LV:s)
       |     |    |
      xfs  ext4  xfs  (filesystems)

When it comes to storage, IOPS (I/O operations per second) is an extremely important resource: it defines the performance of a disk. Let’s not forget PIOPS (Provisioned IOPS), one of the major selling points of AWS and other cloud vendors for production machines such as database servers. Since the disk is the slowest component in a server, we can compare the major components as below.

Consider the CPU to be in the speed range of a fighter jet, RAM in the speed range of an F1 car, and the hard disk in the speed range of a bullock cart. With modern hardware improvements, IOPS is also seeing significant gains thanks to SSDs.

In this blog, we are going to see how merging and striping multiple HDD drives lets us reap the benefit of their combined IOPS.

Below are the disks attached to my server. Each is an 11 TB disk with a maximum supported IOPS of 600.

# lsblk
NAME   MAJ:MIN  RM  SIZE  RO  TYPE  MOUNTPOINT
sda      8:0     0   10G    0   disk
sda1     8:1     0   10G    0   part        
sdb      8:16    0   10.9T  0   disk
sdc      8:32    0   10.9T  0   disk
sdd      8:48    0   10.9T  0   disk
sde      8:64    0   10.9T  0   disk
sdf      8:80    0   10.9T  0   disk
sdg      8:96    0   10.9T  0   disk

sda is the root partition; sd[b-g] are the attached HDD disks.

By merely merging these disks (linear concatenation), you gain easier space management, since the disks are clubbed together end to end, but the IOPS of a single disk. With striping, our aim is to get 600 × 6 = 3600 IOPS, or at least a value somewhere around 3.2k to 3.4k.

Now let’s proceed to create the PV (Physical volume)

# pvcreate /dev/sd[b-g]
Physical volume "/dev/sdb" successfully created.
Physical volume "/dev/sdc" successfully created.
Physical volume "/dev/sdd" successfully created.
Physical volume "/dev/sde" successfully created.
Physical volume "/dev/sdf" successfully created.
Physical volume "/dev/sdg" successfully created.

Validating the PV status:

# pvs
  PV         VG  Fmt  Attr PSize  PFree
  /dev/sdb       lvm2 ---  10.91t 10.91t
  /dev/sdc       lvm2 ---  10.91t 10.91t
  /dev/sdd       lvm2 ---  10.91t 10.91t
  /dev/sde       lvm2 ---  10.91t 10.91t
  /dev/sdf       lvm2 ---  10.91t 10.91t
  /dev/sdg       lvm2 ---  10.91t 10.91t

Let’s proceed to create a volume group (VG) named “vgmysql” from the PVs, with a physical extent (PE) size of 1 MiB (a PE is similar to the block size of a physical disk):

# vgcreate -s 1M vgmysql /dev/sd[b-g] -v
Wiping internal VG cache
Wiping cache of LVM-capable devices
Wiping signatures on new PV /dev/sdb.
Wiping signatures on new PV /dev/sdc.
Wiping signatures on new PV /dev/sdd.
Wiping signatures on new PV /dev/sde.
Wiping signatures on new PV /dev/sdf.
Wiping signatures on new PV /dev/sdg.
Adding physical volume '/dev/sdb' to volume group 'vgmysql'
Adding physical volume '/dev/sdc' to volume group 'vgmysql'
Adding physical volume '/dev/sdd' to volume group 'vgmysql'
Adding physical volume '/dev/sde' to volume group 'vgmysql'
Adding physical volume '/dev/sdf' to volume group 'vgmysql'
Adding physical volume '/dev/sdg' to volume group 'vgmysql'
Archiving volume group "vgmysql" metadata (seqno 0).
Creating volume group backup "/etc/lvm/backup/vgmysql" (seqno 1).
Volume group "vgmysql" successfully created

Let’s check the volume group status with vgdisplay:

# vgdisplay -v  
--- Volume group ---
VG Name               vgmysql
System ID
Format                lvm2
Metadata Areas        6
Metadata Sequence No  1
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                0
Open LV               0
Max PV                0
Cur PV                6
Act PV                6
VG Size               65.48 TiB
PE Size               1.00 MiB
Total PE              68665326
Alloc PE / Size       0 / 0
Free PE / Size        68665326 / 65.48 TiB
VG UUID               51KvHN-ZqgY-LyjH-znpq-Ufy2-AUVH-OqRNrN
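As a quick sanity check, the Total PE figure lines up with the reported VG size: 68,665,326 extents of 1 MiB each come to roughly 65.48 TiB.

```shell
# Total physical extents reported by vgdisplay; each PE is 1 MiB
total_pe=68665326
# 1 TiB = 1024 * 1024 MiB
vg_size_tib=$(awk -v pe="$total_pe" 'BEGIN { printf "%.2f", pe / (1024 * 1024) }')
echo "VG size: ${vg_size_tib} TiB"   # prints "VG size: 65.48 TiB"
```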

Now that our volume group is ready, let’s proceed to create a logical volume (LV) with a stripe size of 16 KiB, equal to the InnoDB page size of MySQL, striped across the 6 attached disks.

# lvcreate -L 7T -I 16k -i 6 -n mysqldata vgmysql
Rounding size 7.00 TiB (7340032 extents) up to stripe boundary size 7.00 TiB (7340034 extents).
Logical volume "mysqldata" created.

  • -L: volume size
  • -I: stripe size
  • -i: number of stripes (equal to the number of disks)
  • -n: LV name
  • vgmysql: the volume group to use
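The rounding message from lvcreate is plain extent arithmetic: 7 TiB at a 1 MiB extent size is 7,340,032 extents, which is not a multiple of 6 stripes, so LVM rounds up to the next stripe boundary, giving the 7,340,034 extents that lvdisplay later reports as Current LE.

```shell
# 7 TiB expressed in 1 MiB physical extents
extents=$(( 7 * 1024 * 1024 ))   # 7340032
stripes=6
# round up to the next multiple of the stripe count
rounded=$(( (extents + stripes - 1) / stripes * stripes ))
echo "$extents -> $rounded extents"   # prints "7340032 -> 7340034 extents"
```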

lvdisplay -m provides a complete view of the logical volume:

# lvdisplay -m
--- Logical volume ---
LV Path                /dev/vgmysql/mysqldata
LV Name                mysqldata
VG Name                vgmysql
LV UUID                Y6i7ql-ecfN-7lXz-GzzQ-eNsV-oax3-WVUKn6
LV Write Access        read/write
LV Creation host, time warehouse-db-archival-none, 2019-08-26 15:50:20 +0530
LV Status              available
# open                 0
LV Size                7.00 TiB
Current LE             7340034
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     384
Block device           254:0

--- Segments ---
Logical extents 0 to 7340033:
  Type        striped
  Stripes     6
  Stripe size 16.00 KiB

Now we will format the volume with XFS and mount it:

# mkfs.xfs /dev/mapper/vgmysql-mysqldata

Below are the mount options used

/dev/mapper/vgmysql-mysqldata on /var/lib/mysql type xfs (rw,noatime,nodiratime,attr2,nobarrier,inode64,sunit=32,swidth=192,noquota)
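Note that the sunit and swidth values in the mount output are expressed in 512-byte sectors and encode exactly our stripe geometry: sunit=32 is the 16 KiB per-disk stripe unit, and swidth=192 is the 96 KiB full stripe across all six disks.

```shell
# XFS reports sunit/swidth in 512-byte sectors
sunit=32; swidth=192; sector=512
echo "stripe unit:  $(( sunit * sector / 1024 )) KiB"    # per-disk chunk: 16 KiB
echo "stripe width: $(( swidth * sector / 1024 )) KiB"   # full stripe: 96 KiB
echo "stripes:      $(( swidth / sunit ))"               # number of disks: 6
```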

Now let’s run an FIO test to benchmark the IO.

Command:

# fio --randrepeat=1 --name=randrw --rw=randrw --direct=1 --ioengine=libaio --bs=16k --numjobs=10 --size=512M --runtime=60 --time_based --iodepth=64 --group_reporting

Result:

read : io=1467.8MB, bw=24679KB/s, iops=1542, runt= 60903msec
slat (usec): min=3, max=1362.7K, avg=148.74, stdev=8772.92
clat (msec): min=2, max=6610, avg=233.47, stdev=356.86
lat (msec): min=2, max=6610, avg=233.62, stdev=357.65
write: io=1465.1MB, bw=24634KB/s, iops=1539, runt= 60903msec
slat (usec): min=4, max=1308.1K, avg=162.97, stdev=8196.09
clat (usec): min=551, max=5518.4K, avg=180989.83, stdev=316690.67
lat (usec): min=573, max=5526.4K, avg=181152.80, stdev=317708.30

We got the desired IOPS of ~3.1k from the merged and striped LVM, as against the 600 IOPS of a single disk.
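Reading the fio numbers together: 1542 read + 1539 write comes to 3081 combined random IOPS, about five times what a single 600-IOPS disk can deliver, and the reported bandwidth is consistent with the 16 KiB block size.

```shell
read_iops=1542; write_iops=1539; bs_kib=16   # from the fio result above
echo "combined IOPS:    $(( read_iops + write_iops ))"     # 3081, ~5x a single disk
echo "expected read bw: $(( read_iops * bs_kib )) KiB/s"   # 24672, close to the 24679 KB/s reported
```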

Key Takeaways:

  • Management of storage becomes very easy with LVM
  • Distributing IOPS with striping enhances disk performance
  • LVM snapshots make point-in-time copies possible for backups

Downsides:

Every tool has its own downsides, and we should embrace them while considering the use case it serves best, which in our case is IOPS. One major downside is that if any one of the disks fails, this setup risks data loss or corruption across the whole volume, since every file is spread over all six disks.

Workarounds:

  • To avoid this data loss/corruption, we have set up HA by adding 3 slaves to this setup in production
  • Take regular backups of the striped LVM with XtraBackup, MEB, or via snapshots
  • RAID 0 also serves the same purpose as striped LVM

Featured Image by Carl J on Unsplash
