Configuring storage pools with ZFS

From Public PIC Wiki

Revision as of 14:06, 25 June 2015

(Optional, recommended) Bypassing the LSI controller

  • [Dangerous!] Clear the controller configuration. This wipes ALL configuration on the controller and causes data loss:
/opt/MegaRAID/MegaCli/MegaCli64 -CfgClr -a0
  • [Optional] On old LSI controllers, create one RAID0 per disk. For reasons we have not identified, the disks do not work without RAID0 on these controllers; newer controllers do not need this step:
/opt/MegaRAID/MegaCli/MegaCli64 -CfgEachDskRaid0 -a0

Map your storage disks by-partuuid, by-path or by-id

  • Devices can be referenced by their kernel device names (e.g. sda, sdb). Linux can reassign these names, so a name may end up pointing to a different physical disk, which causes problems if the mapping changes.
  • Using by-partuuid, by-path or by-id is therefore recommended instead of device names.
/dev/disk/by-path -> This is the default method that will be used at PIC
/dev/disk/by-partuuid
/dev/disk/by-id
  • For instance, for dc106.pic.es which is a SuperMicro X8DT3 we have:
[root@dc106 ~]# ls -lha /dev/disk/by-path/  | grep -v part
total 0
drwxr-xr-x 2 root root 2.2K Jun 25 14:25 .
drwxr-xr-x 7 root root  140 Jun 25 14:25 ..
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:0:0 -> ../../sda
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:1:0 -> ../../sdb
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:2:0 -> ../../sdc
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:3:0 -> ../../sdd
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:4:0 -> ../../sde
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:5:0 -> ../../sdf
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:6:0 -> ../../sdg
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:7:0 -> ../../sdh
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:8:0 -> ../../sdi
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:9:0 -> ../../sdj
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:10:0 -> ../../sdk
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:11:0 -> ../../sdl
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:12:0 -> ../../sdm
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:13:0 -> ../../sdn
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:14:0 -> ../../sdo
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:15:0 -> ../../sdp
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:16:0 -> ../../sdq
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:17:0 -> ../../sdr
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:18:0 -> ../../sds
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:19:0 -> ../../sdt
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:20:0 -> ../../sdu
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:21:0 -> ../../sdv
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:22:0 -> ../../sdw
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:23:0 -> ../../sdx
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:24:0 -> ../../sdy
lrwxrwxrwx 1 root root    9 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:25:0 -> ../../sdz
lrwxrwxrwx 1 root root   10 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:26:0 -> ../../sdaa
lrwxrwxrwx 1 root root   10 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:27:0 -> ../../sdab
lrwxrwxrwx 1 root root   10 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:28:0 -> ../../sdac
lrwxrwxrwx 1 root root   10 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:29:0 -> ../../sdad
lrwxrwxrwx 1 root root   10 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:30:0 -> ../../sdae
lrwxrwxrwx 1 root root   10 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:31:0 -> ../../sdaf
lrwxrwxrwx 1 root root   10 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:32:0 -> ../../sdag
lrwxrwxrwx 1 root root   10 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:33:0 -> ../../sdah
lrwxrwxrwx 1 root root   10 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:34:0 -> ../../sdai
lrwxrwxrwx 1 root root   10 Jun 25 14:25 pci-0000:05:00.0-scsi-0:2:35:0 -> ../../sdaj
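
The listing above can also be produced as a compact "by-path name, device name" table. The following is a minimal sketch, not a PIC tool: the function name list_whole_disks is hypothetical, and it assumes partition links carry a "-part" suffix as udev normally names them.

```shell
#!/bin/sh
# list_whole_disks [DIR]: print "link-name device-name" for every symlink in
# DIR (default /dev/disk/by-path) that is not a partition link.
# Hypothetical helper, written for illustration only.
list_whole_disks() {
    dir="${1:-/dev/disk/by-path}"
    for link in "$dir"/*; do
        [ -L "$link" ] || continue                  # only consider symlinks
        name=$(basename "$link")
        case "$name" in *-part*) continue ;; esac   # skip partition links
        # readlink prints the raw target (e.g. ../../sda); keep the last component
        target=$(basename "$(readlink "$link")")
        printf '%s %s\n' "$name" "$target"
    done
}

# Usage: list_whole_disks            (reads /dev/disk/by-path)
#        list_whole_disks /dev/disk/by-id
```

The shell expands the glob in lexicographic order, so ...2:10:0 is printed before ...2:2:0; sort the output if the numeric slot order matters.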

Create the zpool with 2 parity disks

zpool create <zpool_name> <raid_type_1> <device_X> ... <device_X(n)> ... <raid_type_N> <device_Y> ... <device_Y(n)>

  • raidz2 is used so that each group has 2 parity disks.
Several raidz2 groups can be created in the same zpool.
To improve performance, it is strongly recommended to use a power of 2 data disks in each raidz2 (e.g. 4, 8, 16), plus the 2 parity disks. The example uses 18 disks per raidz2 (16 data + 2 parity).
  • Example:
zpool create -f pool raidz2 "pci-0000:05:00.0-scsi-0:2:0:0" "pci-0000:05:00.0-scsi-0:2:1:0" "pci-0000:05:00.0-scsi-0:2:2:0" "pci-0000:05:00.0-scsi-0:2:3:0" "pci-0000:05:00.0-scsi-0:2:4:0" "pci-0000:05:00.0-scsi-0:2:5:0" "pci-0000:05:00.0-scsi-0:2:6:0" "pci-0000:05:00.0-scsi-0:2:7:0" "pci-0000:05:00.0-scsi-0:2:8:0" "pci-0000:05:00.0-scsi-0:2:9:0" "pci-0000:05:00.0-scsi-0:2:10:0" "pci-0000:05:00.0-scsi-0:2:11:0" "pci-0000:05:00.0-scsi-0:2:12:0" "pci-0000:05:00.0-scsi-0:2:13:0" "pci-0000:05:00.0-scsi-0:2:14:0" "pci-0000:05:00.0-scsi-0:2:15:0" "pci-0000:05:00.0-scsi-0:2:16:0" "pci-0000:05:00.0-scsi-0:2:17:0" raidz2 "pci-0000:05:00.0-scsi-0:2:18:0" "pci-0000:05:00.0-scsi-0:2:19:0" "pci-0000:05:00.0-scsi-0:2:20:0" "pci-0000:05:00.0-scsi-0:2:21:0" "pci-0000:05:00.0-scsi-0:2:22:0" "pci-0000:05:00.0-scsi-0:2:23:0" "pci-0000:05:00.0-scsi-0:2:24:0" "pci-0000:05:00.0-scsi-0:2:25:0" "pci-0000:05:00.0-scsi-0:2:26:0" "pci-0000:05:00.0-scsi-0:2:27:0" "pci-0000:05:00.0-scsi-0:2:28:0" "pci-0000:05:00.0-scsi-0:2:29:0" "pci-0000:05:00.0-scsi-0:2:30:0" "pci-0000:05:00.0-scsi-0:2:31:0" "pci-0000:05:00.0-scsi-0:2:32:0" "pci-0000:05:00.0-scsi-0:2:33:0" "pci-0000:05:00.0-scsi-0:2:34:0" "pci-0000:05:00.0-scsi-0:2:35:0"
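
Typing 36 device names by hand is error-prone, so the command can be generated instead. This is a minimal sketch under stated assumptions: build_zpool_cmd is a hypothetical helper, it only prints the command (it never runs zpool), and it omits the -f flag used in the example so that the final decision to force stays with the operator.

```shell
#!/bin/sh
# build_zpool_cmd POOL GROUP_SIZE DEVICE...: print (do not run) a
# "zpool create" command that puts every GROUP_SIZE devices in one raidz2.
# Hypothetical helper, written for illustration only.
build_zpool_cmd() {
    pool=$1
    size=$2
    shift 2
    if [ "$#" -eq 0 ] || [ $(( $# % size )) -ne 0 ]; then
        echo "device count $# is not a positive multiple of $size" >&2
        return 1
    fi
    cmd="zpool create $pool"
    i=0
    for dev in "$@"; do
        # start a new raidz2 group at every GROUP_SIZE boundary
        if [ $(( i % size )) -eq 0 ]; then
            cmd="$cmd raidz2"
        fi
        cmd="$cmd $dev"
        i=$(( i + 1 ))
    done
    printf '%s\n' "$cmd"
}

# Usage for the 36-disk layout shown above (two raidz2 groups of 18):
#   devs=$(for n in $(seq 0 35); do printf 'pci-0000:05:00.0-scsi-0:2:%s:0 ' "$n"; done)
#   build_zpool_cmd dcpool 18 $devs
```

Review the printed command before running it, since creating a pool destroys any data on the listed disks.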

zpool status

[root@dc106 vpool1]# zpool status
  pool: dcpool
 state: ONLINE
  scan: none requested
config:

	NAME                                STATE     READ WRITE CKSUM
	dcpool                              ONLINE       0     0     0
	  raidz2-0                          ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:0:0   ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:1:0   ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:2:0   ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:3:0   ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:4:0   ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:5:0   ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:6:0   ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:7:0   ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:8:0   ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:9:0   ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:10:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:11:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:12:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:13:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:14:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:15:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:16:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:17:0  ONLINE       0     0     0
	  raidz2-1                          ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:18:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:19:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:20:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:21:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:22:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:23:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:24:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:25:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:26:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:27:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:28:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:29:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:30:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:31:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:32:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:33:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:34:0  ONLINE       0     0     0
	    pci-0000:05:00.0-scsi-0:2:35:0  ONLINE       0     0     0
 
 errors: No known data errors
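
With 36 disks, a single non-ONLINE row is easy to miss by eye. The following is a minimal sketch that filters the config table of zpool status; the function name non_online_devices is hypothetical, and it assumes the column layout shown above (NAME, STATE, READ, WRITE, CKSUM).

```shell
#!/bin/sh
# non_online_devices: read "zpool status" output on stdin and print
# "name state" for every config row whose STATE is not ONLINE.
# Hypothetical helper, written for illustration only.
non_online_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }  # table header
        $1 == "errors:"               { in_cfg = 0 }         # end of the table
        in_cfg && NF >= 5 && $2 != "ONLINE" { print $1, $2 }
    '
}

# Usage: zpool status dcpool | non_online_devices
```

An empty result means every row in the table is ONLINE. For a quick built-in check, zpool status -x reports only pools that have problems.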

Tuning up ZFS

  • Specify the zfs_arc_min and zfs_arc_max values to use. zfs_txg_timeout can also be specified.
zfs_arc_min: Determines the minimum size of the ZFS Adaptive Replacement Cache (ARC), in bytes.
zfs_arc_max: Determines the maximum size of the ZFS Adaptive Replacement Cache (ARC), in bytes.
zfs_txg_timeout: Specifies the transaction group timeout, in seconds.
  • ZFS tunables are specific to each environment:
Usually we set the ARC min to 33% and the ARC max to 75% of installed RAM. More RAM is better with ZFS, but be careful with current bugs in ZFS on Linux. Currently our max is 50%.
We set the transaction group timeout to 5 seconds to prevent the volume from appearing to freeze during a large batch of writes. 5 seconds is already the default, but it is safer to set it explicitly.
  • Example:
echo "options zfs zfs_arc_min=8589934592 zfs_arc_max=25769803776 zfs_txg_timeout=5" > /etc/modprobe.d/zfs.conf
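
The byte values can be derived from the installed RAM rather than computed by hand. This is a minimal sketch, assuming the percentages are applied to MemTotal as reported in kB by /proc/meminfo; the function names are hypothetical. As a sanity check, 50% of a host with 48 GiB of RAM gives 25769803776, the zfs_arc_max value in the example.

```shell
#!/bin/sh
# pct_of_ram_bytes MEM_KB PERCENT: print PERCENT% of MEM_KB (in kB) as bytes.
# Hypothetical helper, written for illustration only.
pct_of_ram_bytes() {
    echo $(( $1 * 1024 * $2 / 100 ))
}

# zfs_options_line MEM_KB MIN_PCT MAX_PCT TXG_TIMEOUT: print a modprobe
# options line in the same format as the example above.
zfs_options_line() {
    min=$(pct_of_ram_bytes "$1" "$2")
    max=$(pct_of_ram_bytes "$1" "$3")
    printf 'options zfs zfs_arc_min=%s zfs_arc_max=%s zfs_txg_timeout=%s\n' \
        "$min" "$max" "$4"
}

# Usage (prints the line; redirect it to /etc/modprobe.d/zfs.conf once reviewed):
#   mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   zfs_options_line "$mem_kb" 33 50 5
```

Options in /etc/modprobe.d/zfs.conf are read when the zfs module is loaded, so they take effect after the module is reloaded or the host is rebooted. The values in use can be read from /sys/module/zfs/parameters/zfs_arc_max and its siblings.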

Links of Interest