Chapter 5. Managing Multipath I/O for Devices

This section describes how to manage failover and path load balancing for multiple paths between the servers and block storage devices.

5.1. Understanding Multipathing

5.1.1. What Is Multipathing?

Multipathing is the ability of a server to communicate with the same physical or logical block storage device across multiple physical paths between the host bus adapters in the server and the storage controllers for the device, typically in Fibre Channel (FC) or iSCSI SAN environments. You can also achieve multiple connections with direct attached storage when multiple channels are available.

5.1.2. Benefits of Multipathing

Linux multipathing provides connection fault tolerance and can provide load balancing across the active connections. When multipathing is configured and running, it automatically isolates and identifies device connection failures, and reroutes I/O to alternate connections.

Typical connection problems involve faulty adapters, cables, or controllers. When you configure multipath I/O for a device, the multipath driver monitors the active connection between devices. When the multipath driver detects I/O errors for an active path, it fails over the traffic to the device’s designated secondary path. When the preferred path becomes healthy again, control can be returned to the preferred path.

5.2. Planning for Multipathing

5.2.1. Guidelines for Multipathing

Use the guidelines in this section when planning your multipath I/O solution.

5.2.1.1. Prerequisites

  • Multipathing is managed at the device level.

  • The storage array you use for the multipathed device must support multipathing. For more information, see Section 5.2.9, “Supported Storage Arrays for Multipathing”.

  • You need to configure multipathing only if multiple physical paths exist between host bus adapters in the server and host bus controllers for the block storage device. You configure multipath for the logical device as seen by the server.

5.2.1.2. Vendor-Provided Multipath Solutions

For some storage arrays, the vendor will provide its own multipathing software to manage multipathing for the array’s physical and logical devices. In this case, you should follow the vendor’s instructions for configuring multipathing for those devices.

5.2.1.3. Disk Management Tasks

Perform the following disk management tasks before you attempt to configure multipathing for a physical or logical device that has multiple paths:

  • Use third-party tools to carve physical disks into smaller logical disks.

  • Use third-party tools to partition physical or logical disks. If you change the partitioning in the running system, the Device Mapper Multipath (DM-MP) module does not automatically detect and reflect these changes. DM-MP must be reinitialized, which usually requires a reboot.

  • Use third-party SAN array management tools to create and configure hardware RAID devices.

  • Use third-party SAN array management tools to create logical devices such as LUNs. Logical device types that are supported for a given array depend on the array vendor.

5.2.1.4. Software RAIDs

The Linux software RAID management software runs on top of multipathing. For each device that has multiple I/O paths and that you plan to use in a software RAID, you must configure the device for multipathing before you attempt to create the software RAID device. Automatic discovery of multipathed devices is not available. The software RAID is not aware of the multipathing management running underneath.

5.2.1.5. High-Availability Solutions

High-availability solutions for clustering typically run on top of the multipathing server. For example, the Distributed Replicated Block Device (DRBD) high-availability solution for mirroring devices across a LAN runs on top of multipathing. For each device that has multiple I/O paths and that you plan to use in a DRBD solution, you must configure the device for multipathing before you configure DRBD.

5.2.1.6. Volume Managers

Volume managers such as LVM2 and EVMS run on top of multipathing. You must configure multipathing for a device before you use LVM2 or EVMS to create segment managers and file systems on it.

5.2.1.7. Virtualization Environments

When using multipathing in a virtualization environment, the multipathing is controlled in the host server environment. Configure multipathing for the device before you assign it to a virtual guest machine.

5.2.2. Using Multipathed Devices Directly or in EVMS

If you want to use the entire LUN directly (for example, if you are using the SAN features to partition your storage), you can simply use the /dev/disk/by-id/xxx names directly for mkfs, fstab, your application, etc.

If the user_friendly_names option or alias option is enabled in the /etc/multipath.conf file, you can optionally use the /dev/mapper/mpathN device name because this name is aliased to the device’s ID. Some limitations apply; for information, see Section 5.4.5.3, “Configuring User-Friendly Names or Alias Names in /etc/multipath.conf”.
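
For example, to create a file system on the whole LUN and mount it by its ID (a sketch only; the device ID shown matches the example ID used elsewhere in this chapter, and the mount point and ext3 file system are placeholders for your own values):

mkfs.ext3 /dev/disk/by-id/26353900f02796769

The corresponding /etc/fstab entry then also uses the by-id name:

/dev/disk/by-id/26353900f02796769  /data  ext3  defaults  0 2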

5.2.3. Using LVM2 on Multipath Devices

By default, LVM2 does not recognize multipathed devices. To make LVM2 recognize the multipathed devices as possible physical volumes, you must modify /etc/lvm/lvm.conf. It is important to modify it in a way that it does not scan and use the physical paths, but only accesses the multipath I/O storage through the multipath I/O layer.

To modify /etc/lvm/lvm.conf for multipath use:

  1. Open the /etc/lvm/lvm.conf file in a text editor.

  2. Change the filter and types entry in /etc/lvm/lvm.conf as follows:

    filter = [ "a|/dev/disk/by-id/.*|", "r|.*|" ]
    

    This allows LVM2 to scan only the by-id paths and reject everything else.

  3. If you are also using LVM2 on non-multipathed devices, make the necessary adjustments in the filter and types entries to suit your setup. Otherwise, the other LVM devices are not visible with a pvscan after you modify the lvm.conf file for multipathing.

    You want only those devices that are configured with LVM to be included in the LVM cache, so make sure you are specific about which other non-multipathed devices are included by the filter.

    For example, if your local disk is /dev/sda and all SAN devices are /dev/sdb and above, specify the local and multipathing paths in the filter as follows:

    filter = [ "a|/dev/sda.*|", "a|/dev/disk/by-id/.*|", "r|.*|" ]
    types = [ "device-mapper", 253 ]
    
  4. Save the file.

  5. Add dm-multipath to /etc/sysconfig/kernel:INITRD_MODULES.

  6. Make a new initrd to ensure that the Device Mapper Multipath services are loaded with the changed settings. Enter

    mkinitrd -f mpath
    
  7. Reboot the server to apply the changes.
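
After the reboot, you can check with pvscan that LVM2 reports physical volumes only through their /dev/disk/by-id paths (a verification sketch; the device ID and volume group name shown are illustrative):

pvscan
  PV /dev/disk/by-id/26353900f02796769   VG vg_san   lvm2 [127.00 GB / 0    free]

If any /dev/sdN path still appears in the output, adjust the filter in /etc/lvm/lvm.conf accordingly.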

5.2.4. Using mdadm with Multipath Devices

The mdadm tool requires that the devices be accessed by the ID rather than by the device node path. Therefore, the DEVICE entry in /etc/mdadm.conf should be set as follows:

DEVICE /dev/disk/by-id/*

This is the default handling in SUSE Linux Enterprise Server 10 and later.
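
For example, a minimal /etc/mdadm.conf for an array assembled from multipathed devices might look like the following (the array UUID is a placeholder, not a value from your system):

DEVICE /dev/disk/by-id/*
ARRAY /dev/md0 UUID=2a4f3e1d:5b6c7d8e:9f0a1b2c:3d4e5f60

With this setup, mdadm scans only the persistent by-id device names when assembling the array.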

5.2.5. Using --noflush with Multipath Devices

The option --noflush should always be used when running on multipath devices.

For example, in scripts where you perform a table reload, you use the --noflush option on resume to ensure that any outstanding I/O is not flushed, because you need the multipath topology information.

load
resume --noflush

5.2.6. SAN Timeout Settings When the Root Device Is Multipathed

A system with root (/) on a multipath device might stall when all paths have failed and are removed from the system because a dev_loss_tmo time-out is received from the storage subsystem (such as Fibre Channel storage arrays).

If the system device is configured with multiple paths and the multipath no_path_retry setting is active, you should modify the storage subsystem’s dev_loss_tmo setting accordingly to ensure that no devices are removed during an all-paths-down scenario. We strongly recommend that you set the dev_loss_tmo value to be equal to or higher than the no_path_retry setting from multipath.

The recommended setting for the storage subsystem’s dev_loss_tmo is:

<dev_loss_tmo> = <no_path_retry> * <polling_interval>

where the following definitions apply for the multipath values:

  • no_path_retry is the number of retries for multipath I/O until the path is considered to be lost and queuing of I/O is stopped.

  • polling_interval is the time in seconds between path checks.

Each of these multipath values should be set from the /etc/multipath.conf configuration file. For information, see Section 5.4.5, “Creating and Configuring the /etc/multipath.conf File”.
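
For example, with the following illustrative values in the defaults section of /etc/multipath.conf, the storage subsystem’s dev_loss_tmo should be set to at least 5 * 10 = 50 seconds:

defaults {
       no_path_retry     5
       polling_interval  10
}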

5.2.7. Partitioning Multipath Devices

SUSE Linux Enterprise Server 10

In SUSE Linux Enterprise Server 10, the kpartx software is called in /etc/init.d/boot.multipath to create the /dev/dm-* device entries for any newly created partitions without requiring a reboot. This triggers udevd to fill in the /dev/disk/by-* symlinks. The main benefit is that you can call kpartx with the new parameters without having to reboot the server.
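
For example, after adding a partition to a multipathed device, you might refresh the partition mappings for the map without rebooting (the map name mpath0 is a placeholder for your device’s map name):

kpartx -a /dev/mapper/mpath0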

SUSE Linux Enterprise Server 9

In SUSE Linux Enterprise Server 9, it is not possible to partition multipath I/O devices themselves. If the underlying physical device is already partitioned, the multipath I/O device reflects those partitions and the layer provides /dev/disk/by-id/<name>p1 ... pN devices so you can access the partitions through the multipath I/O layer. As a consequence, the devices need to be partitioned prior to enabling multipath I/O. If you change the partitioning in the running system, DM-MP does not automatically detect and reflect these changes. The device must be reinitialized, which usually requires a reboot.

5.2.8. Supported Architectures for Multipath I/O

The multipathing drivers and tools in SUSE Linux Enterprise Server 10 support all seven of the supported processor architectures: IA32, AMD64/EM64T, IPF/IA64, p-Series (32-bit and 64-bit), and z-Series (31-bit and 64-bit).

5.2.9. Supported Storage Arrays for Multipathing

The multipathing drivers and tools in SUSE Linux Enterprise Server 10 support most storage arrays. The storage array that houses the multipathed device must support multipathing in order to use the multipathing drivers and tools. Some storage array vendors provide their own multipathing management tools. Consult the vendor’s hardware documentation to determine what settings are required.

5.2.9.1. Storage Arrays That Are Automatically Detected for Multipathing

The multipath-tools package automatically detects the following storage arrays:

3PARdata VV
Compaq HSV110
Compaq MSA1000
DDN SAN MultiDirector
DEC HSG80
EMC CLARiiON CX
EMC Symmetrix
FSC CentricStor
Hewlett Packard (HP) A6189A
HP HSV110
HP HSV210
HP Open-
Hitachi DF400
Hitachi DF500
Hitachi DF600
IBM 3542
IBM ProFibre 4000R
NetApp
SGI TP9100
SGI TP9300
SGI TP9400
SGI TP9500
STK OPENstorage DS280
Sun StorEdge 3510
Sun T4

In general, most other storage arrays should work. When storage arrays are automatically detected, the default settings for multipathing apply. If you want non-default settings, you must manually create and configure the /etc/multipath.conf file. For information, see Section 5.4.5, “Creating and Configuring the /etc/multipath.conf File”.

Hardware that is not automatically detected requires an appropriate entry for configuration in the DEVICES section of the /etc/multipath.conf file. In this case, you must manually create and configure the configuration file. For information, see Section 5.4.5, “Creating and Configuring the /etc/multipath.conf File”.

Consider the caveats in the following sections:

5.2.9.2. Tested Storage Arrays for Multipathing Support

The following storage arrays have been tested with SUSE Linux Enterprise Server:

EMC
Hitachi
Hewlett-Packard/Compaq
IBM
NetApp
SGI

Most other vendors’ storage arrays should also work. Consult your vendor’s documentation for guidance. For a list of the default storage arrays recognized by the multipath-tools package, see Section 5.2.9.1, “Storage Arrays That Are Automatically Detected for Multipathing”.

5.2.9.3. Storage Arrays that Require Specific Hardware Handlers

Storage arrays that require special commands on failover from one path to the other or that require special nonstandard error handling might require more extensive support. Therefore, the Device Mapper Multipath service has hooks for hardware handlers. For example, one such handler for the EMC CLARiiON CX family of arrays is already provided.

[Important]

Consult the hardware vendor’s documentation to determine if its hardware handler must be installed for Device Mapper Multipath.

The multipath -t command shows an internal table of storage arrays that require special handling with specific hardware handlers. The displayed list is not an exhaustive list of supported storage arrays. It lists only those arrays that require special handling and that the multipath-tools developers had access to during the tool development.

[Important]

Arrays with true active/active multipath support do not require special handling, so they are not listed for the multipath -t command.

A listing in the multipath -t table does not necessarily mean that SUSE Linux Enterprise Server was tested on that specific hardware. For a list of tested storage arrays, see Section 5.2.9.2, “Tested Storage Arrays for Multipathing Support”.

5.3. Multipath Management Tools

The multipathing support in SUSE Linux Enterprise Server 10 is based on the Device Mapper Multipath module of the Linux 2.6 kernel and the multipath-tools userspace package. You can use mdadm to view the status of multipathed devices.

5.3.1. Device Mapper Multipath Module

The Device Mapper Multipath (DM-MP) module provides the multipathing capability for Linux. DM-MP is the preferred solution for multipathing on SUSE Linux Enterprise Server 10. It is the only multipathing option shipped with the product that is completely supported by Novell and SUSE.

DM-MP features automatic configuration of the multipathing subsystem for a large variety of setups. Configurations of up to 8 paths to each device are supported. Configurations are supported for active/passive (one path active, others passive) or active/active (all paths active with round-robin load balancing).

The DM-MP framework is extensible in two ways: through hardware handlers for storage arrays that require special failover commands (see Section 5.2.9.3, “Storage Arrays that Require Specific Hardware Handlers”), and through load-balancing algorithms beyond the default round-robin scheduler.

The user-space component of DM-MP takes care of automatic path discovery and grouping, as well as automated path retesting, so that a previously failed path is automatically reinstated when it becomes healthy again. This minimizes the need for administrator attention in a production environment.

DM-MP protects against failures in the paths to the device, and not failures in the device itself. If one of the active paths is lost (for example, a network adapter breaks or a fiber-optic cable is removed), I/O is redirected to the remaining paths. If the configuration is active/passive, then the path fails over to one of the passive paths. If you are using the round-robin load-balancing configuration, the traffic is balanced across the remaining healthy paths. If all active paths fail, inactive secondary paths must be woken up, so failover occurs with a delay of approximately 30 seconds.

If a disk array has more than one storage processor, make sure that the SAN switch has a connection to the storage processor that owns the LUNs you want to access. On most disk arrays, all LUNs belong to both storage processors, so both connections are active.

[Note]

On some disk arrays, the storage array manages the traffic through storage processors so that it presents only one storage processor at a time. One processor is active and the other one is passive until there is a failure. If you are connected to the wrong storage processor (the one with the passive path) you might not see the expected LUNs, or you might see the LUNs but get errors when trying to access them.

Table 5.1. Multipath I/O Features of Storage Arrays

Features of Storage Arrays

Description

Active/passive controllers

One controller is active and serves all LUNs. The second controller acts as a standby. The second controller also presents the LUNs to the multipath component so that the operating system knows about redundant paths. If the primary controller fails, the second controller takes over, and it serves all LUNs.

In some arrays, the LUNs can be assigned to different controllers. A given LUN is assigned to one controller to be its active controller. One controller does the disk I/O for any given LUN at a time, and the second controller is the standby for that LUN. The second controller also presents the paths, but disk I/O is not possible. Servers that use that LUN are connected to the LUN’s assigned controller. If the primary controller for a set of LUNs fails, the second controller takes over, and it serves all LUNs.

Active/active controllers

Both controllers share the load for all LUNs, and can process disk I/O for any given LUN. If one controller fails, the second controller automatically handles all traffic.

Load balancing

The Device Mapper Multipath driver automatically load balances traffic across all active paths.

Controller failover

When the active controller fails over to the passive, or standby, controller, the Device Mapper Multipath driver automatically activates the paths between the host and the standby, making them the primary paths.

Boot/Root device support

Multipathing is supported for the root (/) device in SUSE Linux Enterprise Server 10 and later. The host server must be connected to the currently active controller and storage processor for the boot device. The /boot partition must be on a separate, non-multipathed partition. Otherwise, no boot loader is written. For information, see Section 5.7, “Configuring Multipath I/O for the Root Device”.


Device Mapper Multipath detects every path for a multipathed device as a separate SCSI device. The SCSI device names take the form /dev/sdN, where N is an autogenerated letter for the device, beginning with a and issued sequentially as the devices are created, such as /dev/sda, /dev/sdb, and so on. If the number of devices exceeds 26, the letters are duplicated so that the next device after /dev/sdz will be named /dev/sdaa, /dev/sdab, and so on.

If multiple paths are not automatically detected, you can configure them manually in the /etc/multipath.conf file. The multipath.conf file does not exist until you create and configure it. For information, see Section 5.4.5, “Creating and Configuring the /etc/multipath.conf File”.

5.3.2. Multipath I/O Management Tools

The multipath-tools user-space package takes care of automatic path discovery and grouping. It automatically tests the path periodically, so that a previously failed path is automatically reinstated when it becomes healthy again. This minimizes the need for administrator attention in a production environment.

Table 5.2. Tools in the multipath-tools Package

Tool

Description

multipath

Scans the system for multipathed devices and assembles them.

multipathd

Waits for map events, then executes multipath.

devmap-name

Provides a meaningful device name to udev for device maps (devmaps).

kpartx

Maps linear devmaps to partitions on the multipathed device, which makes it possible to create multipath monitoring for partitions on the device.


The file list for a package can vary for different server architectures. For a list of files included in the multipath-tools package, go to the SUSE Linux Enterprise Server Technical Specifications > Package Descriptions Web page, find your architecture and select Packages Sorted by Name, then search on “multipath-tools” to find the package list for that architecture.

You can also determine the file list for an RPM file by querying the package itself with the rpm -ql or rpm -qpl command options.

  • To query an installed package, enter

    rpm -ql <package_name>
    
  • To query a package not installed, enter

    rpm -qpl <URL_or_path_to_package>
    

To check that the multipath-tools package is installed, do the following:

  • Ensure that the multipath-tools package is installed by entering the following at a terminal console prompt:

    rpm -q multipath-tools
    

    If it is installed, the response repeats the package name and provides the version information, such as:

    multipath-tools-0.4.7-34.23
    

    If it is not installed, the response reads:

    package multipath-tools is not installed
    

5.3.3. Using mdadm for Multipathed Devices

In SUSE Linux Enterprise Server 10, Udev is the default device handler, and devices are automatically known to the system by the Worldwide ID instead of by the device node name. This resolves problems in previous releases where mdadm.conf and lvm.conf did not properly recognize multipathed devices.

Just as for LVM2, mdadm requires that the devices be accessed by the ID rather than by the device node path. Therefore, the DEVICE entry in /etc/mdadm.conf should be set as follows:

DEVICE /dev/disk/by-id/*

This is the default handling in SUSE Linux Enterprise Server 10 and later, as noted above.

To verify that mdadm is installed:

  • Ensure that the mdadm package is installed by entering the following at a terminal console prompt:

    rpm -q mdadm
    

    If it is installed, the response repeats the package name and provides the version information. For example:

    mdadm-2.6-0.11
    

    If it is not installed, the response reads:

    package mdadm is not installed
    

For information about modifying the /etc/lvm/lvm.conf file, see Section 5.2.3, “Using LVM2 on Multipath Devices”.

5.3.4. The Linux multipath(8) Command

Use the Linux multipath(8) command to configure and manage multipathed devices.

Syntax

General syntax for the multipath(8) command:

multipath [-v verbosity] [-d] [-h|-l|-ll|-f|-F] [-p failover|multibus|group_by_serial|group_by_prio|group_by_node_name]

Options

multipath

Configures all multipath devices.

multipath devicename

Configures a specific multipath device.

Replace devicename with the device node name such as /dev/sdb (as shown by udev in the $DEVNAME variable), or in the major:minor format.

multipath -f

Selectively suppresses a multipath map and its device-mapped partitions.

multipath -d

Dry run. Displays potential multipath devices, but does not create any devices and does not update device maps.

multipath -v2 -d

Displays multipath map information for potential multipath devices in a dry run. The -v2 option shows only local disks. This verbosity level prints only the created or updated multipath names, for use by other tools such as kpartx.

There is no output if the device already exists and there are no changes. Use multipath -ll to see the status of configured multipath devices.

multipath -v2 devicename

Configures a specific potential multipath device and displays multipath map information for it. This verbosity level prints only the created or updated multipath names, for use by other tools such as kpartx.

There is no output if the device already exists and there are no changes. Use multipath -ll to see the status of configured multipath devices.

Replace devicename with the device node name such as /dev/sdb (as shown by udev in the $DEVNAME variable), or in the major:minor format.

multipath -v3

Configures potential multipath devices and displays multipath map information for them. This verbosity level prints all detected paths, multipaths, and device maps. Both wwid and devnode blacklisted devices are displayed.

multipath -v3 devicename

Configures a specific potential multipath device and displays information for it. The -v3 option shows the full path list. This verbosity level prints all detected paths, multipaths, and device maps. Both wwid and devnode blacklisted devices are displayed.

Replace devicename with the device node name such as /dev/sdb (as shown by udev in the $DEVNAME variable), or in the major:minor format.

multipath -ll

Display the status of all multipath devices.

multipath -ll devicename

Displays the status of a specified multipath device.

Replace devicename with the device node name such as /dev/sdb (as shown by udev in the $DEVNAME variable), or in the major:minor format.

multipath -F

Flushes all unused multipath device maps. This removes the multipath bindings for the paths; it does not delete the devices.

multipath -F devicename

Flushes unused multipath device maps for a specified multipath device. This removes the multipath binding for the paths; it does not delete the device.

Replace devicename with the device node name such as /dev/sdb (as shown by udev in the $DEVNAME variable), or in the major:minor format.

multipath -p [ failover | multibus | group_by_serial | group_by_prio | group_by_node_name ]

Sets the group policy by specifying one of the group policy options that are described in Table 5.3, “Group Policy Options for the multipath -p Command”:

Table 5.3. Group Policy Options for the multipath -p Command

Policy Option

Description

failover

One path per priority group. You can use only one path at a time.

multibus

All paths in one priority group.

group_by_serial

One priority group per detected SCSI serial number (the controller node worldwide number).

group_by_prio

One priority group per path priority value. Paths with the same priority are in the same priority group. Priorities are determined by callout programs specified as a global, per-controller, or per-multipath option in the /etc/multipath.conf configuration file.

group_by_node_name

One priority group per target node name. Target node names are fetched from the /sys/class/fc_transport/target*/node_name location.


5.4. Configuring the System for Multipathing

5.4.1. Preparing SAN Devices for Multipathing

Before configuring multipath I/O for your SAN devices, prepare the SAN devices, as necessary, by doing the following:

  • Configure and zone the SAN with the vendor’s tools.

  • Configure permissions for host LUNs on the storage arrays with the vendor’s tools.

  • Install the Linux HBA driver module. Upon module installation, the driver automatically scans the HBA to discover any SAN devices that have permissions for the host. It presents them to the host for further configuration.

    [Note]

    Ensure that the HBA driver you are using does not have native multipathing enabled.

    See the vendor’s specific instructions for more details.

  • After the driver module is loaded, discover the device nodes assigned to specific array LUNs or partitions.

  • If the SAN device will be used as the root device on the server, modify the timeout settings for the device as described in Section 5.2.6, “SAN Timeout Settings When the Root Device Is Multipathed”.

If the LUNs are not seen by the HBA driver, lsscsi can be used to check whether the SCSI devices are seen correctly by the operating system. When the LUNs are not seen by the HBA driver, check the zoning setup of the SAN. In particular, check whether LUN masking is active and whether the LUNs are correctly assigned to the server.
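
For example, lsscsi lists each SCSI device with its host:bus:target:lun address and device node (the output shown is illustrative; vendor, model, and device names depend on your hardware):

lsscsi
[1:0:0:2]    disk    XIOtech  Magnitude 3D     3.00  /dev/sdf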

If the LUNs are seen by the HBA driver, but there are no corresponding block devices, additional kernel parameters are needed to change the SCSI device scanning behavior, such as to indicate that LUNs are not numbered consecutively. For information, see Options for SCSI Device Scanning in the Novell Support Knowledgebase.

5.4.2. Partitioning Multipathed Devices

Partitioning devices that have multiple paths is not recommended, but it is supported.

5.4.2.1. SUSE Linux Enterprise Server 10

In SUSE Linux Enterprise Server 10, you can use the kpartx tool to create partitions on multipathed devices without rebooting. You can also partition the device before you attempt to configure multipathing by using the Partitioner function in YaST2 or by using a third-party partitioning tool.

5.4.2.2. SUSE Linux Enterprise Server 9

In SUSE Linux Enterprise Server 9, if you want to partition the device, you should configure its partitions before you attempt to configure multipathing by using the Partitioner function in YaST2 or by using a third-party partitioning tool. This is necessary because partitioning an existing multipathed device is not supported. Partitioning operations on multipathed devices fail if attempted.

If you configure partitions for a device, DM-MP automatically recognizes the partitions and indicates them by appending p1 to pN to the device’s ID, such as

/dev/disk/by-id/26353900f02796769p1

To partition multipathed devices, you must disable the DM-MP service, partition the normal device node (such as /dev/sdc), then reboot to allow the DM-MP service to see the new partitions.
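
A minimal command sketch of this workaround (the device node /dev/sdc is the example node used above; fdisk is only one of several partitioning tools you can use):

/etc/init.d/multipathd stop
fdisk /dev/sdc      # partition the underlying device node
reboot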

5.4.3. Configuring the Server for Multipathing

The system must be manually configured so that the device drivers for the controllers to which the multipath I/O devices are connected are loaded automatically in the INITRD. Therefore, add the needed driver module to the INITRD_MODULES variable in the file /etc/sysconfig/kernel.

For example, if your system contains a RAID controller accessed by the cciss driver and multipathed devices connected to a QLogic controller accessed by the driver qla2xxx, this entry would look like:

   INITRD_MODULES="cciss"

Because the QLogic driver is not automatically loaded on start-up, add it here:

   INITRD_MODULES="cciss qla23xx"

After having changed /etc/sysconfig/kernel, you must re-create the INITRD on your system with the command mkinitrd, then reboot in order for the changes to take effect.

When you are using LILO as a boot manager, reinstall it with the command /sbin/lilo. No further action is required if you are using GRUB.
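
For example, as the root user:

mkinitrd
# Only when LILO is the boot manager:
/sbin/lilo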

5.4.4. Adding multipathd to the Boot Sequence

Use either of the methods in this section to add multipath I/O services (multipathd) to the boot sequence.

5.4.4.1. YaST

  1. In YaST, click System > System Services (Runlevel) > Simple Mode.

  2. Select multipathd, then click Enable.

  3. Click OK to acknowledge the service startup message.

  4. Click Finish, then click Yes.

    The changes do not take effect until the server is restarted.

5.4.4.2. Command Line

  1. Open a terminal console, then log in as the root user or equivalent.

  2. At the terminal console prompt, enter

    insserv multipathd
    

5.4.5. Creating and Configuring the /etc/multipath.conf File

The /etc/multipath.conf file does not exist unless you create it. The /usr/share/doc/packages/multipath-tools/multipath.conf.synthetic file contains a sample /etc/multipath.conf file that you can use as a guide for multipath settings. See /usr/share/doc/packages/multipath-tools/multipath.conf.annotated for a template with extensive comments for each of the attributes and their options.

5.4.5.1. Creating the multipath.conf File

If the /etc/multipath.conf file does not exist, copy the example to create the file:

  1. In a terminal console, log in as the root user.

  2. Enter the following command (all on one line, of course) to copy the template:

    cp /usr/share/doc/packages/multipath-tools/multipath.conf.synthetic /etc/multipath.conf
    
  3. Use the /usr/share/doc/packages/multipath-tools/multipath.conf.annotated file as a reference to determine how to configure multipathing for your system.

  4. Make sure there is an appropriate device entry for your SAN. Most vendors provide documentation on the proper setup of the device section.

    The /etc/multipath.conf file requires a different device section for different SANs. If you are using a storage subsystem that is automatically detected (see Section 5.2.9.1, “Storage Arrays That Are Automatically Detected for Multipathing”), the default entry for that device can be used; no further configuration of the /etc/multipath.conf file is required.

  5. Save the file.

5.4.5.2. Verifying the Setup in the /etc/multipath.conf File

After setting up the configuration, you can perform a dry run by entering

multipath -v2 -d

This command scans the devices, then displays what the setup would look like. The output is similar to the following:

26353900f02796769
[size=127 GB]
[features="0"]
[hwhandler="1    emc"] 
\_ round-robin 0 [first]
  \_ 1:0:1:2 sdav 66:240  [ready ]
  \_ 0:0:1:2 sdr  65:16   [ready ]
\_ round-robin 0 
  \_ 1:0:0:2 sdag 66:0    [ready ]
  \_ 0:0:0:2 sdc   8:32   [ready ] 

Paths are grouped into priority groups. Only one priority group is in active use at a time. To model an active/active configuration, all paths end up in the same group. To model an active/passive configuration, the paths that should not be active in parallel are placed in several distinct priority groups. This normally happens automatically on device discovery.

The output shows the order, the scheduling policy used to balance I/O within the group, and the paths for each priority group. For each path, its physical address (host:bus:target:lun), device node name, major:minor number, and state are shown.

By using a verbosity level of -v3 in the dry run, you can see all detected paths, multipaths, and device maps. Both wwid and device node blacklisted devices are displayed.

multipath -v3 -d

The following is an example of -v3 output on a 64-bit SLES server with two QLogic HBAs connected to a Xiotech Magnitude 3000 SAN. Some of the multiple entries have been omitted to shorten the example.

dm-22: device node name blacklisted
< content omitted >
loop7: device node name blacklisted
< content omitted >
md0: device node name blacklisted
< content omitted >
dm-0: device node name blacklisted
sdf: not found in pathvec
sdf: mask = 0x1f
sdf: dev_t = 8:80
sdf: size = 105005056
sdf: subsystem = scsi
sdf: vendor = XIOtech
sdf: product = Magnitude 3D
sdf: rev = 3.00
sdf: h:b:t:l = 1:0:0:2
sdf: tgt_node_name = 0x202100d0b2028da
sdf: serial = 000028DA0014
sdf: getuid = /sbin/scsi_id -g -u -s /block/%n (config file default)
sdf: uid = 200d0b2da28001400 (callout)
sdf: prio = const (config file default)
sdf: const prio = 1
< content omitted >
ram15: device node name blacklisted
< content omitted >
===== paths list =====
uuid              hcil    dev dev_t pri dm_st  chk_st  vend/prod/rev
200d0b2da28001400 1:0:0:2 sdf 8:80  1   [undef][undef] XIOtech,Magnitude 3D
200d0b2da28005400 1:0:0:1 sde 8:64  1   [undef][undef] XIOtech,Magnitude 3D
200d0b2da28004d00 1:0:0:0 sdd 8:48  1   [undef][undef] XIOtech,Magnitude 3D
200d0b2da28001400 0:0:0:2 sdc 8:32  1   [undef][undef] XIOtech,Magnitude 3D
200d0b2da28005400 0:0:0:1 sdb 8:16  1   [undef][undef] XIOtech,Magnitude 3D
200d0b2da28004d00 0:0:0:0 sda 8:0   1   [undef][undef] XIOtech,Magnitude 3D
params = 0 0 2 1 round-robin 0 1 1 8:80 1000 round-robin 0 1 1 8:32 1000
status = 2 0 0 0 2 1 A 0 1 0 8:80 A 0 E 0 1 0 8:32 A 0
sdf: mask = 0x4
sdf: path checker = directio (config file default)
directio: starting new request
directio: async io getevents returns 1 (errno=Success)
directio: io finished 4096/0
sdf: state = 2
< content omitted >

5.4.5.3. Configuring User-Friendly Names or Alias Names in /etc/multipath.conf

A multipath device can be identified by its WWID, by a user-friendly name, or by an alias that you assign for it. Table 5.4, “Comparison of Multipath Device Name Types” describes the types of device names that can be used for a device in the /etc/multipath.conf file.

Table 5.4. Comparison of Multipath Device Name Types

Name Types

Description

WWID (default)

The WWID (Worldwide Identifier) is an identifier for the multipath device that is guaranteed to be globally unique and unchanging. The default name used in multipathing is the ID of the logical unit as found in the /dev/disk/by-id directory. Because device node names in the form of /dev/sdn and /dev/dm-n can change on reboot, referring to multipath devices by their ID is preferred.

User-friendly

The Device Mapper Multipath device names in the /dev/mapper directory also reference the ID of the logical unit. These multipath device names are user-friendly names in the form of /dev/mapper/mpath<n>, such as /dev/mapper/mpath0. The names are unique and persistent because they use the /var/lib/multipath/bindings file to track the association between the UUID and user-friendly names.

Alias

An alias name is a globally unique name that the administrator provides for a multipath device. Alias names override the WWID and the user-friendly /dev/mapper/mpathN names.


The global multipath user_friendly_names option in the /etc/multipath.conf file is used to enable or disable the use of user-friendly names for multipath devices. If it is set to “no” (the default), multipath uses the WWID as the name of the device. If it is set to “yes”, multipath uses the /var/lib/multipath/bindings file to assign a persistent and unique name to the device in the form of mpath<n>. The bindings_file option in the /etc/multipath.conf file can be used to specify an alternate location for the bindings file.

The global multipath alias option in the /etc/multipath.conf file is used to explicitly assign a name to the device. If an alias name is set up for a multipath device, the alias is used instead of the WWID or the user-friendly name.

Using the user_friendly_names option can be problematic in the following situations:

  • Root Device Is Using Multipath: If the system root device is using multipath and you use the user_friendly_names option, the user-friendly settings in the /var/lib/multipath/bindings file are included in the initrd. If you later change the storage setup, such as by adding or removing devices, there is a mismatch between the bindings setting inside the initrd and the bindings settings in /var/lib/multipath/bindings.

    [Warning]

    A bindings mismatch between initrd and /var/lib/multipath/bindings can lead to a wrong assignment of mount points to devices, which can result in file system corruption and data loss.

    To avoid this problem, we recommend that you use the default WWID settings for the system root device. You can also use the alias option to override the user_friendly_names option for the system root device in the /etc/multipath.conf file.

    For example:

    multipaths {
           multipath {
                   wwid           36006048000028350131253594d303030
                   alias             mpatha
           }
           multipath {
                   wwid           36006048000028350131253594d303041
                   alias             mpathb
           }
           multipath {
                   wwid           36006048000028350131253594d303145
                   alias             mpathc
           }
           multipath {
                   wwid           36006048000028350131253594d303334
                   alias             mpathd
           }
    }
    
    [Important]

    We recommend that you do not use aliases for the system root device, because the device name then differs from the WWID and the ability to seamlessly switch off multipathing via the kernel command line is lost.

  • Mounting /var from Another Partition: The default location of the user_friendly_names configuration file is /var/lib/multipath/bindings. If the /var data is not located on the system root device but mounted from another partition, the bindings file is not available when setting up multipathing.

    Make sure that the /var/lib/multipath/bindings file is available on the system root device and multipath can find it. For example, this can be done as follows:

    1. Move the /var/lib/multipath/bindings file to /etc/multipath/bindings.

    2. Set the bindings_file option in the defaults section of /etc/multipath.conf to this new location. For example:

      defaults {
                     user_friendly_names yes
                     bindings_file "/etc/multipath/bindings"
      }
      
  • Multipath Is in the initrd: Even if the system root device is not on multipath, it is possible for multipath to be included in the initrd. For example, this can happen if the system root device is on LVM. If you use the user_friendly_names option and multipath is in the initrd, you should boot with the parameter multipath=off to avoid problems.

    This disables multipath only in the initrd during system boot. After the system boots, the boot.multipath and multipathd boot scripts are able to activate multipathing.

For an example of multipath.conf settings, see the /usr/share/doc/packages/multipath-tools/multipath.conf.synthetic file.

To enable user-friendly names or to specify aliases:

  1. In a terminal console, log in as the root user.

  2. Open the /etc/multipath.conf file in a text editor.

  3. (Optional) Modify the location of the /var/lib/multipath/bindings file.

    The alternate path must be available on the system root device where multipath can find it.

    1. Move the /var/lib/multipath/bindings file to /etc/multipath/bindings.

    2. Set the bindings_file option in the defaults section of /etc/multipath.conf to this new location. For example:

      defaults {
                     user_friendly_names yes
                     bindings_file "/etc/multipath/bindings"
      }
      
  4. (Optional) Enable user-friendly names:

    1. Uncomment the defaults section and its ending bracket.

    2. Uncomment the user_friendly_names option, then change its value from No to Yes.

      For example:

      ## Use user friendly names, instead of using WWIDs as names.
      defaults {
        user_friendly_names yes
      }
      
  5. (Optional) Specify your own names for devices by using the alias option in the multipath section.

    For example:

    ## Use alias names, instead of using WWIDs as names.
    multipaths {
           multipath {
                   wwid           36006048000028350131253594d303030
                   alias             mpatha
           }
           multipath {
                   wwid           36006048000028350131253594d303041
                   alias             mpathb
           }
           multipath {
                   wwid           36006048000028350131253594d303145
                   alias             mpathc
           }
           multipath {
                   wwid           36006048000028350131253594d303334
                   alias             mpathd
           }
    }
    
  6. Save your changes, then close the file.

5.4.5.4. Blacklisting Non-Multipathed Devices in /etc/multipath.conf

The /etc/multipath.conf file should contain a blacklist section where all non-multipathed devices should be listed. For example, local IDE hard drives and floppy drives are not normally multipathed. If you have single-path devices that multipath is trying to manage and you want multipath to ignore them, put them in the blacklist section to resolve the problem.

[Note]

Beginning in SLES 10 SP3, the keyword devnode_blacklist has been deprecated and replaced with the keyword blacklist.

For example, to blacklist local devices and all arrays from the cciss driver from being managed by multipath, the blacklist section looks like this:

blacklist {
      wwid 26353900f02796769
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda)[0-9]*"
      devnode "^hd[a-z][0-9]*"
      devnode "^cciss!c[0-9]d[0-9].*"
}

You can also blacklist only the partitions from a driver instead of the entire array. For example, using the following regular expression would blacklist only partitions from the cciss driver and not the entire array:

^cciss!c[0-9]d[0-9]*[p[0-9]*]

After you modify the /etc/multipath.conf file, you must run mkinitrd to re-create the INITRD on your system, then reboot in order for the changes to take effect.

Afterwards, the local devices should no longer be listed in the multipath maps when you issue the multipath -ll command.

5.4.5.5. Configuring Default Multipath Behavior in /etc/multipath.conf

The /etc/multipath.conf file should contain a defaults section where you can specify default behaviors. If the field is not otherwise specified in a device section, the default setting is applied for that SAN configuration.

The following defaults section specifies a simple failover policy:

defaults {
       multipath_tool  "/sbin/multipath -v0"
       udev_dir        /dev
       polling_interval 10
       default_selector        "round-robin 0"
       default_path_grouping_policy    failover
       default_getuid "/sbin/scsi_id -g -u -s /block/%n"
       default_prio_callout    "/bin/true"
       default_features        "0"
       rr_min_io               100
       failback                immediate
}

[Note]

In the default_getuid command line, use the path /sbin/scsi_id as shown in the example above instead of the sample path /lib/udev/scsi_id that is found in the /usr/share/doc/packages/multipath-tools/multipath.conf.synthetic file (and in the default and annotated sample files).

5.4.5.6. Applying Changes Made to the /etc/multipath.conf File

Changes to the /etc/multipath.conf file cannot take effect when multipathd is running. After you make changes, save and close the file, then do the following to apply the changes:

  1. Stop the multipathd service.

  2. Clear old multipath bindings by entering

    /sbin/multipath -F
    
  3. Create new multipath bindings by entering

    /sbin/multipath -v2
    
  4. Start the multipathd service.

  5. Run mkinitrd to re-create the INITRD on your system, then reboot in order for the changes to take effect.
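
Run from a root shell, the sequence looks like this (the init script path follows the service names used elsewhere in this chapter):

/etc/init.d/multipathd stop
/sbin/multipath -F
/sbin/multipath -v2
/etc/init.d/multipathd start
mkinitrd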

5.5. Enabling and Starting Multipath I/O Services

To start multipath services and enable them to start at reboot:

  1. Open a terminal console, then log in as the root user or equivalent.

  2. At the terminal console prompt, enter

    chkconfig multipathd on
    
    chkconfig boot.multipath on
    

If the boot.multipath service does not start automatically on system boot, do the following:

  1. Open a terminal console, then log in as the root user or equivalent.

  2. Enter

    /etc/init.d/boot.multipath start
    
    /etc/init.d/multipathd start
    

5.6. Configuring Path Failover Policies and Priorities

In a Linux host, when there are multiple paths to a storage controller, each path appears as a separate block device, which results in multiple block devices for a single LUN. The Device Mapper Multipath service detects multiple paths with the same LUN ID, and creates a new multipath device with that ID. For example, a host with two HBAs attached to a storage controller with two ports via a single unzoned Fibre Channel switch sees four block devices: /dev/sda, /dev/sdb, /dev/sdc, and /dev/sdd. The Device Mapper Multipath service creates a single block device, /dev/mpath/mpath1, that reroutes I/O through those four underlying block devices.

This section describes how to specify policies for failover and configure priorities for the paths.

5.6.1. Configuring the Path Failover Policies

Use the multipath command with the -p option to set the path failover policy:

multipath devicename -p policy 

Replace policy with one of the following policy options:

Table 5.5. Group Policy Options for the multipath -p Command

Policy Option

Description

failover

One path per priority group.

multibus

All paths in one priority group.

group_by_serial

One priority group per detected serial number.

group_by_prio

One priority group per path priority value. Priorities are determined by callout programs specified as a global, per-controller, or per-multipath option in the /etc/multipath.conf configuration file.

group_by_node_name

One priority group per target node name. Target node names are fetched from the /sys/class/fc_transport/target*/node_name location.
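
For example, to place all paths of a device into a single priority group (the device node /dev/sdb is illustrative):

multipath /dev/sdb -p multibus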


5.6.2. Configuring Failover Priorities

You must manually enter the failover priorities for the device in the /etc/multipath.conf file. Examples for all settings and options can be found in the /usr/share/doc/packages/multipath-tools/multipath.conf.annotated file.

5.6.2.1. Understanding Priority Groups and Attributes

A priority group is a collection of paths that go to the same physical LUN. By default, I/O is distributed in a round-robin fashion across all paths in the group. The multipath command automatically creates priority groups for each LUN in the SAN based on the path_grouping_policy setting for that SAN. The multipath command multiplies the number of paths in a group by the group’s priority to determine which group is the primary. The group with the highest calculated value is the primary. When all paths in the primary group are failed, the priority group with the next highest value becomes active.

A path priority is an integer value assigned to a path. The higher the value, the higher the priority. An external program is used to assign priorities for each path. For a given device, paths with the same priority belong to the same priority group.
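
For example, under the group_by_prio policy, a group of two paths that each have priority 50 yields a value of 100, and a group of two paths that each have priority 10 yields a value of 20, so the first group becomes the primary group.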

Table 5.6. Multipath Attributes

Multipath Attribute

Description

Values

user_friendly_names

Specifies whether to use IDs or to use the /var/lib/multipath/bindings file to assign a persistent and unique alias to the multipath devices in the form of /dev/mapper/mpathN.

yes. Autogenerate user-friendly names as aliases for the multipath devices instead of the actual ID.

no. Default. Use the WWIDs shown in the /dev/disk/by-id/ location.

blacklist

Specifies the list of device names to ignore as non-multipathed devices, such as cciss, fd, hd, md, dm, sr, scd, st, ram, raw, loop.

For an example, see Section 5.4.5.4, “Blacklisting Non-Multipathed Devices in /etc/multipath.conf”.

blacklist_exceptions

Specifies the list of device names to treat as multipath devices even if they are included in the blacklist.

For an example, see the /usr/share/doc/packages/multipath-tools/multipath.conf.annotated file.

getuid

The default program and arguments to call out to obtain a unique path identifier. Should be specified with an absolute path.

/sbin/scsi_id -g -u -s /block/%n

This is the default location and arguments.

Example:

getuid "/sbin/scsi_id -g -u -d /dev/%n"

path_grouping_policy

Specifies the path grouping policy for a multipath device hosted by a given controller.

failover. One path is assigned per priority group so that only one path at a time is used.

multibus. (Default) All valid paths are in one priority group. Traffic is load-balanced across all active paths in the group.

group_by_prio. One priority group exists for each path priority value. Paths with the same priority are in the same priority group. Priorities are assigned by an external program.

group_by_serial. Paths are grouped by the SCSI target serial number (controller node WWN).

group_by_node_name. One priority group is assigned per target node name. Target node names are fetched in /sys/class/fc_transport/target*/node_name.

path_checker

Determines the state of the path.

directio. (Default in multipath-tools version 0.4.8 and later) Reads the first sector of the device using direct I/O. This is useful for DASD devices. Logs failure messages in /var/log/messages.

readsector0. (Default in multipath-tools version 0.4.7 and earlier) Reads the first sector of the device. Logs failure messages in /var/log/messages.

tur. Issues a SCSI test unit ready command to the device. This is the preferred setting if the LUN supports it. The command does not fill up /var/log/messages on failure with messages.

Some SAN vendors provide custom path_checker options:

  • emc_clariion. Queries the EMC Clariion EVPD page 0xC0 to determine the path state.

  • hp_sw. Checks the path state (Up, Down, or Ghost) for HP storage arrays with Active/Standby firmware.

  • rdac. Checks the path state for the LSI/Engenio RDAC storage controller.

path_selector

Specifies the path-selector algorithm to use for load-balancing.

round-robin 0. (Default) The load-balancing algorithm used to balance traffic across all active paths in a priority group.

This is currently the only algorithm available.

pg_timeout

Specifies path group timeout handling.

NONE (internal default)

prio_callout

Specifies the program and arguments to use to determine the layout of the multipath map.

When queried by the multipath command, the specified mpath_prio_* callout program returns the priority for a given path in relation to the entire multipath layout.

When it is used with the path_grouping_policy of group_by_prio, all paths with the same priority are grouped into one multipath group. The group with the highest aggregate priority becomes the active group.

When all paths in a group fail, the group with the next highest aggregate priority becomes active. Additionally, a failover command (as determined by the hardware handler) might be sent to the target.

The mpath_prio_* program can also be a custom script created by a vendor or administrator for a specified setup.

A %n in the command line expands to the device name in the /dev directory.

A %b expands to the device number in major:minor format in the /dev directory.

A %d expands to the device ID in the /dev/disk/by-id directory.

If devices are hot-pluggable, use the %d flag instead of %n. This addresses the short time that elapses between the time when devices are available and when udev creates the device nodes.

If no prio_callout attribute is used, all paths are equal. This is the default.

/bin/true. Use this value when the group_by_prio policy is not being used.

The prioritizer programs generate path priorities when queried by the multipath command. The program names must begin with mpath_prio_ and are named by the device type or balancing method used. Current prioritizer programs include the following:

/sbin/mpath_prio_alua %n. Generates path priorities based on the SCSI-3 ALUA settings.

/sbin/mpath_prio_balance_units. Generates the same priority for all paths.

/sbin/mpath_prio_emc %n. Generates the path priority for EMC arrays.

/sbin/mpath_prio_hds_modular %b. Generates the path priority for Hitachi HDS Modular storage arrays.

/sbin/mpath_prio_hp_sw %n. Generates the path priority for Compaq/HP controller in active/standby mode.

/sbin/mpath_prio_netapp %n. Generates the path priority for NetApp arrays.

/sbin/mpath_prio_random %n. Generates a random priority for each path.

/sbin/mpath_prio_rdac %n. Generates the path priority for LSI/Engenio RDAC controller.

/sbin/mpath_prio_tpc %n. You can optionally use a script created by a vendor or administrator that gets the priorities from a file where you specify priorities to use for each path.

/usr/local/sbin/mpath_prio_spec.sh %n. Provides the path of a user-created script that generates the priorities for multipathing based on information contained in a second data file. (This path and filename are provided as an example. Specify the location of your script instead.) The script can be created by a vendor or administrator. The script’s target file identifies each path for all multipathed devices and specifies a priority for each path. For an example, see Section 5.6.3, “Using a Script to Set Path Priorities”.

rr_min_io

Specifies the number of I/O transactions to route to a path before switching to the next path in the same path group, as determined by the specified algorithm in the path_selector setting.

n (>0).  Specify an integer value greater than 0.

1000.  Default.

rr_weight

Specifies the weighting method to use for paths.

uniform.  Default. All paths have the same round-robin weightings.

priorities.  Each path’s weighting is determined by the path’s priority times the rr_min_io setting.

no_path_retry

Specifies the behaviors to use on path failure.

n (> 0).  Specifies the number of retries until multipath stops the queuing and fails the path. Specify an integer value greater than 0.

fail.  Specifies immediate failure (no queuing).

queue.  Never stop queuing (queue forever until the path comes alive).

failback

Specifies whether to monitor the failed path recovery, and indicates the timing for group failback after failed paths return to service.

When the failed path recovers, the path is added back into the list of enabled multipath paths based on this setting. Multipath evaluates the priority groups, and changes the active priority group when the priority of the primary group again exceeds that of the secondary group.

immediate.  When a path recovers, enable the path immediately.

n (> 0). When the path recovers, wait n seconds before enabling the path. Specify an integer value greater than 0.

manual.  (Default) The failed path is not monitored for recovery. The administrator runs the multipath command to update enabled paths and priority groups.


5.6.2.2. Configuring for Round-Robin Load Balancing

All paths are active. I/O is routed to a path for a configured amount of time or number of I/O transactions before moving to the next open path in the sequence.

5.6.2.3. Configuring for Single Path Failover

A single path with the highest priority (lowest value setting) is active for traffic. Other paths are available for failover, but are not used unless failover occurs.

5.6.2.4. Grouping I/O Paths for Round-Robin Load Balancing

Multiple paths with the same priority fall into the active group. When all paths in that group fail, the device fails over to the next highest priority group. All paths in the group share the traffic load in a round-robin load balancing fashion.
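
For example, the following settings (a sketch for /etc/multipath.conf; the prioritizer shown assumes an ALUA-capable array) group paths by priority and return traffic to the preferred group when it recovers:

path_grouping_policy  group_by_prio
prio_callout          "/sbin/mpath_prio_alua %n"
failback              immediate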

5.6.3. Using a Script to Set Path Priorities

You can create a script that interacts with DM-MP to provide priorities for paths to the LUN when set as a resource for the prio_callout setting.

First, set up a text file that lists information about each device and the priority values you want to assign to each path. For example, name the file /usr/local/etc/primary-paths. Enter one line for each path in the following format:

host_wwpn target_wwpn scsi_id priority_value

Return a priority value for each path on the device. Make sure that the variable FILE_PRIMARY_PATHS resolves to a real file with appropriate data (host wwpn, target wwpn, scsi_id and priority value) for each device.

The contents of the primary-paths file for four LUNs, each with two paths (eight path entries in total), might look like this:

0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:0 sdb 3600a0b8000122c6d00000000453174fc 50
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:1 sdc 3600a0b80000fd6320000000045317563 2
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:2 sdd 3600a0b8000122c6d0000000345317524 50
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:3 sde 3600a0b80000fd6320000000245317593 2
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:0 sdi 3600a0b8000122c6d00000000453174fc 5
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:1 sdj 3600a0b80000fd6320000000045317563 51
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:2 sdk 3600a0b8000122c6d0000000345317524 5
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:3 sdl 3600a0b80000fd6320000000245317593 51

To continue the example mentioned in Table 5.6, “Multipath Attributes”, create a script named /usr/local/sbin/path_prio.sh. You can use any path and filename. The script does the following:

  • On query from multipath, grep the device and its path from the /usr/local/etc/primary-paths file.

  • Return to multipath the priority value in the last column for that entry in the file.
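
A minimal sketch of such a script follows. It assumes the callout is configured as prio_callout "/usr/local/sbin/path_prio.sh %n", so that multipath passes the device node name (such as sdb) as the first argument, and it assumes the column layout of the primary-paths file shown above (device node in the fourth column, priority in the last column):

#!/bin/sh
# Location of the data file that lists a priority for each path.
FILE_PRIMARY_PATHS=/usr/local/etc/primary-paths

# multipath passes the device node name (for example, sdb) as %n.
DEVNODE="$1"

# Print the priority value (last column) for the matching device node
# (fourth column) so that multipath can read it from standard output.
awk -v dev="$DEVNODE" '$4 == dev { print $NF }' "$FILE_PRIMARY_PATHS"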

5.6.4. Configuring ALUA

The mpath_prio_alua(8) command is used as a priority callout for the Linux multipath(8) command. It returns a number that is used by DM-MP to group SCSI devices with the same priority together. This path priority tool is based on ALUA (Asymmetric Logical Unit Access).

5.6.4.1. Syntax

mpath_prio_alua [-d directory] [-h] [-v] [-V] device [device...] 

5.6.4.2. Prerequisite

SCSI devices

5.6.4.3. Options

-d directory

Specifies the Linux directory path where the listed device node names can be found. The default directory is /dev. When used, specify the device node name only (such as sda) for the device or devices you want to manage.

-h

Displays help for this command, then exits.

-v

Turns on verbose output to display status in human-readable format. Output includes information about which port group the specified device is in and its current state.

-V

Displays the version number of this tool, then exits.

device

Specifies the SCSI device you want to manage. The device must be a SCSI device that supports the Report Target Port Groups (sg_rtpg(8)) command. Use one of the following formats for the device node name:

  • The full Linux directory path, such as /dev/sda. Do not use with the -d option.

  • The device node name only, such as sda. Specify the directory path using the -d option.

  • The major and minor number of the device separated by a colon (:) with no spaces, such as 8:0. This creates a temporary device node in the /dev directory with a name in the format of tmpdev-<major>:<minor>-<pid>. For example, /dev/tmpdev-8:0-<pid>.
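
For example, to query the ALUA priority for a device interactively (the device name is illustrative), you might enter one of the following:

mpath_prio_alua /dev/sda

mpath_prio_alua -d /dev sda

To use the tool as the prioritizer for DM-MP, specify it in the prio_callout setting in /etc/multipath.conf, as described earlier:

prio_callout "/sbin/mpath_prio_alua %n"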

5.6.4.4. Return Values

On success, returns a value of 0 and the priority value for the group. Table 5.7, “ALUA Priorities for Device Mapper Multipath” shows the priority values returned by the mpath_prio_alua command.

Table 5.7. ALUA Priorities for Device Mapper Multipath

Priority Value   Description

50               The device is in the active, optimized group.

10               The device is in an active but non-optimized group.

1                The device is in the standby group.

0                All other groups.


Values are widely spaced because of the way the multipath command handles them. It multiplies the number of paths in a group with the priority value for the group, then selects the group with the highest result. For example, if a non-optimized path group has six paths (6 x 10 = 60) and the optimized path group has a single path (1 x 50 = 50), the non-optimized group has the highest score, so multipath chooses the non-optimized group. Traffic to the device uses all six paths in the group in a round-robin fashion.

On failure, returns a value of 1 to 5 indicating the cause for the command’s failure. For information, see the man page for mpath_prio_alua.

5.6.5. Reporting Target Path Groups

Use the SCSI Report Target Port Groups (sg_rtpg(8)) command. For information, see the man page for sg_rtpg(8).
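
For example, to report the target port groups for a device (the device name is illustrative), you might enter:

sg_rtpg /dev/sda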

5.7. Configuring Multipath I/O for the Root Device

In SUSE Linux Enterprise Server 10, the root partition (/) on multipath is supported only if the /boot partition is on a separate, non-multipathed partition. Otherwise, no boot loader is written.

[Important]

If you apply all online updates, the DM-MP is available but is not supported for /boot and /root in SUSE Linux Enterprise Server 10 SP1 and later. More specifically, you need mkinitrd 1.2-106.61 and multipath-tools 0.4.7-34.23 or later. However, if you install the packages and set up the configuration, you might run into update issues later.

Full multipath support is available in SUSE Linux Enterprise Server 11.

To enable multipathing on the existing root device:

  1. Install Linux with only a single path active, preferably one where the by-id symlinks are listed in the partitioner.

  2. Mount the devices by using the /dev/disk/by-id path used during the install.

  3. After installation, add dm-multipath to /etc/sysconfig/kernel:INITRD_MODULES. (See the example after this procedure.)

  4. For System Z, before running mkinitrd, edit the /etc/zipl.conf file to change the by-path information in zipl.conf with the same by-id information that was used in the /etc/fstab.

  5. Re-run /sbin/mkinitrd to update the initrd image.

  6. For System Z, after running mkinitrd, run zipl.

  7. Reboot the server.
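
As an example for Step 3, the INITRD_MODULES line in /etc/sysconfig/kernel might look like the following after the change. The other module names are illustrative and depend on your hardware:

INITRD_MODULES="piix megaraid_sas dm-multipath"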

To disable multipathing on the root device:

  • Add multipath=off to the kernel command line.

    This affects only the root device. All other devices are not affected.
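
For example, with the GRUB boot loader you might append the option to the kernel line in /boot/grub/menu.lst. The existing kernel parameters shown here are illustrative:

kernel /boot/vmlinuz root=/dev/sda2 showopts multipath=off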

5.8. Configuring Multipath I/O for an Existing Software RAID

Ideally, you should configure multipathing for devices before you use them as components of a software RAID device. If you add multipathing after creating any software RAID devices, the multipath service might start only after the software RAID (MD) service on reboot, which makes multipathing appear not to be available for RAIDs. You can use the procedure in this section to get multipathing running for a previously existing software RAID.

For example, you might need to configure multipathing for devices in a software RAID under the following circumstances:

  • If you create a new software RAID as part of the Partitioning settings during a new install or upgrade.

  • If you did not configure the devices for multipathing before using them in the software RAID as a member device or spare.

  • If you grow your system by adding new HBA adapters to the server or expanding the storage subsystem in your SAN.

[Note]

The following instructions assume the software RAID device is /dev/mapper/mpath0, which is its device name as recognized by the kernel. Make sure to modify the instructions for the device name of your software RAID.

  1. Open a terminal console, then log in as the root user or equivalent.

    Except where otherwise directed, use this console to enter the commands in the following steps.

  2. If any software RAID devices are currently mounted or running, enter the following commands for each device to dismount the device and stop it.

    umount /dev/mapper/mpath0
    
    mdadm --misc --stop /dev/mapper/mpath0
    
  3. Stop the boot.md service by entering

    /etc/init.d/boot.md stop
    
  4. Start the boot.multipath and multipathd services by entering the following commands:

    /etc/init.d/boot.multipath start
    
    /etc/init.d/multipathd start
    
  5. After the multipathing services are started, verify that the software RAID’s component devices are listed in the /dev/disk/by-id directory. Do one of the following:

    • Devices Are Listed: The device names should now have symbolic links to their Device Mapper Multipath device names, such as /dev/dm-1.

    • Devices Are Not Listed: Force the multipath service to recognize them by flushing and rediscovering the devices.

      To do this, enter the following commands:

      multipath -F
      
      multipath -v0
      

      The devices should now be listed in /dev/disk/by-id, and have symbolic links to their Device Mapper Multipath device names. For example:

      lrwxrwxrwx 1 root root 10 Jun 15 09:36 scsi-mpath1 -> ../../dm-1
      
  6. Restart the boot.md service and the RAID device by entering

    /etc/init.d/boot.md start
    
  7. Check the status of the software RAID by entering

    mdadm --detail /dev/mapper/mpath0
    

    The RAID’s component devices should match their Device Mapper Multipath device names that are listed as the symbolic links of devices in the /dev/disk/by-id directory.

  8. Make a new initrd to ensure that the Device Mapper Multipath services are loaded before the RAID services on reboot. Enter

    mkinitrd -f mpath
    
  9. Reboot the server to apply these post-install configuration settings.

  10. Verify that the software RAID array comes up properly on top of the multipathed devices by checking the RAID status. Enter

    mdadm --detail /dev/mapper/mpath0
    

    For example:

    Number Major Minor RaidDevice State
    0 253 0 0 active sync /dev/dm-0
    1 253 1 1 active sync /dev/dm-1
    2 253 2 2 active sync /dev/dm-2

5.9. Scanning for New Devices without Rebooting

If your system has already been configured for multipathing and you later need to add more storage to the SAN, you can use the rescan-scsi-bus.sh script to scan for the new devices. By default, this script scans all HBAs with typical LUN ranges.

Syntax

rescan-scsi-bus.sh [options] [host [host ...]]

You can specify hosts on the command line (deprecated), or use the --hosts=LIST option (recommended).

Options

For most storage subsystems, the script can be run successfully without options. However, some special cases might need to use one or more of the following parameters for the rescan-scsi-bus.sh script:

-l

Activates scanning for LUNs 0-7. [Default: 0]

-L NUM

Activates scanning for LUNs 0 to NUM. [Default: 0]

-w

Scans for target device IDs 0 to 15. [Default: 0 to 7]

-c

Enables scanning of channels 0 or 1. [Default: 0]

-r
--remove

Enables removing of devices. [Default: Disabled]

-i
--issueLip

Issues a Fibre Channel LIP reset. [Default: Disabled]

--forcerescan

Rescans existing devices.

--forceremove

Removes and re-adds every device. (DANGEROUS)

--nooptscan

Don’t stop looking for LUNs if 0 is not found.

--color

Use colored prefixes OLD/NEW/DEL.

--hosts=LIST

Scans only hosts in LIST, where LIST is a comma-separated list of single values and ranges in the format --hosts=A[-B][,C[-D]]. No spaces are allowed.

--channels=LIST

Scans only channels in LIST, where LIST is a comma-separated list of single values and ranges in the format --channels=A[-B][,C[-D]]. No spaces are allowed.

--ids=LIST

Scans only target IDs in LIST, where LIST is a comma-separated list of single values and ranges in the format --ids=A[-B][,C[-D]]. No spaces are allowed.

--luns=LIST

Scans only LUNs in LIST, where LIST is a comma-separated list of single values and ranges in the format --luns=A[-B][,C[-D]]. No spaces are allowed.
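
For example, to scan hosts 0 and 1 for LUNs 0 through 15 (the host numbers shown are illustrative and depend on your HBA configuration), you might enter:

rescan-scsi-bus.sh --hosts=0-1 --luns=0-15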

Procedure

Use the following procedure to scan the devices and make them available to multipathing without rebooting the system.

  1. On the storage subsystem, use the vendor’s tools to allocate the device and update its access control settings to allow the Linux system access to the new storage. Refer to the vendor’s documentation for details.

  2. Scan all targets for a host to make its new device known to the middle layer of the Linux kernel’s SCSI subsystem. At a terminal console prompt, enter

    rescan-scsi-bus.sh [options]
    
  3. Check for scanning progress in the system log (the /var/log/messages file). At a terminal console prompt, enter

    tail -30 /var/log/messages
    

    This command displays the last 30 lines of the log. For example:

    # tail -30 /var/log/messages
    . . .
    Feb 14 01:03 kernel: SCSI device sde: 81920000
    Feb 14 01:03 kernel: SCSI device sdf: 81920000
    Feb 14 01:03 multipathd: sde: path checker registered
    Feb 14 01:03 multipathd: sdf: path checker registered
    Feb 14 01:03 multipathd: mpath4: event checker started
    Feb 14 01:03 multipathd: mpath5: event checker started
    Feb 14 01:03 multipathd: mpath4: remaining active paths: 1
    Feb 14 01:03 multipathd: mpath5: remaining active paths: 1
    
  4. Repeat Step 2 through Step 3 to add paths through other HBA adapters on the Linux system that are connected to the new device.

  5. Run the multipath command to recognize the devices for DM-MP configuration. At a terminal console prompt, enter

    multipath
    

    You can now configure the new device for multipathing.

5.10. Scanning for New Partitioned Devices without Rebooting

Use the example in this section to detect a newly added multipathed LUN without rebooting.

  1. Open a terminal console, then log in as the root user.

  2. Scan all targets for a host to make its new device known to the middle layer of the Linux kernel’s SCSI subsystem. At a terminal console prompt, enter

    rescan-scsi-bus.sh [options]
    

    For syntax and options information for the rescan-scsi-bus.sh script, see Section 5.9, “Scanning for New Devices without Rebooting”.

  3. Verify that the device is seen (the link has a new time stamp) by entering

    ls -lrt /dev/dm-*
    
  4. Verify the new WWN of the device appears in the log by entering

    tail -33 /var/log/messages
    
  5. Use a text editor to add a new alias definition for the device in the /etc/multipath.conf file, such as oradata3.
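
    For example, the entry in /etc/multipath.conf might look like the following sketch. The WWID shown is a placeholder; use the WWID reported for your device (for example, by multipath -ll):

    multipaths {
      multipath {
        wwid  36006016088d014007e0d0d2213ecdf11
        alias oradata3
      }
    }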

  6. Create a partition table for the device by entering

    fdisk /dev/dm-8
    
  7. Trigger udev by entering

    echo 'add' > /sys/block/dm-8/uevent
    

    This generates the device-mapper devices for the partitions on dm-8.

  8. Create a file system and label for the new partition by entering

    mke2fs -j /dev/dm-9
    
    tune2fs -L oradata3 /dev/dm-9
    
  9. Restart DM-MP to let it read the aliases by entering

    /etc/init.d/multipathd restart
    
  10. Verify that the device is recognized by multipathd by entering

    multipath -ll
    
  11. Use a text editor to add a mount entry in the /etc/fstab file.

    At this point, the alias you created in Step 5 is not yet in the /dev/disk/by-label directory. Add the mount entry using the /dev/dm-9 path, then, before the next time you reboot, change the entry to

    LABEL=oradata3
    
  12. Create a directory to use as the mount point, then mount the device by entering

    mkdir /oradata3
    
    mount /oradata3
    

5.11. Viewing Multipath I/O Status

Querying the multipath I/O status outputs the current status of the multipath maps.

The multipath -l option displays the current path status as of the last time that the path checker was run. It does not run the path checker.

The multipath -ll option runs the path checker, updates the path information, then displays the current status information. This option always displays the latest information about the path status.

  • At a terminal console prompt, enter

    multipath -ll
    

    This displays information for each multipathed device. For example:

    3600601607cf30e00184589a37a31d911
    [size=127 GB][features="0"][hwhandler="1 emc"]
    
    \_ round-robin 0 [active][first]
      \_ 1:0:1:2 sdav 66:240  [ready ][active]
      \_ 0:0:1:2 sdr  65:16   [ready ][active]
    
    \_ round-robin 0 [enabled]
      \_ 1:0:0:2 sdag 66:0    [ready ][active]
      \_ 0:0:0:2 sdc  8:32    [ready ][active]
    

For each device, it shows the device’s ID, size, features, and hardware handlers.

Paths to the device are automatically grouped into priority groups on device discovery. Only one priority group is active at a time. For an active/active configuration, all paths are in the same group. For an active/passive configuration, the passive paths are placed in separate priority groups.

The following information is displayed for each group:

  • Scheduling policy used to balance I/O within the group, such as round-robin

  • Whether the group is active, disabled, or enabled

  • Whether the group is the first (highest priority) group

  • Paths contained within the group

The following information is displayed for each path:

  • The physical address as host:bus:target:lun, such as 1:0:1:2

  • Device node name, such as sda

  • Major:minor numbers

  • Status of the path and device

Each path line contains the following:

\_ host:channel:id:lun devnode major:minor [path_status] [dm_status]

When the path is up and ready for I/O, path_status shows a state of ready or active. When the path is down, the path_status shows a state of faulty or failed. The path_status is updated periodically based on the value of the polling_interval setting in /etc/multipath.conf. For information about the polling_interval, see Section 5.4.5.5, “Configuring Default Multipath Behavior in /etc/multipath.conf”.

The dm_status field reports two states: failed and active.

It is normal for path_status and dm_status to temporarily disagree.

5.12. Managing I/O in Error Situations

You might need to configure multipathing to queue I/O if all paths fail concurrently. In certain scenarios, where the driver, the HBA, or the fabric experiences spurious errors, it is advisable to configure DM-MP to queue all I/O when those errors lead to a loss of all paths, and never to propagate errors upward. Because this causes I/O to be queued indefinitely unless a path is reinstated, make sure that multipathd is running and works for your scenario. Otherwise, I/O might stall indefinitely on the affected multipathed device until you reboot or manually return to failover instead of queuing.

To test the scenario:

  1. In a terminal console, log in as the root user.

  2. Activate queuing instead of failover for the device I/O by entering:

    dmsetup message device_ID 0 queue_if_no_path
    

    Replace the device_ID with the ID for your device. For example, enter:

    dmsetup message 3600601607cf30e00184589a37a31d911 0 queue_if_no_path
    
  3. Return to failover for the device I/O by entering:

    dmsetup message device_ID 0 fail_if_no_path
    

    This command immediately causes all queued I/O to fail.

    Replace the device_ID with the ID for your device. For example, enter:

    dmsetup message 3600601607cf30e00184589a37a31d911 0 fail_if_no_path
    

To set up queuing I/O for scenarios where all paths fail:

  1. In a terminal console, log in as the root user.

  2. Open the /etc/multipath.conf file in a text editor.

  3. Uncomment the defaults section and its ending bracket, then add the default_features setting, as follows:

    defaults {
      default_features "1 queue_if_no_path"
    }
    
  4. After you modify the /etc/multipath.conf file, you must run mkinitrd to re-create the INITRD on your system, then reboot in order for the changes to take effect.

  5. When you are ready to return to failover for the device I/O, enter:

    dmsetup message mapname 0 fail_if_no_path
    

    Replace the mapname with the mapped alias name or the device ID for the device.

    This command immediately causes all queued I/O to fail and propagates the error to the calling application.

5.13. Resolving Stalled I/O

If all paths fail concurrently and I/O is queued and stalled, do the following:

  1. Enter the following command at a terminal console prompt:

    dmsetup message mapname 0 fail_if_no_path
    

    Replace mapname with the correct device ID or mapped alias name for the device. This causes all queued I/O to fail and propagates the error to the calling application.

  2. Reactivate queueing by entering the following command at a terminal console prompt:

    dmsetup message mapname 0 queue_if_no_path
    

5.14. Additional Information

For more information about configuring and using multipath I/O on SUSE Linux Enterprise Server, see the additional resources in the Novell Support Knowledgebase.

5.15. What’s Next

If you want to use software RAIDs, create and configure them before you create file systems on the devices. For information, see the following:

