Replacing a failed disk in SmartOS

Replacement drives are not automatically discovered in SmartOS. Here’s how you replace a drive without rebooting the server.

This assumes the failed drive is c0t0d0 in slot sata0/0.

Replacing a failed drive

  1. Offline the failed drive: zpool offline zones c0t0d0
  2. Clean up stale disk device links: devfsadm -C -c disk -v (may not be necessary)
  3. Rebuild disk device links: devfsadm -c disk -v (may not be necessary)
  4. cfgadm -l should show type sata-port and receptacle disconnected. If you skipped steps 2 and 3, the type will show as unknown. (may not be necessary)
  5. cfgadm -f -c connect sata0/0
  6. cfgadm -l should show type disk, receptacle connected, occupant unconfigured
  7. cfgadm -c configure sata0/0
  8. cfgadm -l should now show occupant configured
  9. zpool replace zones c0t0d0. Because my replacement was a used drive, I got an error that it was already part of a ZFS pool. In that case, force the replacement: zpool replace -f zones c0t0d0
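If you replace drives often, steps 5, 7 and 9 can be wrapped in a small shell function. This is a minimal sketch, not a tested tool: the names replace_disk and run and the DRY_RUN switch are hypothetical, and it assumes you have already offlined the failed drive (step 1) and seated the replacement. The optional checks (steps 2 to 4, 6 and 8) are left out, so look at cfgadm -l yourself if anything fails.

```shell
#!/bin/sh
# Hypothetical helper: wraps steps 5, 7 and 9 of the procedure above.
# Assumes the failed drive is already offline (step 1) and the new drive
# is seated. With DRY_RUN=1 the commands are printed instead of executed.

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# replace_disk POOL DISK AP_ID [FORCE]
#   POOL   zpool name, e.g. zones
#   DISK   disk name, e.g. c0t0d0
#   AP_ID  cfgadm attachment point, e.g. sata0/0
#   FORCE  1 to pass -f to zpool replace (drive was used in another pool)
replace_disk() {
    pool=$1 disk=$2 ap=$3 force=${4:-0}
    run cfgadm -f -c connect "$ap" || return 1     # step 5
    run cfgadm -c configure "$ap" || return 1      # step 7
    if [ "$force" = 1 ]; then
        run zpool replace -f "$pool" "$disk"       # step 9, forced
    else
        run zpool replace "$pool" "$disk"          # step 9
    fi
}
```

Running DRY_RUN=1 replace_disk zones c0t0d0 sata0/0 prints the three commands without executing them; add 1 as a fourth argument to get zpool replace -f for a used drive.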

When things go wrong

Your replacement drive might not work. My first replacement wouldn’t spin up. This section goes through what that looked like.

You run zpool replace zones c0t0d0 and get this message:

cannot label 'c0t0d0': try using fdisk(1M) and then provide a specific slice

So you try to run fdisk to label the drive, but it fails:

fdisk /dev/dsk/c0t0d0
fdisk: Cannot stat device /dev/dsk/c0t0d0

Check for device probe errors. In my case, the most recent kernel message for disk@0,0 showed that the drive failed to power up:

dmesg | grep disk@0,0 | tail -1
2016-07-13T18:33:16.997975+00:00 c8-0a-a9-57-4c-ee genunix: [ID 353554 kern.warning] WARNING: Device /pci@0,0/pci152d,8975@1f,2/disk@0,0 failed to power up.
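To script the same check, a small filter can look at the newest kernel message for a given disk. This is a sketch under assumptions: the function name power_up_failed is hypothetical, and it relies on the "failed to power up" wording shown in the message above, which may not be the only way a dead drive shows up.

```shell
# Hypothetical helper mirroring: dmesg | grep disk@0,0 | tail -1
# power_up_failed TARGET reads kernel messages on stdin and returns 0 if
# the most recent line mentioning disk@TARGET says "failed to power up".
power_up_failed() {
    # -F: match the device name literally; keep only the newest match
    last=$(grep -F "disk@$1" | tail -n 1)
    case $last in
        *"failed to power up"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, dmesg | power_up_failed 0,0 && echo "disk@0,0 failed to power up" prints the warning only when the newest disk@0,0 message reports a power-up failure.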

So, the drive is bad. Replace the drive and start over. TL;DR:

cfgadm -f -c connect sata0/0
dmesg | grep disk@0,0 | tail -1    # look for "failed to power up"
cfgadm -l                          # expect: disk, connected, unconfigured
cfgadm -c configure sata0/0
cfgadm -l                          # expect: occupant configured
dmesg | grep disk@0,0 | tail -1
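The cfgadm -l checks above can also be read from a script. A minimal sketch, assuming the usual five-column cfgadm -l layout (Ap_Id, Type, Receptacle, Occupant, Condition); port_state is a hypothetical name. Once the disk is configured, cfgadm lists it with a dynamic suffix such as sata0/0::dsk/c0t0d0, so the function matches that form too.

```shell
# port_state AP_ID
# Reads `cfgadm -l` output on stdin and prints "TYPE RECEPTACLE OCCUPANT"
# for AP_ID (e.g. sata0/0). Also matches AP_ID::dsk/... entries.
# Prints nothing if the attachment point is not listed.
port_state() {
    awk -v ap="$1" '$1 == ap || index($1, ap "::") == 1 { print $2, $3, $4 }'
}
```

After connecting, cfgadm -l | port_state sata0/0 should print disk connected unconfigured; after configuring, the last field should read configured.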