# Replacing a failed disk in SmartOS

*July 16, 2016* — https://jade.wtf/tech-notes/replacing-failed-disk-smartos/

Tags: SmartOS

---


Replacement drives are not automatically discovered in SmartOS. Here's how to replace a drive without rebooting the server.

_This assumes the failed drive is `c0t0d0` in slot `sata0/0`, in the `zones` pool._

## Replacing a failed drive

1. Offline the failed drive: `zpool offline zones c0t0d0`
2. Clean up stale disk device links: `devfsadm -C -c disk -v` (may not be necessary)
3. Rebuild the disk device links: `devfsadm -c disk -v` (may not be necessary)
4. Run `cfgadm -l`.
   You should see type `sata-port` and receptacle `disconnected`. If you skip this, the type will show as `unknown`. (This step may not be necessary.)
5. Connect the port: `cfgadm -f -c connect sata0/0`
6. Run `cfgadm -l` again.
   It should show type `disk`, receptacle `connected`, occupant `unconfigured`.
7. Configure the disk: `cfgadm -c configure sata0/0`
8. Run `cfgadm -l` again.
   The occupant should now be `configured`.
9. Replace the drive in the pool: `zpool replace zones c0t0d0`
   Because my replacement was a used drive, I got an error saying it was already part of a ZFS pool. In that case, force the replacement with `zpool replace -f zones c0t0d0`.
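Steps 4, 6 and 8 all come down to reading three columns of `cfgadm -l` output. If you want to script those checks, a small POSIX `sh` function can pull the Type, Receptacle and Occupant columns out for one attachment point. This is a sketch: the name `ap_state` is hypothetical, and it assumes the usual `Ap_Id Type Receptacle Occupant Condition` layout with one header line.

```shell
# Hypothetical helper: print "Type Receptacle Occupant" for one attachment
# point, reading `cfgadm -l` output on stdin. Exits 1 if the point is absent.
# Usage: cfgadm -l | ap_state sata0/0
ap_state() {
    awk -v ap="$1" '
        # Skip the header. Once a disk is configured, its Ap_Id grows a
        # suffix such as "sata0/0::dsk/c0t0d0", so accept that form too.
        NR > 1 && ($1 == ap || index($1, ap "::") == 1) {
            print $2, $3, $4
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    '
}
```

On the server you would pipe real output into it, for example `cfgadm -l | ap_state sata0/0`, and expect `disk connected configured` after step 7. Reading stdin instead of calling `cfgadm` directly keeps the function testable on any machine.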

## When things go wrong
Your replacement drive might not work. My first replacement wouldn't spin up. This section goes through what that looked like.

You run `zpool replace zones c0t0d0` and get this message:
`cannot label 'c0t0d0': try using fdisk(1M) and then provide a specific slice` 

So you try to label the drive with fdisk, but it fails:
```
fdisk /dev/dsk/c0t0d0
fdisk: Cannot stat device /dev/dsk/c0t0d0
```
Check the kernel log for device probe errors with `dmesg | grep disk@0,0 | tail -1`, and look for a message saying the device failed to power up:
`2016-07-13T18:33:16.997975+00:00 c8-0a-a9-57-4c-ee genunix: [ID 353554 kern.warning] WARNING: Device /pci@0,0/pci152d,8975@1f,2/disk@0,0 failed to power up.`
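If you are checking more than one drive, the `dmesg` check can be wrapped in a function that succeeds only when the most recent kernel message for the device is the power-up failure. This is a sketch: `power_up_failed` is a hypothetical name, and matching on the literal text `failed to power up` assumes the warning looks like the one above.

```shell
# Hypothetical helper: exit 0 if the last kernel message mentioning the
# device (e.g. "disk@0,0") says it failed to power up; exit 1 otherwise,
# including when there is no message for that device at all.
# Usage: dmesg | power_up_failed disk@0,0
power_up_failed() {
    # -F matches the device name literally, so "@" and "," are not special.
    # Like `tail -1` in the manual check, only the newest match counts.
    last=$(grep -F "$1" | tail -n 1)
    case "$last" in
        *"failed to power up"*) return 0 ;;
        *) return 1 ;;
    esac
}
```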

So the drive is bad. Replace it and start over. TL;DR:

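```
# Reconnect the port, then check the kernel log for the disk
cfgadm -f -c connect sata0/0
dmesg | grep disk@0,0 | tail -1
# Expect type disk, receptacle connected, occupant unconfigured
cfgadm -l
cfgadm -c configure sata0/0
# Expect occupant configured
cfgadm -l
# There should be no "failed to power up" warning this time
dmesg | grep disk@0,0 | tail -1
zpool replace zones c0t0d0
```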
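The main steps can also be condensed into a small wrapper. This is a sketch, not a tested tool: `replace_disk`, `step`, and the `POOL`, `DISK`, `AP`, `RUN` and `FORCE` variables are hypothetical, with defaults matching this post's example. By default it only prints the commands so you can review them; set `RUN=1` to execute. It leaves out the `devfsadm` runs and the `cfgadm -l` and `dmesg` checks, which are still worth doing by hand between steps.

```shell
# Hypothetical wrapper; the defaults below are assumptions taken from this post.
POOL=${POOL:-zones}    # pool containing the failed disk
DISK=${DISK:-c0t0d0}   # failed disk, replaced in the same slot
AP=${AP:-sata0/0}      # cfgadm attachment point for that slot

# Print a command, or run it (echoing it to stderr first) when RUN=1.
step() {
    if [ "${RUN:-0}" = 1 ]; then
        echo "+ $*" >&2
        "$@" || { echo "failed: $*" >&2; return 1; }
    else
        echo "$*"
    fi
}

replace_disk() {
    step zpool offline "$POOL" "$DISK" || return 1
    step cfgadm -f -c connect "$AP" || return 1
    step cfgadm -c configure "$AP" || return 1
    # FORCE=1 adds -f, for a used disk that zpool says is already in a pool.
    if [ "${FORCE:-0}" = 1 ]; then
        step zpool replace -f "$POOL" "$DISK"
    else
        step zpool replace "$POOL" "$DISK"
    fi
}
```

For example, `FORCE=1 DISK=c0t1d0 AP=sata0/1 replace_disk` prints the plan for a different slot. Printing by default is a deliberate choice: a typo in `DISK` or `AP` is much cheaper to catch before `cfgadm` touches the wrong port.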


---

&copy; 2016 Jade Angrboða.
