Rejecting I/O to offline device

Discussion in 'Tomato Firmware' started by Lugoues, Jul 11, 2014.

  1. Lugoues

    Lugoues Network Newbie Member

    I am running Shibby's Tomato on an Asus RT-N16, and occasionally the USB drive (a SATA SSD behind a SATA-to-USB adapter) enters a bad state where it has to be unplugged and reconnected before it can be accessed again without an I/O error.

    The system log shows bursts of the following when this occurs, usually about 30 messages at once:
    Jul 10 20:59:58 user.err kernel: sd 0:0:0:0: rejecting I/O to offline device

    Everything seems fine after reconnecting the drive. This pops up every few months but I have not been able to track down a problem. Does anyone have any idea what might be wrong?
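    To put numbers on how often these bursts happen and how large they are, a saved copy of the system log could be scanned with something like the rough sketch below. It assumes the log lines look exactly like the one quoted above; the function name, the 60-second grouping window, and the placeholder year (syslog lines carry no year) are all illustrative choices, not anything Tomato provides.

```python
import re
from datetime import datetime, timedelta

# Matches syslog lines shaped like the one quoted in this post.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"sd (?P<dev>\d+:\d+:\d+:\d+): rejecting I/O to offline device"
)

def find_bursts(lines, gap_seconds=60, year=2014):
    """Group matching messages into bursts separated by more than gap_seconds.

    The year is an assumption: it is not present in the log itself.
    """
    bursts = []  # each entry: [first_timestamp, last_timestamp, message_count]
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        if bursts and ts - bursts[-1][1] <= timedelta(seconds=gap_seconds):
            bursts[-1][1] = ts      # extend the current burst
            bursts[-1][2] += 1
        else:
            bursts.append([ts, ts, 1])  # start a new burst
    return bursts

# Only the first line is from the log above; the other two are made-up test input.
sample = [
    "Jul 10 20:59:58 user.err kernel: sd 0:0:0:0: rejecting I/O to offline device",
    "Jul 10 20:59:58 user.err kernel: sd 0:0:0:0: rejecting I/O to offline device",
    "Jul 10 21:30:00 user.err kernel: sd 0:0:0:0: rejecting I/O to offline device",
]
print([count for _, _, count in find_bursts(sample)])  # -> [2, 1]
```

    Comparing the burst start times against when the drive was last written to might hint at whether idle time (a power-saving cause) is involved.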
  2. koitsu

    koitsu Network Guru Member

    Possible explanations:

    1) The USB/SATA bridge* is saving power and going into a mode that essentially knocks the USB storage layer offline (power-cycling the bridge would cause USB re-enumeration within the Linux USB layer, followed by SCSI device bus rescanning within the Linux SCSI/SATA translation layer).

    2) USB/SATA bridge is malfunctioning due to manufacturing defects or numerous other things which most end users cannot troubleshoot.

    3) The USB/SATA bridge cannot supply enough power to run a SATA disk (it doesn't matter whether it's an SSD or a classic MHDD). The common cases I've seen involve USB/SATA enclosures where the drive is powered off the USB bus and the person is using a) USB cables that are too long (the length matters in this situation!) or USB extension cables, b) a single-ended USB cable to connect the enclosure to the system (a single USB connector cannot provide enough power for this, which is why dual-connector cables exist: they get data+power from one connector and just (more) power from the other), or c) shoddy USB cables.

    4) The SSD is saving power by going into a mode (e.g. spindown, even though it's an SSD) that essentially knocks the I/O subsystem layer offline, causing the Linux SCSI/SATA translation layer to realise that the device isn't accessible any longer.

    5) The SSD is misbehaving due to actual device errors or other anomalies.

    USB/SATA bridges are a complete nightmare in many regards, and I am not exaggerating in the least. (You may find other posts of mine on this forum highly technical and informative, but the one thing I know a lot about is storage subsystems, particularly SATA, all the way down to ATA protocol). Many of these bridges do things "under the hood" when translating USB mass storage device messages to ATA equivalents (often buggy), and try to be sneaky/fancy about certain other things and cause trouble/annoyances for end users. And even more of them don't offer full SMART passthrough, i.e. the bridge is literally analysing every single ATA command that comes through the device and if it doesn't match a specific permitted list it's dropped/ignored (this impacts your ability to monitor the hard disk/SSD through smartmontools when attached to such USB/SATA bridges; hooking the device up via native SATA relieves this problem).

    Getting SMART statistics, including device PHY statistics (if available), would be a good starting point. Rule out the SSD first (assuming you bought it separately from the USB/SATA enclosure). It may be possible to do this using smartmontools while the device is attached via USB, but as I just said, it depends on whether the USB/SATA bridge properly allows SMART passthrough (and many do not).
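    As a rough illustration of what that check looks like, the sketch below builds the smartmontools commands one would try. It assumes a desktop Linux box with Python and smartmontools installed (the router's stripped-down environment is unlikely to have either), and `/dev/sda` is only a placeholder device name. `-d sat` requests SAT passthrough and will simply fail on bridges that don't support it; `-l sataphy` asks for the SATA PHY event counters, which not every drive reports.

```python
import shutil
import subprocess

def smartctl_commands(device, via_usb_bridge=True):
    """Return the smartctl invocations worth trying, as argument lists."""
    # "-d sat" = SCSI-to-ATA Translation passthrough; only needed (and only
    # useful) when the disk sits behind a USB/SATA bridge that permits it.
    dev_type = ["-d", "sat"] if via_usb_bridge else []
    return [
        ["smartctl", "-a", *dev_type, device],             # identity, health, attributes, logs
        ["smartctl", "-l", "sataphy", *dev_type, device],  # SATA PHY event counters
    ]

def run_all(device, via_usb_bridge=True):
    """Run each command if smartctl is installed; return (cmd, exit code, output)."""
    if shutil.which("smartctl") is None:
        return []  # smartmontools not installed on this machine
    results = []
    for cmd in smartctl_commands(device, via_usb_bridge):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode, proc.stdout))
    return results

if __name__ == "__main__":
    # "/dev/sda" is a placeholder; substitute the actual device node.
    for cmd, code, out in run_all("/dev/sda"):
        print(" ".join(cmd), "-> exit", code)
        print(out)
```

    If the `-d sat` variants fail while the same disk answers over native SATA (call with `via_usb_bridge=False`), that points at the bridge blocking passthrough rather than at the disk.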

    I'm extremely particular about what brands of USB/SATA products I buy because of all of the above. My experience, by the way, is that most USB/SATA enclosures will work with an SSD but are designed to be used with MHDDs, and in turn (given stupid design nuances with the USB/SATA bridge itself, i.e. cheap garbage) can cause mysterious problems/anomalies because the bridge may make blind assumptions about the attached device (e.g. assuming it's an MHDD).

    Finally, to make matters even worse, debugging this in TomatoUSB is extremely difficult/virtually impossible because the kernel is stripped down to be extremely small, and things like libata and the SAT layer (AFAIK) cannot have debugging turned on.

    My advice to you would be, assuming you can deal with smaller capacities, to consider purchasing something like this and using a microSD card instead. The device I just linked does not experience the problems you described on an RT-N16 -- I know because I used mine for Entware for multiple years without a single interruption. The trick here is that there's no USB/SATA bridge involved -- microSD cards aren't SATA so there's no USB storage vs. ATA protocol translation going on. It's also extremely small and thus takes up virtually no room on the backplane of your RT-N16 (the other USB port will be accessible as well). I've also used these with success (but they take up more physical room around the port), including on my RT-N66U. I've yet to see an I/O error or any kind of anomaly**.

    * -- "Bridge" refers to the actual USB/SATA IC within the USB SATA enclosure you're using. It's referred to as a "bridge" because it bridges two completely unrelated/incompatible protocols (USB and ATA). It's a chip within the USB/SATA enclosure you purchased (assuming you purchased the drive and the enclosure separately).

    ** -- Though on the RT-N66U I do see that the hardware reports some sort of "Flash Reader" device even though there is no such port/disk/whatever. Example:

    scsi0 : SCSI emulation for USB Mass Storage devices
    Registered led device: 1-1.1
    usb 1-1.4: new high speed USB device using ehci_hcd and address 4
    usb 1-1.4: configuration #1 chosen from 1 choice
    scsi1 : SCSI emulation for USB Mass Storage devices
    scsi 0:0:0:0: Direct-Access     hp       v165w            0.00 PQ: 0 ANSI: 4
    sd 0:0:0:0: [sda] 31711232 512-byte hardware sectors (16236 MB)
    sd 0:0:0:0: [sda] Write Protect is off
    sd 0:0:0:0: [sda] Mode Sense: 23 00 00 00
    sd 0:0:0:0: [sda] Assuming drive cache: write through
    sd 0:0:0:0: [sda] Assuming drive cache: write through
     sda: sda1
    sd 0:0:0:0: [sda] Attached SCSI removable disk
    scsi 1:0:0:0: Direct-Access     Multi    Flash Reader     1.00 PQ: 0 ANSI: 0
    sd 1:0:0:0: [sdb] Attached SCSI removable disk
    sda = my 16GB HP flash drive (obviously), sdb = I have not the slightest idea, it must be something internal to the RT-N66U itself, but there's nothing there anyway:

    root@gw:/tmp/home/root# dd if=/dev/sdb | xxd | less
    dd: can't open '/dev/sdb': No medium found
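    The "No medium found" text is the kernel's ENOMEDIUM error, i.e. a reader with nothing inserted. The same check can be scripted; here is a minimal sketch, assuming a Linux system with Python available (the function name and the returned labels are made up for illustration).

```python
import errno
import os

# ENOMEDIUM is Linux-specific; fall back to None on other platforms.
ENOMEDIUM = getattr(errno, "ENOMEDIUM", None)

def probe_block_device(path):
    """Classify a device node as 'ok', 'no-medium', 'missing', or 'error: ...'."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return "missing"            # the node itself does not exist
    except OSError as exc:
        if ENOMEDIUM is not None and exc.errno == ENOMEDIUM:
            return "no-medium"      # same condition dd reported above
        return f"error: {exc.strerror}"
    os.close(fd)
    return "ok"

if __name__ == "__main__":
    # Placeholder device names; opening them usually requires root.
    for dev in ("/dev/sda", "/dev/sdb"):
        print(dev, probe_block_device(dev))
```

    This only opens the node read-only and closes it again, so it does not touch any data on the device.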
    Last edited: Jul 12, 2014
