Problem with hard drive disconnecting

Discussion in 'Tomato Firmware' started by CharlieSummers, Feb 16, 2018.

  1. CharlieSummers

    CharlieSummers Network Newbie Member

    I am running:

    Tomato Firmware v1.28.7511 MIPSR2Toastman-RT K26 USB VPN-NC Built on Fri, 20 Jan 2017 22:13:11 +0700

    ...on an Asus RT-N16. I have a Samsung G3 Station external 2TB hard drive hanging off of a USB port, with the main NTFS partition shared. "Spin down each HDD when idle. No need to use with flashdrive." is unchecked, as the thing spins itself down. (Way too much, apparently.) The drive is used sporadically for internal transfers mostly, although there are some media files there rarely accessed by computers and Android set-top boxes.

    Every few days that drive will completely disconnect, not visible in "USB and NAS," and require a USB replug to make it visible again; and it can be days before I realize the drive is offline (not used much). Of course, since I have Bandwidth Monitoring and IP Traffic Monitoring both set to save history and create backups to a folder on that hard drive, rstats and cstats both lock up if the drive goes offline and I lose the data from that period until I notice and reconnect the drive (at which point it auto-mounts).

    I plan to move to Comcast, and maintaining this data is going to be vital to keep them from screwing me over with their bogus monitoring, so I need to get this fixed. I suspect the problem is not in the router firmware (although you folks would know better than I), and rather in the G3 Station firmware (which I always thought was too d*mned smart for its britches; won't spin up unless both the power and the USB are live), but I'm not sure.

    Far as I can tell, my choices are to 1) live with it as-is, 2) somehow fix the existing configuration, 3) replace the G3 Station with another external drive (maybe a bare-drive dock?), 4) stick a flash drive/memory card reader in the second USB port and back up bandwidth logs to that instead, or 5) buy an external NAS and back up the bandwidth logs to it instead, bypassing the router's USB/NAS completely.

    Would appreciate any suggestions.
     
  2. Sean B.

    Sean B. LI Guru Member

    I would recommend using a part of that USB HDD to set up and install optware-ng. Then use optware-ng's package manager to download and install hdparm. Using hdparm, list all the raw drive attributes directly from the HDD's firmware, and turn off all power management settings it shows are supported by the drive.
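    For reference when reading or setting those values with hdparm: the standby (spin-down) timer that hdparm -S sets is an encoded byte, not a plain number of seconds. A minimal sketch of the decoding as described in the hdparm man page; the function name is hypothetical:

```python
def standby_timeout_seconds(value):
    """Decode an hdparm -S (ATA standby timer) value into seconds.

    Returns 0 when timeouts are disabled, and None for the
    vendor-defined (253) and reserved (254) values.
    """
    if not 0 <= value <= 255:
        raise ValueError("standby timer value must be 0-255")
    if value == 0:
        return 0                          # timeouts disabled
    if value <= 240:
        return value * 5                  # multiples of 5 s (5 s to 20 min)
    if value <= 251:
        return (value - 240) * 30 * 60    # 1 to 11 units of 30 min
    if value == 252:
        return 21 * 60                    # 21 minutes
    if value == 255:
        return 21 * 60 + 15               # 21 minutes plus 15 seconds
    return None                           # 253 vendor-defined, 254 reserved


for v in (0, 120, 242, 253):
    print(v, standby_timeout_seconds(v))
```

    So "turning off" this particular setting means -S 0, while a value like 242 would let the drive spin down after an hour of inactivity.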
     
  3. CharlieSummers

    CharlieSummers Network Newbie Member

    So you would recommend the drive never spin down? Probably easier to just temporarily move it to one of the Windows or Linux boxes here to run hdparm (saves installing software to the router that isn't really necessary), but before I do I want to make sure I understand.
     
  4. Sean B.

    Sean B. LI Guru Member

    Yes, I recommend disabling the drive's on-board power management, as this can be a compatibility problem with the older ahci/xhci drivers in kernel 2.6.36. And yes, you can move the drive to a computer to change it; however, you won't gain access to useful software to continue diagnosis (if required) doing it that way, as the router is the unit in question.

    **NOTE** If the on-board power management does turn out to be an issue and you need/want the spin-down functionality, it can be reinstated using the sd-idle daemon. It does not trigger any power state issues with the drivers.
     
    Last edited: Feb 16, 2018
  5. CharlieSummers

    CharlieSummers Network Newbie Member

    Excellent, thank you. I'll try to get that done this afternoon and start testing it out.

    I assume this is what is set by the "Spin down each HDD when idle" on the USB and NAS page?
     
  6. Sean B.

    Sean B. LI Guru Member

    Correct. I stated the program by name as I run a newer version via optware-ng and set the inactivity timer for spin-down to a value suited to the drive's use, so as to avoid excessive parking/unparking of the heads.
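    The general idea behind a host-side idle spin-down daemon like sd-idle can be sketched as: periodically sample the disk's I/O counters (on Linux these are exposed per device in /proc/diskstats) and only spin the drive down once the counters have stayed unchanged for the whole inactivity timeout. This is a minimal, hypothetical sketch of that decision logic only, not sd-idle's actual code; it does not issue any spin-down command, and the sample values are made up:

```python
def io_count_from_diskstats(line):
    """Total completed reads + writes from one /proc/diskstats line.

    Field order: major, minor, name, reads completed, reads merged,
    sectors read, ms reading, writes completed, ...
    """
    fields = line.split()
    return int(fields[3]) + int(fields[7])


def should_spin_down(samples, timeout):
    """samples: list of (timestamp_seconds, io_count), oldest first.

    Returns True if the I/O count has not changed for at least
    `timeout` seconds. Activity before the first sample is unknown,
    so the first sample counts as the last known activity.
    """
    if not samples:
        return False
    now = samples[-1][0]
    last_activity, previous = samples[0]
    for timestamp, count in samples[1:]:
        if count != previous:
            last_activity, previous = timestamp, count
    return now - last_activity >= timeout


line = "   8       0 sda 1200 30 9600 500 800 10 6400 300 0 700 800"
print(io_count_from_diskstats(line))
samples = [(0, 100), (60, 100), (120, 105), (180, 105), (240, 105)]
print(should_spin_down(samples, 120), should_spin_down(samples, 180))
```

    A longer timeout means fewer spin-down/spin-up cycles, which is the trade-off Sean describes for avoiding excessive head parking.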
     
  7. CharlieSummers

    CharlieSummers Network Newbie Member

    Help.

    I've been poking at this for a while, yesterday finally dedicated a few hours to it. Made yet-another-backup of the drive, repartitioned creating an additional ext2 partition (I know, I expected to use ext3 but didn't), installed Entware-ng using the idiot's guide, then installed hdparm, added scripts, etc., etc. Everything seems to be working fine on the router after reboots.

    hdparm -I reports "Advanced power management level: disabled," Command/Features says "Power Management feature set" is enabled but "Advanced Power Management feature set" is disabled. The only serious command I attempted so far is:

    Since I am not experienced in mucking with hard drive low-level settings, I am going to stop until someone with a clue can advise me.
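    The "enabled vs. supported" distinction quoted above comes from the Commands/features table in hdparm -I output, where an asterisk in the Enabled column marks a feature that is currently turned on. A hypothetical sketch of reading that table, assuming that layout; the sample text is illustrative, not Charlie's actual output:

```python
def parse_hdparm_features(text):
    """Return {feature_name: enabled} from hdparm -I's Commands/features table.

    Assumes one indented feature per line, with '*' in the Enabled
    column when the feature is on; an unindented line ends the table.
    """
    features = {}
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Commands/features:"):
            in_section = True
            continue
        if not in_section or not stripped or stripped.startswith("Enabled"):
            continue
        if not line[0].isspace():
            break  # next section header, e.g. "Security:"
        enabled = stripped.startswith("*")
        features[stripped.lstrip("*").strip()] = enabled
    return features


SAMPLE = """Commands/features:
\tEnabled\tSupported:
\t   *\tSMART feature set
\t   *\tPower Management feature set
\t    \tAdvanced Power Management feature set
Security:
"""

for name, on in parse_hdparm_features(SAMPLE).items():
    print(f"{name}: {'enabled' if on else 'not enabled'}")
```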
     
  8. koitsu

    koitsu Network Guru Member

    Please install from Entware-ng the smartmontools package and run smartctl -a /dev/sda and smartctl -x /dev/sda and provide the output here in code blocks. Optware-ng (whatever that is) may not have the latest smartmontools package, if at all, so the info/data I'm looking for might not be available with that version of smartmontools. You really want/need smartmontools 6.4 or newer. It will also be able to help you (potentially) with this problem, rather than through hdparm. Please don't "mess around" with smartctl though, just wait for me to give further advice.

    Footnote: I am extremely familiar with storage, storage subsystems, ATA protocol, etc. so have faith.
     
  9. CharlieSummers

    CharlieSummers Network Newbie Member

    As noted, I did install from Entware-ng; optware was apparently its predecessor, although that's only from reading a couple of web pages and shouldn't be considered authoritative.

    Code:
    root@Tomato:/tmp/home/root# smartctl -a /dev/sda
    smartctl 6.6 2017-11-05 r4594 [mips-linux-2.6.22.19] (localbuild)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
    
    /dev/sda: Unknown USB bridge [0x04e8:0x6014 (0x000)]
    Please specify device type with the -d option.
    
    Use smartctl -h to get a usage summary
    
    Code:
    root@Tomato:/tmp/home/root# smartctl -x /dev/sda
    smartctl 6.6 2017-11-05 r4594 [mips-linux-2.6.22.19] (localbuild)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
    
    /dev/sda: Unknown USB bridge [0x04e8:0x6014 (0x000)]
    Please specify device type with the -d option.
    
    Use smartctl -h to get a usage summary
    
    No fear there; I'm old enough to know not to screw around with things I don't understand. ;)
     
  10. Sean B.

    Sean B. LI Guru Member

    I'm used to the older Optware, and keep calling the newer Entware-ng, Optware-ng. Don't be hatin' Koitsu ;)
     
  11. koitsu

    koitsu Network Guru Member

    Before I get into the details, I'll educate:

    I've talked extensively in the past (on the Internet in general I mean, but I think on this forum too? Can't remember) about how many SATA-USB bridges are either broken or just generally awful. Some don't allow SMART pass-through either, while others actually corrupt pass-through data (I'm not kidding). Literally these chips act as "middlemen", so they can filter out or mangle/tweak data as they see fit. Some may not allow some ATA commands through, others might allow some through but not certain subcommands. It's a mess. Consumers have no idea what to buy (i.e. what operates sanely, what allows SMART passthrough, what doesn't have stupid features turned on) because hardly any vendors disclose what actual bridge ICs they're using inside, and many of them change ICs in the product whenever they see fit (i.e. you don't know what you're getting when you buy it).

    The problems you're experiencing could in fact be caused by the USB-SATA bridge itself. Many of them actually have bridge-level (read: firmware-level within the USB enclosure itself) timeouts/powerdowns. Some vendors, like Plugable, actually disclose this fact (and let you flash a firmware that has this feature turned off). Some vendors don't bother implementing junk like that and instead let the drive dictate everything; you really want the latter (though Plugable, BTW, is a good company -- I do use a specific model of USB drive dock that is extremely good about "true" passthrough). Then there are those "USB hard disks" you can buy which include both the enclosure *and* the drive, usually sold by companies who make hard disks (ex. Western Digital, Samsung, Seagate, etc.), and those all vary too.

    Like I said: a mess.

    Education lesson over.

    The USB-SATA bridge ("USB enclosure for an ATA hard disk") you're using isn't "automatically detected" by smartmontools. It doesn't mean it can't/won't work, just that smartmontools couldn't figure out "how" to talk to the underlying disk through USB. Sometimes you gotta help it and pray that it works.

    The USB Vendor ID (VID) is 0x04e8 (Samsung Electronics Co., Ltd.) and the USB Product ID (PID) is 0x6014. The Linux USB Vendor list website right now is down, so I can't check to see what PID 0x6014 refers to.
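    The VID and PID come straight out of the "Unknown USB bridge" line in Charlie's smartctl output, and the same pair is what you would look up in the USB ID lists (or see with lsusb on a desktop Linux box). A minimal sketch of extracting them, assuming the bracketed format shown in that output; the helper names are hypothetical, and the meaning of the third field (0x000) isn't established in this thread, so it is simply returned as-is:

```python
import re

# Matches the bracketed IDs in smartctl's "Unknown USB bridge" message,
# e.g. "[0x04e8:0x6014 (0x000)]" -> vendor ID, product ID, third field.
BRIDGE_RE = re.compile(
    r"Unknown USB bridge \[0x([0-9a-fA-F]{4}):0x([0-9a-fA-F]{4}) \(0x([0-9a-fA-F]+)\)\]"
)


def parse_usb_bridge(message):
    """Return (vid, pid, third_field) as ints, or None if there is no match."""
    match = BRIDGE_RE.search(message)
    if match is None:
        return None
    vid, pid, extra = (int(group, 16) for group in match.groups())
    return vid, pid, extra


def usb_id_string(vid, pid):
    """Format IDs the way lsusb prints them, e.g. "04e8:6014"."""
    return f"{vid:04x}:{pid:04x}"


ids = parse_usb_bridge("/dev/sda: Unknown USB bridge [0x04e8:0x6014 (0x000)]")
print(ids, usb_id_string(ids[0], ids[1]))
```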

    You can see what types of USB-SATA bridges are supported by smartmontools on their website, but this document tends to be a little bit out of date. We can see that several Samsung USB enclosures (VID 0x04e8) are supported, but the exact PID of 0x6014 isn't listed. So instead we go to the source code and look. Hmm, nope, no PID 0x6014 listed, but similar drives are all using SAT -- but this also assumes a newer-ish Linux version, while TomatoUSB is quite old (ARM is newer, MIPS is older).

    So here are 3 commands to try + provide output from in a code block. Note for --scan that's two hyphens, not one.

    Code:
    smartctl --scan
    smartctl -d sat -a /dev/sda
    smartctl -d sat -x /dev/sda
    
    If SAT support works, then we can proceed to the next phase (and also means I have to file a ticket with the smartmontools folks to have PID 0x6014 added to their drive database as using SAT, so that going forward in smartmontools 6.7 it should "just work").

    If SAT support doesn't work, then there's literally no way to solve this dilemma for you, other than buying a different and more-compatible USB-SATA enclosure or USB hard drive. WDC My Passport drives should work, but I haven't tested them on TomatoUSB, only on present-day Linux and FreeBSD desktops.
     
  12. Sean B.

    Sean B. LI Guru Member

    I believe this will get SMART to function, but likely with a limited feature set:

    Code:
    smartctl -d scsi -a /dev/sda
    smartctl -d scsi -x /dev/sda
     
  13. CharlieSummers

    CharlieSummers Network Newbie Member

    I have suspected this. I always hated this particular enclosure; thought it was too d*mned smart for its own good (I hate something that requires both power and USB to be connected before the thing will even spin-up), and this is before I plugged it into the router.

    I will cheerfully and gratefully provide anything you wish, but first, a curiosity point. Would it be better/smarter/simpler to drop this Samsung enclosure entirely and grab, say, a standard USB 2.0 cradle/dock/whatever and bare drive and use it instead? Ignoring the infrequent incrementals (the drive is read-from every few days/week, but written-to relatively rarely), I made an extra image backup of the drive before re-partitioning. If I'm going to jump to something more...transparent to the drive, now would be the optimal time.

    On the other hand, if I'm going to have the same issues with any SATA->USB bridge, then maybe sticking with this is the smarter move. My knowledge base in this area is light enough that I can't really judge which would be the most efficient path.
     
  14. koitsu

    koitsu Network Guru Member

    smartctl -d scsi will not work. You will get errors along the lines of "INQUIRY failed (I/O error)" or the like. The underlying device is not a SCSI drive, it's an ATA drive; the CDB and payload would then contain actual SCSI data, not ATA. INQUIRY is a SCSI command; the ATA equivalent is IDENTIFY. Totally different data structure, totally different CDB, totally different request payload and response payload.
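    To make the "totally different CDB" point concrete: a standard SCSI INQUIRY is a 6-byte CDB starting with opcode 0x12, while ATA IDENTIFY DEVICE is just the command code 0xEC written to the drive's ATA Command register, with no CDB involved at all. A small illustrative sketch (the helper name is hypothetical; the layout follows the SCSI Primary Commands standard, and 36 bytes is the classic minimum length of standard INQUIRY data):

```python
ATA_IDENTIFY_DEVICE = 0xEC  # ATA command code, written to the Command register


def scsi_inquiry_cdb(alloc_len=36):
    """Build a standard (non-VPD) SCSI INQUIRY CDB."""
    if not 0 <= alloc_len <= 0xFFFF:
        raise ValueError("allocation length must fit in 16 bits")
    return bytes([
        0x12,                     # operation code: INQUIRY
        0x00,                     # EVPD = 0: ask for standard inquiry data
        0x00,                     # page code (unused when EVPD = 0)
        (alloc_len >> 8) & 0xFF,  # allocation length, most significant byte
        alloc_len & 0xFF,         # allocation length, least significant byte
        0x00,                     # control byte
    ])


print(scsi_inquiry_cdb().hex(), hex(ATA_IDENTIFY_DEVICE))
```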

    I'll try to explain all of this, but will admit up front: yes, this is confusing because of all the protocols involved and what is "wrapped" around what. More education, I suppose:

    USB hard disks (meaning USB chips or devices that utilise the official USB mass storage class standard) are all technically SCSI devices. This is because the USB mass storage class protocol, for hard disks, only supports the SCSI command set. Details are here: https://en.wikipedia.org/wiki/USB_mass_storage_device_class#Device_access
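    To picture what "SCSI inside USB" means on the wire: with the classic Bulk-Only Transport flavour of the mass storage class, the host wraps every SCSI CDB in a 31-byte Command Block Wrapper (CBW) before sending it to the device. A minimal sketch assuming the CBW layout from the USB Mass Storage Class Bulk-Only Transport specification; the function name is hypothetical, and the example wraps a 6-byte SCSI INQUIRY CDB:

```python
import struct

CBW_SIGNATURE = 0x43425355  # bytes "USBC" when stored little-endian
CBW_FLAG_DATA_IN = 0x80     # bit 7 set: data flows device -> host


def build_cbw(tag, data_len, cdb, lun=0, data_in=True):
    """Pack a SCSI CDB into a 31-byte Bulk-Only Transport CBW."""
    if not 1 <= len(cdb) <= 16:
        raise ValueError("CDB must be 1-16 bytes")
    flags = CBW_FLAG_DATA_IN if data_in else 0x00
    return struct.pack(
        "<IIIBBB16s",    # little-endian, no padding between fields
        CBW_SIGNATURE,   # dCBWSignature
        tag,             # dCBWTag, echoed back in the status wrapper
        data_len,        # dCBWDataTransferLength
        flags,           # bmCBWFlags
        lun,             # bCBWLUN
        len(cdb),        # bCBWCBLength
        cdb,             # CBWCB, zero-padded to 16 bytes by "16s"
    )


cbw = build_cbw(tag=1, data_len=36, cdb=bytes.fromhex("120000002400"))
print(len(cbw), cbw.hex())
```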

    That would cause a bit of a conundrum if you were to, say, stick an ATA disk into a USB enclosure. ATA disks don't understand SCSI, they understand ATA.

    That leads us to SAT (SCSI-to-ATA Translation). SAT was a standard developed by T10 (the committee that develops and maintains SCSI; T13 is the committee that maintains ATA), with the intended goal of allowing SCSI commands to be "translated" to ATA commands under the hood. There is not a direct 1:1 mapping of all SCSI commands to ATA commands, but many are similar in functionality, so they can essentially be "translated". You could call this "emulation" if you wanted to, but translation is more accurate. Something has to do this translation: with Linux's libata it's the OS, while in a USB enclosure it's usually the SATA-USB bridge chip. Reference material: https://en.wikipedia.org/wiki/SCSI_/_ATA_Translation

    This is why, on non-Linux OSes, the /dev names vary depending on the device's actual type. Example: on present-day FreeBSD, you have:

    adX = native IDE disks (i.e. old PATA, using the older ATA driver)
    adaX = native ATA disks (i.e. SATA, but can also be PATA if certain emulation modes are enabled in the kernel)
    acdX = native ATA optical drives (i.e. SATA-based CD/DVD)
    daX = SCSI "direct access" (i.e. disks, but also includes USB-attached hard drives)
    cdX = SCSI optical drives (i.e. CD/DVD, also including USB-attached optical drives)
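    The naming scheme above is regular enough to decode mechanically: a driver prefix followed by a unit number. A toy sketch covering only the five prefixes listed above (FreeBSD has many more device nodes than this, so treat the table as illustrative):

```python
import re

# Only the prefixes from the list above; descriptions paraphrase it.
FREEBSD_DISK_PREFIXES = {
    "ad": "native IDE disk (older ATA driver)",
    "ada": "native ATA disk (SATA, or PATA with certain emulation modes)",
    "acd": "native ATA optical drive",
    "da": "SCSI direct access (includes USB-attached disks)",
    "cd": "SCSI optical drive (includes USB-attached optical drives)",
}


def classify_freebsd_device(name):
    """Split e.g. "da0" into (description, unit number), or None if unknown."""
    match = re.fullmatch(r"([a-z]+)(\d+)", name)
    if match is None or match.group(1) not in FREEBSD_DISK_PREFIXES:
        return None
    return FREEBSD_DISK_PREFIXES[match.group(1)], int(match.group(2))


for dev in ("ada0", "da1", "cd0", "sda"):
    print(dev, classify_freebsd_device(dev))
```

    Note that "sda" comes back as None: Linux's sdX names carry no such hint, which is the point made in the next paragraph.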

    Linux used to call native IDE disks by hda/hdb/hdc/hdd/... in the early days, and SCSI disks by sda/sdb/sdc/sdd/... but then massive overhauls happened in the kernel (mainly libata) and now everything gets sda/sdb/sdc/sdd because there's SAT happening on some general level. With USB-attached disks, the same applies. You really have no idea "what" you're talking to on Linux without looking deeper with several tools available.

    The USB portion of the situation complicates things further because there are multiple USB protocols (not talking about USB 2.0 vs. 3.0, I'm talking about sub-protocols that handle I/O); the newest one is UAS/UASP (USB-Attached SCSI Protocol), which was introduced as part of USB 3.0. It has less protocol overhead than the classic USB mass storage class, resulting in higher speeds and lower latency. You've probably seen it advertised on USB hard disks sold by vendors, or on USB HDD enclosures ("Supports UASP" etc.). I'm not really familiar with USB 3.0 protocol details (I barely remember USB 2.0 at this point), but you can read about it here: https://en.wikipedia.org/wiki/USB_Attached_SCSI

    There's also something called ATA pass-through, a pair of SCSI commands (ATA PASS-THROUGH (12) and (16)) defined by SAT, which is supposed to permit raw ATA commands to be submitted to the underlying device through the SCSI/USB layers. This is actually what's used a lot of the time, but it depends on the USB bridge. Last paragraph of this section: https://www.smartmontools.org/wiki/FAQ#SmartmontoolsforFireWireUSBandSATAdiskssystems
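    Roughly what that looks like for the IDENTIFY DEVICE case: the ATA command is packed into a SCSI ATA PASS-THROUGH (12) CDB (opcode 0xA1), which a SAT-capable bridge is expected to unwrap and hand to the disk. A minimal sketch following the SAT field layout (byte 1 carries the protocol, byte 2 the transfer direction and length flags); the helper name is hypothetical, it hard-codes a device-to-host block transfer, and real tools also handle the 16-byte variant and the returned sense data:

```python
ATA_PASS_THROUGH_12 = 0xA1
ATA_IDENTIFY_DEVICE = 0xEC
PROTOCOL_PIO_DATA_IN = 4  # SAT PROTOCOL field value for PIO Data-In


def ata_pass_through_12(command, features=0, count=0, lba=0, device=0,
                        protocol=PROTOCOL_PIO_DATA_IN):
    """Wrap an ATA command in a 12-byte SCSI ATA PASS-THROUGH CDB."""
    t_dir = 1     # transfer direction: device -> host
    byt_blok = 1  # transfer length counted in blocks, not bytes
    t_length = 2  # transfer length is taken from the COUNT field
    return bytes([
        ATA_PASS_THROUGH_12,
        (protocol & 0x0F) << 1,                     # byte 1: PROTOCOL, bits 4-1
        (t_dir << 3) | (byt_blok << 2) | t_length,  # byte 2: transfer flags
        features & 0xFF,                            # FEATURES
        count & 0xFF,                               # COUNT
        lba & 0xFF,                                 # LBA bits 7-0
        (lba >> 8) & 0xFF,                          # LBA bits 15-8
        (lba >> 16) & 0xFF,                         # LBA bits 23-16
        device & 0xFF,                              # DEVICE
        command & 0xFF,                             # COMMAND (the ATA opcode)
        0x00,                                       # reserved
        0x00,                                       # CONTROL
    ])


# IDENTIFY DEVICE returns one 512-byte block, hence count=1.
identify_cdb = ata_pass_through_12(ATA_IDENTIFY_DEVICE, count=1)
print(identify_cdb.hex())
```

    If a bridge mangles or rejects this CDB, smartctl -d sat fails even though the disk behind it speaks perfectly good ATA.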

    I could go on and on about this, including a small drawing of the "I/O stack" for how all the pieces fit together when a userland program tries to issue an ATA CDB to a USB device, but it would take me a while and I'm actually ill (very ill, in fact -- very bad cold, diagnosed with a cyst near my ear yesterday, plus my IBS symptoms and several other things going on. I'm on quite a few drugs right now, and not the fun kind).

    The reason one might need -d sat is because smartmontools isn't automatically able to figure out -- in this case, using USB VID and PID -- what type of device it's talking to. It knows it's USB, but it doesn't know if it should be using SAT, or some vendor-custom protocol (like usbcypress, usbjmicron, usbprolific, usbsunplus, etc. -- yes, all these vendors MADE THEIR OWN PROTOCOLS at one point because SAT hadn't been invented yet).
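    When auto-detection fails, the practical approach is to try the device types smartctl knows about one at a time, as the earlier sat commands do. A hypothetical helper that only builds those command lines (SAT first, then the vendor-specific USB protocols named above); it runs nothing, and the ordering is a judgment call, not something smartmontools prescribes:

```python
# "sat,12" explicitly requests the 12-byte ATA PASS-THROUGH CDB instead of
# plain "sat"'s default; the rest are vendor-specific USB bridge protocols.
USB_DEVICE_TYPES = ["sat", "sat,12", "usbjmicron", "usbsunplus",
                    "usbprolific", "usbcypress"]


def smartctl_attempts(device, option="-a"):
    """Build (but do not run) one smartctl command line per device type."""
    return [["smartctl", "-d", dtype, option, device] for dtype in USB_DEVICE_TYPES]


for argv in smartctl_attempts("/dev/sda"):
    print(" ".join(argv))
```

    Each list is in the form subprocess.run() accepts, so a script could stop at the first type that returns real IDENTIFY data.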

    The USB layer and all this hullabaloo make talking "real ATA" extremely painful. It's one of the reasons why I loathe having to do troubleshooting on USB-attached MHDDs or SSDs -- there's so much translation and "junk" going on compared to just raw CDBs going across a wire (like you'd expect with pure native SATA or pure native SCSI). Not to mention, as I described, the SATA-USB bridge ICs under the hood can choose to do whatever they want with the CDBs submitted to them by the host (OS) -- I've found many, no joke, which allow "some" SMART pass-through but not all of it (i.e. some SMART sub-commands are rejected, but others are permitted).

    Welcome to my world.
     
  15. Sean B.

    Sean B. LI Guru Member

    Code:
    root@Storage:/tmp/home/root# smartctl -d scsi -a /dev/sdb
    smartctl 6.5 2016-05-07 r4318 [armv7l-linux-2.6.36.4brcmarm] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Vendor:               HGST HTS
    Product:              545025A7E330
    Revision:
    Compliance:           SPC-4
    User Capacity:        250,059,350,016 bytes [250 GB]
    Logical block size:   512 bytes
    Rotation Rate:        17715 rpm
    Device type:          disk
    Transport protocol:   Fibre channel (FCP-2)
    Local Time is:        Sat Mar  3 04:06:40 2018 PST
    SMART support is:     Available - device has SMART capability.
    SMART support is:     Enabled
    Temperature Warning:  Disabled or Not Supported
    
    === START OF READ SMART DATA SECTION ===
    SMART Health Status: OK
    Current Drive Temperature:     0 C
    Drive Trip Temperature:        0 C
    
    Error Counter logging not supported
    
    Device does not support Self Test logging
    root@Storage:/tmp/home/root#
    
    Code:
    root@Storage:/tmp/home/root# smartctl -d ata -a /dev/sdb
    smartctl 6.5 2016-05-07 r4318 [armv7l-linux-2.6.36.4brcmarm] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
    
    Read Device Identity failed: Invalid argument
    
    A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
    root@Storage:/tmp/home/root# smartctl -d ata -T permissive -a /dev/sdb
    smartctl 6.5 2016-05-07 r4318 [armv7l-linux-2.6.36.4brcmarm] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
    
    Read Device Identity failed: Invalid argument
    
    === START OF INFORMATION SECTION ===
    Device Model:     [No Information Found]
    Serial Number:    [No Information Found]
    Firmware Version: [No Information Found]
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   [No Information Found]
    Local Time is:    Sat Mar  3 04:28:38 2018 PST
    SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.
    SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 85-87 don't show if SMART is enabled.
    A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
    root@Storage:/tmp/home/root#
    
    
    I've never had one work using ATA, only SCSI. Which doesn't make sense, being as you stated the drives are ATA devices not SCSI. Yet the USB->SATA adapters I've used have all needed the SCSI flag.

    **NOTE** Check out that drive RPM.. 17715.. you should see the size of the turbocharger I bolted onto that drive hah. I should be more clear: I get full (or close to full) functionality when not specifying a device type at all (I believe it's using sat?). But on adapters that smartctl can't figure out, only SCSI gets any data at all, while sat and ATA fail.

    Code:
    root@Storage:/tmp/home/root# smartctl -a /dev/sdb
    smartctl 6.5 2016-05-07 r4318 [armv7l-linux-2.6.36.4brcmarm] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Device Model:     HGST HTS545025A7E330
    Serial Number:    TRR3C3M81NVN2J
    LU WWN Device Id: 5 000cca 6e2d792f5
    Firmware Version: GGEOAH40
    User Capacity:    250,059,350,016 bytes [250 GB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5400 rpm
    Form Factor:      2.5 inches
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   ATA8-ACS T13/1699-D revision 6
    SATA Version is:  SATA 2.6, 3.0 Gb/s
    Local Time is:    Sat Mar  3 04:13:38 2018 PST
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART Status not supported: Incomplete response, ATA output registers missing
    SMART overall-health self-assessment test result: PASSED
    Warning: This result is based on an Attribute check.
    
    General SMART Values:
    Offline data collection status:  (0x00) Offline data collection activity
                                            was never started.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                (   45) seconds.
    Offline data collection
    capabilities:                    (0x5b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            No Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   2) minutes.
    Extended self-test routine
    recommended polling time:        (  57) minutes.
    SCT capabilities:              (0x003d) SCT Status supported.
                                            SCT Error Recovery Control supported.
                                            SCT Feature Control supported.
                                            SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline      -       0
      3 Spin_Up_Time            0x0007   197   197   033    Pre-fail  Always       -       1
      4 Start_Stop_Count        0x0012   099   099   000    Old_age   Always       -       2305
      5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline      -       0
      9 Power_On_Hours          0x0012   087   087   000    Old_age   Always       -       6123
     10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       2276
    191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       122
    193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2395
    194 Temperature_Celsius     0x0002   253   253   000    Old_age   Always       -       22 (Min/Max 10/39)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       71
    197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
    223 Load_Retry_Count        0x000a   100   100   000    Old_age   Always       -       0
    
    SMART Error Log Version: 1
    ATA Error Count: 5397 (device log contains only the most recent five errors)
            CR = Command Register [HEX]
            FR = Features Register [HEX]
            SC = Sector Count Register [HEX]
            SN = Sector Number Register [HEX]
            CL = Cylinder Low Register [HEX]
            CH = Cylinder High Register [HEX]
            DH = Device/Head Register [HEX]
            DC = Device Command Register [HEX]
            ER = Error register [HEX]
            ST = Status register [HEX]
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
    SS=sec, and sss=millisec. It "wraps" after 49.710 days.
    
    There's more, but didn't scroll through and copy it all.
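    The 49.710-day figure in the error-log legend above is simply where a 32-bit millisecond counter wraps around. A quick check, assuming an unsigned 32-bit count of milliseconds (which is what the quoted figure implies):

```python
MS_PER_DAY = 1000 * 60 * 60 * 24   # 86,400,000 ms in a day
wrap_days = 2**32 / MS_PER_DAY     # largest 32-bit count, in days

print(f"{wrap_days:.3f} days")
```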
     
    Last edited: Mar 3, 2018
  16. Sean B.

    Sean B. LI Guru Member

    Sorry to hear this, btw. Hopefully those drugs will do their job in short order for ya.
     
  17. koitsu

    koitsu Network Guru Member

    This is actually OK/normal, believe it or not. It's inconvenient for you, but when I explain why, it should hopefully shed some light on it.

    Hard disks actually draw a lot of power, especially when spinning up. They often require a lot more power than a USB 1.1 or USB 2.0 port can provide, actually. USB 1.1 and USB 2.0 run at 5V, but only allow up to -- maximum -- 0.5A (half an amp) of current. Hard disks, especially 3.5" ones, or high-speed ones (e.g. 7200rpm or higher), often draw more than that.

    These two-USB-connector "Y cables" (ignore the labels of what connector is for what in the picture -- whoever did those is an idiot) actually do a couple things at once. Each USB connector (that you plug into a PC/router/whatever) runs at 5V and offers 0.5A of current. So with two, you get 5V and 1A current. Twice the power! Of the 2 USB connectors, one is purely power (i.e. no data), and the other is both data and power.

    USB 3.0 runs at 5V but offers up to 0.9A of current. But even with USB 3.0, you could have an EXTREMELY high-powered device that requires more than 0.9A of current and you'd have problems.
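    The arithmetic behind those numbers is just P = V × I. A small sketch using the nominal figures quoted above (5V with 0.5A per USB 2.0 port, 0.9A for USB 3.0); the 1A spin-up draw is a made-up example value, not a measurement of any particular drive:

```python
USB_VOLTS = 5.0


def budget_watts(amps_per_port, ports=1):
    """Power available from `ports` USB connectors at the nominal current."""
    return USB_VOLTS * amps_per_port * ports


usb2_single = budget_watts(0.5)            # one USB 2.0 port
usb2_y_cable = budget_watts(0.5, ports=2)  # Y-cable across two USB 2.0 ports
usb3_single = budget_watts(0.9)            # one USB 3.0 port

EXAMPLE_SPINUP_AMPS = 1.0  # hypothetical spin-up draw, for illustration only
needed = USB_VOLTS * EXAMPLE_SPINUP_AMPS
for name, watts in [("USB 2.0", usb2_single),
                    ("USB 2.0 Y-cable", usb2_y_cable),
                    ("USB 3.0", usb3_single)]:
    print(f"{name}: {watts:.1f} W available, {needed:.1f} W needed: {watts >= needed}")
```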

    Now you know why so many USB enclosures include an AC adapter. If it comes with one, USE IT. Do not try to power the entire device off of USB.

    But here are 2 more problems that nobody ever talks about: gauge of wire used and cable length. Higher-gauge wire (thinner wire) has more resistance, and a longer USB cable adds more resistance too; either way, less voltage and power actually reach the drive when it draws current. This is a notorious issue that plagues people all the time. They might buy a replacement cable that's longer than the one the vendor included (or an extension cable to make it longer) -- suddenly the drive "sort of" works; it spins up but then during heavy I/O it falls off the bus, or it spins up then spins down rapidly (very bad for a mechanical HDD (MHDD) BTW). Worse, I've actually seen USB drive enclosures sold with cables that are too long, causing these exact problems.
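    A rough way to see the gauge/length effect: current goes out on the VBUS wire and back on ground, so the drive sees 5V minus I × R of that round trip. A sketch using the standard AWG diameter formula and copper's resistivity; the gauges, lengths and 0.9A current are example values, and it ignores connector contact resistance, which makes real cables worse:

```python
import math

COPPER_RESISTIVITY = 1.68e-8  # ohm-metres, copper at about 20 degrees C


def awg_ohms_per_metre(awg):
    """Resistance per metre of solid copper wire of the given AWG."""
    diameter_m = 0.127e-3 * 92 ** ((36 - awg) / 39)
    area_m2 = math.pi * diameter_m ** 2 / 4
    return COPPER_RESISTIVITY / area_m2


def voltage_at_device(supply_v, amps, length_m, awg):
    """Voltage left at the drive after the VBUS + ground round trip."""
    round_trip_ohms = 2 * length_m * awg_ohms_per_metre(awg)
    return supply_v - amps * round_trip_ohms


for awg in (28, 24):
    for length in (0.5, 1.8, 3.0):
        v = voltage_at_device(5.0, 0.9, length, awg)
        print(f"AWG {awg}, {length} m: {v:.2f} V at the drive")
```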

    One of the problems with the Y-cable thing though is that the drive for a brief period of time has "half" the amount of power it might actually need. Some badly-designed drives try to spin up in this scenario (bad bad bad!), when in reality they shouldn't do that until they have enough current.

    Another problem is that these Y-cables often don't disclose which of the 2 connectors is data+power and which is power. The order you plug them in matters! You want to plug power in first, followed by data+power. Data+power first can result in the OS issuing some I/O to the drive when it's "partially" spun up and might cause it to try and draw more power than it actually has available (cuz you haven't had time to plug the 2nd connector in yet). Chicken-and-egg problem in a way. Anyway now you know why I hate those things too. :)

    At least for WDC My Passport drives, a USB 3.0 cable is included -- there is no AC adapter included, nor an AC power port on the enclosure/drive. And the cable is VERY short (for a damn good reason). :)

    Larger-capacity drives are also more problematic. They have more physical platters, thus stronger motors, and they have more arms/moving parts and heads. They require more current. This is why if I need a purely USB-powered MHDD, I stick to the smallest capacity possible, and tend to buy enclosure+drive combinations (ex. My Passport drives), because I know what WDC gives me is going to work cable/power-wise.

    In general, if you're going to use an MHDD attached to a router via USB, you should probably buy a drive+enclosure that is AC-powered, not fully USB-powered. Now you know why. But yes, another wall wart required, I know -- annoying -- but hard disks require a lot of power.

    I really like Plugable's docks (the horizontal ones, NOT vertical! Those vertical ones I want nothing to do with!), specifically the USB3-SATA-UASP1 because the SATA-USB bridge is disclosed (ASMedia ASM1053), and the vendor understands that not all customers want a bridge that automatically powers down the drive (hence why they offer a F/W with that feature disabled). The products are rock solid too. But they aren't aesthetically pleasing next to a router.

    I haven't ever studied how much power draw SSDs require, but they certainly don't draw "tons more" during spin-up because there's nothing to spin up! :) But during heavy I/O I've read they can actually draw more power than MHDDs, depending on all sorts of things.

    Anyway, moving on...

    Hopefully part of my above blabbing helps answer part or all of your question.

    I've tested a LOT of SATA-USB bridges on different products, especially on Amazon, and I always write VERY thorough reviews of them. The biggest problem with most of them is that you don't always get the same SATA-USB bridge chip. You might get one that's decent in 2016, but then 6 months later -- and without changing the product model string -- you get another and it uses some completely different chip that acts stupid. Then there are all the fake/cheap Chinese ICs that mimic the good SATA-USB bridges but do a half-ass job (ex. SMART pass-through doesn't work, they act flaky or weird, etc.). You literally have no idea what you're getting.

    It's so hard to find a good/reliable product like this now.

    This is why, for "general use", I actually prefer WDC's My Passport drives (the smaller-capacity ones, i.e. 1TB only), even though I *HATE* that the drives themselves are natively USB (yes, you read that right: the drive PCB has a USB connector on it, not SATA!) and I HATE that they use encryption (i.e. if the drive dies, you aren't going to recover data from it -- the USB IC on the drive itself does AES encryption of the data it writes to the platters, so they're a PITA if you ever have to hire a data recovery company). Here's why I still prefer them:

    1. They're 100% USB 3.0-powered, so as long as the USB 3.0 port on the device you're hooking them to can provide proper current (i.e. complies with the standard specifications), you're good during spin-up and during I/O.
    2. They come with a USB 3.0 cable that is guaranteed to work: it's short (proper length) and of proper gauge.
    3. They support both SAT and SMART pass-through, so you can even run SMART selective self-tests (LBA scans) and other things. I believe APM can be toggled on these drives using smartctl -s apm,somevalue (see the smartctl documentation for what somevalue should be; it varies per drive).
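    For the selective self-tests mentioned in point 3, smartctl takes LBA ranges as repeated -t select,START-END arguments (up to five spans per run). The helper below is a hypothetical sketch, not something shipped with smartmontools, that splits a disk into equal spans. The sector count in the usage line is an example: the 2,000,398,934,016-byte User Capacity of the Samsung HD204UI posted later in this thread, divided by its 512-byte sectors. awk does the arithmetic because busybox sh on a 32-bit router may not handle 64-bit integers in $(( )).

```shell
# Hypothetical helper: print "-t select,START-END" argument pairs that split a
# disk of TOTAL logical sectors ($1) into COUNT contiguous spans ($2).
# smartctl accepts at most five select spans per run.
# awk uses doubles, which stay exact for sector counts far beyond 2TB.
select_spans() {
    awk -v total="$1" -v count="$2" 'BEGIN {
        t = total + 0; c = count + 0        # force numeric comparisons
        if (c < 1 || c > 5 || t < c) exit 1
        chunk = int(t / c)
        start = 0
        for (i = 1; i <= c; i++) {
            # the last span absorbs any remainder so the whole disk is covered
            end = (i == c) ? t - 1 : start + chunk - 1
            printf "-t select,%.0f-%.0f\n", start, end
            start = end + 1
        }
    }'
}
```

    Usage, assuming -d sat is the right device type for your bridge: smartctl -d sat $(select_spans 3907029168 2) /dev/sda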

    For "general use" where you're swapping drives in/out etc. and you want full control over which SATA drive you use, Plugable's docks are fantastic. The ASMedia ASM1053 is a good chipset with SAT and SMART pass-through, and it performs very well. I haven't seen Plugable use cheap Chinese clone ICs either. I've also used a Plugable dock with a 2.5" SSD successfully.

    As for the issue at hand: I'd still like you to try the 3 commands I mentioned, at bare minimum because I need to know whether that USB VID/PID should be added to smartmontools' database of drives that need -d sat. :)
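    If lsusb isn't installed on the router, the USB VID/PID can be read straight from sysfs. A minimal sketch, assuming the standard /sys/bus/usb/devices layout where each device directory has idVendor and idProduct files and, optionally, manufacturer and product strings. The function name and the SYSFS_USB override are hypothetical; the override only exists so the loop can be tried against a copy of the tree.

```shell
# Hypothetical helper: list each USB device's VID:PID and its manufacturer /
# product strings from sysfs. SYSFS_USB (also hypothetical) can point it at a
# copy of the tree; by default it reads the live /sys/bus/usb/devices.
usb_ids() {
    root=${SYSFS_USB:-/sys/bus/usb/devices}
    for d in "$root"/*; do
        # interface entries (e.g. 1-1:1.0) have no idVendor, so skip them
        [ -f "$d/idVendor" ] || continue
        vid=$(cat "$d/idVendor")
        pid=$(cat "$d/idProduct")
        # string descriptors are optional; show "?" when absent
        mfr=$(cat "$d/manufacturer" 2>/dev/null || echo "?")
        prod=$(cat "$d/product" 2>/dev/null || echo "?")
        echo "$vid:$pid $mfr / $prod"
    done
}
```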

    Finally: please be aware that the enclosure can be the one requesting that the drive spin down (see above, re: Plugable), but the drive itself can also do it through several means. APM is one of them, but turning APM off doesn't necessarily stop the drive from spinning down. It all depends on the drive's firmware. Every HDD vendor is different, and sometimes it even varies per model (ex. WDC Green vs. WDC Red/NAS).
     
  18. koitsu

    koitsu Network Guru Member

    I'd be asking you why you haven't used smartctl --scan :)

    I can explain what's happening here -- and yes, I too have seen this (and the results are always extremely bad). The reason -d scsi works for you is mainly that the SATA-USB bridge is doing the command translation (I'm assuming the drive really is ATA). Look closely at the output and you'll see that many things are busted/wrong/half-implemented. I'll quote them for you:

    Code:
    Vendor:               HGST HTS
    Product:              545025A7E330
    Revision:
    Rotation Rate:        17715 rpm
    Transport protocol:   Fibre channel (FCP-2)
    
    Current Drive Temperature:     0 C
    Drive Trip Temperature:        0 C
    
    Error Counter logging not supported
    
    Device does not support Self Test logging
    
    1. Vendor string is botched
    2. Product string is botched
    3. Revision is botched
    4. Rotation rate is wrong/botched
    5. Your drive is not a Fibre Channel SCSI drive; that I can assure you 100%.
    6. No temperature data for a drive that certainly has it.

    Yeah, this is totally janky. I am almost certain this is due to a crappy SATA-USB bridge that doesn't implement SMART pass-through. smartctl --scan will give you some idea of what smartmontools thinks might be a sane -d flag (if anything).

    Otherwise, your SATA-USB bridge is making your life miserable, and the same advice I just gave the OP applies to you. You wouldn't have known whether the bridge was decent or not because... well... you never asked. :) (I'm joshing, but only partially)

    Okay, so this is where we get into a discussion about smartctl --scan.

    If you do not specify -d, smartmontools will look up the USB VID/PID in its internal database and see if there's a match. If there is, the same database entry says which flags to use to properly talk to the drive. The fact that -a by itself worked means smartmontools has the drive in its drivedb.h and is essentially auto-detecting what to use for -d. It might not be -d sat; it might be something else. I linked to smartmontools web pages that describe several types. --scan will show you what it recommends/would use if -d isn't specified.
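    As a rough illustration of picking the -d type from --scan: the shell function below (hypothetical name, plain POSIX sh so it should also run under busybox ash) reads smartctl --scan output on stdin and prints the -d value listed for one device. It assumes the usual scan line layout (device, -d, type, then a # comment). The fallback to sat when nothing matches is an assumption based on -d sat being what worked for the OP later in this thread, not smartmontools' own behavior.

```shell
# Hypothetical helper: read "smartctl --scan" output on stdin and print the
# -d type it lists for one device ($1). Assumes the usual line layout, e.g.
#   /dev/sda -d sat # /dev/sda [SAT], ATA device
# Falls back to "sat" when no line matches (an assumption for this setup).
scan_dtype() {
    dtype=$(awk -v dev="$1" '$1 == dev && $2 == "-d" { print $3; exit }')
    echo "${dtype:-sat}"
}
```

    Usage: smartctl -d "$(smartctl --scan | scan_dtype /dev/sda)" -a /dev/sda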

    Finally:

    Yes, the "there's more" is a log of the I/O errors the drive has experienced, but it only holds a certain number of entries (due to the physical sector capacity of said log on the drive).

    You should replace this drive. It has a long, established history of remapped LBAs: none are pending, but it has actively remapped 71 LBAs and has encountered nearly 5400 I/O errors during its lifetime of around 6100 hours. That's absolutely horrible.
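    To check the same kind of thing on another drive, the sketch below (a hypothetical helper, not part of smartmontools) reads the attribute table from smartctl -A or -x on stdin and warns when the raw values of attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) or 198 (Offline_Uncorrectable) are nonzero. It assumes those IDs carry their common meanings and that RAW_VALUE is the last field on the line; vendors are free to number attributes differently. It does not look at the SMART error log, which is where the I/O error count above comes from.

```shell
# Hypothetical helper: read smartctl -A / -x output on stdin and warn when the
# raw values of attributes 5, 197 or 198 are nonzero. Assumes the common
# meanings of those IDs and that RAW_VALUE is the last field on the line.
# Exit status: 0 = OK, 1 = at least one warning, 2 = no such attributes found.
check_sectors() {
    awk '
        # attribute rows start with a decimal ID followed by a name
        $1 ~ /^[0-9]+$/ && $2 ~ /^[A-Za-z]/ && ($1 == 5 || $1 == 197 || $1 == 198) {
            seen++
            if ($NF + 0 > 0) { bad++; printf "WARN %s %s = %s\n", $1, $2, $NF }
        }
        END {
            if (!seen) { print "UNKNOWN: no attribute table found"; exit 2 }
            if (bad) exit 1
            print "OK"
        }'
}
```

    Usage: smartctl -d sat -A /dev/sda | check_sectors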

    Sean, is this the drive you were talking about in your other thread? If so, then this COMPLETELY explains how your file got corrupted! I could seriously strangle you right about now.

    P.S. -- You basically just hijacked someone else's thread in the middle of me trying to help them. I'm a bit miffed, also because fixing the formatting mistakes in inline quoting takes me 30 minutes after clicking Save.
     
  19. Sean B.

    Sean B. LI Guru Member

    No, that's not the drive I was discussing in my thread regarding the corrupted file; it's one of many nearly empty ones I'm checking as a possible replacement for the drive discussed there. No, I didn't hijack someone's thread; I was explaining why I suggested using "-d scsi", which is directly related to the OP's issue. However, I digress... you can take it from here.
     
  20. CharlieSummers

    CharlieSummers Network Newbie Member

    Um, this is a 3.5" drive in the enclosure, with its own power source. I rarely, if ever, use 2.5" drives in enclosures, and then only in "toolless" ones I can quickly swap in and out. I have never purchased a 2.5" portable drive from any HD manufacturer, and frankly prefer SATA docks/cradles/whatever, since they handle 2.5" drives yet have their own power supplies.

    This only to note why the power issue is irrelevant in my case. (Er, pun not intended.)

    Always. (OK, 99% of the time. I used a 2.5" 750G drive shoved in an Orico enclosure to help a friend restore his Macintosh last week, but that was a rarity.)

    Can't imagine why; so long as you use them consistently with the bare drives, and the drives are always used vertically, I don't see the problem. It's effectively the same as mounting a drive in most cases.

    Right now, I'm annoyed with both WD and Seagate, and am only buying Toshiba 3.5" drives (internal and external).

    No problem. Will get to that right after breakfast. ;)

    Hope you're feeling better today!
     
  21. CharlieSummers

    CharlieSummers Network Newbie Member

    In order:

    Code:
    root@Tomato:/tmp/home/root# smartctl --scan
    # scan_smart_devices: glob(3) aborted matching pattern /dev/discs/disc*
    Code:
    root@Tomato:/tmp/home/root# smartctl -d sat -a /dev/sda >2.txt
    smartctl 6.6 2017-11-05 r4594 [mips-linux-2.6.22.19] (localbuild)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     SAMSUNG SpinPoint F4 EG (AF)
    Device Model:     SAMSUNG HD204UI
    Serial Number:    S2K4J9JB904550
    LU WWN Device Id: 5 0024e9 206186c4d
    Firmware Version: 1AQ10001
    User Capacity:    2,000,398,934,016 bytes [2.00 TB]
    Sector Size:      512 bytes logical/physical
    Rotation Rate:    5400 rpm
    Form Factor:      3.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA8-ACS T13/1699-D revision 6
    SATA Version is:  SATA 2.6, 3.0 Gb/s
    Local Time is:    Sat Mar  3 13:29:48 2018 EST
    
    ==> WARNING: Using smartmontools or hdparm with this
    drive may result in data loss due to a firmware bug.
    ****** THIS DRIVE MAY OR MAY NOT BE AFFECTED! ******
    Buggy and fixed firmware report same version number!
    See the following web pages for details:
    http://knowledge.seagate.com/articles/en_US/FAQ/223571en
    http://www.smartmontools.org/wiki/SamsungF4EGBadBlocks
    
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x00)   Offline data collection activity
                       was never started.
                       Auto Offline Data Collection: Disabled.
    Total time to complete Offline
    data collection:        (    0) seconds.
    Offline data collection
    capabilities:             (0x00)    Offline data collection not supported.
    SMART capabilities:            (0x0000)   Automatic saving of SMART data
                       is not implemented.
    Error logging capability:        (0x00)   Error logging supported.
                       General Purpose Logging supported.
    SCT capabilities:           (0x003f)   SCT Status supported.
                       SCT Error Recovery Control supported.
                       SCT Feature Control supported.
                       SCT Data Table supported.
    
    SMART Error Log Version: 1
    No Errors Logged
    
    SMART Self-test Log not supported
    
    Selective Self-tests/Logging not supported
    Code:
    root@Tomato:/tmp/home/root# smartctl -d sat -x /dev/sda >3.txt
    smartctl 6.6 2017-11-05 r4594 [mips-linux-2.6.22.19] (localbuild)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     SAMSUNG SpinPoint F4 EG (AF)
    Device Model:     SAMSUNG HD204UI
    Serial Number:    S2K4J9JB904550
    LU WWN Device Id: 5 0024e9 206186c4d
    Firmware Version: 1AQ10001
    User Capacity:    2,000,398,934,016 bytes [2.00 TB]
    Sector Size:      512 bytes logical/physical
    Rotation Rate:    5400 rpm
    Form Factor:      3.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA8-ACS T13/1699-D revision 6
    SATA Version is:  SATA 2.6, 3.0 Gb/s
    Local Time is:    Sat Mar  3 13:33:45 2018 EST
    
    ==> WARNING: Using smartmontools or hdparm with this
    drive may result in data loss due to a firmware bug.
    ****** THIS DRIVE MAY OR MAY NOT BE AFFECTED! ******
    Buggy and fixed firmware report same version number!
    See the following web pages for details:
    http://knowledge.seagate.com/articles/en_US/FAQ/223571en
    http://www.smartmontools.org/wiki/SamsungF4EGBadBlocks
    
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    AAM level is:     128 (quiet), recommended: 254
    APM feature is:   Disabled
    Rd look-ahead is: Enabled
    Write cache is:   Enabled
    DSN feature is:   Unavailable
    ATA Security is:  Disabled, NOT FROZEN [SEC1]
    Wt Cache Reorder: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x00)   Offline data collection activity
                       was never started.
                       Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0)   The previous self-test routine completed
                       without error or no self-test has ever
                       been run.
    Total time to complete Offline
    data collection:        (20460) seconds.
    Offline data collection
    capabilities:             (0x5b) SMART execute Offline immediate.
                       Auto Offline data collection on/off support.
                       Suspend Offline collection upon new
                       command.
                       Offline surface scan supported.
                       Self-test supported.
                       No Conveyance Self-test supported.
                       Selective Self-test supported.
    SMART capabilities:            (0x0003)   Saves SMART data before entering
                       power-saving mode.
                       Supports SMART auto save timer.
    Error logging capability:        (0x01)   Error logging supported.
                       General Purpose Logging supported.
    Short self-test routine
    recommended polling time:     (   2) minutes.
    Extended self-test routine
    recommended polling time:     ( 341) minutes.
    SCT capabilities:           (0x003f)   SCT Status supported.
                       SCT Error Recovery Control supported.
                       SCT Feature Control supported.
                       SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
      1 Raw_Read_Error_Rate     POSR-K   100   100   051    -    0
      2 Throughput_Performance  -OS--K   252   252   000    -    0
      3 Spin_Up_Time            PO---K   067   067   025    -    10185
      4 Start_Stop_Count        -O--CK   087   087   000    -    14122
      5 Reallocated_Sector_Ct   PO--CK   252   252   010    -    0
      7 Seek_Error_Rate         -OSR-K   252   252   051    -    0
      8 Seek_Time_Performance   --S--K   252   252   015    -    0
      9 Power_On_Hours          -O--CK   100   100   000    -    6870
     10 Spin_Retry_Count        -O--CK   252   252   051    -    0
     11 Calibration_Retry_Count -O--CK   252   252   000    -    0
     12 Power_Cycle_Count       -O--CK   100   100   000    -    407
    181 Program_Fail_Cnt_Total  -O---K   100   100   000    -    3313936
    191 G-Sense_Error_Rate      -O---K   100   100   000    -    1
    192 Power-Off_Retract_Count -O---K   252   252   000    -    0
    194 Temperature_Celsius     -O----   064   064   000    -    25 (Min/Max 11/51)
    195 Hardware_ECC_Recovered  -O-RCK   100   100   000    -    0
    196 Reallocated_Event_Count -O--CK   252   252   000    -    0
    197 Current_Pending_Sector  -O--CK   252   252   000    -    0
    198 Offline_Uncorrectable   ----CK   252   252   000    -    0
    199 UDMA_CRC_Error_Count    -OS-CK   200   200   000    -    0
    200 Multi_Zone_Error_Rate   -O-R-K   100   100   000    -    0
    223 Load_Retry_Count        -O--CK   252   252   000    -    0
    225 Load_Cycle_Count        -O--CK   099   099   000    -    14226
                                ||||||_ K auto-keep
                                |||||__ C event count
                                ||||___ R error rate
                                |||____ S speed/performance
                                ||_____ O updated online
                                |______ P prefailure warning
    
    General Purpose Log Directory Version 1
    SMART           Log Directory Version 1 [multi-sector log support]
    Address    Access  R/W   Size  Description
    0x00       GPL,SL  R/O      1  Log Directory
    0x01           SL  R/O      1  Summary SMART error log
    0x02           SL  R/O      2  Comprehensive SMART error log
    0x03       GPL     R/O      2  Ext. Comprehensive SMART error log
    0x06           SL  R/O      1  SMART self-test log
    0x07       GPL     R/O      2  Extended self-test log
    0x08       GPL     R/O      2  Power Conditions log
    0x09           SL  R/W      1  Selective self-test log
    0x11       GPL     R/O      1  SATA Phy Event Counters log
    0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
    0xe0       GPL,SL  R/W      1  SCT Command/Status
    0xe1       GPL,SL  R/W      1  SCT Data Transfer
    
    SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
    No Errors Logged
    
    SMART Extended Self-test Log Version: 1 (2 sectors)
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    SMART Selective self-test log data structure revision number 0
    Note: revision number not 1 implies that no selective self-test has ever been run
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Completed [00% left] (0-65535)
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    
    SCT Status Version:                  2
    SCT Version (vendor specific):       256 (0x0100)
    SCT Support Level:                   1
    Device State:                        Active (0)
    Current Temperature:                    25 Celsius
    Power Cycle Min/Max Temperature:     20/28 Celsius
    Lifetime    Min/Max Temperature:     14/64 Celsius
    Under/Over Temperature Limit Count:   0/0
    
    SCT Temperature History Version:     2
    Temperature Sampling Period:         5 minutes
    Temperature Logging Interval:        5 minutes
    Min/Max recommended Temperature:     -5/80 Celsius
    Min/Max Temperature Limit:           -10/85 Celsius
    Temperature History Size (Index):    128 (68)
    
    Index    Estimated Time   Temperature Celsius
      69    2018-03-03 02:55    25  ******
      70    2018-03-03 03:00    39  ********************
     ...    ..( 26 skipped).    ..  ********************
      97    2018-03-03 05:15    39  ********************
      98    2018-03-03 05:20    38  *******************
      99    2018-03-03 05:25    39  ********************
     ...    ..(  3 skipped).    ..  ********************
     103    2018-03-03 05:45    39  ********************
     104    2018-03-03 05:50    38  *******************
     105    2018-03-03 05:55    39  ********************
     ...    ..(  3 skipped).    ..  ********************
     109    2018-03-03 06:15    39  ********************
     110    2018-03-03 06:20    40  *********************
     111    2018-03-03 06:25    39  ********************
     ...    ..(  5 skipped).    ..  ********************
     117    2018-03-03 06:55    39  ********************
     118    2018-03-03 07:00    38  *******************
     119    2018-03-03 07:05    38  *******************
     120    2018-03-03 07:10    39  ********************
     121    2018-03-03 07:15    38  *******************
     ...    ..( 19 skipped).    ..  *******************
      13    2018-03-03 08:55    38  *******************
      14    2018-03-03 09:00    39  ********************
      15    2018-03-03 09:05    38  *******************
     ...    ..(  4 skipped).    ..  *******************
      20    2018-03-03 09:30    38  *******************
      21    2018-03-03 09:35    39  ********************
      22    2018-03-03 09:40    38  *******************
     ...    ..(  9 skipped).    ..  *******************
      32    2018-03-03 10:30    38  *******************
      33    2018-03-03 10:35    39  ********************
     ...    ..( 12 skipped).    ..  ********************
      46    2018-03-03 11:40    39  ********************
      47    2018-03-03 11:45    24  *****
      48    2018-03-03 11:50    24  *****
      49    2018-03-03 11:55    26  *******
      50    2018-03-03 12:00    30  ***********
      51    2018-03-03 12:05    30  ***********
      52    2018-03-03 12:10    30  ***********
      53    2018-03-03 12:15    26  *******
      54    2018-03-03 12:20    27  ********
      55    2018-03-03 12:25    22  ***
      56    2018-03-03 12:30    23  ****
      57    2018-03-03 12:35    25  ******
      58    2018-03-03 12:40    27  ********
      59    2018-03-03 12:45    28  *********
      60    2018-03-03 12:50    26  *******
      61    2018-03-03 12:55    27  ********
      62    2018-03-03 13:00    27  ********
      63    2018-03-03 13:05    24  *****
      64    2018-03-03 13:10    23  ****
      65    2018-03-03 13:15    21  **
      66    2018-03-03 13:20    22  ***
      67    2018-03-03 13:25    23  ****
      68    2018-03-03 13:30    24  *****
    
    SCT Error Recovery Control:
               Read: Disabled
              Write: Disabled
    
    Device Statistics (GP/SMART Log 0x04) not supported
    
    Pending Defects log (GP Log 0x0c) not supported
    
    SATA Phy Event Counters (GP Log 0x11)
    ID      Size     Value  Description
    0x0001  4            0  Command failed due to ICRC error
    0x0002  4            0  R_ERR response for data FIS
    0x0003  4            0  R_ERR response for device-to-host data FIS
    0x0004  4            0  R_ERR response for host-to-device data FIS
    0x0005  4            0  R_ERR response for non-data FIS
    0x0006  4            0  R_ERR response for device-to-host non-data FIS
    0x0007  4            0  R_ERR response for host-to-device non-data FIS
    0x0008  4            0  Device-to-host non-data FIS retries
    0x0009  4            1  Transition from drive PhyRdy to drive PhyNRdy
    0x000a  4            0  Device-to-host register FISes sent due to a COMRESET
    0x000b  4            0  CRC errors within host-to-device FIS
    0x000d  4            0  Non-CRC errors within host-to-device FIS
    0x000f  4            0  R_ERR response for host-to-device data FIS, CRC
    0x0010  4            0  R_ERR response for host-to-device data FIS, non-CRC
    0x0012  4            0  R_ERR response for host-to-device non-data FIS, CRC
    0x0013  4            0  R_ERR response for host-to-device non-data FIS, non-CRC
    0x8e00  4            0  Vendor specific
    0x8e01  4            0  Vendor specific
    0x8e02  4            0  Vendor specific
    0x8e03  4            0  Vendor specific
    0x8e04  4            0  Vendor specific
    0x8e05  4            0  Vendor specific
    0x8e06  4            0  Vendor specific
    0x8e07  4            0  Vendor specific
    0x8e08  4            0  Vendor specific
    0x8e09  4            0  Vendor specific
    0x8e0a  4            0  Vendor specific
    0x8e0b  4            0  Vendor specific
    0x8e0c  4            0  Vendor specific
    0x8e0d  4            0  Vendor specific
    0x8e0e  4            0  Vendor specific
    0x8e0f  4            0  Vendor specific
    0x8e10  4            0  Vendor specific
    0x8e11  4            0  Vendor specific
    A little disconcerting to see that WARNING at the top of both outputs...
     
  22. koitsu

    koitsu Network Guru Member

    Thanks. I'm surprised to see smartctl --scan not working. Hmm. I have several guesses as to why that's happening (a difference in glob(3) functionality between uClibc and GNU libc, or possibly an oddity with the Entware-ng package itself, i.e. how it was built), but it's irrelevant to solving your specific problem, because explicitly using -d sat seems to work.

    The firmware warning is quite legitimate. You can read about the details at said links. You're at the mercy of the vendor for stuff like this, sorry to say. For now, let's just put that on the back burner and move forward.

    The smartctl -a output looks like it was copy-pasted wrong, or something anomalous has happened: no attributes are shown. That's incredibly disconcerting. I've literally never seen that happen before, especially since -x shows attributes. Hmm.

    Review of this drive's attributes (because while I'm here I tend to do this for folks):

    At least one attribute in the -x output is mislabeled/wrong: attribute 181 is usually reserved for SSDs (there is no flash on the model of drive you're using), so this looks like another case of SMART attribute IDs being used differently between vendors' products. There is no official standard for SMART attribute IDs (the attribute numbers and what they represent); each vendor/manufacturer can use whatever number for whatever purpose it wants. The only standard is for the actual format of the structure/data. This may be a case where a new -F value is needed; I'll let the smartmontools folks decide. So please just ignore (for this drive) what attribute 181 shows. This drive might also need -F samsung3 for its self-test results (showing 0% left despite being completed), but again, I'll have to let the smartmontools folks decide that.

    This drive also seems to be one that parks its heads excessively (very large number in RAW_VALUE for attribute 225); people sometimes call this LCC (Load Cycle Count), taken from the SMART attribute's description. Many Western Digital and Seagate drives do this too, though some models have workarounds. Sometimes you can disable APM to get it to stop (there is no direct relation between APM itself and head-parking, but vendors sometimes make the latter stop if the former is disabled; it's their choice). I've talked about this on my blog before and have very strong feelings about this feature on 3.5" (non-laptop) disks. So if you hear this drive make a "click" noise (especially when no I/O has been done to it in some time and then I/O is suddenly issued), that's normal, and you'll see RAW_VALUE increment by 1 every time it happens.
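    To put a number on the head parking, you can divide attribute 225's raw value by Power_On_Hours. With the OP's -x output above (14226 load cycles over 6870 hours) that works out to roughly 2.07 load/unload cycles per power-on hour. The sketch below does the same division from smartctl -x text on stdin. It assumes the -x attribute layout (RAW_VALUE as the 8th field) and this drive's IDs (9 and 225); that holds for this Samsung but is not universal, since many drives report Load_Cycle_Count as attribute 193. The function name is hypothetical.

```shell
# Hypothetical helper: estimate head load/unload cycles per power-on hour from
# smartctl -x output on stdin. Assumes the -x attribute layout (RAW_VALUE is
# field 8) and this drive's IDs: 9 = Power_On_Hours, 225 = Load_Cycle_Count.
# Only rows with a decimal ID followed by a name are treated as attributes.
lcc_per_hour() {
    awk '
        $1 ~ /^[0-9]+$/ && $2 ~ /^[A-Za-z]/ && $1 == 9   { hours = $8 + 0 }
        $1 ~ /^[0-9]+$/ && $2 ~ /^[A-Za-z]/ && $1 == 225 { cycles = $8 + 0; found = 1 }
        END {
            if (!found || hours <= 0) exit 1
            printf "%d load cycles / %d hours = %.2f per hour\n", cycles, hours, cycles / hours
        }'
}
```

    Usage: smartctl -d sat -x /dev/sda | lcc_per_hour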

    That said: APM on this drive is already disabled:

    Code:
    APM feature is:   Disabled
    
    But note that this says Disabled, not Unavailable. There's a difference. Whether APM is disabled can sometimes affect whether the disk-level idle standby timer can be adjusted. Again, it's vendor- or model-specific.

    Don't mistake AAM (Automatic Acoustic Management) for APM (Advanced Power Management). They're different features that differ by only a single letter.

    hdparm -S 0 tries to disable the disk-level idle standby timer by setting its value to 0. That didn't work when you tried it, which could be because the drive has APM disabled, because the drive doesn't allow this feature to be changed, or because of some wonkiness between hdparm and the Linux kernel. All are strong possibilities. smartmontools can also change/adjust this feature, so let's try it with smartctl:

    Please try smartctl -d sat -s standby,off /dev/sda and provide the output in a code block.

    If that doesn't work, let's see whether enabling APM is possible (I'm going to have to guess at the value here; you'll understand why when you read the docs), followed by disabling standby:

    smartctl -d sat -s apm,127 /dev/sda
    smartctl -d sat -s standby,off /dev/sda
    (only run this if the previous command worked)

    If this doesn't work, let's try a higher APM value:

    smartctl -d sat -s apm,254 /dev/sda
    smartctl -d sat -s standby,off /dev/sda
    (only run this if the previous command worked)

    You can see if enabling APM worked by doing smartctl -d sat -g apm /dev/sda
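    If you'd rather not eyeball the output, the hypothetical helper below summarizes smartctl -g apm output read from stdin. "APM feature is: Disabled" is the line this drive actually reported; the enabled form is assumed to look like "APM level is: N (...)", mirroring the AAM line in the -x output above. The under-128 / 128-and-up split comes from the smartctl documentation quoted in this post, which itself says the real behavior is vendor-specific.

```shell
# Hypothetical helper: summarize "smartctl -g apm" output read from stdin.
# The enabled-case line format is assumed to mirror the AAM line
# ("APM level is:     N (description)"); exit status 1 if no APM line.
apm_summary() {
    awk '
        /^APM feature is:/ { print "APM " $NF; found = 1; exit }
        /^APM level is:/ {
            level = $4 + 0
            # below 128 may allow spindown, 128 and up only head parking (per the smartctl man page)
            if (level < 128) print "APM enabled, level " level ": spindown allowed"
            else print "APM enabled, level " level ": head-parking only"
            found = 1; exit
        }
        END { if (!found) exit 1 }'
}
```

    Usage: smartctl -d sat -g apm /dev/sda | apm_summary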

    Here's the documentation for the apm and standby parameters. Note that you cannot get (-g) standby, only set it.

    Code:
           -g NAME, --get=NAME, -s NAME[,VALUE], --set=NAME[,VALUE]
    
    ...
    
                  apm[,N|off] - [ATA only] Gets/sets the Advanced Power Management
                  (APM) feature on device (if supported).  If a value between 1
                  and 254 is provided, it will attempt to enable APM and set the
                  specified value, 'off' disables APM.  Note the actual behavior
                  depends on the drive, for example some drives disable APM if
                  their value is set above 128.  Values below 128 are supposed to
                  allow drive spindown, values 128 and above adjust only head-
                  parking frequency, although the actual behavior defined is also
                  vendor-specific.
    
    ...
    
                  standby,[N|off] - [ATA only] Sets the standby (spindown) timer
                  and places the drive in the IDLE mode.  A value of 0 or 'off'
                  disables the standby timer.  Values from 1 to 240 specify
                  timeouts from 5 seconds to 20 minutes in 5 second increments.
                  Values from 241 to 251 specify timeouts from 30 minutes to 330
                  minutes in 30 minute increments.  Value 252 specifies 21
                  minutes.  Value 253 specifies a vendor specific time between 8
                  and 12 hours.  Value 255 specifies 21 minutes and 15 seconds.
                  Some drives may use a vendor specific interpretation for the
                  values.  Note that there is no get option because ATA standards
                  do not specify a method to read the standby timer.
                  [NEW EXPERIMENTAL SMARTCTL FEATURE] If '-s standby,now' is also
                  specified, the drive is immediately placed in the STANDBY mode
                  without temporarily placing it in the IDLE mode.  Note that ATA
                  standards do not specify a command to set the standby timer
                  without affecting the power mode.
    
    You will see "vendor specific" a lot in the documentation -- welcome to present-day hard disks and the lack of standards for features like this (many of which, IMO, shouldn't exist at all; it's very hard to find an MHDD these days that just does what it's told and doesn't try to do extra "magic things").
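    The standby timer encoding in the documentation quoted above is easy to misread, so here is a small sketch (hypothetical function name) that turns a value N for smartctl -s standby,N into the timeout that documentation describes. I believe hdparm -S uses the same ATA encoding, but check hdparm's own man page; and as the docs say, some drives interpret the values their own way. Value 254 isn't described in the quoted text, so the function reports it as unspecified.

```shell
# Hypothetical helper: translate a standby timer value ($1) into the timeout
# described in the smartctl man page text quoted above. Returns 1 for
# non-numeric input and for values the quoted text does not describe.
standby_timeout() {
    n=$1
    case $n in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    if [ "$n" -eq 0 ]; then
        echo "disabled"
    elif [ "$n" -le 240 ]; then
        echo "$((n * 5)) s"                  # 1..240: 5-second steps
    elif [ "$n" -le 251 ]; then
        echo "$(((n - 240) * 30)) min"       # 241..251: 30-minute steps
    elif [ "$n" -eq 252 ]; then
        echo "21 min"
    elif [ "$n" -eq 253 ]; then
        echo "vendor specific (8 to 12 h)"
    elif [ "$n" -eq 255 ]; then
        echo "21 min 15 s"
    else
        echo "unspecified"                   # 254, or anything above 255
        return 1
    fi
}
```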

    If none of that works (i.e. the problem still happens), then the explanation is simple: the drive firmware does not permit adjustment of APM and/or the disk-level idle standby timer. In that case there's nothing you can do about it other than buy a different brand/model of hard disk.

    If enabling APM worked but setting standby still didn't work, then you should disable APM again (putting it back to its original state) by doing smartctl -d sat -s apm,off /dev/sda

    I can't tell you what to buy; every person has different experiences with brands/models that justify their view. For example -- and I'm talking strictly about 3.5" drives here -- I personally/professionally avoid Samsung and Seagate disks because of their firmware problems, Seagate because of their extremely aggressive head-parking (esp. on recent models), and Seagate (again) because of their extremely high failure rates. I avoid very specific models of Western Digital drives, particularly their "Blue" and "Green" (or "IntelliPower") models because of excessive head parking. I've had extremely good experiences with their "Black" and "Red" (NAS) series (particularly 1TB and 2TB models). I avoid higher-capacity disks because more platters = more moving parts = higher chance something will fail.

    And I use SSDs (specifically Samsung, or Intel as a fallback) if capacity is not a focus.

    USB flash drives are also an option, as long as you don't need large-capacity storage. The downside is that they obviously have no SMART capability, so you can't "diagnose" what's going wrong with one if it starts to misbehave. I've had good experiences with USB 2.0 flash drives and "so-so" experiences with USB 3.0 flash drives (I have several PNY drives that don't do XHCI negotiation properly, so when plugged into a USB 3.0 port they deliberately fall back to the USB 2.0 protocol. Obviously I don't buy PNY any longer).

    But that's me. Your needs and experiences may be different from mine (ex. you avoid WD and Seagate, which is perfectly OK to do! I don't debate people on this stuff), and your requirements/needs may differ too. :) It's all good.
     
    Last edited: Mar 3, 2018
  23. CharlieSummers

    CharlieSummers Network Newbie Member

    Did it a second time, writing to 2a.txt, and got a file of the same size:

    Code:
    -rw-r--r--    1 root     root          2075 Mar  3 13:30 2.txt
    -rw-r--r--    1 root     root          2075 Mar  3 17:49 2a.txt
    -rw-r--r--    1 root     root         11642 Mar  3 13:33 3.txt
    Only wrote to files b/c I thought the output might be too large for the terminal scrollback. Was right, at least in the -x output.

    Code:
    root@Tomato:/tmp/home/root# smartctl -d sat -s standby,off /dev/sda
    smartctl 6.6 2017-11-05 r4594 [mips-linux-2.6.22.19] (localbuild)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF ENABLE/DISABLE COMMANDS SECTION ===
    Standby timer set to 0 (disabled)
    Well, that's a d*mned thing...I am assuming it worked, although I'll need to let it sit for a half-hour or so and then try accessing it to be sure. Easy enough to hear the drive spin-up, even from across the room.

    Am assuming I should skip the additional steps until/unless we're certain this did not work, yes?

    Going to have to study this to see if I can get a handle on it.

    Just got in two new Toshiba 2T bare drives, so I have alternatives should this all go to heck-in-a-handbasket. (With only three manufacturers of hard drives, not sure what I'll do if Toshiba ticks me off... ; )
     
  24. koitsu

    koitsu Network Guru Member

    Yup, you got it! :)

    There's also the possibility that the enclosure's SATA-USB bridge itself (specifically the USB portion) is actually going into some sort of sleep/standby mode. I believe this is colloquially known as "autosuspend" (Google that together with the word Linux and you'll see). USB does have power-saving capabilities, but I'm not sure if Linux 2.6 actually implements this, nor if it lets you toggle/adjust it -- or how that would even be accomplished on TomatoUSB. I know newer Linux has support for it (warning: kernel documentation): https://www.kernel.org/doc/html/v4.13/driver-api/usb/power-management.html
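    If the kernel exposes USB power management in sysfs, the per-device power files would show whether autosuspend is even in play. The sketch below assumes the sysfs layout described in the kernel documentation linked above (power/control on newer kernels, power/level on some older 2.6 kernels, plus power/autosuspend); whether TomatoUSB's 2.6.22 kernel provides any of these files is an untested assumption, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the autosuspend-related settings of each USB
# device found under a sysfs directory (default /sys/bus/usb/devices).
# Files that don't exist on this kernel are reported as "n/a".
show_usb_autosuspend() {
    local root dev name control delay
    root=${1:-/sys/bus/usb/devices}
    for dev in "$root"/*; do
        # Skip entries without a power/ directory.
        [ -d "$dev/power" ] || continue
        name=${dev##*/}
        # power/control ("on" or "auto") on newer kernels; power/level on
        # some older ones.
        control=$(cat "$dev/power/control" 2>/dev/null || cat "$dev/power/level" 2>/dev/null)
        # Idle delay in seconds before the device may be autosuspended.
        delay=$(cat "$dev/power/autosuspend" 2>/dev/null)
        echo "$name control=${control:-n/a} autosuspend=${delay:-n/a}"
    done
}
```

    Per the kernel docs, "on" means the kernel will not autosuspend that device, while "auto" allows it (if the driver supports it). If every value comes back n/a, the kernel simply doesn't expose these knobs.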

    One way to see if the actual USB layer is "falling asleep" would be to:

    1. Run dmesg and save the output somewhere (ex. output1.txt)
    2. Wait 30+ minutes (i.e. until the problem happens)
    3. Run dmesg and save the output somewhere (ex. output2.txt)
    4. Compare the two outputs (ex. diff -u output1.txt output2.txt, so new messages show up as lines starting with +)
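    The steps above can be scripted roughly as follows. This is only a sketch: the function names are made up, it assumes the firmware's Busybox provides dmesg, diff, grep and sed (it normally does, but builds vary), and it prints only lines that are new in the second snapshot, so messages that merely rolled out of the kernel ring buffer in the meantime are ignored.

```shell
#!/bin/sh
# Hypothetical helpers for the before/after dmesg comparison.

snapshot_dmesg() {
    # $1 = file to write the current kernel log into
    dmesg > "$1"
}

new_kernel_messages() {
    # $1 = earlier snapshot, $2 = later snapshot.
    # diff -u prefixes added lines with "+"; drop the "+++" file header
    # and strip the leading "+" so only the new messages remain.
    diff -u "$1" "$2" | grep '^+' | grep -v '^+++' | sed 's/^+//'
}
```

    Usage would be snapshot_dmesg /tmp/output1.txt, wait until the drive has dropped off, snapshot_dmesg /tmp/output2.txt, then new_kernel_messages /tmp/output1.txt /tmp/output2.txt.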

    There may be some MAC addresses and other "personalised" data in these outputs, so blindly copy-pasting them here might not be a good idea.

    Then again, this data might only show up if USB verbose logging was compiled into the kernel and enabled (most such things in TomatoUSB are disabled to keep the firmware size to a minimum). Still, it would at least let you tell whether this looks like a "disk-level" problem or an entire USB-layer problem; kernel messages can sometimes hint at one or the other.
     
  25. CharlieSummers

    CharlieSummers Network Newbie Member

    Maybe, but I can verify the drive itself is still taking a snooze. If I leave it sit for a spell, when I attempt to access it (across the network or via the open SSH terminal) I can hear the drive spin up. Can't hear any head clicks from across the room, but the sound of the drive motor whirring up is clear as a bell even to my ancient ears. Won't know until tomorrow whether or not the drive will stay down when writing bandwidth backups, but this isn't promising.

    I'm thinking tomorrow I restore the image to one of the 2T bare drives sitting here, haul out one of the docks and see if by replacing this enclosure/drive completely I can keep the drive from diving off...or at least have it wake up properly when Tomato tries to write the backups.
     
  26. koitsu

    koitsu Network Guru Member

    There is a way to keep the disk alive by sending it an I/O request periodically (say: read from or write data to the drive). People would normally say something like "just use dd if=/dev/sda of=/dev/null count=1" (read a single 512-byte block from the start of the raw /dev/sda device and throw it away), but that doesn't work exactly how you'd expect on TomatoUSB. Here's why:

    All device and file I/O -- which includes raw LBA (sector) reads -- is cached in the Linux buffer cache (RAM). Future reads from that same LBA/region/whatever come directly from the buffer cache; no physical I/O is ever done to the device (unless the buffer cache is emptied, which would be quite a bad thing to do).

    For writes, the situation is the same -- the kernel will eventually flush the buffered/cached write to the disk, but when it happens is up to the kernel.

    Bypassing the Linux buffer cache is possible in C code: there's a flag in the open() syscall named O_DIRECT that will do exactly this.

    The problem then becomes "what programs offer a way to use O_DIRECT"? On common Linux distros (i.e. not routers) offering GNU utilities or lots of other programs, there are lots of options: for example, GNU dd has O_DIRECT support using iflag=direct (reads) or oflag=direct (writes), hdparm has a --direct flag, and even more crazy things like using raw(8) to bypass the cache.
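    On such a distro, the one-block keep-alive read could look like the sketch below. It relies on GNU coreutils dd's iflag=direct (Busybox dd on TomatoUSB has no equivalent, as explained next); the function name is hypothetical, and the 4096-byte block size is an assumption, chosen because it is a multiple of both 512-byte and 4K logical sector sizes, which O_DIRECT requires.

```shell
#!/bin/sh
# Hypothetical helper; requires GNU coreutils dd (not Busybox dd).
direct_read_block() {
    # $1 = block device, e.g. /dev/sda
    # iflag=direct opens the input with O_DIRECT, so the read request is
    # sent to the drive instead of being answered from the kernel's cache.
    # With O_DIRECT the transfer size must be a multiple of the device's
    # logical block size; 4096 covers both 512-byte and 4K sectors.
    dd if="$1" of=/dev/null bs=4096 count=1 iflag=direct 2>/dev/null
}
```

    The function returns dd's exit status, so a cron job could log a failure if the device has vanished.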

    On TomatoUSB, the situation is different. Busybox dd does not offer O_DIRECT support; in fact, the Busybox maintainer (Denys Vlasenko) has been vocal about his dislike for O_DIRECT in the past, and present-day Busybox source contains absolutely no references to O_DIRECT. Nobody ever wrote the patch referenced in that link, by the way, and Denys has already stated he won't write the code himself. We don't have hdparm. We don't have raw. All of this stuff is missing because the firmware must be kept minimal (small in size) to fit within flash. Not all routers have large flash.

    So what commands on TomatoUSB are available that use O_DIRECT that might make this possible? Sadly, the answer is none -- I actually checked (grep -r O_DIRECT router | egrep -v 'DIRECTORY|DIRECTION' turns up many results, but all within daemons or code of programs that aren't included with TomatoUSB (like many utilities in e2fsprogs)).

    But there may be hope -- I just haven't tested it.

    One way you could potentially accomplish this is to do something as simple as:

    Code:
    touch /path/to/mounted/sda1/filesystem/keepdriveawake 2>/dev/null && sync
    
    And make this a cronjob (using cru) that runs once every 5 minutes (or at some interval shorter than the drive's spin-down timeout). You'll have a 0-byte file lying around, but that's the trade-off.

    sync(8) just executes the sync(2) syscall, which is supposed to flush inode data and metadata to the underlying device. It doesn't guarantee the data is actually written to the platters, but it's supposed to guarantee writes to the underlying device (i.e. some LBA writes should happen, guaranteeing the drive stays awake).

    One danger with this approach is that during unrelated and heavy I/O (say, lots of writes to the sda disk), if the cronjob runs, there will be a period of time where the system becomes slow/stalled (until the sync finishes). So you wouldn't want to run this, say, once a minute because you could tank your disk performance in some scenarios.

    If you want to try said kludge/workaround, you can add the cronjob like so:

    Code:
    cru a keepdriveawake '*/5 * * * * touch /path/to/mounted/sda1/filesystem/keepdriveawake 2>/dev/null && sync'
    
    And you can delete the cronjob like so:

    Code:
    cru d keepdriveawake
    
    You can see what cronjobs are active like so (TomatoUSB has some of its own, so don't delete them by mistake!):

    Code:
    cru l
    
    You will need to change /path/to/mounted/sda1/filesystem/keepdriveawake to wherever your /dev/sda drive (or one of its partitions) is mounted -- it's probably something like /tmp/mnt/MyHardDisk. That information isn't shown in the thread, so I don't know what path to give you. You can name the file (keepdriveawake) whatever you want; it doesn't matter.

    You can later add this to Scripts -> Init to restore the cronjob automatically on a router reboot.

    The reason for the 2>/dev/null is to throw all stderr output to /dev/null. Scripts -> Init doesn't guarantee that a USB hard disk is fully mounted by the time the script runs, so /path/to/mounted/sda1/filesystem won't exist until the drive is mounted, and touch will return an error in the meantime. Using && (instead of ; (semicolon)) ensures that the sync command only runs if the previous touch command was successful.
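    A somewhat more defensive variant, offered only as an untested sketch: instead of relying on touch failing, check /proc/mounts first, so the file is never created on the RAM-backed /tmp if the mount-point directory happens to exist with nothing mounted on it. The function names are hypothetical, and to call this from cru you'd need to save it as a script somewhere persistent (where is up to you).

```shell
#!/bin/sh
# Hypothetical helpers; untested on TomatoUSB.

is_mounted() {
    # $1 = directory, $2 = mounts table (defaults to /proc/mounts).
    # Field 2 of each mounts line is the mount point (spaces appear as \040).
    awk -v d="$1" '$2 == d { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

keep_awake() {
    # $1 = mount point of the USB disk, $2 = optional mounts table.
    # Do nothing when the disk isn't mounted there, so the marker file never
    # lands on the RAM-backed /tmp; otherwise touch it and flush with sync.
    is_mounted "$1" "$2" || return 1
    touch "$1/keepdriveawake" 2>/dev/null && sync
}
```

    On the router this would be called as keep_awake /tmp/mnt/MyHardDisk (substituting your real mount point).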
     
  27. Sean B.

    Sean B. LI Guru Member

    I meant to ask this far back in the thread when I suggested disabling power management on the drive via hdparm but never did:

    You're certain Tomato isn't spinning the drive down, regardless of what the GUI shows, correct? I.e.:

    Code:
    ps | grep idle
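    A small refinement to that check, as a sketch: a plain ps | grep idle can also list the grep process itself, which makes it look as if something is running when it isn't. The usual trick is a bracketed pattern. The helper name below is hypothetical, and it only assumes (as the command above does) that the spin-down helper's process name contains "idle".

```shell
#!/bin/sh
# Hypothetical helper: filter `ps` output (on stdin) for "idle" processes.
# '[i]dle' matches the text "idle" but not the literal string "[i]dle", so
# the grep process itself (whose command line contains "[i]dle") is skipped.
spindown_procs() {
    grep '[i]dle'
}
```

    Typical use on the router would be ps | spindown_procs; no output (and a non-zero exit status) means no such process is running.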
     