Bad Flash and Redboot NAS200

Discussion in 'Cisco/Linksys Network Storage Devices' started by Treah, Dec 10, 2009.

  1. Treah

    Treah Addicted to LI Member

    I have not flashed my NAS200 into an unworkable state recently, but thinking back to the time I did, I wonder whether I could have recovered it. What happened then was that I created a custom firmware and flashed the NAS200 with it; it started to boot but then just hung and never gave me the two beeps that say it is ready. I also had no access to SSH or the web page. Would it be possible to recover from this using Redboot, or would this type of failure require a JTAG connector?

    Also, if it is possible with Redboot, how do you go about recovering it? Everything on the wiki says that it would not accept the upgrade command.

    I know you have some experience with this, jac, and I would like some info just in case I ever flash and brick it again.
     
  2. jac_goudsmit

    jac_goudsmit Super Moderator Staff Member Member

    Disclaimer: all the following is from memory... if there are any mistakes, let me know and I'll correct them.

    I would say the following are the most common reasons that a NAS200 may be bricked:
    • Something went wrong during flashing, e.g. you had a power outage
    • You flashed a kernel that doesn't boot (e.g. wrong boot device)
    • You flashed a root file system that stops working before it boots all the way (e.g. syntax error in a script)
    • You used the "flash" command in Redboot (e.g. just to see what it does, in the hope that it will just print an error message). Do not use the flash command! It will immediately brick your NAS200 without warning.

    There are some other possibilities (like making a change to Redboot that makes the system unbootable) but those are pretty much the most common. I'm sure they all happened to me at some point in time. :grin:

    The Flash chip in the NAS200 is divided into 4 partitions: Kernel, Rootfs, Config and Redboot. When you use the Web GUI, all partitions are flashed in the order mentioned. When you use the "Update" command from Redboot, only the first three partitions are flashed; the "flash" command in Redboot (Don't use it!) flashes only the Redboot partition and does that by copying it from the RAM at location 0x400000. It doesn't perform any checks on what's there, so if you try "flash" by mistake, it immediately overwrites the boot loader with random data. You won't notice until you reboot, because Redboot runs from cache RAM and the data is only written to the flash ROM, not to the cache. So after that nasty command your only chance to see your mistake is to reboot, but you can only fix your mistake if you don't reboot. :erm:

    Symptoms:
    • Redboot turns on all the lights and the fan, then it boots the kernel. If the lights don't come on, either your Redboot is borked or it's your power supply.
    • The kernel turns most of the lights off and sounds a single beep. If the lights stay on or there's no beep, the kernel could not be loaded or something went wrong while the kernel was starting.
    • The double beep is generated from one of the last startup scripts in the root file system. If it never comes, either your kernel panicked or there's something wrong in a startup script.
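
    As a rough illustration, the symptom list above can be read as a small decision tree. The sketch below is only that: the function name, the parameters and the returned strings are made up for this example, and it assumes you can reliably observe the lights and both beeps.

```python
def diagnose(lights_on: bool, single_beep: bool, double_beep: bool) -> str:
    """Map the observed boot symptoms to the stage that most likely failed.

    Hypothetical helper: the three flags correspond to the three symptoms
    in the list above, checked in boot order.
    """
    if not lights_on:
        # Redboot never turned on the lights and the fan.
        return "Redboot or power supply"
    if not single_beep:
        # Redboot ran, but the kernel never sounded its single beep.
        return "kernel not loaded or failed while starting"
    if not double_beep:
        # Kernel started, but the late startup script never ran.
        return "kernel panic or broken startup script"
    return "booted normally"


if __name__ == "__main__":
    # Lights and the first beep, but never the double beep.
    print(diagnose(lights_on=True, single_beep=True, double_beep=False))
```

    The checks run in the same order as the boot itself, so the first missing symptom points at the earliest stage that failed.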

    It helps to have a serial port on the NAS200 to see exactly what is going on when you have bricked your device. It also helps tremendously to have the source code handy, especially when working with Redboot. Here are some remedies and hints:
    • If something went wrong during flashing of the kernel or the rootfs, you will still be able to connect to Redboot (via Telnet or serial port). You can either download a working kernel and optionally a rootfs from a TFTP or HTTP server (note: due to some code changes by Sercomm/Linksys, some parameters to the "linux" command are ignored, including the initrd location and size), or you can use the "update" (or was it "upgrade"?) command in Redboot to flash the kernel+rootfs+config using upslug, which is available from the nslu2-linux Sourceforge site (I wrote some information on the upslug page on the nslu2-linux.org wiki). You will need to use the -f parameter. This will reset the configuration to the default and, unfortunately, it will also reset your MAC address to 00-C0-A0-D0-E0-00. There is a command in Redboot to set the MAC address, but it doesn't set the address directly; instead it listens for a special network packet that would normally be sent by Linksys' manufacturing application, which is not available. You might be able to send it using netcat, though; I never tried. I just learned to live with the strange MAC address...
    • If something went wrong during flashing of the config, you will probably lose your configuration, but the configuration partition is only one block and I think the web GUI flash program either leaves it alone or flashes it twice: once with the blank configuration, once with a backup of the pre-update configuration.
    • If something went wrong during flashing of eCos/Redboot or if you used "flash" (Don't use the "flash" command!), you will need a JTAG cable and the RDC Loader program. Unlike other devices such as the Linksys WRT54G there is no write-protected rescue partition that you can start by holding a pin low or something.

    If you ever brick your NAS200 to the point where you need to JTAG it back to life, you will find that the RDC Loader program (although stable in most areas) won't recognize the Intel flash chip. You will have to download Redboot to RAM (which may take quite a while via JTAG, even though it's only 128K), and run it to make it flash itself into the ROM. An additional problem is that the RDC3210 CPU, just like the i486, starts up in Real mode, and there's something with the CS segment register that doesn't work until it encounters a Long Jump instruction (opcode 0xEA if I'm not mistaken). So let's say your flash ROM is all a mess and you start the CPU with JTAG and trace through instructions: you will see that the instruction at the CS:EIP location is something totally different from what the CPU actually does on each Trace step (watch the registers), until it runs into a Long Jump. It took me quite a while to figure out how to get the CPU to a point where it was actually running the instructions that it was showing, and unfortunately I forgot how I did it, and I hope you understand that I don't feel much like bricking my NAS on purpose to figure it out again...

    Anyway that's the "executive summary" of bricking and unbricking the NAS200... Sorry about the lack of details; I haven't heard of anyone (else) who bricked their NAS200 for a while but if it happens to you, let me know in this forum and I'll try to help you fix it. Even if the entire flash is wiped, it's possible to revive the NAS200 so unless you let the Magic Blue Smoke escape from the hardware, it's always possible to rescue it.

    ===Jac
     
  3. alejandro_liu

    alejandro_liu Addicted to LI Member

    Wow that was quite extensive.

    I myself would have only suggested to try "upslug" first.
     
  4. Treah

    Treah Addicted to LI Member

    Damn, that's a ton of stuff. When I bricked mine it still gave me the first beep, so it was something in the kernel startup that was not working. I am not sure what; I only unpacked the firmware and then rebuilt it using the instructions on the wiki page for the slug. I was using an updated firmware package and I think that's why I had problems when it was booting. I dunno. It would have been possible to recover it, but at the time there was very limited information on how to do it.

    Also, I am having a problem un-tarring the source from the Linksys GPL page: the extraction never completes and tar exits with a failure. Have you had any trouble with the file, jac?
     
  5. Treah

    Treah Addicted to LI Member

    Oh, never mind, I figured it out. tar was following a symlink that was in the sources and was trying to extract files into paths that were not directories. Below is a snapshot of the errors, for historical reasons, in case someone else tries to extract it. The errors should not cause any problems, or at least I don't think they will.

    Code:
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/fixinc/include/fixed: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/fixinc/include/README: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/fixinc/include/float.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/fixinc/include/limits.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/fixinc/include/xmmintrin.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/fixinc/include/mmintrin.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/fixinc/include/iso646.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/fixinc/include/stdbool.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/fixinc/include/varargs.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/fixinc/include/stddef.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/fixinc/include/stdarg.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/fixinc/include/syslimits.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/cp/include/fixed: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/cp/include/README: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/cp/include/float.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/cp/include/limits.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/cp/include/xmmintrin.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/cp/include/mmintrin.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/cp/include/iso646.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/cp/include/stdbool.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/cp/include/varargs.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/cp/include/stddef.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/cp/include/stdarg.h: Cannot open: Not a directory
    tar: NAS200_V34R79_GPL/ecos/gnutools/i386-elf-source/BUILD/gcc/gcc/cp/include/syslimits.h: Cannot open: Not a directory
    tar: Exiting with failure status due to previous errors
    
     
  6. jac_goudsmit

    jac_goudsmit Super Moderator Staff Member Member

    Correct, there is a problem in the V3.4R79 tarball on the Linksys page. I think what they did is add a number of updated files to an existing tarball instead of just creating a new tarball, and didn't do a good job of testing. Or they used a version of tar that doesn't expose the problem somehow.

    The problem is that somewhere early in the tarball there are a few symlinks that get created and later in the tarball it tries to overwrite the symlinks with directories. I checked and made sure that the files that are stored later in the tarball are identical to the ones near the start, so the errors can be ignored.
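
    For anyone who wants to check a tarball for this kind of conflict before extracting it, here is a minimal sketch using Python's standard tarfile module. It is not how the Linksys tarball was actually analysed; the function names and the tiny demo archive are invented for illustration. It simply walks the members in order and reports any entry whose path runs through (or replaces) a name that was stored earlier as a symlink.

```python
import io
import tarfile


def find_symlink_conflicts(tar: tarfile.TarFile) -> list[tuple[str, str]]:
    """Return (member, symlink) pairs where a later member's path runs
    through, or replaces, a name stored earlier in the archive as a symlink."""
    symlinks: set[str] = set()
    conflicts: list[tuple[str, str]] = []
    for member in tar:
        name = member.name.rstrip("/")
        parts = name.split("/")
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            # A proper prefix that is a symlink always conflicts; the full
            # path conflicts only if this member is not itself a symlink.
            if prefix in symlinks and (i < len(parts) or not member.issym()):
                conflicts.append((name, prefix))
                break
        if member.issym():
            symlinks.add(name)
    return conflicts


def build_demo_tar() -> bytes:
    """Build a tiny in-memory archive that reproduces the pattern:
    a symlink first, then a file stored underneath that same name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        link = tarfile.TarInfo("src/include")
        link.type = tarfile.SYMTYPE
        link.linkname = "../elsewhere"  # made-up target for the demo
        tar.addfile(link)
        data = b"/* demo */\n"
        info = tarfile.TarInfo("src/include/float.h")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    with tarfile.open(fileobj=io.BytesIO(build_demo_tar())) as tar:
        for member, link in find_symlink_conflicts(tar):
            print(f"{member}: path runs through symlink {link}")
```

    To scan a real archive, open it with tarfile.open() on the file path and pass the resulting object to find_symlink_conflicts().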

    This is only a problem in the Linksys NAS200 V34R79 tarball, not in any other version and of course also not in my modified version.

    ===Jac
     
  7. Treah

    Treah Addicted to LI Member

    I am finally getting around to checking out the sources and building my own version again. I notice that it no longer compiles cleanly on my system: I get errors about MAX_PATH not being set. I even tried compiling it with an older version of gcc. I suspect the problem is that I am using newer kernel headers than the ones the NAS was originally built with. I may have an easier way to set up a good cross compiler with the correct libs. I'll basically follow what we do when building an LFS system, by setting up a specs file and such.
     
  8. Treah

    Treah Addicted to LI Member

    I resolved the build problem with the kernel provided by Linksys. The sumversion.c file is missing an include statement that is present in newer kernels. You have to add it back if you want to compile it with any kind of recent compiler. The include should look like this: #include <limits.h>
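
    If you rebuild from a fresh copy of the sources often, the one-line fix can be scripted. The following is just a sketch under assumptions: the helper names are made up, and placing the new line before the first existing #include is a simple heuristic rather than anything taken from the Linksys sources. Editing the file by hand works just as well.

```python
from pathlib import Path

INCLUDE_LINE = "#include <limits.h>"


def add_limits_include(source: str) -> str:
    """Return the C source with '#include <limits.h>' added, unless present.

    Heuristic (an assumption of this sketch): the new line goes right
    before the first existing #include, or at the top if there is none.
    """
    lines = source.splitlines()
    # Treat '#include<limits.h>' and '#include <limits.h>' as the same line.
    if any(line.replace(" ", "") == "#include<limits.h>" for line in lines):
        return source
    first_include = next(
        (i for i, line in enumerate(lines) if line.lstrip().startswith("#include")),
        0,
    )
    lines.insert(first_include, INCLUDE_LINE)
    return "\n".join(lines) + "\n"


def patch_file(path: str) -> None:
    """Apply add_limits_include() to a file in place (path supplied by you)."""
    target = Path(path)
    target.write_text(add_limits_include(target.read_text()))
```

    Calling patch_file() with the location of your copy of sumversion.c applies the change; running it a second time leaves the file untouched.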
     
  9. jac_goudsmit

    jac_goudsmit Super Moderator Staff Member Member

    Yep, that was one of the problems I ran into as well.

    Also if you try to build Redboot you will have to set an environment variable to override the POSIX version because the Redboot build system uses an old syntax of the tail program.

    If you're interested, you can look at the SVN tree of the nasi200 project on Sourceforge that I started a long time ago and pretty much abandoned because I made a mess of it. It should give you a pretty good idea of what was changed between the original GPL code and the Linksys code. Unfortunately it's not totally up to date: it's based on R75 (not R79) and also it doesn't contain any of my changes to create any of my JacX firmwares.

    ===Jac
     
  10. Treah

    Treah Addicted to LI Member

    Is that a silent failure? Because it did not give me any sort of error when building a test firmware. I did not make any changes; I just ran make in the main directory. I have also been studying how they set up their cross compiler so I can build other apps. How it builds the firmware actually turned out to be a simple process: cross compile programs ---> decompress file system ---> run make install on file system ---> re-compress and delete the extracted files. You could even add programs just by modifying one of the makefiles. I think the real challenge would be getting the scripts set up right and dealing with old libraries. I've also considered updating the kernel and glib. Do you know if they fiddled much with the kernel source, or at all?
    I did, however, notice when I ran make menuconfig on the kernel sources that they included a lot of odd things, like speedstepping on the processor.
     
  11. jac_goudsmit

    jac_goudsmit Super Moderator Staff Member Member

    No, it doesn't fail silently; it causes an error. But ecos/Redboot doesn't get built by default if you just type make. You have to rebuild it explicitly, otherwise the build will re-use the binary that's stored in the images directory.

    True. I changed the build system in my firmware so that the root filesystem is not in a separate tarball, which makes it easier to make changes to startup scripts.

    The biggest problem I have with Sercomm/Linksys' build system is that they made changes to Makefiles after they are generated by autoconf scripts. What they should have done was to run ./configure with environment overrides e.g. to use CC=/path/to/our/gcc and PREFIX=/path/to/our/bin. Instead, they changed the Makefile that gets generated by ./configure so that it includes rules.mk with those overrides. While that may be a quicker fix to getting the build to work right, it's not very elegant. It's as if you would make a change to a .o file (instead of a .c file) to make a program work differently. It makes it difficult to add software to the device and even more difficult to upgrade software that's already there (like Busybox).
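
    To make the contrast concrete, here is a small sketch of the "override at configure time" approach, driven from Python. Everything specific in it is an assumption for illustration: the function names and the compiler and prefix paths are invented, and it passes the install location as the conventional --prefix option rather than a PREFIX variable. The point is only that the overrides live outside the generated Makefile.

```python
import os
import subprocess


def configure_command(cross_cc, prefix, base_env=None):
    """Build the argv and environment for running ./configure with overrides.

    cross_cc and prefix are hypothetical paths chosen by the caller; the
    generated Makefile is never edited by hand.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CC"] = cross_cc  # autoconf-generated configure scripts honour CC
    argv = ["./configure", f"--prefix={prefix}"]
    return argv, env


def run_configure(source_dir, cross_cc, prefix):
    """Run ./configure inside source_dir with the overrides applied."""
    argv, env = configure_command(cross_cc, prefix)
    subprocess.run(argv, cwd=source_dir, env=env, check=True)


if __name__ == "__main__":
    # Only print what would be run; the paths below are placeholders.
    argv, env = configure_command("/path/to/our/gcc", "/path/to/our/bin")
    print("CC=" + env["CC"], " ".join(argv))
```

    Because the overrides are passed in from outside, re-running ./configure (for example after upgrading a package) keeps them, which is exactly what hand-edited Makefiles lose.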

    I agree (but you mean glibc, not glib). That's what I had in mind for the NASi200 project but my spare time has decreased significantly since I started working on NAS200 improvements, and although I've been programming computers and embedded software for close to 30 years and have used and programmed under Linux on and off since 1993 or so, my experience with Autoconf is limited...

    Like I said, I checked the original kernel sources into SVN on Sourceforge and then checked in the Sercomm/Linksys sources so you can see for yourself. The changes aren't very extensive but they are widespread. Many files that appear changed in SVN really aren't, it's just that some files had a change log or a label or $Id$ that got changed in the process of getting moved around through the Sercomm source control system (they used CVS obviously).

    Examples of changes:
    • Chipset setup code
    • GPIO usage (LEDs for hard disk and ethernet)
    • Hard-coded RAM and MTD memory maps
    • Squashfs patches
    • An entire module for the RDC network device
    • Many, many patches to get the Real Time Clock working. I think the RTC is on an I2C bus on the parallel port (the Velleman module that generates a bunch of compiler warnings was used for I2C bitbanging on the parallel port), but I'm not sure; I haven't looked closely enough yet.

    If you're going to upgrade the kernel, it will help to know that recent kernels have built-in support for the RDC32xx to some degree, thanks to Florian Fainelli who got a lot of his work from the OpenWRT project checked in to the main kernel tree. I had a glance at those patches and they appear somewhat cleaner than the Sercomm/Linksys code, especially the network driver (probably because Florian used more recent RDC sample code to start with) but in some places I got the feeling that some changes were too specifically tailored to some kind of hardware (the RDC eval board perhaps?) e.g. the GPIO's are wired up in a certain way that appeared incompatible with the NAS200 hardware. I'm sure I will have a closer look when I get back to OpenWRT development.

    I noticed that too. I think this is one of the signs that Sercomm and Linksys wanted to get this to market as quickly as they could. They probably had a kernel config from the RDC eval board that was also put together quickly, and decided "if it works and it fits in the 1.5MB that we allocated, let's use it and we can optimize later. Compile time is irrelevant". While I don't always agree with this philosophy as a software engineer, I understand how it happens and I don't blame the engineers for this. They were probably under a lot of pressure to get it done on time and it must have been very frustrating that in the end, the NAS200 turned out to be slower than the NSLU2.

    ===Jac
     