1. 24 Jun, 2020 1 commit
  2. 05 Apr, 2020 1 commit
  3. 31 Mar, 2020 7 commits
  4. 30 Mar, 2020 3 commits
    • Allow debug prints without __VA_ARGS__ · 6372f515
      John Hemmick authored
      __VA_ARGS__ are frustrating in C. Even for their main purpose (printf),
      they fall short in that they don't have a _portable_ way to have zero
      arguments after the format string in a printf call.
      
      Even if we detect compilers and use ##__VA_ARGS__ where available, GCC
      emits a warning with -pedantic that is _impossible_ to explicitly
      disable.
      
      This commit contains the best solution we can think of. A bit of
      indirection that adds a hidden "%s" % "" to the end of the format
      string. This solution does not work everywhere as it has a runtime
      cost, but it is hopefully ok for debug statements.
      6372f515
    • Bumped minor version to v2.2 · 6622f3de
      Christopher Haster authored
      6622f3de
    • Last minute tweaks to debug scripts · 5137e4b0
      Christopher Haster authored
      - Standardized littlefs debug statements to use hex prefixes and
        brackets for printing pairs.
      
      - Removed the entry behavior for readtree and made -t the default.
        This is because 1. the CTZ skip-list parsing was broken, which is not
        surprising, and 2. the entry parsing was more complicated than useful.
        This functionality may be better implemented as a proper filesystem
        read script, complete with directory tree dumping.
      
      - Changed test.py's --gdb argument to take [init, main, assert],
        this matches the names of the stages in C's startup.
      
      - Added printing of tail to all mdir dumps in readtree/readmdir.
      
      - Added a print for if any mdirs are corrupted in readtree.
      
      - Added debug script side-effects to .gitignore.
      5137e4b0
  5. 29 Mar, 2020 6 commits
  6. 20 Mar, 2020 1 commit
    • Added "evil" tests and detection/recovery from bad pointers and infinite loops · 4677421a
      Christopher Haster authored
      These two features have been much requested by users, and have even had
      several PRs proposed to fix these in several cases. Before this, these
      error conditions usually were caught by internal asserts, however
      asserts prevented users from implementing their own workarounds.
      
      It's taken me a while to provide/accept a useful recovery mechanism
      (returning LFS_ERR_CORRUPT instead of asserting) because my original thinking
      was that these error conditions only occur due to bugs in the filesystem, and
      these bugs should be fixed properly.
      
      While I still think this is mostly true, the point has been made clear
      that being able to recover from these conditions is definitely worth the
      code cost. Hopefully this new behaviour helps the longevity of devices
      even if the storage code fails.
      
      Another, less important, reason I didn't want to accept fixes for these
      situations was the lack of tests that prove the code's value. This has
      been fixed with the new testing framework thanks to the addition of
      "internal tests" which can call C static functions and really take
      advantage of the internal information of the filesystem.
      4677421a
  7. 24 Feb, 2020 1 commit
  8. 23 Feb, 2020 2 commits
    • Minor tweaks to debugging scripts, fixed explode_asserts.py off-by-1 · a7dfae45
      Christopher Haster authored
      - Changed readmdir.py to print the metadata pair and revision count,
        which is useful when debugging commit issues.
      - Added truncated data view to readtree.py by default. This does mean
        readtree.py must read all files on the filesystem to show the
        truncated data, hopefully this does not end up being a problem.
      - Made overall representation hopefully more readable, including moving
        superblock under the root dir, userattrs under files, fixing a gstate
        rendering issue.
      - Added rendering of soft-tails as dotted-arrows, hopefully this isn't
        too noisy.
      - Fixed explode_asserts.py off-by-1 in #line mapping caused by a strip
        call in the assert generation eating newlines. The script matches
        line numbers between the original+modified files by emitting assert
        statements that use the same number of lines. An off-by-1 here causes
        the entire file to map lines incorrectly, which can be very annoying.
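The line-preserving idea in the last bullet can be sketched as follows. This is a minimal illustration, not the code in explode_asserts.py: the helper name `pad_to_same_lines` and the `CHECK` macro are hypothetical.

```python
def pad_to_same_lines(original: str, replacement: str) -> str:
    """Pad `replacement` with newlines so it spans as many lines as `original`."""
    missing = original.count("\n") - replacement.count("\n")
    if missing < 0:
        raise ValueError("replacement spans more lines than the original")
    return replacement + "\n" * missing


source = "a();\nassert(x ==\n    y);\nb();\n"
old = "assert(x ==\n    y);"

# Calling .strip() on the padded text would eat the trailing newline and
# shift every later line up by one, which is the off-by-1 described above.
patched = source.replace(old, pad_to_same_lines(old, "CHECK(x, y);"))
```

Because the patched text has the same number of newlines as the source, any line number reported against the patched file still points at the same statement in the original.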
      a7dfae45
    • Renamed test_format -> test_superblocks, tweaked superblock tests · 50fe8ae2
      Christopher Haster authored
      With the superblock expansion stuff, the test_format tests have grown
      to test more advanced superblock-related features. This is fine but
      deserves a rename so it's more clear.
      
      Also fixed a typo that meant tests never ran with block cycles.
      50fe8ae2
  9. 19 Feb, 2020 2 commits
    • Limited byte-level tests to native testing due to time · 09902966
      Christopher Haster authored
      Byte-level writes are expensive and not suggested (caches >= 4 bytes
      make much more sense), however there are many corner cases with
      byte-level writes that can be easy to miss (power-loss leaving single
      bytes written to disk).
      
      Unfortunately, byte-level writes mixed with power-loss testing, the
      Travis infrastructure, and Arm Thumb instruction set simulation
      exceeds the 50-minute budget Travis allocates for jobs.
      
      For now I'm disabling the byte-level tests under Qemu, with the hope that
      performance improvements in littlefs will let us turn these tests back
      on in the future.
      09902966
    • Fixed minor things to get CI passing again · d04b0775
      Christopher Haster authored
      - Added caching to Travis install dirs, because otherwise
        pip3 install fails randomly
      - Increased size of littlefs-fuse disk because test script has
        a larger footprint now
      - Skip a couple of reentrant tests under byte-level writes because
        the tests just take too long and cause Travis to bail due to no
        output for 10m
      - Fixed various Valgrind errors
        - Suppressed uninit checks for tests where LFS_BLOCK_ERASE_VALUE == -1.
          In this case rambd goes uninitialized, which is fine for rambd's
          purposes. Note I couldn't figure out how to limit this suppression
          to only the malloc in rambd, this doesn't seem possible with Valgrind.
        - Fixed memory leaks in exhaustion tests
        - Fixed off-by-1 string null-terminator issue in paths tests
      - Fixed lfs_file_sync issue revealed by fixing memory leaks
        in exhaustion tests. Getting ENOSPC during a file write puts the file
        in a bad state where littlefs doesn't know how to write it out safely.
        In this case, lfs_file_sync and lfs_file_close return 0 without
        writing out state so that device-side resources can still be cleaned
        up. To recover from ENOSPC, the file needs to be reopened and the
        writes recreated. Not sure if there is a better way to handle this.
      - Added some quality-of-life improvements to Valgrind testing
        - Fit Valgrind messages into truncated output when not in verbose mode
        - Turned on origin tracking
      d04b0775
  10. 18 Feb, 2020 1 commit
    • Restructured .travis.yml to span more jobs · c7987a31
      Christopher Haster authored
      The core of littlefs's CI testing is the full test suite, `make test`, run
      under a number of configurations:
      
      - Processor architecture:
        - x86 (native)
        - Arm Thumb
        - MIPS
        - PowerPC
      - Storage geometry:
        - rs=16   ps=16   cs=64   bs=512   (default)
        - rs=1    ps=1    cs=64   bs=4KiB  (NOR flash)
        - rs=512  ps=512  cs=512  bs=512   (eMMC)
        - rs=4KiB ps=4KiB cs=4KiB bs=32KiB (NAND flash)
      - Other corner cases:
        - no intrinsics
        - no inline
        - byte-level read/writes
        - single block-cycles
        - odd block counts
        - odd block sizes
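To give a feel for how quickly these axes multiply, here is a minimal sketch that enumerates architecture and geometry pairs. The short names, dictionary keys, and shape of the job records are illustrative assumptions, not the actual .travis.yml layout.

```python
import itertools

ARCHS = ["x86", "thumb", "mips", "powerpc"]

# Sizes in bytes, taken from the geometry list above.
GEOMETRIES = {
    "default": dict(rs=16, ps=16, cs=64, bs=512),
    "nor": dict(rs=1, ps=1, cs=64, bs=4096),
    "emmc": dict(rs=512, ps=512, cs=512, bs=512),
    "nand": dict(rs=4096, ps=4096, cs=4096, bs=32768),
}


def job_matrix():
    """Return one job record per (architecture, geometry) combination."""
    return [
        {"arch": arch, "geometry": name, **geo}
        for arch, (name, geo) in itertools.product(ARCHS, GEOMETRIES.items())
    ]


jobs = job_matrix()
```

Two axes with four entries each already give 16 combinations before any of the corner cases are added.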
      
      The number of different configurations we need to test quickly exceeds the
      50 minute time limit Travis has on jobs. Fortunately, we can split these
      tests out into multiple jobs. This seems to be the intended course of
      action for large CI "builds" in Travis, as this gives Travis a finer
      grain of control over limiting builds.
      
      Unfortunately, this created a couple issues:
      
      1. The Travis configuration isn't actually that flexible. It allows a
         single "matrix expansion" which can be generated from top-level lists
         of different configurations. But it doesn't let you generate a matrix
         from two separate environment variable lists (for arch + geometry).
      
         Without multiple matrix expansions, we're stuck writing out each test
         permutation by hand.
      
         On the bright side, this was a good chance to really learn how YAML
         anchors work. I'm torn because on one hand anchors add what feels
         like unnecessary complexity to a config language, on the other hand,
         they did help quite a bit in working around Travis's limitations.
      
      2. Now that we have 47 jobs instead of 7, reporting a separate status
         for each job stops making sense.
      
         What I've opted for here is to use a special NAME variable to
         deduplicate jobs, and used a few state-less rules to hopefully have
         the reported status make sense most of the time.
      
         - Overwrite "pending" statuses so that the last job to start owns the
           most recent "pending" status
         - Don't overwrite "failure" statuses unless the job number matches
           our own (in the case of CI restarts)
         - Don't write "success" statuses unless the job number matches our
           own, this should delay a green check-mark until the last-to-start
           job finishes
         - Always overwrite non-failures with "failure" statuses
      
         This does mean a temporary "success" may appear if the last job
         terminates before earlier jobs. But this is the simplest solution
         I can think of without storing some complex state somewhere.
      
         Note we can only report the size this way because it's cheap to
         calculate in every job.
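The state-less status rules listed above can be read as a small decision function. The sketch below is one interpretation of those four rules, not the script used in CI: the function name, the `(state, job)` tuple, and the ordering of the checks are assumptions.

```python
def next_status(current, new_state, job):
    """Return the (state, job) pair to store after `job` reports `new_state`.

    `current` is the stored (state, job) pair, or None if nothing is stored yet.
    """
    if current is None:
        return (new_state, job)
    cur_state, cur_job = current
    # A failure is only replaced by the job that reported it (CI restart).
    if cur_state == "failure" and cur_job != job:
        return current
    # Success is only written by the job that owns the current status.
    if new_state == "success" and cur_job != job:
        return current
    # Pending always takes ownership; failure overwrites any non-failure.
    return (new_state, job)


# Three jobs start, the last one finishes first, then job 1 fails.
status = None
history = []
for state, job in [("pending", 1), ("pending", 2), ("pending", 3),
                   ("success", 3), ("failure", 1), ("success", 2)]:
    status = next_status(status, state, job)
    history.append(status)
```

In this run the stored status briefly reads "success" after job 3 finishes, which is the temporary green check-mark mentioned above, and then settles on the failure from job 1.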
      c7987a31
  11. 12 Feb, 2020 2 commits
    • Fixed typo in LFS_MKTAG_IF_ELSE · dcae185a
      Christopher Haster authored
      dcae185a
    • Added test.py support for tmpfs-backed disks · f4b17b37
      Christopher Haster authored
      RAM-backed testing is faster than file-backed testing. This is why
      test.py uses rambd by default.
      
      So why add support for tmpfs-backed disks if we can already run tests in
      RAM? For reentrant testing.
      
      Under reentrant testing we simulate power-loss by forcefully exiting the
      test program at specific times. To make this power-loss meaningful, we need to
      persist the disk across these power-losses. However, it's interesting to
      note this persistence doesn't need to be actually backed by the
      filesystem.
      
      It may be possible to rearchitect the tests to simulate power-loss a
      different way, by say, using coroutines or setjmp/longjmp to leave
      behind ongoing filesystem operations without terminating the program
      completely. But at this point, I think it's best to work with what we
      have.
      
      And simply putting the test disks into a tmpfs mount-point seems to
      work just fine.
      
      Note this does force serialization of the tests, which isn't required
      otherwise. Currently they are only serialized due to limitations in
      test.py. If a future change wants to parallelize the tests, it may need
      to rework RAM-backed reentrant tests.
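The reentrant pattern described above (kill the test program, keep the disk, run again) can be sketched in a few lines. This is a toy model, not test.py: the child script, the exit code 33, and the use of a temporary directory in place of a tmpfs mount-point are all assumptions made for illustration.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical "test program": performs one step of work, persists its
# progress to the disk file, then dies abruptly until all steps are done.
CHILD = r"""
import os, sys
path, limit = sys.argv[1], int(sys.argv[2])
done = 0
if os.path.exists(path):
    with open(path) as f:
        done = int(f.read())
with open(path, "w") as f:
    f.write(str(done + 1))
    f.flush()
    os.fsync(f.fileno())
if done + 1 < limit:
    os._exit(33)  # simulated power-loss: exit without any cleanup
"""


def run_until_done(disk_dir, steps):
    """Rerun the child until it exits cleanly; return the number of runs."""
    path = os.path.join(disk_dir, "disk")
    runs = 0
    while True:
        runs += 1
        result = subprocess.run([sys.executable, "-c", CHILD, path, str(steps)])
        if result.returncode == 0:
            return runs
        assert result.returncode == 33, "unexpected failure in child"


with tempfile.TemporaryDirectory() as disk_dir:
    runs = run_until_done(disk_dir, steps=4)
```

The only state that survives between runs is the file in `disk_dir`, so placing that directory on a RAM-backed mount keeps the persistence without paying for real disk I/O.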
      f4b17b37
  12. 11 Feb, 2020 1 commit
    • Updated .travis.yml and added additional geometry constraints · 9f546f15
      Christopher Haster authored
      Moved .travis.yml over to use the new test framework. A part of this
      involved testing all of the configurations run on the old framework
      and deciding which to carry over. The new framework duplicates some of
      the cases tested by the configurations so some configurations could be
      dropped.
      
      The .travis.yml includes some extreme ones, such as no inline files,
      relocations every cycle, no intrinsics, power-loss every byte, unaligned
      block_count and lookahead, and odd read_sizes.
      
      There were several configurations where some tests failed because of
      limitations in the tests themselves, so many conditions were added
      to make sure the configurations can run on as many tests as possible.
      9f546f15
  13. 10 Feb, 2020 1 commit
    • Fixed CRC check when prog_size causes multiple CRCs per commit · b69cf890
      Christopher Haster authored
      This is a bit of a strange case that can be caused by storage with
      very large prog sizes, such as NAND flash. We only have 10 bits to store
      the size of our padding, so when the prog_size gets larger than 1024
      bytes, we have to use multiple padding tags to commit to the next
      prog_size boundary.
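As a rough illustration of the arithmetic, the sketch below splits a padding length into pieces that each fit a 10-bit size field. It is a simplification under stated assumptions: it ignores the bytes taken by each tag header and any reserved size values, and the function name is hypothetical.

```python
MAX_TAG_SIZE = (1 << 10) - 1  # largest value a 10-bit size field can hold


def padding_chunks(padding, max_size=MAX_TAG_SIZE):
    """Split `padding` bytes into sizes that each fit in one tag's size field."""
    chunks = []
    while padding > 0:
        chunk = min(padding, max_size)
        chunks.append(chunk)
        padding -= chunk
    return chunks
```

With these assumptions a small padding such as 100 bytes fits in a single tag, while a 4096-byte gap needs five.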
      
      This causes some complication for the new logic that checks CRCs in case
      our block becomes "readonly" and contains existing commits that just happen
      to match our new commit size.
      
      Here we just check the CRC of the first commit. This isn't perfect but
      does protect against pure "readonly" blocks.
      b69cf890
  14. 09 Feb, 2020 8 commits
    • Cleaned up dependent fixes on branch · 02c84ac5
      Christopher Haster authored
      These should probably have been cleaned up in each commit to allow
      cherry-picking, but due to time I haven't been able to.
      
      - Went with creating an mdir copy in lfs_dir_commit. This handles a
        number of related cleanup issues in lfs_dir_compact and it does so
        more robustly. As a plus we can use the copy to update dependencies
        in the mlist.
      
      - Eliminated code left by the ENOSPC file outlining
      
      - Cleaned up TODOs and lingering comments
      
      - Changed the reentrant many directory create/rename/remove test to use
        a smaller set of directories because of space issues when
        READ/PROG_SIZE=512
      02c84ac5
    • Fixed lfs_fs_size doubling metadata-pairs · 6530cb3a
      Christopher Haster authored
      This was caused by the previous fix for allocations during
      lfs_fs_deorphan in this branch. To catch half-orphans during block
      allocations we needed to duplicate all metadata-pairs reported to
      lfs_fs_traverse. Unfortunately this causes lfs_fs_size to report 2x the
      number of metadata-pairs, which would undoubtedly confuse users.

      The fix here is inelegantly simple: just do a different traversal for
      allocations and size measurements. It reuses the same code but touches
      slightly different sets of blocks.
      
      Unfortunately, this causes the public lfs_fs_traverse and lfs_fs_size
      functions to split in how they report blocks. This is technically
      allowed, since lfs_fs_traverse may report blocks multiple times due to
      CoW behavior, however it's undesirable and I'm sure there will be some
      confusion.
      
      But I don't have a better solution, so from this point lfs_fs_traverse
      will be reporting 2x metadata-blocks and shouldn't be used for finding
      the number of available blocks on the filesystem.
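If a caller does need a block count from a traversal that may report the same block more than once, deduplicating is the safe approach. The sketch below models the traversal output as a plain list of block numbers, which is an assumption for illustration and not the littlefs API.

```python
def count_unique_blocks(reported_blocks):
    """Count distinct block numbers in a traversal that may repeat blocks."""
    return len(set(reported_blocks))


# Hypothetical traversal output where a metadata-pair (0, 1) and one
# data block (7) are each reported twice.
reported = [0, 1, 0, 1, 7, 8, 7]
in_use = count_unique_blocks(reported)
```

Note that the set grows with the number of blocks in use, so this trades memory for an accurate count.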
      6530cb3a
    • Fixed broken wear-leveling when block_cycles = 2n-1 · fe957de8
      Christopher Haster authored
      This was an interesting issue found during a GitHub discussion with
      rmollway and thrasher8390.
      
      Blocks in the metadata-pair are relocated every "block_cycles", or, more
      mathy, when rev % block_cycles == 0 as long as rev += 1 every block write.
      
      But there's a problem, rev isn't += 1 every block write. There are two
      blocks in a metadata-pair, so looking at it from each block's
      perspective, rev += 2 every block write.

      This leads to a sort of aliasing issue, where, if block_cycles is
      divisible by 2, one block in the metadata-pair is always relocated, and
      the other block is _never_ relocated, causing a complete failure of
      block-level wear-leveling.
      
      Fortunately, because of a previous workaround to avoid block_cycles = 1
      (since this will cause the relocation algorithm to never terminate), the
      actual math is rev % (block_cycles+1) == 0. This means the bug only
      shows its head in the much less likely case where block_cycles is a
      multiple of 2 plus 1, or, in more mathy terms, block_cycles = 2n+1 for
      some n.
      
      To work around this we can bitwise-or the modulus (block_cycles+1) with 1
      to force it to never be a multiple of 2.
      
      (Maybe we should do this during initialization? But then block_cycles
      would need to be mutable.)
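The aliasing can be reproduced with a small simulation. This is a toy model, not littlefs code: it assumes the two blocks of a pair strictly alternate on every write, that relocation triggers when `rev % modulus == 0`, and that the workaround ORs the modulus `block_cycles + 1` with 1.

```python
def relocations_per_block(modulus, writes=1000):
    """Count how often each block of a metadata-pair gets relocated."""
    counts = [0, 0]
    for rev in range(1, writes + 1):
        block = rev % 2         # the two blocks of the pair alternate
        if rev % modulus == 0:  # relocation rule from the text above
            counts[block] += 1
    return counts


def fixed_modulus(block_cycles):
    """Force the modulus to be odd so both blocks take turns relocating."""
    return (block_cycles + 1) | 1


broken = relocations_per_block(3 + 1)           # block_cycles = 3, even modulus
fixed = relocations_per_block(fixed_modulus(3))
```

With an even modulus every multiple is itself even, so only one block ever sees a relocation. With an odd modulus the multiples alternate in parity and the relocations are shared.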
      
      ---
      
      There are a few unrelated changes mixed into this commit that shouldn't be
      there since I added this as part of a branch of bug fixes I'm putting
      together rather hastily, so unfortunately this is not easily cherry-pickable.
      fe957de8
    • Modified readmdir/readtree to make reading non-truncated data easier · 6a550844
      Christopher Haster authored
      Added indentation so there was a more clear separation between the tag
      description and tag data.
      
      Also took the best parts of readmdir.py and added it to readtree.py.
      Initially I was thinking it was best for these to have completely
      independent data representations, since you could always call readtree
      to get more info, but this becomes tedious when you need to look at
      low-level tag info across multiple directories on the filesystem.
      6a550844
    • Removed file outlining on ENOSPC in lfs_file_sync · f9c2fd93
      Christopher Haster authored
      This was initially added as protection against the case where a file
      grew to no longer fit in a metadata-pair. While in most cases this
      should be caught by the math in lfs_file_write, it doesn't handle a
      problem that can happen if the file's metadata is large enough that even
      small inline files can't fit. This can happen if you combine a small
      block size with large file names and many custom attributes.
      
      But trying to outline on ENOSPC creates a lot of problems.
      
      If we are actually low on space, this is one of the worst things we can
      do. Inline files take up less space than CTZ skip-lists, but inline
      files are rendered useless if we outline inline files as soon as we run
      low on space.
      
      On top of this, the outlining logic tries multiple mdir commits if it
      gets ENOSPC, which can hide errors if ENOSPC is returned for other
      reasons.
      
      In a perfect world, we would be using a different error code for
      no-room-in-metadata-pair, and no-blocks-on-disk.
      
      For now I've removed the outlining logic and we will need to figure out
      how to handle this situation more robustly.
      f9c2fd93
    • Fixed tests/*.toml.* in .gitignore · 44d71127
      Christopher Haster authored
      Running test.py creates a lot of garbage here
      44d71127
    • Added/fixed tests for noop writes (where bd error can't be trusted) · 77e3078b
      Christopher Haster authored
      It's interesting how many ways block devices can show failed writes:
      1. prog can error
      2. erase can error
      3. read can error after writing (ECC failure)
      4. prog doesn't error but doesn't write the data correctly
      5. erase doesn't error but doesn't erase correctly
      
      Can read fail without an error? Yes, though this appears the same as
      prog and erase failing.
      
      These weren't all simulated by testbd since I unintentionally assumed
      the block device could always error. Fixed by adding additional bad-block
      behaviors to testbd.
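The silent failure modes (cases 4 and 5) are the awkward ones, because the return code says nothing. One generic way to catch them is to read the data back and compare. The sketch below is a toy block device written for this illustration, not testbd: the class, its methods, and the 0xff erased value are assumptions.

```python
class SilentlyFailingBD:
    """Toy block device whose bad blocks ignore writes but report success."""

    def __init__(self, block_size, bad_blocks=()):
        self.block_size = block_size
        self.bad_blocks = set(bad_blocks)
        self.blocks = {}

    def prog(self, block, data):
        if block not in self.bad_blocks:
            self.blocks[block] = bytes(data)
        return 0  # always reports success, even when nothing was written

    def read(self, block):
        # Unwritten blocks read back as erased (0xff) bytes.
        return self.blocks.get(block, b"\xff" * self.block_size)


def prog_verified(bd, block, data):
    """Program a block, then read it back to catch silent write failures."""
    if bd.prog(block, data) != 0:
        return False
    return bd.read(block) == bytes(data)


bd = SilentlyFailingBD(block_size=4, bad_blocks={1})
good = prog_verified(bd, 0, b"abcd")
bad = prog_verified(bd, 1, b"abcd")
```

A read-back check like this only helps if the stale contents differ from the new data, which is the same caveat raised in the note that follows.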
      
      Note: This also includes a small fix where we can miss bad writes if the
      underlying block device contains a valid commit with the exact same
      size in the exact same offset.
      77e3078b
    • Fixed more bugs, mostly related to ENOSPC on different geometries · 517d3414
      Christopher Haster authored
      Fixes:
      - Fixed reproducibility issue when we can't read a directory revision
      - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
      - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
        outline a file in lfs_file_sync
      - Fixed cleanup issue if we run out of space while extending a CTZ skip-list
      - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
      
      Also:
      - Added cycle-detection to readtree.py
      - Allowed pseudo-C expressions in test conditions (and it's
        beautifully hacky, see line 187 of test.py)
      - Better handling of ctrl-C during test runs
      - Added build-only mode to test.py
      - Limited stdout of test failures to 5 lines unless in verbose mode
      
      Explanation of fixes below
      
      1. Fixed reproducibility issue when we can't read a directory revision
      
         An interesting subtlety of the block-device layer is that the
         block-device is allowed to return LFS_ERR_CORRUPT on reads to
         untouched blocks. This can easily happen if a user is using ECC or
         some sort of CMAC on their blocks. Normally we never run into this,
         except for the optimization around directory revisions where we use
         uninitialized data to start our revision count.
      
         We correctly handle this case by ignoring what's on disk if the read
         fails, but end up using uninitialized RAM instead. This is not an issue
         for normal use, though it can lead to a small information leak.
         However it creates a big problem for reproducibility, which is very
         helpful for debugging.

         I ended up running into a case where the RAM values for the revision
         count were different, causing two identical runs to wear-level at
         different times, leading to one version running out of space before a
         bug occurred because it expanded the superblock early.
      
      2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
      
         This could be caused if the previous tag was a valid commit and we
         lost power causing a partially written tag as the start of a new
         commit.
      
         Fortunately we already have a separate condition for exceeding the
         block size, so we can force that case to always treat the mdir as
         unerased.
      
      3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
         outline a file in lfs_file_sync
      
         Most operations involving metadata-pairs treat the mdir struct as
         entirely temporary and throw it out if any error occurs. Except for
         lfs_file_sync since the mdir is also a part of the file struct.
      
         This is relevant because of a cleanup issue in lfs_dir_compact that
         usually doesn't have side-effects. The issue is that lfs_fs_relocate
         can fail. It needs to allocate new blocks to relocate to, and as the
         disk reaches its end of life, it can fail with ENOSPC quite often.
      
         If lfs_fs_relocate fails, the containing lfs_dir_compact would return
         immediately without restoring the previous state of the mdir. If a new
         commit comes in on the same mdir, the old state left there could
         corrupt the filesystem.
      
         It's interesting to note this is forced to happen in lfs_file_sync,
         since it always tries to outline the file if it gets ENOSPC (ENOSPC
         can mean both no blocks to allocate and that the mdir is full). I'm
         not actually sure this bit of code is necessary anymore, we may be
         able to remove it.
      
      4. Fixed cleanup issue if we run out of space while extending a CTZ
         skip-list
      
         The actual CTZ skip-list logic itself hasn't been touched in more
         than a year at this point, so I was surprised to find a bug here. But
         it turns out the CTZ skip-list could be put in an invalid state if we
         run out of space while trying to extend the skip-list.
      
         This only becomes a problem if we keep the file open, clean up some
         space elsewhere, and then continue to write to the open file without
         modifying it. Fortunately an easy fix.
      
      5. Fixed missing half-orphans when allocating blocks during
         lfs_fs_deorphan
      
         This was a really interesting bug. Normally, we don't have to worry
         about allocations, since we force consistency before we are allowed
         to allocate blocks. But what about the deorphan operation itself?
         Don't we need to allocate blocks if we relocate while deorphaning?
      
         It turns out the deorphan operation can lead to allocating blocks
         while there are still orphans and half-orphans on the threaded
         linked-list. Orphans aren't an issue, but half-orphans may contain
         references to blocks in the outdated half, which doesn't get scanned
         during the normal allocation pass.
      
         Fortunately we already fetch directory entries to check CTZ lists, so
         we can also check half-orphans here. However this causes
         lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do
         about this yet.
      517d3414
  15. 02 Feb, 2020 1 commit
    • Update SPEC.md · 4fb18836
      zhuangqiubin authored
      1. Fix size in layout of the CRC tag
      2. Update (size) to (size * 8)
      4fb18836
  16. 28 Jan, 2020 1 commit
  17. 27 Jan, 2020 1 commit
    • Cleaned up test script and directory naming · aab6aa0e
      Christopher Haster authored
      - Removed old tests and test scripts
      - Reorganized the block devices to live under one directory
      - Plugged new test framework into Makefile
      
      renamed:
      - scripts/test_.py -> scripts/test.py
      - tests_ -> tests
      - {file,ram,test}bd/* -> bd/*
      
      It took a surprising amount of effort to make the Makefile behave since
      it turns out the "test_%" rule could override "tests/test_%.toml.test"
      which is generated as part of test.py.
      aab6aa0e