There's something that has bugged me for a long time about how FastFill command works and I decided to finally articulate it - who knows, maybe it will be useful; if not I at least get the thing out of my head.
First part is how I undestand how the current implementation of FastFill works and what is the idea behind it. I may not be 100% accurate about it as it's not based on my knowledge of the algorithm but rather on my knowledge of its operation and results. So it's more about my assumption how it works.
Note also that I am not going to discuss FastFill(WithShuffling), just the straight FastFill()
The current algorithm scans the zone from the beginning and when it finds a gap it fills it using files after the start of the gap. Any files that lie before the gap are left where they are. It may fill it perfectly or it may find 'best fit', leaving the least space using available files. After that the 'gap pointer' is moved forward until it finds another gap. The filling algorithm is probably fed files sorted by their position on disk with files starting nearest the end of the disk first and file right behind the gap last. If perfect fit is found, the algorithm probably doesn't have even to go through all available files.
The problems of this approach are the following:
- files are taken in the order where they start on the disk. If a file is fragmented and some of its fragments lie in already 'filled' zone, using it to fill a gap will leave gaps in the already processed part of the zone
- files are only moved as a whole, making fastfill generally useless on zones with big files where one wants to leave files fragmented (e.g. defragmented using chunksize)
- it has strong tendency to migrate small files towards the start of the zone, leaving too few small files to fill small gaps near the end of the zone
- it has strong tendency to create many small gaps near the end of the zone
- it has no preference on file size and often wastes small files on big gaps if they happen to lie far back on the disk
Now in order to improve things, I believe just a slight change in looking at things is all we need. Leaving the matter of fragmented files aside for now, actually the only thing I am proposing to change is the order in which files are fed to the algorithm looking for perfect fit of files for the current gap.
What we need to see is, there is not only the zone begin, there is also the zone end. The end of zone is the place where the zone would end if there were no gaps in it, taking into account position of the zone start, all known unmovable and already processed files, and total size of all files in the current zone. There are two areas - the area between the zone begin and zone end (I'll call it zone area) and the area behind the zone end. The main task for FastFill in fact is, take everything lying behind zone end, and stuff it into gaps in the zone area. That's actually the 'fast' part of FastFill function. The part where FastFill rearranges files within the zone area is in reality the slow part of FastFill function, improving performance of individual files but not improving performance of the result.
Another thing we need to notice is, in the zone area there are files and gaps. Behind the zone, gaps don't really count (we need to get rid of files in there) but in the zone area gaps do count and any gap we create there will need to be filled again. Filling a gap within zone area using files from zone area usually does not improve things for us as it usually creates either a gap of equal size or even multiple smaller gaps which are harder to fill than the original gap.
Now to the actual proposal.
First thing I'd like to pay some attention to is the file right behind the current gap. If we use it to fill the gap, we do nothing. We just shift the gap by size of this file towards the end of the disk. It may be useful in case this would merge our current gap with the next gap, making the gap bigger and thus easier to fill. In my opinion, the algorithm should consider this - if the list of files between end of the current gap and the start of the next gap is smaller than the size of the current gap, it should move these files to the start of the current gap, merging the two gaps, then start again with the new gap. Otherwise it should not move the file right behind the gap at all. There's one exception to the rule - if the file behind the gap is also behind the end of the zone, we should consider moving it too.
Then we get to filling the gap. In my opinion, the algorithm should work with four categories of files:
- files behind the zone, these are files we are primarily trying to stuff into the zone
- files within the zone directly adjacent to two gaps; moving these files is slightly beneficial as it defragments gaps within the zone making their filling easier
- files within the zone directly adjacent to a single gap; moving this file is neutral as it fills one gap while making another gap bigger
- files within the zone not adjacent to any gap; moving these files is not beneficial as it creates a new gap within the zone, moving or even fragmenting the currently processed gap as result
The algorithm figuring out the perfect fill should then be fed these files in the mentioned order (files behind zone first, filest within zone not adjacent to gaps last), while within each category the sorting should be primarily by size (from largest to smallest) and secondarily (within the same file size) by position on disk (files lying nearest the end of the disk first).
I know this is not going to get completely rid of all issues with FastFill but I believe it is going to reduce these effects significantly, resulting in faster operation (less file moves) and better filling (less gaps in the result).
I also think FastFill should have an option to be able to operate with file fragments rather than with whole files. This would make it more useful on zones where we want to leave files fragmented.
When not working with file fragments and thus defragmenting files during operation, FastFill should be able to re-start scanning for gaps from the start of the zone whenever a fragmented file is used to fill a gap.
And I'd also like to ask for FastFill option which would allow it only to attempt to stuff files lying behind the zone end into the gaps within the zone area, without rearranging files within the zone. That would be the totally fastest (though the most imperfect) variant of FastFill as it would definitely make no more than one move per file that really needs such move. For this, gaps would probably be best filled in order from biggest to smallest rather than in disk order, and files behind the zone should be sorted for filling from biggest to smallest, giving best chances for filling big gaps using big files. This could (optionally) continue in 'regular' FastFill operation behind the end of the zone, putting remaining files behind the zone close together. It would be nice to have this optional though as it might be useless for some possible subsequent operations.