FSlint

Duplicates

One of the most commonly used features of FSlint is its ability to find duplicate files. Discarding duplicates is the easiest way to remove lint from a hard drive. A user may not realize that they have four, five, or more copies of the exact same song in their music collection, stored under different names or in different directories. Any type of file, whether music, photos, or work documents, can easily be copied and replicated, and as duplicates accumulate they eat away at the available disk space. The first menu option offered by FSlint finds and removes these duplicate files.

Graphical Interface

[Figure: Ch3.01, the FSlint 'Duplicates' tab]

The 'Duplicates' tab on the left-hand side of the window is selected by default when FSlint starts. The algorithm FSlint uses to decide whether one file duplicates another is deliberately thorough, to minimize false positives that could lead to data loss. First, FSlint scans the files and discards any file whose size is not shared by at least one other file. The remaining files of identical size are then checked to make sure they are not hard linked to one another; such hard links exist, for example, if the user chose 'Merge' after a previous search. Once FSlint is sure the files are not hard linked, it compares their md5sum signatures. To guard against MD5 collisions, FSlint then re-checks any files that still match using sha1sum.
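The filtering order described above can be sketched in Python. This is a minimal illustration under our own assumptions, not FSlint's implementation: the helper names `find_duplicates`, `split_by`, `drop_hard_links` and `file_digest` are hypothetical, symbolic links are simply skipped, and unreadable files are not handled.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm, chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def split_by(groups, key):
    """Split every group by key(path), keeping only sub-groups of 2+ paths."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        result.extend(b for b in buckets.values() if len(b) > 1)
    return result


def drop_hard_links(group):
    """Keep one path per (device, inode) pair: hard links share one file."""
    unique = {}
    for path in group:
        st = os.stat(path)
        unique.setdefault((st.st_dev, st.st_ino), path)
    return list(unique.values())


def find_duplicates(root):
    """Hypothetical helper: return lists of paths with identical contents."""
    # Stage 1: group regular files by size; symbolic links are skipped.
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    groups = [g for g in by_size.values() if len(g) > 1]
    # Stage 2: within each size group, collapse hard links to the same inode.
    groups = [g for g in map(drop_hard_links, groups) if len(g) > 1]
    # Stage 3: compare MD5 checksums of the remaining same-sized files.
    groups = split_by(groups, lambda p: file_digest(p, "md5"))
    # Stage 4: re-check survivors with SHA-1 to guard against MD5 collisions.
    return split_by(groups, lambda p: file_digest(p, "sha1"))
```

`find_duplicates(path)` returns a list of groups, each a list of paths that agree on size, MD5 and SHA-1. Filtering by size first means that most files never need to be read at all, which is why the cheap check comes before the hashing.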

The 'Duplicates' interface is very simple. After verifying the 'Search path' to be scanned, the user clicks the 'Find' button at the lower left of the window. When the search finishes, the duplicates found are displayed in the central portion of the window. Each set of identical files is grouped under a grey bar that shows how many files the group contains and how many bytes are wasted by the duplicates. Below each grey bar, the files are listed by name, directory, and last modification date. Directly below the 'Find' button, FSlint shows the overall totals: the number of bytes wasted, the number of files, and the number of groups.
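The totals can be reproduced by hand if we assume that one copy in each group is kept and every other copy counts as waste, so a group of n identical files of size s wastes (n - 1) * s bytes. That counting rule, and counting every listed file toward the file total, are our assumptions rather than rules taken from FSlint's documentation; the group sizes below are hypothetical.

```python
def wasted_bytes(size, count):
    """Bytes reclaimable from one group if a single copy is kept (assumed rule)."""
    return size * (count - 1)


def totals(groups):
    """Summarise (size, count) groups as (bytes wasted, files, groups)."""
    wasted = sum(wasted_bytes(size, count) for size, count in groups)
    files = sum(count for _size, count in groups)
    return wasted, files, len(groups)


# Hypothetical results: four copies of a 4 MB song, two copies of a 120 KB photo.
example = [(4_000_000, 4), (120_000, 2)]
print(totals(example))  # (12120000, 6, 2)
```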

 

Command Line Interface

The command-line interface to this feature is 'findup', which is located in the FSlint installation directory (/usr/share/fslint/fslint/ in the example below). Running it with --help prints its usage:

 $ /usr/share/fslint/fslint/findup --help
 find dUPlicate files.
 Usage: findup [[[-t [-m|-d]] | [--summary]] [-r] [-f] paths(s) ...]

 If no path(s) specified then the current directory is assumed.
  
 When -m is specified any found duplicates will be merged (using hardlinks).
 When -d is specified any found duplicates will be deleted (leaving just 1).
 When -t is specfied, only report what -m or -d would do.
 When --summary is specified change output format to include file sizes.
 You can also pipe this summary format to /usr/share/fslint/fslint/fstool/dupwaste
 to get a total of the wastage due to duplicates.

 Examples:
 search for duplicates in current directory and below
     findup or findup .
 search for duplicates in current directory and below listing the files full path
     findup -f
 search for duplicates in all linux source directories and merge using hardlinks
     findup -m /usr/src/linux*
 same as above but don't look in subdirectories
     findup -r .
 search for duplicates in /usr/bin
     findup /usr/bin
 search in multiple directories but not their subdirectories
     findup -r /usr/bin /bin /usr/sbin /sbin
 search for duplicates in $PATH
     findup `/usr/share/fslint/fslint/supprt/getffp`
 search system for duplicate files over 100K in size
     findup / -size +100k
 search only my files (that I own and are in my home dir)
     findup ~ -user `id -u`
 search system for duplicate files belonging to roger
     findup / -user `id -u roger`
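The effect of -m, -d and -t on one group of duplicates can be illustrated with a short Python sketch. It is not findup's code: the function name `resolve_group`, the "m"/"d" mode strings, and the choice to keep the first path in each group are our own assumptions, and findup may choose the surviving file differently. With `dry_run=True` the function only reports what it would do, similar in spirit to -t.

```python
import os
import tempfile


def resolve_group(paths, mode, dry_run=False):
    """Keep paths[0] and merge ("m") or delete ("d") the other copies.

    Returns a list describing each action; with dry_run=True nothing on
    disk is changed.
    """
    if mode not in ("m", "d"):
        raise ValueError(f"unknown mode: {mode!r}")
    keeper, *others = paths
    actions = []
    for path in others:
        if mode == "m":
            actions.append(f"link {path} -> {keeper}")
            if not dry_run:
                # Create the hard link under a temporary name in the same
                # directory, then atomically rename it over the duplicate,
                # so the duplicate's path never disappears.
                parent = os.path.dirname(path) or "."
                with tempfile.TemporaryDirectory(dir=parent) as tmp_dir:
                    tmp = os.path.join(tmp_dir, "link")
                    os.link(keeper, tmp)
                    os.replace(tmp, path)
        else:
            actions.append(f"delete {path}")
            if not dry_run:
                os.remove(path)
    return actions
```

Hard links can only be created within a single filesystem, so `os.link` raises `OSError` when a group spans filesystems; a real tool has to detect and skip that case. Merging keeps every path usable while storing the contents once, whereas deleting frees the same space but removes the extra paths.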