9 Common Tasks
The sections that follow contain some extended examples that both give a good idea of the power of these programs, and show you how to solve common real-world problems.
9.1 Viewing And Editing
To view a list of files that meet certain criteria, simply run your file viewing program with the file names as arguments.
Shells substitute a command enclosed in backquotes with its output, so the whole command looks like this:
less `find /usr/include -name '*.h' | xargs grep -l mode_t`
You can edit those files by giving an editor name instead of a file viewing program:
emacs `find /usr/include -name '*.h' | xargs grep -l mode_t`
Because there is a limit to the length of any individual command line, there is a limit to the number of files that can be handled in this way. We can get around this difficulty by using ‘xargs’ like this:
find /usr/include -name '*.h' | xargs grep -l mode_t > todo
xargs --arg-file=todo emacs
Here, ‘xargs’ will run ‘emacs’ as many times as necessary to visit all of the files listed in the file ‘todo’. Generating a temporary file is not always convenient, though. This command does much the same thing without needing one:
find /usr/include -name '*.h' | xargs grep -l mode_t | xargs sh -c 'emacs "$@" < /dev/tty' Emacs
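The example above illustrates a useful trick: using ‘sh -c’ you can invoke a shell command from ‘xargs’. The ‘$@’ in the command line is expanded by the shell to the list of arguments provided by ‘xargs’. The single quotes in the command line protect the ‘$@’ against expansion by your interactive shell (which will normally have no arguments and thus expand ‘$@’ to nothing). The capitalised ‘Emacs’ on the command line is used as ‘$0’ by the shell that ‘xargs’ launches.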
Please note that GNU ‘xargs’ and at least the BSD implementations support the ‘-o’ option as an extension that achieves the same thing, while the command above is the portable way to redirect stdin to ‘/dev/tty’.
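How ‘sh -c’ assigns the words that follow the command string can be seen without starting an editor. The following is a minimal sketch in which ‘printf’ stands in for ‘emacs’, and both the placeholder word ‘Label’ and the two file names are assumptions made for the illustration:

```shell
# The first word after the command string becomes $0; the words that
# xargs appends become $1, $2, ... and are available as "$@".
out=$(printf '%s\n' a.h b.h |
      xargs sh -c 'printf "%s:" "$0"; printf " <%s>" "$@"; printf "\n"' Label)
echo "$out"
```

The placeholder word matters: without it, the first file name would be consumed as ‘$0’ and would be missing from ‘"$@"’.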
9.2 Archiving
You can pass a list of files produced by ‘find’ to a file archiving program.
GNU ‘tar’ and ‘cpio’ can both read lists of file names from the standard input - either delimited by nulls (the safe way) or by blanks (the lazy, risky default way).
To use null-delimited names, give them the ‘--null’ option.
You can store a file archive in a file, write it on a tape, or send it over a network to extract on another machine.
One common use of ‘find’ to archive files is to send a list of the files in a directory tree to ‘cpio’.
Use ‘-depth’ so if a directory does not have write permission for its owner, its contents can still be restored from the archive since the directory’s permissions are restored after its contents.
Here is an example of doing this using ‘cpio’; you could use a more complex ‘find’ expression to archive only certain files.
find . -depth -print0 | cpio --create --null --format=crc --file=/dev/nrst0
You could restore that archive using this command:
cpio --extract --null --make-dir --unconditional \
     --preserve --file=/dev/nrst0
Here are the commands to do the same things using ‘tar’:
find . -depth -print0 | tar --create --null --files-from=- --file=/dev/nrst0
tar --extract --null --preserve-perm --same-owner \
     --file=/dev/nrst0
Here is an example of copying a directory from one machine to another:
find . -depth -print0 | cpio -0o -Hnewc |
     rsh OTHER-MACHINE "cd `pwd` && cpio -i0dum"
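The null-delimited hand-off can be tried without a tape drive. The sketch below assumes GNU ‘tar’ and writes to an ordinary file in a scratch directory; the directory layout and file names are invented for the illustration. It selects regular files only, because ‘tar’ would otherwise also recurse into every directory named in the list.

```shell
# Build a small tree that contains an awkward file name.
work=$(mktemp -d)
mkdir "$work/src"
touch "$work/src/plain" "$work/src/two words"

# Archive it from a null-delimited name list, then list the archive.
(cd "$work/src" &&
 find . -type f -print0 |
 tar --create --null --files-from=- --file="$work/archive.tar")
listing=$(tar --list --file="$work/archive.tar" | LC_ALL=C sort)
echo "$listing"
rm -rf "$work"
```

The name containing a space arrives in the archive as a single entry, which is the point of using null delimiters.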
9.3 Cleaning Up
This section gives examples of removing unwanted files in various situations. Here is a command to remove the CVS backup files created when an update requires a merge:
find . -name '.#*' -print0 | xargs -0r rm -f
If your ‘find’ command removes directories, you may find that you get a spurious error message when ‘find’ tries to recurse into a directory that has now been removed. Using the ‘-depth’ option will normally resolve this problem.
It is also possible to use the ‘-delete’ action:
find . -depth -name '.#*' -delete
You can run this command to clean out your clutter in ‘/tmp’. You might place it in the file your shell runs when you log out (‘.bash_logout’, ‘.logout’, or ‘.zlogout’, depending on which shell you use).
find /tmp -depth -user "$LOGNAME" -type f -delete
To remove old Emacs backup and auto-save files, you can use a command like the following. It is especially important in this case to use null-terminated file names because Emacs packages like the VM mailer often create temporary file names with spaces in them, like ‘#reply to David J. MacKenzie<1>#’.
find ~ \( -name '*~' -o -name '#*#' \) -print0 | xargs --no-run-if-empty --null rm -vf
Removing old files from ‘/tmp’ is commonly done from ‘cron’:
find /tmp /var/tmp -depth -not -type d -mtime +3 -delete
find /tmp /var/tmp -depth -mindepth 1 -type d -empty -delete
The second ‘find’ command above cleans out empty directories depth-first (‘-delete’ implies ‘-depth’ anyway), hoping that the parents become empty and can be removed too. It uses ‘-mindepth’ to avoid removing ‘/tmp’ itself if it becomes totally empty.
Lastly, an example of a program that almost certainly does not do what the user intended:
find dirname -delete -name quux
If the user hoped to delete only files named ‘quux’ they will get an unpleasant surprise; this command will attempt to delete everything at or below the starting point ‘dirname’. This is because ‘find’ evaluates the items on the command line as an expression. The ‘find’ program will normally execute an action if the preceding action succeeds. Here, there is no action or test before the ‘-delete’ so it will always be executed. The ‘-name quux’ test will be performed for files we successfully deleted, but that test has no effect since ‘-delete’ also disables the default ‘-print’ operation. So the above example will probably delete a lot of files the user didn’t want to delete.
This command is also likely to do something you did not intend:
find dirname -path dirname/foo -prune -o -delete
Because ‘-delete’ turns on ‘-depth’, the ‘-prune’ action has no effect and files in ‘dirname/foo’ will be deleted too.
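One way to guard against both surprises is to put the tests first and to rehearse the expression with ‘-print’ before substituting ‘-delete’. The sketch below does this in a scratch directory; the names ‘dirname’, ‘quux’ and ‘keep’ are placeholders for the illustration.

```shell
work=$(mktemp -d)
mkdir "$work/dirname"
touch "$work/dirname/quux" "$work/dirname/keep"

# Dry run: with the test first and -print as the action, only the
# names that would be deleted are shown.
find "$work/dirname" -name quux -print

# The same expression with -delete in place of -print.
find "$work/dirname" -name quux -delete

remaining=$(ls "$work/dirname")
echo "$remaining"
rm -rf "$work"
```

Because ‘-name quux’ now comes before ‘-delete’, the action runs only for entries that pass the test.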
9.4 Strange File Names
‘find’ can help you remove or rename a file with strange characters in its name. People are sometimes stymied by files whose names contain characters such as spaces, tabs, control characters, or characters with the high bit set. The simplest way to remove such files is:
rm -i SOME*PATTERN*THAT*MATCHES*THE*PROBLEM*FILE
‘rm’ asks you whether to remove each file matching the given pattern. If you are using an old shell, this approach might not work if the file name contains a character with the high bit set; the shell may strip it off. A more reliable way is:
find . -maxdepth 1 TESTS -okdir rm '{}' \;
where TESTS uniquely identify the file. The ‘-maxdepth 1’ option prevents ‘find’ from wasting time searching for the file in any subdirectories; if there are no subdirectories, you may omit it. A good way to uniquely identify the problem file is to figure out its inode number; use
ls -i
Suppose you have a file whose name contains control characters, and you have found that its inode number is 12345. This command prompts you for whether to remove it:
find . -maxdepth 1 -inum 12345 -okdir rm -f '{}' \;
If you don’t want to be asked, perhaps because the file name may contain a strange character sequence that will mess up your screen when printed, then use ‘-execdir’ instead of ‘-okdir’.
If you want to rename the file instead, you can use ‘mv’ instead of ‘rm’:
find . -maxdepth 1 -inum 12345 -okdir mv '{}' NEW-FILE-NAME \;
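The inode approach can be rehearsed end to end in a scratch directory. This is a sketch only: the odd file name is invented, and plain ‘-exec’ replaces ‘-okdir’ so that the example runs without prompting.

```shell
work=$(mktemp -d)
# Create a file whose name contains a tab and a trailing space.
name=$(printf 'odd\tname ')
touch "$work/$name"

# The first field printed by 'ls -i' is the inode number.
inode=$(ls -i "$work" | awk 'NR == 1 { print $1 }')

# Remove the file by inode number, without ever typing its name.
find "$work" -maxdepth 1 -inum "$inode" -exec rm -f {} \;

left=$(ls -A "$work")
rm -rf "$work"
```

After the ‘find’ command the directory is empty, so ‘left’ holds an empty string.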
9.5 Fixing Permissions
Suppose you want to make sure that everyone can write to the directories in a certain directory tree. Here is a way to find directories lacking either user or group write permission (or both), and fix their permissions:
find . -type d -not -perm -ug=w | xargs chmod ug+w
You could also reverse the operations, if you want to make sure that directories do not have world write permission.
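The pipeline above splits names at white space. A variant that avoids this, sketched here on a scratch tree with invented directory names, hands the names to ‘chmod’ through ‘-exec ... {} +’:

```shell
work=$(mktemp -d)
mkdir -p "$work/tree/a b" "$work/tree/c"
chmod 555 "$work/tree/a b"              # no write permission at all
chmod 755 "$work/tree" "$work/tree/c"   # user write only

# Names are passed to chmod as whole arguments, so the space in
# "a b" cannot split the name in two.
find "$work/tree" -type d ! -perm -ug=w -exec chmod ug+w {} +

# List any directory that still lacks user or group write permission.
still_missing=$(find "$work/tree" -type d ! -perm -ug=w)
rm -rf "$work"
```

If the fix worked, the second ‘find’ prints nothing and ‘still_missing’ is empty.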
9.6 Classifying Files
If you want to classify a set of files into several groups based on
different criteria, you can use the comma operator to perform multiple
independent tests on the files. Here is an example:
find / -type d \( -perm -o=w -fprint allwrite , \
     -perm -o=x -fprint allexec \)
echo "Directories that can be written to by everyone:"
cat allwrite
echo ""
echo "Directories with search permissions for everyone:"
cat allexec
‘find’ has only to make one scan through the directory tree (which is
one of the most time consuming parts of its work).
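Scanning ‘/’ takes a while, so it can help to try the comma operator on a small tree first. The sketch below assumes GNU ‘find’ (for the comma operator and ‘-fprint’); the directory names and modes are invented for the illustration.

```shell
work=$(mktemp -d)
mkdir -p "$work/tree/open" "$work/tree/closed"
chmod 755 "$work/tree"
chmod 777 "$work/tree/open"
chmod 700 "$work/tree/closed"

# One traversal, two independent tests, two output files.
find "$work/tree" -type d \( -perm -o=w -fprint "$work/allwrite" , \
                             -perm -o=x -fprint "$work/allexec" \)

allwrite=$(LC_ALL=C sort "$work/allwrite")
allexec=$(LC_ALL=C sort "$work/allexec")
printf 'writable by everyone:\n%s\n' "$allwrite"
printf 'searchable by everyone:\n%s\n' "$allexec"
rm -rf "$work"
```

Only ‘open’ lands in the first list, while ‘tree’ and ‘open’ both land in the second; ‘closed’ appears in neither.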
10 Worked Examples
The tools in the findutils package, and in particular ‘find’, have a large number of options. This means that quite often, there is more than one way to do things. Some of the options and facilities only exist for compatibility with other tools, and findutils provides improved ways of doing things. This chapter describes a number of useful tasks that are commonly performed, and compares the different ways of achieving them.
10.1 Deleting Files
One of the most common tasks that ‘find’ is used for is locating files that can be deleted. This might include:
- Files last modified more than 3 years ago which haven’t been accessed for at least 2 years
- Files belonging to a certain user
- Temporary files which are no longer required
This example concentrates on the actual deletion task rather than on sophisticated ways of locating the files that need to be deleted. We’ll assume that the files we want to delete are old files underneath ‘/var/tmp/stuff’.
10.1.1 The Traditional Way
The traditional way to delete files in ‘/var/tmp/stuff’ that have not
been modified in over 90 days would have been:
find /var/tmp/stuff -mtime +90 -exec /bin/rm {} \;
The above command uses ‘-exec’ to run the ‘/bin/rm’ command to remove
each file. This approach works and in fact would have worked in Version
7 Unix in 1979. However, there are a number of problems with this
approach.
The most obvious problem with the approach above is that it causes
‘find’ to fork every time it finds a file that it needs to delete, and the
child process then has to use the ‘exec’ system call to launch
‘/bin/rm’. All this is quite inefficient. If we are going to use
‘/bin/rm’ to do this job, it is better to make it delete more than one
file at a time.
The most obvious way of doing this is to use the shell’s command
expansion feature:
/bin/rm `find /var/tmp/stuff -mtime +90 -print`
or you could use the more modern form
/bin/rm $(find /var/tmp/stuff -mtime +90 -print)
The commands above are much more efficient than the first attempt.
However, there is a problem with them. The shell has a maximum command
length which is imposed by the operating system (the actual limit varies
between systems). This means that while the command expansion technique
will usually work, it will suddenly fail when there are lots of files to
delete. Since the task is to delete unwanted files, this is precisely
the time we don’t want things to go wrong.
10.1.2 Making Use of ‘xargs’
So, is there a way to be more efficient in the use of ‘fork()’ and ‘exec()’ without running up against this limit? Yes, we can be almost optimally efficient by making use of the ‘xargs’ command. The ‘xargs’ command reads arguments from its standard input and builds them into command lines. We can use it like this:
find /var/tmp/stuff -mtime +90 -print | xargs /bin/rm
For example if the files found by ‘find’ are ‘/var/tmp/stuff/A’, ‘/var/tmp/stuff/B’ and ‘/var/tmp/stuff/C’ then ‘xargs’ might issue the commands
/bin/rm /var/tmp/stuff/A /var/tmp/stuff/B
/bin/rm /var/tmp/stuff/C
The above assumes that ‘xargs’ has a very small maximum command line length. The real limit is much larger but the idea is that ‘xargs’ will run ‘/bin/rm’ as many times as necessary to get the job done, given the limits on command line length.
This usage of ‘xargs’ is pretty efficient, and the ‘xargs’ command is widely implemented (all modern versions of Unix offer it). So far then, the news is all good. However, there is bad news too.
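The batching can be observed safely by imposing an artificially small limit. In this sketch ‘-n 2’ caps each command at two arguments, and ‘echo’ prints the command lines that would be run instead of running ‘/bin/rm’:

```shell
# Feed three names to xargs and show the commands it builds.
cmds=$(printf '%s\n' /var/tmp/stuff/A /var/tmp/stuff/B /var/tmp/stuff/C |
       xargs -n 2 echo /bin/rm)
echo "$cmds"
```

Two command lines come out, the first with two names and the second with the one left over.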
10.1.3 Unusual characters in filenames
Unix-like systems allow any characters to appear in file names with the exception of the ASCII NUL character and the slash. Slashes can occur in path names (as the directory separator) but not in the names of actual directory entries. This means that the list of files that ‘xargs’ reads could in fact contain white space characters - spaces, tabs and newline characters. Since by default, ‘xargs’ assumes that the list of files it is reading uses white space as an argument separator, it cannot correctly handle the case where a filename actually includes white space. This makes the default behaviour of ‘xargs’ almost useless for handling arbitrary data.
To solve this problem, GNU findutils introduced the ‘-print0’ action for ‘find’. This uses the ASCII NUL character to separate the entries in the file list that it produces. This is the ideal choice of separator since it is the only character that cannot appear within a path name. The ‘-0’ option to ‘xargs’ makes it assume that arguments are separated with ASCII NUL instead of white space. It also turns off another misfeature in the default behaviour of ‘xargs’, which is that it pays attention to quote characters in its input. Some versions of ‘xargs’ also terminate when they see a lone ‘_’ in the input, but GNU ‘xargs’ no longer does that (since it has become an optional behaviour in the Unix standard).
So, putting ‘find -print0’ together with ‘xargs -0’ we get this command:
find /var/tmp/stuff -mtime +90 -print0 | xargs -0 /bin/rm
The result is an efficient way of proceeding that correctly handles all the possible characters that could appear in the list of files to delete. This is good news. However, there is, as I’m sure you’re expecting, also more bad news. The problem is that this is not a portable construct; although other versions of Unix (notably BSD-derived ones) support ‘-print0’, it’s not universal. So, is there a more universal mechanism?
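The difference between the two separators can be measured by counting the arguments that ‘xargs’ hands to a command. The sketch below uses a scratch directory with invented file names and assumes that the path returned by ‘mktemp’ contains no white space or quote characters itself.

```shell
work=$(mktemp -d)
touch "$work/two words" "$work/plain"

# Default mode: the name "two words" is split at the space.
default_count=$(find "$work" -type f -print |
                xargs sh -c 'echo "$#"' sh)
# NUL-separated mode: each file name stays in one piece.
null_count=$(find "$work" -type f -print0 |
             xargs -0 sh -c 'echo "$#"' sh)
echo "default: $default_count, null-delimited: $null_count"
rm -rf "$work"
```

With ‘-print0’ and ‘-0’ the count equals the number of files; in the default mode it is larger, because one name was broken into two arguments.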
10.1.4 Going back to ‘-exec’
There is indeed a more universal mechanism, which is a slight modification to the ‘-exec’ action. The normal ‘-exec’ action assumes that the command to run is terminated with a semicolon (the semicolon normally has to be quoted in order to protect it from interpretation as the shell command separator). The SVR4 edition of Unix introduced a slight variation, which involves terminating the command with ‘+’ instead:
find /var/tmp/stuff -mtime +90 -exec /bin/rm {} +
The above use of ‘-exec’ causes ‘find’ to build up a long command line and then issue it. This can be less efficient than some uses of ‘xargs’; for example ‘xargs’ allows building up new command lines while the previous command is still executing, and allows specifying a number of commands to run in parallel. However, the ‘find … -exec … +’ construct has the advantage of wide portability. GNU findutils did not support ‘-exec … +’ until version 4.2.12; one of the reasons for this is that it already had the ‘-print0’ action in any case.
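The saving in process launches can be counted directly. In this sketch a small helper shell prints one line each time it is started, and five invented files in a scratch directory stand in for the files to be deleted:

```shell
work=$(mktemp -d)
touch "$work/f1" "$work/f2" "$work/f3" "$work/f4" "$work/f5"

# Each start of the helper shell prints one line, so counting lines
# counts how many commands find launched.
per_file=$(find "$work" -type f -exec sh -c 'echo run' sh {} \; | wc -l)
batched=$(find "$work" -type f -exec sh -c 'echo run' sh {} + | wc -l)
echo "with ';': $per_file runs, with '+': $batched run(s)"
rm -rf "$work"
```

With ‘;’ the helper starts once per file; with ‘+’ all five names fit into a single command line.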
10.1.5 A more secure version of ‘-exec’
The command above seems to be efficient and portable. However, within it lurks a security problem. The problem is shared with all the commands we’ve tried in this worked example so far, too. The security problem is a race condition; that is, if it is possible for somebody to manipulate the filesystem that you are searching while you are searching it, it is possible for them to persuade your ‘find’ command to cause the deletion of a file that you can delete but they normally cannot.
The problem occurs because the ‘-exec’ action is defined by the POSIX standard to invoke its command with the same working directory as ‘find’ had when it was started. This means that the arguments which replace the {} include a relative path from the starting point of ‘find’ down to the file that needs to be deleted. For example,
find /var/tmp/stuff -mtime +90 -exec /bin/rm {} +
might actually issue the command:
/bin/rm /var/tmp/stuff/A /var/tmp/stuff/B /var/tmp/stuff/passwd
Notice the file ‘/var/tmp/stuff/passwd’. Likewise, the command:
cd /var/tmp && find stuff -mtime +90 -exec /bin/rm {} +
might actually issue the command:
/bin/rm stuff/A stuff/B stuff/passwd
If an attacker can rename ‘stuff’ to something else (making use of their write permissions in ‘/var/tmp’) they can replace it with a symbolic link to ‘/etc’. That means that the ‘/bin/rm’ command will be invoked on ‘/etc/passwd’. If you are running your ‘find’ command as root, the attacker has just managed to delete a vital file. All they needed to do to achieve this was replace a subdirectory with a symbolic link at the vital moment.
There is, however, a simple solution to the problem. This is an action which works a lot like ‘-exec’ but doesn’t need to traverse a chain of directories to reach the file that it needs to work on. This is the ‘-execdir’ action, which was introduced by the BSD family of operating systems. The command,
find /var/tmp/stuff -mtime +90 -execdir /bin/rm {} +
might delete a set of files by performing these actions:
- Change directory to /var/tmp/stuff/foo
- Invoke ‘/bin/rm ./file1 ./file2 ./file3’
- Change directory to /var/tmp/stuff/bar
- Invoke ‘/bin/rm ./file99 ./file100 ./file101’
This is a much more secure method. We are no longer exposed to a race condition. For many typical uses of ‘find’, this is the best strategy. It’s reasonably efficient, but the length of the command line is limited not just by the operating system limits, but also by how many files we actually need to delete from each directory.
Is it possible to do any better? In the case of general file processing, no. However, in the specific case of deleting files it is indeed possible to do better.
10.1.6 Using the ‘-delete’ action
The most efficient and secure method of solving this problem is to use the ‘-delete’ action:
find /var/tmp/stuff -mtime +90 -delete
This alternative is more efficient than any of the ‘-exec’ or ‘-execdir’ actions, since it entirely avoids the overhead of forking a new process and using ‘exec’ to run ‘/bin/rm’. It is also normally more efficient than ‘xargs’ for the same reason. The file deletion is performed from the directory containing the entry to be deleted, so the ‘-delete’ action has the same security advantages as the ‘-execdir’ action has.
The ‘-delete’ action was introduced by the BSD family of operating systems.
10.1.7 Improving things still further
Is it possible to improve things still further? Not without modifying the operating system or its system library, or having more specific knowledge of the layout of the filesystem and disk I/O subsystem, or both.
The ‘find’ command traverses the filesystem, reading directories. It then issues a separate system call for each file to be deleted. If we could modify the operating system, there are potential gains that could be made:
- We could have a system call to which we pass more than one filename for deletion
- Alternatively, we could pass in a list of inode numbers (on GNU/Linux systems, ‘readdir()’ also returns the inode number of each directory entry) to be deleted.
The above possibilities sound interesting, but from the kernel’s point of view it is difficult to enforce standard Unix access controls for such processing by inode number. Such a facility would probably need to be restricted to the superuser.
Another way of improving performance would be to increase the parallelism of the process. For example if the directory hierarchy we are searching is actually spread across a number of disks, we might somehow be able to arrange for ‘find’ to process each disk in parallel. In practice GNU ‘find’ doesn’t have such an intimate understanding of the system’s filesystem layout and disk I/O subsystem.
However, since the system administrator can have such an understanding they can take advantage of it like so:
find /var/tmp/stuff1 -mtime +90 -delete &
find /var/tmp/stuff2 -mtime +90 -delete &
find /var/tmp/stuff3 -mtime +90 -delete &
find /var/tmp/stuff4 -mtime +90 -delete &
wait
In the example above, four separate instances of ‘find’ are used to search four subdirectories in parallel. The ‘wait’ command simply waits for all of these to complete. Whether this approach is more or less efficient than a single instance of ‘find’ depends on a number of things:
- Are the directories being searched in parallel actually on separate disks? If not, this parallel search might just result in a lot of disk head movement and so the speed might even be slower.
- Other activity - are other programs also doing things on those disks?
10.1.8 Conclusion
The fastest and most secure way to delete files with the help of ‘find’ is to use ‘-delete’. Using ‘xargs -0 -P N’ can also make effective use of the disk, but it is not as secure.
In the case where we’re doing things other than deleting files, the most secure alternative is ‘-execdir … +’, but this is not as portable as the insecure action ‘-exec … +’.
The ‘-delete’ action is not completely portable, but the only other possibility which is as secure (‘-execdir’) is no more portable. The most efficient portable alternative is ‘-exec … +’, but this is insecure and isn’t supported by versions of GNU findutils prior to 4.2.12.
10.2 Copying A Subset of Files
Suppose you want to copy some files from ‘/source-dir’ to ‘/dest-dir’,
but there are a small number of files in ‘/source-dir’ you don’t want to
copy.
One option of course is ‘cp -r /source-dir /dest-dir’ followed by
deletion of the unwanted material under ‘/dest-dir’. But often that can
be inconvenient, because for example we would have copied a large amount
of extraneous material, or because ‘/dest-dir’ is too small. Naturally
there are many other possible reasons why this strategy may be
unsuitable.
So we need to have some way of identifying which files we want to
copy, and we need to have a way of copying that file list. The second
part of this condition is met by ‘cpio -p’. Of course, we can identify
the files we wish to copy by using ‘find’. Here is a command that
solves our problem:
cd /source-dir
find . -name '.snapshot' -prune -o \( \! -name '*~' -print0 \) |
     cpio -pmd0 /dest-dir
The first part of the ‘find’ command here identifies files or
directories named ‘.snapshot’ and tells ‘find’ not to recurse into them
(since they do not need to be copied). The combination ‘-name
‘.snapshot’ -prune’ yields false for anything that didn’t get pruned,
but it is exactly those files we want to copy. Therefore we need to use
an OR (‘-o’) condition to introduce the rest of our expression. The
remainder of the expression simply arranges for the name of any file not
ending in ‘~’ to be printed.
Using ‘-print0’ ensures that white space characters in file names do
not pose a problem. The ‘cpio’ command does the actual work of copying
files. The program as a whole fails if the ‘cpio’ program returns
nonzero. If the ‘find’ command returns non-zero on the other hand, the
Unix shell will not diagnose a problem (since ‘find’ is not the last
command in the pipeline).
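Before copying anything, the selection made by the ‘find’ expression can be previewed by printing it. The sketch below uses the expression from this section with ‘-print’ in place of ‘-print0’ and no ‘cpio’ stage; the scratch tree and its file names are invented for the illustration.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/.snapshot" "$work/src/sub"
touch "$work/src/.snapshot/old" "$work/src/sub/keep" "$work/src/sub/junk~"

# Print the names that would be handed to cpio; sorting makes the
# order predictable.
selected=$(cd "$work/src" &&
           find . -name '.snapshot' -prune -o \( \! -name '*~' -print \) |
           LC_ALL=C sort)
echo "$selected"
rm -rf "$work"
```

Neither ‘.snapshot’ (pruned) nor ‘junk~’ (excluded by the name test) appears in the list.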
10.3 Updating A Timestamp File
Suppose we have a directory full of files which is maintained with a set of automated tools; perhaps one set of tools updates them and another set of tools uses the result.
In this situation, it might be useful for the second set of tools to know if the files have recently been changed.
It might be useful, for example, to have a ‘timestamp’ file which gives the timestamp on the newest file in the collection.
We can use ‘find’ to achieve this, but there are several different ways to do it.
10.3.1 Updating the Timestamp The Wrong Way
The obvious but wrong answer is just to use ‘-newer’:
find subdir -newer timestamp -exec touch -r {} timestamp \;
This does the right sort of thing but has a bug. Suppose that two files in the subdirectory have been updated, and that these are called ‘file1’ and ‘file2’. The command above will update ‘timestamp’ with the modification time of ‘file1’ or that of ‘file2’, but we don’t know which one. Since the timestamps on ‘file1’ and ‘file2’ will in general be different, this could well be the wrong value.
One solution to this problem is to modify ‘find’ to recheck the modification time of ‘timestamp’ every time a file is to be compared against it, but that will reduce the performance of ‘find’.
10.3.2 Using the test utility to compare timestamps
The ‘test’ command can be used to compare timestamps:
find subdir -exec test {} -nt timestamp \; -exec touch -r {} timestamp \;
This will ensure that any changes made to the modification time of ‘timestamp’ that take place during the execution of ‘find’ are taken into account. This resolves our earlier problem, but unfortunately this runs much more slowly.
10.3.3 A combined approach
We can of course still use ‘-newer’ to cut down on the number of calls
to ‘test’:
find subdir -newer timestamp -and \
     -exec test {} -nt timestamp \; -and \
     -exec touch -r {} timestamp \;
Here, the ‘-newer’ test excludes all the files which are definitely
older than the timestamp, but all the files which are newer than the old
value of the timestamp are compared against the current updated
timestamp.
This is indeed faster in general, but the speed difference will
depend on how many updated files there are.
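The combined approach can be checked on a scratch tree with known modification times. This is a sketch under a few assumptions: the dates are arbitrary, a ‘test’ executable that understands ‘-nt’ is on the search path, and ‘-type f’ is added so that the modification time of the directory itself (which is the time the files were created) does not take part.

```shell
work=$(mktemp -d)
mkdir "$work/subdir"
# touch -t takes [[CC]YY]MMDDhhmm.
touch -t 202001010000 "$work/timestamp"
touch -t 202101010000 "$work/subdir/file1"
touch -t 202201010000 "$work/subdir/file2"

(cd "$work" &&
 find subdir -type f -newer timestamp \
      -exec test {} -nt timestamp \; \
      -exec touch -r {} timestamp \;)

# Nothing under subdir should now be newer than the timestamp file.
newer=$(cd "$work" && find subdir -type f -newer timestamp)
rm -rf "$work"
```

Whichever file ‘find’ visits first, ‘timestamp’ ends up carrying the time of the newest one, so ‘newer’ is empty.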
10.3.4 Using ‘-printf’ and ‘sort’ to compare timestamps
It is possible to use the ‘-printf’ action to abandon the use of ‘test’ entirely:
newest=$(find subdir -newer timestamp -printf "%A@:%p\n" |
     sort -n |
     tail -n1 |
     cut -d: -f2- )
touch -r "${newest:-timestamp}" timestamp
The command above works by generating a list of the timestamps and names of all the files which are newer than the timestamp. The ‘sort’, ‘tail’ and ‘cut’ commands simply pull out the name of the file with the largest timestamp value (that is, the latest file). The ‘touch’ command is then used to update the timestamp.
The ‘"${newest:-timestamp}"’ expression simply expands to the value of ‘$newest’ if that variable is set and non-empty, but to ‘timestamp’ otherwise. This ensures that an argument is always given to the ‘-r’ option of the ‘touch’ command.
This approach seems quite efficient, but unfortunately it has a problem. Many operating systems now keep file modification time information at a granularity which is finer than one second. Findutils version 4.3.3 and later will print a fractional part with %A@, but older versions will not.
10.3.5 Solving the problem with ‘make’
Another tool which often works with timestamps is ‘make’. We can use ‘find’ to generate a ‘Makefile’ file on the fly and then use ‘make’ to update the timestamps:
makefile=$(mktemp)
find subdir \
     \( \! -xtype l \) \
     -newer timestamp \
     -printf "timestamp:: %p\n\ttouch -r %p timestamp\n\n" > "$makefile"
make -f "$makefile"
Unfortunately although the solution above is quite elegant, it fails to cope with white space within file names, and adjusting it to do so would require a rather complex shell script.
10.3.6 Coping with odd filenames too
We can fix both of these problems (looping and problems with white space), and do things more efficiently too. The following command works with newlines and doesn’t need to sort the list of filenames.
find subdir -newer timestamp -printf "%A@:%p\0" |
perl -0 newest.pl |
xargs --no-run-if-empty --null -i \
find {} -maxdepth 0 -newer timestamp -exec touch -r {} timestamp \;
The first ‘find’ command generates a list of files which are newer than the original timestamp file, and prints a list of them with their timestamps.
The ‘newest.pl’ script simply filters out all the filenames which have timestamps which are older than whatever the newest file is:
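#! /usr/bin/perl -0
# Reads NUL-terminated "stamp:name" records and prints the names
# that carry the newest stamp, separated by NULs.
my @newest = ();
my $latest_stamp = undef;
while (<>) {
    chomp;                                    # strip the trailing NUL
    my ($stamp, $name) = split(/:/, $_, 2);   # the name may contain ':'
    if (!defined($latest_stamp) || ($stamp > $latest_stamp)) {
        $latest_stamp = $stamp;
        @newest = ();
    }
    if ($stamp >= $latest_stamp) {
        push @newest, $name;
    }
}
print join("\0", @newest);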
This prints a list of zero or more files, all of which are newer than the original timestamp file, and which have the same timestamp as each other, to the nearest second.
The second ‘find’ command takes each resulting file one at a time, and if that is newer than the timestamp file, the timestamp is updated.
10.4 Finding the Shallowest Instance
Suppose you maintain local copies of sources from various projects, each with their own choice of directory organisation and source code management (SCM) tool.
You need to periodically synchronize each project with its upstream tree.
As the number of local repositories grows, so does the work involved in maintaining synchronization.
SCM utilities typically create some sort of administrative directory: .svn for Subversion, CVS for CVS, and so on.
These directories can be used as a key to search for the bases of the project source trees.
Suppose we have the following directory structure:
repo/project1/CVS
repo/gnu/project2/.svn
repo/gnu/project3/.svn
repo/gnu/project3/src/.svn
repo/gnu/project3/doc/.svn
repo/project4/.git
One would expect to update each of the ‘projectX’ directories, but not their subdirectories (src, doc, etc.).
To locate the project roots, we would need to find the least deeply nested directories containing an SCM-related subdirectory.
The following command discovers those roots efficiently.
It is efficient because it avoids searching subdirectories inside projects whose SCM directory we already found.
find repo/ \
     \( -exec test -d {}/.svn \; -or \
        -exec test -d {}/.git \; -or \
        -exec test -d {}/CVS \; \) -print -prune
In this example, ‘test’ is used to tell if we are currently examining a directory which appears to be a project’s root directory (because it has an SCM subdirectory).
When we find a project root, there is no need to search inside it, and ‘-prune’ makes sure that we descend no further.
For large, complex trees like the Linux kernel, this will prevent searching a large portion of the structure, saving a good deal of time.
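The search can be tried on a copy of the example layout. The sketch below builds that layout in a scratch directory and groups the three tests in parentheses, because ‘-or’ binds less tightly than the implied ‘-and’ and ‘-print -prune’ would otherwise apply to the ‘CVS’ test only. It assumes a ‘find’ that substitutes ‘{}’ inside a longer argument, as GNU ‘find’ does.

```shell
work=$(mktemp -d)
(cd "$work" &&
 mkdir -p repo/project1/CVS repo/gnu/project2/.svn \
          repo/gnu/project3/.svn repo/gnu/project3/src/.svn \
          repo/gnu/project3/doc/.svn repo/project4/.git)

# Print each project root once and do not descend into it.
roots=$(cd "$work" &&
        find repo \
             \( -exec test -d {}/.svn \; -or \
                -exec test -d {}/.git \; -or \
                -exec test -d {}/CVS \; \) -print -prune |
        LC_ALL=C sort)
echo "$roots"
rm -rf "$work"
```

The ‘src’ and ‘doc’ directories of ‘project3’ are never visited, since ‘-prune’ stops the descent at the project root.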