‘locate’ searches special file name databases for file names that match patterns. The system administrator runs the ‘updatedb’ program to create the databases. ‘locate’ is run like this:

     locate [OPTION...] PATTERN...

This example prints the names of all files in the default file name database whose name ends with ‘Makefile’ or ‘makefile’. Which file names are stored in the database depends on how the system administrator ran ‘updatedb’.

     locate '*[Mm]akefile'
4 File Name Databases
The file name databases used by ‘locate’ contain lists of files that were in particular directory trees when the databases were last updated. The file name of the default database is determined when ‘locate’ and ‘updatedb’ are configured and installed. The frequency with which the databases are updated and the directories for which they contain entries depend on how often ‘updatedb’ is run, and with which arguments. You can obtain some statistics about the databases by using ‘locate --statistics’.
4.1 Database Locations
There can be multiple file name databases. Users can select which databases ‘locate’ searches using the ‘LOCATE_PATH’ environment variable or a command line option. The system administrator can choose the file name of the default database, the frequency with which the databases are updated, and the directories for which they contain entries. File name databases are updated by running the ‘updatedb’ program, typically nightly.

In networked environments, it often makes sense to build a database at the root of each filesystem, containing the entries for that filesystem. ‘updatedb’ is then run for each filesystem on the fileserver where that filesystem is on a local disk, to prevent thrashing the network.

*Note Invoking updatedb::, for the description of the options to ‘updatedb’. These options can be used to specify which directories are indexed by each database file.

The default location for the locate database depends on how findutils is built, but the findutils installation accompanying this manual uses the default location ‘/usr/local/var/locatedb’.

If no database exists at ‘/usr/local/var/locatedb’ but the user did not specify where to look (by using ‘-d’ or setting ‘LOCATE_PATH’), then ‘locate’ will also check for a “secure” database in ‘/var/lib/slocate/slocate.db’.
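The way a database list is chosen can be sketched in a few lines of Python. This is only an illustration of the rules described in this manual (the option takes precedence over ‘LOCATE_PATH’, empty elements stand for the default database, and repeated ‘-’ elements are ignored); the function name ‘database_list’ and its parameters are hypothetical and are not part of findutils.

```python
# Default location used by the findutils installation described in this manual.
DEFAULT_DB = "/usr/local/var/locatedb"


def database_list(option_path=None, env=None, default=DEFAULT_DB):
    """Return the databases to search, in order (illustrative sketch only).

    option_path -- value given with -d / --database, or None if absent
    env         -- mapping of environment variables (assumed input)
    """
    env = {} if env is None else env
    # The command line option overrides the environment variable.
    spec = option_path if option_path is not None else env.get("LOCATE_PATH")
    if spec is None:
        return [default]

    result = []
    seen_stdin = False
    for element in spec.split(":"):
        if element == "":
            # A leading, trailing or doubled colon means the default database.
            result.append(default)
        elif element == "-":
            # Standard input can be used only once; later instances are ignored.
            if not seen_stdin:
                result.append("-")
                seen_stdin = True
        else:
            result.append(element)
    return result
```

For instance, ‘database_list(":/srv/locatedb")’ yields the default database followed by ‘/srv/locatedb’. The sketch does not model the fallback to the ‘slocate’ database, which depends on which files exist.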
4.2 Database Formats
The file name databases contain lists of files that were in particular directory trees when the databases were last updated. The file name database format changed starting with GNU ‘locate’ version 4.0 to allow machines with different byte orderings to share the databases.

GNU ‘locate’ can read both the old pre-findutils-4.0 database format and the ‘LOCATE02’ database format. Support for the old database format will shortly be removed from ‘locate’. It has already been removed from ‘updatedb’.

If you run ‘locate --statistics’, the resulting summary indicates the type of each ‘locate’ database. You select which database format ‘updatedb’ will use with the ‘--dbformat’ option.

The ‘slocate’ database format is very similar to ‘LOCATE02’ and is also supported (in both ‘updatedb’ and ‘locate’).
4.2.1 LOCATE02 Database Format
‘updatedb’ runs a program called ‘frcode’ to “front-compress” the list of file names, which reduces the database size by a factor of 4 to 5. Front-compression (also known as incremental encoding) works as follows.

The database entries are a sorted list (case-insensitively, for users’ convenience). Since the list is sorted, each entry is likely to share a prefix (initial string) with the previous entry. Each database entry begins with an offset-differential count byte, which is the additional number of characters of prefix of the preceding entry to use beyond the number that the preceding entry is using of its predecessor. (The counts can be negative.) Following the count is a null-terminated ASCII remainder: the part of the name that follows the shared prefix.

If the offset-differential count is larger than can be stored in a byte (+/-127), the byte has the value 0x80 and the count follows in a 2-byte word, with the high byte first (network byte order).

Every database begins with a dummy entry for a file called ‘LOCATE02’, which ‘locate’ checks for to ensure that the database file has the correct format; it ignores the entry in doing the search.

Databases cannot be concatenated together, even if the first (dummy) entry is trimmed from all but the first database. This is because the offset-differential count in the first entry of the second and following databases will be wrong.

In the output of ‘locate --statistics’, the new database format is referred to as ‘LOCATE02’.
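The encoding just described can be illustrated with a minimal Python sketch. It is not the ‘frcode’ source; the function names are hypothetical, the input is assumed to be already sorted, the dummy entry is assumed to be written as a fixed header, and the 2-byte long count is assumed to be a signed big-endian value.

```python
import struct


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_count(diff: int) -> bytes:
    """Encode one offset-differential count."""
    if -127 <= diff <= 127:
        return struct.pack("b", diff)          # fits in one signed byte
    # Escape byte 0x80, then a 2-byte count with the high byte first.
    return b"\x80" + struct.pack(">h", diff)


def front_encode(sorted_names):
    """Front-compress a sorted list of byte-string file names."""
    # Dummy first entry: count 0, name LOCATE02, terminating null.
    out = bytearray(b"\x00LOCATE02\x00")
    prev, prev_shared = b"", 0
    for name in sorted_names:
        shared = common_prefix_len(prev, name)
        out += encode_count(shared - prev_shared)   # differential, may be < 0
        out += name[shared:] + b"\x00"              # remainder plus null
        prev, prev_shared = name, shared
    return bytes(out)
```

Because each count is relative to the prefix length used by the preceding entry, the first count of a second database would be interpreted against the wrong predecessor, which is why databases cannot simply be concatenated.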
4.2.2 Sample LOCATE02 Database
Sample input to ‘frcode’:

     /usr/src
     /usr/src/cmd/aardvark.c
     /usr/src/cmd/armadillo.c
     /usr/tmp/zoo

Length of the longest prefix of the preceding entry to share:

     0 /usr/src
     8 /cmd/aardvark.c
     14 rmadillo.c
     5 tmp/zoo

Output from ‘frcode’, with trailing nulls changed to newlines and count bytes made printable:

     0 LOCATE02
     0 /usr/src
     8 /cmd/aardvark.c
     6 rmadillo.c
     -9 tmp/zoo

(6 = 14 - 8, and -9 = 5 - 14)
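Reading the sample back may make the arithmetic clearer. The following Python sketch decodes a byte string laid out as described for ‘LOCATE02’; it is an illustration under stated assumptions (the dummy entry is treated as a fixed header, and long counts are read as signed big-endian 16-bit values), not the code used by ‘locate’. The name ‘front_decode’ is hypothetical.

```python
import struct

MAGIC = b"\x00LOCATE02\x00"   # dummy first entry, assumed to act as a header


def front_decode(data: bytes):
    """Return the list of file names stored in a LOCATE02-style byte string."""
    if not data.startswith(MAGIC):
        raise ValueError("not a LOCATE02 database")
    pos = len(MAGIC)
    names = []
    prev, shared = b"", 0
    while pos < len(data):
        # One signed count byte; 0x80 (-128) means a 2-byte count follows.
        count = struct.unpack_from("b", data, pos)[0]
        pos += 1
        if count == -128:
            count = struct.unpack_from(">h", data, pos)[0]
            pos += 2
        shared += count                      # running shared-prefix length
        end = data.index(b"\x00", pos)       # remainder is null-terminated
        name = prev[:shared] + data[pos:end]
        names.append(name)
        prev = name
        pos = end + 1
    return names
```

Applied to the sample output, the running prefix length goes 0, 8, 14, 5, which reproduces the four input names.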
4.2.3 slocate Database Format
The ‘slocate’ program uses a database format similar to, but not quite the same as, that of GNU ‘locate’. The first byte of the database specifies its “security level”. If the security level is 0, ‘slocate’ will read, match and print filenames on the basis of the information in the database only. However, if the security level byte is 1, ‘slocate’ omits entries from its output if the invoking user is unable to access them.

The second byte of the database is zero. The second byte is immediately followed by the first database entry. The first entry in the database is not preceded by any differential count or dummy entry. Instead, the differential count for the first item is assumed to be zero.

Starting with the second entry (if any) in the database, data is interpreted as for the GNU ‘LOCATE02’ format.
4.2.4 Old Database Format
The old database format is used by Unix ‘locate’ and ‘find’ programs and pre-4.0 releases of GNU findutils. ‘locate’ understands this format, though ‘updatedb’ will no longer produce it. The old format differs from ‘LOCATE02’ in the following ways. Instead of each entry starting with an offset-differential count byte and ending with a null, byte values from 0 through 28 indicate offset-differential counts from -14 through 14. The byte value indicating that a long offset-differential count follows is 0x1e (30), not 0x80. The long counts are stored in host byte order, which is not necessarily network byte order, and host integer word size, which is usually 4 bytes. They also represent a count 14 less than their value. The database lines have no termination byte; the start of the next line is indicated by its first byte having a value <= 30.

In addition, instead of starting with a dummy entry, the old database format starts with a 256 byte table containing the 128 most common bigrams in the file list. A bigram is a pair of adjacent bytes. Bytes in the database that have the high bit set are indexes (with the high bit cleared) into the bigram table.

The bigram and offset-differential count coding makes these databases 20-25% smaller than the new format, but makes them not 8-bit clean. Any byte in a file name that is in the ranges used for the special codes is replaced in the database by a question mark, which not coincidentally is the shell wildcard to match a single character. The old format therefore cannot faithfully store entries with non-ASCII characters.

Because the long counts are stored as native-order machine words, the database format is not easily used in environments which differ in terms of byte order. If locate databases are to be shared between machines, the ‘LOCATE02’ database format should be used. This has other benefits as discussed above. However, the length of the filename currently being processed can normally be used to place reasonable limits on the long counts and so this information is used by ‘locate’ to help it guess the byte ordering of the old format database. Unless it finds evidence to the contrary, ‘locate’ will assume that the byte order of the database is the same as the native byte order of the machine running ‘locate’. The output of ‘locate --statistics’ also includes information about the byte order of old-format databases.

The output of ‘locate --statistics’ will give an incorrect count of the number of file names containing newlines or high-bit characters for old-format databases.

Old versions of GNU ‘locate’ fail to correctly handle very long file names, possibly leading to security problems relating to a heap buffer overrun. *Note Security Considerations for locate::, for a detailed explanation.
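The special byte ranges of the old format can be summarized in a short Python sketch. It covers only the count mapping and the bigram expansion described above and is not a complete reader for old-format databases. The function names are hypothetical, and the layout of the table (bigram number I stored in bytes 2*I and 2*I+1) is an assumption.

```python
LONG_COUNT_MARKER = 0x1E   # byte value 30: a long count follows


def short_count(byte_value: int) -> int:
    """Map a count byte in the range 0..28 to a count in the range -14..14."""
    if not 0 <= byte_value <= 28:
        raise ValueError("not a short offset-differential count")
    return byte_value - 14


def long_count(stored_value: int) -> int:
    """Long counts represent a count 14 less than the stored value."""
    return stored_value - 14


def expand_bigrams(table: bytes, body: bytes) -> bytes:
    """Replace each high-bit byte in body by the bigram it indexes."""
    if len(table) != 256:
        raise ValueError("the bigram table must be 256 bytes long")
    out = bytearray()
    for b in body:
        if b & 0x80:
            index = b & 0x7F                       # clear the high bit
            out += table[2 * index:2 * index + 2]  # assumed table layout
        else:
            out.append(b)
    return bytes(out)
```

Since every byte value at or below 30 and every byte with the high bit set has a special meaning, such bytes cannot appear literally in a stored name, which is why the format is not 8-bit clean.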
4.3 Newline Handling
Within the database, file names are terminated with a null character. This is the case for both the old and the new format.

When the new database format is being used, however, the compression technique used to generate the database relies on the ability to sort the list of files before they are presented to ‘frcode’.

If the system’s ‘sort’ command allows separating its input list of files with null characters via the ‘-z’ option, this option is used and therefore ‘updatedb’ and ‘locate’ will both correctly handle file names containing newlines. If the ‘sort’ command lacks support for this, the list of files is delimited with the newline character, meaning that parts of file names containing newlines will be incorrectly sorted. This can result in both incorrect matches and incorrect failures to match.
8.2 Invoking ‘locate’
locate [OPTION...] PATTERN...
For each PATTERN given, ‘locate’ searches one or more file name databases and returns each match of PATTERN.
‘--all’ ‘-A’ Print only names which match all non-option arguments, not those matching one or more non-option arguments.
‘--basename’ ‘-b’ The specified pattern is matched against just the last component of the name of a file in the ‘locate’ database. This last component is also called the “base name”. For example, the base name of ‘/tmp/mystuff/foo.old.c’ is ‘foo.old.c’. If the pattern contains metacharacters, it must match the base name exactly. If not, it must match part of the base name.
‘--count’ ‘-c’ Instead of printing the matched file names, just print the total number of matches found, unless ‘--print’ (‘-p’) is also present.
‘--database=PATH’ ‘-d PATH’ Instead of searching the default ‘locate’ database ‘/usr/local/var/locatedb’, ‘locate’ searches the file name databases in PATH, which is a colon-separated list of database file names. You can also use the environment variable ‘LOCATE_PATH’ to set the list of database files to search. The option overrides the environment variable if both are used. Empty elements in PATH (that is, a leading or trailing colon, or two colons in a row) are taken to stand for the default database. A database can be supplied on stdin, using ‘-’ as an element of PATH. If more than one element of PATH is ‘-’, later instances are ignored (but a warning message is printed).
‘--existing’ ‘-e’ Print only names which currently exist (instead of names which existed when the database was created). Note that this may slow down the program a lot, if there are many matches in the database. The way in which broken symbolic links are treated is affected by the ‘-L’, ‘-P’ and ‘-H’ options. Please note that it is possible for the file to be deleted after ‘locate’ has checked that it exists, but before you use it. This option is automatically turned on when reading an ‘slocate’ database in secure mode (*note slocate Database Format::).
‘--non-existing’ ‘-E’ Print only names which currently do not exist (instead of names which existed when the database was created). Note that this may slow down the program a lot, if there are many matches in the database. The way in which broken symbolic links are treated is affected by the ‘-L’, ‘-P’ and ‘-H’ options. Please note that ‘locate’ checks that the file does not exist, but a file of the same name might be created after ‘locate’ performs its check but before you read its output.
‘--follow’ ‘-L’ If testing for the existence of files (with the ‘-e’ or ‘-E’ options), consider broken symbolic links to be non-existing. This is the default behaviour.
‘--nofollow’ ‘-P’ ‘-H’ If testing for the existence of files (with the ‘-e’ or ‘-E’ options), treat broken symbolic links as if they were existing files. The ‘-H’ form of this option is provided purely for similarity with ‘find’; the use of ‘-P’ is recommended over ‘-H’.
‘--ignore-case’ ‘-i’ Ignore case distinctions in both the pattern and the file names.
‘--limit=N’ ‘-l N’ Limit the number of results printed to N. When used with the ‘--count’ option, the value printed will never be larger than this limit.
‘--max-database-age=D’ Normally, ‘locate’ will issue a warning message when it searches a database which is more than 8 days old. This option changes that value to something other than 8. The effect of specifying a negative value is undefined.
‘--mmap’ ‘-m’ Accepted but does nothing. The option is supported only to provide compatibility with BSD’s ‘locate’.
‘--null’ ‘-0’ Results are separated with the ASCII NUL character rather than the newline character. To get the full benefit of this option, use the new ‘locate’ database format (which is the default anyway).
‘--print’ ‘-p’ Print search results when they normally would not be due to use of ‘--statistics’ (‘-S’) or ‘--count’ (‘-c’).
‘--wholename’ ‘-w’ The specified pattern is matched against the whole name of the file in the ‘locate’ database. If the pattern contains metacharacters, it must match exactly. If not, it must match part of the whole file name. This is the default behaviour.
‘--regex’ ‘-r’ Instead of using substring or shell glob matching, the pattern specified on the command line is understood to be a regular expression. GNU Emacs-style regular expressions are assumed unless the ‘--regextype’ option is also given. File names from the ‘locate’ database are matched using the specified regular expression. If the ‘-i’ flag is also given, matching is case-insensitive. Matches are performed against the whole path name, and so by default a pathname will be matched if any part of it matches the specified regular expression. The regular expression may use ‘^’ or ‘$’ to anchor a match at the beginning or end of a pathname.
‘--regextype’ This option changes the regular expression syntax and behaviour used by the ‘--regex’ option. *Note Regular Expressions::, for more information on the regular expression dialects understood by GNU findutils.
‘--stdio’ ‘-s’ Accepted but does nothing. The option is supported only to provide compatibility with BSD’s ‘locate’.
‘--statistics’ ‘-S’ Print some summary information for each ‘locate’ database. No search is performed unless non-option arguments are given. Although the BSD version of locate also has this option, the format of the output is different.
‘--help’ Print a summary of the command line usage for ‘locate’ and exit.
‘--version’ Print the version number of ‘locate’ and exit.
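The difference between whole-name and base-name matching, and between patterns with and without metacharacters, can be sketched in Python. This is an approximation for illustration only, not the matching code used by ‘locate’: the function name ‘name_matches’ is hypothetical, the set of metacharacters is assumed to be ‘*’, ‘?’ and ‘[’, and Python’s ‘fnmatch’ module stands in for shell glob matching.

```python
import fnmatch
import posixpath

GLOB_CHARS = set("*?[")   # assumed set of metacharacters


def name_matches(pattern, path, basename=False, ignore_case=False):
    """Approximate how a non-regex pattern is tested against one database entry."""
    # With basename=True only the last component of the name is considered.
    subject = posixpath.basename(path) if basename else path
    if ignore_case:
        pattern, subject = pattern.lower(), subject.lower()
    if any(ch in GLOB_CHARS for ch in pattern):
        # A pattern with metacharacters must match the whole subject.
        return fnmatch.fnmatchcase(subject, pattern)
    # A plain pattern only needs to match part of the subject.
    return pattern in subject
```

Under these assumptions, ‘name_matches("foo", "/tmp/mystuff/foo.old.c", basename=True)’ is true because ‘foo’ is part of the base name, whereas ‘name_matches("mystuff", "/tmp/mystuff/foo.old.c", basename=True)’ is false because ‘mystuff’ occurs only in a directory component.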