Annotation of src/usr.bin/file/file.1, Revision 1.33

1.33    ! jmc         1: .\" $OpenBSD: file.1,v 1.32 2010/09/03 11:09:28 jmc Exp $
1.8       aaron       2: .\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
1.18      jmc         3: .\"
1.19      ian         4: .\" Copyright (c) Ian F. Darwin 1986-1995.
                      5: .\" Software written by Ian F. Darwin and others;
                      6: .\" maintained 1995-present by Christos Zoulas and others.
1.20      jmc         7: .\"
1.19      ian         8: .\" Redistribution and use in source and binary forms, with or without
                      9: .\" modification, are permitted provided that the following conditions
                     10: .\" are met:
                     11: .\" 1. Redistributions of source code must retain the above copyright
                     12: .\"    notice immediately at the beginning of the file, without modification,
                     13: .\"    this list of conditions, and the following disclaimer.
                     14: .\" 2. Redistributions in binary form must reproduce the above copyright
                     15: .\"    notice, this list of conditions and the following disclaimer in the
                     16: .\"    documentation and/or other materials provided with the distribution.
1.20      jmc        17: .\"
1.19      ian        18: .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
                     19: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
                     20: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
                     21: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
                     22: .\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
                     23: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
                     24: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
                     25: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
                     26: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
                     27: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
                     28: .\" SUCH DAMAGE.
1.18      jmc        29: .\"
1.33    ! jmc        30: .Dd $Mdocdate: September 3 2010 $
1.8       aaron      31: .Dt FILE 1
                     32: .Os
                     33: .Sh NAME
                     34: .Nm file
                     35: .Nd determine file type
                     36: .Sh SYNOPSIS
1.30      ajacouto   37: .Nm
                     38: .Bk -words
                     39: .Op Fl 0bCcehikLNnprsvz
                     40: .Op Fl -help
                     41: .Op Fl -mime-encoding
                     42: .Op Fl -mime-type
1.23      jaredy     43: .Op Fl F Ar separator
1.8       aaron      44: .Op Fl f Ar namefile
                     45: .Op Fl m Ar magicfiles
1.30      ajacouto   46: .Ar file
1.23      jaredy     47: .Ek
1.8       aaron      48: .Sh DESCRIPTION
1.22      jaredy     49: The
1.8       aaron      50: .Nm
1.30      ajacouto   51: utility tests each argument in an attempt to classify it.
1.1       deraadt    52: There are three sets of tests, performed in this order:
1.30      ajacouto   53: filesystem tests, magic tests, and language tests.
1.8       aaron      54: The first test that succeeds causes the file type to be printed.
                     55: .Pp
1.1       deraadt    56: The type printed will usually contain one of the words
1.30      ajacouto   57: .Em text
1.4       millert    58: (the file contains only
1.30      ajacouto   59: printing characters and a few common control
1.4       millert    60: characters and is probably safe to read on an
1.30      ajacouto   61: ASCII terminal),
                     62: .Em executable
1.1       deraadt    63: (the file contains the result of compiling a program
1.8       aaron      64: in a form understandable to some
                     65: .Ux
                     66: kernel or another),
1.1       deraadt    67: or
1.30      ajacouto   68: .Em data
                     69: meaning anything else (data is usually
                     70: .Dq binary
                     71: or non-printable).
1.1       deraadt    72: Exceptions are well-known file formats (core files, tar archives)
                     73: that are known to contain binary data.
1.30      ajacouto   74: When modifying magic files or the program itself, make sure to
                     75: .Em preserve these keywords .
                     76: Users depend on knowing that all the readable files in a directory
1.8       aaron      77: have the word
                     78: .Dq text
                     79: printed.
1.30      ajacouto   80: Don't do as Berkeley did and change
1.8       aaron      81: .Dq shell commands text
                     82: to
                     83: .Dq shell script .
                     84: .Pp
1.1       deraadt    85: The filesystem tests are based on examining the return from a
1.8       aaron      86: .Xr stat 2
1.1       deraadt    87: system call.
                     88: The program checks to see if the file is empty,
                     89: or if it's some sort of special file.
1.30      ajacouto   90: Any known file types,
                     91: such as sockets, symbolic links, and named pipes (FIFOs),
1.1       deraadt    92: are intuited if they are defined in
                     93: the system header file
1.9       aaron      94: .Aq Pa sys/stat.h .
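The filesystem tests described above can be sketched as follows. This is a minimal illustration in Python, not the code used by file itself; the function name classify_by_stat and the strings it returns are assumptions chosen for the example.

```python
import os
import stat


def classify_by_stat(path):
    """Return a description for special or empty files, else None.

    Hypothetical helper: the name and return strings are illustrative only.
    """
    st = os.lstat(path)  # lstat: examine a symbolic link itself, not its target
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None  # ordinary non-empty file: later tests must decide
```

A result of None here corresponds to falling through to the magic tests.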
1.8       aaron      95: .Pp
1.30      ajacouto   96: The magic tests are used to check for files with data in
1.1       deraadt    97: particular fixed formats.
                     98: The canonical example of this is a binary executable (compiled program)
1.30      ajacouto   99: a.out file, whose format is defined in
                    100: .Aq Pa elf.h ,
                    101: .Aq Pa a.out.h ,
1.1       deraadt   102: and possibly
1.8       aaron     103: .Aq Pa exec.h
1.30      ajacouto  104: in the standard include directory.
1.8       aaron     105: These files have a
                    106: .Dq magic number
                    107: stored in a particular place
                    108: near the beginning of the file that tells the
                    109: .Ux
                    110: operating system
1.1       deraadt   111: that the file is a binary executable, and which of several types thereof.
1.30      ajacouto  112: The concept of a
                    113: .Dq magic number
                    114: has been applied by extension to data files.
1.1       deraadt   115: Any file with some invariant identifier at a small fixed
                    116: offset into the file can usually be described in this way.
1.30      ajacouto  117: The information identifying these files is read from the magic file
1.8       aaron     118: .Pa /etc/magic .
1.30      ajacouto  119: In addition, if
                    120: .Pa $HOME/.magic.mgc
                    121: or
                    122: .Pa $HOME/.magic
                    123: exists, it will be used in preference to the system magic files.
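The idea of an invariant identifier at a fixed offset can be sketched as a small lookup table. This is only an illustration, assuming a hypothetical in-memory table named MAGIC_TABLE and a helper named match_magic; it does not reproduce the syntax of the magic file or the matching logic of file. The four signatures shown are widely documented ones.

```python
# Each entry: (offset into the file, bytes expected there, description).
# The table layout is an assumption for this sketch, not the magic(5) format.
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"%PDF-", "PDF document"),
    (257, b"ustar", "tar archive"),
]


def match_magic(data):
    """Return the description of the first matching entry, else None."""
    for offset, pattern, description in MAGIC_TABLE:
        # Slicing past the end of data yields a shorter string, so short
        # inputs simply fail to match instead of raising an error.
        if data[offset:offset + len(pattern)] == pattern:
            return description
    return None
```

For example, match_magic(open(path, "rb").read(512)) would inspect only the first 512 bytes of a file, which is enough for every entry in this table.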
1.8       aaron     124: .Pp
1.30      ajacouto  125: If a file does not match any of the entries in the magic file,
                    126: it is examined to see if it seems to be a text file.
                    127: ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
                    128: (such as those used on Macintosh and IBM PC systems),
                    129: UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
                    130: character sets can be distinguished by the different
                    131: ranges and sequences of bytes that constitute printable text
                    132: in each set.
                    133: If a file passes any of these tests, its character set is reported.
                    134: ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
                    135: as
                    136: .Dq text
                    137: because they will be mostly readable on nearly any terminal;
                    138: UTF-16 and EBCDIC are only
                    139: .Dq character data
                    140: because, while
                    141: they contain text, it is text that will require translation
                    142: before it can be read.
                    143: In addition,
                    144: .Nm
                    145: will attempt to determine other characteristics of text-type files.
                    146: If the lines of a file are terminated by CR, CRLF, or NEL, instead
                    147: of the Unix-standard LF, this will be reported.
                    148: Files that contain embedded escape sequences or overstriking
                    149: will also be identified.
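A much simplified sketch of the text test follows. It handles ASCII only and reports CR or CRLF line terminators; the other character sets listed above, NEL, escape sequences and overstriking are left out. The names TEXT_CONTROLS, looks_like_ascii_text, line_terminators and describe are assumptions made for this example, and the exact set of control characters that file accepts may differ.

```python
# Control characters tolerated in text: BEL, BS, TAB, LF, FF, CR, ESC.
# This set is an assumption for the sketch.
TEXT_CONTROLS = {0x07, 0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}


def looks_like_ascii_text(data):
    """True if every byte is printable ASCII or a tolerated control."""
    return all(0x20 <= b < 0x7F or b in TEXT_CONTROLS for b in data)


def line_terminators(data):
    """Classify line endings as 'LF', 'CR', 'CRLF', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # bare CR not followed by LF
    lf = data.count(b"\n") - crlf  # bare LF not preceded by CR
    kinds = [name for name, n in (("CRLF", crlf), ("CR", cr), ("LF", lf)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


def describe(data):
    """Return a file-like description for ASCII text, else 'data'."""
    if not looks_like_ascii_text(data):
        return "data"  # simplification: other encodings are not tried here
    term = line_terminators(data)
    if term in ("CR", "CRLF"):
        return "ASCII text, with %s line terminators" % term
    return "ASCII text"
```

Only the non-default terminators are mentioned in the result, mirroring the rule that LF is the Unix standard and needs no remark.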
                    150: .Pp
                    151: Once
                    152: .Nm
                    153: has determined the character set used in a text-type file,
                    154: it will
                    155: attempt to determine in what language the file is written.
                    156: The language tests look for particular strings (cf.\&
                    157: .Aq Pa names.h )
1.1       deraadt   158: that can appear anywhere in the first few blocks of a file.
                    159: For example, the keyword
1.8       aaron     160: .Em .br
1.4       millert   161: indicates that the file is most likely a
1.33    ! jmc       162: troff input file, just as the keyword
1.30      ajacouto  163: .Em struct
1.1       deraadt   164: indicates a C program.
                    165: These tests are less reliable than the previous
                    166: two groups, so they are performed last.
                    167: The language test routines also test for some miscellany
1.6       aaron     168: (such as
1.8       aaron     169: .Xr tar 1
1.30      ajacouto  170: archives).
                    171: .Pp
                    172: Any file that cannot be identified as having been written
                    173: in any of the character sets listed above is simply said to be
1.8       aaron     174: .Dq data .
1.30      ajacouto  175: .Sh OPTIONS
                    176: .Bl -tag -width indent
                    177: .It Fl 0 , -print0
                    178: Output a null character
                    179: .Sq \e0
                    180: after the end of the filename.
                    181: This is convenient when using
                    182: .Xr cut 1
                    183: on the output.
                    184: This does not affect the separator, which is still printed.
                    185: .It Fl b , -brief
1.17      millert   186: Do not prepend filenames to output lines (brief mode).
1.30      ajacouto  187: .It Fl C , -compile
                    188: Write a
1.23      jaredy    189: .Pa magic.mgc
1.30      ajacouto  190: output file that contains a pre-parsed version of the magic file or directory.
                    191: .It Fl c , -checking-printout
1.1       deraadt   192: Cause a checking printout of the parsed form of the magic file.
1.30      ajacouto  193: This is usually used in conjunction with the
1.8       aaron     194: .Fl m
1.30      ajacouto  195: flag to debug a new magic file before installing it.
                    196: .It Fl e , -exclude Ar testname
                    197: Exclude the test named in
                    198: .Ar testname
                    199: from the list of tests made to determine the file type.
                    200: Valid test names are:
1.31      schwarze  201: .Bl -tag -width compress
1.30      ajacouto  202: .It apptype
                    203: Check for
                    204: .Dv EMX
                    205: application type (only on EMX).
                    206: .It ascii
                    207: Check for various types of ASCII files.
                    208: .It compress
                    209: Don't look for, or inside, compressed files.
                    210: .It elf
                    211: Don't print elf details.
                    212: .It fortran
                    213: Don't look for fortran sequences inside ASCII files.
                    214: .It soft
                    215: Don't consult magic files.
                    216: .It tar
                    217: Don't examine tar files.
                    218: .It token
                    219: Don't look for known tokens inside ASCII files.
                    220: .It troff
                    221: Don't look for troff sequences inside ASCII files.
                    222: .El
                    223: .It Fl F , -separator Ar separator
                    224: Use the specified string as the separator between the filename and the
                    225: file result returned.
1.23      jaredy    226: Defaults to
                    227: .Sq \&: .
1.30      ajacouto  228: .It Fl f , -files-from Ar namefile
1.6       aaron     229: Read the names of the files to be examined from
1.8       aaron     230: .Ar namefile
1.6       aaron     231: (one per line)
1.1       deraadt   232: before the argument list.
1.6       aaron     233: Either
1.8       aaron     234: .Ar namefile
1.1       deraadt   235: or at least one filename argument must be present;
1.8       aaron     236: to test the standard input, use
1.23      jaredy    237: .Sq -
1.8       aaron     238: as a filename argument.
1.30      ajacouto  239: .It Fl h , -no-dereference
                    240: Causes symlinks not to be followed.
                    241: This is the default if the environment variable
                    242: .Dv POSIXLY_CORRECT
                    243: is not defined.
                    244: .It Fl -help
                    245: Print a help message and exit.
                    246: .It Fl i , -mime
                    247: Causes the file command to output MIME type strings rather than the more
                    248: traditional human-readable ones.
                    249: Thus it may say
                    250: .Dq text/plain charset=us-ascii
                    251: rather than
                    252: .Dq ASCII text .
                    253: In order for this option to work,
                    254: .Nm
                    255: changes the way it handles files recognized by the command itself
                    256: (such as many of the text file types, directories etc.),
                    257: and makes use of an alternative
                    258: .Dq magic
                    259: file.
                    260: See also
                    261: .Sx FILES ,
                    262: below.
                    263: .It Fl -mime-encoding , -mime-type
                    264: Like
                    265: .Fl i ,
                    266: but print only the specified element(s).
                    267: .It Fl k , -keep-going
1.23      jaredy    268: Don't stop at the first match; keep going.
1.30      ajacouto  269: Subsequent matches will have the string
                    270: .Dq "\[rs]012\- "
                    271: prepended.
                    272: (If a newline is required, see the
                    273: .Fl r
                    274: option.)
                    275: .It Fl L , -dereference
                    276: Causes symlinks to be followed;
                    277: analogous to the option of the same name in
                    278: .Xr ls 1 .
                    279: This is the default if the environment variable
                    280: .Dv POSIXLY_CORRECT
                    281: is defined.
                    282: .It Fl m , -magic-file Ar magicfiles
                    283: Specify an alternate list of files and directories containing magic.
                    284: This can be a single item, or a colon-separated list.
                    285: If a compiled magic file is found alongside a file or directory,
                    286: it will be used instead.
                    287: .It Fl N , -no-pad
1.23      jaredy    288: Don't pad filenames so that they align in the output.
1.30      ajacouto  289: .It Fl n , -no-buffer
                    290: Force stdout to be flushed after checking each file.
1.23      jaredy    291: This is only useful if checking a list of files.
1.30      ajacouto  292: It is intended to be used by programs that want filetype output from a pipe.
                    293: .It Fl p , -preserve-date
                    294: On systems that support
                    295: .Xr utime 3
                    296: or
                    297: .Xr utimes 2 ,
                    298: attempt to preserve the access time of files analyzed, to pretend that
                    299: .Nm
                    300: never read them.
                    301: .It Fl r , -raw
                    302: Don't translate unprintable characters to \eooo.
1.23      jaredy    303: Normally
                    304: .Nm
1.30      ajacouto  305: translates unprintable characters to their octal representation.
                    306: .It Fl s , -special-files
1.23      jaredy    307: Normally,
                    308: .Nm
                    309: only attempts to read and determine the type of argument files which
                    310: .Xr stat 2
                    311: reports are ordinary files.
                    312: This prevents problems, because reading special files may have peculiar
                    313: consequences.
                    314: Specifying the
                    315: .Fl s
                    316: option causes
                    317: .Nm
                    318: to also read argument files which are block or character special files.
                    319: This is useful for determining the filesystem types of the data in raw
                    320: disk partitions, which are block special files.
                    321: This option also causes
                    322: .Nm
                    323: to disregard the file size as reported by
1.30      ajacouto  324: .Xr stat 2
1.23      jaredy    325: since on some systems it reports a zero size for raw disk partitions.
1.30      ajacouto  326: .It Fl v , -version
1.23      jaredy    327: Print the version of the program and exit.
1.30      ajacouto  328: .It Fl z , -uncompress
                    329: Try to look inside compressed files.
1.8       aaron     330: .El
                    331: .Sh ENVIRONMENT
1.30      ajacouto  332: The environment variable
                    333: .Dv MAGIC
                    334: can be used to set the default magic file name.
                    335: If that variable is set, then
                    336: .Nm
                    337: will not attempt to open
                    338: .Pa $HOME/.magic .
1.23      jaredy    339: .Nm
                    340: adds
                    341: .Dq .mgc
                    342: to the value of this variable as appropriate.
1.30      ajacouto  343: The environment variable
                    344: .Dv POSIXLY_CORRECT
                    345: controls whether
                    346: .Nm
                    347: will attempt to follow symlinks or not.
                    348: If set, then
                    349: .Nm
                    350: follows symlinks; otherwise it does not.
                    351: This is also controlled by the
                    352: .Fl L
                    353: and
                    354: .Fl h
                    355: options.
1.12      aaron     356: .Sh FILES
                    357: .Bl -tag -width /etc/magic -compact
                    358: .It Pa /etc/magic
                    359: default list of magic numbers
                    360: .El
1.32      jmc       361: .Sh EXIT STATUS
                    362: .Ex -std file
1.8       aaron     363: .Sh SEE ALSO
                    364: .Xr hexdump 1 ,
                    365: .Xr od 1 ,
                    366: .Xr strings 1 ,
                    367: .Xr magic 5
                    368: .Sh STANDARDS CONFORMANCE
1.1       deraadt   369: This program is believed to exceed the System V Interface Definition
                    370: of FILE(CMD), as near as one can determine from the vague language
1.6       aaron     371: contained therein.
1.30      ajacouto  372: Its behavior is mostly compatible with the System V program of the same name.
1.1       deraadt   373: This version knows more magic, however, so it will produce
1.6       aaron     374: different (albeit more accurate) output in many cases.
1.30      ajacouto  375: .\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
1.8       aaron     376: .Pp
1.6       aaron     377: The one significant difference
1.1       deraadt   378: between this version and System V
1.30      ajacouto  379: is that this version treats any whitespace
1.1       deraadt   380: as a delimiter, so that spaces in pattern strings must be escaped.
                    381: For example,
1.30      ajacouto  382: .Bd -literal -offset indent
                    383: \*(Gt10        string  language impress\       (imPRESS data)
                    384: .Ed
1.8       aaron     385: .Pp
1.1       deraadt   386: in an existing magic file would have to be changed to
1.30      ajacouto  387: .Bd -literal -offset indent
                    388: \*(Gt10        string  language\e impress      (imPRESS data)
                    389: .Ed
1.8       aaron     390: .Pp
1.1       deraadt   391: In addition, in this version, if a pattern string contains a backslash,
1.9       aaron     392: it must be escaped.
                    393: For example,
1.30      ajacouto  394: .Bd -literal -offset indent
                    395: 0      string          \ebegindata     Andrew Toolkit document
                    396: .Ed
1.8       aaron     397: .Pp
1.1       deraadt   398: in an existing magic file would have to be changed to
1.30      ajacouto  399: .Bd -literal -offset indent
                    400: 0      string          \e\ebegindata   Andrew Toolkit document
                    401: .Ed
1.8       aaron     402: .Pp
1.1       deraadt   403: SunOS releases 3.2 and later from Sun Microsystems include a
1.30      ajacouto  404: .Nm
1.1       deraadt   405: command derived from the System V one, but with some extensions.
1.30      ajacouto  406: This version differs from Sun's only in minor ways.
1.8       aaron     407: It includes the extension of the
1.30      ajacouto  408: .Sq &
1.8       aaron     409: operator, used as,
1.1       deraadt   410: for example,
1.30      ajacouto  411: .Bd -literal -offset indent
                    412: \*(Gt16        long&0x7fffffff \*(Gt0          not stripped
                    413: .Ed
1.8       aaron     414: .Sh HISTORY
1.6       aaron     415: There has been a
1.8       aaron     416: .Nm
                    417: command in every
                    418: .Ux
1.16      mickey    419: since at least Research Version 4
                    420: (man page dated November, 1973).
1.1       deraadt   421: The System V version introduced one significant change:
1.30      ajacouto  422: the external list of magic types.
1.1       deraadt   423: This slowed the program down slightly but made it a lot more flexible.
1.8       aaron     424: .Pp
1.30      ajacouto  425: This program, based on the System V version,
                    426: was written by Ian Darwin
1.8       aaron     427: without looking at anybody else's source code.
                    428: .Pp
1.30      ajacouto  429: John Gilmore revised the code extensively, making it better than
1.1       deraadt   430: the first version.
1.30      ajacouto  431: Geoff Collyer found several inadequacies
1.1       deraadt   432: and provided some magic file entries.
1.30      ajacouto  433: Contributions of the `&' operator by Rob McMahon, 1989.
1.23      jaredy    434: .Pp
1.30      ajacouto  435: Guy Harris made many changes from 1993 to the present.
1.23      jaredy    436: .Pp
1.26      david     437: Primary development and maintenance from 1990 to the present by
1.30      ajacouto  438: Christos Zoulas.
1.8       aaron     439: .Pp
1.30      ajacouto  440: Altered by Chris Lowth, 2000:
                    441: Handle the
                    442: .Fl i
                    443: option to output mime type strings, using an alternative
                    444: magic file and internal logic.
                    445: .Pp
                    446: Altered by Eric Fischer, July, 2000,
                    447: to identify character codes and attempt to identify the languages
                    448: of non-ASCII files.
                    449: .Pp
                    450: Altered by Reuben Thomas, 2007 to 2008, to improve MIME
                    451: support and merge MIME and non-MIME magic, support directories as well
                    452: as files of magic, apply many bug fixes and improve the build system.
1.23      jaredy    453: .Pp
                    454: The list of contributors to the
1.30      ajacouto  455: .Dq magic
                    456: directory (magic files)
                    457: is too long to include here.
1.23      jaredy    458: You know who you are; thank you.
1.30      ajacouto  459: Many contributors are listed in the source files.
                    460: .Sh BUGS
1.8       aaron     461: .Pp
1.1       deraadt   462: There must be a better way to automate the construction of the Magic
1.8       aaron     463: file from all the glop in Magdir.
                    464: What is it?
                    465: .Pp
                    466: .Nm
1.30      ajacouto  467: uses several algorithms that favor speed over accuracy,
1.4       millert   468: thus it can be misled about the contents of
1.30      ajacouto  469: text
1.4       millert   470: files.
1.8       aaron     471: .Pp
1.30      ajacouto  472: The support for text files (primarily for programming languages)
1.1       deraadt   473: is simplistic, inefficient and requires recompilation to update.
1.8       aaron     474: .Pp
1.6       aaron     475: The list of keywords in
1.30      ajacouto  476: .Pa ascmagic
1.1       deraadt   477: probably belongs in the Magic file.
1.8       aaron     478: This could be done by using some keyword like
1.30      ajacouto  479: .Sq *
1.8       aaron     480: for the offset value.
                    481: .Pp
1.9       aaron     482: Complain about conflicts in the magic file entries.
1.1       deraadt   483: Make a rule that the magic entries sort based on file offset rather
                    484: than position within the magic file?
1.8       aaron     485: .Pp
1.6       aaron     486: The program should provide a way to give an estimate
1.8       aaron     487: of
                    488: .Dq how good
                    489: a guess is.
1.30      ajacouto  490: We end up removing guesses (e.g.
                    491: .Dq From\
1.8       aaron     492: as first 5 chars of file) because
1.30      ajacouto  493: they are not as good as other guesses (e.g.\&
1.8       aaron     494: .Dq Newsgroups:
                    495: versus
1.30      ajacouto  496: .Dq Return-Path: ) .
                    497: Still, if the others don't pan out, it should be possible to use the
                    498: first guess.
1.8       aaron     499: .Pp
1.1       deraadt   500: This manual page, and particularly this section, is too long.