
Annotation of src/usr.bin/file/file.1, Revision 1.31

1.31    ! schwarze    1: .\" $OpenBSD: file.1,v 1.30 2009/10/26 21:03:03 ajacoutot Exp $
1.8       aaron       2: .\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
1.18      jmc         3: .\"
1.19      ian         4: .\" Copyright (c) Ian F. Darwin 1986-1995.
                      5: .\" Software written by Ian F. Darwin and others;
                      6: .\" maintained 1995-present by Christos Zoulas and others.
1.20      jmc         7: .\"
1.19      ian         8: .\" Redistribution and use in source and binary forms, with or without
                      9: .\" modification, are permitted provided that the following conditions
                     10: .\" are met:
                     11: .\" 1. Redistributions of source code must retain the above copyright
                     12: .\"    notice immediately at the beginning of the file, without modification,
                     13: .\"    this list of conditions, and the following disclaimer.
                     14: .\" 2. Redistributions in binary form must reproduce the above copyright
                     15: .\"    notice, this list of conditions and the following disclaimer in the
                     16: .\"    documentation and/or other materials provided with the distribution.
1.20      jmc        17: .\"
1.19      ian        18: .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
                     19: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
                     20: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
                     21: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
                     22: .\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
                     23: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
                     24: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
                     25: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
                     26: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
                     27: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
                     28: .\" SUCH DAMAGE.
1.18      jmc        29: .\"
1.31    ! schwarze   30: .Dd $Mdocdate: October 26 2009 $
1.8       aaron      31: .Dt FILE 1
                     32: .Os
                     33: .Sh NAME
                     34: .Nm file
                     35: .Nd determine file type
                     36: .Sh SYNOPSIS
1.30      ajacouto   37: .Nm
                     38: .Bk -words
                     39: .Op Fl 0bCcehikLNnprsvz
                     40: .Op Fl -help
                     41: .Op Fl -mime-encoding
                     42: .Op Fl -mime-type
1.23      jaredy     43: .Op Fl F Ar separator
1.8       aaron      44: .Op Fl f Ar namefile
                     45: .Op Fl m Ar magicfiles
1.30      ajacouto   46: .Ar file
1.23      jaredy     47: .Ek
1.8       aaron      48: .Sh DESCRIPTION
1.22      jaredy     49: The
1.8       aaron      50: .Nm
1.30      ajacouto   51: utility tests each argument in an attempt to classify it.
1.1       deraadt    52: There are three sets of tests, performed in this order:
1.30      ajacouto   53: filesystem tests, magic tests, and language tests.
1.8       aaron      54: The first test that succeeds causes the file type to be printed.
                     55: .Pp
1.1       deraadt    56: The type printed will usually contain one of the words
1.30      ajacouto   57: .Em text
1.4       millert    58: (the file contains only
1.30      ajacouto   59: printing characters and a few common control
1.4       millert    60: characters and is probably safe to read on an
1.30      ajacouto   61: ASCII terminal),
                     62: .Em executable
1.1       deraadt    63: (the file contains the result of compiling a program
1.8       aaron      64: in a form understandable to some
                     65: .Ux
                     66: kernel or another),
1.1       deraadt    67: or
1.30      ajacouto   68: .Em data
                     69: meaning anything else (data is usually
                     70: .Dq binary
                     71: or non-printable).
1.1       deraadt    72: Exceptions are well-known file formats (core files, tar archives)
                     73: that are known to contain binary data.
1.30      ajacouto   74: When modifying magic files or the program itself, make sure to
                     75: .Em preserve these keywords .
                     76: Users depend on knowing that all the readable files in a directory
1.8       aaron      77: have the word
                     78: .Dq text
                     79: printed.
1.30      ajacouto   80: Don't do as Berkeley did and change
1.8       aaron      81: .Dq shell commands text
                     82: to
                     83: .Dq shell script .
                     84: .Pp
1.1       deraadt    85: The filesystem tests are based on examining the return from a
1.8       aaron      86: .Xr stat 2
1.1       deraadt    87: system call.
                     88: The program checks to see if the file is empty,
                     89: or if it's some sort of special file.
1.30      ajacouto   90: Any known file types,
                     91: such as sockets, symbolic links, and named pipes (FIFOs),
1.1       deraadt    92: are intuited if they are defined in
                     93: the system header file
1.9       aaron      94: .Aq Pa sys/stat.h .
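The filesystem tests described above can be sketched in a few lines. This is only an illustration, not the program's actual code: the function name `filesystem_test` and the returned strings are made up for the example, and it uses Python's `os.lstat` in place of a direct stat(2) call.

```python
import os
import stat


def filesystem_test(path):
    """Classify a path from lstat data alone.

    Returns a short description, or None when the file is a non-empty
    regular file whose contents would have to be examined by later tests.
    The description strings are illustrative, not file(1)'s real output.
    """
    st = os.lstat(path)  # lstat: report on a symlink itself, do not follow it
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None
```

A result of None here means the decision is deferred to the magic and language tests.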
1.8       aaron      95: .Pp
1.30      ajacouto   96: The magic tests are used to check for files with data in
1.1       deraadt    97: particular fixed formats.
                     98: The canonical example of this is a binary executable (compiled program)
1.30      ajacouto   99: a.out file, whose format is defined in
                    100: .Aq Pa elf.h ,
                    101: .Aq Pa a.out.h ,
1.1       deraadt   102: and possibly
1.8       aaron     103: .Aq Pa exec.h
1.30      ajacouto  104: in the standard include directory.
1.8       aaron     105: These files have a
                    106: .Dq magic number
                    107: stored in a particular place
                    108: near the beginning of the file that tells the
                    109: .Ux
                    110: operating system
1.1       deraadt   111: that the file is a binary executable, and which of several types thereof.
1.30      ajacouto  112: The concept of a
                    113: .Dq magic number
                    114: has been applied by extension to data files.
1.1       deraadt   115: Any file with some invariant identifier at a small fixed
                    116: offset into the file can usually be described in this way.
1.30      ajacouto  117: The information identifying these files is read from the magic file
1.8       aaron     118: .Pa /etc/magic .
1.30      ajacouto  119: In addition, if
                    120: .Pa $HOME/.magic.mgc
                    121: or
                    122: .Pa $HOME/.magic
                    123: exists, it will be used in preference to the system magic files.
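The idea of an invariant identifier at a small fixed offset can be shown with a toy lookup table. The sketch below is an assumption-laden illustration: `MAGIC_TABLE` and `magic_test` are hypothetical names, the table is hand-written rather than read from /etc/magic, and the description strings are not the ones the real magic file produces. The three signatures themselves (ELF at offset 0, gzip at offset 0, "ustar" at offset 257 of a tar header) are well-known values.

```python
# (offset, expected bytes, description). A tiny hand-written table,
# standing in for the entries normally read from the magic file.
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "POSIX tar archive"),
]


def magic_test(data):
    """Return the description of the first matching entry, or None."""
    for offset, expected, description in MAGIC_TABLE:
        # Slicing past the end of data yields a shorter (or empty) bytes
        # object, so short inputs simply fail to match.
        if data[offset:offset + len(expected)] == expected:
            return description
    return None
```

The first matching entry wins, which mirrors the rule that the first successful test determines the printed type.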
1.8       aaron     124: .Pp
1.30      ajacouto  125: If a file does not match any of the entries in the magic file,
                    126: it is examined to see if it seems to be a text file.
                    127: ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
                    128: (such as those used on Macintosh and IBM PC systems),
                    129: UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
                    130: character sets can be distinguished by the different
                    131: ranges and sequences of bytes that constitute printable text
                    132: in each set.
                    133: If a file passes any of these tests, its character set is reported.
                    134: ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
                    135: as
                    136: .Dq text
                    137: because they will be mostly readable on nearly any terminal;
                    138: UTF-16 and EBCDIC are only
                    139: .Dq character data
                    140: because, while
                    141: they contain text, it is text that will require translation
                    142: before it can be read.
                    143: In addition,
                    144: .Nm
                    145: will attempt to determine other characteristics of text-type files.
                    146: If the lines of a file are terminated by CR, CRLF, or NEL, instead
                    147: of the Unix-standard LF, this will be reported.
                    148: Files that contain embedded escape sequences or overstriking
                    149: will also be identified.
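A much-reduced version of the text tests might look like the following. It is a sketch under stated assumptions: it only distinguishes ASCII from UTF-8, ignores ISO-8859-x, UTF-16, EBCDIC, NEL terminators and overstriking, and the names `classify_text`, `line_terminators` and `TEXT_CONTROLS`, as well as the exact result strings, are invented for the example.

```python
# Control characters tolerated in text for this sketch:
# BEL, BS, TAB, LF, VT, FF, CR, ESC.
TEXT_CONTROLS = {7, 8, 9, 10, 11, 12, 13, 27}


def line_terminators(data):
    """Return which of CRLF, bare CR and bare LF occur in data."""
    crlf = data.count(b"\r\n")
    bare_cr = data.count(b"\r") - crlf
    bare_lf = data.count(b"\n") - crlf
    found = []
    if crlf:
        found.append("CRLF")
    if bare_cr:
        found.append("CR")
    if bare_lf:
        found.append("LF")
    return found


def classify_text(data):
    """Classify bytes as ASCII text, UTF-8 text, or data (simplified)."""
    if not data:
        return "empty"  # normally caught earlier by the filesystem tests
    # Any 7-bit byte that is neither printable nor a common control
    # character (NUL, for instance) rules out text in this sketch.
    if any(b < 128 and not (32 <= b < 127 or b in TEXT_CONTROLS) for b in data):
        return "data"
    if all(b < 128 for b in data):
        charset = "ASCII"
    else:
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            return "data"
        charset = "UTF-8 Unicode"
    result = charset + " text"
    terminators = line_terminators(data)
    if terminators and terminators != ["LF"]:
        result += ", with " + ", ".join(terminators) + " line terminators"
    return result
```

Only terminators other than plain LF are reported, matching the description above.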
                    150: .Pp
                    151: Once
                    152: .Nm
                    153: has determined the character set used in a text-type file,
                    154: it will
                    155: attempt to determine in what language the file is written.
                    156: The language tests look for particular strings (cf.\&
                    157: .Aq Pa names.h )
1.1       deraadt   158: that can appear anywhere in the first few blocks of a file.
                    159: For example, the keyword
1.8       aaron     160: .Em .br
1.4       millert   161: indicates that the file is most likely a
1.8       aaron     162: .Xr troff 1
1.6       aaron     163: input file, just as the keyword
1.30      ajacouto  164: .Em struct
1.1       deraadt   165: indicates a C program.
                    166: These tests are less reliable than the previous
                    167: two groups, so they are performed last.
                    168: The language test routines also test for some miscellany
1.6       aaron     169: (such as
1.8       aaron     170: .Xr tar 1
1.30      ajacouto  171: archives).
                    172: .Pp
                    173: Any file that cannot be identified as having been written
                    174: in any of the character sets listed above is simply said to be
1.8       aaron     175: .Dq data .
1.30      ajacouto  176: .Sh OPTIONS
                    177: .Bl -tag -width indent
                    178: .It Fl 0 , -print0
                    179: Output a null character
                    180: .Sq \e0
                    181: after the end of the filename.
                    182: This makes it easy to
                    183: .Xr cut 1
                    184: the output.
                    185: This does not affect the separator, which is still printed.
                    186: .It Fl b , -brief
1.17      millert   187: Do not prepend filenames to output lines (brief mode).
1.30      ajacouto  188: .It Fl C , -compile
                    189: Write a
1.23      jaredy    190: .Pa magic.mgc
1.30      ajacouto  191: output file that contains a pre-parsed version of the magic file or directory.
                    192: .It Fl c , -checking-printout
1.1       deraadt   193: Cause a checking printout of the parsed form of the magic file.
1.30      ajacouto  194: This is usually used in conjunction with the
1.8       aaron     195: .Fl m
1.30      ajacouto  196: flag to debug a new magic file before installing it.
                    197: .It Fl e , -exclude Ar testname
                    198: Exclude the test named in
                    199: .Ar testname
                    200: from the list of tests made to determine the file type.
                    201: Valid test names are:
1.31    ! schwarze  202: .Bl -tag -width compress
1.30      ajacouto  203: .It apptype
                    204: Check for
                    205: .Dv EMX
                    206: application type (only on EMX).
                    207: .It ascii
                    208: Check for various types of ASCII files.
                    209: .It compress
                    210: Don't look for, or inside, compressed files.
                    211: .It elf
                    212: Don't print ELF details.
                    213: .It fortran
                    214: Don't look for Fortran sequences inside ASCII files.
                    215: .It soft
                    216: Don't consult magic files.
                    217: .It tar
                    218: Don't examine tar files.
                    219: .It token
                    220: Don't look for known tokens inside ASCII files.
                    221: .It troff
                    222: Don't look for troff sequences inside ASCII files.
                    223: .El
                    224: .It Fl F , -separator Ar separator
                    225: Use the specified string as the separator between the filename and the
                    226: file result returned.
1.23      jaredy    227: Defaults to
                    228: .Sq \&: .
1.30      ajacouto  229: .It Fl f , -files-from Ar namefile
1.6       aaron     230: Read the names of the files to be examined from
1.8       aaron     231: .Ar namefile
1.6       aaron     232: (one per line)
1.1       deraadt   233: before the argument list.
1.6       aaron     234: Either
1.8       aaron     235: .Ar namefile
1.1       deraadt   236: or at least one filename argument must be present;
1.8       aaron     237: to test the standard input, use
1.23      jaredy    238: .Sq -
1.8       aaron     239: as a filename argument.
1.30      ajacouto  240: .It Fl h , -no-dereference
                    241: Causes symlinks not to be followed.
                    242: This is the default if the environment variable
                    243: .Dv POSIXLY_CORRECT
                    244: is not defined.
                    245: .It Fl -help
                    246: Print a help message and exit.
                    247: .It Fl i , -mime
                    248: Causes the file command to output MIME type strings rather than the more
                    249: traditional human-readable ones.
                    250: Thus it may say
                    251: .Dq text/plain; charset=us-ascii
                    252: rather than
                    253: .Dq ASCII text .
                    254: In order for this option to work,
                    255: .Nm
                    256: changes the way it handles files recognized by the command itself
                    257: (such as many of the text file types, directories etc.),
                    258: and makes use of an alternative
                    259: .Dq magic
                    260: file.
                    261: See also
                    262: .Sx FILES ,
                    263: below.
                    264: .It Fl -mime-encoding , -mime-type
                    265: Like
                    266: .Fl i ,
                    267: but print only the specified element(s).
                    268: .It Fl k , -keep-going
1.23      jaredy    269: Don't stop at the first match; keep going.
1.30      ajacouto  270: Subsequent matches will have the string
                    271: .Dq "\[rs]012\- "
                    272: prepended.
                    273: (If a newline is required, see the
                    274: .Fl r
                    275: option.)
                    276: .It Fl L , -dereference
                    277: Causes symlinks to be followed;
                    278: analogous to the option of the same name in
                    279: .Xr ls 1 .
                    280: This is the default if the environment variable
                    281: .Dv POSIXLY_CORRECT
                    282: is defined.
                    283: .It Fl m , -magic-file Ar magicfiles
                    284: Specify an alternate list of files and directories containing magic.
                    285: This can be a single item, or a colon-separated list.
                    286: If a compiled magic file is found alongside a file or directory,
                    287: it will be used instead.
                    288: .It Fl N , -no-pad
1.23      jaredy    289: Don't pad filenames so that they align in the output.
1.30      ajacouto  290: .It Fl n , -no-buffer
                    291: Force stdout to be flushed after checking each file.
1.23      jaredy    292: This is only useful if checking a list of files.
1.30      ajacouto  293: It is intended to be used by programs that want filetype output from a pipe.
                    294: .It Fl p , -preserve-date
                    295: On systems that support
                    296: .Xr utime 3
                    297: or
                    298: .Xr utimes 2 ,
                    299: attempt to preserve the access time of files analyzed, to pretend that
                    300: .Nm
                    301: never read them.
                    302: .It Fl r , -raw
                    303: Don't translate unprintable characters to \eooo.
1.23      jaredy    304: Normally
                    305: .Nm
1.30      ajacouto  306: translates unprintable characters to their octal representation.
                    307: .It Fl s , -special-files
1.23      jaredy    308: Normally,
                    309: .Nm
                    310: only attempts to read and determine the type of argument files which
                    311: .Xr stat 2
                    312: reports are ordinary files.
                    313: This prevents problems, because reading special files may have peculiar
                    314: consequences.
                    315: Specifying the
                    316: .Fl s
                    317: option causes
                    318: .Nm
                    319: to also read argument files which are block or character special files.
                    320: This is useful for determining the filesystem types of the data in raw
                    321: disk partitions, which are block special files.
                    322: This option also causes
                    323: .Nm
                    324: to disregard the file size as reported by
1.30      ajacouto  325: .Xr stat 2
1.23      jaredy    326: since on some systems it reports a zero size for raw disk partitions.
1.30      ajacouto  327: .It Fl v , -version
1.23      jaredy    328: Print the version of the program and exit.
1.30      ajacouto  329: .It Fl z , -uncompress
                    330: Try to look inside compressed files.
1.8       aaron     331: .El
1.30      ajacouto  332: .Pp
                    333: .Ex -std file
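The octal translation mentioned under the -r option, which is also why the -k option shows a newline as \012, can be illustrated as follows. This is a hedged sketch rather than the program's own routine: `escape_unprintable` is a hypothetical name, and treating exactly the bytes 32 through 126 as printable is an assumption of the example.

```python
def escape_unprintable(description):
    """Replace every non-printable byte with a backslash and three octal digits."""
    out = []
    for b in description:
        if 32 <= b < 127:
            out.append(chr(b))          # printable ASCII passes through
        else:
            out.append("\\%03o" % b)    # e.g. newline (10) becomes \012
    return "".join(out)
```

With -r the translation step would simply be skipped and the bytes written as they are.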
1.8       aaron     334: .Sh ENVIRONMENT
1.30      ajacouto  335: The environment variable
                    336: .Dv MAGIC
                    337: can be used to set the default magic file name.
                    338: If that variable is set, then
                    339: .Nm
                    340: will not attempt to open
                    341: .Pa $HOME/.magic .
1.23      jaredy    342: .Nm
                    343: adds
                    344: .Dq .mgc
                    345: to the value of this variable as appropriate.
1.30      ajacouto  346: The environment variable
                    347: .Dv POSIXLY_CORRECT
                    348: controls whether
                    349: .Nm
                    350: will attempt to follow symlinks or not.
                    351: If set, then
                    352: .Nm
                    353: follows symlinks; otherwise it does not.
                    354: This is also controlled by the
                    355: .Fl L
                    356: and
                    357: .Fl h
                    358: options.
1.12      aaron     359: .Sh FILES
                    360: .Bl -tag -width /etc/magic -compact
                    361: .It Pa /etc/magic
                    362: default list of magic numbers
                    363: .El
1.8       aaron     364: .Sh SEE ALSO
                    365: .Xr hexdump 1 ,
                    366: .Xr od 1 ,
                    367: .Xr strings 1 ,
                    368: .Xr magic 5
                    369: .Sh STANDARDS CONFORMANCE
1.1       deraadt   370: This program is believed to exceed the System V Interface Definition
                    371: of FILE(CMD), as near as one can determine from the vague language
1.6       aaron     372: contained therein.
1.30      ajacouto  373: Its behavior is mostly compatible with the System V program of the same name.
1.1       deraadt   374: This version knows more magic, however, so it will produce
1.6       aaron     375: different (albeit more accurate) output in many cases.
1.30      ajacouto  376: .\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
1.8       aaron     377: .Pp
1.6       aaron     378: The one significant difference
1.1       deraadt   379: between this version and System V
1.30      ajacouto  380: is that this version treats any whitespace
1.1       deraadt   381: as a delimiter, so that spaces in pattern strings must be escaped.
                    382: For example,
1.30      ajacouto  383: .Bd -literal -offset indent
                    384: \*(Gt10        string  language impress\       (imPRESS data)
                    385: .Ed
1.8       aaron     386: .Pp
1.1       deraadt   387: in an existing magic file would have to be changed to
1.30      ajacouto  388: .Bd -literal -offset indent
                    389: \*(Gt10        string  language\e impress      (imPRESS data)
                    390: .Ed
1.8       aaron     391: .Pp
1.1       deraadt   392: In addition, in this version, if a pattern string contains a backslash,
1.9       aaron     393: it must be escaped.
                    394: For example
1.30      ajacouto  395: .Bd -literal -offset indent
                    396: 0      string          \ebegindata     Andrew Toolkit document
                    397: .Ed
1.8       aaron     398: .Pp
1.1       deraadt   399: in an existing magic file would have to be changed to
1.30      ajacouto  400: .Bd -literal -offset indent
                    401: 0      string          \e\ebegindata   Andrew Toolkit document
                    402: .Ed
1.8       aaron     403: .Pp
1.1       deraadt   404: SunOS releases 3.2 and later from Sun Microsystems include a
1.30      ajacouto  405: .Nm
1.1       deraadt   406: command derived from the System V one, but with some extensions.
1.30      ajacouto  407: This version differs from Sun's only in minor ways.
1.8       aaron     408: It includes the extension of the
1.30      ajacouto  409: .Sq &
1.8       aaron     410: operator, used as,
1.1       deraadt   411: for example,
1.30      ajacouto  412: .Bd -literal -offset indent
                    413: \*(Gt16        long&0x7fffffff \*(Gt0          not stripped
                    414: .Ed
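The entry above reads a long at offset 16, masks it with 0x7fffffff, and matches when the result is greater than 0. A minimal sketch of that masked comparison follows. The function name `masked_long_greater` is hypothetical, a long is assumed to be 4 bytes, and the little-endian default is an assumption of the example only, since the byte order actually used depends on the magic type and the host.

```python
import struct


def masked_long_greater(data, offset, mask, threshold, byteorder="<"):
    """Return True if (4-byte value at offset) & mask > threshold.

    byteorder is a struct prefix: "<" little-endian, ">" big-endian.
    """
    if len(data) < offset + 4:
        return False  # not enough data to hold the value
    (value,) = struct.unpack_from(byteorder + "I", data, offset)
    return (value & mask) > threshold
```

For the entry shown, the call would be `masked_long_greater(header, 16, 0x7FFFFFFF, 0)`, where `header` holds the first bytes of the file.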
1.8       aaron     415: .Sh HISTORY
1.6       aaron     416: There has been a
1.8       aaron     417: .Nm
                    418: command in every
                    419: .Ux
1.16      mickey    420: since at least Research Version 4
                    421: (man page dated November, 1973).
1.1       deraadt   422: The System V version introduced one significant change:
1.30      ajacouto  423: the external list of magic types.
1.1       deraadt   424: This slowed the program down slightly but made it a lot more flexible.
1.8       aaron     425: .Pp
1.30      ajacouto  426: This program, based on the System V version,
                    427: was written by Ian Darwin
1.8       aaron     428: without looking at anybody else's source code.
                    429: .Pp
1.30      ajacouto  430: John Gilmore revised the code extensively, making it better than
1.1       deraadt   431: the first version.
1.30      ajacouto  432: Geoff Collyer found several inadequacies
1.1       deraadt   433: and provided some magic file entries.
1.30      ajacouto  434: The `&' operator was contributed by Rob McMahon, 1989.
1.23      jaredy    435: .Pp
1.30      ajacouto  436: Guy Harris made many changes from 1993 to the present.
1.23      jaredy    437: .Pp
1.26      david     438: Primary development and maintenance from 1990 to the present by
1.30      ajacouto  439: Christos Zoulas.
1.8       aaron     440: .Pp
1.30      ajacouto  441: Altered by Chris Lowth, 2000:
                    442: Handle the
                    443: .Fl i
                    444: option to output MIME type strings, using an alternative
                    445: magic file and internal logic.
                    446: .Pp
                    447: Altered by Eric Fischer, July, 2000,
                    448: to identify character codes and attempt to identify the languages
                    449: of non-ASCII files.
                    450: .Pp
                    451: Altered by Reuben Thomas, 2007 to 2008, to improve MIME
                    452: support and merge MIME and non-MIME magic, support directories as well
                    453: as files of magic, apply many bug fixes and improve the build system.
1.23      jaredy    454: .Pp
                    455: The list of contributors to the
1.30      ajacouto  456: .Dq magic
                    457: directory (magic files)
                    458: is too long to include here.
1.23      jaredy    459: You know who you are; thank you.
1.30      ajacouto  460: Many contributors are listed in the source files.
                    461: .Sh BUGS
1.8       aaron     462: .Pp
1.1       deraadt   463: There must be a better way to automate the construction of the Magic
1.8       aaron     464: file from all the glop in Magdir.
                    465: What is it?
                    466: .Pp
                    467: .Nm
1.30      ajacouto  468: uses several algorithms that favor speed over accuracy,
1.4       millert   469: thus it can be misled about the contents of
1.30      ajacouto  470: text
1.4       millert   471: files.
1.8       aaron     472: .Pp
1.30      ajacouto  473: The support for text files (primarily for programming languages)
1.1       deraadt   474: is simplistic, inefficient and requires recompilation to update.
1.8       aaron     475: .Pp
1.6       aaron     476: The list of keywords in
1.30      ajacouto  477: .Pa ascmagic
1.1       deraadt   478: probably belongs in the Magic file.
1.8       aaron     479: This could be done by using some keyword like
1.30      ajacouto  480: .Sq *
1.8       aaron     481: for the offset value.
                    482: .Pp
1.9       aaron     483: Complain about conflicts in the magic file entries.
1.1       deraadt   484: Make a rule that the magic entries sort based on file offset rather
                    485: than position within the magic file?
1.8       aaron     486: .Pp
1.6       aaron     487: The program should provide a way to give an estimate
1.8       aaron     488: of
                    489: .Dq how good
                    490: a guess is.
1.30      ajacouto  491: We end up removing guesses (e.g.\&
                    492: .Dq "From\ "
1.8       aaron     493: as first 5 chars of file) because
1.30      ajacouto  494: they are not as good as other guesses (e.g.\&
1.8       aaron     495: .Dq Newsgroups:
                    496: versus
1.30      ajacouto  497: .Dq Return-Path: ) .
                    498: Still, if the others don't pan out, it should be possible to use the
                    499: first guess.
1.8       aaron     500: .Pp
1.1       deraadt   501: This manual page, and particularly this section, is too long.