Annotation of src/usr.bin/file/file.1, Revision 1.30
1.30 ! ajacouto 1: .\" $OpenBSD: file.1,v 1.29 2009/08/16 09:41:08 sobrado Exp $
1.8 aaron 2: .\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
1.18 jmc 3: .\"
1.19 ian 4: .\" Copyright (c) Ian F. Darwin 1986-1995.
5: .\" Software written by Ian F. Darwin and others;
6: .\" maintained 1995-present by Christos Zoulas and others.
1.20 jmc 7: .\"
1.19 ian 8: .\" Redistribution and use in source and binary forms, with or without
9: .\" modification, are permitted provided that the following conditions
10: .\" are met:
11: .\" 1. Redistributions of source code must retain the above copyright
12: .\" notice immediately at the beginning of the file, without modification,
13: .\" this list of conditions, and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
1.20 jmc 17: .\"
1.19 ian 18: .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
19: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
22: .\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
24: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
25: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
26: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
27: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
28: .\" SUCH DAMAGE.
1.18 jmc 29: .\"
1.30 ! ajacouto 30: .Dd $Mdocdate: August 16 2009 $
1.8 aaron 31: .Dt FILE 1
32: .Os
33: .Sh NAME
34: .Nm file
35: .Nd determine file type
36: .Sh SYNOPSIS
1.30 ! ajacouto 37: .Nm
! 38: .Bk -words
! 39: .Op Fl 0bCcehikLNnprsvz
! 40: .Op Fl -help
! 41: .Op Fl -mime-encoding
! 42: .Op Fl -mime-type
1.23 jaredy 43: .Op Fl F Ar separator
1.8 aaron 44: .Op Fl f Ar namefile
45: .Op Fl m Ar magicfiles
1.30 ! ajacouto 46: .Ar file
1.23 jaredy 47: .Ek
1.8 aaron 48: .Sh DESCRIPTION
1.22 jaredy 49: The
1.8 aaron 50: .Nm
1.30 ! ajacouto 51: utility tests each argument in an attempt to classify it.
1.1 deraadt 52: There are three sets of tests, performed in this order:
1.30 ! ajacouto 53: filesystem tests, magic tests, and language tests.
1.8 aaron 54: The first test that succeeds causes the file type to be printed.
55: .Pp
1.1 deraadt 56: The type printed will usually contain one of the words
1.30 ! ajacouto 57: .Em text
1.4 millert 58: (the file contains only
1.30 ! ajacouto 59: printing characters and a few common control
1.4 millert 60: characters and is probably safe to read on an
1.30 ! ajacouto 61: ASCII terminal),
! 62: .Em executable
1.1 deraadt 63: (the file contains the result of compiling a program
1.8 aaron 64: in a form understandable to some
65: .Ux
66: kernel or another),
1.1 deraadt 67: or
1.30 ! ajacouto 68: .Em data
! 69: meaning anything else (data is usually
! 70: .Dq binary
! 71: or non-printable).
1.1 deraadt 72: Exceptions are well-known file formats (core files, tar archives)
73: that are known to contain binary data.
1.30 ! ajacouto 74: When modifying magic files or the program itself, make sure to
! 75: .Em preserve these keywords .
! 76: Users depend on knowing that all the readable files in a directory
1.8 aaron 77: have the word
78: .Dq text
79: printed.
1.30 ! ajacouto 80: Don't do as Berkeley did and change
1.8 aaron 81: .Dq shell commands text
82: to
83: .Dq shell script .
84: .Pp
1.1 deraadt 85: The filesystem tests are based on examining the return from a
1.8 aaron 86: .Xr stat 2
1.1 deraadt 87: system call.
88: The program checks to see if the file is empty,
89: or if it's some sort of special file.
1.30 ! ajacouto 90: Any known file types,
! 91: such as sockets, symbolic links, and named pipes (FIFOs),
1.1 deraadt 92: are intuited if they are defined in
93: the system header file
1.9 aaron 94: .Aq Pa sys/stat.h .
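.Pp
The filesystem tests described above can be sketched in Python as follows.
This is a minimal illustration, not the code
.Nm
uses: the function names and the returned strings are hypothetical,
and only the standard os and stat modules are assumed.

```python
import os
import stat


def describe_mode(mode, size):
    """Describe special or empty files; return None for regular files with content."""
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and size == 0:
        return "empty"
    return None  # not decided here; the magic and language tests run next


def filesystem_test(path):
    """Apply describe_mode() to the result of lstat(), which does not follow symlinks."""
    st = os.lstat(path)
    return describe_mode(st.st_mode, st.st_size)
```

A return value of None stands for a regular, non-empty file,
which is the only case in which the later tests need to read the contents.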
1.8 aaron 95: .Pp
1.30 ! ajacouto 96: The magic tests are used to check for files with data in
1.1 deraadt 97: particular fixed formats.
98: The canonical example of this is a binary executable (compiled program)
1.30 ! ajacouto 99: a.out file, whose format is defined in
! 100: .Aq Pa elf.h ,
! 101: .Aq Pa a.out.h ,
1.1 deraadt 102: and possibly
1.8 aaron 103: .Aq Pa exec.h
1.30 ! ajacouto 104: in the standard include directory.
1.8 aaron 105: These files have a
106: .Dq magic number
107: stored in a particular place
108: near the beginning of the file that tells the
109: .Ux
110: operating system
1.1 deraadt 111: that the file is a binary executable, and which of several types thereof.
1.30 ! ajacouto 112: The concept of a
! 113: .Dq magic
! 114: has been applied by extension to data files.
1.1 deraadt 115: Any file with some invariant identifier at a small fixed
116: offset into the file can usually be described in this way.
1.30 ! ajacouto 117: The information identifying these files is read from the magic file
1.8 aaron 118: .Pa /etc/magic .
1.30 ! ajacouto 119: In addition, if
! 120: .Pa $HOME/.magic.mgc
! 121: or
! 122: .Pa $HOME/.magic
! 123: exists, it will be used in preference to the system magic files.
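.Pp
The idea of an invariant identifier at a small fixed offset can be
sketched as a table lookup.
The table below is a hand-picked illustration, not the contents of
.Pa /etc/magic ;
the names MAGIC_TABLE and magic_test and the description strings are hypothetical.

```python
# (offset, magic bytes, description): a tiny illustrative table.
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (257, b"ustar", "POSIX tar archive"),
]


def magic_test(data):
    """Return the description of the first entry whose bytes appear at its offset."""
    for offset, magic, description in MAGIC_TABLE:
        # Slicing past the end of data yields a short slice, so no bounds check is needed.
        if data[offset:offset + len(magic)] == magic:
            return description
    return None  # no entry matched; the text and language tests run next
```

Real magic entries also support numeric types, masks and nested
continuation lines, none of which this sketch attempts.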
1.8 aaron 124: .Pp
1.30 ! ajacouto 125: If a file does not match any of the entries in the magic file,
! 126: it is examined to see if it seems to be a text file.
! 127: ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
! 128: (such as those used on Macintosh and IBM PC systems),
! 129: UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
! 130: character sets can be distinguished by the different
! 131: ranges and sequences of bytes that constitute printable text
! 132: in each set.
! 133: If a file passes any of these tests, its character set is reported.
! 134: ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
! 135: as
! 136: .Dq text
! 137: because they will be mostly readable on nearly any terminal;
! 138: UTF-16 and EBCDIC are only
! 139: .Dq character data
! 140: because, while
! 141: they contain text, it is text that will require translation
! 142: before it can be read.
! 143: In addition,
! 144: .Nm
! 145: will attempt to determine other characteristics of text-type files.
! 146: If the lines of a file are terminated by CR, CRLF, or NEL, instead
! 147: of the Unix-standard LF, this will be reported.
! 148: Files that contain embedded escape sequences or overstriking
! 149: will also be identified.
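.Pp
A simplified sketch of the character set and line terminator checks follows.
It is an assumption-laden illustration rather than the algorithm
.Nm
implements: EBCDIC and NEL are left out, UTF-16 is recognised only by
its byte order mark, and all names and returned strings are hypothetical.

```python
def classify_text(data):
    """Return (charset, is_text); (None, False) means the bytes look like data."""
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return ("UTF-16 Unicode", False)  # character data: needs translation
    # Control characters tolerated in text, including ESC and backspace
    # (escape sequences and overstriking).
    allowed_controls = b"\t\n\r\f\b\x1b"
    if any(b < 0x20 and b not in allowed_controls for b in data) or 0x7f in data:
        return (None, False)
    if all(b < 0x80 for b in data):
        return ("ASCII", True)
    try:
        data.decode("utf-8")
        return ("UTF-8 Unicode", True)
    except UnicodeDecodeError:
        pass
    # ISO-8859-x leaves 0x80-0x9f unused for printable characters.
    if all(b < 0x80 or b >= 0xa0 for b in data):
        return ("ISO-8859", True)
    return ("Non-ISO extended-ASCII", True)


def line_terminators(data):
    """List which of CRLF, CR and LF occur as line terminators in data."""
    found = []
    if b"\r\n" in data:
        found.append("CRLF")
    rest = data.replace(b"\r\n", b"")  # what remains is a lone CR or LF
    if b"\r" in rest:
        found.append("CR")
    if b"\n" in rest:
        found.append("LF")
    return found
```

The order of the checks matters: pure ASCII is also valid UTF-8,
so the narrower set is tested first.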
! 150: .Pp
! 151: Once
! 152: .Nm
! 153: has determined the character set used in a text-type file,
! 154: it will
! 155: attempt to determine in what language the file is written.
! 156: The language tests look for particular strings (cf.\&
! 157: .Aq Pa names.h )
1.1 deraadt 158: that can appear anywhere in the first few blocks of a file.
159: For example, the keyword
1.8 aaron 160: .Em .br
1.4 millert 161: indicates that the file is most likely a
1.8 aaron 162: .Xr troff 1
1.6 aaron 163: input file, just as the keyword
1.30 ! ajacouto 164: .Em struct
1.1 deraadt 165: indicates a C program.
166: These tests are less reliable than the previous
167: two groups, so they are performed last.
168: The language test routines also test for some miscellany
1.6 aaron 169: (such as
1.8 aaron 170: .Xr tar 1
1.30 ! ajacouto 171: archives).
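.Pp
The language tests can be pictured as a keyword search over the start of the file.
In the sketch below the token table, the block size and the description
strings are assumptions made for illustration; they are not taken from
.Aq Pa names.h .

```python
# Hypothetical token table: (token, description).
TOKENS = [
    (b".br", "troff input text"),
    (b"struct", "C program text"),
]
BLOCK_SIZE = 8192  # assumed extent of "the first few blocks"


def language_test(data):
    """Return a description if a known token occurs as a whole word near the start."""
    words = set(data[:BLOCK_SIZE].split())  # split on any ASCII whitespace
    for token, description in TOKENS:
        if token in words:
            return description
    return None
```

Because a single common word decides the answer, a letter that happens
to mention struct would be misclassified, which is why these tests run last.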
! 172: .Pp
! 173: Any file that cannot be identified as having been written
! 174: in any of the character sets listed above is simply said to be
1.8 aaron 175: .Dq data .
1.30 ! ajacouto 176: .Sh OPTIONS
! 177: .Bl -tag -width indent
! 178: .It Fl 0 , -print0
! 179: Output a null character
! 180: .Sq \e0
! 181: after the end of the filename.
! 182: This is useful when processing the output with
! 183: .Xr cut 1 .
! 184: This option does not affect the separator,
! 185: which is still printed.
! 186: .It Fl b , -brief
1.17 millert 187: Do not prepend filenames to output lines (brief mode).
1.30 ! ajacouto 188: .It Fl C , -compile
! 189: Write a
1.23 jaredy 190: .Pa magic.mgc
1.30 ! ajacouto 191: output file that contains a pre-parsed version of the magic file or directory.
! 192: .It Fl c , -checking-printout
1.1 deraadt 193: Cause a checking printout of the parsed form of the magic file.
1.30 ! ajacouto 194: This is usually used in conjunction with the
1.8 aaron 195: .Fl m
1.30 ! ajacouto 196: flag to debug a new magic file before installing it.
! 197: .It Fl e , -exclude Ar testname
! 198: Exclude the test named in
! 199: .Ar testname
! 200: from the list of tests made to determine the file type.
! 201: Valid test names are:
! 202: .Bl -tag -width compress
! 203: .It apptype
! 204: Check for
! 205: .Dv EMX
! 206: application type (only on EMX).
! 207: .It ascii
! 208: Check for various types of ASCII files.
! 209: .It compress
! 210: Don't look for, or inside, compressed files.
! 211: .It elf
! 212: Don't print ELF details.
! 213: .It fortran
! 214: Don't look for Fortran sequences inside ASCII files.
! 215: .It soft
! 216: Don't consult magic files.
! 217: .It tar
! 218: Don't examine tar files.
! 219: .It token
! 220: Don't look for known tokens inside ASCII files.
! 221: .It troff
! 222: Don't look for troff sequences inside ASCII files.
! 223: .El
! 224: .It Fl F , -separator Ar separator
! 225: Use the specified string as the separator between the filename and the
! 226: file result returned.
1.23 jaredy 227: Defaults to
228: .Sq \&: .
1.30 ! ajacouto 229: .It Fl f , -files-from Ar namefile
1.6 aaron 230: Read the names of the files to be examined from
1.8 aaron 231: .Ar namefile
1.6 aaron 232: (one per line)
1.1 deraadt 233: before the argument list.
1.6 aaron 234: Either
1.8 aaron 235: .Ar namefile
1.1 deraadt 236: or at least one filename argument must be present;
1.8 aaron 237: to test the standard input, use
1.23 jaredy 238: .Sq -
1.8 aaron 239: as a filename argument.
1.30 ! ajacouto 240: .It Fl h , -no-dereference
! 241: Causes symlinks not to be followed.
! 242: This is the default if the environment variable
! 243: .Dv POSIXLY_CORRECT
! 244: is not defined.
! 245: .It Fl -help
! 246: Print a help message and exit.
! 247: .It Fl i , -mime
! 248: Causes the file command to output MIME type strings rather than the more
! 249: traditional human-readable ones.
! 250: Thus it may say
! 251: .Dq text/plain charset=us-ascii
! 252: rather than
! 253: .Dq ASCII text .
! 254: In order for this option to work,
! 255: .Nm
! 256: changes the way it handles files recognized by the command itself
! 257: (such as many of the text file types, directories etc.),
! 258: and makes use of an alternative
! 259: .Dq magic
! 260: file.
! 261: See also
! 262: .Sx FILES ,
! 263: below.
! 264: .It Fl -mime-encoding , -mime-type
! 265: Like
! 266: .Fl i ,
! 267: but print only the specified element(s).
! 268: .It Fl k , -keep-going
1.23 jaredy 269: Don't stop at the first match; keep going.
1.30 ! ajacouto 270: Subsequent matches will have the string
! 271: .Dq "\[rs]012\- "
! 272: prepended.
! 273: (If a newline is required, see the
! 274: .Fl r
! 275: option.)
! 276: .It Fl L , -dereference
! 277: Causes symlinks to be followed;
! 278: analogous to the option of the same name in
! 279: .Xr ls 1 .
! 280: This is the default if the environment variable
! 281: .Dv POSIXLY_CORRECT
! 282: is defined.
! 283: .It Fl m , -magic-file Ar magicfiles
! 284: Specify an alternate list of files and directories containing magic.
! 285: This can be a single item, or a colon-separated list.
! 286: If a compiled magic file is found alongside a file or directory,
! 287: it will be used instead.
! 288: .It Fl N , -no-pad
1.23 jaredy 289: Don't pad filenames so that they align in the output.
1.30 ! ajacouto 290: .It Fl n , -no-buffer
! 291: Force stdout to be flushed after checking each file.
1.23 jaredy 292: This is only useful if checking a list of files.
1.30 ! ajacouto 293: It is intended to be used by programs that want filetype output from a pipe.
! 294: .It Fl p , -preserve-date
! 295: On systems that support
! 296: .Xr utime 3
! 297: or
! 298: .Xr utimes 2 ,
! 299: attempt to preserve the access time of files analyzed, to pretend that
! 300: .Nm
! 301: never read them.
! 302: .It Fl r , -raw
! 303: Don't translate unprintable characters to \eooo.
1.23 jaredy 304: Normally
305: .Nm
1.30 ! ajacouto 306: translates unprintable characters to their octal representation.
! 307: .It Fl s , -special-files
1.23 jaredy 308: Normally,
309: .Nm
310: only attempts to read and determine the type of argument files which
311: .Xr stat 2
312: reports are ordinary files.
313: This prevents problems, because reading special files may have peculiar
314: consequences.
315: Specifying the
316: .Fl s
317: option causes
318: .Nm
319: to also read argument files which are block or character special files.
320: This is useful for determining the filesystem types of the data in raw
321: disk partitions, which are block special files.
322: This option also causes
323: .Nm
324: to disregard the file size as reported by
1.30 ! ajacouto 325: .Xr stat 2
1.23 jaredy 326: since on some systems it reports a zero size for raw disk partitions.
1.30 ! ajacouto 327: .It Fl v , -version
1.23 jaredy 328: Print the version of the program and exit.
1.30 ! ajacouto 329: .It Fl z , -uncompress
! 330: Try to look inside compressed files.
1.8 aaron 331: .El
1.30 ! ajacouto 332: .Pp
! 333: .Ex -std file
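.Pp
A program consuming the output of
.Nm
with the
.Fl 0
and
.Fl k
options might split each line as sketched below.
The exact line layout is an assumption derived from the option
descriptions above (filename, null character, separator, padding, description),
and the function name is hypothetical.

```python
def parse_print0_line(line, separator=":"):
    """Split one output line produced with -0 into (filename, [descriptions])."""
    # The null character ends the filename, so a separator inside the name is harmless.
    filename, _, rest = line.partition("\0")
    if rest.startswith(separator):
        rest = rest[len(separator):]
    description = rest.strip()  # drop alignment padding and the trailing newline
    # With -k, each later match is prefixed with the literal text "\012- ".
    matches = description.split("\\012- ")
    return filename, matches
```

Splitting on the null character rather than on the separator is the
reason
.Fl 0
helps: filenames may themselves contain the separator string.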
1.8 aaron 334: .Sh ENVIRONMENT
1.30 ! ajacouto 335: The environment variable
! 336: .Dv MAGIC
! 337: can be used to set the default magic file name.
! 338: If that variable is set, then
! 339: .Nm
! 340: will not attempt to open
! 341: .Pa $HOME/.magic .
1.23 jaredy 342: .Nm
343: adds
344: .Dq .mgc
345: to the value of this variable as appropriate.
1.30 ! ajacouto 346: The environment variable
! 347: .Dv POSIXLY_CORRECT
! 348: controls whether
! 349: .Nm
! 350: will attempt to follow symlinks or not.
! 351: If set, then
! 352: .Nm
! 353: follows symlinks; otherwise it does not.
! 354: This is also controlled by the
! 355: .Fl L
! 356: and
! 357: .Fl h
! 358: options.
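.Pp
The precedence rules described in this section can be summarised in a short sketch.
It assumes that
.Pa $HOME/.magic.mgc
is tried before
.Pa $HOME/.magic ,
which the text does not state, and it ignores the
.Fl m
option; the function names and the exists parameter are hypothetical.

```python
import os

SYSTEM_MAGIC = "/etc/magic"


def choose_magic(environ, exists=os.path.exists):
    """Pick the magic file: MAGIC first, then a per-user file, then the system file."""
    magic = environ.get("MAGIC")
    if magic:
        return magic  # when MAGIC is set, $HOME/.magic is not consulted
    home = environ.get("HOME")
    if home:
        for name in (".magic.mgc", ".magic"):  # order is an assumption
            candidate = home + "/" + name
            if exists(candidate):
                return candidate
    return SYSTEM_MAGIC


def follows_symlinks(environ, flag=None):
    """Return True if symlinks are followed; flag is 'L', 'h' or None."""
    if flag == "L":
        return True
    if flag == "h":
        return False
    return "POSIXLY_CORRECT" in environ  # default depends on the environment
```

Passing the environment and the existence check as parameters keeps the
sketch testable without touching the real filesystem.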
1.12 aaron 359: .Sh FILES
360: .Bl -tag -width /etc/magic -compact
361: .It Pa /etc/magic
362: default list of magic numbers
363: .El
1.8 aaron 364: .Sh SEE ALSO
365: .Xr hexdump 1 ,
366: .Xr od 1 ,
367: .Xr strings 1 ,
368: .Xr magic 5
369: .Sh STANDARDS CONFORMANCE
1.1 deraadt 370: This program is believed to exceed the System V Interface Definition
371: of FILE(CMD), as near as one can determine from the vague language
1.6 aaron 372: contained therein.
1.30 ! ajacouto 373: Its behavior is mostly compatible with the System V program of the same name.
1.1 deraadt 374: This version knows more magic, however, so it will produce
1.6 aaron 375: different (albeit more accurate) output in many cases.
1.30 ! ajacouto 376: .\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
1.8 aaron 377: .Pp
1.6 aaron 378: The one significant difference
1.1 deraadt 379: between this version and System V
1.30 ! ajacouto 380: is that this version treats any whitespace
1.1 deraadt 381: as a delimiter, so that spaces in pattern strings must be escaped.
382: For example,
1.30 ! ajacouto 383: .Bd -literal -offset indent
! 384: \*(Gt10 string language impress\ (imPRESS data)
! 385: .Ed
1.8 aaron 386: .Pp
1.1 deraadt 387: in an existing magic file would have to be changed to
1.30 ! ajacouto 388: .Bd -literal -offset indent
! 389: \*(Gt10 string language\e impress (imPRESS data)
! 390: .Ed
1.8 aaron 391: .Pp
1.1 deraadt 392: In addition, in this version, if a pattern string contains a backslash,
1.9 aaron 393: it must be escaped.
394: For example
1.30 ! ajacouto 395: .Bd -literal -offset indent
! 396: 0 string \ebegindata Andrew Toolkit document
! 397: .Ed
1.8 aaron 398: .Pp
1.1 deraadt 399: in an existing magic file would have to be changed to
1.30 ! ajacouto 400: .Bd -literal -offset indent
! 401: 0 string \e\ebegindata Andrew Toolkit document
! 402: .Ed
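.Pp
Converting pattern strings from a System V magic file therefore amounts
to two substitutions, sketched here in Python.
The function name is hypothetical, and only spaces are handled;
other whitespace in a pattern would need similar treatment.

```python
def escape_pattern(pattern):
    """Escape a System V pattern string for this version of the magic format."""
    # Backslashes first, so the backslashes added for spaces are not doubled.
    escaped = pattern.replace("\\", "\\\\")
    return escaped.replace(" ", "\\ ")
```

For instance, the two conversions shown above correspond to
escape_pattern("language impress") and escape_pattern("\\begindata").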
1.8 aaron 403: .Pp
1.1 deraadt 404: SunOS releases 3.2 and later from Sun Microsystems include a
1.30 ! ajacouto 405: .Nm
1.1 deraadt 406: command derived from the System V one, but with some extensions.
1.30 ! ajacouto 407: This version differs from Sun's only in minor ways.
1.8 aaron 408: It includes the extension of the
1.30 ! ajacouto 409: .Sq &
1.8 aaron 410: operator, used as,
1.1 deraadt 411: for example,
1.30 ! ajacouto 412: .Bd -literal -offset indent
! 413: \*(Gt16 long&0x7fffffff \*(Gt0 not stripped
! 414: .Ed
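.Pp
The effect of the
.Sq &
operator in the entry above can be sketched as a mask applied before
the comparison.
The sketch assumes a 4-byte long and takes the byte order as an explicit,
hypothetical parameter, whereas a real magic entry of type long uses the
host byte order.

```python
def masked_long_gt(data, offset, mask, threshold, byteorder="little"):
    """Read a 4-byte unsigned value at offset, apply mask, and test it against threshold."""
    field = data[offset:offset + 4]
    if len(field) < 4:
        return False  # file too short for this entry to apply
    value = int.from_bytes(field, byteorder)
    return (value & mask) > threshold
```

With offset 16, mask 0x7fffffff and threshold 0, the test succeeds
whenever any bit other than the top one is set in the field.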
1.8 aaron 415: .Sh HISTORY
1.6 aaron 416: There has been a
1.8 aaron 417: .Nm
418: command in every
419: .Ux
1.16 mickey 420: since at least Research Version 4
421: (man page dated November, 1973).
1.1 deraadt 422: The System V version introduced one significant change:
1.30 ! ajacouto 423: the external list of magic types.
1.1 deraadt 424: This slowed the program down slightly but made it a lot more flexible.
1.8 aaron 425: .Pp
1.30 ! ajacouto 426: This program, based on the System V version,
! 427: was written by Ian Darwin
1.8 aaron 428: without looking at anybody else's source code.
429: .Pp
1.30 ! ajacouto 430: John Gilmore revised the code extensively, making it better than
1.1 deraadt 431: the first version.
1.30 ! ajacouto 432: Geoff Collyer found several inadequacies
1.1 deraadt 433: and provided some magic file entries.
1.30 ! ajacouto 434: The `&' operator was contributed by Rob McMahon, 1989.
1.23 jaredy 435: .Pp
1.30 ! ajacouto 436: Guy Harris made many changes from 1993 to the present.
1.23 jaredy 437: .Pp
1.26 david 438: Primary development and maintenance from 1990 to the present by
1.30 ! ajacouto 439: Christos Zoulas.
1.8 aaron 440: .Pp
1.30 ! ajacouto 441: Altered by Chris Lowth, 2000:
! 442: Handle the
! 443: .Fl i
! 444: option to output mime type strings, using an alternative
! 445: magic file and internal logic.
! 446: .Pp
! 447: Altered by Eric Fischer, July, 2000,
! 448: to identify character codes and attempt to identify the languages
! 449: of non-ASCII files.
! 450: .Pp
! 451: Altered by Reuben Thomas, 2007 to 2008, to improve MIME
! 452: support and merge MIME and non-MIME magic, support directories as well
! 453: as files of magic, apply many bug fixes and improve the build system.
1.23 jaredy 454: .Pp
455: The list of contributors to the
1.30 ! ajacouto 456: .Dq magic
! 457: directory (magic files)
! 458: is too long to include here.
1.23 jaredy 459: You know who you are; thank you.
1.30 ! ajacouto 460: Many contributors are listed in the source files.
! 461: .Sh BUGS
1.8 aaron 462: .Pp
1.1 deraadt 463: There must be a better way to automate the construction of the Magic
1.8 aaron 464: file from all the glop in Magdir.
465: What is it?
466: .Pp
467: .Nm
1.30 ! ajacouto 468: uses several algorithms that favor speed over accuracy;
1.4 millert 469: thus it can be misled about the contents of
1.30 ! ajacouto 470: text
1.4 millert 471: files.
1.8 aaron 472: .Pp
1.30 ! ajacouto 473: The support for text files (primarily for programming languages)
1.1 deraadt 474: is simplistic, inefficient and requires recompilation to update.
1.8 aaron 475: .Pp
1.6 aaron 476: The list of keywords in
1.30 ! ajacouto 477: .Pa ascmagic
1.1 deraadt 478: probably belongs in the Magic file.
1.8 aaron 479: This could be done by using some keyword like
1.30 ! ajacouto 480: .Sq *
1.8 aaron 481: for the offset value.
482: .Pp
1.9 aaron 483: Complain about conflicts in the magic file entries.
1.1 deraadt 484: Make a rule that the magic entries sort based on file offset rather
485: than position within the magic file?
1.8 aaron 486: .Pp
1.6 aaron 487: The program should provide a way to give an estimate
1.8 aaron 488: of
489: .Dq how good
490: a guess is.
1.30 ! ajacouto 491: We end up removing guesses (e.g.\&
! 492: .Dq "From "
1.8 aaron 493: as first 5 chars of file) because
1.30 ! ajacouto 494: they are not as good as other guesses (e.g.\&
1.8 aaron 495: .Dq Newsgroups:
496: versus
1.30 ! ajacouto 497: .Dq Return-Path: ) .
! 498: Still, if the others don't pan out, it should be possible to use the
! 499: first guess.
1.8 aaron 500: .Pp
1.1 deraadt 501: This manual page, and particularly this section, is too long.