Annotation of src/usr.bin/file/file.1, Revision 1.31
1.31 ! schwarze 1: .\" $OpenBSD: file.1,v 1.30 2009/10/26 21:03:03 ajacoutot Exp $
1.8 aaron 2: .\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
1.18 jmc 3: .\"
1.19 ian 4: .\" Copyright (c) Ian F. Darwin 1986-1995.
5: .\" Software written by Ian F. Darwin and others;
6: .\" maintained 1995-present by Christos Zoulas and others.
1.20 jmc 7: .\"
1.19 ian 8: .\" Redistribution and use in source and binary forms, with or without
9: .\" modification, are permitted provided that the following conditions
10: .\" are met:
11: .\" 1. Redistributions of source code must retain the above copyright
12: .\" notice immediately at the beginning of the file, without modification,
13: .\" this list of conditions, and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
1.20 jmc 17: .\"
1.19 ian 18: .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
19: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
22: .\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
24: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
25: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
26: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
27: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
28: .\" SUCH DAMAGE.
1.18 jmc 29: .\"
1.31 ! schwarze 30: .Dd $Mdocdate: October 26 2009 $
1.8 aaron 31: .Dt FILE 1
32: .Os
33: .Sh NAME
34: .Nm file
35: .Nd determine file type
36: .Sh SYNOPSIS
1.30 ajacouto 37: .Nm
38: .Bk -words
39: .Op Fl 0bCcehikLNnprsvz
40: .Op Fl -help
41: .Op Fl -mime-encoding
42: .Op Fl -mime-type
1.23 jaredy 43: .Op Fl F Ar separator
1.8 aaron 44: .Op Fl f Ar namefile
45: .Op Fl m Ar magicfiles
1.30 ajacouto 46: .Ar file
1.23 jaredy 47: .Ek
1.8 aaron 48: .Sh DESCRIPTION
1.22 jaredy 49: The
1.8 aaron 50: .Nm
1.30 ajacouto 51: utility tests each argument in an attempt to classify it.
1.1 deraadt 52: There are three sets of tests, performed in this order:
1.30 ajacouto 53: filesystem tests, magic tests, and language tests.
1.8 aaron 54: The first test that succeeds causes the file type to be printed.
55: .Pp
1.1 deraadt 56: The type printed will usually contain one of the words
1.30 ajacouto 57: .Em text
1.4 millert 58: (the file contains only
1.30 ajacouto 59: printing characters and a few common control
1.4 millert 60: characters and is probably safe to read on an
1.30 ajacouto 61: ASCII terminal),
62: .Em executable
1.1 deraadt 63: (the file contains the result of compiling a program
1.8 aaron 64: in a form understandable to some
65: .Ux
66: kernel or another),
1.1 deraadt 67: or
1.30 ajacouto 68: .Em data
69: meaning anything else (data is usually
70: .Dq binary
71: or non-printable).
1.1 deraadt 72: Exceptions are well-known file formats (core files, tar archives)
73: that are known to contain binary data.
1.30 ajacouto 74: When modifying magic files or the program itself, make sure to
75: .Em preserve these keywords .
76: Users depend on knowing that all the readable files in a directory
1.8 aaron 77: have the word
78: .Dq text
79: printed.
1.30 ajacouto 80: Don't do as Berkeley did and change
1.8 aaron 81: .Dq shell commands text
82: to
83: .Dq shell script .
84: .Pp
1.1 deraadt 85: The filesystem tests are based on examining the return from a
1.8 aaron 86: .Xr stat 2
1.1 deraadt 87: system call.
88: The program checks to see if the file is empty,
89: or if it's some sort of special file.
1.30 ajacouto 90: Any known file types,
91: such as sockets, symbolic links, and named pipes (FIFOs),
1.1 deraadt 92: are intuited if they are defined in
93: the system header file
1.9 aaron 94: .Aq Pa sys/stat.h .
1.8 aaron 95: .Pp
1.30 ajacouto 96: The magic tests are used to check for files with data in
1.1 deraadt 97: particular fixed formats.
98: The canonical example of this is a binary executable (compiled program)
1.30 ajacouto 99: a.out file, whose format is defined in
100: .Aq Pa elf.h ,
101: .Aq Pa a.out.h ,
1.1 deraadt 102: and possibly
1.8 aaron 103: .Aq Pa exec.h
1.30 ajacouto 104: in the standard include directory.
1.8 aaron 105: These files have a
106: .Dq magic number
107: stored in a particular place
108: near the beginning of the file that tells the
109: .Ux
110: operating system
1.1 deraadt 111: that the file is a binary executable, and which of several types thereof.
1.30 ajacouto 112: The concept of a
113: .Dq magic number
114: has been applied by extension to data files.
1.1 deraadt 115: Any file with some invariant identifier at a small fixed
116: offset into the file can usually be described in this way.
1.30 ajacouto 117: The information identifying these files is read from the magic file
1.8 aaron 118: .Pa /etc/magic .
1.30 ajacouto 119: In addition, if
120: .Pa $HOME/.magic.mgc
121: or
122: .Pa $HOME/.magic
123: exists, it will be used in preference to the system magic files.
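The fixed-offset matching described above can be illustrated with a minimal sketch.
The table and the function name classify are hypothetical and exist only for this example;
they are not the contents of /etc/magic, nor the way the program is actually implemented.

```python
# Hypothetical table for illustration only; not the contents of /etc/magic.
# Each entry is (offset, expected bytes, description).
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF executable or object"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (4, b"ftyp", "ISO media file (ftyp box)"),
]


def classify(buf: bytes) -> str:
    """Return the description of the first entry whose bytes match."""
    for offset, magic, description in MAGIC_TABLE:
        # Compare the invariant identifier at its fixed offset.
        if buf[offset:offset + len(magic)] == magic:
            return description
    # Nothing matched: fall back to the catch-all keyword.
    return "data"
```

As in the text, the first entry that succeeds decides the result, so the order of the table matters.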
1.8 aaron 124: .Pp
1.30 ajacouto 125: If a file does not match any of the entries in the magic file,
126: it is examined to see if it seems to be a text file.
127: ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
128: (such as those used on Macintosh and IBM PC systems),
129: UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
130: character sets can be distinguished by the different
131: ranges and sequences of bytes that constitute printable text
132: in each set.
133: If a file passes any of these tests, its character set is reported.
134: ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
135: as
136: .Dq text
137: because they will be mostly readable on nearly any terminal;
138: UTF-16 and EBCDIC are only
139: .Dq character data
140: because, while
141: they contain text, it is text that will require translation
142: before it can be read.
143: In addition,
144: .Nm
145: will attempt to determine other characteristics of text-type files.
146: If the lines of a file are terminated by CR, CRLF, or NEL, instead
147: of the Unix-standard LF, this will be reported.
148: Files that contain embedded escape sequences or overstriking
149: will also be identified.
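The text tests above can be approximated with the following sketch.
It assumes only two character sets (ASCII and UTF-8) and only CR and CRLF terminators;
the set of tolerated control bytes, the function name describe_text and the wording of
the result strings are assumptions made for this example, not the program's own logic.

```python
# Control bytes tolerated in text for this sketch (an assumption, not
# the exact set used by file): BS, TAB, LF, FF, CR, ESC.
ALLOWED_CONTROL = frozenset(b"\b\t\n\f\r\x1b")


def describe_text(buf: bytes) -> str:
    """Roughly classify a buffer by character set and line terminator."""
    # Any other control byte (or DEL) means this is not printable text.
    if any((b < 0x20 and b not in ALLOWED_CONTROL) or b == 0x7F for b in buf):
        return "data"
    try:
        buf.decode("ascii")
        charset = "ASCII text"
    except UnicodeDecodeError:
        try:
            buf.decode("utf-8")
            charset = "UTF-8 Unicode text"
        except UnicodeDecodeError:
            return "data"
    crlf = buf.count(b"\r\n")
    bare_cr = buf.count(b"\r") - crlf  # CR not followed by LF
    bare_lf = buf.count(b"\n") - crlf  # LF not preceded by CR
    # Report a terminator only when it is used consistently.
    if crlf and not bare_cr and not bare_lf:
        return charset + ", with CRLF line terminators"
    if bare_cr and not crlf and not bare_lf:
        return charset + ", with CR line terminators"
    return charset
```

Files with plain LF terminators, or with a mixture of terminators, are reported by character set alone in this sketch.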
150: .Pp
151: Once
152: .Nm
153: has determined the character set used in a text-type file,
154: it will
155: attempt to determine in what language the file is written.
156: The language tests look for particular strings (cf.\&
157: .Aq Pa names.h )
1.1 deraadt 158: that can appear anywhere in the first few blocks of a file.
159: For example, the keyword
1.8 aaron 160: .Em .br
1.4 millert 161: indicates that the file is most likely a
1.8 aaron 162: .Xr troff 1
1.6 aaron 163: input file, just as the keyword
1.30 ajacouto 164: .Em struct
1.1 deraadt 165: indicates a C program.
166: These tests are less reliable than the previous
167: two groups, so they are performed last.
168: The language test routines also test for some miscellany
1.6 aaron 169: (such as
1.8 aaron 170: .Xr tar 1
1.30 ajacouto 171: archives).
172: .Pp
173: Any file that cannot be identified as having been written
174: in any of the character sets listed above is simply said to be
1.8 aaron 175: .Dq data .
1.30 ajacouto 176: .Sh OPTIONS
177: .Bl -tag -width indent
178: .It Fl 0 , -print0
179: Output a null character
180: .Sq \e0
181: after the end of the filename.
182: This makes it easier to
183: .Xr cut 1
184: the output.
185: This does not affect the separator, which is still printed.
186: .It Fl b , -brief
1.17 millert 187: Do not prepend filenames to output lines (brief mode).
1.30 ajacouto 188: .It Fl C , -compile
189: Write a
1.23 jaredy 190: .Pa magic.mgc
1.30 ajacouto 191: output file that contains a pre-parsed version of the magic file or directory.
192: .It Fl c , -checking-printout
1.1 deraadt 193: Cause a checking printout of the parsed form of the magic file.
1.30 ajacouto 194: This is usually used in conjunction with the
1.8 aaron 195: .Fl m
1.30 ajacouto 196: flag to debug a new magic file before installing it.
197: .It Fl e , -exclude Ar testname
198: Exclude the test named in
199: .Ar testname
200: from the list of tests made to determine the file type.
201: Valid test names are:
1.31 ! schwarze 202: .Bl -tag -width compress
1.30 ajacouto 203: .It apptype
204: Check for
205: .Dv EMX
206: application type (only on EMX).
207: .It ascii
208: Check for various types of ASCII files.
209: .It compress
210: Don't look for, or inside, compressed files.
211: .It elf
212: Don't print ELF details.
213: .It fortran
214: Don't look for Fortran sequences inside ASCII files.
215: .It soft
216: Don't consult magic files.
217: .It tar
218: Don't examine tar files.
219: .It token
220: Don't look for known tokens inside ASCII files.
221: .It troff
222: Don't look for troff sequences inside ASCII files.
223: .El
224: .It Fl F , -separator Ar separator
225: Use the specified string as the separator between the filename and the
226: file result returned.
1.23 jaredy 227: Defaults to
228: .Sq \&: .
1.30 ajacouto 229: .It Fl f , -files-from Ar namefile
1.6 aaron 230: Read the names of the files to be examined from
1.8 aaron 231: .Ar namefile
1.6 aaron 232: (one per line)
1.1 deraadt 233: before the argument list.
1.6 aaron 234: Either
1.8 aaron 235: .Ar namefile
1.1 deraadt 236: or at least one filename argument must be present;
1.8 aaron 237: to test the standard input, use
1.23 jaredy 238: .Sq -
1.8 aaron 239: as a filename argument.
1.30 ajacouto 240: .It Fl h , -no-dereference
241: Causes symlinks not to be followed.
242: This is the default if the environment variable
243: .Dv POSIXLY_CORRECT
244: is not defined.
245: .It Fl -help
246: Print a help message and exit.
247: .It Fl i , -mime
248: Causes the file command to output MIME type strings rather than the more
249: traditional human-readable ones.
250: Thus it may say
251: .Dq text/plain; charset=us-ascii
252: rather than
253: .Dq ASCII text .
254: In order for this option to work,
255: .Nm
256: changes the way it handles files recognized by the command itself
257: (such as many of the text file types, directories, etc.),
258: and makes use of an alternative
259: .Dq magic
260: file.
261: See also
262: .Sx FILES ,
263: below.
264: .It Fl -mime-encoding , -mime-type
265: Like
266: .Fl i ,
267: but print only the specified element(s).
268: .It Fl k , -keep-going
1.23 jaredy 269: Don't stop at the first match; keep going.
1.30 ajacouto 270: Subsequent matches will have the string
271: .Dq "\[rs]012\- "
272: prepended.
273: (If a newline is required, see the
274: .Fl r
275: option.)
276: .It Fl L , -dereference
277: Causes symlinks to be followed;
278: analogous to the option of the same name in
279: .Xr ls 1 .
280: This is the default if the environment variable
281: .Dv POSIXLY_CORRECT
282: is defined.
283: .It Fl m , -magic-file Ar magicfiles
284: Specify an alternate list of files and directories containing magic.
285: This can be a single item, or a colon-separated list.
286: If a compiled magic file is found alongside a file or directory,
287: it will be used instead.
288: .It Fl N , -no-pad
1.23 jaredy 289: Don't pad filenames so that they align in the output.
1.30 ajacouto 290: .It Fl n , -no-buffer
291: Force stdout to be flushed after checking each file.
1.23 jaredy 292: This is only useful if checking a list of files.
1.30 ajacouto 293: It is intended to be used by programs that want filetype output from a pipe.
294: .It Fl p , -preserve-date
295: On systems that support
296: .Xr utime 3
297: or
298: .Xr utimes 2 ,
299: attempt to preserve the access time of files analyzed, to pretend that
300: .Nm
301: never read them.
302: .It Fl r , -raw
303: Don't translate unprintable characters to \eooo.
1.23 jaredy 304: Normally
305: .Nm
1.30 ajacouto 306: translates unprintable characters to their octal representation.
307: .It Fl s , -special-files
1.23 jaredy 308: Normally,
309: .Nm
310: only attempts to read and determine the type of argument files which
311: .Xr stat 2
312: reports are ordinary files.
313: This prevents problems, because reading special files may have peculiar
314: consequences.
315: Specifying the
316: .Fl s
317: option causes
318: .Nm
319: to also read argument files which are block or character special files.
320: This is useful for determining the filesystem types of the data in raw
321: disk partitions, which are block special files.
322: This option also causes
323: .Nm
324: to disregard the file size as reported by
1.30 ajacouto 325: .Xr stat 2
1.23 jaredy 326: since on some systems it reports a zero size for raw disk partitions.
1.30 ajacouto 327: .It Fl v , -version
1.23 jaredy 328: Print the version of the program and exit.
1.30 ajacouto 329: .It Fl z , -uncompress
330: Try to look inside compressed files.
1.8 aaron 331: .El
1.30 ajacouto 332: .Pp
333: .Ex -std file
1.8 aaron 334: .Sh ENVIRONMENT
1.30 ajacouto 335: The environment variable
336: .Dv MAGIC
337: can be used to set the default magic file name.
338: If that variable is set, then
339: .Nm
340: will not attempt to open
341: .Pa $HOME/.magic .
1.23 jaredy 342: .Nm
343: adds
344: .Dq .mgc
345: to the value of this variable as appropriate.
1.30 ajacouto 346: The environment variable
347: .Dv POSIXLY_CORRECT
348: controls whether
349: .Nm
350: will attempt to follow symlinks or not.
351: If set, then
352: .Nm
353: follows symlinks; otherwise it does not.
354: This is also controlled by the
355: .Fl L
356: and
357: .Fl h
358: options.
1.12 aaron 359: .Sh FILES
360: .Bl -tag -width /etc/magic -compact
361: .It Pa /etc/magic
362: default list of magic numbers
363: .El
1.8 aaron 364: .Sh SEE ALSO
365: .Xr hexdump 1 ,
366: .Xr od 1 ,
367: .Xr strings 1 ,
368: .Xr magic 5
369: .Sh STANDARDS CONFORMANCE
1.1 deraadt 370: This program is believed to exceed the System V Interface Definition
371: of FILE(CMD), as near as one can determine from the vague language
1.6 aaron 372: contained therein.
1.30 ajacouto 373: Its behavior is mostly compatible with the System V program of the same name.
1.1 deraadt 374: This version knows more magic, however, so it will produce
1.6 aaron 375: different (albeit more accurate) output in many cases.
1.30 ajacouto 376: .\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
1.8 aaron 377: .Pp
1.6 aaron 378: The one significant difference
1.1 deraadt 379: between this version and System V
1.30 ajacouto 380: is that this version treats any whitespace
1.1 deraadt 381: as a delimiter, so that spaces in pattern strings must be escaped.
382: For example,
1.30 ajacouto 383: .Bd -literal -offset indent
384: \*(Gt10 string language impress\ (imPRESS data)
385: .Ed
1.8 aaron 386: .Pp
1.1 deraadt 387: in an existing magic file would have to be changed to
1.30 ajacouto 388: .Bd -literal -offset indent
389: \*(Gt10 string language\e impress (imPRESS data)
390: .Ed
1.8 aaron 391: .Pp
1.1 deraadt 392: In addition, in this version, if a pattern string contains a backslash,
1.9 aaron 393: it must be escaped.
394: For example,
1.30 ajacouto 395: .Bd -literal -offset indent
396: 0 string \ebegindata Andrew Toolkit document
397: .Ed
1.8 aaron 398: .Pp
1.1 deraadt 399: in an existing magic file would have to be changed to
1.30 ajacouto 400: .Bd -literal -offset indent
401: 0 string \e\ebegindata Andrew Toolkit document
402: .Ed
1.8 aaron 403: .Pp
1.1 deraadt 404: SunOS releases 3.2 and later from Sun Microsystems include a
1.30 ajacouto 405: .Nm
1.1 deraadt 406: command derived from the System V one, but with some extensions.
1.30 ajacouto 407: This version differs from Sun's only in minor ways.
1.8 aaron 408: It includes the extension of the
1.30 ajacouto 409: .Sq &
1.8 aaron 410: operator, used as,
1.1 deraadt 411: for example,
1.30 ajacouto 412: .Bd -literal -offset indent
413: \*(Gt16 long&0x7fffffff \*(Gt0 not stripped
414: .Ed
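The masked comparison in the entry above can be sketched as follows.
The function name masked_long_test is hypothetical, and the sketch assumes a 32-bit
little-endian long; the real test presumably uses the byte order of the host.

```python
import struct


def masked_long_test(buf: bytes, offset: int, mask: int, threshold: int) -> bool:
    """Return True if the long at offset, ANDed with mask, exceeds threshold."""
    if len(buf) < offset + 4:
        # Not enough data to read a 32-bit value at this offset.
        return False
    # Assumption: 32-bit little-endian signed long.
    (value,) = struct.unpack_from("<l", buf, offset)
    # The '&' operator masks the value before the comparison is made.
    return (value & mask) > threshold
```

With offset 16, mask 0x7fffffff and threshold 0, the function mirrors the example entry:
the high bit is ignored, and any remaining nonzero value makes the test succeed.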
1.8 aaron 415: .Sh HISTORY
1.6 aaron 416: There has been a
1.8 aaron 417: .Nm
418: command in every
419: .Ux
1.16 mickey 420: since at least Research Version 4
421: (man page dated November, 1973).
1.1 deraadt 422: The System V version introduced one significant change:
1.30 ajacouto 423: the external list of magic types.
1.1 deraadt 424: This slowed the program down slightly but made it a lot more flexible.
1.8 aaron 425: .Pp
1.30 ajacouto 426: This program, based on the System V version,
427: was written by Ian Darwin
1.8 aaron 428: without looking at anybody else's source code.
429: .Pp
1.30 ajacouto 430: John Gilmore revised the code extensively, making it better than
1.1 deraadt 431: the first version.
1.30 ajacouto 432: Geoff Collyer found several inadequacies
1.1 deraadt 433: and provided some magic file entries.
1.30 ajacouto 434: The `&' operator was contributed by Rob McMahon in 1989.
1.23 jaredy 435: .Pp
1.30 ajacouto 436: Guy Harris made many changes from 1993 to the present.
1.23 jaredy 437: .Pp
1.26 david 438: Primary development and maintenance from 1990 to the present by
1.30 ajacouto 439: Christos Zoulas.
1.8 aaron 440: .Pp
1.30 ajacouto 441: Altered by Chris Lowth, 2000:
442: Handle the
443: .Fl i
444: option to output MIME type strings, using an alternative
445: magic file and internal logic.
446: .Pp
447: Altered by Eric Fischer, July, 2000,
448: to identify character codes and attempt to identify the languages
449: of non-ASCII files.
450: .Pp
451: Altered by Reuben Thomas, 2007 to 2008, to improve MIME
452: support and merge MIME and non-MIME magic, support directories as well
453: as files of magic, apply many bug fixes and improve the build system.
1.23 jaredy 454: .Pp
455: The list of contributors to the
1.30 ajacouto 456: .Dq magic
457: directory (magic files)
458: is too long to include here.
1.23 jaredy 459: You know who you are; thank you.
1.30 ajacouto 460: Many contributors are listed in the source files.
461: .Sh BUGS
1.8 aaron 462: .Pp
1.1 deraadt 463: There must be a better way to automate the construction of the Magic
1.8 aaron 464: file from all the glop in Magdir.
465: What is it?
466: .Pp
467: .Nm
1.30 ajacouto 468: uses several algorithms that favor speed over accuracy;
1.4 millert 469: thus it can be misled about the contents of
1.30 ajacouto 470: text
1.4 millert 471: files.
1.8 aaron 472: .Pp
1.30 ajacouto 473: The support for text files (primarily for programming languages)
1.1 deraadt 474: is simplistic, inefficient and requires recompilation to update.
1.8 aaron 475: .Pp
1.6 aaron 476: The list of keywords in
1.30 ajacouto 477: .Pa ascmagic
1.1 deraadt 478: probably belongs in the Magic file.
1.8 aaron 479: This could be done by using some keyword like
1.30 ajacouto 480: .Sq *
1.8 aaron 481: for the offset value.
482: .Pp
1.9 aaron 483: Complain about conflicts in the magic file entries.
1.1 deraadt 484: Make a rule that the magic entries sort based on file offset rather
485: than position within the magic file?
1.8 aaron 486: .Pp
1.6 aaron 487: The program should provide a way to give an estimate
1.8 aaron 488: of
489: .Dq how good
490: a guess is.
1.30 ajacouto 491: We end up removing guesses (e.g.\&
492: .Dq "From "
1.8 aaron 493: as first 5 chars of file) because
1.30 ajacouto 494: they are not as good as other guesses (e.g.\&
1.8 aaron 495: .Dq Newsgroups:
496: versus
1.30 ajacouto 497: .Dq Return-Path: ) .
498: Still, if the others don't pan out, it should be possible to use the
499: first guess.
1.8 aaron 500: .Pp
1.1 deraadt 501: This manual page, and particularly this section, is too long.