Annotation of src/usr.bin/file/file.1, Revision 1.33
1.33 ! jmc 1: .\" $OpenBSD: file.1,v 1.32 2010/09/03 11:09:28 jmc Exp $
1.8 aaron 2: .\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
1.18 jmc 3: .\"
1.19 ian 4: .\" Copyright (c) Ian F. Darwin 1986-1995.
5: .\" Software written by Ian F. Darwin and others;
6: .\" maintained 1995-present by Christos Zoulas and others.
1.20 jmc 7: .\"
1.19 ian 8: .\" Redistribution and use in source and binary forms, with or without
9: .\" modification, are permitted provided that the following conditions
10: .\" are met:
11: .\" 1. Redistributions of source code must retain the above copyright
12: .\" notice immediately at the beginning of the file, without modification,
13: .\" this list of conditions, and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
1.20 jmc 17: .\"
1.19 ian 18: .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
19: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
22: .\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
24: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
25: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
26: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
27: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
28: .\" SUCH DAMAGE.
1.18 jmc 29: .\"
1.33 ! jmc 30: .Dd $Mdocdate: September 3 2010 $
1.8 aaron 31: .Dt FILE 1
32: .Os
33: .Sh NAME
34: .Nm file
35: .Nd determine file type
36: .Sh SYNOPSIS
1.30 ajacouto 37: .Nm
38: .Bk -words
39: .Op Fl 0bCcehikLNnprsvz
40: .Op Fl -help
41: .Op Fl -mime-encoding
42: .Op Fl -mime-type
1.23 jaredy 43: .Op Fl F Ar separator
1.8 aaron 44: .Op Fl f Ar namefile
45: .Op Fl m Ar magicfiles
1.30 ajacouto 46: .Ar file
1.23 jaredy 47: .Ek
1.8 aaron 48: .Sh DESCRIPTION
1.22 jaredy 49: The
1.8 aaron 50: .Nm
1.30 ajacouto 51: utility tests each argument in an attempt to classify it.
1.1 deraadt 52: There are three sets of tests, performed in this order:
1.30 ajacouto 53: filesystem tests, magic tests, and language tests.
1.8 aaron 54: The first test that succeeds causes the file type to be printed.
55: .Pp
1.1 deraadt 56: The type printed will usually contain one of the words
1.30 ajacouto 57: .Em text
1.4 millert 58: (the file contains only
1.30 ajacouto 59: printing characters and a few common control
1.4 millert 60: characters and is probably safe to read on an
1.30 ajacouto 61: ASCII terminal),
62: .Em executable
1.1 deraadt 63: (the file contains the result of compiling a program
1.8 aaron 64: in a form understandable to some
65: .Ux
66: kernel or another),
1.1 deraadt 67: or
1.30 ajacouto 68: .Em data
69: meaning anything else (data is usually
70: .Dq binary
71: or non-printable).
1.1 deraadt 72: Exceptions are well-known file formats (core files, tar archives)
73: that are known to contain binary data.
1.30 ajacouto 74: When modifying magic files or the program itself, make sure to
75: .Em preserve these keywords .
76: Users depend on knowing that all the readable files in a directory
1.8 aaron 77: have the word
78: .Dq text
79: printed.
1.30 ajacouto 80: Don't do as Berkeley did and change
1.8 aaron 81: .Dq shell commands text
82: to
83: .Dq shell script .
84: .Pp
1.1 deraadt 85: The filesystem tests are based on examining the return from a
1.8 aaron 86: .Xr stat 2
1.1 deraadt 87: system call.
88: The program checks to see if the file is empty,
89: or if it's some sort of special file.
1.30 ajacouto 90: Any known file types,
91: such as sockets, symbolic links, and named pipes (FIFOs),
1.1 deraadt 92: are intuited if they are defined in
93: the system header file
1.9 aaron 94: .Aq Pa sys/stat.h .
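As a rough illustration of the filesystem tests described above, the following Python sketch classifies a path from the result of lstat(2) alone. The function name filesystem_test and the strings it returns are hypothetical and are not the output of file itself; the S_IS* predicates come from Python's standard stat module.

```python
import os
import stat


def filesystem_test(path):
    """Classify a path from lstat() alone.

    Returns a short description, or None when the file contents
    would have to be read (the later magic and language tests).
    The description strings are illustrative only.
    """
    st = os.lstat(path)  # lstat: do not follow symbolic links
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None  # ordinary non-empty file: fall through to the next tests
```

A return value of None corresponds to the case where the filesystem tests do not succeed and the next set of tests is tried.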
1.8 aaron 95: .Pp
1.30 ajacouto 96: The magic tests are used to check for files with data in
1.1 deraadt 97: particular fixed formats.
98: The canonical example of this is a binary executable (compiled program)
1.30 ajacouto 99: a.out file, whose format is defined in
100: .Aq Pa elf.h ,
101: .Aq Pa a.out.h ,
1.1 deraadt 102: and possibly
1.8 aaron 103: .Aq Pa exec.h
1.30 ajacouto 104: in the standard include directory.
1.8 aaron 105: These files have a
106: .Dq magic number
107: stored in a particular place
108: near the beginning of the file that tells the
109: .Ux
110: operating system
1.1 deraadt 111: that the file is a binary executable, and which of several types thereof.
1.30 ajacouto 112: The concept of a
113: .Dq magic
114: has been applied by extension to data files.
1.1 deraadt 115: Any file with some invariant identifier at a small fixed
116: offset into the file can usually be described in this way.
1.30 ajacouto 117: The information identifying these files is read from the magic file
1.8 aaron 118: .Pa /etc/magic .
1.30 ajacouto 119: In addition, if
120: .Pa $HOME/.magic.mgc
121: or
122: .Pa $HOME/.magic
123: exists, it will be used in preference to the system magic files.
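The idea of a magic number at a fixed offset can be sketched in Python as a table lookup. MAGIC_TABLE and magic_test are assumptions made for illustration only; the real entries are read from the magic file and use the richer syntax described in magic(5). The two signatures shown (ELF and gzip) are well-known values, but the description strings are not the ones file prints.

```python
# Hypothetical in-memory table: (offset, expected bytes, description).
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
]


def magic_test(data):
    """Return the description of the first matching entry, or None.

    `data` holds the leading bytes of the file being examined.
    """
    for offset, signature, description in MAGIC_TABLE:
        # Compare the invariant identifier at its fixed offset.
        if data[offset:offset + len(signature)] == signature:
            return description
    return None
```

Entries are tried in table order and the first match wins, which mirrors the rule that the first test that succeeds determines the type printed.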
1.8 aaron 124: .Pp
1.30 ajacouto 125: If a file does not match any of the entries in the magic file,
126: it is examined to see if it seems to be a text file.
127: ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
128: (such as those used on Macintosh and IBM PC systems),
129: UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
130: character sets can be distinguished by the different
131: ranges and sequences of bytes that constitute printable text
132: in each set.
133: If a file passes any of these tests, its character set is reported.
134: ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
135: as
136: .Dq text
137: because they will be mostly readable on nearly any terminal;
138: UTF-16 and EBCDIC are only
139: .Dq character data
140: because, while
141: they contain text, it is text that will require translation
142: before it can be read.
143: In addition,
144: .Nm
145: will attempt to determine other characteristics of text-type files.
146: If the lines of a file are terminated by CR, CRLF, or NEL, instead
147: of the Unix-standard LF, this will be reported.
148: Files that contain embedded escape sequences or overstriking
149: will also be identified.
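The text tests above can be approximated with a much simplified Python sketch. It distinguishes only ASCII and UTF-8 and reports CR or CRLF line terminators; ISO-8859-x, UTF-16, EBCDIC, NEL, escape sequences and overstriking are deliberately left out. The name text_test, the set of accepted control characters and the result strings are assumptions, not the behavior of file itself.

```python
# Control characters tolerated in text (assumed set): BS, TAB, LF, FF, CR, ESC.
ALLOWED_CONTROLS = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}


def text_test(data):
    """Return a rough description of `data`, or "data" if it is not text."""
    # Any other control character (or DEL) means this is not plain text.
    if any((b < 0x20 and b not in ALLOWED_CONTROLS) or b == 0x7F for b in data):
        return "data"
    try:
        data.decode("ascii")
        charset = "ASCII"
    except UnicodeDecodeError:
        try:
            data.decode("utf-8")
            charset = "UTF-8"
        except UnicodeDecodeError:
            return "data"
    # Count line terminators; CRLF pairs are subtracted from the CR and LF totals.
    crlf = data.count(b"\r\n")
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    description = charset + " text"
    if crlf and not lone_cr and not lone_lf:
        description += ", with CRLF line terminators"
    elif lone_cr and not crlf and not lone_lf:
        description += ", with CR line terminators"
    return description
```

Files with mixed line terminators are reported without a terminator note in this sketch; that is a simplification rather than a statement about how file handles them.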
150: .Pp
151: Once
152: .Nm
153: has determined the character set used in a text-type file,
154: it will
155: attempt to determine in what language the file is written.
156: The language tests look for particular strings (cf.\&
157: .Aq Pa names.h )
1.1 deraadt 158: that can appear anywhere in the first few blocks of a file.
159: For example, the keyword
1.8 aaron 160: .Em .br
1.4 millert 161: indicates that the file is most likely a
1.33 ! jmc 162: troff input file, just as the keyword
1.30 ajacouto 163: .Em struct
1.1 deraadt 164: indicates a C program.
165: These tests are less reliable than the previous
166: two groups, so they are performed last.
167: The language test routines also test for some miscellany
1.6 aaron 168: (such as
1.8 aaron 169: .Xr tar 1
1.30 ajacouto 170: archives).
171: .Pp
172: Any file that cannot be identified as having been written
173: in any of the character sets listed above is simply said to be
1.8 aaron 174: .Dq data .
1.30 ajacouto 175: .Sh OPTIONS
176: .Bl -tag -width indent
177: .It Fl 0 , -print0
178: Output a null character
179: .Sq \e0
180: after the end of the filename.
181: This makes it easy to
182: .Xr cut 1
183: the output.
184: This does not affect the separator which is still printed.
185: .It Fl b , -brief
1.17 millert 186: Do not prepend filenames to output lines (brief mode).
1.30 ajacouto 187: .It Fl C , -compile
188: Write a
1.23 jaredy 189: .Pa magic.mgc
1.30 ajacouto 190: output file that contains a pre-parsed version of the magic file or directory.
191: .It Fl c , -checking-printout
1.1 deraadt 192: Cause a checking printout of the parsed form of the magic file.
1.30 ajacouto 193: This is usually used in conjunction with the
1.8 aaron 194: .Fl m
1.30 ajacouto 195: flag to debug a new magic file before installing it.
196: .It Fl e , -exclude Ar testname
197: Exclude the test named in
198: .Ar testname
199: from the list of tests made to determine the file type.
200: Valid test names are:
1.31 schwarze 201: .Bl -tag -width compress
1.30 ajacouto 202: .It apptype
203: Check for
204: .Dv EMX
205: application type (only on EMX).
206: .It ascii
207: Check for various types of ASCII files.
208: .It compress
209: Don't look for, or inside, compressed files.
210: .It elf
211: Don't print ELF details.
212: .It fortran
213: Don't look for Fortran sequences inside ASCII files.
214: .It soft
215: Don't consult magic files.
216: .It tar
217: Don't examine tar files.
218: .It token
219: Don't look for known tokens inside ASCII files.
220: .It troff
221: Don't look for troff sequences inside ASCII files.
222: .El
223: .It Fl F , -separator Ar separator
224: Use the specified string as the separator between the filename and the
225: file result returned.
1.23 jaredy 226: Defaults to
227: .Sq \&: .
1.30 ajacouto 228: .It Fl f , -files-from Ar namefile
1.6 aaron 229: Read the names of the files to be examined from
1.8 aaron 230: .Ar namefile
1.6 aaron 231: (one per line)
1.1 deraadt 232: before the argument list.
1.6 aaron 233: Either
1.8 aaron 234: .Ar namefile
1.1 deraadt 235: or at least one filename argument must be present;
1.8 aaron 236: to test the standard input, use
1.23 jaredy 237: .Sq -
1.8 aaron 238: as a filename argument.
1.30 ajacouto 239: .It Fl h , -no-dereference
240: Causes symlinks not to be followed.
241: This is the default if the environment variable
242: .Dv POSIXLY_CORRECT
243: is not defined.
244: .It Fl -help
245: Print a help message and exit.
246: .It Fl i , -mime
247: Causes the file command to output mime type strings rather than the more
248: traditional human readable ones.
249: Thus it may say
250: .Dq text/plain charset=us-ascii
251: rather than
252: .Dq ASCII text .
253: In order for this option to work,
254: .Nm
255: changes the way it handles files recognized by the command itself
256: (such as many of the text file types, directories etc.),
257: and makes use of an alternative
258: .Dq magic
259: file.
260: See also
261: .Sx FILES ,
262: below.
263: .It Fl -mime-encoding , -mime-type
264: Like
265: .Fl i ,
266: but print only the specified element(s).
267: .It Fl k , -keep-going
1.23 jaredy 268: Don't stop at the first match; keep going.
1.30 ajacouto 269: Subsequent matches will have the string
270: .Dq "\[rs]012\- "
271: prepended.
272: (If a newline is required, see the
273: .Fl r
274: option.)
275: .It Fl L , -dereference
276: Causes symlinks to be followed;
277: analogous to the option of the same name in
278: .Xr ls 1 .
279: This is the default if the environment variable
280: .Dv POSIXLY_CORRECT
281: is defined.
282: .It Fl m , -magic-file Ar magicfiles
283: Specify an alternate list of files and directories containing magic.
284: This can be a single item, or a colon-separated list.
285: If a compiled magic file is found alongside a file or directory,
286: it will be used instead.
287: .It Fl N , -no-pad
1.23 jaredy 288: Don't pad filenames so that they align in the output.
1.30 ajacouto 289: .It Fl n , -no-buffer
290: Force stdout to be flushed after checking each file.
1.23 jaredy 291: This is only useful if checking a list of files.
1.30 ajacouto 292: It is intended to be used by programs that want filetype output from a pipe.
293: .It Fl p , -preserve-date
294: On systems that support
295: .Xr utime 3
296: or
297: .Xr utimes 2 ,
298: attempt to preserve the access time of files analyzed, to pretend that
299: .Nm
300: never read them.
301: .It Fl r , -raw
302: Don't translate unprintable characters to \eooo.
1.23 jaredy 303: Normally
304: .Nm
1.30 ajacouto 305: translates unprintable characters to their octal representation.
306: .It Fl s , -special-files
1.23 jaredy 307: Normally,
308: .Nm
309: only attempts to read and determine the type of argument files which
310: .Xr stat 2
311: reports are ordinary files.
312: This prevents problems, because reading special files may have peculiar
313: consequences.
314: Specifying the
315: .Fl s
316: option causes
317: .Nm
318: to also read argument files which are block or character special files.
319: This is useful for determining the filesystem types of the data in raw
320: disk partitions, which are block special files.
321: This option also causes
322: .Nm
323: to disregard the file size as reported by
1.30 ajacouto 324: .Xr stat 2
1.23 jaredy 325: since on some systems it reports a zero size for raw disk partitions.
1.30 ajacouto 326: .It Fl v , -version
1.23 jaredy 327: Print the version of the program and exit.
1.30 ajacouto 328: .It Fl z , -uncompress
329: Try to look inside compressed files.
1.8 aaron 330: .El
331: .Sh ENVIRONMENT
1.30 ajacouto 332: The environment variable
333: .Dv MAGIC
334: can be used to set the default magic file name.
335: If that variable is set, then
336: .Nm
337: will not attempt to open
338: .Pa $HOME/.magic .
1.23 jaredy 339: .Nm
340: adds
341: .Dq .mgc
342: to the value of this variable as appropriate.
1.30 ajacouto 343: The environment variable
344: .Dv POSIXLY_CORRECT
345: controls whether
346: .Nm
347: will attempt to follow symlinks or not.
348: If set, then
349: .Nm
350: follows symlinks; otherwise it does not.
351: This is also controlled by the
352: .Fl L
353: and
354: .Fl h
355: options.
1.12 aaron 356: .Sh FILES
357: .Bl -tag -width /etc/magic -compact
358: .It Pa /etc/magic
359: default list of magic numbers
360: .El
1.32 jmc 361: .Sh EXIT STATUS
362: .Ex -std file
1.8 aaron 363: .Sh SEE ALSO
364: .Xr hexdump 1 ,
365: .Xr od 1 ,
366: .Xr strings 1 ,
367: .Xr magic 5
368: .Sh STANDARDS CONFORMANCE
1.1 deraadt 369: This program is believed to exceed the System V Interface Definition
370: of FILE(CMD), as near as one can determine from the vague language
1.6 aaron 371: contained therein.
1.30 ajacouto 372: Its behavior is mostly compatible with the System V program of the same name.
1.1 deraadt 373: This version knows more magic, however, so it will produce
1.6 aaron 374: different (albeit more accurate) output in many cases.
1.30 ajacouto 375: .\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
1.8 aaron 376: .Pp
1.6 aaron 377: The one significant difference
1.1 deraadt 378: between this version and System V
1.30 ajacouto 379: is that this version treats any whitespace
1.1 deraadt 380: as a delimiter, so that spaces in pattern strings must be escaped.
381: For example,
1.30 ajacouto 382: .Bd -literal -offset indent
383: \*(Gt10 string language impress\ (imPRESS data)
384: .Ed
1.8 aaron 385: .Pp
1.1 deraadt 386: in an existing magic file would have to be changed to
1.30 ajacouto 387: .Bd -literal -offset indent
388: \*(Gt10 string language\e impress (imPRESS data)
389: .Ed
1.8 aaron 390: .Pp
1.1 deraadt 391: In addition, in this version, if a pattern string contains a backslash,
1.9 aaron 392: it must be escaped.
393: For example,
1.30 ajacouto 394: .Bd -literal -offset indent
395: 0 string \ebegindata Andrew Toolkit document
396: .Ed
1.8 aaron 397: .Pp
1.1 deraadt 398: in an existing magic file would have to be changed to
1.30 ajacouto 399: .Bd -literal -offset indent
400: 0 string \e\ebegindata Andrew Toolkit document
401: .Ed
1.8 aaron 402: .Pp
1.1 deraadt 403: SunOS releases 3.2 and later from Sun Microsystems include a
1.30 ajacouto 404: .Nm
1.1 deraadt 405: command derived from the System V one, but with some extensions.
1.30 ajacouto 406: This version differs from Sun's only in minor ways.
1.8 aaron 407: It includes the extension of the
1.30 ajacouto 408: .Sq &
1.8 aaron 409: operator, used as,
1.1 deraadt 410: for example,
1.30 ajacouto 411: .Bd -literal -offset indent
412: \*(Gt16 long&0x7fffffff \*(Gt0 not stripped
413: .Ed
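The effect of the & operator in the entry above can be sketched in Python: the value read at the given offset is ANDed with the mask before the comparison is made. The function masked_long_gt is hypothetical, and the big-endian default byte order is an assumption; the value is read as unsigned, which makes no difference here because the mask 0x7fffffff clears the top bit.

```python
def masked_long_gt(data, offset, mask, threshold, byteorder="big"):
    """Read a 4-byte value at `offset`, AND it with `mask`,
    and report whether the result is greater than `threshold`.

    Returns False when `data` is too short to hold the value.
    """
    if len(data) < offset + 4:
        return False
    value = int.from_bytes(data[offset:offset + 4], byteorder)
    return (value & mask) > threshold


# Mirrors the example entry: long at offset 16, mask 0x7fffffff, test "> 0".
header = bytes(16) + (0x80000005).to_bytes(4, "big")
is_not_stripped = masked_long_gt(header, 16, 0x7FFFFFFF, 0)
```

With the sample header, the masked value is 5, so the test succeeds; with 0x80000000 at the same offset the masked value would be 0 and the test would fail.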
1.8 aaron 414: .Sh HISTORY
1.6 aaron 415: There has been a
1.8 aaron 416: .Nm
417: command in every
418: .Ux
1.16 mickey 419: since at least Research Version 4
420: (man page dated November, 1973).
1.1 deraadt 421: The System V version introduced one significant change:
1.30 ajacouto 422: the external list of magic types.
1.1 deraadt 423: This slowed the program down slightly but made it a lot more flexible.
1.8 aaron 424: .Pp
1.30 ajacouto 425: This program, based on the System V version,
426: was written by Ian Darwin
1.8 aaron 427: without looking at anybody else's source code.
428: .Pp
1.30 ajacouto 429: John Gilmore revised the code extensively, making it better than
1.1 deraadt 430: the first version.
1.30 ajacouto 431: Geoff Collyer found several inadequacies
1.1 deraadt 432: and provided some magic file entries.
1.30 ajacouto 433: Contribution of the `&' operator by Rob McMahon, 1989.
1.23 jaredy 434: .Pp
1.30 ajacouto 435: Guy Harris made many changes from 1993 to the present.
1.23 jaredy 436: .Pp
1.26 david 437: Primary development and maintenance from 1990 to the present by
1.30 ajacouto 438: Christos Zoulas.
1.8 aaron 439: .Pp
1.30 ajacouto 440: Altered by Chris Lowth, 2000:
441: Handle the
442: .Fl i
443: option to output mime type strings, using an alternative
444: magic file and internal logic.
445: .Pp
446: Altered by Eric Fischer, July, 2000,
447: to identify character codes and attempt to identify the languages
448: of non-ASCII files.
449: .Pp
450: Altered by Reuben Thomas, 2007 to 2008, to improve MIME
451: support and merge MIME and non-MIME magic, support directories as well
452: as files of magic, apply many bug fixes and improve the build system.
1.23 jaredy 453: .Pp
454: The list of contributors to the
1.30 ajacouto 455: .Dq magic
456: directory (magic files)
457: is too long to include here.
1.23 jaredy 458: You know who you are; thank you.
1.30 ajacouto 459: Many contributors are listed in the source files.
460: .Sh BUGS
1.8 aaron 461: .Pp
1.1 deraadt 462: There must be a better way to automate the construction of the Magic
1.8 aaron 463: file from all the glop in Magdir.
464: What is it?
465: .Pp
466: .Nm
1.30 ajacouto 467: uses several algorithms that favor speed over accuracy;
1.4 millert 468: thus it can be misled about the contents of
1.30 ajacouto 469: text
1.4 millert 470: files.
1.8 aaron 471: .Pp
1.30 ajacouto 472: The support for text files (primarily for programming languages)
1.1 deraadt 473: is simplistic, inefficient and requires recompilation to update.
1.8 aaron 474: .Pp
1.6 aaron 475: The list of keywords in
1.30 ajacouto 476: .Pa ascmagic
1.1 deraadt 477: probably belongs in the Magic file.
1.8 aaron 478: This could be done by using some keyword like
1.30 ajacouto 479: .Sq *
1.8 aaron 480: for the offset value.
481: .Pp
1.9 aaron 482: Complain about conflicts in the magic file entries.
1.1 deraadt 483: Make a rule that the magic entries sort based on file offset rather
484: than position within the magic file?
1.8 aaron 485: .Pp
1.6 aaron 486: The program should provide a way to give an estimate
1.8 aaron 487: of
488: .Dq how good
489: a guess is.
1.30 ajacouto 490: We end up removing guesses (e.g.\&
491: .Dq From\
1.8 aaron 492: as first 5 chars of file) because
1.30 ajacouto 493: they are not as good as other guesses (e.g.\&
1.8 aaron 494: .Dq Newsgroups:
495: versus
1.30 ajacouto 496: .Dq Return-Path: ) .
497: Still, if the others don't pan out, it should be possible to use the
498: first guess.
1.8 aaron 499: .Pp
1.1 deraadt 500: This manual page, and particularly this section, is too long.