Annotation of src/usr.bin/file/file.1, Revision 1.19
1.19 ! ian 1: .\" $OpenBSD: file.1,v 1.18 2003/02/15 09:44:42 jmc Exp $
1.8 aaron 2: .\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
1.18 jmc 3: .\"
1.19 ! ian 4: .\" Copyright (c) Ian F. Darwin 1986-1995.
! 5: .\" Software written by Ian F. Darwin and others;
! 6: .\" maintained 1995-present by Christos Zoulas and others.
! 7: .\"
! 8: .\" Redistribution and use in source and binary forms, with or without
! 9: .\" modification, are permitted provided that the following conditions
! 10: .\" are met:
! 11: .\" 1. Redistributions of source code must retain the above copyright
! 12: .\" notice immediately at the beginning of the file, without modification,
! 13: .\" this list of conditions, and the following disclaimer.
! 14: .\" 2. Redistributions in binary form must reproduce the above copyright
! 15: .\" notice, this list of conditions and the following disclaimer in the
! 16: .\" documentation and/or other materials provided with the distribution.
! 17: .\" 3. All advertising materials mentioning features or use of this software
! 18: .\" must display the following acknowledgement:
! 19: .\" This product includes software developed by Ian F. Darwin and others.
! 20: .\" 4. The name of the author may not be used to endorse or promote products
! 21: .\" derived from this software without specific prior written permission.
! 22: .\"
! 23: .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
! 24: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
! 25: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
! 26: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
! 27: .\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
! 28: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
! 29: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
! 30: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
! 31: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
! 32: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
! 33: .\" SUCH DAMAGE.
1.18 jmc 34: .\"
1.8 aaron 35: .Dd July 30, 1997
36: .Dt FILE 1
37: .Os
38: .Sh NAME
39: .Nm file
40: .Nd determine file type
41: .Sh SYNOPSIS
42: .Nm file
1.17 millert 43: .Op Fl vbczL
1.8 aaron 44: .Op Fl f Ar namefile
45: .Op Fl m Ar magicfiles
46: .Ar file Op Ar ...
47: .Sh DESCRIPTION
1.4 millert 48: This manual page documents version 3.22 of the
1.8 aaron 49: .Nm
1.4 millert 50: command.
1.8 aaron 51: .Nm
1.1 deraadt 52: tests each argument in an attempt to classify it.
53: There are three sets of tests, performed in this order:
54: filesystem tests, magic number tests, and language tests.
1.8 aaron 55: The first test that succeeds causes the file type to be printed.
56: .Pp
1.1 deraadt 57: The type printed will usually contain one of the words
1.8 aaron 58: .Dq text
1.4 millert 59: (the file contains only
1.8 aaron 60: .Tn ASCII
1.4 millert 61: characters and is probably safe to read on an
1.8 aaron 62: .Tn ASCII
1.4 millert 63: terminal),
1.8 aaron 64: .Dq executable
1.1 deraadt 65: (the file contains the result of compiling a program
1.8 aaron 66: in a form understandable to some
67: .Ux
68: kernel or another),
1.1 deraadt 69: or
1.8 aaron 70: .Dq data
71: meaning anything else (data is usually binary or non-printable).
72: .Pp
1.1 deraadt 73: Exceptions are well-known file formats (core files, tar archives)
74: that are known to contain binary data.
75: When modifying the file
1.8 aaron 76: .Pa /etc/magic
1.6 aaron 77: or the program itself,
1.8 aaron 78: .Em "preserve these keywords" .
79: .Pp
1.1 deraadt 80: People depend on knowing that all the readable files in a directory
1.8 aaron 81: have the word
82: .Dq text
83: printed.
84: Don't do as Berkeley did and change
85: .Dq shell commands text
86: to
87: .Dq shell script .
88: .Pp
1.1 deraadt 89: The filesystem tests are based on examining the return from a
1.8 aaron 90: .Xr stat 2
1.1 deraadt 91: system call.
92: The program checks to see if the file is empty,
93: or if it's some sort of special file.
94: Any known file types appropriate to the system you are running on
95: (sockets, symbolic links, or named pipes (FIFOs) on those systems that
96: implement them)
97: are intuited if they are defined in
98: the system header file
1.9 aaron 99: .Aq Pa sys/stat.h .
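The filesystem tests described above can be sketched roughly as follows.
This is an illustrative Python sketch, not the program's own C code: the function name filesystem_test and the returned strings are hypothetical, and it assumes lstat(2) semantics so that symbolic links are reported rather than followed.

```python
import os
import stat


def filesystem_test(path):
    """Classify a path from its stat data alone.

    Returns a short description, or None when the file is a non-empty
    regular file and content-based tests are needed. The strings are
    illustrative and are not the exact output of file(1).
    """
    st = os.lstat(path)  # lstat: report a symbolic link itself, do not follow it
    mode = st.st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None
```

A None result corresponds to falling through to the magic number tests.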
1.8 aaron 100: .Pp
1.1 deraadt 101: The magic number tests are used to check for files with data in
102: particular fixed formats.
103: The canonical example of this is a binary executable (compiled program)
1.8 aaron 104: .Pa a.out
1.6 aaron 105: file, whose format is defined in
1.8 aaron 106: .Aq Pa a.out.h
1.1 deraadt 107: and possibly
1.8 aaron 108: .Aq Pa exec.h
1.1 deraadt 109: in the standard include directory.
1.8 aaron 110: These files have a
111: .Dq magic number
112: stored in a particular place
113: near the beginning of the file that tells the
114: .Ux
115: operating system
1.1 deraadt 116: that the file is a binary executable, and which of several types thereof.
1.8 aaron 117: .Pp
118: The concept of magic number has been applied by extension to data files.
1.1 deraadt 119: Any file with some invariant identifier at a small fixed
120: offset into the file can usually be described in this way.
121: The information identifying these files is read from the magic file
1.8 aaron 122: .Pa /etc/magic .
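As a rough illustration of matching an invariant identifier at a fixed offset, the sketch below checks a small hard-coded table.
The table layout and the names MAGIC_TABLE and magic_test are assumptions made for this example; the real entries live in the magic file, whose format is described in magic(5), and the first matching entry wins here simply because the list is scanned in order.

```python
# Each entry: (byte offset, expected bytes, description).
# A hypothetical in-memory table, not the on-disk magic file format.
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF executable or object"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"%PDF-", "PDF document"),
    (257, b"ustar", "POSIX tar archive"),
]


def magic_test(data):
    """Return the description of the first table entry whose bytes
    appear at its fixed offset in data, or None if nothing matches."""
    for offset, pattern, description in MAGIC_TABLE:
        if data[offset:offset + len(pattern)] == pattern:
            return description
    return None
```

Because entries are tried in order, reordering the table can change the answer, which is one reason the order of magic file entries matters.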
123: .Pp
1.1 deraadt 124: If an argument appears to be an
1.8 aaron 125: .Tn ASCII
1.1 deraadt 126: file,
1.8 aaron 127: .Nm
1.1 deraadt 128: attempts to guess its language.
1.4 millert 129: The language tests look for particular strings (cf.
1.8 aaron 130: .Pa names.h )
1.1 deraadt 131: that can appear anywhere in the first few blocks of a file.
132: For example, the keyword
1.8 aaron 133: .Em .br
1.4 millert 134: indicates that the file is most likely a
1.8 aaron 135: .Xr troff 1
1.6 aaron 136: input file, just as the keyword
1.8 aaron 137: .Li struct
1.1 deraadt 138: indicates a C program.
139: These tests are less reliable than the previous
140: two groups, so they are performed last.
141: The language test routines also test for some miscellany
1.6 aaron 142: (such as
1.8 aaron 143: .Xr tar 1
1.1 deraadt 144: archives) and determine whether an unknown file should be
1.8 aaron 145: labelled as
146: .Dq ASCII text
147: or
148: .Dq data .
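The language tests can be pictured with the following sketch.
It is a simplification under stated assumptions: the keyword list, the 8192-byte limit standing in for "the first few blocks", and the name language_test are all hypothetical, and the real keyword table is the one referred to as names.h.

```python
# Hypothetical keyword table; the real list is much longer.
KEYWORDS = [
    (b".br", "troff input text"),
    (b"struct", "C program text"),
]


def language_test(data, limit=8192):
    """Guess a language from whitespace-separated tokens in the first
    `limit` bytes; fall back to "ASCII text" or "data"."""
    head = data[:limit]
    try:
        head.decode("ascii")
    except UnicodeDecodeError:
        return "data"  # bytes outside ASCII: do not call it text
    tokens = set(head.split())  # split on ASCII whitespace
    for keyword, description in KEYWORDS:
        if keyword in tokens:
            return description
    return "ASCII text"
```

A single keyword is weak evidence, which is why these tests run last.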
149: .Pp
150: The options are as follows:
1.11 aaron 151: .Bl -tag -width Ds
1.8 aaron 152: .It Fl v
1.1 deraadt 153: Print the version of the program and exit.
1.8 aaron 154: .It Fl m Ar list
155: Specify an alternate
156: .Ar list
157: of files containing magic numbers.
1.2 deraadt 158: This can be a single file, or a colon-separated list of files.
1.8 aaron 159: .It Fl z
1.1 deraadt 160: Try to look inside compressed files.
1.17 millert 161: .It Fl b
162: Do not prepend filenames to output lines (brief mode).
1.8 aaron 163: .It Fl c
1.1 deraadt 164: Cause a checking printout of the parsed form of the magic file.
1.6 aaron 165: This is usually used in conjunction with
1.8 aaron 166: .Fl m
1.1 deraadt 167: to debug a new magic file before installing it.
1.8 aaron 168: .It Fl f Ar namefile
1.6 aaron 169: Read the names of the files to be examined from
1.8 aaron 170: .Ar namefile
1.6 aaron 171: (one per line)
1.1 deraadt 172: before the argument list.
1.6 aaron 173: Either
1.8 aaron 174: .Ar namefile
1.1 deraadt 175: or at least one filename argument must be present;
1.8 aaron 176: to test the standard input, use
177: .Dq -
178: as a filename argument.
179: .It Fl L
180: Cause symlinks to be followed, as the like-named option in
181: .Xr ls 1
1.1 deraadt 182: (on systems that support symbolic links).
1.8 aaron 183: .El
184: .Sh ENVIRONMENT
185: .Bl -tag -width indent
1.13 smart 186: .It Ev MAGIC
1.8 aaron 187: Default magic number files.
188: .El
1.12 aaron 189: .Sh FILES
190: .Bl -tag -width /etc/magic -compact
191: .It Pa /etc/magic
192: default list of magic numbers
193: .El
1.8 aaron 194: .Sh SEE ALSO
195: .Xr hexdump 1 ,
196: .Xr od 1 ,
197: .Xr strings 1 ,
198: .Xr magic 5
199: .Sh STANDARDS CONFORMANCE
1.1 deraadt 200: This program is believed to exceed the System V Interface Definition
201: of FILE(CMD), as near as one can determine from the vague language
1.6 aaron 202: contained therein.
1.1 deraadt 203: Its behaviour is mostly compatible with the System V program of the same name.
204: This version knows more magic, however, so it will produce
1.6 aaron 205: different (albeit more accurate) output in many cases.
1.8 aaron 206: .Pp
1.6 aaron 207: The one significant difference
1.1 deraadt 208: between this version and System V
1.8 aaron 209: is that this version treats any white space
1.1 deraadt 210: as a delimiter, so that spaces in pattern strings must be escaped.
211: For example,
1.8 aaron 212: .Pp
213: >10 string language impress\ (imPRESS data)
214: .Pp
1.1 deraadt 215: in an existing magic file would have to be changed to
1.8 aaron 216: .Pp
217: >10 string language\e impress (imPRESS data)
218: .Pp
1.1 deraadt 219: In addition, in this version, if a pattern string contains a backslash,
1.9 aaron 220: it must be escaped.
221: For example,
1.8 aaron 222: .Pp
223: 0 string \ebegindata Andrew Toolkit document
224: .Pp
1.1 deraadt 225: in an existing magic file would have to be changed to
1.8 aaron 226: .Pp
227: 0 string \e\ebegindata Andrew Toolkit document
228: .Pp
1.1 deraadt 229: SunOS releases 3.2 and later from Sun Microsystems include a
1.8 aaron 230: .Xr file 1
1.1 deraadt 231: command derived from the System V one, but with some extensions.
232: My version differs from Sun's only in minor ways.
1.8 aaron 233: It includes the extension of the
234: .Ql &
235: operator, used as,
1.1 deraadt 236: for example,
1.8 aaron 237: .Pp
238: >16 long&0x7fffffff >0 not stripped
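One way to read the entry above: take the long at offset 16, AND it with 0x7fffffff, and report "not stripped" if the result is greater than 0.
The sketch below illustrates that reading only. It assumes a 4-byte long in the native byte order of the machine running it, and the function name masked_long_greater is hypothetical.

```python
import struct


def masked_long_greater(data, offset, mask, threshold):
    """Read a 4-byte unsigned value at `offset` in native byte order,
    apply `mask`, and test whether the result exceeds `threshold`."""
    if len(data) < offset + 4:
        return False  # not enough data to hold the value
    (value,) = struct.unpack_from("=L", data, offset)
    return (value & mask) > threshold


# Usage mirroring the entry: offset 16, mask 0x7fffffff, compare with 0.
sample = bytes(16) + struct.pack("=L", 5)
print(masked_long_greater(sample, 16, 0x7FFFFFFF, 0))
```

The mask clears the top bit before the comparison, so a value with only that bit set does not match.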
239: .Sh MAGIC DIRECTORY
1.1 deraadt 240: The magic file entries have been collected from various sources,
241: mainly USENET, and contributed by various authors.
1.8 aaron 242: .An Christos Zoulas
243: (address below) will collect additional
1.1 deraadt 244: or corrected magic file entries.
1.6 aaron 245: A consolidation of magic file entries
1.1 deraadt 246: will be distributed periodically.
247: The order of entries in the magic file is significant.
248: Depending on what system you are using, the order that
249: they are put together may be incorrect.
250: If your old
1.8 aaron 251: .Nm
1.1 deraadt 252: command uses a magic file,
253: keep the old magic file around for comparison purposes
1.6 aaron 254: (rename it to
1.8 aaron 255: .Pa /etc/magic.orig ) .
256: .Sh HISTORY
1.6 aaron 257: There has been a
1.8 aaron 258: .Nm
259: command in every
260: .Ux
1.16 mickey 261: since at least Research Version 4
262: (man page dated November, 1973).
1.1 deraadt 263: The System V version introduced one significant change:
264: the external list of magic number types.
265: This slowed the program down slightly but made it a lot more flexible.
1.8 aaron 266: .Pp
1.10 ian 267: This program, based on the System V version, was written by
268: .An Ian F. Darwin Aq ian@darwinisys.com
1.8 aaron 269: without looking at anybody else's source code.
270: .Pp
271: .An John Gilmore
272: revised the code extensively, making it better than
1.1 deraadt 273: the first version.
1.8 aaron 274: .An Geoff Collyer
275: found several inadequacies
1.1 deraadt 276: and provided some magic file entries.
1.8 aaron 277: .Pp
278: Altered by
279: .An Rob McMahon Aq cudcv@warwick.ac.uk ,
280: 1989, to extend the
281: .Ql &
282: operator from simple
283: .Dq x&y != 0
284: to
285: .Dq x&y op z .
286: .Pp
287: Altered by
288: .An Guy Harris Aq guy@auspex.com ,
289: 1993, to:
290: .Bl -item -offset indent
291: .It
292: put the
293: .Dq old-style
294: .Ql &
295: operator back the way it was, because
296: .Bl -enum -offset indent
297: .It
298: Rob McMahon's change broke the
299: previous style of usage,
300: .It
301: The SunOS
302: .Dq new-style
303: .Ql &
304: operator, which this version of
305: .Nm
306: supports, also handles
307: .Dq x&y op z ,
308: .It
309: Rob's change wasn't documented in any case;
310: .El
311: .It
312: put in multiple levels of
313: .Ql > ;
314: .It
315: put in
316: .Dq beshort ,
317: .Dq leshort ,
318: etc. keywords to look at numbers in the
1.1 deraadt 319: file in a specific byte order, rather than in the native byte order of
320: the process running
1.8 aaron 321: .Nm file .
322: .El
323: .Pp
1.10 ian 324: Currently maintained by
325: .An Christos Zoulas Aq christos@zoulas.com .
1.8 aaron 326: .Sh LEGAL NOTICE
1.10 ian 327: Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
328: Covered by the standard Berkeley Software Distribution copyright; see the file
329: LEGAL.NOTICE in the distribution.
1.8 aaron 330: .Pp
1.1 deraadt 331: The files
1.8 aaron 332: .Pa tar.h
1.1 deraadt 333: and
1.8 aaron 334: .Pa is_tar.c
335: were written by
336: .An John Gilmore
337: from his public-domain
338: .Nm tar
1.10 ian 339: program.
1.8 aaron 340: .Sh BUGS
1.1 deraadt 341: There must be a better way to automate the construction of the Magic
1.8 aaron 342: file from all the glop in Magdir.
343: What is it?
1.1 deraadt 344: Better yet, the magic file should be compiled into binary (say,
1.8 aaron 345: .Xr ndbm 3
1.4 millert 346: or, better yet, fixed-length
1.8 aaron 347: .Tn ASCII
1.4 millert 348: strings for use in heterogeneous network environments) for faster startup.
1.1 deraadt 349: Then the program would run as fast as the Version 7 program of the same name,
350: with the flexibility of the System V version.
1.8 aaron 351: .Pp
352: .Nm
1.15 pjanzen 353: uses several algorithms that favor speed over accuracy;
1.4 millert 354: thus it can be misled about the contents of
1.8 aaron 355: .Tn ASCII
1.4 millert 356: files.
1.8 aaron 357: .Pp
1.4 millert 358: The support for
1.8 aaron 359: .Tn ASCII
1.4 millert 360: files (primarily for programming languages)
1.1 deraadt 361: is simplistic, inefficient and requires recompilation to update.
1.8 aaron 362: .Pp
363: There should be an
364: .Dq else
365: clause to follow a series of continuation lines.
366: .Pp
1.1 deraadt 367: The magic file and keywords should have regular expression support.
1.4 millert 368: Their use of
1.8 aaron 369: .Tn ASCII TAB
1.4 millert 370: as a field delimiter is ugly and makes
1.1 deraadt 371: it hard to edit the files, but is entrenched.
1.8 aaron 372: .Pp
1.1 deraadt 373: It might be advisable to allow upper-case letters in keywords
1.4 millert 374: for, e.g.,
1.8 aaron 375: .Xr troff 1
1.4 millert 376: commands vs man page macros.
1.1 deraadt 377: Regular expression support would make this easy.
1.8 aaron 378: .Pp
1.1 deraadt 379: The program doesn't grok \s-2FORTRAN\s0.
1.6 aaron 380: It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which
1.1 deraadt 381: appear indented at the start of line.
382: Regular expression support would make this easy.
1.8 aaron 383: .Pp
1.6 aaron 384: The list of keywords in
1.8 aaron 385: .Em ascmagic
1.1 deraadt 386: probably belongs in the Magic file.
1.8 aaron 387: This could be done by using some keyword like
388: .Ql *
389: for the offset value.
390: .Pp
391: Another optimization would be to sort
1.1 deraadt 392: the magic file so that we can just run down all the
393: tests for the first byte, first word, first long, etc., once we
1.9 aaron 394: have fetched it.
395: Complain about conflicts in the magic file entries.
1.1 deraadt 396: Make a rule that the magic entries sort based on file offset rather
397: than position within the magic file?
1.8 aaron 398: .Pp
1.6 aaron 399: The program should provide a way to give an estimate
1.8 aaron 400: of
401: .Dq how good
402: a guess is.
403: We end up removing guesses (e.g.,
404: .Dq From\
405: as first 5 chars of file) because
406: they are not as good as other guesses (e.g.,
407: .Dq Newsgroups:
408: versus
409: .Qq Return-Path: ) .
410: Still, if the others don't pan out, it should be
1.6 aaron 411: possible to use the first guess.
1.8 aaron 412: .Pp
413: This program is slower than some vendors'
414: .Nm
415: commands.
416: .Pp
1.1 deraadt 417: This manual page, and particularly this section, is too long.
1.8 aaron 418: .Sh AVAILABILITY
1.1 deraadt 419: You can obtain the original author's latest version by anonymous FTP
1.8 aaron 420: on
1.15 pjanzen 421: .Em ftp.astron.com
1.8 aaron 422: in the directory
423: .Pa /pub/file/file-X.YY.tar.gz .