Annotation of src/usr.bin/file/file.1, Revision 1.22
1.22 ! jaredy 1: .\" $OpenBSD: file.1,v 1.21 2003/06/13 18:31:14 deraadt Exp $
1.8 aaron 2: .\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
1.18 jmc 3: .\"
1.19 ian 4: .\" Copyright (c) Ian F. Darwin 1986-1995.
5: .\" Software written by Ian F. Darwin and others;
6: .\" maintained 1995-present by Christos Zoulas and others.
1.20 jmc 7: .\"
1.19 ian 8: .\" Redistribution and use in source and binary forms, with or without
9: .\" modification, are permitted provided that the following conditions
10: .\" are met:
11: .\" 1. Redistributions of source code must retain the above copyright
12: .\" notice immediately at the beginning of the file, without modification,
13: .\" this list of conditions, and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
1.20 jmc 17: .\"
1.19 ian 18: .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
19: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
22: .\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
24: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
25: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
26: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
27: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
28: .\" SUCH DAMAGE.
1.18 jmc 29: .\"
1.8 aaron 30: .Dd July 30, 1997
31: .Dt FILE 1
32: .Os
33: .Sh NAME
34: .Nm file
35: .Nd determine file type
36: .Sh SYNOPSIS
37: .Nm file
1.17 millert 38: .Op Fl vbczL
1.8 aaron 39: .Op Fl f Ar namefile
40: .Op Fl m Ar magicfiles
41: .Ar file Op Ar ...
42: .Sh DESCRIPTION
1.22 ! jaredy 43: The
1.8 aaron 44: .Nm
1.22 ! jaredy 45: utility
1.1 deraadt 46: tests each argument in an attempt to classify it.
47: There are three sets of tests, performed in this order:
48: filesystem tests, magic number tests, and language tests.
1.8 aaron 49: The first test that succeeds causes the file type to be printed.
50: .Pp
1.1 deraadt 51: The type printed will usually contain one of the words
1.8 aaron 52: .Dq text
1.4 millert 53: (the file contains only
1.8 aaron 54: .Tn ASCII
1.4 millert 55: characters and is probably safe to read on an
1.8 aaron 56: .Tn ASCII
1.4 millert 57: terminal),
1.8 aaron 58: .Dq executable
1.1 deraadt 59: (the file contains the result of compiling a program
1.8 aaron 60: in a form understandable to some
61: .Ux
62: kernel or another),
1.1 deraadt 63: or
1.8 aaron 64: .Dq data
65: meaning anything else (data is usually binary or non-printable).
66: .Pp
1.1 deraadt 67: Exceptions are well-known file formats (core files, tar archives)
68: that are known to contain binary data.
69: When modifying the file
1.8 aaron 70: .Pa /etc/magic
1.6 aaron 71: or the program itself,
1.8 aaron 72: .Em "preserve these keywords" .
73: .Pp
1.1 deraadt 74: People depend on knowing that all the readable files in a directory
1.8 aaron 75: have the word
76: .Dq text
77: printed.
78: Don't do as Berkeley did and change
79: .Dq shell commands text
80: to
81: .Dq shell script .
82: .Pp
1.1 deraadt 83: The filesystem tests are based on examining the return from a
1.8 aaron 84: .Xr stat 2
1.1 deraadt 85: system call.
86: The program checks to see if the file is empty,
87: or if it's some sort of special file.
88: Any known file types appropriate to the system you are running on
89: (sockets, symbolic links, or named pipes (FIFOs) on those systems that
90: implement them)
91: are intuited if they are defined in
92: the system header file
1.9 aaron 93: .Aq Pa sys/stat.h .
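.Pp
The filesystem tests described above can be sketched roughly as follows.
This is a minimal illustration in Python, not the program's actual code:
the function name and the returned strings are assumptions made for the example.
It inspects only the data returned by the stat family of calls and returns
nothing when the file is an ordinary, non-empty file, so that later tests can run.

```python
import os
import stat


def classify_by_stat(path):
    """Return a description from lstat(2) data alone, or None to fall through."""
    st = os.lstat(path)  # lstat so a symbolic link is reported, not followed
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None  # regular, non-empty file: go on to the magic number tests
```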
1.8 aaron 94: .Pp
1.1 deraadt 95: The magic number tests are used to check for files with data in
96: particular fixed formats.
97: The canonical example of this is a binary executable (compiled program)
1.8 aaron 98: .Pa a.out
1.6 aaron 99: file, whose format is defined in
1.8 aaron 100: .Aq Pa a.out.h
1.1 deraadt 101: and possibly
1.8 aaron 102: .Aq Pa exec.h
1.1 deraadt 103: in the standard include directory.
1.8 aaron 104: These files have a
105: .Dq magic number
106: stored in a particular place
107: near the beginning of the file that tells the
108: .Ux
109: operating system
1.1 deraadt 110: that the file is a binary executable, and which of several types thereof.
1.8 aaron 111: .Pp
112: The concept of magic number has been applied by extension to data files.
1.1 deraadt 113: Any file with some invariant identifier at a small fixed
114: offset into the file can usually be described in this way.
115: The information in these files is read from the magic file
1.8 aaron 116: .Pa /etc/magic .
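.Pp
As a rough illustration of the idea, a magic number test compares a few bytes
at a fixed offset against a known value.
The sketch below is an assumption-laden Python example: the table layout and
the names MAGIC_TABLE and match_magic are invented for illustration and are
not the syntax of the magic file.
The three signatures shown (gzip, ELF, and the ustar marker of tar archives)
are well-known values chosen only as examples.

```python
# Illustrative entries only: (offset, expected bytes, description).
# This tuple layout is an assumption for the sketch, not the magic(5) syntax.
MAGIC_TABLE = [
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"\x7fELF", "ELF"),
    (257, b"ustar", "tar archive"),
]


def match_magic(data, table=MAGIC_TABLE):
    """Return the description of the first entry whose bytes match, else None."""
    for offset, expected, description in table:
        if data[offset:offset + len(expected)] == expected:
            return description
    return None
```

Because the first matching entry wins, the order of the table matters.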
117: .Pp
1.1 deraadt 118: If an argument appears to be an
1.8 aaron 119: .Tn ASCII
1.1 deraadt 120: file,
1.8 aaron 121: .Nm
1.1 deraadt 122: attempts to guess its language.
1.4 millert 123: The language tests look for particular strings (cf
1.8 aaron 124: .Pa names.h )
1.1 deraadt 125: that can appear anywhere in the first few blocks of a file.
126: For example, the keyword
1.8 aaron 127: .Em .br
1.4 millert 128: indicates that the file is most likely a
1.8 aaron 129: .Xr troff 1
1.6 aaron 130: input file, just as the keyword
1.8 aaron 131: .Li struct
1.1 deraadt 132: indicates a C program.
133: These tests are less reliable than the previous
134: two groups, so they are performed last.
135: The language test routines also test for some miscellany
1.6 aaron 136: (such as
1.8 aaron 137: .Xr tar 1
1.1 deraadt 138: archives) and determine whether an unknown file should be
1.8 aaron 139: labelled as
140: .Dq ASCII text
141: or
142: .Dq data .
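.Pp
The language tests can be pictured as a keyword lookup over the leading part
of the file.
The following Python sketch is only an approximation under stated assumptions:
the KEYWORDS table, the function name, and the returned strings are
hypothetical, and the check for non-text content is simplified to looking for
bytes outside the 7-bit range.

```python
# Hypothetical keyword table; the real list lives in names.h.
KEYWORDS = {
    b".br": "troff input text",
    b"struct": "C program text",
}


def guess_language(head):
    """Classify the leading bytes of a file that matched no magic entry."""
    # Simplification: any byte above 0x7F means the file is not plain text.
    if any(byte > 0x7F for byte in head):
        return "data"
    # Look for a known keyword among the whitespace-separated tokens.
    for token in head.split():
        if token in KEYWORDS:
            return KEYWORDS[token]
    return "ASCII text"
```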
143: .Pp
144: The options are as follows:
1.11 aaron 145: .Bl -tag -width Ds
1.8 aaron 146: .It Fl v
1.1 deraadt 147: Print the version of the program and exit.
1.8 aaron 148: .It Fl m Ar list
149: Specify an alternate
150: .Ar list
151: of files containing magic numbers.
1.2 deraadt 152: This can be a single file, or a colon-separated list of files.
1.8 aaron 153: .It Fl z
1.1 deraadt 154: Try to look inside compressed files.
1.17 millert 155: .It Fl b
156: Do not prepend filenames to output lines (brief mode).
1.8 aaron 157: .It Fl c
1.1 deraadt 158: Cause a checking printout of the parsed form of the magic file.
1.6 aaron 159: This is usually used in conjunction with
1.8 aaron 160: .Fl m
1.1 deraadt 161: to debug a new magic file before installing it.
1.8 aaron 162: .It Fl f Ar namefile
1.6 aaron 163: Read the names of the files to be examined from
1.8 aaron 164: .Ar namefile
1.6 aaron 165: (one per line)
1.1 deraadt 166: before the argument list.
1.6 aaron 167: Either
1.8 aaron 168: .Ar namefile
1.1 deraadt 169: or at least one filename argument must be present;
1.8 aaron 170: to test the standard input, use
171: .Dq -
172: as a filename argument.
173: .It Fl L
174: Cause symlinks to be followed, as the like-named option in
175: .Xr ls 1
1.1 deraadt 176: (on systems that support symbolic links).
1.8 aaron 177: .El
178: .Sh ENVIRONMENT
179: .Bl -tag -width indent
1.13 smart 180: .It Ev MAGIC
1.8 aaron 181: Default magic number files.
182: .El
1.12 aaron 183: .Sh FILES
184: .Bl -tag -width /etc/magic -compact
185: .It Pa /etc/magic
186: default list of magic numbers
187: .El
1.8 aaron 188: .Sh SEE ALSO
189: .Xr hexdump 1 ,
190: .Xr od 1 ,
191: .Xr strings 1 ,
192: .Xr magic 5
193: .Sh STANDARDS CONFORMANCE
1.1 deraadt 194: This program is believed to exceed the System V Interface Definition
195: of FILE(CMD), as near as one can determine from the vague language
1.6 aaron 196: contained therein.
1.1 deraadt 197: Its behaviour is mostly compatible with the System V program of the same name.
198: This version knows more magic, however, so it will produce
1.6 aaron 199: different (albeit more accurate) output in many cases.
1.8 aaron 200: .Pp
1.6 aaron 201: The one significant difference
1.1 deraadt 202: between this version and System V
1.8 aaron 203: is that this version treats any white space
1.1 deraadt 204: as a delimiter, so that spaces in pattern strings must be escaped.
205: For example,
1.8 aaron 206: .Pp
207: >10 string language impress\ (imPRESS data)
208: .Pp
1.1 deraadt 209: in an existing magic file would have to be changed to
1.8 aaron 210: .Pp
211: >10 string language\e impress (imPRESS data)
212: .Pp
1.1 deraadt 213: In addition, in this version, if a pattern string contains a backslash,
1.9 aaron 214: it must be escaped.
215: For example,
1.8 aaron 216: .Pp
217: 0 string \ebegindata Andrew Toolkit document
218: .Pp
1.1 deraadt 219: in an existing magic file would have to be changed to
1.8 aaron 220: .Pp
221: 0 string \e\ebegindata Andrew Toolkit document
222: .Pp
1.1 deraadt 223: SunOS releases 3.2 and later from Sun Microsystems include a
1.20 jmc 224: .Nm file
1.1 deraadt 225: command derived from the System V one, but with some extensions.
226: My version differs from Sun's only in minor ways.
1.8 aaron 227: It includes the extension of the
228: .Ql &
229: operator, used as,
1.1 deraadt 230: for example,
1.8 aaron 231: .Pp
232: >16 long&0x7fffffff >0 not stripped
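.Pp
Read literally, the entry above takes the long at offset 16, masks it with
0x7fffffff, and succeeds when the result is greater than zero.
A minimal Python sketch of that evaluation follows.
It assumes that a long is four bytes, unsigned, and in the native byte order
of the running process; the function name and its defaults are invented for
the example.

```python
import struct


def masked_long_test(data, offset=16, mask=0x7FFFFFFF):
    """Return True when (long at offset) & mask is greater than zero."""
    if len(data) < offset + 4:
        return False  # not enough data to read the value
    (value,) = struct.unpack_from("=I", data, offset)  # 4 bytes, native order
    return (value & mask) > 0
```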
233: .Sh MAGIC DIRECTORY
1.1 deraadt 234: The magic file entries have been collected from various sources,
235: mainly USENET, and contributed by various authors.
1.8 aaron 236: .An Christos Zoulas
237: (address below) will collect additional
1.1 deraadt 238: or corrected magic file entries.
1.6 aaron 239: A consolidation of magic file entries
1.1 deraadt 240: will be distributed periodically.
241: The order of entries in the magic file is significant.
242: Depending on what system you are using, the order in which
243: they are put together may be incorrect.
244: If your old
1.8 aaron 245: .Nm
1.1 deraadt 246: command uses a magic file,
247: keep the old magic file around for comparison purposes
1.6 aaron 248: (rename it to
1.8 aaron 249: .Pa /etc/magic.orig ) .
250: .Sh HISTORY
1.6 aaron 251: There has been a
1.8 aaron 252: .Nm
253: command in every
254: .Ux
1.16 mickey 255: since at least Research Version 4
256: (man page dated November, 1973).
1.1 deraadt 257: The System V version introduced one significant change:
258: the external list of magic number types.
259: This slowed the program down slightly but made it a lot more flexible.
1.8 aaron 260: .Pp
1.10 ian 261: This program, based on the System V version, was written by
262: .An Ian F. Darwin Aq ian@darwinisys.com
1.8 aaron 263: without looking at anybody else's source code.
264: .Pp
265: .An John Gilmore
266: revised the code extensively, making it better than
1.1 deraadt 267: the first version.
1.8 aaron 268: .An Geoff Collyer
269: found several inadequacies
1.1 deraadt 270: and provided some magic file entries.
1.8 aaron 271: .Pp
272: Altered by
273: .An Rob McMahon Aq cudcv@warwick.ac.uk ,
274: 1989, to extend the
275: .Ql &
276: operator from simple
277: .Dq x&y != 0
278: to
279: .Dq x&y op z .
280: .Pp
281: Altered by
282: .An Guy Harris Aq guy@auspex.com ,
283: 1993, to:
284: .Bl -item -offset indent
285: .It
286: put the
287: .Dq old-style
288: .Ql &
289: operator back the way it was, because
290: .Bl -enum -offset indent
291: .It
292: Rob McMahon's change broke the
293: previous style of usage,
294: .It
295: The SunOS
296: .Dq new-style
297: .Ql &
298: operator, which this version of
299: .Nm
300: supports, also handles
301: .Dq x&y op z ,
302: .It
303: Rob's change wasn't documented in any case;
304: .El
305: .It
306: put in multiple levels of
307: .Ql > ;
308: .It
309: put in
310: .Dq beshort ,
311: .Dq leshort ,
312: etc. keywords to look at numbers in the
1.1 deraadt 313: file in a specific byte order, rather than in the native byte order of
314: the process running
1.8 aaron 315: .Nm file .
316: .El
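.Pp
The effect of the byte-order keywords mentioned in the list above can be
shown with a short Python sketch.
It is an illustration only: the function name and the "be", "le", and
"native" selectors are assumptions made for the example, standing in for
keywords such as beshort and leshort and for the unqualified short.

```python
import struct


def read_short(data, offset, byte_order):
    """Read an unsigned 16-bit value; byte_order is "be", "le", or "native"."""
    fmt = {"be": ">H", "le": "<H", "native": "=H"}[byte_order]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value
```

The same two bytes give different values depending on the order requested,
which is why a magic entry that must match on every machine names the order
explicitly.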
317: .Pp
1.10 ian 318: Currently maintained by
319: .An Christos Zoulas Aq christos@zoulas.com .
1.8 aaron 320: .Sh LEGAL NOTICE
1.10 ian 321: Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
322: Covered by the standard Berkeley Software Distribution copyright; see the file
323: LEGAL.NOTICE in the distribution.
1.8 aaron 324: .Pp
1.1 deraadt 325: The files
1.8 aaron 326: .Pa tar.h
1.1 deraadt 327: and
1.8 aaron 328: .Pa is_tar.c
329: were written by
330: .An John Gilmore
331: from his public-domain
332: .Nm tar
1.10 ian 333: program.
1.8 aaron 334: .Sh BUGS
1.1 deraadt 335: There must be a better way to automate the construction of the Magic
1.8 aaron 336: file from all the glop in Magdir.
337: What is it?
1.1 deraadt 338: Better yet, the magic file should be compiled into binary (say,
1.8 aaron 339: .Xr ndbm 3
1.4 millert 340: or, better yet, fixed-length
1.8 aaron 341: .Tn ASCII
1.4 millert 342: strings for use in heterogeneous network environments) for faster startup.
1.1 deraadt 343: Then the program would run as fast as the Version 7 program of the same name,
344: with the flexibility of the System V version.
1.8 aaron 345: .Pp
346: .Nm
1.15 pjanzen 347: uses several algorithms that favor speed over accuracy;
1.4 millert 348: thus it can be misled about the contents of
1.8 aaron 349: .Tn ASCII
1.4 millert 350: files.
1.8 aaron 351: .Pp
1.4 millert 352: The support for
1.8 aaron 353: .Tn ASCII
1.4 millert 354: files (primarily for programming languages)
1.1 deraadt 355: is simplistic, inefficient and requires recompilation to update.
1.8 aaron 356: .Pp
357: There should be an
358: .Dq else
359: clause to follow a series of continuation lines.
360: .Pp
1.1 deraadt 361: The magic file and keywords should have regular expression support.
1.4 millert 362: Their use of
1.8 aaron 363: .Tn ASCII TAB
1.4 millert 364: as a field delimiter is ugly and makes
1.1 deraadt 365: it hard to edit the files, but is entrenched.
1.8 aaron 366: .Pp
1.1 deraadt 367: It might be advisable to allow upper-case letters in keywords
1.4 millert 368: for, e.g.,
1.8 aaron 369: .Xr troff 1
1.4 millert 370: commands vs man page macros.
1.1 deraadt 371: Regular expression support would make this easy.
1.8 aaron 372: .Pp
1.1 deraadt 373: The program doesn't grok \s-2FORTRAN\s0.
1.6 aaron 374: It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which
1.1 deraadt 375: appear indented at the start of line.
376: Regular expression support would make this easy.
1.8 aaron 377: .Pp
1.6 aaron 378: The list of keywords in
1.8 aaron 379: .Em ascmagic
1.1 deraadt 380: probably belongs in the Magic file.
1.8 aaron 381: This could be done by using some keyword like
382: .Ql *
383: for the offset value.
384: .Pp
385: Another optimization would be to sort
1.1 deraadt 386: the magic file so that we can just run down all the
387: tests for the first byte, first word, first long, etc., once we
1.9 aaron 388: have fetched it.
389: Complain about conflicts in the magic file entries.
1.1 deraadt 390: Make a rule that the magic entries sort based on file offset rather
391: than position within the magic file?
1.8 aaron 392: .Pp
1.6 aaron 393: The program should provide a way to give an estimate
1.8 aaron 394: of
395: .Dq how good
396: a guess is.
397: We end up removing guesses (e.g.,
1.20 jmc 398: .Dq From\ \&
1.8 aaron 399: as first 5 chars of file) because
400: they are not as good as other guesses (e.g.,
401: .Dq Newsgroups:
402: versus
403: .Qq Return-Path: ) .
404: Still, if the others don't pan out, it should be
1.6 aaron 405: possible to use the first guess.
1.8 aaron 406: .Pp
407: This program is slower than some vendors'
408: .Nm
409: commands.
410: .Pp
1.1 deraadt 411: This manual page, and particularly this section, is too long.
1.8 aaron 412: .Sh AVAILABILITY
1.1 deraadt 413: You can obtain the original author's latest version by anonymous FTP
1.8 aaron 414: on
1.15 pjanzen 415: .Em ftp.astron.com
1.8 aaron 416: in the directory
1.20 jmc 417: .Pa /pub/file/file-X.YY.tar.gz .