Annotation of src/usr.bin/file/file.1, Revision 1.21
1.21 ! deraadt 1: .\" $OpenBSD: file.1,v 1.20 2003/06/10 09:12:10 jmc Exp $
1.8 aaron 2: .\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
1.18 jmc 3: .\"
1.19 ian 4: .\" Copyright (c) Ian F. Darwin 1986-1995.
5: .\" Software written by Ian F. Darwin and others;
6: .\" maintained 1995-present by Christos Zoulas and others.
1.20 jmc 7: .\"
1.19 ian 8: .\" Redistribution and use in source and binary forms, with or without
9: .\" modification, are permitted provided that the following conditions
10: .\" are met:
11: .\" 1. Redistributions of source code must retain the above copyright
12: .\" notice immediately at the beginning of the file, without modification,
13: .\" this list of conditions, and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
1.20 jmc 17: .\"
1.19 ian 18: .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
19: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
22: .\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
24: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
25: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
26: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
27: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
28: .\" SUCH DAMAGE.
1.18 jmc 29: .\"
1.8 aaron 30: .Dd July 30, 1997
31: .Dt FILE 1
32: .Os
33: .Sh NAME
34: .Nm file
35: .Nd determine file type
36: .Sh SYNOPSIS
37: .Nm file
1.17 millert 38: .Op Fl vbczL
1.8 aaron 39: .Op Fl f Ar namefile
40: .Op Fl m Ar magicfiles
41: .Ar file Op Ar ...
42: .Sh DESCRIPTION
1.4 millert 43: This manual page documents version 3.22 of the
1.8 aaron 44: .Nm
1.4 millert 45: command.
1.8 aaron 46: .Nm
1.1 deraadt 47: tests each argument in an attempt to classify it.
48: There are three sets of tests, performed in this order:
49: filesystem tests, magic number tests, and language tests.
1.8 aaron 50: The first test that succeeds causes the file type to be printed.
51: .Pp
1.1 deraadt 52: The type printed will usually contain one of the words
1.8 aaron 53: .Dq text
1.4 millert 54: (the file contains only
1.8 aaron 55: .Tn ASCII
1.4 millert 56: characters and is probably safe to read on an
1.8 aaron 57: .Tn ASCII
1.4 millert 58: terminal),
1.8 aaron 59: .Dq executable
1.1 deraadt 60: (the file contains the result of compiling a program
1.8 aaron 61: in a form understandable to some
62: .Ux
63: kernel or another),
1.1 deraadt 64: or
1.8 aaron 65: .Dq data
66: meaning anything else (data is usually binary or non-printable).
67: .Pp
1.1 deraadt 68: Exceptions are well-known file formats (core files, tar archives)
69: that are known to contain binary data.
70: When modifying the file
1.8 aaron 71: .Pa /etc/magic
1.6 aaron 72: or the program itself,
1.8 aaron 73: .Em "preserve these keywords" .
74: .Pp
1.1 deraadt 75: People depend on knowing that all the readable files in a directory
1.8 aaron 76: have the word
77: .Dq text
78: printed.
79: Don't do as Berkeley did and change
80: .Dq shell commands text
81: to
82: .Dq shell script .
83: .Pp
1.1 deraadt 84: The filesystem tests are based on examining the return from a
1.8 aaron 85: .Xr stat 2
1.1 deraadt 86: system call.
87: The program checks to see if the file is empty,
88: or if it's some sort of special file.
89: Any known file types appropriate to the system you are running on
90: (sockets, symbolic links, or named pipes (FIFOs) on those systems that
91: implement them)
92: are intuited if they are defined in
93: the system header file
1.9 aaron 94: .Aq Pa sys/stat.h .
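
The filesystem tests described above can be sketched roughly as follows. This is an illustration in Python, not the code that file itself uses; the function name classify_by_stat and the returned strings are hypothetical.

```python
import os
import stat


def classify_by_stat(path):
    """Return a coarse type from lstat() metadata alone, or None.

    Hypothetical helper: the names and strings are illustrative only.
    """
    st = os.lstat(path)  # lstat reports a symbolic link itself, not its target
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    # Regular, non-empty file: nothing decided here, so the magic
    # number tests would run next.
    return None
```

A result of None corresponds to the case where the filesystem tests do not succeed and the next set of tests is tried.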
1.8 aaron 95: .Pp
1.1 deraadt 96: The magic number tests are used to check for files with data in
97: particular fixed formats.
98: The canonical example of this is a binary executable (compiled program)
1.8 aaron 99: .Pa a.out
1.6 aaron 100: file, whose format is defined in
1.8 aaron 101: .Aq Pa a.out.h
1.1 deraadt 102: and possibly
1.8 aaron 103: .Aq Pa exec.h
1.1 deraadt 104: in the standard include directory.
1.8 aaron 105: These files have a
106: .Dq magic number
107: stored in a particular place
108: near the beginning of the file that tells the
109: .Ux
110: operating system
1.1 deraadt 111: that the file is a binary executable, and which of several types thereof.
1.8 aaron 112: .Pp
113: The concept of magic number has been applied by extension to data files.
1.1 deraadt 114: Any file with some invariant identifier at a small fixed
115: offset into the file can usually be described in this way.
116: The information identifying these files is read from the magic file
1.8 aaron 117: .Pa /etc/magic .
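
A minimal sketch of a magic number test, assuming a small in-memory table in place of /etc/magic. The table MAGIC_TABLE and the function match_magic are hypothetical; the three signatures are well-known ones chosen for illustration and are not copied from any particular magic file.

```python
# Illustrative table only: (offset, expected bytes, description).
# Entries are tried in order, so the first match wins.
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF executable"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "POSIX tar archive"),
]


def match_magic(data, table=MAGIC_TABLE):
    """Return the description of the first entry whose bytes match, else None."""
    for offset, expected, description in table:
        # Slicing past the end of data yields a short slice, which
        # simply fails the comparison.
        if data[offset:offset + len(expected)] == expected:
            return description
    return None
```

Each entry is an invariant identifier at a small fixed offset, which is the property the paragraph above relies on.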
118: .Pp
1.1 deraadt 119: If an argument appears to be an
1.8 aaron 120: .Tn ASCII
1.1 deraadt 121: file,
1.8 aaron 122: .Nm
1.1 deraadt 123: attempts to guess its language.
1.4 millert 124: The language tests look for particular strings (cf
1.8 aaron 125: .Pa names.h )
1.1 deraadt 126: that can appear anywhere in the first few blocks of a file.
127: For example, the keyword
1.8 aaron 128: .Em .br
1.4 millert 129: indicates that the file is most likely a
1.8 aaron 130: .Xr troff 1
1.6 aaron 131: input file, just as the keyword
1.8 aaron 132: .Li struct
1.1 deraadt 133: indicates a C program.
134: These tests are less reliable than the previous
135: two groups, so they are performed last.
136: The language test routines also test for some miscellany
1.6 aaron 137: (such as
1.8 aaron 138: .Xr tar 1
1.1 deraadt 139: archives) and determine whether an unknown file should be
1.8 aaron 140: labelled as
141: .Dq ASCII text
142: or
143: .Dq data .
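
The language tests can be pictured as a keyword lookup over the start of the file. The sketch below is a simplification under stated assumptions: the KEYWORDS list stands in for names.h, the 8192-byte limit stands in for "the first few blocks", and guess_language and its output strings are hypothetical.

```python
# Hypothetical stand-in for the keyword list in names.h.
KEYWORDS = [
    (".br", "troff input text"),
    ("struct", "C program text"),
]


def guess_language(data, limit=8192):
    """Guess a language from whitespace-separated tokens near the start."""
    head = data[:limit]
    # Any NUL or non-ASCII byte means this is not treated as ASCII text.
    if any(b == 0 or b > 0x7F for b in head):
        return "data"
    tokens = head.decode("ascii").split()
    for keyword, description in KEYWORDS:
        if keyword in tokens:
            return description
    return "ASCII text"
```

A single keyword is weak evidence, which is why these tests are described above as less reliable than the other two groups.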
144: .Pp
145: The options are as follows:
1.11 aaron 146: .Bl -tag -width Ds
1.8 aaron 147: .It Fl v
1.1 deraadt 148: Print the version of the program and exit.
1.8 aaron 149: .It Fl m Ar list
150: Specify an alternate
151: .Ar list
152: of files containing magic numbers.
1.2 deraadt 153: This can be a single file, or a colon-separated list of files.
1.8 aaron 154: .It Fl z
1.1 deraadt 155: Try to look inside compressed files.
1.17 millert 156: .It Fl b
157: Do not prepend filenames to output lines (brief mode).
1.8 aaron 158: .It Fl c
1.1 deraadt 159: Cause a checking printout of the parsed form of the magic file.
1.6 aaron 160: This is usually used in conjunction with
1.8 aaron 161: .Fl m
1.1 deraadt 162: to debug a new magic file before installing it.
1.8 aaron 163: .It Fl f Ar namefile
1.6 aaron 164: Read the names of the files to be examined from
1.8 aaron 165: .Ar namefile
1.6 aaron 166: (one per line)
1.1 deraadt 167: before the argument list.
1.6 aaron 168: Either
1.8 aaron 169: .Ar namefile
1.1 deraadt 170: or at least one filename argument must be present;
1.8 aaron 171: to test the standard input, use
172: .Dq -
173: as a filename argument.
174: .It Fl L
175: Cause symlinks to be followed, as with the like-named option in
176: .Xr ls 1
1.1 deraadt 177: (on systems that support symbolic links).
1.8 aaron 178: .El
179: .Sh ENVIRONMENT
180: .Bl -tag -width indent
1.13 smart 181: .It Ev MAGIC
1.8 aaron 182: Default magic number files.
183: .El
1.12 aaron 184: .Sh FILES
185: .Bl -tag -width /etc/magic -compact
186: .It Pa /etc/magic
187: default list of magic numbers
188: .El
1.8 aaron 189: .Sh SEE ALSO
190: .Xr hexdump 1 ,
191: .Xr od 1 ,
192: .Xr strings 1 ,
193: .Xr magic 5
194: .Sh STANDARDS CONFORMANCE
1.1 deraadt 195: This program is believed to exceed the System V Interface Definition
196: of FILE(CMD), as near as one can determine from the vague language
1.6 aaron 197: contained therein.
1.1 deraadt 198: Its behaviour is mostly compatible with the System V program of the same name.
199: This version knows more magic, however, so it will produce
1.6 aaron 200: different (albeit more accurate) output in many cases.
1.8 aaron 201: .Pp
1.6 aaron 202: The one significant difference
1.1 deraadt 203: between this version and System V
1.8 aaron 204: is that this version treats any white space
1.1 deraadt 205: as a delimiter, so that spaces in pattern strings must be escaped.
206: For example,
1.8 aaron 207: .Pp
208: >10 string language impress\ (imPRESS data)
209: .Pp
1.1 deraadt 210: in an existing magic file would have to be changed to
1.8 aaron 211: .Pp
212: >10 string language\e impress (imPRESS data)
213: .Pp
1.1 deraadt 214: In addition, in this version, if a pattern string contains a backslash,
1.9 aaron 215: it must be escaped.
216: For example,
1.8 aaron 217: .Pp
218: 0 string \ebegindata Andrew Toolkit document
219: .Pp
1.1 deraadt 220: in an existing magic file would have to be changed to
1.8 aaron 221: .Pp
222: 0 string \e\ebegindata Andrew Toolkit document
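
The two escaping rules above can be illustrated with a small field splitter. This is a sketch under simplifying assumptions, not the parser that file uses: it treats a backslash as making the next character literal and ignores other escape forms; split_magic_line is a hypothetical name.

```python
def split_magic_line(line):
    """Split a magic entry into ([offset, type, test], message).

    Any unescaped white space ends a field; a backslash makes the
    following character literal (simplified handling of escapes).
    """
    fields, current, i = [], [], 0
    while i < len(line) and len(fields) < 3:
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i + 1])  # keep the escaped character as-is
            i += 2
        elif ch.isspace():
            if current:  # white space closes the field being built
                fields.append("".join(current))
                current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    if current:  # line ended in the middle of a field
        fields.append("".join(current))
    return fields, line[i:].strip()
```

With this splitter, an unescaped space inside the pattern would end the third field early, which is why the space in the example above has to be escaped.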
223: .Pp
1.1 deraadt 224: SunOS releases 3.2 and later from Sun Microsystems include a
1.20 jmc 225: .Nm file
1.1 deraadt 226: command derived from the System V one, but with some extensions.
227: My version differs from Sun's only in minor ways.
1.8 aaron 228: It includes the extension of the
229: .Ql &
230: operator, used as,
1.1 deraadt 231: for example,
1.8 aaron 232: .Pp
233: >16 long&0x7fffffff >0 not stripped
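
The entry above is read here as: take the long at offset 16, AND it with 0x7fffffff, and test whether the result is greater than 0. A rough sketch of that evaluation, assuming a 4-byte long in the native byte order of the running process; masked_long_gt is a hypothetical name.

```python
import struct


def masked_long_gt(data, offset, mask, threshold):
    """Return True if (long at offset) & mask is greater than threshold."""
    if len(data) < offset + 4:
        return False  # not enough data to hold a 4-byte value at offset
    # "=I" is a 4-byte unsigned integer in native byte order.
    (value,) = struct.unpack_from("=I", data, offset)
    return (value & mask) > threshold
```

For the entry shown, the call would be masked_long_gt(data, 16, 0x7fffffff, 0).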
234: .Sh MAGIC DIRECTORY
1.1 deraadt 235: The magic file entries have been collected from various sources,
236: mainly USENET, and contributed by various authors.
1.8 aaron 237: .An Christos Zoulas
238: (address below) will collect additional
1.1 deraadt 239: or corrected magic file entries.
1.6 aaron 240: A consolidation of magic file entries
1.1 deraadt 241: will be distributed periodically.
242: The order of entries in the magic file is significant.
243: Depending on what system you are using, the order in which
244: they are put together may be incorrect.
245: If your old
1.8 aaron 246: .Nm
1.1 deraadt 247: command uses a magic file,
248: keep the old magic file around for comparison purposes
1.6 aaron 249: (rename it to
1.8 aaron 250: .Pa /etc/magic.orig ) .
251: .Sh HISTORY
1.6 aaron 252: There has been a
1.8 aaron 253: .Nm
254: command in every
255: .Ux
1.16 mickey 256: since at least Research Version 4
257: (man page dated November, 1973).
1.1 deraadt 258: The System V version introduced one significant change:
259: the external list of magic number types.
260: This slowed the program down slightly but made it a lot more flexible.
1.8 aaron 261: .Pp
1.10 ian 262: This program, based on the System V version, was written by
263: .An Ian F. Darwin Aq ian@darwinisys.com
1.8 aaron 264: without looking at anybody else's source code.
265: .Pp
266: .An John Gilmore
267: revised the code extensively, making it better than
1.1 deraadt 268: the first version.
1.8 aaron 269: .An Geoff Collyer
270: found several inadequacies
1.1 deraadt 271: and provided some magic file entries.
1.8 aaron 272: .Pp
273: Altered by
274: .An Rob McMahon Aq cudcv@warwick.ac.uk ,
275: 1989, to extend the
276: .Ql &
277: operator from simple
278: .Dq x&y != 0
279: to
280: .Dq x&y op z .
281: .Pp
282: Altered by
283: .An Guy Harris Aq guy@auspex.com ,
284: 1993, to:
285: .Bl -item -offset indent
286: .It
287: put the
288: .Dq old-style
289: .Ql &
290: operator back the way it was, because
291: .Bl -enum -offset indent
292: .It
293: Rob McMahon's change broke the
294: previous style of usage,
295: .It
296: The SunOS
297: .Dq new-style
298: .Ql &
299: operator, which this version of
300: .Nm
301: supports, also handles
302: .Dq x&y op z ,
303: .It
304: Rob's change wasn't documented in any case;
305: .El
306: .It
307: put in multiple levels of
308: .Ql > ;
309: .It
310: put in
311: .Dq beshort ,
312: .Dq leshort ,
313: etc. keywords to look at numbers in the
1.1 deraadt 314: file in a specific byte order, rather than in the native byte order of
315: the process running
1.8 aaron 316: .Nm file .
317: .El
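
The difference between the byte-order keywords mentioned above can be shown with a short sketch, assuming a short is 2 bytes. The function names read_beshort and read_leshort are hypothetical and only mirror the keyword names.

```python
import struct


def read_beshort(data, offset=0):
    """Read a 2-byte unsigned value stored most significant byte first."""
    return struct.unpack_from(">H", data, offset)[0]


def read_leshort(data, offset=0):
    """Read a 2-byte unsigned value stored least significant byte first."""
    return struct.unpack_from("<H", data, offset)[0]
```

The same two bytes give different numbers under each reading, so an entry that names its byte order gives the same result whatever machine runs the test.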
318: .Pp
1.10 ian 319: Currently maintained by
320: .An Christos Zoulas Aq christos@zoulas.com .
1.8 aaron 321: .Sh LEGAL NOTICE
1.10 ian 322: Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
323: Covered by the standard Berkeley Software Distribution copyright; see the file
324: LEGAL.NOTICE in the distribution.
1.8 aaron 325: .Pp
1.1 deraadt 326: The files
1.8 aaron 327: .Pa tar.h
1.1 deraadt 328: and
1.8 aaron 329: .Pa is_tar.c
330: were written by
331: .An John Gilmore
332: from his public-domain
333: .Nm tar
1.10 ian 334: program.
1.8 aaron 335: .Sh BUGS
1.1 deraadt 336: There must be a better way to automate the construction of the Magic
1.8 aaron 337: file from all the glop in Magdir.
338: What is it?
1.1 deraadt 339: Better yet, the magic file should be compiled into binary (say,
1.8 aaron 340: .Xr ndbm 3
1.4 millert 341: or, better yet, fixed-length
1.8 aaron 342: .Tn ASCII
1.4 millert 343: strings for use in heterogeneous network environments) for faster startup.
1.1 deraadt 344: Then the program would run as fast as the Version 7 program of the same name,
345: with the flexibility of the System V version.
1.8 aaron 346: .Pp
347: .Nm
1.15 pjanzen 348: uses several algorithms that favor speed over accuracy;
1.4 millert 349: thus it can be misled about the contents of
1.8 aaron 350: .Tn ASCII
1.4 millert 351: files.
1.8 aaron 352: .Pp
1.4 millert 353: The support for
1.8 aaron 354: .Tn ASCII
1.4 millert 355: files (primarily for programming languages)
1.1 deraadt 356: is simplistic, inefficient and requires recompilation to update.
1.8 aaron 357: .Pp
358: There should be an
359: .Dq else
360: clause to follow a series of continuation lines.
361: .Pp
1.1 deraadt 362: The magic file and keywords should have regular expression support.
1.4 millert 363: Their use of
1.8 aaron 364: .Tn ASCII TAB
1.4 millert 365: as a field delimiter is ugly and makes
1.1 deraadt 366: it hard to edit the files, but is entrenched.
1.8 aaron 367: .Pp
1.1 deraadt 368: It might be advisable to allow upper-case letters in keywords
1.4 millert 369: for, e.g.,
1.8 aaron 370: .Xr troff 1
1.4 millert 371: commands vs man page macros.
1.1 deraadt 372: Regular expression support would make this easy.
1.8 aaron 373: .Pp
1.1 deraadt 374: The program doesn't grok \s-2FORTRAN\s0.
1.6 aaron 375: It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which
1.1 deraadt 376: appear indented at the start of line.
377: Regular expression support would make this easy.
1.8 aaron 378: .Pp
1.6 aaron 379: The list of keywords in
1.8 aaron 380: .Em ascmagic
1.1 deraadt 381: probably belongs in the Magic file.
1.8 aaron 382: This could be done by using some keyword like
383: .Ql *
384: for the offset value.
385: .Pp
386: Another optimization would be to sort
1.1 deraadt 387: the magic file so that we can just run down all the
388: tests for the first byte, first word, first long, etc., once we
1.9 aaron 389: have fetched it.
390: Complain about conflicts in the magic file entries.
1.1 deraadt 391: Make a rule that the magic entries sort based on file offset rather
392: than position within the magic file?
1.8 aaron 393: .Pp
1.6 aaron 394: The program should provide a way to give an estimate
1.8 aaron 395: of
396: .Dq how good
397: a guess is.
398: We end up removing guesses (e.g.,
1.20 jmc 399: .Dq From\ \&
1.8 aaron 400: as first 5 chars of file) because
401: they are not as good as other guesses (e.g.,
402: .Dq Newsgroups:
403: versus
404: .Qq Return-Path: ) .
405: Still, if the others don't pan out, it should be
1.6 aaron 406: possible to use the first guess.
1.8 aaron 407: .Pp
408: This program is slower than some vendors'
409: .Nm
410: commands.
411: .Pp
1.1 deraadt 412: This manual page, and particularly this section, is too long.
1.8 aaron 413: .Sh AVAILABILITY
1.1 deraadt 414: You can obtain the original author's latest version by anonymous FTP
1.8 aaron 415: on
1.15 pjanzen 416: .Em ftp.astron.com
1.8 aaron 417: in the directory
1.20 jmc 418: .Pa /pub/file/file-X.YY.tar.gz .