Annotation of src/usr.bin/file/file.1, Revision 1.4
1.4 ! millert 1: .\" $OpenBSD: file.1,v 1.3 1996/06/26 05:32:56 deraadt Exp $
! 2: .TH FILE 1 "Copyrighted but distributable"
1.1 deraadt 3: .SH NAME
4: file
5: \- determine file type
6: .SH SYNOPSIS
7: .B file
8: [
9: .B \-vczL
10: ]
11: [
12: .B \-f
13: namefile ]
14: [
15: .B \-m
1.2 deraadt 16: magicfiles ]
1.1 deraadt 17: file ...
18: .SH DESCRIPTION
1.4 ! millert 19: This manual page documents version 3.22 of the
! 20: .B file
! 21: command.
! 22: .B File
1.1 deraadt 23: tests each argument in an attempt to classify it.
24: There are three sets of tests, performed in this order:
25: filesystem tests, magic number tests, and language tests.
26: The
27: .I first
28: test that succeeds causes the file type to be printed.
29: .PP
30: The type printed will usually contain one of the words
31: .B text
1.4 ! millert 32: (the file contains only
! 33: .SM ASCII
! 34: characters and is probably safe to read on an
! 35: .SM ASCII
! 36: terminal),
1.1 deraadt 37: .B executable
38: (the file contains the result of compiling a program
39: in a form understandable to some \s-1UNIX\s0 kernel or another),
40: or
41: .B data
42: meaning anything else (data is usually `binary' or non-printable).
43: Exceptions are well-known file formats (core files, tar archives)
44: that are known to contain binary data.
45: When modifying the file
46: .I /etc/magic
47: or the program itself,
48: .B "preserve these keywords" .
49: People depend on knowing that all the readable files in a directory
50: have the word ``text'' printed.
51: Don't do as Berkeley did \- change ``shell commands text''
52: to ``shell script''.
53: .PP
54: The filesystem tests are based on examining the return from a
1.4 ! millert 55: .BR stat (2)
1.1 deraadt 56: system call.
57: The program checks to see if the file is empty,
58: or if it's some sort of special file.
59: Any known file types appropriate to the system you are running on
60: (sockets, symbolic links, or named pipes (FIFOs) on those systems that
61: implement them)
62: are intuited if they are defined in
63: the system header file
1.4 ! millert 64: .IR sys/stat.h .
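.PP
The filesystem tests can be pictured with a short Python sketch.
It is an illustration only, not the program's own C code: the function name
filesystem_test and the strings it returns are assumptions made for this example.
It inspects only the stat result and returns None when that alone
does not settle the file type.

```python
import os
import stat


def filesystem_test(path, follow_symlinks=False):
    """Return a description if the stat result alone settles the type, else None.

    The name and the returned strings are hypothetical, chosen for illustration.
    """
    # lstat() reports on a symbolic link itself; stat() follows it.
    st = os.stat(path) if follow_symlinks else os.lstat(path)
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link to " + os.readlink(path)
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    # A non-empty regular file: later tests must read its contents.
    return None
```

A None result corresponds to falling through to the magic number tests.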
1.1 deraadt 65: .PP
66: The magic number tests are used to check for files with data in
67: particular fixed formats.
68: The canonical example of this is a binary executable (compiled program)
1.4 ! millert 69: .I a.out
1.1 deraadt 70: file, whose format is defined in
1.4 ! millert 71: .I a.out.h
1.1 deraadt 72: and possibly
1.4 ! millert 73: .I exec.h
1.1 deraadt 74: in the standard include directory.
75: These files have a `magic number' stored in a particular place
76: near the beginning of the file that tells the \s-1UNIX\s0 operating system
77: that the file is a binary executable, and which of several types thereof.
78: The concept of `magic number' has been applied by extension to data files.
79: Any file with some invariant identifier at a small fixed
80: offset into the file can usually be described in this way.
81: The information identifying these files is read from the magic file
82: .IR /etc/magic .
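.PP
As a rough sketch of the idea, the Python fragment below compares the bytes
found at a fixed offset against a small table.
The table MAGIC_TABLE, the function magic_test and the descriptions are
assumptions made for this example; the real entries are read from the magic
file and support many more data types than a plain byte comparison.

```python
# Hypothetical in-memory table: (offset, expected bytes, description).
# Real entries come from the magic file; these three are well-known examples.
MAGIC_TABLE = [
    (0, bytes.fromhex("7f454c46"), "ELF file"),
    (0, bytes.fromhex("1f8b"), "gzip compressed data"),
    (257, b"ustar", "POSIX tar archive"),
]


def magic_test(data):
    """Return the description of the first matching entry, else None."""
    for offset, expected, description in MAGIC_TABLE:
        # Slicing past the end of a short buffer yields fewer bytes, so no match.
        if data[offset:offset + len(expected)] == expected:
            return description
    return None
```

Because the first matching entry wins, the order of the table matters, just as
the order of entries in the magic file does.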
83: .PP
84: If an argument appears to be an
85: .SM ASCII
86: file,
1.4 ! millert 87: .B file
1.1 deraadt 88: attempts to guess its language.
1.4 ! millert 89: The language tests look for particular strings (cf.\&
! 90: .IR names.h )
1.1 deraadt 91: that can appear anywhere in the first few blocks of a file.
92: For example, the keyword
93: .B .br
1.4 ! millert 94: indicates that the file is most likely a
! 95: .BR troff (1)
! 96: input file, just as the keyword
1.1 deraadt 97: .B struct
98: indicates a C program.
99: These tests are less reliable than the previous
100: two groups, so they are performed last.
101: The language test routines also test for some miscellany
102: (such as
1.4 ! millert 103: .BR tar (1)
1.1 deraadt 104: archives) and determine whether an unknown file should be
105: labelled as `ascii text' or `data'.
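.PP
The language tests might be sketched in Python as follows.
This is a simplified illustration, not the routine the program uses:
the KEYWORDS table holds only the two keywords mentioned above, and the
function name language_test and the returned strings are assumptions.

```python
# Hypothetical keyword table with the two examples from the text;
# the real list is compiled into the program.
KEYWORDS = [
    (".br", "troff input text"),
    ("struct", "c program text"),
]


def language_test(data):
    """Guess a language from tokens in the buffer, else 'ascii text' or 'data'."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return "data"
    # Anything that is neither printable nor whitespace marks the buffer as data.
    if not all(ch.isprintable() or ch.isspace() for ch in text):
        return "data"
    tokens = text.split()
    for keyword, description in KEYWORDS:
        if keyword in tokens:
            return description
    return "ascii text"
```

A single keyword anywhere in the buffer decides the answer, which is why these
tests are less reliable than the other two groups.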
106: .SH OPTIONS
107: .TP 8
108: .B \-v
109: Print the version of the program and exit.
110: .TP 8
1.2 deraadt 111: .B \-m list
112: Specify an alternate list of files containing magic numbers.
113: This can be a single file, or a colon-separated list of files.
1.1 deraadt 114: .TP 8
115: .B \-z
116: Try to look inside compressed files.
117: .TP 8
118: .B \-c
119: Cause a checking printout of the parsed form of the magic file.
120: This is usually used in conjunction with
121: .B \-m
122: to debug a new magic file before installing it.
123: .TP 8
124: .B \-f namefile
125: Read the names of the files to be examined from
126: .I namefile
127: (one per line)
128: before the argument list.
129: Either
130: .I namefile
131: or at least one filename argument must be present;
132: to test the standard input, use ``-'' as a filename argument.
133: .TP 8
134: .B \-L
135: Follow symbolic links, as the like-named option in
1.4 ! millert 136: .BR ls (1)
1.1 deraadt 137: does (on systems that support symbolic links).
138: .SH FILES
139: .I /etc/magic
140: \- default list of magic numbers
1.2 deraadt 141: .SH ENVIRONMENT
142: The environment variable
143: .B MAGIC
144: can be used to set the default magic number files.
1.1 deraadt 145: .SH SEE ALSO
1.4 ! millert 146: .BR magic (5)
1.1 deraadt 147: \- description of magic file format.
148: .br
1.4 ! millert 149: .BR strings (1), " od" (1)
1.1 deraadt 150: \- tools for examining non-text files.
151: .SH STANDARDS CONFORMANCE
152: This program is believed to exceed the System V Interface Definition
153: of FILE(CMD), as near as one can determine from the vague language
154: contained therein.
155: Its behaviour is mostly compatible with the System V program of the same name.
156: This version knows more magic, however, so it will produce
157: different (albeit more accurate) output in many cases.
158: .PP
159: The one significant difference
160: between this version and System V
161: is that this version treats any white space
162: as a delimiter, so that spaces in pattern strings must be escaped.
163: For example,
164: .br
165: >10 string language impress\ (imPRESS data)
166: .br
167: in an existing magic file would have to be changed to
168: .br
169: >10 string language\e impress (imPRESS data)
170: .br
171: In addition, in this version, if a pattern string contains a backslash,
172: it must be escaped. For example,
173: .br
174: 0 string \ebegindata Andrew Toolkit document
175: .br
176: in an existing magic file would have to be changed to
177: .br
178: 0 string \e\ebegindata Andrew Toolkit document
179: .br
180: .PP
181: SunOS releases 3.2 and later from Sun Microsystems include a
1.4 ! millert 182: .BR file (1)
1.1 deraadt 183: command derived from the System V one, but with some extensions.
184: My version differs from Sun's only in minor ways.
185: It includes the extension of the `&' operator, used as,
186: for example,
187: .br
188: >16 long&0x7fffffff >0 not stripped
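.PP
The effect of such an entry can be sketched in Python: the 4-byte value at the
given offset is masked first, and the result is then compared with the operand.
The function masked_long_test, its byteorder parameter and the COMPARISONS
table are assumptions made for this example; the sketch reads the value as
unsigned, which gives the same result here because the mask clears the sign bit.

```python
import operator

# Hypothetical subset of comparison operators, for illustration.
COMPARISONS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def masked_long_test(data, offset, mask, op, operand, byteorder="big"):
    """Evaluate (value & mask) op operand for the 4-byte integer at offset."""
    field = data[offset:offset + 4]
    if len(field) < 4:
        # The buffer is too short to hold the field, so the test fails.
        return False
    value = int.from_bytes(field, byteorder)
    return COMPARISONS[op](value & mask, operand)
```

With mask 0x7fffffff, operator > and operand 0, the test succeeds when any of
the low 31 bits of the long at offset 16 is set.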
189: .SH MAGIC DIRECTORY
190: The magic file entries have been collected from various sources,
191: mainly USENET, and contributed by various authors.
192: Christos Zoulas (address below) will collect additional
193: or corrected magic file entries.
194: A consolidation of magic file entries
195: will be distributed periodically.
196: .PP
197: The order of entries in the magic file is significant.
198: Depending on what system you are using, the order in which
199: they are put together may be incorrect.
200: If your old
1.4 ! millert 201: .B file
1.1 deraadt 202: command uses a magic file,
203: keep the old magic file around for comparison purposes
204: (rename it to
205: .IR /etc/magic.orig ).
206: .SH HISTORY
207: There has been a
1.4 ! millert 208: .B file
! 209: command in every \s-1UNIX\s0 since at least Research Version 6
1.1 deraadt 210: (man page dated January, 1975).
211: The System V version introduced one significant change:
212: the external list of magic number types.
213: This slowed the program down slightly but made it a lot more flexible.
214: .PP
215: This program, based on the System V version,
216: was written by Ian Darwin without looking at anybody else's source code.
217: .PP
218: John Gilmore revised the code extensively, making it better than
219: the first version.
220: Geoff Collyer found several inadequacies
221: and provided some magic file entries.
222: The program has undergone continued evolution since.
223: .SH AUTHOR
224: Written by Ian F. Darwin, UUCP address {utzoo | ihnp4}!darwin!ian,
225: Internet address ian@sq.com,
226: postal address: P.O. Box 603, Station F, Toronto, Ontario, CANADA M4Y 2L8.
227: .PP
228: Altered by Rob McMahon, cudcv@warwick.ac.uk, 1989, to extend the `&' operator
229: from simple `x&y != 0' to `x&y op z'.
230: .PP
231: Altered by Guy Harris, guy@auspex.com, 1993, to:
232: .RS
233: .PP
234: put the ``old-style'' `&'
235: operator back the way it was, because 1) Rob McMahon's change broke the
236: previous style of usage, 2) the SunOS ``new-style'' `&' operator,
237: which this version of
1.4 ! millert 238: .B file
1.1 deraadt 239: supports, also handles `x&y op z', and 3) Rob's change wasn't documented
240: in any case;
241: .PP
242: put in multiple levels of `>';
243: .PP
244: put in ``beshort'', ``leshort'', etc. keywords to look at numbers in the
245: file in a specific byte order, rather than in the native byte order of
246: the process running
1.4 ! millert 247: .BR file .
1.1 deraadt 248: .RE
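.PP
The difference between the byte-order keywords can be shown with a small Python
sketch.
The function read_short is an assumption made for this example; it only
illustrates that the same two bytes give different numbers depending on the
byte order named by the keyword.

```python
def read_short(data, offset, kind):
    """Read the 2-byte unsigned integer at offset in the order named by kind.

    kind is 'beshort' (big-endian) or 'leshort' (little-endian).
    Returns None if the buffer is too short to hold the field.
    """
    byteorder = {"beshort": "big", "leshort": "little"}[kind]
    field = data[offset:offset + 2]
    if len(field) < 2:
        return None
    return int.from_bytes(field, byteorder)
```

For the two bytes 0x1f 0x8b, beshort yields 0x1f8b and leshort yields 0x8b1f,
whatever the byte order of the machine running the sketch.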
249: .PP
250: Changes by Ian Darwin and various authors including
1.4 ! millert 251: Christos Zoulas (christos@deshaw.com), 1990-1997.
1.1 deraadt 252: .SH LEGAL NOTICE
253: Copyright (c) Ian F. Darwin, Toronto, Canada,
254: 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993.
255: .PP
256: This software is not subject to and may not be made subject to any
257: license of the American Telephone and Telegraph Company, Sun
258: Microsystems Inc., Digital Equipment Inc., Lotus Development Inc., the
259: Regents of the University of California, The X Consortium or MIT, or
260: The Free Software Foundation.
261: .PP
262: This software is not subject to any export provision of the United States
263: Department of Commerce, and may be exported to any country or planet.
264: .PP
265: Permission is granted to anyone to use this software for any purpose on
266: any computer system, and to alter it and redistribute it freely, subject
267: to the following restrictions:
268: .PP
269: 1. The author is not responsible for the consequences of use of this
270: software, no matter how awful, even if they arise from flaws in it.
271: .PP
272: 2. The origin of this software must not be misrepresented, either by
273: explicit claim or by omission. Since few users ever read sources,
274: credits must appear in the documentation.
275: .PP
276: 3. Altered versions must be plainly marked as such, and must not be
277: misrepresented as being the original software. Since few users
278: ever read sources, credits must appear in the documentation.
279: .PP
280: 4. This notice may not be removed or altered.
281: .PP
282: A few support files (\fIgetopt\fP, \fIstrtok\fP)
283: distributed with this package
284: are by Henry Spencer and are subject to the same terms as above.
285: .PP
286: A few simple support files (\fIstrtol\fP, \fIstrchr\fP)
287: distributed with this package
288: are in the public domain; they are so marked.
289: .PP
290: The files
291: .I tar.h
292: and
293: .I is_tar.c
294: were written by John Gilmore from his public-domain
1.4 ! millert 295: .B tar
1.1 deraadt 296: program, and are not covered by the above restrictions.
297: .SH BUGS
298: There must be a better way to automate the construction of the Magic
299: file from all the glop in Magdir. What is it?
300: Better yet, the magic file should be compiled into binary (say,
1.4 ! millert 301: .BR ndbm (3)
! 302: or, better yet, fixed-length
! 303: .SM ASCII
! 304: strings for use in heterogeneous network environments) for faster startup.
1.1 deraadt 305: Then the program would run as fast as the Version 7 program of the same name,
306: with the flexibility of the System V version.
307: .PP
1.4 ! millert 308: .B File
1.1 deraadt 309: uses several algorithms that favor speed over accuracy,
1.4 ! millert 310: so it can be misled about the contents of
! 311: .SM ASCII
! 312: files.
! 313: .PP
! 314: The support for
! 315: .SM ASCII
! 316: files (primarily for programming languages)
1.1 deraadt 317: is simplistic, inefficient and requires recompilation to update.
318: .PP
319: There should be an ``else'' clause to follow a series of continuation lines.
320: .PP
321: The magic file and keywords should have regular expression support.
1.4 ! millert 322: Their use of
! 323: .SM "ASCII TAB"
! 324: as a field delimiter is ugly and makes
1.1 deraadt 325: it hard to edit the files, but is entrenched.
326: .PP
327: It might be advisable to allow upper-case letters in keywords
1.4 ! millert 328: to distinguish, e.g.,
! 329: .BR troff (1)
! 330: commands vs man page macros.
1.1 deraadt 331: Regular expression support would make this easy.
332: .PP
333: The program doesn't grok \s-2FORTRAN\s0.
334: It should be able to figure out \s-2FORTRAN\s0 by seeing keywords that
335: appear indented at the start of a line.
336: Regular expression support would make this easy.
337: .PP
338: The list of keywords in
339: .I ascmagic
340: probably belongs in the Magic file.
341: This could be done by using some keyword like `*' for the offset value.
342: .PP
343: Another optimisation would be to sort
344: the magic file so that we can just run down all the
345: tests for the first byte, first word, first long, etc., once we
346: have fetched it. Complain about conflicts in the magic file entries.
347: Make a rule that the magic entries sort based on file offset rather
348: than position within the magic file?
349: .PP
350: The program should provide a way to give an estimate
351: of ``how good'' a guess is.
352: We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
353: they are not as good as other guesses (e.g. ``Newsgroups:'' versus
354: ``Return-Path:''). Still, if the others don't pan out, it should be
355: possible to use the first guess.
356: .PP
357: This program is slower than some vendors' file commands.
358: .PP
359: This manual page, and particularly this section, is too long.
360: .SH AVAILABILITY
361: You can obtain the original author's latest version by anonymous FTP
362: on
1.4 ! millert 363: .B ftp.deshaw.com
1.1 deraadt 364: in the directory
1.4 ! millert 365: .I /pub/file/file-X.YY.tar.gz