Annotation of src/usr.bin/file/file.1, Revision 1.3
1.3 ! deraadt 1: .\" $OpenBSD: file.1,v 1.2 1995/12/14 03:30:02 deraadt Exp $
1.1 deraadt 2: .TH FILE 1 "Copyright but distributable"
3: .SH NAME
4: file
5: \- determine file type
6: .SH SYNOPSIS
7: .B file
8: [
9: .B \-vczL
10: ]
11: [
12: .B \-f
13: namefile ]
14: [
15: .B \-m
1.2 deraadt 16: magicfiles ]
1.1 deraadt 17: file ...
18: .SH DESCRIPTION
19: .I File
20: tests each argument in an attempt to classify it.
21: There are three sets of tests, performed in this order:
22: filesystem tests, magic number tests, and language tests.
23: The
24: .I first
25: test that succeeds causes the file type to be printed.
26: .PP
27: The type printed will usually contain one of the words
28: .B text
29: (the file contains only ASCII characters and is
30: probably safe to read on an ASCII terminal),
31: .B executable
32: (the file contains the result of compiling a program
33: in a form understandable to some \s-1UNIX\s0 kernel or another),
34: or
35: .B data
36: meaning anything else (data is usually `binary' or non-printable).
37: Exceptions are well-known file formats (core files, tar archives)
38: that are known to contain binary data.
39: When modifying the file
40: .I /etc/magic
41: or the program itself,
42: .BR "preserve these keywords" .
43: People depend on knowing that all the readable files in a directory
44: have the word ``text'' printed.
45: Don't do as Berkeley did \- change ``shell commands text''
46: to ``shell script''.
47: .PP
48: The filesystem tests are based on examining the return from a
49: .IR stat (2)
50: system call.
51: The program checks to see if the file is empty,
52: or if it's some sort of special file.
53: Any known file types appropriate to the system you are running on
54: (sockets, symbolic links, or named pipes (FIFOs) on those systems that
55: implement them)
56: are intuited if they are defined in
57: the system header file
58: .BR sys/stat.h .
59: .PP
60: The magic number tests are used to check for files with data in
61: particular fixed formats.
62: The canonical example of this is a binary executable (compiled program)
63: .B a.out
64: file, whose format is defined in
65: .B a.out.h
66: and possibly
67: .B exec.h
68: in the standard include directory.
69: These files have a `magic number' stored in a particular place
70: near the beginning of the file that tells the \s-1UNIX\s0 operating system
71: that the file is a binary executable, and which of several types thereof.
72: The concept of `magic number' has been applied by extension to data files.
73: Any file with some invariant identifier at a small fixed
74: offset into the file can usually be described in this way.
75: The information identifying these files is read from the magic file
76: .IR /etc/magic .
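.PP
As a rough illustration only (this is not the code used by
.IR file ),
a magic number test amounts to comparing the bytes found at a fixed
offset against an expected value.
The sketch below is Python; the rule table and the name magic_test are
hypothetical, and the two signatures are well-known ones chosen for
illustration, not entries quoted from
.IR /etc/magic .
.nf
```python
# Hypothetical sketch of a magic number test; not the real file(1) code.
# Each rule is (offset, expected bytes, description). The rules are
# illustrative assumptions, not entries copied from /etc/magic.
RULES = [
    (0, bytes([0x1F, 0x8B]), "gzip compressed data"),
    (0, b"GIF8", "GIF image data"),
]


def magic_test(data, rules=RULES):
    """Return the description of the first matching rule, or None."""
    for offset, expected, description in rules:
        # Compare the bytes at the fixed offset with the expected value.
        if data[offset:offset + len(expected)] == expected:
            return description
    return None
```
.fi
Because the first matching rule wins, the order of the rules matters in
this sketch.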
77: .PP
78: If an argument appears to be an
79: .SM ASCII
80: file,
81: .I file
82: attempts to guess its language.
83: The language tests look for particular strings (cf. \fInames.h\fP)
84: that can appear anywhere in the first few blocks of a file.
85: For example, the keyword
86: .B .br
87: indicates that the file is most likely a troff input file,
88: just as the keyword
89: .B struct
90: indicates a C program.
91: These tests are less reliable than the previous
92: two groups, so they are performed last.
93: The language test routines also test for some miscellany
94: (such as
95: .I tar
96: archives) and determine whether an unknown file should be
97: labelled as `ascii text' or `data'.
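.PP
The idea behind the language tests can be sketched as follows.
This is a simplified Python illustration, not the actual routine: the
keyword table, the block size, the number of blocks examined and the
descriptions other than `ascii text' and `data' are all assumptions
made for the example.
.nf
```python
# Hypothetical sketch of a language test; not the real file(1) code.
# The keyword table and the sizes below are assumptions for illustration.
KEYWORDS = [
    (".br", "troff input text"),
    ("struct", "C program text"),
]
BLOCK_SIZE = 512  # assumed block size
BLOCKS = 4        # assumed number of leading blocks to examine


def language_test(data):
    """Guess a language from keywords in the first few blocks."""
    head = data[:BLOCK_SIZE * BLOCKS]
    # Any byte outside the ASCII range means this is not a text file.
    if any(byte > 127 for byte in head):
        return "data"
    tokens = head.decode("ascii").split()
    for keyword, description in KEYWORDS:
        if keyword in tokens:
            return description
    return "ascii text"
```
.fi
A single keyword is weak evidence, which is why a test of this kind is
best run after the more reliable ones.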
98: .SH OPTIONS
99: .TP 8
100: .B \-v
101: Print the version of the program and exit.
102: .TP 8
1.2 deraadt 103: .B \-m list
104: Specify an alternate list of files containing magic numbers.
105: This can be a single file, or a colon-separated list of files.
1.1 deraadt 106: .TP 8
107: .B \-z
108: Try to look inside compressed files.
109: .TP 8
110: .B \-c
111: Cause a checking printout of the parsed form of the magic file.
112: This is usually used in conjunction with
113: .B \-m
114: to debug a new magic file before installing it.
115: .TP 8
116: .B \-f namefile
117: Read the names of the files to be examined from
118: .I namefile
119: (one per line)
120: before the argument list.
121: Either
122: .I namefile
123: or at least one filename argument must be present;
124: to test the standard input, use ``-'' as a filename argument.
125: .TP 8
126: .B \-L
127: Follow symbolic links, as the like-named option in
128: .IR ls (1)
129: does (on systems that support symbolic links).
130: .SH FILES
131: .I /etc/magic
132: \- default list of magic numbers
1.2 deraadt 133: .SH ENVIRONMENT
134: The environment variable
135: .B MAGIC
136: can be used to set the default magic number files.
1.1 deraadt 137: .SH SEE ALSO
138: .IR magic (5)
139: \- description of magic file format.
140: .br
141: .IR strings (1), " od" (1)
142: \- tools for examining non-text files.
143: .SH STANDARDS CONFORMANCE
144: This program is believed to exceed the System V Interface Definition
145: of FILE(CMD), as near as one can determine from the vague language
146: contained therein.
147: Its behaviour is mostly compatible with the System V program of the same name.
148: This version knows more magic, however, so it will produce
149: different (albeit more accurate) output in many cases.
150: .PP
151: The one significant difference
152: between this version and System V
153: is that this version treats any white space
154: as a delimiter, so that spaces in pattern strings must be escaped.
155: For example,
156: .br
157: >10 string language impress\ (imPRESS data)
158: .br
159: in an existing magic file would have to be changed to
160: .br
161: >10 string language\e impress (imPRESS data)
162: .br
163: In addition, in this version, if a pattern string contains a backslash,
164: it must be escaped. For example,
165: .br
166: 0 string \ebegindata Andrew Toolkit document
167: .br
168: in an existing magic file would have to be changed to
169: .br
170: 0 string \e\ebegindata Andrew Toolkit document
171: .br
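.PP
The effect of the two escaping rules can be illustrated with a small
Python sketch.
It is not the parser used by
.IR file ;
the name split_magic_line and the assumption that a line consists of
three delimited fields followed by a free-form description are
hypothetical.
.nf
```python
# Hypothetical sketch of splitting a magic file line; not the real parser.
# Assumes three whitespace-delimited fields (offset, type, pattern)
# followed by a free-form description.
BACKSLASH = chr(92)  # chr(92) is the backslash character


def split_magic_line(line):
    """Return ([offset, type, pattern], description) for one line."""
    fields = []
    current = ""
    i = 0
    while i < len(line) and len(fields) < 3:
        ch = line[i]
        if ch == BACKSLASH and i + 1 < len(line):
            # An escaped character (space or backslash) is kept literally.
            current += line[i + 1]
            i += 2
            continue
        if ch.isspace():
            # Unescaped white space ends the current field.
            if current:
                fields.append(current)
                current = ""
            i += 1
            continue
        current += ch
        i += 1
    if current and len(fields) < 3:
        fields.append(current)
    return fields, line[i:].strip()
```
.fi
In this sketch an unescaped space inside the pattern would end the
pattern early, and the rest of it would be taken as part of the
description.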
172: .PP
173: SunOS releases 3.2 and later from Sun Microsystems include a
174: .IR file (1)
175: command derived from the System V one, but with some extensions.
176: My version differs from Sun's only in minor ways.
177: It includes the extension of the `&' operator, used as,
178: for example,
179: .br
180: >16 long&0x7fffffff >0 not stripped
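.PP
Read literally, the entry above takes the long at offset 16, masks it
with 0x7fffffff and tests whether the result is greater than zero.
A minimal Python sketch of that evaluation follows; the function name
is hypothetical, and a 4-byte long in the native byte order of the
machine is assumed.
.nf
```python
# Hypothetical sketch of evaluating ">16 long&0x7fffffff >0".
# Assumes a 4-byte long read in the native byte order of the machine.
import sys


def masked_long_test(data, offset=16, mask=0x7FFFFFFF):
    """Return True if (long at offset) & mask is greater than zero."""
    field = data[offset:offset + 4]
    if len(field) < 4:
        # The file is too short to contain the field; the test fails.
        return False
    value = int.from_bytes(field, sys.byteorder)
    return (value & mask) > 0
```
.fi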
181: .SH MAGIC DIRECTORY
182: The magic file entries have been collected from various sources,
183: mainly USENET, and contributed by various authors.
184: Christos Zoulas (address below) will collect additional
185: or corrected magic file entries.
186: A consolidation of magic file entries
187: will be distributed periodically.
188: .PP
189: The order of entries in the magic file is significant.
190: Depending on what system you are using, the order that
191: they are put together may be incorrect.
192: If your old
193: .I file
194: command uses a magic file,
195: keep the old magic file around for comparison purposes
196: (rename it to
197: .IR /etc/magic.orig ).
198: .SH HISTORY
199: There has been a
200: .I file
201: command in every UNIX since at least Research Version 6
202: (man page dated January, 1975).
203: The System V version introduced one significant change:
204: the external list of magic number types.
205: This slowed the program down slightly but made it a lot more flexible.
206: .PP
207: This program, based on the System V version,
208: was written by Ian Darwin without looking at anybody else's source code.
209: .PP
210: John Gilmore revised the code extensively, making it better than
211: the first version.
212: Geoff Collyer found several inadequacies
213: and provided some magic file entries.
214: The program has undergone continued evolution since.
215: .SH AUTHOR
216: Written by Ian F. Darwin, UUCP address {utzoo | ihnp4}!darwin!ian,
217: Internet address ian@sq.com,
218: postal address: P.O. Box 603, Station F, Toronto, Ontario, CANADA M4Y 2L8.
219: .PP
220: Altered by Rob McMahon, cudcv@warwick.ac.uk, 1989, to extend the `&' operator
221: from simple `x&y != 0' to `x&y op z'.
222: .PP
223: Altered by Guy Harris, guy@auspex.com, 1993, to:
224: .RS
225: .PP
226: put the ``old-style'' `&'
227: operator back the way it was, because 1) Rob McMahon's change broke the
228: previous style of usage, 2) the SunOS ``new-style'' `&' operator,
229: which this version of
230: .I file
231: supports, also handles `x&y op z', and 3) Rob's change wasn't documented
232: in any case;
233: .PP
234: put in multiple levels of `>';
235: .PP
236: put in ``beshort'', ``leshort'', etc. keywords to look at numbers in the
237: file in a specific byte order, rather than in the native byte order of
238: the process running
239: .IR file .
240: .RE
241: .PP
242: Changes by Ian Darwin and various authors including
243: Christos Zoulas (christos@ee.cornell.edu), 1990-1992.
244: .SH LEGAL NOTICE
245: Copyright (c) Ian F. Darwin, Toronto, Canada,
246: 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993.
247: .PP
248: This software is not subject to and may not be made subject to any
249: license of the American Telephone and Telegraph Company, Sun
250: Microsystems Inc., Digital Equipment Inc., Lotus Development Inc., the
251: Regents of the University of California, The X Consortium or MIT, or
252: The Free Software Foundation.
253: .PP
254: This software is not subject to any export provision of the United States
255: Department of Commerce, and may be exported to any country or planet.
256: .PP
257: Permission is granted to anyone to use this software for any purpose on
258: any computer system, and to alter it and redistribute it freely, subject
259: to the following restrictions:
260: .PP
261: 1. The author is not responsible for the consequences of use of this
262: software, no matter how awful, even if they arise from flaws in it.
263: .PP
264: 2. The origin of this software must not be misrepresented, either by
265: explicit claim or by omission. Since few users ever read sources,
266: credits must appear in the documentation.
267: .PP
268: 3. Altered versions must be plainly marked as such, and must not be
269: misrepresented as being the original software. Since few users
270: ever read sources, credits must appear in the documentation.
271: .PP
272: 4. This notice may not be removed or altered.
273: .PP
274: A few support files (\fIgetopt\fP, \fIstrtok\fP)
275: distributed with this package
276: are by Henry Spencer and are subject to the same terms as above.
277: .PP
278: A few simple support files (\fIstrtol\fP, \fIstrchr\fP)
279: distributed with this package
280: are in the public domain; they are so marked.
281: .PP
282: The files
283: .I tar.h
284: and
285: .I is_tar.c
286: were written by John Gilmore from his public-domain
287: .I tar
288: program, and are not covered by the above restrictions.
289: .SH BUGS
290: There must be a better way to automate the construction of the Magic
291: file from all the glop in Magdir. What is it?
292: Better yet, the magic file should be compiled into binary (say,
293: .IR ndbm (3)
294: or, better yet, fixed-length ASCII strings
295: for use in heterogeneous network environments) for faster startup.
296: Then the program would run as fast as the Version 7 program of the same name,
297: with the flexibility of the System V version.
298: .PP
299: .I File
300: uses several algorithms that favor speed over accuracy;
301: thus it can be misled about the contents of ASCII files.
302: .PP
303: The support for ASCII files (primarily for programming languages)
304: is simplistic, inefficient and requires recompilation to update.
305: .PP
306: There should be an ``else'' clause to follow a series of continuation lines.
307: .PP
308: The magic file and keywords should have regular expression support.
309: Their use of ASCII TAB as a field delimiter is ugly and makes
310: it hard to edit the files, but is entrenched.
311: .PP
312: It might be advisable to allow upper-case letters in keywords
313: to distinguish, e.g., troff commands from man page macros.
314: Regular expression support would make this easy.
315: .PP
316: The program doesn't grok \s-2FORTRAN\s0.
317: It should be able to figure out \s-2FORTRAN\s0 by seeing some keywords which
318: appear indented at the start of a line.
319: Regular expression support would make this easy.
320: .PP
321: The list of keywords in
322: .I ascmagic
323: probably belongs in the Magic file.
324: This could be done by using some keyword like `*' for the offset value.
325: .PP
326: Another optimisation would be to sort
327: the magic file so that we can just run down all the
328: tests for the first byte, first word, first long, etc, once we
329: have fetched it. Complain about conflicts in the magic file entries.
330: Make a rule that the magic entries sort based on file offset rather
331: than position within the magic file?
332: .PP
333: The program should provide a way to give an estimate
334: of ``how good'' a guess is.
335: We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
336: they are not as good as other guesses (e.g. ``Newsgroups:'' versus
337: ``Return-Path:''). Still, if the others don't pan out, it should be
338: possible to use the first guess.
339: .PP
340: This program is slower than some vendors' file commands.
341: .PP
342: This manual page, and particularly this section, is too long.
343: .SH AVAILABILITY
344: You can obtain the original author's latest version by anonymous FTP
345: on
346: .B tesla.ee.cornell.edu
347: in the directory
348: .BR /pub/file-X.YY.tar.gz