Annotation of src/usr.bin/file/file.1, Revision 1.1.1.1
1.1 deraadt 1: .TH FILE 1 "Copyright but distributable"
2: .\" $Id: file.1,v 1.7 1995/03/25 22:35:42 christos Exp $
3: .SH NAME
4: file
5: \- determine file type
6: .SH SYNOPSIS
7: .B file
8: [
9: .B \-vczL
10: ]
11: [
12: .B \-f
13: namefile ]
14: [
15: .B \-m
16: magicfile ]
17: file ...
18: .SH DESCRIPTION
19: .I File
20: tests each argument in an attempt to classify it.
21: There are three sets of tests, performed in this order:
22: filesystem tests, magic number tests, and language tests.
23: The
24: .I first
25: test that succeeds causes the file type to be printed.
26: .PP
27: The type printed will usually contain one of the words
28: .B text
29: (the file contains only ASCII characters and is
30: probably safe to read on an ASCII terminal),
31: .B executable
32: (the file contains the result of compiling a program
33: in a form understandable to some \s-1UNIX\s0 kernel or another),
34: or
35: .B data
36: meaning anything else (data is usually `binary' or non-printable).
37: Exceptions are well-known file formats (core files, tar archives)
38: that are known to contain binary data.
39: When modifying the file
40: .I /etc/magic
41: or the program itself,
42: .B "preserve these keywords" .
43: People depend on knowing that all the readable files in a directory
44: have the word ``text'' printed.
45: Don't do as Berkeley did \- change ``shell commands text''
46: to ``shell script''.
47: .PP
48: The filesystem tests are based on examining the return from a
49: .IR stat (2)
50: system call.
51: The program checks to see if the file is empty,
52: or if it's some sort of special file.
53: Any known file types appropriate to the system you are running on
54: (sockets, symbolic links, or named pipes (FIFOs) on those systems that
55: implement them)
56: are intuited if they are defined in
57: the system header file
58: .BR sys/stat.h .
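The filesystem tests described above can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: the function name filesystem_test and the returned strings are hypothetical, and lstat is used on the assumption that symbolic links are not being followed.

```python
import os
import stat

def filesystem_test(path):
    """Classify path from lstat() data alone; return None if undecided."""
    st = os.lstat(path)            # lstat: do not follow symbolic links
    mode = st.st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    # Regular, non-empty file: leave it to the magic and language tests.
    return None
```

A result of None here means the later test groups get their turn, which mirrors the rule that the first test to succeed decides the output.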
59: .PP
60: The magic number tests are used to check for files with data in
61: particular fixed formats.
62: The canonical example of this is a binary executable (compiled program)
63: .B a.out
64: file, whose format is defined in
65: .B a.out.h
66: and possibly
67: .B exec.h
68: in the standard include directory.
69: These files have a `magic number' stored in a particular place
70: near the beginning of the file that tells the \s-1UNIX\s0 operating system
71: that the file is a binary executable, and which of several types thereof.
72: The concept of `magic number' has been applied by extension to data files.
73: Any file with some invariant identifier at a small fixed
74: offset into the file can usually be described in this way.
75: The information identifying these files is read from the magic file
76: .IR /etc/magic .
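As a rough sketch of the idea of a magic number test: compare a few bytes at a fixed offset against a table of known identifiers. The table below is hypothetical and held in memory; the real entries are read from the magic file, and the two shown (the gzip and PDF signatures) are only illustrative.

```python
import struct

# Hypothetical table: (offset, struct format, expected value, description).
MAGIC_TABLE = [
    (0, ">H", 0x1F8B, "gzip compressed data"),
    (0, "4s", b"%PDF", "PDF document"),
]

def magic_test(data):
    """Return the description of the first matching entry, else None."""
    for offset, fmt, expected, description in MAGIC_TABLE:
        size = struct.calcsize(fmt)
        chunk = data[offset:offset + size]
        if len(chunk) < size:
            continue               # file too short for this entry
        (value,) = struct.unpack(fmt, chunk)
        if value == expected:
            return description
    return None
```

Entries are tried in table order and the first match wins, so the order of the table matters in this sketch.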
77: .PP
78: If an argument appears to be an
79: .SM ASCII
80: file,
81: .I file
82: attempts to guess its language.
83: The language tests look for particular strings (cf. \fInames.h\fP)
84: that can appear anywhere in the first few blocks of a file.
85: For example, the keyword
86: .B .br
87: indicates that the file is most likely a troff input file,
88: just as the keyword
89: .B struct
90: indicates a C program.
91: These tests are less reliable than the previous
92: two groups, so they are performed last.
93: The language test routines also test for some miscellany
94: (such as
95: .I tar
96: archives) and determine whether an unknown file should be
97: labelled as `ascii text' or `data'.
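The language tests might be pictured along these lines. This is a simplified sketch under stated assumptions: the keyword table and its descriptions are hypothetical stand-ins for the list in names.h, the 8192-byte limit is an arbitrary reading of "the first few blocks", and any byte above 127 is taken to mean the file is not ASCII.

```python
# Hypothetical keyword table; the real list lives in names.h.
KEYWORDS = [
    (".br", "troff input text"),
    ("struct", "c program text"),
]

def language_test(data, limit=8192):
    """Guess a language from whitespace-separated tokens near the start."""
    head = data[:limit]
    if any(byte > 127 for byte in head):
        return "data"              # not plain ASCII
    for token in head.decode("ascii").split():
        for keyword, description in KEYWORDS:
            if token == keyword:
                return description
    return "ascii text"            # ASCII, but no keyword recognised
```

Because a single keyword decides the answer, a plain text file that merely mentions "struct" would be misclassified, which is one reason these tests run last.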
98: .SH OPTIONS
99: .TP 8
100: .B \-v
101: Print the version of the program and exit.
102: .TP 8
103: .B \-m magicfile
104: Specify an alternate file of magic numbers.
105: .TP 8
106: .B \-z
107: Try to look inside compressed files.
108: .TP 8
109: .B \-c
110: Cause a checking printout of the parsed form of the magic file.
111: This is usually used in conjunction with
112: .B \-m
113: to debug a new magic file before installing it.
114: .TP 8
115: .B \-f namefile
116: Read the names of the files to be examined from
117: .I namefile
118: (one per line)
119: before the argument list.
120: Either
121: .I namefile
122: or at least one filename argument must be present;
123: to test the standard input, use ``\-'' as a filename argument.
124: .TP 8
125: .B \-L
126: Follow symbolic links, as the like-named option in
127: .IR ls (1)
128: does (on systems that support symbolic links).
129: .SH FILES
130: .I /etc/magic
131: \- default list of magic numbers
132: .SH SEE ALSO
133: .IR magic (5)
134: \- description of magic file format.
135: .br
136: .IR strings (1), " od" (1)
137: \- tools for examining non-text files.
138: .SH STANDARDS CONFORMANCE
139: This program is believed to exceed the System V Interface Definition
140: of FILE(CMD), as near as one can determine from the vague language
141: contained therein.
142: Its behaviour is mostly compatible with the System V program of the same name.
143: This version knows more magic, however, so it will produce
144: different (albeit more accurate) output in many cases.
145: .PP
146: The one significant difference
147: between this version and System V
148: is that this version treats any white space
149: as a delimiter, so that spaces in pattern strings must be escaped.
150: For example,
151: .br
152: >10 string language impress\ (imPRESS data)
153: .br
154: in an existing magic file would have to be changed to
155: .br
156: >10 string language\e impress (imPRESS data)
157: .br
158: In addition, in this version, if a pattern string contains a backslash,
159: it must be escaped. For example,
160: .br
161: 0 string \ebegindata Andrew Toolkit document
162: .br
163: in an existing magic file would have to be changed to
164: .br
165: 0 string \e\ebegindata Andrew Toolkit document
166: .br
167: .PP
168: SunOS releases 3.2 and later from Sun Microsystems include a
169: .IR file (1)
170: command derived from the System V one, but with some extensions.
171: My version differs from Sun's only in minor ways.
172: It includes the extension of the `&' operator, used as,
173: for example,
174: .br
175: >16 long&0x7fffffff >0 not stripped
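To illustrate what an entry such as the one above asks for: read a long at offset 16, AND it with 0x7fffffff, and test whether the result is greater than 0. The sketch below is only an illustration; the function name is hypothetical, the long is assumed to be 32 bits wide, and big-endian byte order is assumed so that the example is deterministic.

```python
import struct

def masked_long_test(data, offset=16, mask=0x7FFFFFFF):
    """True if the 32-bit value at offset, ANDed with mask, is > 0."""
    chunk = data[offset:offset + 4]
    if len(chunk) < 4:
        return False               # file too short to hold the value
    (value,) = struct.unpack(">l", chunk)   # byte order is an assumption
    return (value & mask) > 0
```

Masking off the top bit first means a value with only that bit set does not count as "greater than 0" in this sketch.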
176: .SH MAGIC DIRECTORY
177: The magic file entries have been collected from various sources,
178: mainly USENET, and contributed by various authors.
179: Christos Zoulas (address below) will collect additional
180: or corrected magic file entries.
181: A consolidation of magic file entries
182: will be distributed periodically.
183: .PP
184: The order of entries in the magic file is significant.
185: Depending on what system you are using, the order in which
186: they are put together may be incorrect.
187: If your old
188: .I file
189: command uses a magic file,
190: keep the old magic file around for comparison purposes
191: (rename it to
192: .IR /etc/magic.orig ).
193: .SH HISTORY
194: There has been a
195: .I file
196: command in every UNIX since at least Research Version 6
197: (man page dated January, 1975).
198: The System V version introduced one significant change:
199: the external list of magic number types.
200: This slowed the program down slightly but made it a lot more flexible.
201: .PP
202: This program, based on the System V version,
203: was written by Ian Darwin without looking at anybody else's source code.
204: .PP
205: John Gilmore revised the code extensively, making it better than
206: the first version.
207: Geoff Collyer found several inadequacies
208: and provided some magic file entries.
209: The program has undergone continued evolution since.
210: .SH AUTHOR
211: Written by Ian F. Darwin, UUCP address {utzoo | ihnp4}!darwin!ian,
212: Internet address ian@sq.com,
213: postal address: P.O. Box 603, Station F, Toronto, Ontario, CANADA M4Y 2L8.
214: .PP
215: Altered by Rob McMahon, cudcv@warwick.ac.uk, 1989, to extend the `&' operator
216: from simple `x&y != 0' to `x&y op z'.
217: .PP
218: Altered by Guy Harris, guy@auspex.com, 1993, to:
219: .RS
220: .PP
221: put the ``old-style'' `&'
222: operator back the way it was, because 1) Rob McMahon's change broke the
223: previous style of usage, 2) the SunOS ``new-style'' `&' operator,
224: which this version of
225: .I file
226: supports, also handles `x&y op z', and 3) Rob's change wasn't documented
227: in any case;
228: .PP
229: put in multiple levels of `>';
230: .PP
231: put in ``beshort'', ``leshort'', etc. keywords to look at numbers in the
232: file in a specific byte order, rather than in the native byte order of
233: the process running
234: .IR file .
235: .RE
236: .PP
237: Changes by Ian Darwin and various authors including
238: Christos Zoulas (christos@ee.cornell.edu), 1990-1992.
239: .SH LEGAL NOTICE
240: Copyright (c) Ian F. Darwin, Toronto, Canada,
241: 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993.
242: .PP
243: This software is not subject to and may not be made subject to any
244: license of the American Telephone and Telegraph Company, Sun
245: Microsystems Inc., Digital Equipment Inc., Lotus Development Inc., the
246: Regents of the University of California, The X Consortium or MIT, or
247: The Free Software Foundation.
248: .PP
249: This software is not subject to any export provision of the United States
250: Department of Commerce, and may be exported to any country or planet.
251: .PP
252: Permission is granted to anyone to use this software for any purpose on
253: any computer system, and to alter it and redistribute it freely, subject
254: to the following restrictions:
255: .PP
256: 1. The author is not responsible for the consequences of use of this
257: software, no matter how awful, even if they arise from flaws in it.
258: .PP
259: 2. The origin of this software must not be misrepresented, either by
260: explicit claim or by omission. Since few users ever read sources,
261: credits must appear in the documentation.
262: .PP
263: 3. Altered versions must be plainly marked as such, and must not be
264: misrepresented as being the original software. Since few users
265: ever read sources, credits must appear in the documentation.
266: .PP
267: 4. This notice may not be removed or altered.
268: .PP
269: A few support files (\fIgetopt\fP, \fIstrtok\fP)
270: distributed with this package
271: are by Henry Spencer and are subject to the same terms as above.
272: .PP
273: A few simple support files (\fIstrtol\fP, \fIstrchr\fP)
274: distributed with this package
275: are in the public domain; they are so marked.
276: .PP
277: The files
278: .I tar.h
279: and
280: .I is_tar.c
281: were written by John Gilmore from his public-domain
282: .I tar
283: program, and are not covered by the above restrictions.
284: .SH BUGS
285: There must be a better way to automate the construction of the Magic
286: file from all the glop in Magdir. What is it?
287: Better yet, the magic file should be compiled into binary (say,
288: .IR ndbm (3)
289: or, better yet, fixed-length ASCII strings
290: for use in heterogeneous network environments) for faster startup.
291: Then the program would run as fast as the Version 7 program of the same name,
292: with the flexibility of the System V version.
293: .PP
294: .I File
295: uses several algorithms that favor speed over accuracy;
296: thus it can be misled about the contents of ASCII files.
297: .PP
298: The support for ASCII files (primarily for programming languages)
299: is simplistic and inefficient, and requires recompilation to update.
300: .PP
301: There should be an ``else'' clause to follow a series of continuation lines.
302: .PP
303: The magic file and keywords should have regular expression support.
304: Their use of ASCII TAB as a field delimiter is ugly and makes
305: it hard to edit the files, but is entrenched.
306: .PP
307: It might be advisable to allow upper-case letters in keywords
308: to distinguish, e.g., troff commands from man page macros.
309: Regular expression support would make this easy.
310: .PP
311: The program doesn't grok \s-2FORTRAN\s0.
312: It should be able to figure out \s-2FORTRAN\s0 by seeing some keywords which
313: appear indented at the start of a line.
314: Regular expression support would make this easy.
315: .PP
316: The list of keywords in
317: .I ascmagic
318: probably belongs in the Magic file.
319: This could be done by using some keyword like `*' for the offset value.
320: .PP
321: Another optimisation would be to sort
322: the magic file so that we can just run down all the
323: tests for the first byte, first word, first long, etc, once we
324: have fetched it. Complain about conflicts in the magic file entries.
325: Make a rule that the magic entries sort based on file offset rather
326: than position within the magic file?
327: .PP
328: The program should provide a way to give an estimate
329: of ``how good'' a guess is.
330: We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
331: they are not as good as other guesses (e.g. ``Newsgroups:'' versus
332: ``Return-Path:''). Still, if the others don't pan out, it should be
333: possible to use the first guess.
334: .PP
335: This program is slower than some vendors' file commands.
336: .PP
337: This manual page, and particularly this section, is too long.
338: .SH AVAILABILITY
339: You can obtain the original author's latest version by anonymous FTP
340: on
341: .B tesla.ee.cornell.edu
342: in the directory
343: .BR /pub/file-X.YY.tar.gz .