Annotation of src/usr.bin/file/file.1, Revision 1.9
1.9 ! aaron 1: .\" $OpenBSD: file.1,v 1.8 2000/03/06 02:38:19 aaron Exp $
1.8 aaron 2: .\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
3: .Dd July 30, 1997
4: .Dt FILE 1
5: .Os
6: .Sh NAME
7: .Nm file
8: .Nd determine file type
9: .Sh SYNOPSIS
10: .Nm file
11: .Op Fl vczL
12: .Op Fl f Ar namefile
13: .Op Fl m Ar magicfiles
14: .Ar file Op Ar ...
15: .Sh DESCRIPTION
1.4 millert 16: This manual page documents version 3.22 of the
1.8 aaron 17: .Nm
1.4 millert 18: command.
1.8 aaron 19: .Nm
1.1 deraadt 20: tests each argument in an attempt to classify it.
21: There are three sets of tests, performed in this order:
22: filesystem tests, magic number tests, and language tests.
1.8 aaron 23: The first test that succeeds causes the file type to be printed.
24: .Pp
1.1 deraadt 25: The type printed will usually contain one of the words
1.8 aaron 26: .Dq text
1.4 millert 27: (the file contains only
1.8 aaron 28: .Tn ASCII
1.4 millert 29: characters and is probably safe to read on an
1.8 aaron 30: .Tn ASCII
1.4 millert 31: terminal),
1.8 aaron 32: .Dq executable
1.1 deraadt 33: (the file contains the result of compiling a program
1.8 aaron 34: in a form understandable to some
35: .Ux
36: kernel or another),
1.1 deraadt 37: or
1.8 aaron 38: .Dq data
39: meaning anything else (data is usually binary or non-printable).
40: .Pp
1.1 deraadt 41: Exceptions are well-known file formats (core files, tar archives)
42: that are known to contain binary data.
43: When modifying the file
1.8 aaron 44: .Pa /etc/magic
1.6 aaron 45: or the program itself,
1.8 aaron 46: .Em "preserve these keywords" .
47: .Pp
1.1 deraadt 48: People depend on knowing that all the readable files in a directory
1.8 aaron 49: have the word
50: .Dq text
51: printed.
52: Don't do as Berkeley did and change
53: .Dq shell commands text
54: to
55: .Dq shell script .
56: .Pp
1.1 deraadt 57: The filesystem tests are based on examining the return from a
1.8 aaron 58: .Xr stat 2
1.1 deraadt 59: system call.
60: The program checks to see if the file is empty,
61: or if it's some sort of special file.
62: Any known file types appropriate to the system you are running on
63: (sockets, symbolic links, or named pipes (FIFOs) on those systems that
64: implement them)
65: are intuited if they are defined in
66: the system header file
1.9 ! aaron 67: .Aq Pa sys/stat.h .
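The filesystem tests described above can be sketched roughly as follows. This is only an illustration in Python, not the program's actual code: the function name and the strings it returns are hypothetical, and lstat is used here on the assumption that symbolic links are reported rather than followed by default.

```python
import os
import stat


def filesystem_test(path):
    """Classify empty and special files from the stat information alone.

    Returns a description, or None when the file is a non-empty regular
    file and the later (magic number, language) tests should run.
    The returned strings are illustrative, not the program's real output.
    """
    st = os.lstat(path)  # lstat: a symbolic link is reported, not followed
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "named pipe (FIFO)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None
```

A return value of None means that none of the filesystem tests succeeded, so the next set of tests would be tried.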
1.8 aaron 68: .Pp
1.1 deraadt 69: The magic number tests are used to check for files with data in
70: particular fixed formats.
71: The canonical example of this is a binary executable (compiled program)
1.8 aaron 72: .Pa a.out
1.6 aaron 73: file, whose format is defined in
1.8 aaron 74: .Aq Pa a.out.h
1.1 deraadt 75: and possibly
1.8 aaron 76: .Aq Pa exec.h
1.1 deraadt 77: in the standard include directory.
1.8 aaron 78: These files have a
79: .Dq magic number
80: stored in a particular place
81: near the beginning of the file that tells the
82: .Ux
83: operating system
1.1 deraadt 84: that the file is a binary executable, and which of several types thereof.
1.8 aaron 85: .Pp
86: The concept of magic number has been applied by extension to data files.
1.1 deraadt 87: Any file with some invariant identifier at a small fixed
88: offset into the file can usually be described in this way.
89: The information identifying these files is read from the magic file
1.8 aaron 90: .Pa /etc/magic .
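As a rough illustration of the idea, a magic number test compares the bytes at a small fixed offset against a known identifier. The Python sketch below uses a tiny hand-written table of three well-known identifiers; the table, the descriptions and the function name are assumptions made for the example and do not reproduce the contents or syntax of /etc/magic.

```python
# Illustrative table only: (offset, expected bytes, description).
# These three identifiers are well known, but the real magic file
# has its own syntax and far more entries.
MAGIC_TABLE = [
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"\x7fELF", "ELF file"),
    (257, b"ustar", "tar archive"),
]


def magic_test(data, table=MAGIC_TABLE):
    """Return the description of the first entry whose bytes match, else None."""
    for offset, expected, description in table:
        # Slicing past the end of data yields a short result, which
        # simply fails the comparison instead of raising an error.
        if data[offset:offset + len(expected)] == expected:
            return description
    return None
```

Entries are tried in table order and the first match wins, so the order of the table matters in this sketch.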
91: .Pp
1.1 deraadt 92: If an argument appears to be an
1.8 aaron 93: .Tn ASCII
1.1 deraadt 94: file,
1.8 aaron 95: .Nm
1.1 deraadt 96: attempts to guess its language.
1.4 millert 97: The language tests look for particular strings (cf
1.8 aaron 98: .Pa names.h )
1.1 deraadt 99: that can appear anywhere in the first few blocks of a file.
100: For example, the keyword
1.8 aaron 101: .Em .br
1.4 millert 102: indicates that the file is most likely a
1.8 aaron 103: .Xr troff 1
1.6 aaron 104: input file, just as the keyword
1.8 aaron 105: .Li struct
1.1 deraadt 106: indicates a C program.
107: These tests are less reliable than the previous
108: two groups, so they are performed last.
109: The language test routines also test for some miscellany
1.6 aaron 110: (such as
1.8 aaron 111: .Xr tar 1
1.1 deraadt 112: archives) and determine whether an unknown file should be
1.8 aaron 113: labelled as
114: .Dq ASCII text
115: or
116: .Dq data .
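The language tests can be pictured with the following Python sketch. It is a simplification under several assumptions: the keyword table holds only the two keywords mentioned above, the block size and block count are guesses at what "the first few blocks" means, the descriptions are illustrative, and a plain substring search stands in for whatever matching the real routines in names.h drive.

```python
# Hypothetical keyword table built from the two examples in the text.
KEYWORDS = [
    (b".br", "troff input text"),
    (b"struct", "C program text"),
]

BLOCK_SIZE = 512  # assumed block size
BLOCKS = 4        # assumed meaning of "the first few blocks"

# Bytes treated as text in this sketch: common controls plus printable ASCII.
TEXT_BYTES = frozenset(b"\t\n\r\f\b") | frozenset(range(0x20, 0x7F))


def language_test(data):
    """Guess a language from keywords, else fall back to 'ASCII text' or 'data'."""
    head = data[:BLOCK_SIZE * BLOCKS]
    if not all(byte in TEXT_BYTES for byte in head):
        return "data"
    for keyword, description in KEYWORDS:
        if keyword in head:  # substring search; a simplification
            return description
    return "ASCII text"
```

Because a keyword can appear in unrelated text, a guess made this way is less reliable than a magic number match, which is consistent with these tests being performed last.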
117: .Pp
118: The options are as follows:
119: .Bl -tag -width indent
120: .It Fl v
1.1 deraadt 121: Print the version of the program and exit.
1.8 aaron 122: .It Fl m Ar list
123: Specify an alternate
124: .Ar list
125: of files containing magic numbers.
1.2 deraadt 126: This can be a single file, or a colon-separated list of files.
1.8 aaron 127: .It Fl z
1.1 deraadt 128: Try to look inside compressed files.
1.8 aaron 129: .It Fl c
1.1 deraadt 130: Cause a checking printout of the parsed form of the magic file.
1.6 aaron 131: This is usually used in conjunction with
1.8 aaron 132: .Fl m
1.1 deraadt 133: to debug a new magic file before installing it.
1.8 aaron 134: .It Fl f Ar namefile
1.6 aaron 135: Read the names of the files to be examined from
1.8 aaron 136: .Ar namefile
1.6 aaron 137: (one per line)
1.1 deraadt 138: before the argument list.
1.6 aaron 139: Either
1.8 aaron 140: .Ar namefile
1.1 deraadt 141: or at least one filename argument must be present;
1.8 aaron 142: to test the standard input, use
143: .Dq -
144: as a filename argument.
145: .It Fl L
146: Cause symlinks to be followed, as with the like-named option in
147: .Xr ls 1
1.1 deraadt 148: (on systems that support symbolic links).
1.8 aaron 149: .El
150: .Sh FILES
151: .Bl -tag -width /etc/magic -compact
152: .It Pa /etc/magic
153: default list of magic numbers
154: .El
155: .Sh ENVIRONMENT
156: The following environment variables affect the execution of
157: .Nm file :
158: .Pp
159: .Bl -tag -width indent
160: .It Ev MAGIC
161: Default magic number files.
162: .El
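One plausible way to combine the -m option, the MAGIC variable and the default file is sketched below in Python. The precedence shown (option first, then environment, then /etc/magic) is an assumption, as is the idea that MAGIC accepts the same colon-separated form as -m; this page does not spell out either point. The function name is hypothetical.

```python
import os

DEFAULT_MAGIC = "/etc/magic"


def magic_files(m_option=None, environ=None):
    """Return the list of magic files to consult.

    Assumed precedence: -m argument, then MAGIC, then the default file.
    """
    if environ is None:
        environ = os.environ
    spec = m_option or environ.get("MAGIC") or DEFAULT_MAGIC
    # A colon-separated list is split into names; empty names are dropped.
    return [name for name in spec.split(":") if name]
```

For instance, magic_files("./magic:/etc/magic") yields the two file names in the order given.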
163: .Sh SEE ALSO
164: .Xr hexdump 1 ,
165: .Xr od 1 ,
166: .Xr strings 1 ,
167: .Xr magic 5
168: .Sh STANDARDS CONFORMANCE
1.1 deraadt 169: This program is believed to exceed the System V Interface Definition
170: of FILE(CMD), as near as one can determine from the vague language
1.6 aaron 171: contained therein.
1.1 deraadt 172: Its behaviour is mostly compatible with the System V program of the same name.
173: This version knows more magic, however, so it will produce
1.6 aaron 174: different (albeit more accurate) output in many cases.
1.8 aaron 175: .Pp
1.6 aaron 176: The one significant difference
1.1 deraadt 177: between this version and System V
1.8 aaron 178: is that this version treats any white space
1.1 deraadt 179: as a delimiter, so that spaces in pattern strings must be escaped.
180: For example,
1.8 aaron 181: .Pp
182: >10 string language impress\ (imPRESS data)
183: .Pp
1.1 deraadt 184: in an existing magic file would have to be changed to
1.8 aaron 185: .Pp
186: >10 string language\e impress (imPRESS data)
187: .Pp
1.1 deraadt 188: In addition, in this version, if a pattern string contains a backslash,
1.9 ! aaron 189: it must be escaped.
! 190: For example,
1.8 aaron 191: .Pp
192: 0 string \ebegindata Andrew Toolkit document
193: .Pp
1.1 deraadt 194: in an existing magic file would have to be changed to
1.8 aaron 195: .Pp
196: 0 string \e\ebegindata Andrew Toolkit document
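The two escaping rules above can be applied mechanically when converting pattern strings from an existing magic file. The Python helper below is a hypothetical sketch, not part of the package; it handles only spaces and backslashes, as in the two examples, and leaves other white space untouched.

```python
def escape_pattern(pattern):
    """Escape a pattern string for this version's magic file format."""
    # Double the backslashes first, so that the backslashes added for
    # spaces in the next step are not doubled again.
    escaped = pattern.replace("\\", "\\\\")
    # Then protect each space so it is not taken as a field delimiter.
    return escaped.replace(" ", "\\ ")
```

Applied to the two examples, this turns "language impress" into "language\ impress" and "\begindata" into "\\begindata" (shown here as the literal characters that would appear in the magic file).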
197: .Pp
1.1 deraadt 198: SunOS releases 3.2 and later from Sun Microsystems include a
1.8 aaron 199: .Xr file 1
1.1 deraadt 200: command derived from the System V one, but with some extensions.
201: My version differs from Sun's only in minor ways.
1.8 aaron 202: It includes the extension of the
203: .Ql &
204: operator, used as,
1.1 deraadt 205: for example,
1.8 aaron 206: .Pp
207: >16 long&0x7fffffff >0 not stripped
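Read as a test, the entry above takes the long at offset 16, ANDs it with 0x7fffffff, and checks that the result is greater than 0. A Python sketch of that evaluation follows. It assumes that a long is 4 bytes, signed, and read in the native byte order of the running process; the function name is made up for the illustration.

```python
import struct


def masked_long_greater(data, offset, mask, threshold):
    """Evaluate a test of the form 'long&mask >threshold' at the given offset."""
    if len(data) < offset + 4:
        return False  # not enough data to hold a long at this offset
    # "=i": native byte order, standard 4-byte signed integer (assumed width).
    (value,) = struct.unpack_from("=i", data, offset)
    return (value & mask) > threshold
```

With the mask 0x7fffffff the top bit is ignored, so a value in which only that bit is set does not satisfy the test.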
208: .Sh MAGIC DIRECTORY
1.1 deraadt 209: The magic file entries have been collected from various sources,
210: mainly USENET, and contributed by various authors.
1.8 aaron 211: .An Christos Zoulas
212: (address below) will collect additional
1.1 deraadt 213: or corrected magic file entries.
1.6 aaron 214: A consolidation of magic file entries
1.1 deraadt 215: will be distributed periodically.
216: The order of entries in the magic file is significant.
217: Depending on what system you are using, the order in which
218: they are put together may be incorrect.
219: If your old
1.8 aaron 220: .Nm
1.1 deraadt 221: command uses a magic file,
222: keep the old magic file around for comparison purposes
1.6 aaron 223: (rename it to
1.8 aaron 224: .Pa /etc/magic.orig ) .
225: .Sh HISTORY
1.6 aaron 226: There has been a
1.8 aaron 227: .Nm
228: command in every
229: .Ux
230: since at least Research Version 6
1.1 deraadt 231: (man page dated January, 1975).
232: The System V version introduced one significant change:
233: the external list of magic number types.
234: This slowed the program down slightly but made it a lot more flexible.
1.8 aaron 235: .Pp
1.1 deraadt 236: This program, based on the System V version,
1.8 aaron 237: was written by
238: .An Ian Darwin
239: without looking at anybody else's source code.
240: .Pp
241: .An John Gilmore
242: revised the code extensively, making it better than
1.1 deraadt 243: the first version.
1.8 aaron 244: .An Geoff Collyer
245: found several inadequacies
1.1 deraadt 246: and provided some magic file entries.
247: The program has undergone continued evolution since.
1.8 aaron 248: .Sh AUTHORS
249: Written by
250: .An Ian F. Darwin Aq ian@sq.com ,
251: UUCP address {utzoo | ihnp4}!darwin!ian,
252: postal address: P.O. Box 603, Station F, Toronto, Ontario, CANADA M4Y 2L8.
253: .Pp
254: Altered by
255: .An Rob McMahon Aq cudcv@warwick.ac.uk ,
256: 1989, to extend the
257: .Ql &
258: operator from simple
259: .Dq x&y != 0
260: to
261: .Dq x&y op z .
262: .Pp
263: Altered by
264: .An Guy Harris Aq guy@auspex.com ,
265: 1993, to:
266: .Bl -item -offset indent
267: .It
268: put the
269: .Dq old-style
270: .Ql &
271: operator back the way it was, because
272: .Bl -enum -offset indent
273: .It
274: Rob McMahon's change broke the
275: previous style of usage,
276: .It
277: The SunOS
278: .Dq new-style
279: .Ql &
280: operator, which this version of
281: .Nm
282: supports, also handles
283: .Dq x&y op z ,
284: .It
285: Rob's change wasn't documented in any case;
286: .El
287: .It
288: put in multiple levels of
289: .Ql > ;
290: .It
291: put in
292: .Dq beshort ,
293: .Dq leshort ,
294: etc. keywords to look at numbers in the
1.1 deraadt 295: file in a specific byte order, rather than in the native byte order of
296: the process running
1.8 aaron 297: .Nm file .
298: .El
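The difference between the byte-order keywords and the native-order ones can be shown with a short Python sketch. The mapping below covers only 2-byte values and treats them as unsigned, which is an assumption; the dictionary and function names are hypothetical.

```python
import struct

# struct format for each keyword: ">" big-endian, "<" little-endian,
# "=" native byte order of the running process.
SHORT_FORMATS = {
    "beshort": ">H",
    "leshort": "<H",
    "short": "=H",
}


def read_short(data, offset, keyword):
    """Read a 2-byte unsigned value at offset in the byte order named by keyword."""
    (value,) = struct.unpack_from(SHORT_FORMATS[keyword], data, offset)
    return value
```

The same two bytes give different numbers under "beshort" and "leshort", while "short" matches one or the other depending on the machine, which is why a fixed byte order is useful for formats defined independently of the host.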
299: .Pp
300: Changes by
301: .An Ian Darwin
302: and various authors including
303: .An Christos Zoulas Aq christos@deshaw.com ,
304: 1990-1992.
305: .Sh LEGAL NOTICE
306: Copyright (c) Ian F. Darwin, Toronto, Canada,
307: 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993.
308: .Pp
309: This software is not subject to and may not be made subject to any
310: license of the American Telephone and Telegraph Company, Sun
311: Microsystems Inc., Digital Equipment Inc., Lotus Development Inc., the
312: Regents of the University of California, The X Consortium or MIT, or
313: The Free Software Foundation.
314: .Pp
315: This software is not subject to any export provision of the United States
316: Department of Commerce, and may be exported to any country or planet.
317: .Pp
318: Permission is granted to anyone to use this software for any purpose on
319: any computer system, and to alter it and redistribute it freely, subject
320: to the following restrictions:
321: .Bl -enum -offset indent
322: .It
323: The author is not responsible for the consequences of use of this
324: software, no matter how awful, even if they arise from flaws in it;
325: .It
326: The origin of this software must not be misrepresented, either by
1.9 ! aaron 327: explicit claim or by omission.
! 328: Since few users ever read sources,
1.8 aaron 329: credits must appear in the documentation;
330: .It
331: Altered versions must be plainly marked as such, and must not be
1.9 ! aaron 332: misrepresented as being the original software.
! 333: Since few users ever read sources, credits must appear in the documentation;
1.8 aaron 334: .It
335: This notice may not be removed or altered.
336: .El
337: .Pp
338: A few support files
339: .Pf ( Fn getopt ,
340: .Fn strtok )
1.1 deraadt 341: distributed with this package
1.8 aaron 342: are by
343: .An Henry Spencer
344: and are subject to the same terms as above.
345: .Pp
346: A few simple support files
347: .Pf ( Fn strtol ,
348: .Fn strchr )
1.1 deraadt 349: distributed with this package
350: are in the public domain; they are so marked.
1.8 aaron 351: .Pp
1.1 deraadt 352: The files
1.8 aaron 353: .Pa tar.h
1.1 deraadt 354: and
1.8 aaron 355: .Pa is_tar.c
356: were written by
357: .An John Gilmore
358: from his public-domain
359: .Nm tar
1.1 deraadt 360: program, and are not covered by the above restrictions.
1.8 aaron 361: .Sh BUGS
1.1 deraadt 362: There must be a better way to automate the construction of the Magic
1.8 aaron 363: file from all the glop in Magdir.
364: What is it?
1.1 deraadt 365: Better yet, the magic file should be compiled into binary (say,
1.8 aaron 366: .Xr ndbm 3
1.4 millert 367: or, better yet, fixed-length
1.8 aaron 368: .Tn ASCII
1.4 millert 369: strings for use in heterogeneous network environments) for faster startup.
1.1 deraadt 370: Then the program would run as fast as the Version 7 program of the same name,
371: with the flexibility of the System V version.
1.8 aaron 372: .Pp
373: .Nm
1.1 deraadt 374: uses several algorithms that favor speed over accuracy,
1.4 millert 375: thus it can be misled about the contents of
1.8 aaron 376: .Tn ASCII
1.4 millert 377: files.
1.8 aaron 378: .Pp
1.4 millert 379: The support for
1.8 aaron 380: .Tn ASCII
1.4 millert 381: files (primarily for programming languages)
1.1 deraadt 382: is simplistic, inefficient and requires recompilation to update.
1.8 aaron 383: .Pp
384: There should be an
385: .Dq else
386: clause to follow a series of continuation lines.
387: .Pp
1.1 deraadt 388: The magic file and keywords should have regular expression support.
1.4 millert 389: Their use of
1.8 aaron 390: .Tn ASCII TAB
1.4 millert 391: as a field delimiter is ugly and makes
1.1 deraadt 392: it hard to edit the files, but is entrenched.
1.8 aaron 393: .Pp
1.1 deraadt 394: It might be advisable to allow upper-case letters in keywords
1.4 millert 395: for e.g.,
1.8 aaron 396: .Xr troff 1
1.4 millert 397: commands vs man page macros.
1.1 deraadt 398: Regular expression support would make this easy.
1.8 aaron 399: .Pp
1.1 deraadt 400: The program doesn't grok \s-2FORTRAN\s0.
1.6 aaron 401: It should be able to figure out \s-2FORTRAN\s0 by seeing some keywords which
1.1 deraadt 402: appear indented at the start of a line.
403: Regular expression support would make this easy.
1.8 aaron 404: .Pp
1.6 aaron 405: The list of keywords in
1.8 aaron 406: .Em ascmagic
1.1 deraadt 407: probably belongs in the Magic file.
1.8 aaron 408: This could be done by using some keyword like
409: .Ql *
410: for the offset value.
411: .Pp
412: Another optimization would be to sort
1.1 deraadt 413: the magic file so that we can just run down all the
414: tests for the first byte, first word, first long, etc, once we
1.9 ! aaron 415: have fetched it.
! 416: Complain about conflicts in the magic file entries.
1.1 deraadt 417: Make a rule that the magic entries sort based on file offset rather
418: than position within the magic file?
1.8 aaron 419: .Pp
1.6 aaron 420: The program should provide a way to give an estimate
1.8 aaron 421: of
422: .Dq how good
423: a guess is.
424: We end up removing guesses (e.g.,
425: .Dq "From "
426: as first 5 chars of file) because
427: they are not as good as other guesses (e.g.,
428: .Dq Newsgroups:
429: versus
430: .Qq Return-Path: ) .
431: Still, if the others don't pan out, it should be
1.6 aaron 432: possible to use the first guess.
1.8 aaron 433: .Pp
434: This program is slower than some vendors'
435: .Nm
436: commands.
437: .Pp
1.1 deraadt 438: This manual page, and particularly this section, is too long.
1.8 aaron 439: .Sh AVAILABILITY
1.1 deraadt 440: You can obtain the original author's latest version by anonymous FTP
1.8 aaron 441: on
442: .Em ftp.deshaw.com
443: in the directory
444: .Pa /pub/file/file-X.YY.tar.gz
445: