Annotation of src/usr.bin/file/file.1, Revision 1.18
1.18 ! jmc 1: .\" $OpenBSD: file.1,v 1.17 2002/11/29 00:27:03 millert Exp $
1.8 aaron 2: .\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
1.18 ! jmc 3: .\"
! 4: .\" Copyright (c) Ian F. Darwin, 1987.
! 5: .\" Written by Ian F. Darwin.
! 6: .\"
! 7: .\" This software is not subject to any license of the American Telephone
! 8: .\" and Telegraph Company or of the Regents of the University of California.
! 9: .\"
! 10: .\" Permission is granted to anyone to use this software for any purpose on
! 11: .\" any computer system, and to alter it and redistribute it freely, subject
! 12: .\" to the following restrictions:
! 13: .\"
! 14: .\" 1. The author is not responsible for the consequences of use of this
! 15: .\" software, no matter how awful, even if they arise from flaws in it.
! 16: .\"
! 17: .\" 2. The origin of this software must not be misrepresented, either by
! 18: .\" explicit claim or by omission. Since few users ever read sources,
! 19: .\" credits must appear in the documentation.
! 20: .\"
! 21: .\" 3. Altered versions must be plainly marked as such, and must not be
! 22: .\" misrepresented as being the original software. Since few users
! 23: .\" ever read sources, credits must appear in the documentation.
! 24: .\"
! 25: .\" 4. This notice may not be removed or altered.
! 26: .\"
1.8 aaron 27: .Dd July 30, 1997
28: .Dt FILE 1
29: .Os
30: .Sh NAME
31: .Nm file
32: .Nd determine file type
33: .Sh SYNOPSIS
34: .Nm file
1.17 millert 35: .Op Fl vbczL
1.8 aaron 36: .Op Fl f Ar namefile
37: .Op Fl m Ar magicfiles
38: .Ar file Op Ar ...
39: .Sh DESCRIPTION
1.4 millert 40: This manual page documents version 3.22 of the
1.8 aaron 41: .Nm
1.4 millert 42: command.
1.8 aaron 43: .Nm
1.1 deraadt 44: tests each argument in an attempt to classify it.
45: There are three sets of tests, performed in this order:
46: filesystem tests, magic number tests, and language tests.
1.8 aaron 47: The first test that succeeds causes the file type to be printed.
48: .Pp
1.1 deraadt 49: The type printed will usually contain one of the words
1.8 aaron 50: .Dq text
1.4 millert 51: (the file contains only
1.8 aaron 52: .Tn ASCII
1.4 millert 53: characters and is probably safe to read on an
1.8 aaron 54: .Tn ASCII
1.4 millert 55: terminal),
1.8 aaron 56: .Dq executable
1.1 deraadt 57: (the file contains the result of compiling a program
1.8 aaron 58: in a form understandable to some
59: .Ux
60: kernel or another),
1.1 deraadt 61: or
1.8 aaron 62: .Dq data
63: meaning anything else (data is usually binary or non-printable).
64: .Pp
1.1 deraadt 65: Exceptions are well-known file formats (core files, tar archives)
66: that are known to contain binary data.
67: When modifying the file
1.8 aaron 68: .Pa /etc/magic
1.6 aaron 69: or the program itself,
1.8 aaron 70: .Em "preserve these keywords" .
71: .Pp
1.1 deraadt 72: People depend on knowing that all the readable files in a directory
1.8 aaron 73: have the word
74: .Dq text
75: printed.
76: Don't do as Berkeley did and change
77: .Dq shell commands text
78: to
79: .Dq shell script .
80: .Pp
1.1 deraadt 81: The filesystem tests are based on examining the return from a
1.8 aaron 82: .Xr stat 2
1.1 deraadt 83: system call.
84: The program checks to see if the file is empty,
85: or if it's some sort of special file.
86: Any known file types appropriate to the system you are running on
87: (sockets, symbolic links, or named pipes (FIFOs) on those systems that
88: implement them)
89: are intuited if they are defined in
90: the system header file
1.9 aaron 91: .Aq Pa sys/stat.h .
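The filesystem tests can be pictured with a minimal Python sketch built on `os.lstat` and the `stat` module. This is an illustration, not the C code of this program. The function name `fs_test` and the returned strings are hypothetical, and only the cases named above are covered. A `None` result means a non-empty regular file, so the magic number tests would run next.

```python
import os
import stat


def fs_test(path, follow_symlinks=False):
    """Classify a path from its stat(2) result, or return None so that
    the content tests can run next. The labels are hypothetical."""
    # lstat() does not follow symbolic links, like the default without -L.
    st = os.stat(path) if follow_symlinks else os.lstat(path)
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link to " + os.readlink(path)
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None  # non-empty regular file: fall through to content tests
```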
1.8 aaron 92: .Pp
1.1 deraadt 93: The magic number tests are used to check for files with data in
94: particular fixed formats.
95: The canonical example of this is a binary executable (compiled program)
1.8 aaron 96: .Pa a.out
1.6 aaron 97: file, whose format is defined in
1.8 aaron 98: .Aq Pa a.out.h
1.1 deraadt 99: and possibly
1.8 aaron 100: .Aq Pa exec.h
1.1 deraadt 101: in the standard include directory.
1.8 aaron 102: These files have a
103: .Dq magic number
104: stored in a particular place
105: near the beginning of the file that tells the
106: .Ux
107: operating system
1.1 deraadt 108: that the file is a binary executable, and which of several types thereof.
1.8 aaron 109: .Pp
110: The concept of magic number has been applied by extension to data files.
1.1 deraadt 111: Any file with some invariant identifier at a small fixed
112: offset into the file can usually be described in this way.
113: The information identifying these files is read from the magic file
1.8 aaron 114: .Pa /etc/magic .
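The idea of a magic number at a fixed offset can be sketched in Python. The table below is hypothetical and hard-coded rather than parsed from /etc/magic. The ELF signature (0x7f followed by "ELF" at offset 0), the gzip signature (0x1f 0x8b) and the POSIX tar "ustar" string at offset 257 are well-known values, but real magic entries also carry a type, an operator and continuation lines, which this sketch ignores.

```python
# Hypothetical table of (offset, magic bytes, description); real entries
# come from /etc/magic and have a type, an operator and continuations.
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF executable or object"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "POSIX tar archive"),
]


def magic_test(data, table=MAGIC_TABLE):
    """Return the description of the first entry whose bytes occur at its
    fixed offset in data, or None if nothing matches."""
    for offset, magic, description in table:
        # Slicing past the end yields a shorter value, so short files
        # simply fail to match instead of raising an error.
        if data[offset:offset + len(magic)] == magic:
            return description
    return None
```

Because the first match wins, the order of the table matters, just as the order of entries in the magic file does.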
115: .Pp
1.1 deraadt 116: If an argument appears to be an
1.8 aaron 117: .Tn ASCII
1.1 deraadt 118: file,
1.8 aaron 119: .Nm
1.1 deraadt 120: attempts to guess its language.
1.4 millert 121: The language tests look for particular strings (see
1.8 aaron 122: .Pa names.h )
1.1 deraadt 123: that can appear anywhere in the first few blocks of a file.
124: For example, the keyword
1.8 aaron 125: .Em .br
1.4 millert 126: indicates that the file is most likely a
1.8 aaron 127: .Xr troff 1
1.6 aaron 128: input file, just as the keyword
1.8 aaron 129: .Li struct
1.1 deraadt 130: indicates a C program.
131: These tests are less reliable than the previous
132: two groups, so they are performed last.
133: The language test routines also test for some miscellany
1.6 aaron 134: (such as
1.8 aaron 135: .Xr tar 1
1.1 deraadt 136: archives) and determine whether an unknown file should be
1.8 aaron 137: labelled as
138: .Dq ASCII text
139: or
140: .Dq data .
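A simplified Python sketch of the language tests and the final "ASCII text" or "data" decision follows. The keyword table, the 8192-byte scan limit and the rule of matching whole white-space-separated words are assumptions made for illustration; the real keyword list lives in names.h and the real tests are more involved.

```python
# Hypothetical keyword table; the real list lives in names.h.
KEYWORDS = [
    (b".br", "troff or preprocessor input text"),
    (b"struct", "C program text"),
]

# Hypothetical stand-in for "the first few blocks of a file".
SCAN_LIMIT = 8192

# Printable ASCII plus common white space and control characters.
TEXT_BYTES = set(range(0x20, 0x7F)) | set(b"\t\n\r\f\b")


def language_test(data):
    """Classify the start of a file as data, a known language, or plain
    ASCII text, checking in that order."""
    head = data[:SCAN_LIMIT]
    if not all(byte in TEXT_BYTES for byte in head):
        return "data"
    words = head.split()  # white-space-separated words, as bytes
    for keyword, description in KEYWORDS:
        if keyword in words:
            return description
    return "ASCII text"
```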
141: .Pp
142: The options are as follows:
1.11 aaron 143: .Bl -tag -width Ds
1.8 aaron 144: .It Fl v
1.1 deraadt 145: Print the version of the program and exit.
1.8 aaron 146: .It Fl m Ar list
147: Specify an alternate
148: .Ar list
149: of files containing magic numbers.
1.2 deraadt 150: This can be a single file, or a colon-separated list of files.
1.8 aaron 151: .It Fl z
1.1 deraadt 152: Try to look inside compressed files.
1.17 millert 153: .It Fl b
154: Do not prepend filenames to output lines (brief mode).
1.8 aaron 155: .It Fl c
1.1 deraadt 156: Cause a checking printout of the parsed form of the magic file.
1.6 aaron 157: This is usually used in conjunction with
1.8 aaron 158: .Fl m
1.1 deraadt 159: to debug a new magic file before installing it.
1.8 aaron 160: .It Fl f Ar namefile
1.6 aaron 161: Read the names of the files to be examined from
1.8 aaron 162: .Ar namefile
1.6 aaron 163: (one per line)
1.1 deraadt 164: before the argument list.
1.6 aaron 165: Either
1.8 aaron 166: .Ar namefile
1.1 deraadt 167: or at least one filename argument must be present;
1.8 aaron 168: to test the standard input, use
169: .Dq -
170: as a filename argument.
171: .It Fl L
172: Cause symbolic links to be followed, as with the like-named option in
173: .Xr ls 1
1.1 deraadt 174: (on systems that support symbolic links).
1.8 aaron 175: .El
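The rule for gathering names with -f and for treating "-" as the standard input can be sketched in Python. The helper names are hypothetical, blank lines in namefile are assumed to be skipped, and the usage message simply echoes the SYNOPSIS above.

```python
import sys


def collect_names(args, namefile=None):
    """Return the files to examine: names read from namefile (one per
    line) come first, followed by the command-line arguments."""
    names = []
    if namefile is not None:
        for line in namefile:
            name = line.rstrip("\n")
            if name:  # assumption: blank lines are skipped
                names.append(name)
    names.extend(args)
    if not names:
        raise SystemExit(
            "usage: file [-vbczL] [-f namefile] [-m magicfiles] file ...")
    return names


def open_input(name):
    """Open a name for binary reading; "-" means the standard input."""
    return sys.stdin.buffer if name == "-" else open(name, "rb")
```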
176: .Sh ENVIRONMENT
177: .Bl -tag -width indent
1.13 smart 178: .It Ev MAGIC
1.8 aaron 179: Default magic number files.
180: .El
1.12 aaron 181: .Sh FILES
182: .Bl -tag -width /etc/magic -compact
183: .It Pa /etc/magic
184: default list of magic numbers
185: .El
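How -m, MAGIC and /etc/magic might combine into one list of magic files is sketched below. The precedence (-m first, then MAGIC, then /etc/magic) and the colon-separated form of MAGIC are assumptions of this sketch; the text above only says that -m takes a colon-separated list and that MAGIC names the default magic number files.

```python
import os

DEFAULT_MAGIC = "/etc/magic"  # from the FILES section


def magic_files(m_option=None, environ=os.environ):
    """Resolve the list of magic files. Assumed precedence: the -m
    argument, then the MAGIC environment variable, then /etc/magic."""
    spec = m_option or environ.get("MAGIC") or DEFAULT_MAGIC
    # Split the colon-separated list, dropping empty items.
    return [path for path in spec.split(":") if path]
```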
1.8 aaron 186: .Sh SEE ALSO
187: .Xr hexdump 1 ,
188: .Xr od 1 ,
189: .Xr strings 1 ,
190: .Xr magic 5
191: .Sh STANDARDS CONFORMANCE
1.1 deraadt 192: This program is believed to exceed the System V Interface Definition
193: of FILE(CMD), as near as one can determine from the vague language
1.6 aaron 194: contained therein.
1.1 deraadt 195: Its behaviour is mostly compatible with the System V program of the same name.
196: This version knows more magic, however, so it will produce
1.6 aaron 197: different (albeit more accurate) output in many cases.
1.8 aaron 198: .Pp
1.6 aaron 199: The one significant difference
1.1 deraadt 200: between this version and System V
1.8 aaron 201: is that this version treats any white space
1.1 deraadt 202: as a delimiter, so that spaces in pattern strings must be escaped.
203: For example,
1.8 aaron 204: .Pp
205: >10 string language impress\ (imPRESS data)
206: .Pp
1.1 deraadt 207: in an existing magic file would have to be changed to
1.8 aaron 208: .Pp
209: >10 string language\e impress (imPRESS data)
210: .Pp
1.1 deraadt 211: In addition, in this version, if a pattern string contains a backslash,
1.9 aaron 212: it must be escaped.
213: For example
1.8 aaron 214: .Pp
215: 0 string \ebegindata Andrew Toolkit document
216: .Pp
1.1 deraadt 217: in an existing magic file would have to be changed to
1.8 aaron 218: .Pp
219: 0 string \e\ebegindata Andrew Toolkit document
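The splitting rule described above, where any white space delimits the offset, type and test fields and a backslash makes the next character literal, can be sketched in Python. This is a simplified, hypothetical parser: it handles only the two escapes shown here, not octal or other C-style escapes, and it treats everything after the third field as the message.

```python
def split_magic_line(line):
    """Split a magic line into (offset, type, test, message).

    An unescaped space or tab ends each of the first three fields, a
    backslash makes the next character literal, and the rest of the line
    is the message. Octal and other escapes are not handled here.
    """
    fields, current, i = [], [], 0
    while len(fields) < 3 and i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i + 1])  # escaped character taken literally
            i += 2
            continue
        if ch in " \t":
            if current:  # a run of white space acts as one delimiter
                fields.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current and len(fields) < 3:
        fields.append("".join(current))
    return (*fields, line[i:].strip())
```

Without the escape, the imPRESS entry splits at "language", so "impress" ends up in the message instead of the pattern.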
220: .Pp
1.1 deraadt 221: SunOS releases 3.2 and later from Sun Microsystems include a
1.8 aaron 222: .Xr file 1
1.1 deraadt 223: command derived from the System V one, but with some extensions.
224: My version differs from Sun's only in minor ways.
1.8 aaron 225: It includes the extension of the
226: .Ql &
227: operator, used as,
1.1 deraadt 228: for example,
1.8 aaron 229: .Pp
230: >16 long&0x7fffffff >0 not stripped
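Evaluating the & entry above amounts to a read, a mask and a comparison. In this Python sketch the function name is hypothetical, the comparison operators are a subset chosen for illustration, and the 4-byte value is treated as unsigned. It is read in native byte order with the struct "=" prefix, since a plain long is read in the byte order of the machine running file.

```python
import operator
import struct

# A subset of the comparison operators that may follow the mask.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def masked_long_test(data, offset, mask, op, value):
    """Evaluate an entry such as ">16 long&0x7fffffff >0": read a 4-byte
    value (unsigned here) in native byte order, AND it with mask, and
    compare the result with value using op."""
    if offset + 4 > len(data):
        return False  # the file is too short to hold the field
    (raw,) = struct.unpack_from("=I", data, offset)
    return OPS[op](raw & mask, value)
```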
231: .Sh MAGIC DIRECTORY
1.1 deraadt 232: The magic file entries have been collected from various sources,
233: mainly USENET, and contributed by various authors.
1.8 aaron 234: .An Christos Zoulas
235: (address below) will collect additional
1.1 deraadt 236: or corrected magic file entries.
1.6 aaron 237: A consolidation of magic file entries
1.1 deraadt 238: will be distributed periodically.
239: The order of entries in the magic file is significant.
240: Depending on what system you are using, the order that
241: they are put together may be incorrect.
242: If your old
1.8 aaron 243: .Nm
1.1 deraadt 244: command uses a magic file,
245: keep the old magic file around for comparison purposes
1.6 aaron 246: (rename it to
1.8 aaron 247: .Pa /etc/magic.orig ) .
248: .Sh HISTORY
1.6 aaron 249: There has been a
1.8 aaron 250: .Nm
251: command in every
252: .Ux
1.16 mickey 253: since at least Research Version 4
254: (man page dated November, 1973).
1.1 deraadt 255: The System V version introduced one significant change:
256: the external list of magic number types.
257: This slowed the program down slightly but made it a lot more flexible.
1.8 aaron 258: .Pp
1.10 ian 259: This program, based on the System V version, was written by
260: .An Ian F. Darwin Aq ian@darwinisys.com
1.8 aaron 261: without looking at anybody else's source code.
262: .Pp
263: .An John Gilmore
264: revised the code extensively, making it better than
1.1 deraadt 265: the first version.
1.8 aaron 266: .An Geoff Collyer
267: found several inadequacies
1.1 deraadt 268: and provided some magic file entries.
1.8 aaron 269: .Pp
270: Altered by
271: .An Rob McMahon Aq cudcv@warwick.ac.uk ,
272: 1989, to extend the
273: .Ql &
274: operator from simple
275: .Dq x&y != 0
276: to
277: .Dq x&y op z .
278: .Pp
279: Altered by
280: .An Guy Harris Aq guy@auspex.com ,
281: 1993, to:
282: .Bl -item -offset indent
283: .It
284: put the
285: .Dq old-style
286: .Ql &
287: operator back the way it was, because
288: .Bl -enum -offset indent
289: .It
290: Rob McMahon's change broke the
291: previous style of usage,
292: .It
293: The SunOS
294: .Dq new-style
295: .Ql &
296: operator, which this version of
297: .Nm
298: supports, also handles
299: .Dq x&y op z ,
300: .It
301: Rob's change wasn't documented in any case;
302: .El
303: .It
304: put in multiple levels of
305: .Ql > ;
306: .It
307: put in
308: .Dq beshort ,
309: .Dq leshort ,
310: etc. keywords to look at numbers in the
1.1 deraadt 311: file in a specific byte order, rather than in the native byte order of
312: the process running
1.8 aaron 313: .Nm file .
314: .El
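Two of the changes listed above, multiple continuation levels and fixed byte order types, can be illustrated with the Python struct module. The helper names are hypothetical, values are read as unsigned for simplicity, and offsets are assumed to be decimal or 0x-prefixed. belong and lelong are included as the 4-byte counterparts of beshort and leshort covered by the "etc." above.

```python
import struct

# struct formats: ">" is big-endian and "<" is little-endian, so the
# result does not depend on the byte order of the machine running file.
FORMATS = {
    "beshort": ">H",
    "leshort": "<H",
    "belong": ">I",
    "lelong": "<I",
}


def read_typed(data, offset, type_name):
    """Read an unsigned value of a fixed-byte-order type at offset."""
    return struct.unpack_from(FORMATS[type_name], data, offset)[0]


def continuation_level(offset_field):
    """Split an offset field such as ">>0x10" into (level, offset):
    0 for a top-level entry, 1 for ">", 2 for ">>", and so on."""
    stripped = offset_field.lstrip(">")
    return len(offset_field) - len(stripped), int(stripped, 0)
```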
315: .Pp
1.10 ian 316: Currently maintained by
317: .An Christos Zoulas Aq christos@zoulas.com .
1.8 aaron 318: .Sh LEGAL NOTICE
1.10 ian 319: Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
320: Covered by the standard Berkeley Software Distribution copyright; see the file
321: LEGAL.NOTICE in the distribution.
1.8 aaron 322: .Pp
1.1 deraadt 323: The files
1.8 aaron 324: .Pa tar.h
1.1 deraadt 325: and
1.8 aaron 326: .Pa is_tar.c
327: were written by
328: .An John Gilmore
329: from his public-domain
330: .Nm tar
1.10 ian 331: program.
1.8 aaron 332: .Sh BUGS
1.1 deraadt 333: There must be a better way to automate the construction of the Magic
1.8 aaron 334: file from all the glop in Magdir.
335: What is it?
1.1 deraadt 336: Better yet, the magic file should be compiled into binary (say,
1.8 aaron 337: .Xr ndbm 3
1.4 millert 338: or, better yet, fixed-length
1.8 aaron 339: .Tn ASCII
1.4 millert 340: strings for use in heterogeneous network environments) for faster startup.
1.1 deraadt 341: Then the program would run as fast as the Version 7 program of the same name,
342: with the flexibility of the System V version.
1.8 aaron 343: .Pp
344: .Nm
1.15 pjanzen 345: uses several algorithms that favor speed over accuracy;
1.4 millert 346: thus it can be misled about the contents of
1.8 aaron 347: .Tn ASCII
1.4 millert 348: files.
1.8 aaron 349: .Pp
1.4 millert 350: The support for
1.8 aaron 351: .Tn ASCII
1.4 millert 352: files (primarily for programming languages)
1.1 deraadt 353: is simplistic, inefficient and requires recompilation to update.
1.8 aaron 354: .Pp
355: There should be an
356: .Dq else
357: clause to follow a series of continuation lines.
358: .Pp
1.1 deraadt 359: The magic file and keywords should have regular expression support.
1.4 millert 360: Their use of
1.8 aaron 361: .Tn ASCII TAB
1.4 millert 362: as a field delimiter is ugly and makes
1.1 deraadt 363: it hard to edit the files, but is entrenched.
1.8 aaron 364: .Pp
1.1 deraadt 365: It might be advisable to allow upper-case letters in keywords
1.4 millert 366: to distinguish, e.g.,
1.8 aaron 367: .Xr troff 1
1.4 millert 368: commands from man page macros.
1.1 deraadt 369: Regular expression support would make this easy.
1.8 aaron 370: .Pp
1.1 deraadt 371: The program doesn't grok \s-2FORTRAN\s0.
1.6 aaron 372: It should be able to recognize \s-2FORTRAN\s0 by seeing some keywords which
1.1 deraadt 373: appear indented at the start of a line.
374: Regular expression support would make this easy.
1.8 aaron 375: .Pp
1.6 aaron 376: The list of keywords in
1.8 aaron 377: .Em ascmagic
1.1 deraadt 378: probably belongs in the Magic file.
1.8 aaron 379: This could be done by using some keyword like
380: .Ql *
381: for the offset value.
382: .Pp
383: Another optimization would be to sort
1.1 deraadt 384: the magic file so that we can just run down all the
385: tests for the first byte, first word, first long, etc., once we
1.9 aaron 386: have fetched it.
387: Complain about conflicts in the magic file entries.
1.1 deraadt 388: Make a rule that the magic entries sort based on file offset rather
389: than position within the magic file?
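The proposed optimization can be sketched as grouping tests by offset and type, so each value is fetched once and then compared against every entry that tests it. This is a hypothetical sketch of the idea raised above, not something this version of file does. It reports matches in offset order, which is the change of rule the last question contemplates.

```python
import struct
from itertools import groupby


def grouped_tests(entries, data):
    """entries: (offset, struct format, expected value, description)
    tuples. Sort them by (offset, format), fetch each distinct value from
    data only once, then test every entry in that group against it."""
    def key(entry):
        return entry[0], entry[1]

    matches = []
    for (offset, fmt), group in groupby(sorted(entries, key=key), key=key):
        if offset + struct.calcsize(fmt) > len(data):
            continue  # data too short for this field
        (value,) = struct.unpack_from(fmt, data, offset)  # single fetch
        matches.extend(desc for _, _, expected, desc in group if value == expected)
    return matches
```

Because the sort is stable, entries that share an offset and type keep their original magic-file order within the group.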
1.8 aaron 390: .Pp
1.6 aaron 391: The program should provide a way to give an estimate
1.8 aaron 392: of
393: .Dq how good
394: a guess is.
395: We end up removing guesses (e.g.,
396: .Dq "From "
397: as the first 5 characters of a file) because
398: they are not as good as other guesses (e.g.,
399: .Dq Newsgroups:
400: versus
401: .Dq Return-Path: ) .
402: Still, if the others don't pan out, it should be
1.6 aaron 403: possible to use the first guess.
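One way to keep the weaker guesses around is to score them rather than delete them. In this sketch the function name, scores and labels are hypothetical. The point is only that a low-scoring guess, such as a file starting with "From ", still wins when nothing better matched.

```python
def best_guess(candidates):
    """candidates: (score, description) pairs in the order the tests
    produced them; the scores are hypothetical confidence values.
    Return the highest-scoring description, or None if there are none."""
    if not candidates:
        return None
    # max() returns the first of several equal maxima, so ties go to the
    # guess that was produced first.
    return max(candidates, key=lambda c: c[0])[1]
```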
1.8 aaron 404: .Pp
405: This program is slower than some vendors'
406: .Nm
407: commands.
408: .Pp
1.1 deraadt 409: This manual page, and particularly this section, is too long.
1.8 aaron 410: .Sh AVAILABILITY
1.1 deraadt 411: You can obtain the original author's latest version by anonymous FTP
1.8 aaron 412: on
1.15 pjanzen 413: .Em ftp.astron.com
1.8 aaron 414: as
415: .Pa /pub/file/file-X.YY.tar.gz