Annotation of src/usr.bin/file/file.1, Revision 1.1
1.1 ! deraadt 1: .TH FILE 1 "Copyright but distributable"
! 2: .\" $Id: file.1,v 1.7 1995/03/25 22:35:42 christos Exp $
! 3: .SH NAME
! 4: file
! 5: \- determine file type
! 6: .SH SYNOPSIS
! 7: .B file
! 8: [
! 9: .B \-vczL
! 10: ]
! 11: [
! 12: .B \-f
! 13: namefile ]
! 14: [
! 15: .B \-m
! 16: magicfile ]
! 17: file ...
! 18: .SH DESCRIPTION
! 19: .I File
! 20: tests each argument in an attempt to classify it.
! 21: There are three sets of tests, performed in this order:
! 22: filesystem tests, magic number tests, and language tests.
! 23: The
! 24: .I first
! 25: test that succeeds causes the file type to be printed.
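.PP
The ordering rule above can be sketched as follows.
This is an illustration only, not the program's actual code: the name
.I classify
and the idea of passing the three test groups as a list of callables are
assumptions made for the example, as is the fallback to ``data''.
.nf
```python
def classify(path, tests):
    """Return the description from the first test that recognises path.

    tests is an ordered list of callables (filesystem, magic, language);
    each returns a description string on success or None on failure.
    """
    for test in tests:
        description = test(path)
        if description is not None:
            return description  # first success wins; later tests never run
    return "data"  # assumed fallback keyword for anything unrecognised
```
.fi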
! 26: .PP
! 27: The type printed will usually contain one of the words
! 28: .B text
! 29: (the file contains only ASCII characters and is
! 30: probably safe to read on an ASCII terminal),
! 31: .B executable
! 32: (the file contains the result of compiling a program
! 33: in a form understandable to some \s-1UNIX\s0 kernel or another),
! 34: or
! 35: .B data
! 36: meaning anything else (data is usually `binary' or non-printable).
! 37: Exceptions are well-known file formats (core files, tar archives)
! 38: that are known to contain binary data.
! 39: When modifying the file
! 40: .I /etc/magic
! 41: or the program itself,
! 42: .BR "preserve these keywords" .
! 43: People depend on knowing that all the readable files in a directory
! 44: have the word ``text'' printed.
! 45: Don't do as Berkeley did \- change ``shell commands text''
! 46: to ``shell script''.
! 47: .PP
! 48: The filesystem tests are based on examining the return from a
! 49: .IR stat (2)
! 50: system call.
! 51: The program checks to see if the file is empty,
! 52: or if it's some sort of special file.
! 53: Any known file types appropriate to the system you are running on
! 54: (sockets, symbolic links, or named pipes (FIFOs) on those systems that
! 55: implement them)
! 56: are intuited if they are defined in
! 57: the system header file
! 58: .BR sys/stat.h .
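.PP
A minimal sketch of a filesystem test of this kind, assuming a POSIX-like
system.
The function name
.I filesystem_test
and the exact description strings are hypothetical; only the use of the
.IR stat (2)
result to spot empty and special files comes from the text above.
.nf
```python
import os
import stat

def filesystem_test(path, follow_symlinks=False):
    """Classify path from its stat result alone.

    Returns None for an ordinary non-empty file, so that the
    content-based tests can take over.
    """
    st = os.stat(path) if follow_symlinks else os.lstat(path)
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link to " + os.readlink(path)
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if st.st_size == 0:
        return "empty"
    return None
```
.fi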
! 59: .PP
! 60: The magic number tests are used to check for files with data in
! 61: particular fixed formats.
! 62: The canonical example of this is a binary executable (compiled program)
! 63: .B a.out
! 64: file, whose format is defined in
! 65: .B a.out.h
! 66: and possibly
! 67: .B exec.h
! 68: in the standard include directory.
! 69: These files have a `magic number' stored in a particular place
! 70: near the beginning of the file that tells the \s-1UNIX\s0 operating system
! 71: that the file is a binary executable, and which of several types thereof.
! 72: The concept of `magic number' has been applied by extension to data files.
! 73: Any file with some invariant identifier at a small fixed
! 74: offset into the file can usually be described in this way.
! 75: The information identifying these files is read from the magic file
! 76: .IR /etc/magic .
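.PP
The idea of an invariant identifier at a fixed offset can be sketched as
below.
The table is a hypothetical in-memory stand-in for the magic file, with a
handful of well-known signatures chosen for the example; it does not
reproduce the real magic file syntax or contents.
.nf
```python
# Hypothetical table: (offset, expected bytes, description).
# Order matters: the first matching entry wins.
MAGIC_TABLE = [
    (0, bytes.fromhex("1f8b"), "gzip compressed data"),
    (0, b"%!", "PostScript document text"),
    (257, b"ustar", "tar archive"),
]

def magic_test(data, table=MAGIC_TABLE):
    """Return the description of the first entry whose bytes match, else None."""
    for offset, expected, description in table:
        if data[offset:offset + len(expected)] == expected:
            return description
    return None
```
.fi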
! 77: .PP
! 78: If an argument appears to be an
! 79: .SM ASCII
! 80: file,
! 81: .I file
! 82: attempts to guess its language.
! 83: The language tests look for particular strings (cf. \fInames.h\fP)
! 84: that can appear anywhere in the first few blocks of a file.
! 85: For example, the keyword
! 86: .B .br
! 87: indicates that the file is most likely a troff input file,
! 88: just as the keyword
! 89: .B struct
! 90: indicates a C program.
! 91: These tests are less reliable than the previous
! 92: two groups, so they are performed last.
! 93: The language test routines also test for some miscellany
! 94: (such as
! 95: .I tar
! 96: archives) and determine whether an unknown file should be
! 97: labelled as `ascii text' or `data'.
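.PP
A rough sketch of such a keyword scan follows.
The two-entry keyword table, the 1024-byte limit standing in for ``the
first few blocks'', and the description strings are all assumptions made
for the example; the real list is the one in
.IR names.h .
.nf
```python
# Hypothetical keyword table; entries are (token, description).
KEYWORDS = [
    (".br", "troff or preprocessor input text"),
    ("struct", "c program text"),
]

def language_test(data, limit=1024):
    """Guess a language from whitespace-separated tokens near the start."""
    head = data[:limit]
    if any(byte > 127 for byte in head):  # non-ASCII byte: not a text file
        return "data"
    tokens = head.decode("ascii").split()
    for keyword, description in KEYWORDS:
        if keyword in tokens:
            return description
    return "ascii text"  # ASCII, but no known keyword seen
```
.fi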
! 98: .SH OPTIONS
! 99: .TP 8
! 100: .B \-v
! 101: Print the version of the program and exit.
! 102: .TP 8
! 103: .B \-m file
! 104: Specify an alternate file of magic numbers.
! 105: .TP 8
! 106: .B \-z
! 107: Try to look inside compressed files.
! 108: .TP 8
! 109: .B \-c
! 110: Cause a checking printout of the parsed form of the magic file.
! 111: This is usually used in conjunction with
! 112: .B \-m
! 113: to debug a new magic file before installing it.
! 114: .TP 8
! 115: .B \-f namefile
! 116: Read the names of the files to be examined from
! 117: .I namefile
! 118: (one per line)
! 119: before the argument list.
! 120: Either
! 121: .I namefile
! 122: or at least one filename argument must be present;
! 123: to test the standard input, use ``-'' as a filename argument.
! 124: .TP 8
! 125: .B \-L
! 126: Cause symlinks to be followed, as does the like-named option in
! 127: .IR ls (1)
! 128: (on systems that support symbolic links).
! 129: .SH FILES
! 130: .I /etc/magic
! 131: \- default list of magic numbers
! 132: .SH SEE ALSO
! 133: .IR magic (5)
! 134: \- description of magic file format.
! 135: .br
! 136: .IR strings (1), " od" (1)
! 137: \- tools for examining non-text files.
! 138: .SH STANDARDS CONFORMANCE
! 139: This program is believed to exceed the System V Interface Definition
! 140: of FILE(CMD), as near as one can determine from the vague language
! 141: contained therein.
! 142: Its behaviour is mostly compatible with the System V program of the same name.
! 143: This version knows more magic, however, so it will produce
! 144: different (albeit more accurate) output in many cases.
! 145: .PP
! 146: The one significant difference
! 147: between this version and System V
! 148: is that this version treats any white space
! 149: as a delimiter, so that spaces in pattern strings must be escaped.
! 150: For example,
! 151: .br
! 152: >10 string language impress\ (imPRESS data)
! 153: .br
! 154: in an existing magic file would have to be changed to
! 155: .br
! 156: >10 string language\e impress (imPRESS data)
! 157: .br
! 158: In addition, in this version, if a pattern string contains a backslash,
! 159: it must be escaped. For example,
! 160: .br
! 161: 0 string \ebegindata Andrew Toolkit document
! 162: .br
! 163: in an existing magic file would have to be changed to
! 164: .br
! 165: 0 string \e\ebegindata Andrew Toolkit document
! 166: .br
! 167: .PP
! 168: SunOS releases 3.2 and later from Sun Microsystems include a
! 169: .IR file (1)
! 170: command derived from the System V one, but with some extensions.
! 171: My version differs from Sun's only in minor ways.
! 172: It includes the extension of the `&' operator, used as,
! 173: for example,
! 174: .br
! 175: >16 long&0x7fffffff >0 not stripped
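.PP
One way to read the entry above: fetch the long at offset 16, AND it with
the mask, and compare the result with 0.
The sketch below illustrates that reading; the function name and the choice
of a 4-byte unsigned value in native byte order are assumptions made for the
example.
.nf
```python
import struct

def masked_long_test(data, offset, mask, threshold):
    """True if (the 4-byte value at offset) & mask is greater than threshold."""
    # "=L" reads an unsigned 4-byte value in the native byte order.
    (value,) = struct.unpack_from("=L", data, offset)
    return (value & mask) > threshold
```
.fi
With a mask of 0x7fffffff the top bit is ignored, so a value with only that
bit set does not satisfy the test.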
! 176: .SH MAGIC DIRECTORY
! 177: The magic file entries have been collected from various sources,
! 178: mainly USENET, and contributed by various authors.
! 179: Christos Zoulas (address below) will collect additional
! 180: or corrected magic file entries.
! 181: A consolidation of magic file entries
! 182: will be distributed periodically.
! 183: .PP
! 184: The order of entries in the magic file is significant.
! 185: Depending on what system you are using, the order that
! 186: they are put together may be incorrect.
! 187: If your old
! 188: .I file
! 189: command uses a magic file,
! 190: keep the old magic file around for comparison purposes
! 191: (rename it to
! 192: .IR /etc/magic.orig ).
! 193: .SH HISTORY
! 194: There has been a
! 195: .I file
! 196: command in every UNIX since at least Research Version 6
! 197: (man page dated January, 1975).
! 198: The System V version introduced one significant change:
! 199: the external list of magic number types.
! 200: This slowed the program down slightly but made it a lot more flexible.
! 201: .PP
! 202: This program, based on the System V version,
! 203: was written by Ian Darwin without looking at anybody else's source code.
! 204: .PP
! 205: John Gilmore revised the code extensively, making it better than
! 206: the first version.
! 207: Geoff Collyer found several inadequacies
! 208: and provided some magic file entries.
! 209: The program has undergone continued evolution since.
! 210: .SH AUTHOR
! 211: Written by Ian F. Darwin, UUCP address {utzoo | ihnp4}!darwin!ian,
! 212: Internet address ian@sq.com,
! 213: postal address: P.O. Box 603, Station F, Toronto, Ontario, CANADA M4Y 2L8.
! 214: .PP
! 215: Altered by Rob McMahon, cudcv@warwick.ac.uk, 1989, to extend the `&' operator
! 216: from simple `x&y != 0' to `x&y op z'.
! 217: .PP
! 218: Altered by Guy Harris, guy@auspex.com, 1993, to:
! 219: .RS
! 220: .PP
! 221: put the ``old-style'' `&'
! 222: operator back the way it was, because 1) Rob McMahon's change broke the
! 223: previous style of usage, 2) the SunOS ``new-style'' `&' operator,
! 224: which this version of
! 225: .I file
! 226: supports, also handles `x&y op z', and 3) Rob's change wasn't documented
! 227: in any case;
! 228: .PP
! 229: put in multiple levels of `>';
! 230: .PP
! 231: put in ``beshort'', ``leshort'', etc. keywords to look at numbers in the
! 232: file in a specific byte order, rather than in the native byte order of
! 233: the process running
! 234: .IR file .
! 235: .RE
! 236: .PP
! 237: Changes by Ian Darwin and various authors including
! 238: Christos Zoulas (christos@ee.cornell.edu), 1990-1992.
! 239: .SH LEGAL NOTICE
! 240: Copyright (c) Ian F. Darwin, Toronto, Canada,
! 241: 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993.
! 242: .PP
! 243: This software is not subject to and may not be made subject to any
! 244: license of the American Telephone and Telegraph Company, Sun
! 245: Microsystems Inc., Digital Equipment Inc., Lotus Development Inc., the
! 246: Regents of the University of California, The X Consortium or MIT, or
! 247: The Free Software Foundation.
! 248: .PP
! 249: This software is not subject to any export provision of the United States
! 250: Department of Commerce, and may be exported to any country or planet.
! 251: .PP
! 252: Permission is granted to anyone to use this software for any purpose on
! 253: any computer system, and to alter it and redistribute it freely, subject
! 254: to the following restrictions:
! 255: .PP
! 256: 1. The author is not responsible for the consequences of use of this
! 257: software, no matter how awful, even if they arise from flaws in it.
! 258: .PP
! 259: 2. The origin of this software must not be misrepresented, either by
! 260: explicit claim or by omission. Since few users ever read sources,
! 261: credits must appear in the documentation.
! 262: .PP
! 263: 3. Altered versions must be plainly marked as such, and must not be
! 264: misrepresented as being the original software. Since few users
! 265: ever read sources, credits must appear in the documentation.
! 266: .PP
! 267: 4. This notice may not be removed or altered.
! 268: .PP
! 269: A few support files (\fIgetopt\fP, \fIstrtok\fP)
! 270: distributed with this package
! 271: are by Henry Spencer and are subject to the same terms as above.
! 272: .PP
! 273: A few simple support files (\fIstrtol\fP, \fIstrchr\fP)
! 274: distributed with this package
! 275: are in the public domain; they are so marked.
! 276: .PP
! 277: The files
! 278: .I tar.h
! 279: and
! 280: .I is_tar.c
! 281: were written by John Gilmore from his public-domain
! 282: .I tar
! 283: program, and are not covered by the above restrictions.
! 284: .SH BUGS
! 285: There must be a better way to automate the construction of the Magic
! 286: file from all the glop in Magdir. What is it?
! 287: Better yet, the magic file should be compiled into binary (say,
! 288: .IR ndbm (3)
! 289: or, better yet, fixed-length ASCII strings
! 290: for use in heterogeneous network environments) for faster startup.
! 291: Then the program would run as fast as the Version 7 program of the same name,
! 292: with the flexibility of the System V version.
! 293: .PP
! 294: .I File
! 295: uses several algorithms that favor speed over accuracy,
! 296: so it can be misled about the contents of ASCII files.
! 297: .PP
! 298: The support for ASCII files (primarily for programming languages)
! 299: is simplistic, inefficient and requires recompilation to update.
! 300: .PP
! 301: There should be an ``else'' clause to follow a series of continuation lines.
! 302: .PP
! 303: The magic file and keywords should have regular expression support.
! 304: Their use of ASCII TAB as a field delimiter is ugly and makes
! 305: it hard to edit the files, but is entrenched.
! 306: .PP
! 307: It might be advisable to allow upper-case letters in keywords
! 308: to distinguish, e.g., troff commands from man page macros.
! 309: Regular expression support would make this easy.
! 310: .PP
! 311: The program doesn't grok \s-2FORTRAN\s0.
! 312: It should be able to figure out \s-2FORTRAN\s0 by seeing some keywords that
! 313: appear indented at the start of a line.
! 314: Regular expression support would make this easy.
! 315: .PP
! 316: The list of keywords in
! 317: .I ascmagic
! 318: probably belongs in the Magic file.
! 319: This could be done by using some keyword like `*' for the offset value.
! 320: .PP
! 321: Another optimisation would be to sort
! 322: the magic file so that we can just run down all the
! 323: tests for the first byte, first word, first long, etc, once we
! 324: have fetched it. Complain about conflicts in the magic file entries.
! 325: Make a rule that the magic entries sort based on file offset rather
! 326: than position within the magic file?
! 327: .PP
! 328: The program should provide a way to give an estimate
! 329: of ``how good'' a guess is.
! 330: We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
! 331: they are not as good as other guesses (e.g. ``Newsgroups:'' versus
! 332: ``Return-Path:''). Still, if the others don't pan out, it should be
! 333: possible to use the first guess.
! 334: .PP
! 335: This program is slower than some vendors' file commands.
! 336: .PP
! 337: This manual page, and particularly this section, is too long.
! 338: .SH AVAILABILITY
! 339: You can obtain the original author's latest version by anonymous FTP
! 340: on
! 341: .B tesla.ee.cornell.edu
! 342: in the directory
! 343: .BR /pub/file-X.YY.tar.gz