Annotation of src/usr.bin/file/file.1, Revision 1.23
1.23 ! jaredy 1: .\" $OpenBSD: file.1,v 1.22 2004/10/14 20:56:57 jaredy Exp $
1.8 aaron 2: .\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
1.18 jmc 3: .\"
1.19 ian 4: .\" Copyright (c) Ian F. Darwin 1986-1995.
5: .\" Software written by Ian F. Darwin and others;
6: .\" maintained 1995-present by Christos Zoulas and others.
1.20 jmc 7: .\"
1.19 ian 8: .\" Redistribution and use in source and binary forms, with or without
9: .\" modification, are permitted provided that the following conditions
10: .\" are met:
11: .\" 1. Redistributions of source code must retain the above copyright
12: .\" notice immediately at the beginning of the file, without modification,
13: .\" this list of conditions, and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
1.20 jmc 17: .\"
1.19 ian 18: .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
19: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
22: .\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
24: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
25: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
26: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
27: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
28: .\" SUCH DAMAGE.
1.18 jmc 29: .\"
1.23 ! jaredy 30: .Dd December 4, 2004
1.8 aaron 31: .Dt FILE 1
32: .Os
33: .Sh NAME
34: .Nm file
35: .Nd determine file type
36: .Sh SYNOPSIS
37: .Nm file
1.23 ! jaredy 38: .Op Fl bckLNnrsvz
! 39: .Op Fl F Ar separator
1.8 aaron 40: .Op Fl f Ar namefile
41: .Op Fl m Ar magicfiles
1.23 ! jaredy 42: .Bk -words
! 43: .Ar file ...
! 44: .Ek
! 45: .Nm file
! 46: .Op Fl m Ar magicfiles
! 47: .Fl C
1.8 aaron 48: .Sh DESCRIPTION
1.22 jaredy 49: The
1.8 aaron 50: .Nm
1.22 jaredy 51: utility
1.1 deraadt 52: tests each argument in an attempt to classify it.
53: There are three sets of tests, performed in this order:
54: filesystem tests, magic number tests, and language tests.
1.8 aaron 55: The first test that succeeds causes the file type to be printed.
56: .Pp
1.1 deraadt 57: The type printed will usually contain one of the words
1.8 aaron 58: .Dq text
1.4 millert 59: (the file contains only
1.8 aaron 60: .Tn ASCII
1.4 millert 61: characters and is probably safe to read on an
1.8 aaron 62: .Tn ASCII
1.4 millert 63: terminal),
1.8 aaron 64: .Dq executable
1.1 deraadt 65: (the file contains the result of compiling a program
1.8 aaron 66: in a form understandable to some
67: .Ux
68: kernel or another),
1.1 deraadt 69: or
1.8 aaron 70: .Dq data
71: meaning anything else (data is usually binary or non-printable).
72: .Pp
1.1 deraadt 73: Exceptions are well-known file formats (core files, tar archives)
74: that are known to contain binary data.
75: When modifying the file
1.8 aaron 76: .Pa /etc/magic
1.6 aaron 77: or the program itself,
1.8 aaron 78: .Em "preserve these keywords" .
79: .Pp
1.1 deraadt 80: People depend on knowing that all the readable files in a directory
1.8 aaron 81: have the word
82: .Dq text
83: printed.
84: Don't do as Berkeley did and change
85: .Dq shell commands text
86: to
87: .Dq shell script .
88: .Pp
1.1 deraadt 89: The filesystem tests are based on examining the return from a
1.8 aaron 90: .Xr stat 2
1.1 deraadt 91: system call.
92: The program checks to see if the file is empty,
93: or if it's some sort of special file.
94: Any known file types appropriate to the system you are running on
95: (sockets, symbolic links, or named pipes (FIFOs) on those systems that
96: implement them)
97: are intuited if they are defined in
98: the system header file
1.9 aaron 99: .Aq Pa sys/stat.h .
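.Pp
The filesystem tests described above can be sketched in Python as follows.
This is only an illustration, not the program's actual code: the function
name and the returned strings are hypothetical, and the sketch uses
lstat so that symbolic links are reported rather than followed.

```python
import os
import stat


def filesystem_test(path):
    """Classify a path from its stat data alone; None means 'keep testing'."""
    st = os.lstat(path)  # lstat: report a symbolic link itself, do not follow it
    mode = st.st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link to " + os.readlink(path)
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    # Ordinary, non-empty file: the magic number tests would run next.
    return None
```

A result of None stands for the case where no filesystem test succeeds and
the next group of tests takes over.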
1.8 aaron 100: .Pp
1.1 deraadt 101: The magic number tests are used to check for files with data in
102: particular fixed formats.
103: The canonical example of this is a binary executable (compiled program)
1.8 aaron 104: .Pa a.out
1.6 aaron 105: file, whose format is defined in
1.8 aaron 106: .Aq Pa a.out.h
1.1 deraadt 107: and possibly
1.8 aaron 108: .Aq Pa exec.h
1.23 ! jaredy 109: in the standard include directory and is explained in
! 110: .Xr a.out 5 .
1.8 aaron 111: These files have a
112: .Dq magic number
113: stored in a particular place
114: near the beginning of the file that tells the
115: .Ux
116: operating system
1.1 deraadt 117: that the file is a binary executable, and which of several types thereof.
1.8 aaron 118: .Pp
119: The concept of magic number has been applied by extension to data files.
1.1 deraadt 120: Any file with some invariant identifier at a small fixed
121: offset into the file can usually be described in this way.
122: The information identifying these files is read from the magic file
1.8 aaron 123: .Pa /etc/magic .
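.Pp
The idea of an invariant identifier at a small fixed offset can be
illustrated with a short Python sketch.
The table below is hypothetical and is not read from any magic file; it
only assumes three widely documented signatures (ELF, gzip, and the ustar
tar header), and the descriptions are placeholders.

```python
# Hypothetical table: (offset, expected bytes, description).
# The first entry that matches wins, so the order is significant.
MAGIC_TABLE = [
    (0, bytes([0x7F]) + b"ELF", "ELF executable"),
    (0, bytes([0x1F, 0x8B]), "gzip compressed data"),
    (257, b"ustar", "POSIX tar archive"),
]


def magic_test(data):
    """Return the description of the first matching entry, or None."""
    for offset, expected, description in MAGIC_TABLE:
        # Slicing past the end of data yields a short slice, never an error.
        if data[offset:offset + len(expected)] == expected:
            return description
    return None
```

Calling magic_test on the first few hundred bytes of a file is enough for
this table, since the largest offset it consults is 257.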
124: .Pp
1.1 deraadt 125: If an argument appears to be an
1.8 aaron 126: .Tn ASCII
1.1 deraadt 127: file,
1.8 aaron 128: .Nm
1.1 deraadt 129: attempts to guess its language.
1.4 millert 130: The language tests look for particular strings (cf
1.8 aaron 131: .Pa names.h )
1.1 deraadt 132: that can appear anywhere in the first few blocks of a file.
133: For example, the keyword
1.8 aaron 134: .Em .br
1.4 millert 135: indicates that the file is most likely a
1.8 aaron 136: .Xr troff 1
1.6 aaron 137: input file, just as the keyword
1.8 aaron 138: .Li struct
1.1 deraadt 139: indicates a C program.
140: These tests are less reliable than the previous
141: two groups, so they are performed last.
142: The language test routines also test for some miscellany
1.6 aaron 143: (such as
1.8 aaron 144: .Xr tar 1
1.1 deraadt 145: archives) and determine whether an unknown file should be
1.8 aaron 146: labelled as
147: .Dq ASCII text
148: or
149: .Dq data .
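.Pp
A minimal Python sketch of the language tests follows.
It assumes a two-entry keyword list built from the examples above; the
names KEYWORDS, looks_like_text and language_test are hypothetical, the
descriptions are placeholders, and the real keyword list is far longer.

```python
# Hypothetical keyword list, taken from the two examples in the text.
KEYWORDS = [
    (".br", "troff input text"),
    ("struct", "C program text"),
]


def looks_like_text(data):
    """True if every byte is printable ASCII, a tab, a newline or a CR."""
    return all(32 <= b < 127 or b in (9, 10, 13) for b in data)


def language_test(data):
    """Guess a language from keywords, else fall back to text or data."""
    if not looks_like_text(data):
        return "data"
    tokens = data.decode("ascii").split()
    for keyword, description in KEYWORDS:
        if keyword in tokens:
            return description
    return "ASCII text"
```

Because a single keyword decides the answer, such a test is easily misled,
which is why it runs only after the other two groups have failed.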
150: .Pp
151: The options are as follows:
1.11 aaron 152: .Bl -tag -width Ds
1.17 millert 153: .It Fl b
154: Do not prepend filenames to output lines (brief mode).
1.23 ! jaredy 155: .It Fl C
! 156: For each magic number file, write a
! 157: .Pa magic.mgc
! 158: output file that contains a preparsed (compiled) version of it.
1.8 aaron 159: .It Fl c
1.1 deraadt 160: Cause a checking printout of the parsed form of the magic file.
1.6 aaron 161: This is usually used in conjunction with
1.8 aaron 162: .Fl m
1.1 deraadt 163: to debug a new magic file before installing it.
1.23 ! jaredy 164: .It Fl F Ar separator
! 165: Use the specified string as the separator between the filename and
! 166: the file result returned.
! 167: Defaults to
! 168: .Sq \&: .
1.8 aaron 169: .It Fl f Ar namefile
1.6 aaron 170: Read the names of the files to be examined from
1.8 aaron 171: .Ar namefile
1.6 aaron 172: (one per line)
1.1 deraadt 173: before the argument list.
1.6 aaron 174: Either
1.8 aaron 175: .Ar namefile
1.1 deraadt 176: or at least one filename argument must be present;
1.8 aaron 177: to test the standard input, use
1.23 ! jaredy 178: .Sq -
1.8 aaron 179: as a filename argument.
1.23 ! jaredy 180: .It Fl k
! 181: Don't stop at the first match; keep going.
1.8 aaron 182: .It Fl L
183: Cause symlinks to be followed, as the like-named option in
1.23 ! jaredy 184: .Xr ls 1
1.1 deraadt 185: (on systems that support symbolic links).
1.23 ! jaredy 186: .It Fl m Ar magicfiles
! 187: Specify an alternate list,
! 188: .Ar magicfiles ,
! 189: of files containing magic numbers.
! 190: This can be a single file or a colon-separated list of files.
! 191: If a compiled magic file is found alongside, it will be used instead.
! 192: .It Fl N
! 193: Don't pad filenames so that they align in the output.
! 194: .It Fl n
! 195: Force
! 196: .Em stdout
! 197: to be flushed after checking each file.
! 198: This is only useful if checking a list of files.
! 199: It is intended to be used by programs that want filetype output from a
! 200: pipe.
! 201: .It Fl r
! 202: Don't translate unprintable characters to
! 203: .Sq \e Ns Em ooo
! 204: (raw mode).
! 205: Normally
! 206: .Nm
! 207: translates unprintable characters to their octal representation.
! 208: .It Fl s
! 209: Normally,
! 210: .Nm
! 211: only attempts to read and determine the type of argument files which
! 212: .Xr stat 2
! 213: reports are ordinary files.
! 214: This prevents problems, because reading special files may have peculiar
! 215: consequences.
! 216: Specifying the
! 217: .Fl s
! 218: option causes
! 219: .Nm
! 220: to also read argument files which are block or character special files.
! 221: This is useful for determining the filesystem types of the data in raw
! 222: disk partitions, which are block special files.
! 223: This option also causes
! 224: .Nm
! 225: to disregard the file size as reported by
! 226: .Xr stat 2 ,
! 227: since on some systems it reports a zero size for raw disk partitions.
! 228: .It Fl v
! 229: Print the version of the program and exit.
! 230: .It Fl z
! 231: Try to look inside files that have been run through
! 232: .Xr compress 1 .
1.8 aaron 233: .El
234: .Sh ENVIRONMENT
235: .Bl -tag -width indent
1.13 smart 236: .It Ev MAGIC
1.23 ! jaredy 237: Default magic number files, separated by colon characters.
! 238: .Nm
! 239: adds
! 240: .Dq .mgc
! 241: to the value of this variable as appropriate.
1.8 aaron 242: .El
1.12 aaron 243: .Sh FILES
244: .Bl -tag -width /etc/magic -compact
245: .It Pa /etc/magic
246: default list of magic numbers
247: .El
1.8 aaron 248: .Sh SEE ALSO
1.23 ! jaredy 249: .Xr compress 1 ,
1.8 aaron 250: .Xr hexdump 1 ,
1.23 ! jaredy 251: .Xr ls 1 ,
1.8 aaron 252: .Xr od 1 ,
253: .Xr strings 1 ,
1.23 ! jaredy 254: .Xr a.out 5 ,
1.8 aaron 255: .Xr magic 5
256: .Sh STANDARDS CONFORMANCE
1.1 deraadt 257: This program is believed to exceed the System V Interface Definition
258: of FILE(CMD), as near as one can determine from the vague language
1.6 aaron 259: contained therein.
1.1 deraadt 260: Its behaviour is mostly compatible with the System V program of the same name.
261: This version knows more magic, however, so it will produce
1.6 aaron 262: different (albeit more accurate) output in many cases.
1.8 aaron 263: .Pp
1.6 aaron 264: The one significant difference
1.1 deraadt 265: between this version and System V
1.8 aaron 266: is that this version treats any white space
1.1 deraadt 267: as a delimiter, so that spaces in pattern strings must be escaped.
268: For example,
1.8 aaron 269: .Pp
270: >10 string language impress\ (imPRESS data)
271: .Pp
1.1 deraadt 272: in an existing magic file would have to be changed to
1.8 aaron 273: .Pp
274: >10 string language\e impress (imPRESS data)
275: .Pp
1.1 deraadt 276: In addition, in this version, if a pattern string contains a backslash,
1.9 aaron 277: it must be escaped.
278: For example
1.8 aaron 279: .Pp
280: 0 string \ebegindata Andrew Toolkit document
281: .Pp
1.1 deraadt 282: in an existing magic file would have to be changed to
1.8 aaron 283: .Pp
284: 0 string \e\ebegindata Andrew Toolkit document
285: .Pp
1.1 deraadt 286: SunOS releases 3.2 and later from Sun Microsystems include a
1.20 jmc 287: .Nm file
1.1 deraadt 288: command derived from the System V one, but with some extensions.
289: My version differs from Sun's only in minor ways.
1.8 aaron 290: It includes the extension of the
291: .Ql &
292: operator, used as,
1.1 deraadt 293: for example,
1.8 aaron 294: .Pp
295: >16 long&0x7fffffff >0 not stripped
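.Pp
The effect of the mask in the entry above can be sketched in Python.
This is an illustration under stated assumptions: the long is taken to be
four bytes in big-endian order, the function name is hypothetical, and the
continuation level of the entry is ignored.

```python
import struct


def masked_long_gt(data, offset, mask, threshold):
    """Read a 4-byte long at offset, AND it with mask, compare to threshold."""
    if len(data) < offset + 4:
        return False  # not enough data to read the field
    # Big-endian byte order is an assumption made for this sketch.
    (value,) = struct.unpack_from(">i", data, offset)
    return (value & mask) > threshold
```

With offset 16, mask 0x7fffffff and threshold 0, the function is true
exactly when the low 31 bits of the field are not all zero, which is the
condition under which the entry would print its message.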
296: .Sh MAGIC DIRECTORY
1.1 deraadt 297: The magic file entries have been collected from various sources,
298: mainly USENET, and contributed by various authors.
1.8 aaron 299: .An Christos Zoulas
300: (address below) will collect additional
1.1 deraadt 301: or corrected magic file entries.
1.6 aaron 302: A consolidation of magic file entries
1.1 deraadt 303: will be distributed periodically.
304: The order of entries in the magic file is significant.
305: Depending on what system you are using, the order that
306: they are put together may be incorrect.
307: If your old
1.8 aaron 308: .Nm
1.1 deraadt 309: command uses a magic file,
310: keep the old magic file around for comparison purposes
1.6 aaron 311: (rename it to
1.8 aaron 312: .Pa /etc/magic.orig ) .
313: .Sh HISTORY
1.6 aaron 314: There has been a
1.8 aaron 315: .Nm
316: command in every
317: .Ux
1.16 mickey 318: since at least Research Version 4
319: (man page dated November, 1973).
1.1 deraadt 320: The System V version introduced one significant change:
321: the external list of magic number types.
322: This slowed the program down slightly but made it a lot more flexible.
1.8 aaron 323: .Pp
1.10 ian 324: This program, based on the System V version, was written by
325: .An Ian F. Darwin Aq ian@darwinisys.com
1.8 aaron 326: without looking at anybody else's source code.
327: .Pp
328: .An John Gilmore
329: revised the code extensively, making it better than
1.1 deraadt 330: the first version.
1.8 aaron 331: .An Geoff Collyer
332: found several inadequacies
1.1 deraadt 333: and provided some magic file entries.
1.23 ! jaredy 334: .An Rob McMahon Aq cudcv@warwick.ac.uk
! 335: contributed to the
! 336: .Ql &
! 337: operator
! 338: in 1989.
! 339: .Pp
! 340: .An Guy Harris Aq guy@auspex.com
! 341: made many changes from 1993 to the present.
! 342: .Pp
! 343: Primary development and maintenance from 1990 to the present by
! 344: .An Christos Zoulas Aq christos@zoulas.com .
1.8 aaron 345: .Pp
346: Altered by
1.23 ! jaredy 347: .An Chris Lowth Aq chris@lowth.com ,
! 348: 2000, to handle the
! 349: .Fl i
! 350: option, which outputs MIME type strings using an alternative magic file
! 351: and internal logic.
1.8 aaron 352: .Pp
353: Altered by
1.23 ! jaredy 354: .An Eric Fischer Aq enf@pobox.com ,
! 355: July, 2000, to identify character codes and attempt to identify the
! 356: languages of non-ASCII files.
! 357: .Pp
! 358: The list of contributors to the
! 359: .Dq magdir
! 360: directory (source for the
! 361: .Pa /etc/magic
! 362: file) is too long to include here.
! 363: You know who you are; thank you.
1.8 aaron 364: .Sh LEGAL NOTICE
1.10 ian 365: Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
366: Covered by the standard Berkeley Software Distribution copyright; see the file
367: LEGAL.NOTICE in the distribution.
1.8 aaron 368: .Pp
1.1 deraadt 369: The files
1.8 aaron 370: .Pa tar.h
1.1 deraadt 371: and
1.8 aaron 372: .Pa is_tar.c
373: were written by
374: .An John Gilmore
375: from his public-domain
376: .Nm tar
1.23 ! jaredy 377: program, and are not covered by the above license.
1.8 aaron 378: .Sh BUGS
1.1 deraadt 379: There must be a better way to automate the construction of the Magic
1.8 aaron 380: file from all the glop in Magdir.
381: What is it?
1.1 deraadt 382: Better yet, the magic file should be compiled into binary (say,
1.8 aaron 383: .Xr ndbm 3
1.4 millert 384: or, better yet, fixed-length
1.8 aaron 385: .Tn ASCII
1.4 millert 386: strings for use in heterogeneous network environments) for faster startup.
1.1 deraadt 387: Then the program would run as fast as the Version 7 program of the same name,
388: with the flexibility of the System V version.
1.8 aaron 389: .Pp
390: .Nm
1.15 pjanzen 391: uses several algorithms that favor speed over accuracy;
1.4 millert 392: thus it can be misled about the contents of
1.8 aaron 393: .Tn ASCII
1.4 millert 394: files.
1.8 aaron 395: .Pp
1.4 millert 396: The support for
1.8 aaron 397: .Tn ASCII
1.4 millert 398: files (primarily for programming languages)
1.1 deraadt 399: is simplistic, inefficient and requires recompilation to update.
1.8 aaron 400: .Pp
401: There should be an
402: .Dq else
403: clause to follow a series of continuation lines.
404: .Pp
1.1 deraadt 405: The magic file and keywords should have regular expression support.
1.4 millert 406: Their use of
1.8 aaron 407: .Tn ASCII TAB
1.4 millert 408: as a field delimiter is ugly and makes
1.1 deraadt 409: it hard to edit the files, but is entrenched.
1.8 aaron 410: .Pp
1.1 deraadt 411: It might be advisable to allow upper-case letters in keywords
1.4 millert 412: for e.g.,
1.8 aaron 413: .Xr troff 1
1.4 millert 414: commands vs man page macros.
1.1 deraadt 415: Regular expression support would make this easy.
1.8 aaron 416: .Pp
1.1 deraadt 417: The program doesn't grok \s-2FORTRAN\s0.
1.6 aaron 418: It should be able to figure out \s-2FORTRAN\s0 by seeing some keywords which
1.1 deraadt 419: appear indented at the start of a line.
420: Regular expression support would make this easy.
1.8 aaron 421: .Pp
1.6 aaron 422: The list of keywords in
1.8 aaron 423: .Em ascmagic
1.1 deraadt 424: probably belongs in the Magic file.
1.8 aaron 425: This could be done by using some keyword like
426: .Ql *
427: for the offset value.
428: .Pp
429: Another optimization would be to sort
1.1 deraadt 430: the magic file so that we can just run down all the
431: tests for the first byte, first word, first long, etc, once we
1.9 aaron 432: have fetched it.
433: Complain about conflicts in the magic file entries.
1.1 deraadt 434: Make a rule that the magic entries sort based on file offset rather
435: than position within the magic file?
1.8 aaron 436: .Pp
1.6 aaron 437: The program should provide a way to give an estimate
1.8 aaron 438: of
439: .Dq how good
440: a guess is.
441: We end up removing guesses (e.g.,
1.20 jmc 442: .Dq From\ \&
1.8 aaron 443: as first 5 chars of file) because
444: they are not as good as other guesses (e.g.,
445: .Dq Newsgroups:
446: versus
447: .Qq Return-Path: ) .
448: Still, if the others don't pan out, it should be
1.6 aaron 449: possible to use the first guess.
1.8 aaron 450: .Pp
451: This program is slower than some vendors'
452: .Nm
453: commands.
454: .Pp
1.1 deraadt 455: This manual page, and particularly this section, is too long.
1.8 aaron 456: .Sh AVAILABILITY
1.1 deraadt 457: You can obtain the original author's latest version by anonymous FTP
1.8 aaron 458: on
1.15 pjanzen 459: .Em ftp.astron.com
1.8 aaron 460: in the directory
1.20 jmc 461: .Pa /pub/file/file-X.YY.tar.gz .