Annotation of src/usr.bin/file/magic.5, Revision 1.14
1.14 ! jmc 1: .\" $OpenBSD: magic.5,v 1.13 2009/11/26 20:22:50 jmc Exp $
1.4 aaron 2: .\"
3: .\" @(#)$FreeBSD: src/usr.bin/file/magic.5,v 1.11 2000/03/01 12:19:39 sheldonh Exp $
4: .\"
1.3 millert 5: .\" install as magic.4 on USG, magic.5 on V7 or Berkeley systems.
1.7 jmc 6: .\"
1.8 ian 7: .\" Copyright (c) Ian F. Darwin 1986-1995.
8: .\" Software written by Ian F. Darwin and others;
9: .\" maintained 1995-present by Christos Zoulas and others.
1.9 jmc 10: .\"
1.8 ian 11: .\" Redistribution and use in source and binary forms, with or without
12: .\" modification, are permitted provided that the following conditions
13: .\" are met:
14: .\" 1. Redistributions of source code must retain the above copyright
15: .\" notice immediately at the beginning of the file, without modification,
16: .\" this list of conditions, and the following disclaimer.
17: .\" 2. Redistributions in binary form must reproduce the above copyright
18: .\" notice, this list of conditions and the following disclaimer in the
19: .\" documentation and/or other materials provided with the distribution.
1.9 jmc 20: .\"
1.8 ian 21: .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
22: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
23: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
24: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
25: .\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
26: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
27: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
28: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
29: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
30: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
31: .\" SUCH DAMAGE.
1.4 aaron 32: .\"
1.14 ! jmc 33: .Dd $Mdocdate: November 26 2009 $
1.4 aaron 34: .Dt MAGIC 5
35: .Os
1.12 ajacouto 36: .\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
1.4 aaron 37: .Sh NAME
38: .Nm magic
1.12 ajacouto 39: .Nd file command's magic pattern file
1.4 aaron 40: .Sh DESCRIPTION
1.3 millert 41: This manual page documents the format of the magic file as
42: used by the
1.4 aaron 43: .Xr file 1
1.12 ajacouto 44: command, version 4.24.
1.5 aaron 45: The
1.12 ajacouto 46: .Xr file 1
1.1 deraadt 47: command identifies the type of a file using,
48: among other tests,
1.12 ajacouto 49: a test for whether the file contains certain
50: .Dq "magic patterns" .
1.1 deraadt 51: The file
1.4 aaron 52: .Pa /etc/magic
1.1 deraadt 53: specifies what magic numbers are to be tested for,
54: what message to print if a particular magic number is found,
55: and additional information to extract from the file.
1.4 aaron 56: .Pp
1.1 deraadt 57: Each line of the file specifies a test to be performed.
58: A test compares the data starting at a particular offset
1.12 ajacouto 59: in the file with a byte value, a string or a numeric value.
1.4 aaron 60: If the test succeeds, a message is printed.
1.1 deraadt 61: The line consists of the following fields:
1.12 ajacouto 62: .Bl -tag -width ".Dv message"
63: .It Dv offset
1.1 deraadt 64: A number specifying the offset, in bytes, into the file of the data
65: which is to be tested.
1.12 ajacouto 66: .It Dv type
1.4 aaron 67: The type of the data to be tested.
68: The possible values are:
1.12 ajacouto 69: .Bl -tag -width ".Dv lestring16"
70: .It Dv byte
1.1 deraadt 71: A one-byte value.
1.12 ajacouto 72: .It Dv short
73: A two-byte value in this machine's native byte order.
74: .It Dv long
75: A four-byte value in this machine's native byte order.
76: .It Dv quad
77: An eight-byte value in this machine's native byte order.
78: .It Dv float
79: A 32-bit single precision IEEE floating point number in this machine's native byte order.
80: .It Dv double
81: A 64-bit double precision IEEE floating point number in this machine's native byte order.
82: .It Dv string
1.1 deraadt 83: A string of bytes.
1.12 ajacouto 84: The string type specification can be optionally followed
85: by /[Bbc]*.
86: The
87: .Dq B
88: flag compacts whitespace in the target, which must
89: contain at least one whitespace character.
90: If the magic has
91: .Dv n
92: consecutive blanks, the target needs at least
93: .Dv n
94: consecutive blanks to match.
95: The
96: .Dq b
97: flag treats every blank in the target as an optional blank.
98: Finally the
99: .Dq c
100: flag specifies case-insensitive matching: lowercase
101: characters in the magic match both lower and upper case characters in the
102: target, whereas upper case characters in the magic only match uppercase
103: characters in the target.
104: .It Dv pstring
105: A Pascal-style string where the first byte is interpreted as an
106: unsigned length.
107: The string is not NUL terminated.
108: .It Dv date
109: A four-byte value interpreted as a UNIX date.
110: .It Dv qdate
1.13 jmc 111: An eight-byte value interpreted as a UNIX date.
1.12 ajacouto 112: .It Dv ldate
113: A four-byte value interpreted as a UNIX-style date, but interpreted as
114: local time rather than UTC.
115: .It Dv qldate
116: An eight-byte value interpreted as a UNIX-style date, but interpreted as
117: local time rather than UTC.
118: .It Dv beshort
119: A two-byte value in big-endian byte order.
120: .It Dv belong
121: A four-byte value in big-endian byte order.
122: .It Dv bequad
123: An eight-byte value in big-endian byte order.
124: .It Dv befloat
125: A 32-bit single precision IEEE floating point number in big-endian byte order.
126: .It Dv bedouble
127: A 64-bit double precision IEEE floating point number in big-endian byte order.
128: .It Dv bedate
129: A four-byte value in big-endian byte order,
130: interpreted as a UNIX date.
131: .It Dv beqdate
132: An eight-byte value in big-endian byte order,
133: interpreted as a UNIX date.
134: .It Dv beldate
135: A four-byte value in big-endian byte order,
136: interpreted as a UNIX-style date, but interpreted as local time rather
137: than UTC.
138: .It Dv beqldate
139: An eight-byte value in big-endian byte order,
140: interpreted as a UNIX-style date, but interpreted as local time rather
141: than UTC.
142: .It Dv bestring16
143: A two-byte Unicode (UCS16) string in big-endian byte order.
144: .It Dv leshort
145: A two-byte value in little-endian byte order.
146: .It Dv lelong
147: A four-byte value in little-endian byte order.
148: .It Dv lequad
149: An eight-byte value in little-endian byte order.
150: .It Dv lefloat
151: A 32-bit single precision IEEE floating point number in little-endian byte order.
152: .It Dv ledouble
153: A 64-bit double precision IEEE floating point number in little-endian byte order.
154: .It Dv ledate
155: A four-byte value in little-endian byte order,
156: interpreted as a UNIX date.
157: .It Dv leqdate
158: An eight-byte value in little-endian byte order,
159: interpreted as a UNIX date.
160: .It Dv leldate
161: A four-byte value in little-endian byte order,
162: interpreted as a UNIX-style date, but interpreted as local time rather
163: than UTC.
164: .It Dv leqldate
165: An eight-byte value in little-endian byte order,
166: interpreted as a UNIX-style date, but interpreted as local time rather
167: than UTC.
168: .It Dv lestring16
169: A two-byte Unicode (UCS16) string in little-endian byte order.
170: .It Dv melong
171: A four-byte value in middle-endian (PDP-11) byte order.
172: .It Dv medate
173: A four-byte value in middle-endian (PDP-11) byte order,
174: interpreted as a UNIX date.
175: .It Dv meldate
176: A four-byte value in middle-endian (PDP-11) byte order,
177: interpreted as a UNIX-style date, but interpreted as local time rather
178: than UTC.
179: .It Dv regex
180: A regular expression match in extended POSIX regular expression syntax
181: (like egrep).
182: Regular expressions can take exponential time to process,
183: and their performance is hard to predict, so their use is discouraged.
184: When used in production environments,
185: their performance should be carefully checked.
186: The type specification can be optionally followed by
187: .Dv /[c][s] .
188: The
189: .Dq c
190: flag makes the match case insensitive, while the
191: .Dq s
192: flag updates the offset to the start of the match, rather than the end.
193: The regular expression is tested against line
194: .Dv N + 1
195: onwards, where
196: .Dv N
197: is the given offset.
198: Line endings are assumed to be in the machine's native format.
199: .Dv ^
200: and
201: .Dv $
202: match the beginning and end of individual lines, respectively,
203: not beginning and end of file.
204: .It Dv search
205: A literal string search starting at the given offset.
206: The same modifier flags can be used as for string patterns.
207: The modifier flags (if any) must be followed by
208: .Dv /number
209: the range, that is, the number of positions at which the match will be
210: attempted, starting from the start offset.
211: This is suitable for searching larger binary expressions
212: with variable offsets, using
213: .Dv \e
214: escapes for special characters.
215: The offset works as for regex.
216: .It Dv default
217: This is intended to be used with the test
218: .Em x
219: (which is always true) and a message that is to be used if there are
220: no other matches.
1.4 aaron 221: .El
1.12 ajacouto 222: .Pp
223: Each top-level magic pattern (see below for an explanation of levels)
224: is classified as text or binary according to the types used.
225: Types
226: .Dq regex
227: and
228: .Dq search
229: are classified as text tests, unless non-printable characters are used
230: in the pattern.
231: All other tests are classified as binary.
232: A top-level pattern is considered to be a text test
233: when all its patterns are text
234: patterns; otherwise, it is considered to be a binary pattern.
235: When matching a file, binary patterns are tried first; if no match is
236: found, and the file looks like text, then its encoding is determined
237: and the text patterns are tried.
1.4 aaron 238: .Pp
1.1 deraadt 239: The numeric types may optionally be followed by
1.12 ajacouto 240: .Dv &
1.1 deraadt 241: and a numeric value,
242: to specify that the value is to be AND'ed with the
1.4 aaron 243: numeric value before any comparisons are done.
244: Prepending a
1.12 ajacouto 245: .Dv u
1.1 deraadt 246: to the type indicates that ordered comparisons should be unsigned.
1.12 ajacouto 247: .It Dv test
1.4 aaron 248: The value to be compared with the value from the file.
249: If the type is
1.1 deraadt 250: numeric, this value
251: is specified in C form; if it is a string, it is specified as a C string
1.12 ajacouto 252: with the usual escapes permitted (e.g. \en for new-line).
253: .Pp
1.1 deraadt 254: Numeric values
255: may be preceded by a character indicating the operation to be performed.
256: It may be
1.12 ajacouto 257: .Dv = ,
1.1 deraadt 258: to specify that the value from the file must equal the specified value,
1.12 ajacouto 259: .Dv \*(Lt ,
1.1 deraadt 260: to specify that the value from the file must be less than the specified
261: value,
1.12 ajacouto 262: .Dv \*(Gt ,
1.1 deraadt 263: to specify that the value from the file must be greater than the specified
264: value,
1.12 ajacouto 265: .Dv & ,
1.6 aaron 266: to specify that the value from the file must have set all of the bits
1.1 deraadt 267: that are set in the specified value,
1.12 ajacouto 268: .Dv ^ ,
1.6 aaron 269: to specify that the value from the file must have clear any of the bits
1.1 deraadt 270: that are set in the specified value,
1.12 ajacouto 271: .Dv ~ ,
272: to specify that the specified value is negated before it is tested, or
273: .Dv x ,
1.4 aaron 274: to specify that any value will match.
1.12 ajacouto 275: If the character is omitted, it is assumed to be
276: .Dv = .
277: Operators
278: .Dv & ,
279: .Dv ^ ,
280: and
281: .Dv ~
282: don't work with floats and doubles.
283: The operator
284: .Dv !\&
285: specifies that the line matches if the test does
286: .Em not
287: succeed.
288: .Pp
289: Numeric values are specified in C form; e.g.
290: .Dv 13
1.1 deraadt 291: is decimal,
1.12 ajacouto 292: .Dv 013
1.1 deraadt 293: is octal, and
1.12 ajacouto 294: .Dv 0x13
1.1 deraadt 295: is hexadecimal.
1.12 ajacouto 296: .Pp
297: For string values, the string from the
298: file must match the specified string.
1.1 deraadt 299: The operators
1.12 ajacouto 300: .Dv = ,
301: .Dv \*(Lt
1.1 deraadt 302: and
1.12 ajacouto 303: .Dv \*(Gt
1.1 deraadt 304: (but not
1.12 ajacouto 305: .Dv & )
1.1 deraadt 306: can be applied to strings.
307: The length used for matching is that of the string argument
1.4 aaron 308: in the magic file.
1.12 ajacouto 309: This means that a line can match any non-empty string (usually used to
310: then print the string), with
311: .Em \*(Gt\e0
312: (because all non-empty strings are greater than the empty string).
313: .Pp
314: The special test
315: .Em x
316: always evaluates to true.
317: .It Dv message
1.4 aaron 318: The message to be printed if the comparison succeeds.
1.12 ajacouto 319: If the string contains a
1.4 aaron 320: .Xr printf 3
1.1 deraadt 321: format specification, the value from the file (with any specified masking
322: performed) is printed using the message as the format string.
1.12 ajacouto 323: If the string begins with
324: .Dq \eb ,
325: the message printed is the remainder of the string with no whitespace
326: added before it: multiple matches are normally separated by a single
327: space.
1.4 aaron 328: .El
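.Pp
The numeric types and test operators described above can be sketched in Python as follows.
This is an illustration only, not the code used by
.Xr file 1 ;
the helper names read_value, parse_c_number and numeric_test are hypothetical,
values are treated as signed (the
.Dv u
prefix is not modelled), and the
.Dv ~
operator and negative literals are left out.

```python
import struct

# Hypothetical helper names; only the type names and operators come from the text.
FORMATS = {
    "byte": "=b", "short": "=h", "long": "=i", "quad": "=q",
    "beshort": ">h", "belong": ">i", "bequad": ">q",
    "leshort": "<h", "lelong": "<i", "lequad": "<q",
}


def read_value(data, offset, type_name):
    """Decode a signed integer of the given magic type at offset."""
    if type_name == "melong":
        # PDP-11 order: high 16-bit word first, each word little-endian.
        high, low = struct.unpack_from("<HH", data, offset)
        value = (high << 16) | low
        return value - (1 << 32) if value & 0x80000000 else value
    return struct.unpack_from(FORMATS[type_name], data, offset)[0]


def parse_c_number(text):
    """Parse a non-negative decimal, octal (leading 0) or hex (0x) literal."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def numeric_test(file_value, test, mask=None):
    """Apply an optional AND mask, then evaluate the test field."""
    if mask is not None:
        file_value &= mask
    if test == "x":
        return True  # x matches any value
    if test.startswith("!"):
        return not numeric_test(file_value, test[1:])
    has_op = test[0] in "=<>&^"
    op = test[0] if has_op else "="  # a missing operator means "="
    number = parse_c_number(test[1:] if has_op else test)
    if op == "=":
        return file_value == number
    if op == "<":
        return file_value < number
    if op == ">":
        return file_value > number
    if op == "&":
        return (file_value & number) == number  # all given bits set
    return (file_value & number) != number  # "^": some given bit clear
```

For instance, numeric_test(read_value(data, 0x18, "leshort"), "<0x40") mirrors a line of the form
.Dq 0x18 leshort \*(Lt0x40 .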
329: .Pp
1.12 ajacouto 330: A MIME type is given on a separate line, which must be the next
331: non-blank, non-comment line after the magic line that identifies the
332: file type, and has the following format:
333: .Bd -literal -offset indent
334: !:mime MIMETYPE
335: .Ed
336: .Pp
337: i.e. the literal string
338: .Dq !:mime
339: followed by the MIME type.
340: .Pp
1.1 deraadt 341: Some file formats contain additional information which is to be printed
1.12 ajacouto 342: along with the file type or need additional tests to determine the true
343: file type.
344: These additional tests are introduced by one or more
345: .Em \*(Gt
346: characters preceding the offset.
1.4 aaron 347: The number of
1.12 ajacouto 348: .Em \*(Gt
1.1 deraadt 349: on the line indicates the level of the test; a line with no
1.12 ajacouto 350: .Em \*(Gt
1.1 deraadt 351: at the beginning is considered to be at level 0.
1.12 ajacouto 352: Tests are arranged in a tree-like hierarchy:
353: if the test on a line at level
1.4 aaron 354: .Em n
1.12 ajacouto 355: succeeds, all following tests at level
1.4 aaron 356: .Em n+1
1.12 ajacouto 357: are performed, and the messages printed if the tests succeed, until a line
358: with level
1.4 aaron 359: .Em n
1.12 ajacouto 360: (or less) appears.
361: For more complex files, one can use empty messages to get just the
362: "if/then" effect, in the following way:
363: .Bd -literal -offset indent
364: 0 string MZ
365: \*(Gt0x18 leshort \*(Lt0x40 MS-DOS executable
366: \*(Gt0x18 leshort \*(Gt0x3f extended PC executable (e.g., MS Windows)
367: .Ed
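.Pp
The level rule can be sketched in Python as follows.
This is a simplified model under stated assumptions, not the algorithm used by
.Xr file 1 :
each entry is a hypothetical (level, test function, message) tuple belonging to
a single top-level pattern, and messages are joined with single spaces.

```python
import struct


def leshort(data, offset):
    """Read a little-endian two-byte value (hypothetical helper)."""
    return struct.unpack_from("<h", data, offset)[0]


def run_entries(entries, data):
    """Evaluate (level, test, message) tuples in order, honouring levels."""
    messages = []
    succeeded = {}  # level -> result of the most recent line at that level
    for level, test, message in entries:
        if level > 0 and not succeeded.get(level - 1, False):
            continue  # the enclosing test did not succeed, skip this line
        ok = test(data)
        # A line at this level ends any deeper branch started earlier.
        for deeper in [k for k in succeeded if k > level]:
            del succeeded[deeper]
        succeeded[level] = ok
        if ok and message:
            messages.append(message)  # empty messages give a pure if/then
    return " ".join(messages)


# The MZ example from the text, expressed as entries.
MZ_ENTRIES = [
    (0, lambda d: d[0:2] == b"MZ", ""),
    (1, lambda d: leshort(d, 0x18) < 0x40, "MS-DOS executable"),
    (1, lambda d: leshort(d, 0x18) > 0x3F,
     "extended PC executable (e.g., MS Windows)"),
]


def make_sample(value_at_0x18, signature=b"MZ"):
    """Build a small buffer with a signature and a value at offset 0x18."""
    data = bytearray(0x40)
    data[0:2] = signature
    struct.pack_into("<h", data, 0x18, value_at_0x18)
    return bytes(data)
```

Calling run_entries(MZ_ENTRIES, make_sample(0x40)) exercises the second level-1 line only,
because the level-0 line prints nothing and the first level-1 test fails.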
1.4 aaron 368: .Pp
1.12 ajacouto 369: Offsets do not need to be constant, but can also be read from the file
370: being examined.
1.1 deraadt 371: If the first character following the last
1.12 ajacouto 372: .Em \*(Gt
1.1 deraadt 373: is a
1.14 ! jmc 374: .Em \&(
1.1 deraadt 375: then the string after the parenthesis is interpreted as an indirect offset.
376: That means that the number after the parenthesis is used as an offset in
1.4 aaron 377: the file.
378: The value at that offset is read, and is used again as an offset
379: in the file.
380: Indirect offsets are of the form:
1.12 ajacouto 381: .Em \&( x [.[bslBSLm]][+\-][ y ]) .
1.6 aaron 382: The value of
1.12 ajacouto 383: .Em x
1.4 aaron 384: is used as an offset in the file.
1.12 ajacouto 385: A byte, short or long is read at that offset depending on the
386: .Op bslBSLm
1.4 aaron 387: type specifier.
1.12 ajacouto 388: The capitalized types interpret the number as a big endian
389: value, whereas the small letter versions interpret the number as a little
390: endian value;
391: the
392: .Em m
393: type interprets the number as a middle endian (PDP-11) value.
1.4 aaron 394: To that number the value of
1.12 ajacouto 395: .Em y
1.4 aaron 396: is added and the result is used as an offset in the file.
1.12 ajacouto 397: The default type if one is not specified is long.
398: .Pp
399: That way variable length structures can be examined:
400: .Bd -literal -offset indent
401: # MS Windows executables are also valid MS-DOS executables
402: 0 string MZ
403: \*(Gt0x18 leshort \*(Lt0x40 MZ executable (MS-DOS)
404: # skip the whole block below if it is not an extended executable
405: \*(Gt0x18 leshort \*(Gt0x3f
406: \*(Gt\*(Gt(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
407: \*(Gt\*(Gt(0x3c.l) string LX\e0\e0 LX executable (OS/2)
408: .Ed
409: .Pp
410: This strategy of examining has a drawback: you must make sure that
411: you eventually print something, or users may get empty output (for example, when
412: there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
1.4 aaron 413: .Pp
1.12 ajacouto 414: If this indirect offset cannot be used directly, simple calculations are
415: possible: appending
416: .Em [+-*/%&|^]number
417: inside parentheses allows one to modify
418: the value read from the file before it is used as an offset:
419: .Bd -literal -offset indent
420: # MS Windows executables are also valid MS-DOS executables
421: 0 string MZ
422: # sometimes, the value at 0x18 is less than 0x40 but there's still an
423: # extended executable, simply appended to the file
424: \*(Gt0x18 leshort \*(Lt0x40
425: \*(Gt\*(Gt(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
426: \*(Gt\*(Gt(4.s*512) leshort !0x014c MZ executable (MS-DOS)
427: .Ed
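.Pp
Resolving an indirect offset with an optional calculation can be sketched in Python as follows.
This is an illustration under stated assumptions, not the parser used by
.Xr file 1 :
the function name resolve_indirect is hypothetical, values read from the file are
treated as unsigned, a missing type is assumed to mean a little-endian long,
the m type and octal literals are not handled, and division is integer division.

```python
import operator
import struct

# Type letters from the text: lowercase is little-endian, uppercase big-endian.
TYPE_FORMATS = {
    "b": "<B", "s": "<H", "l": "<I",
    "B": ">B", "S": ">H", "L": ">I",
}

OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.floordiv, "%": operator.mod,
    "&": operator.and_, "|": operator.or_, "^": operator.xor,
}


def parse_number(text):
    """Parse a decimal or 0x-prefixed hexadecimal literal."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def resolve_indirect(spec, data):
    """Return the offset named by a spec such as '(0x3c.l)' or '(4.s*512)'."""
    if not (spec.startswith("(") and spec.endswith(")")):
        raise ValueError("not an indirect offset: " + spec)
    body = spec[1:-1]
    op = None
    operand = 0
    # Split off the first operator character and the number after it.
    for index, char in enumerate(body):
        if index > 0 and char in OPS:
            op = char
            operand = parse_number(body[index + 1:])
            body = body[:index]
            break
    base, _, type_char = body.partition(".")
    fmt = TYPE_FORMATS[type_char or "l"]  # assumed default: little-endian long
    value = struct.unpack_from(fmt, data, parse_number(base))[0]
    if op is not None:
        value = OPS[op](value, operand)
    return value
```

With a two-byte value of 3 stored at offset 4, resolve_indirect("(4.s*512)", data) yields 3 * 512,
which is the offset the COFF test in the example above would then examine.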
428: .Pp
429: Sometimes you do not know the exact offset as this depends on the length or
430: position (when indirection was used before) of preceding fields.
431: You can specify an offset relative to the end of the last up-level
432: field using
433: .Sq &
434: as a prefix to the offset:
435: .Bd -literal -offset indent
436: 0 string MZ
437: \*(Gt0x18 leshort \*(Gt0x3f
438: \*(Gt\*(Gt(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
439: # immediately following the PE signature is the CPU type
440: \*(Gt\*(Gt\*(Gt&0 leshort 0x14c for Intel 80386
441: \*(Gt\*(Gt\*(Gt&0 leshort 0x184 for DEC Alpha
442: .Ed
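.Pp
The relative offset rule can be sketched in Python as follows.
This is an illustration only: the names resolve_offset and cpu_type are hypothetical,
and the sketch assumes that the end of the up-level field is simply its start
offset plus the length of the matched string.

```python
import struct


def parse_number(text):
    """Parse a decimal or 0x-prefixed hexadecimal literal."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def resolve_offset(spec, previous_end):
    """Turn a plain or '&'-prefixed offset into an absolute file offset."""
    if spec.startswith("&"):
        # Relative: counted from the end of the last up-level field.
        return previous_end + parse_number(spec[1:])
    return parse_number(spec)


def cpu_type(data):
    """Follow the PE example: read the two bytes right after the signature."""
    pe_start = struct.unpack_from("<I", data, 0x3C)[0]
    signature = b"PE" + bytes(2)  # the string PE followed by two NUL bytes
    if data[pe_start:pe_start + len(signature)] != signature:
        return None
    previous_end = pe_start + len(signature)
    offset = resolve_offset("&0", previous_end)
    return struct.unpack_from("<H", data, offset)[0]
```

Here the signature starts wherever the value at 0x3c points, so the CPU type cannot be
addressed with a constant offset; the
.Sq &0
form places it directly after the four signature bytes.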
443: .Pp
444: Indirect and relative offsets can be combined:
445: .Bd -literal -offset indent
446: 0 string MZ
447: \*(Gt0x18 leshort \*(Lt0x40
448: \*(Gt\*(Gt(4.s*512) leshort !0x014c MZ executable (MS-DOS)
449: # if it's not COFF, go back 512 bytes and add the offset taken
450: # from byte 2/3, which is yet another way of finding the start
451: # of the extended executable
452: \*(Gt\*(Gt\*(Gt&(2.s-514) string LE LE executable (MS Windows VxD driver)
453: .Ed
454: .Pp
455: Or the other way around:
456: .Bd -literal -offset indent
457: 0 string MZ
458: \*(Gt0x18 leshort \*(Gt0x3f
459: \*(Gt\*(Gt(0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
460: # at offset 0x80 (-4, since relative offsets start at the end
461: # of the up-level match) inside the LE header, we find the absolute
462: # offset to the code area, where we look for a specific signature
463: \*(Gt\*(Gt\*(Gt(&0x7c.l+0x26) string UPX \eb, UPX compressed
464: .Ed
465: .Pp
466: Or even both!
467: .Bd -literal -offset indent
468: 0 string MZ
469: \*(Gt0x18 leshort \*(Gt0x3f
470: \*(Gt\*(Gt(0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
471: # at offset 0x58 inside the LE header, we find the relative offset
472: # to a data area where we look for a specific signature
473: \*(Gt\*(Gt\*(Gt&(&0x54.l-3) string UNACE \eb, ACE self-extracting archive
474: .Ed
475: .Pp
476: Finally, if you have to deal with offset/length pairs in your file, even the
477: second value in a parenthesized expression can be taken from the file itself,
478: using another set of parentheses.
479: Note that this additional indirect offset is always relative to the
480: start of the main indirect offset.
481: .Bd -literal -offset indent
482: 0 string MZ
483: \*(Gt0x18 leshort \*(Gt0x3f
484: \*(Gt\*(Gt(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
485: # search for the PE section called ".idata"...
486: \*(Gt\*(Gt\*(Gt&0xf4 search/0x140 .idata
487: # ...and go to the end of it, calculated from start+length;
488: # these are located 14 and 10 bytes after the section name
489: \*(Gt\*(Gt\*(Gt\*(Gt(&0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive
490: .Ed
1.9 jmc 491: .Sh SEE ALSO
492: .Xr file 1
1.12 ajacouto 493: \- the command that reads this file.
1.4 aaron 494: .Sh BUGS
1.6 aaron 495: The formats
1.12 ajacouto 496: .Dv long ,
497: .Dv belong ,
498: .Dv lelong ,
499: .Dv melong ,
500: .Dv short ,
501: .Dv beshort ,
502: .Dv leshort ,
503: .Dv date ,
504: .Dv bedate ,
505: .Dv medate ,
506: .Dv ledate ,
507: .Dv beldate ,
508: .Dv leldate ,
1.1 deraadt 509: and
1.12 ajacouto 510: .Dv meldate
1.1 deraadt 511: are system-dependent; perhaps they should be specified as a number
1.6 aaron 512: of bytes (2B, 4B, etc),
1.1 deraadt 513: since the files being recognized typically come from
514: a system on which the lengths are invariant.
515: .\"
516: .\" From: guy@sun.uucp (Guy Harris)
517: .\" Newsgroups: net.bugs.usg
518: .\" Subject: /etc/magic's format isn't well documented
519: .\" Message-ID: <2752@sun.uucp>
520: .\" Date: 3 Sep 85 08:19:07 GMT
521: .\" Organization: Sun Microsystems, Inc.
522: .\" Lines: 136
1.6 aaron 523: .\"
1.1 deraadt 524: .\" Here's a manual page for the format accepted by the "file" made by adding
525: .\" the changes I posted to the S5R2 version.
526: .\"
527: .\" Modified for Ian Darwin's version of the file command.