Annotation of src/usr.bin/file/magic.5, Revision 1.19
1.19 ! jmc 1: .\" $OpenBSD: magic.5,v 1.18 2017/09/20 10:03:34 jmc Exp $
1.4 aaron 2: .\"
3: .\" @(#)$FreeBSD: src/usr.bin/file/magic.5,v 1.11 2000/03/01 12:19:39 sheldonh Exp $
4: .\"
1.3 millert 5: .\" install as magic.4 on USG, magic.5 on V7 or Berkeley systems.
1.7 jmc 6: .\"
1.8 ian 7: .\" Copyright (c) Ian F. Darwin 1986-1995.
8: .\" Software written by Ian F. Darwin and others;
9: .\" maintained 1995-present by Christos Zoulas and others.
1.9 jmc 10: .\"
1.8 ian 11: .\" Redistribution and use in source and binary forms, with or without
12: .\" modification, are permitted provided that the following conditions
13: .\" are met:
14: .\" 1. Redistributions of source code must retain the above copyright
15: .\" notice immediately at the beginning of the file, without modification,
16: .\" this list of conditions, and the following disclaimer.
17: .\" 2. Redistributions in binary form must reproduce the above copyright
18: .\" notice, this list of conditions and the following disclaimer in the
19: .\" documentation and/or other materials provided with the distribution.
1.9 jmc 20: .\"
1.8 ian 21: .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
22: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
23: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
24: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
25: .\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
26: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
27: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
28: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
29: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
30: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
31: .\" SUCH DAMAGE.
1.4 aaron 32: .\"
1.19 ! jmc 33: .Dd $Mdocdate: September 20 2017 $
1.4 aaron 34: .Dt MAGIC 5
35: .Os
1.12 ajacouto 36: .\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
1.4 aaron 37: .Sh NAME
38: .Nm magic
1.12 ajacouto 39: .Nd file command's magic pattern file
1.4 aaron 40: .Sh DESCRIPTION
1.3 millert 41: This manual page documents the format of the magic file as
42: used by the
1.4 aaron 43: .Xr file 1
1.18 jmc 44: command.
1.12 ajacouto 45: .Xr file 1
1.18 jmc 46: identifies the type of a file using,
1.1 deraadt 47: among other tests,
1.12 ajacouto 48: a test for whether the file contains certain
49: .Dq "magic patterns" .
1.1 deraadt 50: The file
1.4 aaron 51: .Pa /etc/magic
1.1 deraadt 52: specifies what magic numbers are to be tested for,
53: what message to print if a particular magic number is found,
54: and additional information to extract from the file.
1.4 aaron 55: .Pp
1.1 deraadt 56: Each line of the file specifies a test to be performed.
57: A test compares the data starting at a particular offset
1.12 ajacouto 58: in the file with a byte value, a string or a numeric value.
1.4 aaron 59: If the test succeeds, a message is printed.
1.1 deraadt 60: The line consists of the following fields:
1.19 ! jmc 61: .Bl -tag -width "message"
1.12 ajacouto 62: .It Dv offset
1.1 deraadt 63: A number specifying the offset, in bytes, into the file of the data
64: which is to be tested.
1.12 ajacouto 65: .It Dv type
1.4 aaron 66: The type of the data to be tested.
67: The possible values are:
1.19 ! jmc 68: .Bl -tag -width "lestring16"
1.12 ajacouto 69: .It Dv byte
1.1 deraadt 70: A one-byte value.
1.12 ajacouto 71: .It Dv short
72: A two-byte value in this machine's native byte order.
73: .It Dv long
74: A four-byte value in this machine's native byte order.
75: .It Dv quad
76: An eight-byte value in this machine's native byte order.
77: .It Dv float
78: A 32-bit single precision IEEE floating point number in this machine's native byte order.
79: .It Dv double
80: A 64-bit double precision IEEE floating point number in this machine's native byte order.
81: .It Dv string
1.1 deraadt 82: A string of bytes.
1.12 ajacouto 83: The string type specification can be optionally followed
84: by /[Bbc]*.
85: The
86: .Dq B
87: flag compacts whitespace in the target, which must
88: contain at least one whitespace character.
89: If the magic has
90: .Dv n
91: consecutive blanks, the target needs at least
92: .Dv n
93: consecutive blanks to match.
94: The
95: .Dq b
96: flag treats every blank in the target as an optional blank.
97: Finally the
98: .Dq c
99: flag specifies case insensitive matching: lowercase
100: characters in the magic match both lower and upper case characters in the
101: target, whereas upper case characters in the magic only match uppercase
102: characters in the target.
103: .It Dv pstring
104: A Pascal-style string where the first byte is interpreted as an
105: unsigned length.
106: The string is not NUL terminated.
107: .It Dv date
108: A four-byte value interpreted as a UNIX date.
109: .It Dv qdate
1.13 jmc 110: An eight-byte value interpreted as a UNIX date.
1.12 ajacouto 111: .It Dv ldate
112: A four-byte value interpreted as a UNIX-style date, but interpreted as
113: local time rather than UTC.
114: .It Dv qldate
115: An eight-byte value interpreted as a UNIX-style date, but interpreted as
116: local time rather than UTC.
117: .It Dv beshort
118: A two-byte value in big-endian byte order.
119: .It Dv belong
120: A four-byte value in big-endian byte order.
121: .It Dv bequad
122: An eight-byte value in big-endian byte order.
123: .It Dv befloat
124: A 32-bit single precision IEEE floating point number in big-endian byte order.
125: .It Dv bedouble
126: A 64-bit double precision IEEE floating point number in big-endian byte order.
127: .It Dv bedate
128: A four-byte value in big-endian byte order,
129: interpreted as a UNIX date.
130: .It Dv beqdate
131: An eight-byte value in big-endian byte order,
132: interpreted as a UNIX date.
133: .It Dv beldate
134: A four-byte value in big-endian byte order,
135: interpreted as a UNIX-style date, but interpreted as local time rather
136: than UTC.
137: .It Dv beqldate
138: An eight-byte value in big-endian byte order,
139: interpreted as a UNIX-style date, but interpreted as local time rather
140: than UTC.
141: .It Dv bestring16
142: A two-byte Unicode (UCS16) string in big-endian byte order.
143: .It Dv leshort
144: A two-byte value in little-endian byte order.
145: .It Dv lelong
146: A four-byte value in little-endian byte order.
147: .It Dv lequad
148: An eight-byte value in little-endian byte order.
149: .It Dv lefloat
150: A 32-bit single precision IEEE floating point number in little-endian byte order.
151: .It Dv ledouble
152: A 64-bit double precision IEEE floating point number in little-endian byte order.
153: .It Dv ledate
154: A four-byte value in little-endian byte order,
155: interpreted as a UNIX date.
156: .It Dv leqdate
157: An eight-byte value in little-endian byte order,
158: interpreted as a UNIX date.
159: .It Dv leldate
160: A four-byte value in little-endian byte order,
161: interpreted as a UNIX-style date, but interpreted as local time rather
162: than UTC.
163: .It Dv leqldate
164: An eight-byte value in little-endian byte order,
165: interpreted as a UNIX-style date, but interpreted as local time rather
166: than UTC.
167: .It Dv lestring16
168: A two-byte Unicode (UCS16) string in little-endian byte order.
169: .It Dv melong
170: A four-byte value in middle-endian (PDP-11) byte order.
171: .It Dv medate
172: A four-byte value in middle-endian (PDP-11) byte order,
173: interpreted as a UNIX date.
174: .It Dv meldate
175: A four-byte value in middle-endian (PDP-11) byte order,
176: interpreted as a UNIX-style date, but interpreted as local time rather
177: than UTC.
178: .It Dv regex
179: A regular expression match in extended POSIX regular expression syntax
180: (like egrep).
181: Regular expressions can take exponential time to process,
182: and their performance is hard to predict, so their use is discouraged.
183: When used in production environments,
184: their performance should be carefully checked.
185: The type specification can be optionally followed by
186: .Dv /[c][s] .
187: The
188: .Dq c
189: flag makes the match case insensitive, while the
190: .Dq s
191: flag updates the offset to the start offset of the match, rather than the end.
192: The regular expression is tested against line
193: .Dv N + 1
194: onwards, where
195: .Dv N
196: is the given offset.
197: Line endings are assumed to be in the machine's native format.
198: .Dv ^
199: and
200: .Dv $
201: match the beginning and end of individual lines, respectively,
202: not beginning and end of file.
203: .It Dv search
204: A literal string search starting at the given offset.
205: The same modifier flags can be used as for string patterns.
206: The modifier flags (if any) must be followed by
207: .Dv /number ,
208: the range, that is, the number of positions at which the match will be
209: attempted, starting from the start offset.
210: This is suitable for searching larger binary expressions
211: with variable offsets, using
212: .Dv \e
213: escapes for special characters.
214: The offset works as for regex.
215: .It Dv default
216: This is intended to be used with the test
217: .Em x
218: (which is always true) and a message that is to be used if there are
219: no other matches.
1.18 jmc 220: .It Dv clear
221: This test is always true and clears the match flag for that level.
222: It is intended to be used with the default test.
223: .It Dv name
224: Define a named magic instance that can be called from another
225: .Dv use
226: magic entry, like a subroutine call.
227: Named instance direct magic offsets are relative to the offset of the
228: previous matched entry, but indirect offsets are relative to the
229: beginning of the file as usual.
230: Named magic entries always match.
231: .It Dv use
232: Recursively call the named magic starting from the current offset.
233: If the name of the referenced instance begins with a
234: .Dv ^
235: then the endianness of the magic is switched; if the magic mentioned
236: .Dv leshort
237: for example,
238: it is treated as
239: .Dv beshort
240: and vice versa.
241: This is useful to avoid duplicating the rules for different endianness.
1.4 aaron 242: .El
1.12 ajacouto 243: .Pp
244: Each top-level magic pattern (see below for an explanation of levels)
245: is classified as text or binary according to the types used.
246: Types
247: .Dq regex
248: and
249: .Dq search
250: are classified as text tests, unless non-printable characters are used
251: in the pattern.
252: All other tests are classified as binary.
253: A top-level pattern is considered to be a text pattern
254: when all its patterns are text
255: patterns; otherwise, it is considered to be a binary pattern.
256: When matching a file, binary patterns are tried first; if no match is
257: found, and the file looks like text, then its encoding is determined
258: and the text patterns are tried.
1.4 aaron 259: .Pp
1.1 deraadt 260: The numeric types may optionally be followed by
1.12 ajacouto 261: .Dv &
1.1 deraadt 262: and a numeric value,
263: to specify that the value is to be AND'ed with the
1.4 aaron 264: numeric value before any comparisons are done.
265: Prepending a
1.12 ajacouto 266: .Dv u
1.1 deraadt 267: to the type indicates that ordered comparisons should be unsigned.
1.12 ajacouto 268: .It Dv test
1.4 aaron 269: The value to be compared with the value from the file.
270: If the type is
1.1 deraadt 271: numeric, this value
272: is specified in C form; if it is a string, it is specified as a C string
1.12 ajacouto 273: with the usual escapes permitted (e.g. \en for new-line).
274: .Pp
1.1 deraadt 275: Numeric values
276: may be preceded by a character indicating the operation to be performed.
277: It may be
1.12 ajacouto 278: .Dv = ,
1.1 deraadt 279: to specify that the value from the file must equal the specified value,
1.12 ajacouto 280: .Dv \*(Lt ,
1.1 deraadt 281: to specify that the value from the file must be less than the specified
282: value,
1.12 ajacouto 283: .Dv \*(Gt ,
1.1 deraadt 284: to specify that the value from the file must be greater than the specified
285: value,
1.12 ajacouto 286: .Dv & ,
1.6 aaron 287: to specify that the value from the file must have set all of the bits
1.1 deraadt 288: that are set in the specified value,
1.12 ajacouto 289: .Dv ^ ,
1.6 aaron 290: to specify that the value from the file must have clear any of the bits
1.1 deraadt 291: that are set in the specified value,
1.12 ajacouto 292: .Dv ~ ,
293: to specify that the specified value is negated before it is tested, or
294: .Dv x ,
1.4 aaron 295: to specify that any value will match.
1.12 ajacouto 296: If the character is omitted, it is assumed to be
297: .Dv = .
298: Operators
299: .Dv & ,
300: .Dv ^ ,
301: and
302: .Dv ~
303: don't work with floats and doubles.
304: The operator
305: .Dv !\&
306: specifies that the line matches if the test does
307: .Em not
308: succeed.
309: .Pp
310: Numeric values are specified in C form; e.g.
311: .Dv 13
1.1 deraadt 312: is decimal,
1.12 ajacouto 313: .Dv 013
1.1 deraadt 314: is octal, and
1.12 ajacouto 315: .Dv 0x13
1.1 deraadt 316: is hexadecimal.
1.12 ajacouto 317: .Pp
318: For string values, the string from the
319: file must match the specified string.
1.1 deraadt 320: The operators
1.12 ajacouto 321: .Dv = ,
322: .Dv \*(Lt
1.1 deraadt 323: and
1.12 ajacouto 324: .Dv \*(Gt
1.1 deraadt 325: (but not
1.12 ajacouto 326: .Dv & )
1.1 deraadt 327: can be applied to strings.
328: The length used for matching is that of the string argument
1.4 aaron 329: in the magic file.
1.12 ajacouto 330: This means that a line can match any non-empty string (usually used to
331: then print the string), with
332: .Em \*(Gt\e0
333: (because all non-empty strings are greater than the empty string).
334: .Pp
335: The special test
336: .Em x
337: always evaluates to true.
1.16 czarkoff 338: .It Dv message
1.4 aaron 339: The message to be printed if the comparison succeeds.
1.12 ajacouto 340: If the string contains a
1.4 aaron 341: .Xr printf 3
1.1 deraadt 342: format specification, the value from the file (with any specified masking
343: performed) is printed using the message as the format string.
1.12 ajacouto 344: If the string begins with
345: .Dq \eb ,
346: the message printed is the remainder of the string with no whitespace
347: added before it: multiple matches are normally separated by a single
348: space.
1.4 aaron 349: .El
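.Pp
As a minimal sketch of how the four fields fit together,
the following entries describe a hypothetical file format;
the signature value and the header layout are invented for illustration
and do not correspond to any real format:
.Bd -literal -offset indent
# hypothetical: big-endian signature 0xCAFE at offset 0
0 beshort 0xCAFE example container
# x matches any value; %d prints the byte that was read
\*(Gt2 byte x \eb, version %d
# mask off the high nibble before the ordered comparison
\*(Gt3 byte&0x0f \*(Gt0 \eb, flags 0x%x
.Ed
.Pp
The leading
.Dq \eb
in the last two messages suppresses the separating space,
so the comma follows the previous message directly.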
350: .Pp
1.12 ajacouto 351: A MIME type is given on a separate line, which must be the next
352: non-blank, non-comment line after the magic line that identifies the
353: file type, and has the following format:
354: .Bd -literal -offset indent
355: !:mime MIMETYPE
356: .Ed
357: .Pp
358: i.e. the literal string
359: .Dq !:mime
360: followed by the MIME type.
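.Pp
For instance, an entry for PDF documents might carry its MIME type
as follows; this is a sketch only, and the entry shipped with
.Xr file 1
may differ:
.Bd -literal -offset indent
0 string %PDF- PDF document
!:mime application/pdf
.Ed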
361: .Pp
1.1 deraadt 362: Some file formats contain additional information which is to be printed
1.12 ajacouto 363: along with the file type or need additional tests to determine the true
364: file type.
365: These additional tests are introduced by one or more
366: .Em \*(Gt
367: characters preceding the offset.
1.4 aaron 368: The number of
1.12 ajacouto 369: .Em \*(Gt
1.1 deraadt 370: on the line indicates the level of the test; a line with no
1.12 ajacouto 371: .Em \*(Gt
1.1 deraadt 372: at the beginning is considered to be at level 0.
1.12 ajacouto 373: Tests are arranged in a tree-like hierarchy:
1.17 jmc 374: If a test on a line at level
1.4 aaron 375: .Em n
1.12 ajacouto 376: succeeds, all following tests at level
1.4 aaron 377: .Em n+1
1.15 czarkoff 378: are performed, and the messages printed if the tests succeed, until a line
1.12 ajacouto 379: with level
1.4 aaron 380: .Em n
1.12 ajacouto 381: (or less) appears.
382: For more complex files, one can use empty messages to get just the
383: "if/then" effect, in the following way:
384: .Bd -literal -offset indent
385: 0 string MZ
386: \*(Gt0x18 leshort \*(Lt0x40 MS-DOS executable
387: \*(Gt0x18 leshort \*(Gt0x3f extended PC executable (e.g., MS Windows)
388: .Ed
1.4 aaron 389: .Pp
1.12 ajacouto 390: Offsets do not need to be constant, but can also be read from the file
391: being examined.
1.1 deraadt 392: If the first character following the last
1.12 ajacouto 393: .Em \*(Gt
1.1 deraadt 394: is a
1.14 jmc 395: .Em \&(
1.1 deraadt 396: then the string after the parenthesis is interpreted as an indirect offset.
397: That means that the number after the parenthesis is used as an offset in
1.4 aaron 398: the file.
399: The value at that offset is read, and is used again as an offset
400: in the file.
401: Indirect offsets are of the form:
1.12 ajacouto 402: .Em ( x [.[bslBSLm]][+\-][ y ]) .
1.6 aaron 403: The value of
1.12 ajacouto 404: .Em x
1.4 aaron 405: is used as an offset in the file.
1.12 ajacouto 406: A byte, short or long is read at that offset depending on the
407: .Op bslBSLm
1.4 aaron 408: type specifier.
1.12 ajacouto 409: The capitalized types interpret the number as a big endian
410: value, whereas the small letter versions interpret the number as a little
411: endian value;
412: the
413: .Em m
414: type interprets the number as a middle endian (PDP-11) value.
1.4 aaron 415: To that number the value of
1.12 ajacouto 416: .Em y
1.4 aaron 417: is added and the result is used as an offset in the file.
1.12 ajacouto 418: The default type if one is not specified is long.
419: .Pp
420: That way variable length structures can be examined:
421: .Bd -literal -offset indent
422: # MS Windows executables are also valid MS-DOS executables
423: 0 string MZ
424: \*(Gt0x18 leshort \*(Lt0x40 MZ executable (MS-DOS)
425: # skip the whole block below if it is not an extended executable
426: \*(Gt0x18 leshort \*(Gt0x3f
427: \*(Gt\*(Gt(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
428: \*(Gt\*(Gt(0x3c.l) string LX\e0\e0 LX executable (OS/2)
429: .Ed
430: .Pp
431: This strategy has a drawback: you must make sure that
432: you eventually print something, or users may get empty output (for example, when
433: there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
1.4 aaron 434: .Pp
1.12 ajacouto 435: If this indirect offset cannot be used directly, simple calculations are
436: possible: appending
437: .Em [+-*/%&|^]number
438: inside parentheses allows one to modify
439: the value read from the file before it is used as an offset:
440: .Bd -literal -offset indent
441: # MS Windows executables are also valid MS-DOS executables
442: 0 string MZ
443: # sometimes, the value at 0x18 is less than 0x40 but there's still an
444: # extended executable, simply appended to the file
445: \*(Gt0x18 leshort \*(Lt0x40
446: \*(Gt\*(Gt(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
447: \*(Gt\*(Gt(4.s*512) leshort !0x014c MZ executable (MS-DOS)
448: .Ed
449: .Pp
450: Sometimes you do not know the exact offset as this depends on the length or
451: position (when indirection was used before) of preceding fields.
452: You can specify an offset relative to the end of the last up-level
453: field using
454: .Sq &
455: as a prefix to the offset:
456: .Bd -literal -offset indent
457: 0 string MZ
458: \*(Gt0x18 leshort \*(Gt0x3f
459: \*(Gt\*(Gt(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
460: # immediately following the PE signature is the CPU type
461: \*(Gt\*(Gt\*(Gt&0 leshort 0x14c for Intel 80386
462: \*(Gt\*(Gt\*(Gt&0 leshort 0x184 for DEC Alpha
463: .Ed
464: .Pp
465: Indirect and relative offsets can be combined:
466: .Bd -literal -offset indent
467: 0 string MZ
468: \*(Gt0x18 leshort \*(Lt0x40
469: \*(Gt\*(Gt(4.s*512) leshort !0x014c MZ executable (MS-DOS)
470: # if it's not COFF, go back 512 bytes and add the offset taken
471: # from byte 2/3, which is yet another way of finding the start
472: # of the extended executable
473: \*(Gt\*(Gt\*(Gt&(2.s-514) string LE LE executable (MS Windows VxD driver)
474: .Ed
475: .Pp
476: Or the other way around:
477: .Bd -literal -offset indent
478: 0 string MZ
479: \*(Gt0x18 leshort \*(Gt0x3f
480: \*(Gt\*(Gt(0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
481: # at offset 0x80 (-4, since relative offsets start at the end
482: # of the up-level match) inside the LE header, we find the absolute
483: # offset to the code area, where we look for a specific signature
484: \*(Gt\*(Gt\*(Gt(&0x7c.l+0x26) string UPX \eb, UPX compressed
485: .Ed
486: .Pp
487: Or even both!
488: .Bd -literal -offset indent
489: 0 string MZ
490: \*(Gt0x18 leshort \*(Gt0x3f
491: \*(Gt\*(Gt(0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
492: # at offset 0x58 inside the LE header, we find the relative offset
493: # to a data area where we look for a specific signature
494: \*(Gt\*(Gt\*(Gt&(&0x54.l-3) string UNACE \eb, ACE self-extracting archive
495: .Ed
496: .Pp
497: Finally, if you have to deal with offset/length pairs in your file, even the
498: second value in a parenthesized expression can be taken from the file itself,
499: using another set of parentheses.
500: Note that this additional indirect offset is always relative to the
501: start of the main indirect offset.
502: .Bd -literal -offset indent
503: 0 string MZ
504: \*(Gt0x18 leshort \*(Gt0x3f
505: \*(Gt\*(Gt(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
506: # search for the PE section called ".idata"...
507: \*(Gt\*(Gt\*(Gt&0xf4 search/0x140 .idata
508: # ...and go to the end of it, calculated from start+length;
509: # these are located 14 and 10 bytes after the section name
510: \*(Gt\*(Gt\*(Gt\*(Gt(&0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive
511: .Ed
1.9 jmc 512: .Sh SEE ALSO
513: .Xr file 1
1.12 ajacouto 514: \- the command that reads this file.
1.4 aaron 515: .Sh BUGS
1.6 aaron 516: The formats
1.12 ajacouto 517: .Dv long ,
518: .Dv belong ,
519: .Dv lelong ,
520: .Dv melong ,
521: .Dv short ,
522: .Dv beshort ,
523: .Dv leshort ,
524: .Dv date ,
525: .Dv bedate ,
526: .Dv medate ,
527: .Dv ledate ,
528: .Dv beldate ,
529: .Dv leldate ,
1.1 deraadt 530: and
1.12 ajacouto 531: .Dv meldate
1.1 deraadt 532: are system-dependent; perhaps they should be specified as a number
1.6 aaron 533: of bytes (2B, 4B, etc),
1.1 deraadt 534: since the files being recognized typically come from
535: a system on which the lengths are invariant.
536: .\"
537: .\" From: guy@sun.uucp (Guy Harris)
538: .\" Newsgroups: net.bugs.usg
539: .\" Subject: /etc/magic's format isn't well documented
540: .\" Message-ID: <2752@sun.uucp>
541: .\" Date: 3 Sep 85 08:19:07 GMT
542: .\" Organization: Sun Microsystems, Inc.
543: .\" Lines: 136
1.6 aaron 544: .\"
1.1 deraadt 545: .\" Here's a manual page for the format accepted by the "file" made by adding
546: .\" the changes I posted to the S5R2 version.
547: .\"
548: .\" Modified for Ian Darwin's version of the file command.