version 1.11, 2007/05/31 19:20:10 |
version 1.12, 2009/10/26 21:03:03 |
|
|
.Dd $Mdocdate$ |
.Dd $Mdocdate$ |
.Dt MAGIC 5 |
.Dt MAGIC 5 |
.Os |
.Os |
|
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems. |
.Sh NAME |
.Sh NAME |
.Nm magic |
.Nm magic |
.Nd file command's magic number file |
.Nd file command's magic pattern file |
.Sh DESCRIPTION |
.Sh DESCRIPTION |
This manual page documents the format of the magic file as |
This manual page documents the format of the magic file as |
used by the |
used by the |
.Xr file 1 |
.Xr file 1 |
command, version 3.22. |
command, version 4.24. |
The |
The |
.Nm file |
.Xr file 1 |
command identifies the type of a file using, |
command identifies the type of a file using, |
among other tests, |
among other tests, |
a test for whether the file begins with a certain |
a test for whether the file contains certain |
.Dq magic number . |
.Dq "magic patterns" . |
.Pp |
|
The file |
The file |
.Pa /etc/magic |
.Pa /etc/magic |
specifies what magic numbers are to be tested for, |
specifies what magic numbers are to be tested for, |
|
|
.Pp |
.Pp |
Each line of the file specifies a test to be performed. |
Each line of the file specifies a test to be performed. |
A test compares the data starting at a particular offset |
A test compares the data starting at a particular offset |
in the file with a 1-byte, 2-byte, or 4-byte numeric value or |
in the file with a byte value, a string or a numeric value. |
a string. |
|
If the test succeeds, a message is printed. |
If the test succeeds, a message is printed. |
The line consists of the following fields: |
The line consists of the following fields: |
.Bl -tag -width indent |
.Bl -tag -width ".Dv message" |
.It Sy offset |
.It Dv offset |
A number specifying the offset, in bytes, into the file of the data |
A number specifying the offset, in bytes, into the file of the data |
which is to be tested. |
which is to be tested. |
.It Sy type |
.It Dv type |
The type of the data to be tested. |
The type of the data to be tested. |
The possible values are: |
The possible values are: |
.Bl -tag -width beshort |
.Bl -tag -width ".Dv lestring16" |
.It Sy byte |
.It Dv byte |
A one-byte value. |
A one-byte value. |
.It Sy short |
.It Dv short |
A two-byte value (on most systems) in this machine's native byte order. |
A two-byte value in this machine's native byte order. |
.It Sy long |
.It Dv long |
A four-byte value (on most systems) in this machine's native byte order. |
A four-byte value in this machine's native byte order. |
.It Sy string |
.It Dv quad |
|
An eight-byte value in this machine's native byte order. |
|
.It Dv float |
|
A 32-bit single precision IEEE floating point number in this machine's native byte order. |
|
.It Dv double |
|
A 64-bit double precision IEEE floating point number in this machine's native byte order. |
|
.It Dv string |
A string of bytes. |
A string of bytes. |
.It Sy date |
The string type specification can be optionally followed |
A four-byte value interpreted as a |
by /[Bbc]*. |
.Ux |
The |
date. |
.Dq B |
.It Sy beshort |
flag compacts whitespace in the target, which must |
A two-byte value (on most systems) in big-endian byte order. |
contain at least one whitespace character. |
.It Sy belong |
If the magic has |
A four-byte value (on most systems) in big-endian byte order. |
.Dv n |
.It Sy bedate |
consecutive blanks, the target needs at least |
A four-byte value (on most systems) in big-endian byte order, |
.Dv n |
interpreted as a |
consecutive blanks to match. |
.Ux |
The |
date. |
.Dq b |
.It Sy leshort |
flag treats every blank in the target as an optional blank. |
A two-byte value (on most systems) in little-endian byte order. |
Finally the |
.It Sy lelong |
.Dq c |
A four-byte value (on most systems) in little-endian byte order. |
flag specifies case insensitive matching: lowercase |
.It Sy ledate |
characters in the magic match both lower and upper case characters in the |
A four-byte value (on most systems) in little-endian byte order, |
target, whereas upper case characters in the magic only match uppercase |
interpreted as a |
characters in the target. |
.Ux |
.It Dv pstring |
date. |
A Pascal-style string where the first byte is interpreted as an |
|
unsigned length. |
|
The string is not NUL terminated. |
|
.It Dv date |
|
A four-byte value interpreted as a UNIX date. |
|
.It Dv qdate |
|
An eight-byte value interpreted as a UNIX date. |
|
.It Dv ldate |
|
A four-byte value interpreted as a UNIX-style date, but interpreted as |
|
local time rather than UTC. |
|
.It Dv qldate |
|
An eight-byte value interpreted as a UNIX-style date, but interpreted as |
|
local time rather than UTC. |
|
.It Dv beshort |
|
A two-byte value in big-endian byte order. |
|
.It Dv belong |
|
A four-byte value in big-endian byte order. |
|
.It Dv bequad |
|
An eight-byte value in big-endian byte order. |
|
.It Dv befloat |
|
A 32-bit single precision IEEE floating point number in big-endian byte order. |
|
.It Dv bedouble |
|
A 64-bit double precision IEEE floating point number in big-endian byte order. |
|
.It Dv bedate |
|
A four-byte value in big-endian byte order, |
|
interpreted as a UNIX date. |
|
.It Dv beqdate |
|
An eight-byte value in big-endian byte order, |
|
interpreted as a UNIX date. |
|
.It Dv beldate |
|
A four-byte value in big-endian byte order, |
|
interpreted as a UNIX-style date, but interpreted as local time rather |
|
than UTC. |
|
.It Dv beqldate |
|
An eight-byte value in big-endian byte order, |
|
interpreted as a UNIX-style date, but interpreted as local time rather |
|
than UTC. |
|
.It Dv bestring16 |
|
A two-byte Unicode (UCS16) string in big-endian byte order. |
|
.It Dv leshort |
|
A two-byte value in little-endian byte order. |
|
.It Dv lelong |
|
A four-byte value in little-endian byte order. |
|
.It Dv lequad |
|
An eight-byte value in little-endian byte order. |
|
.It Dv lefloat |
|
A 32-bit single precision IEEE floating point number in little-endian byte order. |
|
.It Dv ledouble |
|
A 64-bit double precision IEEE floating point number in little-endian byte order. |
|
.It Dv ledate |
|
A four-byte value in little-endian byte order, |
|
interpreted as a UNIX date. |
|
.It Dv leqdate |
|
An eight-byte value in little-endian byte order, |
|
interpreted as a UNIX date. |
|
.It Dv leldate |
|
A four-byte value in little-endian byte order, |
|
interpreted as a UNIX-style date, but interpreted as local time rather |
|
than UTC. |
|
.It Dv leqldate |
|
An eight-byte value in little-endian byte order, |
|
interpreted as a UNIX-style date, but interpreted as local time rather |
|
than UTC. |
|
.It Dv lestring16 |
|
A two-byte Unicode (UCS16) string in little-endian byte order. |
|
.It Dv melong |
|
A four-byte value in middle-endian (PDP-11) byte order. |
|
.It Dv medate |
|
A four-byte value in middle-endian (PDP-11) byte order, |
|
interpreted as a UNIX date. |
|
.It Dv meldate |
|
A four-byte value in middle-endian (PDP-11) byte order, |
|
interpreted as a UNIX-style date, but interpreted as local time rather |
|
than UTC. |
|
.It Dv regex |
|
A regular expression match in extended POSIX regular expression syntax |
|
(like egrep). |
|
Regular expressions can take exponential time to process, |
|
and their performance is hard to predict, so their use is discouraged. |
|
When used in production environments, |
|
their performance should be carefully checked. |
|
The type specification can be optionally followed by |
|
.Dv /[c][s] . |
|
The |
|
.Dq c |
|
flag makes the match case insensitive, while the |
|
.Dq s |
|
flag updates the offset to the start offset of the match, rather than the end. |
|
The regular expression is tested against line |
|
.Dv N + 1 |
|
onwards, where |
|
.Dv N |
|
is the given offset. |
|
Line endings are assumed to be in the machine's native format. |
|
.Dv ^ |
|
and |
|
.Dv $ |
|
match the beginning and end of individual lines, respectively, |
|
not beginning and end of file. |
|
.It Dv search |
|
A literal string search starting at the given offset. |
|
The same modifier flags can be used as for string patterns. |
|
The modifier flags (if any) must be followed by |
|
.Dv /number |
|
the range, that is, the number of positions at which the match will be |
|
attempted, starting from the start offset. |
|
This is suitable for searching larger binary expressions |
|
with variable offsets, using |
|
.Dv \e |
|
escapes for special characters. |
|
The offset works as for regex. |
|
.It Dv default |
|
This is intended to be used with the test |
|
.Em x |
|
(which is always true) and a message that is to be used if there are |
|
no other matches. |
.El |
.El |
.El |
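.Pp
As a hedged illustration of the modifiers described above, the following
entries are hypothetical (they are not taken from a distributed magic file)
and only sketch the syntax of the
.Dv string
flags and of the
.Dv search
range:
.Bd -literal -offset indent
# case-insensitive: lowercase magic matches either case in the target
0 string/c begin:vcard vCard-like text (illustrative)
# try the literal string at up to 256 positions, starting at offset 0
0 search/256 BEGIN:VCARD vCard-like text (illustrative)
.Ed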
|
.Pp |
.Pp |
|
Each top-level magic pattern (see below for an explanation of levels) |
|
is classified as text or binary according to the types used. |
|
Types |
|
.Dq regex |
|
and |
|
.Dq search |
|
are classified as text tests, unless non-printable characters are used |
|
in the pattern. |
|
All other tests are classified as binary. |
|
A top-level pattern is considered to be a text pattern |
|
when all its patterns are text |
|
patterns; otherwise, it is considered to be a binary pattern. |
|
When matching a file, binary patterns are tried first; if no match is |
|
found, and the file looks like text, then its encoding is determined |
|
and the text patterns are tried. |
|
.Pp |
The numeric types may optionally be followed by |
The numeric types may optionally be followed by |
.Ql & |
.Dv & |
and a numeric value, |
and a numeric value, |
to specify that the value is to be AND'ed with the |
to specify that the value is to be AND'ed with the |
numeric value before any comparisons are done. |
numeric value before any comparisons are done. |
Prepending a |
Prepending a |
.Sq u |
.Dv u |
to the type indicates that ordered comparisons should be unsigned. |
to the type indicates that ordered comparisons should be unsigned. |
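.Pp
A minimal sketch of both features follows; the entries and their messages
are made up for this example and are not part of any distributed magic file:
.Bd -literal -offset indent
# clear the low four bits of the big-endian short before comparing
0 beshort&0xfff0 0xfff0 sync word found (illustrative)
# unsigned comparison: matches first bytes 0x80 through 0xff
0 ubyte \*(Gt0x7f first byte has its high bit set (illustrative)
.Ed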
.Bl -tag -width indent |
.It Dv test |
.It Sy test |
|
The value to be compared with the value from the file. |
The value to be compared with the value from the file. |
If the type is |
If the type is |
numeric, this value |
numeric, this value |
is specified in C form; if it is a string, it is specified as a C string |
is specified in C form; if it is a string, it is specified as a C string |
with the usual escapes permitted (e.g., |
with the usual escapes permitted (e.g. \en for new-line). |
.Ql \en |
.Pp |
for newline). |
|
.It Sy "" |
|
Numeric values |
Numeric values |
may be preceded by a character indicating the operation to be performed. |
may be preceded by a character indicating the operation to be performed. |
It may be |
It may be |
.Ql = |
.Dv = , |
to specify that the value from the file must equal the specified value, |
to specify that the value from the file must equal the specified value, |
.Ql < |
.Dv \*(Lt , |
to specify that the value from the file must be less than the specified |
to specify that the value from the file must be less than the specified |
value, |
value, |
.Ql > |
.Dv \*(Gt , |
to specify that the value from the file must be greater than the specified |
to specify that the value from the file must be greater than the specified |
value, |
value, |
.Ql & |
.Dv & , |
to specify that the value from the file must have set all of the bits |
to specify that the value from the file must have set all of the bits |
that are set in the specified value, |
that are set in the specified value, |
.Ql ^ |
.Dv ^ , |
to specify that the value from the file must have clear any of the bits |
to specify that the value from the file must have clear any of the bits |
that are set in the specified value, or |
that are set in the specified value, or |
.Sq x |
.Dv ~ , |
|
to specify that the value given after it is negated before it is tested, or |
|
.Dv x , |
to specify that any value will match. |
to specify that any value will match. |
If the character is omitted, |
If the character is omitted, it is assumed to be |
it is assumed to be |
.Dv = . |
.Ql = . |
Operators |
.It Sy "" |
.Dv & , |
Numeric values are specified in C form; e.g., |
.Dv ^ , |
.Dq 13 |
and |
|
.Dv ~ |
|
don't work with floats and doubles. |
|
The operator |
|
.Dv !\& |
|
specifies that the line matches if the test does |
|
.Em not |
|
succeed. |
|
.Pp |
|
Numeric values are specified in C form; e.g. |
|
.Dv 13 |
is decimal, |
is decimal, |
.Dq 013 |
.Dv 013 |
is octal, and |
is octal, and |
.Dq 0x13 |
.Dv 0x13 |
is hexadecimal. |
is hexadecimal. |
.It Sy "" |
.Pp |
For string values, the byte string from the |
For string values, the string from the |
file must match the specified byte string. |
file must match the specified string. |
The operators |
The operators |
.Ql = , |
.Dv = , |
.Ql < , |
.Dv \*(Lt |
and |
and |
.Ql > |
.Dv \*(Gt |
(but not |
(but not |
.Ql & ) |
.Dv & ) |
can be applied to strings. |
can be applied to strings. |
The length used for matching is that of the string argument |
The length used for matching is that of the string argument |
in the magic file. |
in the magic file. |
This means that a line can match any string, and |
This means that a line can match any non-empty string (usually used to |
then presumably print that string, by doing |
then print the string), with |
.Ql >\e0 |
.Em \*(Gt\e0 |
(because all strings are greater than the null string). |
(because all non-empty strings are greater than the empty string). |
.It Sy message |
.Pp |
|
The special test |
|
.Em x |
|
always evaluates to true. |
|
.Dv message |
The message to be printed if the comparison succeeds. |
The message to be printed if the comparison succeeds. |
If the string |
If the string contains a |
contains a |
|
.Xr printf 3 |
.Xr printf 3 |
format specification, the value from the file (with any specified masking |
format specification, the value from the file (with any specified masking |
performed) is printed using the message as the format string. |
performed) is printed using the message as the format string. |
|
If the string begins with |
|
.Dq \eb , |
|
the message printed is the remainder of the string with no whitespace |
|
added before it: multiple matches are normally separated by a single |
|
space. |
.El |
.El |
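.Pp
The following sketch shows how the four fields are commonly combined.
The entries are modelled on typical GIF magic and should be read as an
illustration rather than as authoritative definitions; the leading
.Em \*(Gt
marks a dependent test, as explained below.
.Bd -literal -offset indent
0 string GIF8 GIF image data
# \eb suppresses the separating space; %s prints the matched string
\*(Gt4 string 7a \eb, version 8%s,
\*(Gt4 string 9a \eb, version 8%s,
# any positive width or height is printed through the format
\*(Gt6 leshort \*(Gt0 %hd x
\*(Gt8 leshort \*(Gt0 %hd
.Ed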
.Pp |
.Pp |
|
A MIME type is given on a separate line, which must be the next |
|
non-blank or comment line after the magic line that identifies the |
|
file type, and has the following format: |
|
.Bd -literal -offset indent |
|
!:mime MIMETYPE |
|
.Ed |
|
.Pp |
|
i.e. the literal string |
|
.Dq !:mime |
|
followed by the MIME type. |
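.Pp
For instance, a minimal sketch (the entry is illustrative only):
.Bd -literal -offset indent
0 string %PDF- PDF document
!:mime application/pdf
.Ed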
|
.Pp |
Some file formats contain additional information which is to be printed |
Some file formats contain additional information which is to be printed |
along with the file type. |
along with the file type or need additional tests to determine the true |
A line which begins with the character |
file type. |
.Ql > |
These additional tests are introduced by one or more |
indicates additional tests and messages to be printed. |
.Em \*(Gt |
|
characters preceding the offset. |
The number of |
The number of |
.Ql > |
.Em \*(Gt |
on the line indicates the level of the test; a line with no |
on the line indicates the level of the test; a line with no |
.Ql > |
.Em \*(Gt |
at the beginning is considered to be at level 0. |
at the beginning is considered to be at level 0. |
.Pp |
Tests are arranged in a tree-like hierarchy: |
Each line at level |
If the test on a line at level |
.Em n+1 |
|
is under the control of the line at level |
|
.Em n |
.Em n |
most closely preceding it in the magic file. |
succeeds, all following tests at level |
If the test on a line at level |
|
.Em n |
|
succeeds, the tests specified in all the subsequent lines at level |
|
.Em n+1 |
.Em n+1 |
are performed, and the messages printed if the tests succeed. |
are performed, and the messages printed if the tests succeed, until a line |
The next |
with level |
line at level |
|
.Em n |
.Em n |
terminates this. |
(or less) appears. |
|
For more complex files, one can use empty messages to get just the |
|
"if/then" effect, in the following way: |
|
.Bd -literal -offset indent |
|
0 string MZ |
|
\*(Gt0x18 leshort \*(Lt0x40 MS-DOS executable |
|
\*(Gt0x18 leshort \*(Gt0x3f extended PC executable (e.g., MS Windows) |
|
.Ed |
.Pp |
.Pp |
|
Offsets do not need to be constant, but can also be read from the file |
|
being examined. |
If the first character following the last |
If the first character following the last |
.Ql > |
.Em \*(Gt |
is a |
is a |
.Ql ( |
.Em ( |
then the string after the parenthesis is interpreted as an indirect offset. |
then the string after the parenthesis is interpreted as an indirect offset. |
That means that the number after the parenthesis is used as an offset in |
That means that the number after the parenthesis is used as an offset in |
the file. |
the file. |
The value at that offset is read, and is used again as an offset |
The value at that offset is read, and is used again as an offset |
in the file. |
in the file. |
.Pp |
|
Indirect offsets are of the form: |
Indirect offsets are of the form: |
.Dq (x[.[bsl]][+-][y]) . |
.Em (( x [.[bslBSL]][+\-][ y ]) . |
The value of |
The value of |
.Sq x |
.Em x |
is used as an offset in the file. |
is used as an offset in the file. |
A byte, short or long is read at that offset |
A byte, short or long is read at that offset depending on the |
depending on the |
.Op bslBSLm |
.Dq [bsl] |
|
type specifier. |
type specifier. |
|
The capitalized types interpret the number as a big endian |
|
value, whereas the small letter versions interpret the number as a little |
|
endian value; |
|
the |
|
.Em m |
|
type interprets the number as a middle endian (PDP-11) value. |
To that number the value of |
To that number the value of |
.Sq y |
.Em y |
is added and the result is used as an offset in the file. |
is added and the result is used as an offset in the file. |
The default type |
The default type if one is not specified is long. |
if one is not specified is long. |
|
.Pp |
.Pp |
Sometimes you do not know the exact offset as this depends on the length of |
That way variable length structures can be examined: |
preceding fields. |
.Bd -literal -offset indent |
You can specify an offset relative to the end of the |
# MS Windows executables are also valid MS-DOS executables |
last uplevel field (of course this may only be done for sublevel tests, i.e., |
0 string MZ |
test beginning with |
\*(Gt0x18 leshort \*(Lt0x40 MZ executable (MS-DOS) |
.Ql > ) . |
# skip the whole block below if it is not an extended executable |
Such a relative offset is specified using |
\*(Gt0x18 leshort \*(Gt0x3f |
.Ql & |
\*(Gt\*(Gt(0x3c.l) string PE\e0\e0 PE executable (MS-Windows) |
as a prefix to the offset. |
\*(Gt\*(Gt(0x3c.l) string LX\e0\e0 LX executable (OS/2) |
.Sh FILES |
.Ed |
.Bl -tag -width /etc/magic |
.Pp |
.It Pa /etc/magic |
This strategy of examining files has a drawback: you must make sure that |
.El |
you eventually print something, or users may get empty output (like, when |
|
there is neither PE\e0\e0 nor LX\e0\e0 in the above example). |
|
.Pp |
|
If this indirect offset cannot be used directly, simple calculations are |
|
possible: appending |
|
.Em [+-*/%&|^]number |
|
inside parentheses allows one to modify |
|
the value read from the file before it is used as an offset: |
|
.Bd -literal -offset indent |
|
# MS Windows executables are also valid MS-DOS executables |
|
0 string MZ |
|
# sometimes, the value at 0x18 is less than 0x40 but there's still an |
|
# extended executable, simply appended to the file |
|
\*(Gt0x18 leshort \*(Lt0x40 |
|
\*(Gt\*(Gt(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP) |
|
\*(Gt\*(Gt(4.s*512) leshort !0x014c MZ executable (MS-DOS) |
|
.Ed |
|
.Pp |
|
Sometimes you do not know the exact offset as this depends on the length or |
|
position (when indirection was used before) of preceding fields. |
|
You can specify an offset relative to the end of the last up-level |
|
field using |
|
.Sq & |
|
as a prefix to the offset: |
|
.Bd -literal -offset indent |
|
0 string MZ |
|
\*(Gt0x18 leshort \*(Gt0x3f |
|
\*(Gt\*(Gt(0x3c.l) string PE\e0\e0 PE executable (MS-Windows) |
|
# immediately following the PE signature is the CPU type |
|
\*(Gt\*(Gt\*(Gt&0 leshort 0x14c for Intel 80386 |
|
\*(Gt\*(Gt\*(Gt&0 leshort 0x184 for DEC Alpha |
|
.Ed |
|
.Pp |
|
Indirect and relative offsets can be combined: |
|
.Bd -literal -offset indent |
|
0 string MZ |
|
\*(Gt0x18 leshort \*(Lt0x40 |
|
\*(Gt\*(Gt(4.s*512) leshort !0x014c MZ executable (MS-DOS) |
|
# if it's not COFF, go back 512 bytes and add the offset taken |
|
# from byte 2/3, which is yet another way of finding the start |
|
# of the extended executable |
|
\*(Gt\*(Gt\*(Gt&(2.s-514) string LE LE executable (MS Windows VxD driver) |
|
.Ed |
|
.Pp |
|
Or the other way around: |
|
.Bd -literal -offset indent |
|
0 string MZ |
|
\*(Gt0x18 leshort \*(Gt0x3f |
|
\*(Gt\*(Gt(0x3c.l) string LE\e0\e0 LE executable (MS-Windows) |
|
# at offset 0x80 (-4, since relative offsets start at the end |
|
# of the up-level match) inside the LE header, we find the absolute |
|
# offset to the code area, where we look for a specific signature |
|
\*(Gt\*(Gt\*(Gt(&0x7c.l+0x26) string UPX \eb, UPX compressed |
|
.Ed |
|
.Pp |
|
Or even both! |
|
.Bd -literal -offset indent |
|
0 string MZ |
|
\*(Gt0x18 leshort \*(Gt0x3f |
|
\*(Gt\*(Gt(0x3c.l) string LE\e0\e0 LE executable (MS-Windows) |
|
# at offset 0x58 inside the LE header, we find the relative offset |
|
# to a data area where we look for a specific signature |
|
\*(Gt\*(Gt\*(Gt&(&0x54.l-3) string UNACE \eb, ACE self-extracting archive |
|
.Ed |
|
.Pp |
|
Finally, if you have to deal with offset/length pairs in your file, even the |
|
second value in a parenthesized expression can be taken from the file itself, |
|
using another set of parentheses. |
|
Note that this additional indirect offset is always relative to the |
|
start of the main indirect offset. |
|
.Bd -literal -offset indent |
|
0 string MZ |
|
\*(Gt0x18 leshort \*(Gt0x3f |
|
\*(Gt\*(Gt(0x3c.l) string PE\e0\e0 PE executable (MS-Windows) |
|
# search for the PE section called ".idata"... |
|
\*(Gt\*(Gt\*(Gt&0xf4 search/0x140 .idata |
|
# ...and go to the end of it, calculated from start+length; |
|
# these are located 14 and 10 bytes after the section name |
|
\*(Gt\*(Gt\*(Gt\*(Gt(&0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive |
|
.Ed |
.Sh SEE ALSO |
.Sh SEE ALSO |
.Xr file 1 |
.Xr file 1 |
|
\- the command that reads this file. |
.Sh BUGS |
.Sh BUGS |
The formats |
The formats |
.Li long , |
.Dv long , |
.Li belong , |
.Dv belong , |
.Li lelong , |
.Dv lelong , |
.Li short , |
.Dv melong , |
.Li beshort , |
.Dv short , |
.Li leshort , |
.Dv beshort , |
.Li date , |
.Dv leshort , |
.Li bedate , |
.Dv date , |
|
.Dv bedate , |
|
.Dv medate , |
|
.Dv ledate , |
|
.Dv beldate , |
|
.Dv leldate , |
and |
and |
.Li ledate |
.Dv meldate |
are system-dependent; perhaps they should be specified as a number |
are system-dependent; perhaps they should be specified as a number |
of bytes (2B, 4B, etc), |
of bytes (2B, 4B, etc), |
since the files being recognized typically come from |
since the files being recognized typically come from |
a system on which the lengths are invariant. |
a system on which the lengths are invariant. |
.Pp |
|
There is (currently) no support for specified-endian data to be used in |
|
indirect offsets. |
|
.\" |
.\" |
.\" From: guy@sun.uucp (Guy Harris) |
.\" From: guy@sun.uucp (Guy Harris) |
.\" Newsgroups: net.bugs.usg |
.\" Newsgroups: net.bugs.usg |