Annotation of src/usr.bin/rsync/rsync.5, Revision 1.2
1.1 benno 1: .\" $OpenBSD$
2: .\"
3: .\" Copyright (c) 2019 Kristaps Dzonsons <kristaps@bsd.lv>
4: .\"
5: .\" Permission to use, copy, modify, and distribute this software for any
6: .\" purpose with or without fee is hereby granted, provided that the above
7: .\" copyright notice and this permission notice appear in all copies.
8: .\"
9: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
10: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
11: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
12: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
13: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
14: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
15: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
16: .\"
17: .Dd $Mdocdate$
18: .Dt RSYNC 5
19: .Os
20: .Sh NAME
21: .Nm rsync
22: .Nd rsync wire protocol
23: .Sh DESCRIPTION
24: The
25: .Nm
26: protocol described in this document relates to the BSD-licensed
27: .Xr openrsync 1 ,
28: a re-implementation of the GPL-licensed reference utility
29: .Xr rsync 1 .
30: It is compatible with version 27 of the reference.
31: .Pp
32: In this document, the
33: .Qq client process
34: refers to the utility as run on the operator's local computer.
35: The
36: .Qq server process
37: is run either on the local or remote computer, depending upon the
38: file locations given on the command line.
39: .Pp
40: There are a number of options in the protocol that are dictated by command-line
41: flags.
42: These will be noted as
43: .Fl n
44: for dry-run,
45: .Fl l
46: for links,
47: .Fl r
48: for recursion,
49: .Fl v
50: for verbose, and
51: .Fl -delete
52: for deletion (before).
53: .Ss Data types
54: The binary protocol encodes all data in little-endian format.
55: Integers are signed 32-bit, shorts are signed 16-bit, bytes are unsigned
56: 8-bit.
57: A long is variable-length.
58: For values less than the maximum integer, the value is transmitted and
59: read as a 32-bit integer.
60: For values greater, the value is transmitted first as a maximum integer,
61: then a 64-bit signed integer.
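.Pp
The variable-length long described above can be sketched as follows.
This is a minimal illustration, not the utility's own code: the names
encode_long and decode_long are hypothetical, and the handling of negative
values and of the maximum integer itself (both sent in the escaped form) is
an assumption the text does not spell out.

```python
import struct

INT32_MAX = 0x7FFFFFFF

def encode_long(value):
    """Encode a protocol long: 4 bytes if small, otherwise 4 + 8 bytes."""
    if 0 <= value < INT32_MAX:
        # Fits below the maximum integer: send as a plain 32-bit integer.
        return struct.pack("<i", value)
    # Assumption: anything else is escaped with the maximum integer,
    # followed by the full 64-bit signed value.
    return struct.pack("<i", INT32_MAX) + struct.pack("<q", value)

def decode_long(buf):
    """Return (value, number of bytes consumed)."""
    (first,) = struct.unpack_from("<i", buf, 0)
    if first != INT32_MAX:
        return first, 4
    (value,) = struct.unpack_from("<q", buf, 4)
    return value, 12
```

A small file size therefore costs four bytes on the wire, while a size at or
above the maximum integer costs twelve.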
62: .Pp
63: There are three types of checksums: long (slow), short (fast), and
64: whole-file.
65: The fast checksum is a derivative of Adler-32.
66: The slow checksum is MD4,
67: made over the checksum seed first (serialised in little-endian format),
68: then the data.
69: The whole-file applies MD4 to the file first, then the checksum seed at
70: the end (also serialised in little-endian format).
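.Pp
The only difference between the slow and whole-file checksum inputs is where
the seed goes.
The sketch below builds the byte strings that would be fed to MD4; the hash
itself is left out because MD4 is not reliably available in every Python
build.
The function names are hypothetical, and the seed is assumed to be a signed
32-bit integer like the other protocol integers.

```python
import struct

def slow_checksum_input(seed, block):
    # Seed first (little-endian 32-bit), then the block data.
    return struct.pack("<i", seed) + block

def whole_file_checksum_input(seed, contents):
    # File contents first, then the seed (little-endian 32-bit).
    return contents + struct.pack("<i", seed)
```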
71: .Ss Multiplexing
72: Most
73: .Nm
74: transmissions are wrapped in a multiplexing envelope protocol.
75: It is composed as follows:
76: .Pp
77: .Bl -enum -compact
78: .It
79: envelope header (4 bytes)
80: .It
81: envelope payload (arbitrary length)
82: .El
83: .Pp
84: The high-order byte of the envelope header, read as a little-endian integer, consists of a tag.
85: If the tag is 7, the payload is normal data.
86: Otherwise, the payload is out-of-band server messages.
87: If the tag is 1, it is an error on the sender's part and must trigger an
88: exit.
89: The rest of the header holds the payload length, which limits message payloads to 24 bit integer size,
90: .Li 0x00ffffff .
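.Pp
A multiplexing envelope might be packed and unpacked as in the sketch below.
It assumes the 4-byte header is a little-endian 32-bit integer with the tag in
the top eight bits and the payload length in the low 24 bits; the function
names are hypothetical.

```python
import struct

TAG_DATA = 7              # tag for normal data, per the text above
MAX_PAYLOAD = 0x00FFFFFF  # 24-bit length limit

def pack_envelope(tag, payload):
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds 24-bit length")
    header = (tag << 24) | len(payload)
    return struct.pack("<I", header) + payload

def unpack_envelope(buf):
    """Return (tag, payload, remaining bytes)."""
    (header,) = struct.unpack_from("<I", buf, 0)
    tag = header >> 24
    length = header & MAX_PAYLOAD
    payload = buf[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return tag, payload, buf[4 + length:]
```

Larger writes would have to be split across several envelopes, each no longer
than the 24-bit limit.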
91: .Pp
92: The only data not using this envelope are the initial handshake between
93: client and server.
94: .Ss File list
95: A central part of the protocol is the file list, which is generated by
96: the sender.
97: It consists of all files that must be sent to the receiver, either
98: explicitly as given or recursively generated.
99: .Pp
100: The file list itself consists of filenames and attributes (mode, time,
101: size, etc.).
102: Filenames must be relative to the destination root and not be absolute
103: or contain backtracking.
104: So if a file is given to the sender as
105: .Pa ../../foo/bar ,
106: it must be sent as
107: .Pa foo/bar .
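.Pp
One way to clean a name before it enters the file list is sketched below.
The function name is hypothetical, and rejecting backtracking in the middle of
a path (rather than resolving it) is a conservative assumption, not something
the text requires.

```python
def clean_flist_name(path):
    # Dropping empty and "." components also removes any leading "/".
    parts = [p for p in path.split("/") if p not in ("", ".")]
    # Drop leading backtracking components.
    while parts and parts[0] == "..":
        parts.pop(0)
    # Assumption: backtracking inside the path is refused outright.
    if ".." in parts:
        raise ValueError("backtracking inside path: " + path)
    return "/".join(parts)
```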
108: .Pp
109: The file list should be cleaned of inappropriate files prior to sending.
110: For example, if
111: .Fl l
112: is not specified, symbolic links may be omitted.
113: Directory entries without
114: .Fl r
115: may also be omitted.
116: Duplicates may be omitted.
117: .Pp
118: The receiver
119: .Em must not
120: assume that the file list is clean.
121: It should not omit inappropriate files from the file list (which would
122: affect the indexing), but may omit them during processing.
123: .Pp
124: Prior to being sent from sender to receiver, and upon being received, the
125: file list must be lexicographically sorted such as with
126: .Xr strcmp 3 .
127: Subsequent references to the file are by index in the sorted list.
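.Pp
Because both sides sort the same names the same way, an index means the same
file on each end.
The sketch below treats names as byte strings, whose ordering in Python is
bytewise and so agrees with strcmp for names without embedded NUL bytes; the
function name is hypothetical.

```python
def sort_flist(names):
    # names is a list of bytes objects; bytes compare as unsigned values,
    # one byte at a time, with a shorter prefix sorting first.
    return sorted(names)

flist = sort_flist([b"b", b"a/z", b"a", b"B"])
index_of = {name: i for i, name in enumerate(flist)}
```

Here index_of maps each name to the index that later requests would use.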
128: .Ss Client process
129: The client can operate in sender or receiver mode depending upon the
130: command-line source and destination.
131: .Pp
132: If the destination directory (sink) is remote, the client is in sender
133: mode: the client will push its data to the server.
134: If the source file is remote, it is in receiver mode: the server pushes
135: to the client.
136: If neither is remote, the client operates in sender mode.
137: These are all mutually exclusive.
138: .Pp
139: When the client starts, regardless of its mode, it first handshakes with the
140: server.
141: This exchange is
142: .Em not
143: multiplexed.
144: .Pp
145: .Bl -enum -compact
146: .It
147: send local version (integer)
148: .It
149: receive remote version (integer)
150: .It
151: receive random seed (integer)
152: .El
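.Pp
The client side of the handshake can be sketched over any pair of binary
streams.
The function and parameter names are hypothetical, and the default version of
27 is taken from the compatibility statement at the top of this document.

```python
import struct

def read_int(stream):
    data = stream.read(4)
    if len(data) != 4:
        raise EOFError("short read during handshake")
    return struct.unpack("<i", data)[0]

def client_handshake(rd, wr, local_version=27):
    """Unmultiplexed handshake; returns (remote version, seed)."""
    wr.write(struct.pack("<i", local_version))  # send local version
    remote_version = read_int(rd)               # receive remote version
    seed = read_int(rd)                         # receive random seed
    return remote_version, seed
```

The server side would mirror this, sending the seed instead of reading it.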
153: .Pp
154: Following this, the client multiplexes when reading from the server.
155: Transmissions sent from client to server are not multiplexed.
156: It then enters the
157: .Sx Update exchange
158: protocol.
159: .Ss Server process
160: The server can operate in sender or receiver mode depending upon how the
161: client starts the server.
162: This may be directly from the parent process (when invoked for local
163: files) or indirectly via a remote shell.
164: .Pp
165: When in sender mode, the server pushes data to the client.
166: (This is equivalent to receiver mode for the client.)
167: In receiver mode, the opposite is true.
168: .Pp
169: When the server starts, regardless of the mode, it first handshakes with the
170: client.
171: This exchange is
172: .Em not
173: multiplexed.
174: .Pp
175: .Bl -enum -compact
176: .It
177: send local version (integer)
178: .It
179: receive remote version (integer)
180: .It
181: send random seed (integer)
182: .El
183: .Pp
184: Following this, the server multiplexes when writing to the client.
185: (Transmissions received from the client are not multiplexed.)
186: It then enters the
187: .Sx Update exchange
188: protocol.
189: .Ss Update exchange
190: When the client or server is in sender mode, it begins by conditionally
191: sending the exclusion list.
192: At this time, this is always empty.
193: .Pp
194: .Bl -enum -compact
195: .It
196: if
197: .Fl -delete
198: and the client, exclusion list zero (integer)
199: .El
200: .Pp
201: It then sends the
202: .Sx File list .
203: Prior to being sent, the file list should be lexicographically sorted.
204: .Pp
205: .Bl -enum -compact
206: .It
207: status byte (byte)
208: .It
209: inherited filename length (optional, byte)
210: .It
211: filename length (integer or byte)
212: .It
213: file (byte array)
214: .It
215: file length (long)
216: .It
217: file modification time (optional, time_t, integer)
218: .It
219: file mode (optional, mode_t, integer)
220: .It
221: if a symbolic link and
222: .Fl l ,
223: the link target's length (integer)
224: .It
225: if a symbolic link and
226: .Fl l ,
227: the link target (byte array)
228: .El
229: .Pp
230: The status byte may consist of the following bits and determines which
231: of the optional fields are transmitted.
232: .Pp
233: .Bl -tag -compact -width Ds
234: .It 0x02
235: Do not send the file mode: it is a repeat of the last file's mode.
236: .It 0x20
237: Inherit some of the prior file name.
238: Enables the inherited filename length transmission.
239: .It 0x40
240: Use full integer length for file name.
241: Otherwise, use only the byte length.
242: .It 0x80
243: Do not send the file modification time: it is a repeat of the last
244: file's.
245: .El
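.Pp
The status bits can be decoded as in the sketch below, which reports which
optional fields follow a given status byte.
The constant and function names are hypothetical labels for the bit values
listed above.

```python
FLIST_MODE_SAME = 0x02  # mode repeats the last file's mode
FLIST_NAME_SAME = 0x20  # part of the prior file name is inherited
FLIST_NAME_LONG = 0x40  # file name length is a full integer
FLIST_TIME_SAME = 0x80  # modification time repeats the last file's

def describe_status(status):
    """Return None at end of list, else which fields are transmitted."""
    if status == 0:
        return None
    return {
        "mode_sent": not (status & FLIST_MODE_SAME),
        "inherits_name": bool(status & FLIST_NAME_SAME),
        "long_name_length": bool(status & FLIST_NAME_LONG),
        "mtime_sent": not (status & FLIST_TIME_SAME),
    }
```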
246: .Pp
247: If the status byte is zero, the file list has terminated.
248: The sender then sends any IO error values, which for
249: .Xr openrsync 1
250: is always zero.
251: .Pp
252: .Bl -enum -compact
253: .It
254: constant zero (integer)
255: .El
256: .Pp
257: The server sender then reads the exclusion list, which is always zero.
258: .Pp
259: .Bl -enum -compact
260: .It
261: if server, constant zero (integer)
262: .El
263: .Pp
264: Following that, the sender receives data regarding the receiver's copy
265: of the file list contents.
266: This data is not ordered in any way.
267: Each of these requests starts as follows:
268: .Pp
269: .Bl -enum -compact
270: .It
271: file index or -1 to signal a change of phase (integer)
272: .El
273: .Pp
274: The phase starts in phase 1, then proceeds to phase 2, and phase 3
275: signals an end of transmission (no subsequent blocks).
276: If a phase change occurs, the sender must write back the -1 constant
277: integer value and increment its phase state.
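.Pp
The phase handling on the sender can be sketched as a small loop over the
incoming integers.
This is a simplified model under stated assumptions: the function name is
hypothetical, requests arrive as an already decoded list, and the per-file
block exchange is reduced to recording the index.

```python
import struct

def sender_phases(requests):
    """Return (served (phase, index) pairs, bytes written back, final phase)."""
    phase = 1
    written = bytearray()
    served = []
    for index in requests:
        if index == -1:
            # Acknowledge the phase change and advance.
            written += struct.pack("<i", -1)
            phase += 1
            if phase == 3:
                break  # phase 3: end of transmission
            continue
        served.append((phase, index))
    return served, bytes(written), phase
```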
278: .Pp
279: Blocks are read as follows:
280: .Pp
281: .Bl -enum -compact
282: .It
283: block index (integer)
284: .El
285: .Pp
286: In
287: .Pq Fl n
288: mode, the sender may immediately write back the index (integer) to skip
289: the following.
290: .Pp
291: .Bl -enum -compact
292: .It
293: number of blocks (integer)
294: .It
295: block length in the file (integer)
296: .It
297: long checksum length (integer)
298: .It
299: terminal (remainder) block length (integer)
300: .El
301: .Pp
302: And for each block:
303: .Pp
304: .Bl -enum -compact
305: .It
306: short checksum (integer)
307: .It
308: long checksum (bytes of checksum length)
309: .El
310: .Pp
311: The sender then compares the two files, block by block, and updates the
312: receiver with mismatches as follows.
313: .Pp
314: .Bl -enum -compact
315: .It
316: file index (integer)
317: .It
318: number of blocks (integer)
319: .It
320: block length (integer)
321: .It
322: long checksum length (integer)
323: .It
324: remainder block length (integer)
325: .El
326: .Pp
327: Then for each block:
328: .Pp
329: .Bl -enum -compact
330: .It
331: data chunk size (integer)
332: .It
333: data chunk (bytes)
334: .It
335: block index subsequent to chunk or zero for finished (integer)
336: .El
337: .Pp
338: Following this sequence, the sender sends the following:
339: .Pp
340: .Bl -enum -compact
341: .It
342: whole-file long checksum (16 bytes)
343: .El
344: .Pp
345: The sender then either handles the next queued file or, if the receiver
346: has written a phase change, the phase change step.
347: .Pp
348: If the sender is the server and
349: .Fl v
350: has been specified, the sender must send statistics.
351: .Pp
352: .Bl -enum -compact
353: .It
354: total bytes read (long)
355: .It
356: total bytes written (long)
357: .It
358: total size of files (long)
359: .El
360: .Pp
361: Finally, the sender must read a final constant-value integer.
362: .Pp
363: .Bl -enum -compact
364: .It
365: end-of-sequence -1 value (integer)
366: .El
367: .Pp
368: If in receiver mode, the inverse above (write instead of read, read
369: instead of write) is performed.
370: .Pp
371: The receiver begins by conditionally writing, then reading, the
372: exclusion list count, which is always zero.
373: .Pp
374: .Bl -enum -compact
375: .It
376: if client, send zero (integer)
377: .It
378: if receiver and
379: .Fl -delete ,
380: read zero (integer)
381: .El
382: .Pp
383: The receiver then proceeds with reading the
384: .Sx File list
385: as already
386: defined.
387: Following the list, the receiver reads the IO error, which must be zero.
388: .Pp
389: .Bl -enum -compact
390: .It
391: constant zero (integer)
392: .El
393: .Pp
394: The receiver must then sort the file names lexicographically.
395: .Pp
396: If there are no files in the file list at this time, the receiver must
397: exit prior to sending per-file data.
398: It then proceeds with the file blocks.
399: .Pp
400: For file blocks, the receiver must look at each file that is not up to
401: date (a file is up to date if it has the same file size and timestamp) and send it to
402: the server.
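.Pp
The up-to-date test amounts to comparing two values from the file list entry
against the local file.
In the sketch below the function name is hypothetical, and truncating the
local modification time to whole seconds is an assumption made because times
travel as integers.

```python
import os

def is_up_to_date(path, size, mtime):
    """True if the local file matches the listed size and timestamp."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return False  # a missing file always needs an update
    return st.st_size == size and int(st.st_mtime) == mtime
```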
403: Symbolic links and directory entries are never sent to the server.
404: .Pp
405: After the second phase has completed and prior to writing the
406: end-of-data signal, the client receiver reads statistics.
407: This is only performed with
408: .Pq Fl v .
409: .Pp
410: .Bl -enum -compact
411: .It
412: total bytes read (long)
413: .It
414: total bytes written (long)
415: .It
416: total size of files (long)
417: .El
418: .Pp
419: Finally, the receiver must send the constant end-of-sequence marker.
420: .Pp
421: .Bl -enum -compact
422: .It
423: end-of-sequence -1 value (integer)
424: .El
425: .Ss Sender and receiver asynchrony
426: The sender and receiver need not work in lockstep.
427: The receiver may send file update requests as quickly as it parses them,
428: and respond to the sender's update notices on demand.
429: Similarly, the sender may read as many update requests as it can, and
430: service them in any order it wishes.
431: .Pp
432: The sender and receiver synchronise state only at the end of each phase.
433: .Pp
434: The reference
435: .Xr rsync 1
436: takes advantage of this with a two-process receiver, one for sending
437: update requests (the generator) and another for receiving.
438: .Xr openrsync 1
439: uses an event-loop model instead.
440: .\" .Sh CONTEXT
441: .\" For section 9 functions only.
442: .\" .Sh RETURN VALUES
443: .\" For sections 2, 3, and 9 function return values only.
444: .\" .Sh ENVIRONMENT
445: .\" For sections 1, 6, 7, and 8 only.
446: .\" .Sh FILES
447: .\" .Sh EXIT STATUS
448: .\" For sections 1, 6, and 8 only.
449: .\" .Sh EXAMPLES
450: .\" .Sh DIAGNOSTICS
451: .\" For sections 1, 4, 6, 7, 8, and 9 printf/stderr messages only.
452: .\" .Sh ERRORS
453: .\" For sections 2, 3, 4, and 9 errno settings only.
454: .Sh SEE ALSO
455: .Xr openrsync 1 ,
456: .Xr rsync 1 ,
457: .Xr rsyncd 5
458: .\" .Sh STANDARDS
459: .\" .Sh HISTORY
460: .\" .Sh AUTHORS
461: .\" .Sh CAVEATS
462: .Sh BUGS
463: Time values are sent as 32-bit integers.
464: .Pp
465: When in server mode
466: .Em and
467: when communicating with a client with a newer protocol (>27), the phase
468: change integer (-1) acknowledgement must be sent twice by the sender.
469: This is probably a bug in the reference implementation.