Annotation of src/usr.bin/rsync/rsync.5, Revision 1.4
1.4 ! benno 1: .\" $OpenBSD: rsync.5,v 1.3 2019/02/12 18:59:34 benno Exp $
1.1 benno 2: .\"
3: .\" Copyright (c) 2019 Kristaps Dzonsons <kristaps@bsd.lv>
4: .\"
5: .\" Permission to use, copy, modify, and distribute this software for any
6: .\" purpose with or without fee is hereby granted, provided that the above
7: .\" copyright notice and this permission notice appear in all copies.
8: .\"
9: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
10: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
11: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
12: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
13: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
14: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
15: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
16: .\"
1.4 ! benno 17: .Dd $Mdocdate: February 12 2019 $
1.1 benno 18: .Dt RSYNC 5
19: .Os
20: .Sh NAME
21: .Nm rsync
22: .Nd rsync wire protocol
23: .Sh DESCRIPTION
24: The
25: .Nm
26: protocol described in this document relates to the BSD-licensed
27: .Xr openrsync 1 ,
28: a re-implementation of the GPL-licensed reference utility
29: .Xr rsync 1 .
30: It is compatible with version 27 of the reference.
31: .Pp
32: In this document, the
33: .Qq client process
34: refers to the utility as run on the operator's local computer.
35: The
36: .Qq server process
37: runs on either the local or a remote computer, depending upon the file
38: locations given on the command line.
39: .Pp
40: There are a number of options in the protocol that are dictated by command-line
41: flags.
42: These will be noted as
43: .Fl n
44: for dry-run,
1.3 benno 45: .Fl g
46: for group ids,
1.1 benno 47: .Fl l
48: for links,
49: .Fl r
50: for recursion,
51: .Fl v
52: for verbose, and
53: .Fl -delete
54: for deletion (before).
55: .Ss Data types
56: The binary protocol encodes all data in little-endian format.
57: Integers are signed 32-bit, shorts are signed 16-bit, bytes are unsigned
58: 8-bit.
59: A long is variable-length.
60: For values less than the maximum integer, the value is transmitted and
61: read as a 32-bit integer.
62: For values greater, the value is transmitted first as a maximum integer,
63: then a 64-bit signed integer.
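.Pp
The variable-length long can be sketched as follows.
This is an illustration only, not code taken from any implementation; the
names
.Li encode_long
and
.Li decode_long
are hypothetical, and sending negative values and the maximum integer itself
in the extended form is an assumption.
.Bd -literal
```python
import struct

INT32_MAX = 0x7FFFFFFF


def encode_long(value):
    """Encode a protocol 'long' in little-endian format."""
    if 0 <= value < INT32_MAX:
        # Small values travel as a plain signed 32-bit integer.
        return struct.pack("<i", value)
    # Otherwise: the maximum integer as a marker, then a signed 64-bit value.
    # (Assumption: negative values and INT32_MAX itself also use this form.)
    return struct.pack("<iq", INT32_MAX, value)


def decode_long(buf, offset=0):
    """Return (value, next_offset) for a 'long' starting at offset."""
    (first,) = struct.unpack_from("<i", buf, offset)
    if first != INT32_MAX:
        return first, offset + 4
    (value,) = struct.unpack_from("<q", buf, offset + 4)
    return value, offset + 12
```
.Ed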
64: .Pp
65: There are three types of checksums: long (slow), short (fast), and
66: whole-file.
67: The fast checksum is a derivative of Adler-32.
68: The slow checksum is MD4,
69: made over the checksum seed first (serialised in little-endian format),
70: then the data.
71: The whole-file applies MD4 to the file first, then the checksum seed at
72: the end (also serialised in little-endian format).
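.Pp
The difference between the slow and whole-file checksums is only the position
of the seed, which the following sketch tries to make explicit.
The function names are hypothetical, and the hash constructor is passed in as
a parameter because MD4 is not available in every Python build; where it is,
something like
.Li new_md4
below could be supplied.
.Bd -literal
```python
import hashlib
import struct


def new_md4():
    # May raise ValueError where the local OpenSSL build lacks MD4.
    return hashlib.new("md4")


def slow_checksum(seed, data, new_hash):
    """Hash the seed first (little-endian 32-bit), then the data."""
    h = new_hash()
    h.update(struct.pack("<i", seed))
    h.update(data)
    return h.digest()


def whole_file_checksum(seed, data, new_hash):
    """Hash the file data first, then the seed at the end."""
    h = new_hash()
    h.update(data)
    h.update(struct.pack("<i", seed))
    return h.digest()
```
.Ed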
73: .Ss Multiplexing
74: Most
75: .Nm
76: transmissions are wrapped in a multiplexing envelope protocol.
77: It is composed as follows:
78: .Pp
79: .Bl -enum -compact
80: .It
81: envelope header (4 bytes)
82: .It
83: envelope payload (arbitrary length)
84: .El
85: .Pp
86: The first byte of the envelope header consists of a tag.
87: If the tag is 7, the payload is normal data.
88: Otherwise, the payload is out-of-band server messages.
89: If the tag is 1, it is an error on the sender's part and must trigger an
90: exit.
91: The rest of the header holds the payload length, limiting payloads to 24 bits,
92: .Li 0x00ffffff .
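.Pp
A minimal sketch of reading and writing the envelope header follows.
It assumes the four header bytes are read as one little-endian 32-bit integer,
with the tag in its most significant byte and the payload length in the
remaining 24 bits; the function names are hypothetical.
.Bd -literal
```python
import struct

DATA_TAG = 7              # tag value for normal data
MAX_PAYLOAD = 0x00FFFFFF  # 24-bit length field


def parse_envelope_header(header):
    """Split a 4-byte envelope header into (tag, payload_length)."""
    (word,) = struct.unpack("<I", header)
    return word >> 24, word & MAX_PAYLOAD


def make_envelope(tag, payload):
    """Prefix payload with an envelope header carrying tag and length."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload does not fit in 24 bits")
    return struct.pack("<I", (tag << 24) | len(payload)) + payload
```
.Ed
A reader would call
.Li parse_envelope_header
on the first four bytes, then read exactly that many payload bytes before
expecting the next header.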
93: .Pp
94: The only data not using this envelope are the initial handshake between
95: client and server.
96: .Ss File list
97: A central part of the protocol is the file list, which is generated by
98: the sender.
99: It consists of all files that must be sent to the receiver, either
100: explicitly as given or recursively generated.
101: .Pp
102: The file list itself consists of filenames and attributes (mode, time,
103: size, etc.).
104: Filenames must be relative to the destination root and not be absolute
105: or contain backtracking.
106: So if a file is given to the sender as
107: .Pa ../../foo/bar ,
108: it must be sent as
109: .Pa foo/bar .
110: .Pp
111: The file list should be cleaned of inappropriate files prior to sending.
112: For example, if
113: .Fl l
114: is not specified, symbolic links may be omitted.
115: Directory entries without
116: .Fl r
117: may also be omitted.
118: Duplicates may be omitted.
119: .Pp
120: The receiver
121: .Em must not
122: assume that the file list is clean.
123: It should not omit inappropriate files from the file list (which would
124: affect the indexing), but may omit them during processing.
125: .Pp
126: Prior to being sent from sender to receiver, and upon being received, the
127: file list must be lexicographically sorted such as with
128: .Xr strcmp 3 .
129: Subsequent references to the file are by index in the sorted list.
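.Pp
The sorting and indexing rule can be illustrated as below.
The sketch assumes filenames are held as byte strings, so that Python's
ordering compares unsigned byte values in the same way as
.Xr strcmp 3 ;
the helper names are hypothetical.
.Bd -literal
```python
def sort_file_list(names):
    """Sort byte-string filenames by unsigned byte value, like strcmp(3)."""
    return sorted(names)


def index_of(sorted_names, name):
    """Return the index by which later messages refer to a file."""
    return sorted_names.index(name)
```
.Ed
Both sides must apply the same ordering, since a differing sort would make the
indices refer to different files.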
130: .Ss Client process
131: The client can operate in sender or receiver mode depending upon the
132: command-line source and destination.
133: .Pp
134: If the destination directory (sink) is remote, the client is in sender
135: mode: the client will push its data to the server.
136: If the source file is remote, it is in receiver mode: the server pushes
137: to the client.
138: If neither is remote, the client operates in sender mode.
139: These are all mutually exclusive.
140: .Pp
141: When the client starts, regardless of its mode, it first performs a
142: handshake with the server.
143: This exchange is
144: .Em not
145: multiplexed.
146: .Pp
147: .Bl -enum -compact
148: .It
149: send local version (integer)
150: .It
151: receive remote version (integer)
152: .It
153: receive random seed (integer)
154: .El
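.Pp
The client side of the handshake could be encoded roughly as follows.
This is a sketch only: the function names are hypothetical, and it assumes the
two values received from the server arrive back to back as little-endian
signed 32-bit integers.
.Bd -literal
```python
import struct

PROTOCOL_VERSION = 27


def client_hello(version=PROTOCOL_VERSION):
    """Bytes the client sends first: its version as an integer."""
    return struct.pack("<i", version)


def parse_server_hello(buf):
    """Decode the server's reply into (remote_version, seed)."""
    remote_version, seed = struct.unpack_from("<ii", buf, 0)
    return remote_version, seed
```
.Ed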
155: .Pp
156: Following this, the client multiplexes when reading from the server.
157: Transmissions sent from client to server are not multiplexed.
158: It then enters the
159: .Sx Update exchange
160: protocol.
161: .Ss Server process
162: The server can operate in sender or receiver mode depending upon how the
163: client starts the server.
164: This may be directly from the parent process (when invoked for local
165: files) or indirectly via a remote shell.
166: .Pp
167: When in sender mode, the server pushes data to the client.
168: (This is equivalent to receiver mode for the client.)
169: In receiver mode, the opposite is true.
170: .Pp
171: When the server starts, regardless of its mode, it first performs a
172: handshake with the client.
173: This exchange is
174: .Em not
175: multiplexed.
176: .Pp
177: .Bl -enum -compact
178: .It
179: send local version (integer)
180: .It
181: receive remote version (integer)
182: .It
183: send random seed (integer)
184: .El
185: .Pp
186: Following this, the server multiplexes when writing to the client.
187: (Transmissions received from the client are not multiplexed.)
188: It then enters the
189: .Sx Update exchange
190: protocol.
191: .Ss Update exchange
192: When the client or server is in sender mode, it begins by conditionally
193: sending the exclusion list.
194: At this time, this is always empty.
195: .Pp
196: .Bl -enum -compact
197: .It
198: if
199: .Fl -delete
200: and the client, exclusion list zero (integer)
201: .El
202: .Pp
203: It then sends the
204: .Sx File list .
205: Prior to being sent, the file list should be lexicographically sorted.
206: .Pp
207: .Bl -enum -compact
208: .It
209: status byte (byte)
210: .It
211: inherited filename length (optional, byte)
212: .It
213: filename length (integer or byte)
214: .It
215: file (byte array)
216: .It
217: file length (long)
218: .It
219: file modification time (optional, time_t, integer)
220: .It
221: file mode (optional, mode_t, integer)
222: .It
1.3 benno 223: if
224: .Fl g ,
225: the group id (integer)
226: .It
1.1 benno 227: if a symbolic link and
228: .Fl l ,
229: the link target's length (integer)
230: .It
231: if a symbolic link and
232: .Fl l ,
233: the link target (byte array)
234: .El
235: .Pp
236: The status byte may consist of the following bits and determines which
237: of the optional fields are transmitted.
238: .Pp
239: .Bl -tag -compact -width Ds
240: .It 0x02
241: Do not send the file mode: it is a repeat of the last file's mode.
1.3 benno 242: .It 0x10
243: Like
244: .Li 0x02 ,
245: but for the group id.
1.1 benno 246: .It 0x20
247: Inherit some of the prior file name.
248: Enables the inherited filename length transmission.
249: .It 0x40
250: Use full integer length for file name.
251: Otherwise, use only the byte length.
252: .It 0x80
253: Do not send the file modification time: it is a repeat of the last
254: file's.
255: .El
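.Pp
The status bits listed above can be decoded as in the following sketch.
The constant and key names are hypothetical and chosen only for illustration;
the bit values are those given in the list.
.Bd -literal
```python
SAME_MODE = 0x02   # mode repeats the last file's mode
SAME_GID = 0x10    # group id repeats the last file's
SAME_NAME = 0x20   # inherit part of the prior file name
LONG_NAME = 0x40   # filename length is an integer, not a byte
SAME_TIME = 0x80   # modification time repeats the last file's


def describe_status(status):
    """Report which optional fields accompany a file-list entry."""
    return {
        "mode_sent": not status & SAME_MODE,
        "gid_sent": not status & SAME_GID,
        "inherits_name": bool(status & SAME_NAME),
        "name_length_is_integer": bool(status & LONG_NAME),
        "mtime_sent": not status & SAME_TIME,
    }
```
.Ed
Whether the group id is sent at all additionally depends on
.Fl g ,
which this sketch does not model.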
256: .Pp
257: If the status byte is zero, the file-list has terminated.
1.4 ! benno 258: If
! 259: .Fl g
! 260: has been specified, the sender sends the list of all groups encountered
! 261: in the file list:
! 262: .Pp
! 263: .Bl -enum -compact
! 264: .It
! 265: group identifier or zero to indicate end of set (integer)
! 266: .It
! 267: length of group (byte)
! 268: .It
! 269: group name (prior length)
! 270: .El
! 271: .Pp
1.1 benno 272: The sender then sends any IO error values, which for
273: .Xr openrsync 1
274: is always zero.
275: .Pp
276: .Bl -enum -compact
277: .It
278: constant zero (integer)
279: .El
280: .Pp
281: The server sender then reads the exclusion list, which is always zero.
282: .Pp
283: .Bl -enum -compact
284: .It
285: if server, constant zero (integer)
286: .El
287: .Pp
288: Following that, the sender receives data regarding the receiver's copy
289: of the file list contents.
290: This data is not ordered in any way.
291: Each of these requests starts as follows:
292: .Pp
293: .Bl -enum -compact
294: .It
295: file index or -1 to signal a change of phase (integer)
296: .El
297: .Pp
298: The exchange starts in phase 1, then proceeds to phase 2, and phase 3
299: signals an end of transmission (no subsequent blocks).
300: If a phase change occurs, the sender must write back the -1 constant
301: integer value and increment its phase state.
302: .Pp
303: Blocks are read as follows:
304: .Pp
305: .Bl -enum -compact
306: .It
307: block index (integer)
308: .El
309: .Pp
310: In
311: .Pq Fl n
312: mode, the sender may immediately write back the index (integer) to skip
313: the following.
314: .Pp
315: .Bl -enum -compact
316: .It
317: number of blocks (integer)
318: .It
319: block length in the file (integer)
320: .It
321: long checksum length (integer)
322: .It
323: terminal (remainder) block length (integer)
324: .El
325: .Pp
326: And for each block:
327: .Pp
328: .Bl -enum -compact
329: .It
330: short checksum (integer)
331: .It
332: long checksum (bytes of checksum length)
333: .El
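.Pp
Parsing the block description and the per-block checksums might look like the
sketch below.
It assumes the whole block set is already in one buffer and that all integers
are little-endian signed 32-bit values; the function name is hypothetical.
.Bd -literal
```python
import struct


def parse_block_set(buf, offset=0):
    """Return ((block_len, csum_len, remainder, blocks), next_offset)."""
    count, block_len, csum_len, remainder = struct.unpack_from(
        "<iiii", buf, offset)
    offset += 16
    blocks = []
    for _ in range(count):
        # Short (fast) checksum, then csum_len bytes of long checksum.
        (fast,) = struct.unpack_from("<i", buf, offset)
        slow = bytes(buf[offset + 4:offset + 4 + csum_len])
        blocks.append((fast, slow))
        offset += 4 + csum_len
    return (block_len, csum_len, remainder, blocks), offset
```
.Ed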
334: .Pp
335: The sender then compares the two files, block by block, and updates the
336: receiver with mismatches as follows.
337: .Pp
338: .Bl -enum -compact
339: .It
340: file index (integer)
341: .It
342: number of blocks (integer)
343: .It
344: block length (integer)
345: .It
346: long checksum length (integer)
347: .It
348: remainder block length (integer)
349: .El
350: .Pp
351: Then for each block:
352: .Pp
353: .Bl -enum -compact
354: .It
355: data chunk size (integer)
356: .It
357: data chunk (bytes)
358: .It
359: block index subsequent to chunk or zero for finished (integer)
360: .El
361: .Pp
362: Following this sequence, the sender sends the following:
363: .Pp
364: .Bl -enum -compact
365: .It
366: whole-file long checksum (16 bytes)
367: .El
368: .Pp
369: The sender then either handles the next queued file or, if the receiver
370: has written a phase change, the phase change step.
371: .Pp
372: If the sender is the server and
373: .Fl v
374: has been specified, the sender must send statistics.
375: .Pp
376: .Bl -enum -compact
377: .It
378: total bytes read (long)
379: .It
380: total bytes written (long)
381: .It
382: total size of files (long)
383: .El
384: .Pp
385: Finally, the sender must read a final constant-value integer.
386: .Pp
387: .Bl -enum -compact
388: .It
389: end-of-sequence -1 value (integer)
390: .El
391: .Pp
392: If in receiver mode, the inverse of the above (write instead of read, read
393: instead of write) is performed.
394: .Pp
395: The receiver begins by conditionally writing, then reading, the
396: exclusion list count, which is always zero.
397: .Pp
398: .Bl -enum -compact
399: .It
400: if client, send zero (integer)
401: .It
402: if receiver and
403: .Fl -delete ,
404: read zero (integer)
405: .El
406: .Pp
407: The receiver then proceeds with reading the
408: .Sx File list
409: as already
410: defined.
411: Following the list, the receiver reads the IO error, which must be zero.
412: .Pp
413: .Bl -enum -compact
414: .It
415: constant zero (integer)
416: .El
417: .Pp
418: The receiver must then sort the file names lexicographically.
419: .Pp
420: If there are no files in the file list at this time, the receiver must
421: exit prior to sending per-file data.
422: It then proceeds with the file blocks.
423: .Pp
424: For file blocks, the receiver must look at each file that is not up to
425: date and send its block information to the sender.
426: A file is up to date if it has the same file size and timestamp.
427: Symbolic links and directory entries are never sent to the sender.
428: .Pp
429: After the second phase has completed and prior to writing the
430: end-of-data signal, the client receiver reads statistics.
431: This is only performed with
432: .Pq Fl v .
433: .Pp
434: .Bl -enum -compact
435: .It
436: total bytes read (long)
437: .It
438: total bytes written (long)
439: .It
440: total size of files (long)
441: .El
442: .Pp
443: Finally, the receiver must send the constant end-of-sequence marker.
444: .Pp
445: .Bl -enum -compact
446: .It
447: end-of-sequence -1 value (integer)
448: .El
449: .Ss Sender and receiver asynchrony
450: The sender and receiver need not work in lockstep.
451: The receiver may send file update requests as quickly as it parses them,
452: and respond to the sender's update notices on demand.
453: Similarly, the sender may read as many update requests as it can, and
454: service them in any order it wishes.
455: .Pp
456: The sender and receiver synchronise state only at the end of phase.
457: .Pp
458: The reference
459: .Xr rsync 1
460: takes advantage of this with a two-process receiver, one for sending
461: update requests (the generator) and another for receiving.
462: .Xr openrsync 1
463: uses an event-loop model instead.
464: .\" .Sh CONTEXT
465: .\" For section 9 functions only.
466: .\" .Sh RETURN VALUES
467: .\" For sections 2, 3, and 9 function return values only.
468: .\" .Sh ENVIRONMENT
469: .\" For sections 1, 6, 7, and 8 only.
470: .\" .Sh FILES
471: .\" .Sh EXIT STATUS
472: .\" For sections 1, 6, and 8 only.
473: .\" .Sh EXAMPLES
474: .\" .Sh DIAGNOSTICS
475: .\" For sections 1, 4, 6, 7, 8, and 9 printf/stderr messages only.
476: .\" .Sh ERRORS
477: .\" For sections 2, 3, 4, and 9 errno settings only.
478: .Sh SEE ALSO
479: .Xr openrsync 1 ,
480: .Xr rsync 1 ,
481: .Xr rsyncd 5
482: .\" .Sh STANDARDS
483: .\" .Sh HISTORY
484: .\" .Sh AUTHORS
485: .\" .Sh CAVEATS
486: .Sh BUGS
487: Time values are sent as 32-bit integers.
488: .Pp
489: When in server mode
490: .Em and
491: when communicating with a client with a newer protocol (>27), the phase
492: change integer (-1) acknowledgement must be sent twice by the sender.
493: This is probably a bug in the reference implementation.