Victor_VG
Tracker Mod | Редактировать | Профиль | Сообщение | Цитировать | Сообщить модератору 2. About the mbox Database The mbox database format is not documented in an authoritative specification, but instead exists as a well-known output format that is anecdotally documented, or which is only authoritatively documented for a specific platform or tool. mbox databases typically contain a linear sequence of electronic mail messages. Each message begins with a separator line that identifies the message sender, and also identifies the date and time at which the message was received by the final recipient (either the last-hop system in the transfer path, or the system which serves as the recipient's mailstore). Each message is typically terminated by an empty line. The end of the database is usually recognized by either the absence of any additional data, or by the presence of an explicit end-of-file marker. The structure of the separator lines vary across implementations, but usually contain the exact character sequence of "From", followed by a single Space character (0x20), an email address of some kind, another Space character, a timestamp sequence of some kind, and an end-of- line marker. However, due to the lack of any authoritative specification, each of these attributes are known to vary widely across implementations. For example, the email address can reflect any addressing syntax that has ever been used on any messaging system in all of history (specifically including address forms that are not compatible with Internet messages, as defined by RFC 2822 [RFC2822]). Similarly, the timestamp sequences can also vary according to system output, while the end-of-line sequences will often reflect platform- specific requirements. Different data formats can even appear within a single database as a result of multiple mbox files being concatenated together, or because a single file was accessed by multiple messaging clients, each of which has used its own syntax for the separator line. Message data within mbox databases often reflects site-specific peculiarities. For example, it is entirely possible for the message body or headers in an mbox database to contain untagged eight-bit character data that implicitly reflects a site-specific default language or locale, or that reflects local defaults for timestamps and email addresses; none of this data is widely portable beyond the local scope. Similarly, message data can also contain unencoded eight-bit binary data, or can use encoding formats that represent a specific platform (e.g., BINHEX or UUENCODE sequences). Many implementations are also known to escape message body lines that begin with the character sequence of "From ", so as to prevent confusion with overly-liberal parsers that do not search for full separator lines. In the common case, a leading Greater-Than symbol (0x3E) is used for this purpose (with "From " becoming ">From "). However, other implementations are known not to escape such lines unless they are immediately preceded by a blank line or if they also appear to contain an email address and a timestamp. Other implementations are also known to perform secondary escapes against these lines if they are already escaped or quoted, while others ignore these mechanisms altogether. A comprehensive description of mbox database files on UNIX-like systems can be found at http://qmail.org./man/man5/mbox.html, which should be treated as mostly authoritative for those variations that are otherwise only documented in anecdotal form. However, readers are advised that many other platforms and tools make use of mbox databases, and that there are many more potential variations that can be encountered in the wild. In order to mitigate errors that may arise from such vagaries, this specification defines a "format" parameter to the application/mbox media type declaration, which can be used to identify the specific kind of mbox database that is being transferred. Furthermore, this specification defines a "default" database format which MUST be supported by implementations that claim to be compliant with this specification, and which is to be used as the implicit format for undeclared application/mbox data objects. Additional format types are to be defined in subsequent specifications. Messaging systems that receive an mbox database with an unknown format parameter value SHOULD treat the data as an opaque binary object, as if the data had been declared as application/octet-stream Refer to Appendix A for a description of the default mbox format. Note that RFC 2046 [RFC2046] defines the multipart/digest media type for transferring platform-independent message files. Because that specification defines a set of neutral and strict formatting rules, the multipart/digest media type already facilitates highly- predictable transfer and conversion operations; as such, implementers are strongly encouraged to support and use that media type where possible. |