Four packed code file format
All ASCII based file formats have one disadvantage in common: they all need more than twice as many characters as there are bytes to be sent. Address fields and checksums add even more characters, so the shorter the records, the more characters have to be sent to get the file across.
The FPC format may be used to reduce the number of characters needed to send a file in ASCII format, although it still needs more characters than the actual bytes it sends. FPC stands for "Four Packed Code". The reduction is accomplished by squeezing 4 real bytes into 5 ASCII characters. In fact every ASCII character will be a digit in the base 85 number system. There aren't enough letters, digits and punctuation marks available to get 85 different characters, but if we use both upper case and lower case letters we will manage. This implies that the FPC is case sensitive, as opposed to all other ASCII based file formats.
The numbering system is in base 85, and is somewhat hard to understand for us humans who are usually only familiar with base 10 numbers. Some of us understand base 2 and base 16 as well, but base 85 is for most people something new. Luckily we don't have to do any math with this number system. We just convert a 32 bit number into a 5 digit number in base 85. A 32 bit number has a range of 4,294,967,296, while a 5 digit number in base 85 has a range of 4,437,053,125, which is enough to do the trick. One drawback is that we always have to send multiples of 4 bytes, even if we actually want to send 1, 2 or 3 bytes. Unused bytes are padded with zeroes, and are discarded at the receiving end.
The digits of the base 85 numbering system start at %, which represents the value of 0. The highest value of a digit in base 85 is 84, and is represented by the character 'z'. If you want to check this with a normal ASCII table you will notice that we have used one character too many! Why? I don't know, but for some reason we have to skip the '*' character in the row. This means that after the ')' character follows the '+' character.
We can use normal number conversion algorithms to generate the FPC digits, with one tiny difference. After adding the digit value to the ASCII value of '%', we have to check whether the resulting character code is equal to or larger than the ASCII value of '*'. If this is the case we have to increment the character code once to stay clear of the '*'. In base 85 the most significant digit (MSD) goes first, like in all number systems!
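The conversion described above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function names are made up for this example, and the big-endian byte order inside each 4 byte group is an assumption (it is the order that reproduces the records in the example further down).

```python
STAR = ord("*")   # the one character that is skipped
ZERO = ord("%")   # the character that represents digit value 0


def digit_to_char(digit):
    """Map a base 85 digit (0..84) to its FPC character."""
    code = ZERO + digit
    if code >= STAR:      # stay clear of '*'
        code += 1
    return chr(code)


def char_to_digit(char):
    """Map an FPC character back to its base 85 digit value."""
    code = ord(char)
    if code > STAR:       # undo the skip over '*'
        code -= 1
    return code - ZERO


def encode_group(four_bytes):
    """Convert 4 bytes (assumed big-endian) into 5 FPC characters, MSD first."""
    value = int.from_bytes(four_bytes, "big")
    chars = []
    for _ in range(5):
        value, digit = divmod(value, 85)
        chars.append(digit_to_char(digit))
    return "".join(reversed(chars))


def decode_group(five_chars):
    """Convert 5 FPC characters back into 4 bytes."""
    value = 0
    for char in five_chars:
        value = value * 85 + char_to_digit(char)
    return value.to_bytes(4, "big")
```

For instance, `encode_group(bytes.fromhex("D8140000"))` yields `kL&@h`, and four zero bytes yield `%%%%%`.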
The benefit of this all is hopefully clear. For every 4 bytes we only have to send 5 ASCII characters, as opposed to 8 characters for all other formats.
Now we take a look at the formatting of the FPC records. We look at the record at byte level, not at the actual base 85 encoded level. Only after formatting the FPC record at byte level do we convert 4 bytes at a time to a 5 digit base 85 number. If we don't have enough bytes in the record to fill the last group of 5 digits we will add bytes with the value of 0 behind the record.
The fields are defined as:
Record begin. Every line starts with the character $; all other characters are digits of base 85.
Checksum. A one byte 2's-complement checksum of all bytes of the record.
Byte count. A one byte value, counting all the bytes in the record minus 4.
Format code. A two byte value, defining the record type.
Address. A 4 byte number representing the first address of this record.
Data. The actual data of this record.
Every record begins with the ASCII character "$". No spaces or tabs are allowed in a record. All other characters in the record are formed by groups of 5 digits of base 85.
The checksum field is a one byte 2's-complement checksum of the entire record. To create the checksum, make a one byte sum of all the bytes of all the other fields of the record (byte count, format code, address and data), discarding any carries.
Then take the 2's-complement of this sum to create the final checksum. The 2's-complement is obtained by inverting all bits and then incrementing by 1 (or by using the negation operator). Checking the checksum at the receiver's end is done by adding all bytes together including the checksum itself, discarding all carries; the result must be $00. The padding bytes at the end of the line, should they exist, should not be included in the checksum. But it doesn't really matter if they are, for their influence will be 0 anyway.
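As a small illustration of the checksum rule, here is a Python sketch. The function names are hypothetical; the sample bytes are the first record of the example further down, without its leading checksum byte.

```python
def fpc_checksum(body):
    """2's-complement of the one byte sum of all bytes after the checksum."""
    return (-sum(body)) & 0xFF


def checksum_ok(record):
    """Receiver side: all bytes, checksum included, must sum to 0 (mod 256)."""
    return sum(record) & 0xFF == 0


# Byte count, format code, address and data of the first example record.
body = bytes.fromhex("1400000000B000576F77212044696420796F7520726561")
checksum = fpc_checksum(body)
record = bytes([checksum]) + body
```

Here `checksum` comes out as 0xD8, and appending zero padding bytes to `record` does not change the outcome of `checksum_ok`.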
The byte count is the number of bytes in the current record minus 4. So only the address bytes and the data bytes are counted, and not the first 4 bytes of the record (checksum, byte count and format code). The byte count can have any value from 0 to 255.
Usually records have 32 data bytes. It is not recommended to send too many data bytes in a record for that may increase the transmission time in case of errors. Also avoid sending only a few data bytes per record, because the address overhead will be too heavy in comparison to the payload.
The format code is a 2 byte number, indicating what format is represented in this record. Only a few formats are available, so we actually waste 1 byte in each record for the sake of having multiples of 4 bytes.
Format code 0 means that the address field in this record is to be treated as the absolute address where the first data byte of the record should be stored.
Format code 1 means that the address field in this record is missing. Simply the last known address of the previous record +1 is used to store the first data byte. As if the FPC format wasn't fast enough already ;-)
Format code 2 means that the address field in this record is to be treated as a relative address. Relative to what is not really clear. The relative address will remain in effect until an absolute address is received again.
The first data byte of the record is stored in the address specified by the Address field aaaaaaaa. After storing that data byte, the address is incremented by 1 to point to the address for the next data byte of the record. And so on, until all data bytes are stored.
The length of the address field is always 4 bytes, if present of course. So the address range for the FPC format is always 2**32.
If only the address field is given, without any data bytes, the address will be set as starting address for records that have no address field.
Addresses between records need not be sequential. There may be gaps in the addressing, or the address pointer may even point to lower addresses than before in the same file. But every time the sequence of addressing must be changed, a format 0 record must be used. Addressing within one single record is sequential of course.
The data field contains 0 or more data bytes. The actual number of data bytes is indicated by the byte count in the beginning of the record less the number of address bytes. The first data byte is stored in the location indicated by the address in the address field. After that the address is incremented by 1 and the next data byte is stored in that new location. This continues until all bytes are stored. If there are not enough data bytes to obtain a multiple of 4 we use 0x00 as padding bytes at the end of the record. These padding bytes are ignored on the receiving side.
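Putting the fields together, a format 0 record could be assembled as in the following Python sketch. It is a minimal illustration under stated assumptions, not a reference encoder: `build_record` and its helpers are hypothetical names, and multi-byte fields and 4 byte groups are assumed to be big-endian (which matches the example records further down).

```python
def _digit_char(digit):
    """Base 85 digit to FPC character, skipping '*'."""
    code = ord("%") + digit
    return chr(code + 1 if code >= ord("*") else code)


def _encode_group(chunk):
    """4 bytes (assumed big-endian) to 5 FPC characters, MSD first."""
    value = int.from_bytes(chunk, "big")
    chars = []
    for _ in range(5):
        value, digit = divmod(value, 85)
        chars.append(_digit_char(digit))
    return "".join(reversed(chars))


def build_record(address, data):
    """Build one format 0 (absolute address) record as a text line."""
    if len(data) > 251:
        raise ValueError("byte count (address + data) must fit in one byte")
    body = (
        bytes([4 + len(data)])        # byte count: address bytes + data bytes
        + (0).to_bytes(2, "big")      # format code 0
        + address.to_bytes(4, "big")  # absolute address
        + bytes(data)
    )
    checksum = (-sum(body)) & 0xFF    # 2's-complement of the byte sum
    raw = bytes([checksum]) + body
    raw += bytes(-len(raw) % 4)       # zero padding, not counted in byte count
    groups = (raw[i:i + 4] for i in range(0, len(raw), 4))
    return "$" + "".join(_encode_group(group) for group in groups)


line = build_record(0xB000, b"Wow! Did you rea")
```

With these assumptions `line` is the 31 character record that starts with `$kL&@h%%,:,`.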
End of file is recognized if the first four bytes of the record all contain 0x00. In base 85 this will be "$%%%%%". This is the only decent way to terminate the file.
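The receiving side can be sketched in Python as well. This is a rough illustration only: `decode_fpc` and its helpers are hypothetical names, big-endian fields are assumed, format code 1 is taken to continue at the address following the last stored byte, and format code 2 is rejected because its meaning is not pinned down above.

```python
def _char_digit(char):
    """FPC character to base 85 digit value, undoing the skip over '*'."""
    code = ord(char)
    if char == "*" or not ord("%") <= code <= ord("z"):
        raise ValueError("not a base 85 digit: %r" % char)
    return code - ord("%") - (1 if code > ord("*") else 0)


def _decode_line(line):
    """Turn '$' plus groups of 5 characters back into raw record bytes."""
    if not line.startswith("$") or (len(line) - 1) % 5:
        raise ValueError("malformed record")
    raw = bytearray()
    for i in range(1, len(line), 5):
        value = 0
        for char in line[i:i + 5]:
            value = value * 85 + _char_digit(char)
        raw += value.to_bytes(4, "big")
    return bytes(raw)


def decode_fpc(lines):
    """Return a dict mapping address to byte value for the given records."""
    memory = {}
    next_address = 0
    for line in lines:
        raw = _decode_line(line.rstrip("\r\n"))
        if raw[:4] == bytes(4):          # end of file record
            break
        count = raw[1]
        record = raw[:4 + count]         # drops the padding bytes
        if len(record) != 4 + count or sum(record) & 0xFF:
            raise ValueError("bad byte count or checksum")
        format_code = int.from_bytes(record[2:4], "big")
        payload = record[4:]
        if format_code == 0:             # absolute address present
            if len(payload) < 4:
                raise ValueError("address field missing")
            next_address = int.from_bytes(payload[:4], "big")
            payload = payload[4:]
        elif format_code != 1:           # format 1: no address field
            raise NotImplementedError("format code %d" % format_code)
        for byte in payload:
            memory[next_address] = byte
            next_address += 1
    return memory


memory = decode_fpc(["$kL&@h%%,:,B.\\?00EPuX0K3rO0JI))", "$%%%%%"])
```

After this call `memory` holds 16 bytes, starting at address 0xB000.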
In general, binary data will expand in size by approximately 1.7 times when represented with this format.
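The expansion can be estimated with a little arithmetic, sketched here in Python. The assumptions are mine: every record is a format 0 record (8 bytes of checksum, byte count, format code and address), and each line ends with a single newline character. The function names are hypothetical.

```python
import math


def encoded_chars(data_bytes, line_end=1):
    """Characters needed for one format 0 record carrying data_bytes bytes."""
    record_bytes = 8 + data_bytes            # 4 header bytes + 4 address bytes
    groups = math.ceil(record_bytes / 4)     # padded up to a multiple of 4
    return 1 + 5 * groups + line_end         # '$' + base 85 digits + newline


def expansion(data_bytes):
    """Ratio of characters sent to data bytes carried."""
    return encoded_chars(data_bytes) / data_bytes
```

Under these assumptions a record with 32 data bytes takes 52 characters, a ratio of 1.625, while a record with 16 data bytes takes 32 characters, a ratio of 2.0; the exact figure depends on the record length and line ending used.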
Now it's time for an example. In the first table you can see the byte representation of the file to be transferred. The 4th row of bytes is not a multiple of 4 bytes. But that does not matter, for we append $00 bytes at the end until we do have a multiple of 4 bytes. These padding bytes are not counted in the byte count however!
D81400000000B000576F77212044696420796F7520726561
431400000000B0106C6C7920676F207468726F7567682061
361400000000B0206C6C20746861742074726F75626C6520
591100000000B030746F207265616420746869733F000000
00000000
Only after converting the bytes to base 85 we get the records of the FPC type file format presented in the next table. Note that there is always a multiple of 5 characters to represent a multiple of 4 bytes in each record.
$kL&@h%%,:,B.\?00EPuX0K3rO0JI))
$;UPR'%%,:<Hn&FCG:at<GVF(;G9wIw
$7FD1p%%,:LHmy:>GTV%/KJ7@GE[kYz
$B[6\;%%,:\KIn?GFWY/qKI1G5:;-_e
$%%%%%
As you can see the length of the lines is clearly shorter than the original ASCII lines.
This man page was taken from the above Web page. It was written by San Bergmans.
For extra points: Who invented this format? Where is it used?