Base 52

I've been playing a lot of Solitaire recently, and for some reason I got the crazy idea of encoding data as a series of playing cards. The idea started as a moderately-complicated encoding scheme resulting in security through obscurity (with a bit of whimsy thrown in) but then I thought a more straightforward approach would be useful in a greater variety of situations. Relative to the complicated scheme, that is.

So anyway, now I've got a method of representing arbitrary binary data as a series of playing cards — or, equivalently, a method of efficiently representing a series of playing cards as a binary stream. Each card is represented by either 5 or 6 bits. Because of this inequity, translating from binary to base 52 will produce the 5-bit cards (namely aces, kings, and queens) twice as frequently as the others.

Of course, when a computer converts from binary to base 52, it still has to internally represent the base 52 in binary. Using six bits to store a card, base 52 becomes a subset of base 64. The computer could just as easily represent one card with an 8-, 16-, or 32-bit value (of which only 6 bits are used), a pair of 8-bit characters, or perhaps a longer text string which might be the name of a graphic representing the card. Apparently, there may be Unicode codepoints representing the 52 playing cards, but I can't quickly find them in the charts.

Encoding

The following table lists each card's “convenient” 6-bit encoding (convenient because the suit is in the high two bits and the value is in the low four bits) as well as its packed 5- or 6-bit binary form. Interestingly, when the packed form is 6 bits long, it's the same as the convenient 6-bit form. By the way, I designed the card graphics used in the table. Is anyone surprised that they look kind of like highway signs?

Card	6-Bit		Packed
Card	Dec	Bin	Bin
	1	000001	00000
	2	000010
	3	000011
	4	000100
	5	000101
	6	000110
	7	000111
	8	001000
	9	001001
	10	001010
	11	001011
	12	001100	00110
	13	001101	00111
	17	01001	01000
	18	010010
	19	010011
	20	010100
	21	010101
	22	010110
	23	010111
	24	011000
	25	011001
	26	011010
	27	011011
	28	011100	01110
	29	011101	01111
	33	100001	10000
	34	100010
	35	100011
	36	100100
	37	100101
	38	100110
	39	100111
	40	101000
	41	101001
	42	101010
	43	101011
	44	101100	10110
	45	101101	10111
	49	110001	11000
	50	110010
	51	110011
	52	110100
	53	110101
	54	110110
	55	110111
	56	111000
	57	111001
	58	111010
	59	111011
	60	111100	11110
	61	111101	11111

Example

Text	H								e								l								l								o																I								n								t								e								r								n								e								t								!
Binary	0	1	0	0	1	0	0	0	0	1	1	0	0	1	0	1	0	1	1	0	1	1	0	0	0	1	1	0	1	1	0	0	0	1	1	0	1	1	1	1	0	0	1	0	0	0	0	0	0	1	0	0	1	0	0	1	0	1	1	0	1	1	1	0	0	1	1	1	0	1	0	0	0	1	1	0	0	1	0	1	0	1	1	1	0	0	1	0	0	1	1	0	1	1	1	0	0	1	1	0	0	1	0	1	0	1	1	1	0	1	0	0	0	0	1	0	0	0	0	1	---
Cards

Length and Padding

Notice that last ace of diamonds is generated from the last two bits of the message itself, plus three imagined zeroes. Because the message ended before completing a valid bit code representing a card, it's padded with at most five zeroes to complete the card. When the sequence of cards is assumed to represent binary data in multiples of eight bits, any leftover bits on the end can be discarded as padding when reconstructing the original binary message. (The alternative is to pad the message again, which would result in an extra null-byte appearing on the end of most reconstructed messages — which sometimes isn't a problem at all.) Interestingly, had I left the exclamation point out of the example, there wouldn't be an issue to point out.

On the other hand, if the binary message is not to be assumed to have byte-aligned length, then some method is needed to encode the exact original length of the binary message. My solution is relatively simple. Remove the last two bits from the binary message and set them aside. Encode the message as usual, but pay attention to how many padding bits are added to the end in the process. Take the number of padding bits (replacing the number zero with eight, and optionally replacing one with nine) and that's the value of a new card, whose suit is determined by the two bits set aside earlier. When the original binary stream is reconstructed, remove and set aside the last card before translating the sequence of cards as normal. Take the lower 3 bits of the 6-bit form of the last card as a number, and discard that many bits from the end of the binary stream already decoded. Finally, take the upper two bits of the 6-bit form of the last card, and append them onto the binary stream. The result is the original binary stream, with no extra padding bits. Of course, this procedure (which adds an extra card of overhead to about ²⁄₃ of all messages) is only recommended where the binary message is not guaranteed to end on a byte boundary.

Now, when the original message is a sequence of cards, translating to binary data which must end on a byte boundary could require so much padding that a whole extra card appears on the end of the message. To avoid this, translate the cards to binary as normal but don't immediately pad to a byte boundary. Append a “dummy ” zero bit, and then pad the binary string to the next byte boundary, paying attention to how many pad bits (not counting the dummy) are required. If five or more pad bits were required, set to 1 the lowest bit of the final byte. When decoding, remove the lowest bit from the last byte and set it aside. Convert the resulting binary string to a sequence of cards; if there are leftover bits that don't make a complete card in packed format, discard them. If the set-aside bit is set (1) then discard the last-decoded card from the sequence. The result is the original sequence, without any extra cards that weren't there before.

Base 52

Encoding

Example

Length and Padding

Intellectual Property