Arithmetic coding

I really did not intend this blog to become a repository of Haskell code snippets, but I’ve been rather busy as of late, and writing toy code while waiting for a compile to finish has somehow become my primary means of entertainment. Here is the latest.

Arithmetic coding is a remarkably simple and clever thing. The idea is that given some half-open interval [a,b), that is, the interval a <= x < b, we can partition it into half-open subintervals, such that there is one subinterval per character in the message to be encoded, and the lengths correspond to the character frequencies multiplied by b-a. The same procedure is applied, recursively, to each subinterval, resulting in an infinite hierarchy of coverings of the original interval — call it S. Now, if we throw a rock at S, record the point where it hit, and follow the interval hierarchy, we’ll come up with a unique infinite string of characters.

To construct the actual encoding, set S to [0,1), and find out which subinterval S_1 the first character of the message falls into. For the second character, let S_2 be the appropriate subinterval of S_1, for the third character, let S_3 be the appropriate subinterval of S_2, and so on; if we repeat this procedure as many times as there are characters, we’ll arrive at some interval S_n. Numbers that fall in this interval have a useful property: given any such number, call it x, we have x in S_{n-1} (since x is in S_n, and S_n is a subinterval of S_{n-1}), x in S_{n-2} by the same argument, and, by induction, in every subinterval that we picked while encoding the message. Any such x, therefore, uniquely encodes the message: to decode, simply follow the hierarchy.

*Arith> encodeToStream “encodeToStream returns a pair of lists of bytes, represe
nting the numerator and denominator, respectively.”
([174,77,70,217,88,196,42,26,75,253,160,72,114,92,77,135,32,165,50,80,55,77,233,
103,172,90,177,4],[211,29,119,249,50,167,209,90,128,245,114,158,13,236,212,196,1
1,81,64,169,125,254,83,235,75,2,30,13])

*Arith> encode “testing testing testing”
(23,[(‘ ‘,(0%1,2%23)),(‘e’,(2%23,5%23)),(‘g’,(5%23,8%23)),(‘i’,(8%23,11%23)),(‘n
‘,(11%23,14%23)),(‘s’,(14%23,17%23)),(‘t’,(17%23,1%1))],3430733247%4363211066)

*Arith> (decode . encode . decode . encode) “testing testing testing”
“testing testing testing”

*Arith> encode (concat $ replicate 500 “abcd”)
(2000,[(‘a’,(0%1,1%4)),(‘b’,(1%4,1%2)),(‘c’,(1%2,3%4)),(‘d’,(3%4,1%1))],9%85)

The last test shows the output of ‘encode’ : the length of the message is 2000 characters, this is followed by character distributions (in a practical setting, frequencies would be returned instead of explicit intervals), and finally the encoded message. The entire 2000 byte string is encoded in the fraction 9/85.

Toy code follows. As mentioned earlier, ‘encodeToStream’ is a helper function that breaks the fraction into a pair of lists of bytes; the actual encoder and decoder consist of just ‘encode’, ‘decode’ and ‘freqRanges’, weighing in at 23 lines of code including type annotations and line breaks. Gotta love Haskell.

{-# OPTIONS -fglasgow-exts #-}
module Arith where

import Ratio
import Data.List
import Data.Maybe
import Data.Char
import qualified Data.Map as M

type RangeMap k a = [(k, (Ratio a, Ratio a))]

encode :: (Ord k, Integral a) => [k] -> (Int, RangeMap k a, Rational)
encode msg = (length msg, M.assocs freqMap, best $ foldl pair (0,1) rmap)
             where
                  freqMap = freqRanges msg
                  rmap = map (\x -> fromJust $ M.lookup x freqMap) msg
                  
                  best (a,b)                  = approxRational ((b+a)/2) ((b-a)/2)
                  pair (a,b) (x,y)            = ((b-a)*x+a, (b-a)*y+a)

decode :: (Ord a, Integral a) => (Int, RangeMap k a, Ratio a) -> [k]
decode (n, freqs, code) = take n $ decode' code
                          where
                              findChar x = find (\(c, (a,b)) -> (x >= a) && (x < b)) freqs
                              decode' code = let
                                                (Just (c, (x, y))) = findChar code
                                             in
                                                 c : decode' ((code-x) / (y-x))

freqRanges :: (Ord k, Integral a) => [k] -> M.Map k (Ratio a, Ratio a)
freqRanges str = snd $ M.mapAccum (\acc x -> (acc + x, (acc, acc + x))) 0 freqs
                 where
                      freqs = M.map (\p -> p % total) occurences
                      occurences = foldl (\m c -> M.insertWith (+) c 1 m) M.empty str
                      total = sum (M.elems occurences)

encodeToStream msg = let
                        (len, freqs, code) = encode msg
                        (num, denom) = (numerator code, denominator code)
                        bytes n = unfoldr (\k -> if k == 0 then Nothing else Just (rem k 256, quot k 256)) n
                     in
                        (bytes num, bytes denom)

4 Comments

Michael

Posted February 21, 2007 at 9:30 pm | Permalink

Thanks for the post! I’ve always found arithmetic coding fascinating in a beautifully simple sort of way. My desire to read your code just might finally get me around to figuring out Haskell, which has been on my to-do list for a while.

Johan Jeuring

Posted March 5, 2007 at 2:36 am | Permalink

See also:

http://web.comlab.ox.ac.uk/oucl/work/jeremy.gibbons/publications/index.html#arith

for another implementation in Haskell.

Vlad Andreev

Posted March 5, 2007 at 4:56 am | Permalink

Michael:
Definitely check out Haskell. Prettiest language there is, really.

Johan:
“Arithmetic Coding with Folds and Unfolds” is excellent, although its apparent aim was to produce something that is actually robust and useful, develop a fair bit of theory along the way, and, in short, provide a great example of an expository paper. My post was more along the lines of “eh, I’ve never written a arithmetic coder, before, let’s tr… ooh, I can use Ratio!”

lawlimmangelo

Posted May 16, 2010 at 8:22 pm | Permalink

The Wizard will lead you over step by step shopping cart transition procedure which is awfully clear. You will get uncomplicated orders, no incompetent information. If you follow them wisely the procedure will be fast and easy and transition will be realized successfully. You do not have to move every bite of the content of your warehouse at single time. You may perhaps pick some break categories: products, orders, clients, etc. This is dreadfully convenient as saves your time and wealth.

Migrating all your store’s content at once is not a must. You may decide on several categories the transfer of which is of big importance for you. For example: just products, orders, buyers etc. This is very convenient for you as saves your time and money.

Easy-to-use interface makes your dealing with this service even more convenient. The Wizard will lead you gradually through migration process. Therefore, you won’t be overloaded with lots of worthless information and can simply focus on the migration process. All you must do is follow the directions !

Go to Migration service website and grasp the instructions on how to setup your movement. The Wizard will take you gradually through the process. You should type in the details about your source shopping cart and target one. Next select the type of data you wish totransfer (consumers, buys, products, etc.) Demo Migration is offered in case you wish to see Migration service in action. A number of of items you have selected for movement will be transferred to the target platform of your selection.

The power of the shopping cart software is to make your business more visible. The opportunities of extending the list of advantages of using shopping cart software well to promote your online business are everlasting.
Some of them to be stated:
– Online shopping is a lot more convenient for your customers with Shopping cart software
– Shopping cart software adds a professional touch to your website
– You can simply integrate your shopping cart with multiple payment systems, making it a lot easier for your clients to select the payment method
http://sonicfellatio.com/make-the-move-to-better-e-commerce-platform-smooth-and-simple-with-cart2cart/

codeland