Having recently come across a method for automatic differentiation on sigfpe’s cornucopia of amazingly cool stuff masquerading as a blog, I decided to start playing around with it a little to see what might come out.

So, let’s say we take the standard definition of the derivative,

,

look at it for a bit, and decide that we don’t like the limit symbol, and that, in fact, we’re going to drop it entirely. After rearranging, we would then obtain the weird-looking f(x) + d f’(x) = f(x+d), and, presumably, set out to find what d could possibly look like.

To this end, we might expand f(x+d) about x, which yields

and, after subtracting f(x) + d f’(x) from both sides, degenerates into

It appears that d^n should be zero for all n>1. To see this, we can plug in f(x) = e^x. The coefficients of d^n become constants, and, dividing both sides by e^x, we obtain d^2/2! + d^3/3! + … = 0, which is the MacLaurin series for e^d with the first two terms missing, so e^d=d+1. Ignoring, momentarily, the troublesome fact that this gives d=0 in the reals, we certainly at least have d^2=0 (pardon the handwaving).

But in order for f(x) + d f’(x) = f(x+d) to be a remotely interesting statement, we must also have d != 0, and we want d to be unique. Given some structure whose objects we’d care to differentiate, we’re going to cheat a little, and extend it with the element d such that d != 0, but d^2 = 0. Having allowed such a number, all the weirdness disappears, the equality f(x+d) = f(x) + d f’(x) holds, and we’re left with a sort of infinitesimal constant which we’re free to plug into random things.

To test whether this makes any sense, we might start by taking the quadratic Q(x) = c_2 x^2 + c_1 x + c_0, and computing Q(x+d):

The derivative of Q(x) “fell out” into the coefficient of d. How about e^{x+d}?

Anything else? Let’s see. By the binomial theorem,

.

The d^{n-k} factor vanishes whenever n >= k+2, so

and we’ve just obtained the power law.

Trigonometric functions:

Using the power law experiment from earlier, we get

Plotting coefficients of d along the vertical axis, and the reals along the horizontal, we get the unit circle, while exponential maps are lines through the origin, which is kind of cool in and out of itself.

Ratios:

One useful thing here is that d is small enough to be in *any* radius of convergence, so we get logarithms “for free”:

So, what’s the point? The point is that we automatically obtain the derivative as a side effect of any computation, since f(x+d) = f(x) + d f’(x). In other words, by switching to dual numbers (numbers of the form x+y d) for things that need to be differentiated (and making it transparent through operator overloading), we can ask for the derivative of any differentiable function we’ve ever defined, and it’ll be evaluated symbolically.

To that end, here’s the first approximation in Haskell.

module Diff where

data Diffable a = !a :+ a

funcPart :: Diffable a -> a

funcPart (x :+ x’) = x

diffPart :: Diffable a -> a

diffPart (x :+ x’) = x’

instance (RealFloat a) => Eq (Diffable a) where

(x :+ x’) == (y :+ y’) = (x == y) && (x’ == y’)

instance (RealFloat a) => Show (Diffable a) where

show (x :+ x’) = show (x, x’)

instance (RealFloat a) => Num (Diffable a) where

(x :+ x’) + (y :+ y’) = (x + y) :+ (x’ + y’)

(x :+ x’) * (y :+ y’) = (x * y) :+ (x’ * y + y’ * x)

abs (x :+ x’) = if x < 0 then ((-x) :+ (-x’)) else (x :+ x’)

signum (x :+ x’) = (signum x) :+ 0

fromInteger x = (fromInteger x) :+ 1

instance (RealFloat a) => Fractional (Diffable a) where

fromRational x = (fromRational x) :+ 1

recip (x :+ x’) = (recip x) :+ (negate x’ / (x^2))

instance (RealFloat a) => Floating (Diffable a) where

pi = pi :+ 0

exp (x :+ x’) = (exp x) :+ (x’ * exp x)

log (x :+ x’) = (log x) :+ (x’ * recip x)

sin (x :+ x’) = (sin x) :+ (x’ * cos x)

cos (x :+ x’) = (cos x) :+ (x’ * (negate $ sin x))

sinh (x :+ x’) = (sinh x) :+ (x’ * cosh x)

cosh (x :+ x’) = (cosh x) :+ (x’ * sinh x)

asin (x :+ x’) = (asin x) :+ (x’ / (sqrt $ 1-x^2))

acos (x :+ x’) = (acos x) :+ (x’ / (negate (sqrt $ 1-x^2)))

atan (x :+ x’) = (atan x) :+ (x’ / (x^2+1))

asinh (x :+ x’) = (asinh x) :+ (x’ / (sqrt $ x^2+1))

acosh (x :+ x’) = (acosh x) :+ (x’ / (negate (sqrt $ x^2-1)))

atanh (x :+ x’) = (atanh x) :+ (x’ / (1 – x^2))

And now let’s try a sample GHCI session:

Derivatives of x^2 for x in {0, 0.5, …, 5}:

*Diff> map (diffPart . (^2) . fromRational) [0,0.5..5]

[0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0]

Derivatives of cos^2 x + sin^2 x (should be identically zero):

*Diff> map (diffPart . (\x -> (cos x)^2 + (sin x)^2) . fromRational) [0..10]

[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]

So there we go, AD in half a page of code.

## 14 Comments

That was a pretty cool first post. I am curious about how you did the nice mathematical typesetting. It looked quite nice on Firefox running on OS X.

Drc -

Thanks. I originally wrote the post in LaTeX, so I just used mathbin to render out the salient parts, and inserted them as images.

Vlad,

Thanks for the info. I was unaware of mathbin. I never cease to be amazed at the interesting things available on the web.

drc

Nice exposition.

Last week I (with Jeff Siskind) had a paper at POPL (“Lazy Multivariate Higher-Order Forward-Mode AD”) which generalizes ordinary dual numbers to the higher order (ie keeping an entire power series, computed lazily) multivariate case. The bookkeeping is fun, but one interesting issue is that you can’t define it in pure Haskell because you can’t keep the various distinct perturbations straight!

http://www.bcl.hamilton.ie/~qobi/tower/

Barak -

Very cool. I briefly experimented with the multivariate case, with the idea of applying it to some computational geometry and computational mechanics algorithms that traditionally require computing derivatives numerically. Didn’t get very far due to lack of time, but this definitely makes me want to revisit the whole idea.

Your code has trouble with constants:

*Diff> map (diffPart . (2*) . fromRational) [0..5]

[2.0,3.0,4.0,5.0,6.0,7.0]

One way to fix this is to change fromInteger and fromRational to use :+ 0 instead of :+ 1, and add a function for variable values:

var :: Num a => a -> Diffable a

var x = x :+ 1

Then you can do:

*Diff> map (diffPart . (2*) . var ) [0..5] [2.0,2.0,2.0,2.0,2.0,2.0]

This blog is missing a comment preview,

and my comment above is missing a line break.

Of course the list of 2.0s is the result.

You are missing negate in the Num class:

negate (x :+ x’) = (negate x) :+ (negate x’)

If you make the class constrains equal to the class you are defining you can get multiple derivitives.

instance (Eq a) => Eq (Diffable a) where

…

instance (Num a) => Num (Diffable a) where

…

(etc)

Now using ChristianS’s definition of var you get:

Diff> diffPart (diffPart (cos (var (var 0))))

-1.0

To do this I redefined abs as:

abs (x :+ x’) = if x==0 then (0 :+ (error “abs not differentiable at 0″))

else (abs x) :+ ((signum x)*x’)

You can do similar thing with the PowerSeries data type of NumericPrelude:

http://darcs.haskell.org/numericprelude/src/MathObj/PowerSeries.hs

There d^2 is not simply zero, but all derivatives are maintained.

Sweet blog. I never know what I am going to come across next. I think you should do more posting as you have some pretty intelligent stuff to say.

I’ll be watching you . :)

I loved as much as you will receive carried out right here.

The sketch is tasteful, your authored material stylish. nonetheless, you command get

bought an nervousness over that you wish be delivering the following.

unwell unquestionably come further formerly

again since exactly the same nearly very often inside case you shield this hike.

Today, I went to the beachfront with my kids. I found a sea

shell and gave it to my 4 year old daughter and said “You can hear the ocean if you put this to your ear.” She put the shell to her ear and screamed.

There was a hermit crab inside and it pinched her ear. She never wants

to go back! LoL I know this is entirely off topic but I had to tell someone!

Wow, fantastic blog layout! How lengthy have

you ever been blogging for? you made running a blog look easy.

The whole look of your web site is excellent, let alone the content material!

## 2 Trackbacks

[...] real) has to be a unique element of Also, we get powers of infinitesimals in this way; unlike automatic differentiation, [...]

[...] For some perspectives on the mathematical structuure under AD, see sigfpe’s AD post, and Non-standard analysis, automatic differentiation, Haskell, and other stories. [...]