Non-standard analysis, automatic differentiation, Haskell, and other stories.

Having recently come across a method for automatic differentiation on sigfpe’s cornucopia of amazingly cool stuff masquerading as a blog, I decided to start playing around with it a little to see what might come out.

So, let’s say we take the standard definition of the derivative,
look at it for a bit, and decide that we don’t like the limit symbol, and that, in fact, we’re going to drop it entirely. After rearranging, we would then obtain the weird-looking f(x) + d f'(x) = f(x+d), and, presumably, set out to find what d could possibly look like.

To this end, we might expand f(x+d) about x, which yields


and, after subtracting f(x) + d f'(x) from both sides, degenerates into


It appears that d^n should be zero for all n>1. To see this, we can plug in f(x) = e^x. The coefficients of d^n become constants, and, dividing both sides by e^x, we obtain d^2/2! + d^3/3! + … = 0, which is the MacLaurin series for e^d with the first two terms missing, so e^d=d+1. Ignoring, momentarily, the troublesome fact that this gives d=0 in the reals, we certainly at least have d^2=0 (pardon the handwaving).

But in order for f(x) + d f'(x) = f(x+d) to be a remotely interesting statement, we must also have d != 0, and we want d to be unique. Given some structure whose objects we’d care to differentiate, we’re going to cheat a little, and extend it with the element d such that d != 0, but d^2 = 0. Having allowed such a number, all the weirdness disappears, the equality f(x+d) = f(x) + d f'(x) holds, and we’re left with a sort of infinitesimal constant which we’re free to plug into random things.

To test whether this makes any sense, we might start by taking the quadratic Q(x) = c_2 x^2 + c_1 x + c_0, and computing Q(x+d):


The derivative of Q(x) “fell out” into the coefficient of d. How about e^{x+d}?


Anything else? Let’s see. By the binomial theorem,

The d^{n-k} factor vanishes whenever n >= k+2, so

and we’ve just obtained the power law.

Trigonometric functions:


Using the power law experiment from earlier, we get

Plotting coefficients of d along the vertical axis, and the reals along the horizontal, we get the unit circle, while exponential maps are lines through the origin, which is kind of cool in and out of itself.



One useful thing here is that d is small enough to be in any radius of convergence, so we get logarithms “for free”:


So, what’s the point? The point is that we automatically obtain the derivative as a side effect of any computation, since f(x+d) = f(x) + d f'(x). In other words, by switching to dual numbers (numbers of the form x+y d) for things that need to be differentiated (and making it transparent through operator overloading), we can ask for the derivative of any differentiable function we’ve ever defined, and it’ll be evaluated symbolically.

To that end, here’s the first approximation in Haskell.

module Diff where

data Diffable a = !a :+ a

funcPart :: Diffable a -> a
funcPart (x :+ x’) = x

diffPart :: Diffable a -> a
diffPart (x :+ x’) = x’

instance (RealFloat a) => Eq (Diffable a) where
  (x :+ x’) == (y :+ y’) = (x == y) && (x’ == y’)

instance (RealFloat a) => Show (Diffable a) where
  show (x :+ x’) = show (x, x’)

instance (RealFloat a) => Num (Diffable a) where
  (x :+ x’) + (y :+ y’) = (x + y) :+ (x’ + y’)
  (x :+ x’) * (y :+ y’) = (x * y) :+ (x’ * y + y’ * x)
  abs (x :+ x’) = if x < 0 then ((-x) :+ (-x’)) else (x :+ x’)
  signum (x :+ x’) = (signum x) :+ 0
  fromInteger x = (fromInteger x) :+ 1

instance (RealFloat a) => Fractional (Diffable a) where
  fromRational x = (fromRational x) :+ 1
  recip (x :+ x’) = (recip x) :+ (negate x’ / (x^2))

instance (RealFloat a) => Floating (Diffable a) where
  pi = pi :+ 0
  exp (x :+ x’) = (exp x) :+ (x’ * exp x)
  log (x :+ x’) = (log x) :+ (x’ * recip x)
  sin (x :+ x’) = (sin x) :+ (x’ * cos x)
  cos (x :+ x’) = (cos x) :+ (x’ * (negate $ sin x))
  sinh (x :+ x’) = (sinh x) :+ (x’ * cosh x)
  cosh (x :+ x’) = (cosh x) :+ (x’ * sinh x)
  asin (x :+ x’) = (asin x) :+ (x’ / (sqrt $ 1-x^2))
  acos (x :+ x’) = (acos x) :+ (x’ / (negate (sqrt $ 1-x^2)))
  atan (x :+ x’) = (atan x) :+ (x’ / (x^2+1))
  asinh (x :+ x’) = (asinh x) :+ (x’ / (sqrt $ x^2+1))
  acosh (x :+ x’) = (acosh x) :+ (x’ / (negate (sqrt $ x^2-1)))
  atanh (x :+ x’) = (atanh x) :+ (x’ / (1 – x^2))

And now let’s try a sample GHCI session:

Derivatives of x^2 for x in {0, 0.5, …, 5}:

*Diff> map (diffPart . (^2) . fromRational) [0,0.5..5]

Derivatives of cos^2 x + sin^2 x (should be identically zero):

*Diff> map (diffPart . (\x -> (cos x)^2 + (sin x)^2) . fromRational) [0..10]

So there we go, AD in half a page of code.



  1. drc
    Posted December 12, 2006 at 9:54 pm | Permalink

    That was a pretty cool first post. I am curious about how you did the nice mathematical typesetting. It looked quite nice on Firefox running on OS X.

  2. Posted December 13, 2006 at 4:20 am | Permalink

    Drc –

    Thanks. I originally wrote the post in LaTeX, so I just used mathbin to render out the salient parts, and inserted them as images.

  3. drc
    Posted December 13, 2006 at 4:29 pm | Permalink


    Thanks for the info. I was unaware of mathbin. I never cease to be amazed at the interesting things available on the web.


  4. Posted January 24, 2007 at 11:52 am | Permalink

    Nice exposition.

    Last week I (with Jeff Siskind) had a paper at POPL (“Lazy Multivariate Higher-Order Forward-Mode AD”) which generalizes ordinary dual numbers to the higher order (ie keeping an entire power series, computed lazily) multivariate case. The bookkeeping is fun, but one interesting issue is that you can’t define it in pure Haskell because you can’t keep the various distinct perturbations straight!

  5. Posted January 31, 2007 at 5:31 pm | Permalink

    Barak –

    Very cool. I briefly experimented with the multivariate case, with the idea of applying it to some computational geometry and computational mechanics algorithms that traditionally require computing derivatives numerically. Didn’t get very far due to lack of time, but this definitely makes me want to revisit the whole idea.

  6. ChristianS
    Posted February 17, 2007 at 11:20 pm | Permalink

    Your code has trouble with constants:

    *Diff> map (diffPart . (2*) . fromRational) [0..5]

    One way to fix this is to change fromInteger and fromRational to use :+ 0 instead of :+ 1, and add a function for variable values:

    var :: Num a => a -> Diffable a
    var x = x :+ 1

    Then you can do:

    *Diff> map (diffPart . (2*) . var ) [0..5] [2.0,2.0,2.0,2.0,2.0,2.0]

  7. ChristianS
    Posted February 17, 2007 at 11:25 pm | Permalink

    This blog is missing a comment preview,
    and my comment above is missing a line break.
    Of course the list of 2.0s is the result.

  8. Russell O'Conor
    Posted April 9, 2007 at 9:33 am | Permalink

    You are missing negate in the Num class:

    negate (x :+ x’) = (negate x) :+ (negate x’)

  9. Russell O'Conor
    Posted April 9, 2007 at 9:37 am | Permalink

    If you make the class constrains equal to the class you are defining you can get multiple derivitives.

    instance (Eq a) => Eq (Diffable a) where

    instance (Num a) => Num (Diffable a) where


    Now using ChristianS’s definition of var you get:

    Diff> diffPart (diffPart (cos (var (var 0))))

    To do this I redefined abs as:
    abs (x :+ x’) = if x==0 then (0 :+ (error “abs not differentiable at 0”))
    else (abs x) :+ ((signum x)*x’)

  10. Henning Thielemann
    Posted June 20, 2007 at 6:43 am | Permalink

    You can do similar thing with the PowerSeries data type of NumericPrelude:

    There d^2 is not simply zero, but all derivatives are maintained.

  11. Posted June 3, 2009 at 11:36 pm | Permalink

    Sweet blog. I never know what I am going to come across next. I think you should do more posting as you have some pretty intelligent stuff to say.

    I’ll be watching you . :)

  12. Posted July 16, 2013 at 1:46 am | Permalink

    I loved as much as you will receive carried out right here.
    The sketch is tasteful, your authored material stylish. nonetheless, you command get
    bought an nervousness over that you wish be delivering the following.
    unwell unquestionably come further formerly
    again since exactly the same nearly very often inside case you shield this hike.

  13. Posted September 23, 2013 at 2:33 pm | Permalink

    Today, I went to the beachfront with my kids. I found a sea
    shell and gave it to my 4 year old daughter and said “You can hear the ocean if you put this to your ear.” She put the shell to her ear and screamed.
    There was a hermit crab inside and it pinched her ear. She never wants
    to go back! LoL I know this is entirely off topic but I had to tell someone!

  14. Posted March 23, 2014 at 10:44 am | Permalink

    Wow, fantastic blog layout! How lengthy have
    you ever been blogging for? you made running a blog look easy.

    The whole look of your web site is excellent, let alone the content material!

2 Trackbacks

  1. By Playing with infinitesimals « Mort aux Triangles! on October 19, 2007 at 7:33 am

    […] real) has to be a unique element of Also, we get powers of infinitesimals in this way; unlike automatic differentiation, […]

  2. […] For some perspectives on the mathematical structuure under AD, see sigfpe’s AD post, and Non-standard analysis, automatic differentiation, Haskell, and other stories. […]

Post a Comment

Required fields are marked *


%d bloggers like this: