
Julia vs R vs Python: simple optimization

Published Feb 13, 2018 · Last updated Apr 05, 2019

2019-04-05 Update: the previous version of this post raised serious concerns about compilation latency in the Julia Optim.jl package. The issue has since been resolved, and the Julia package is actually quite performant.

In this post, I will compare and contrast Julia, R, and Python via a simple maximum likelihood optimization problem that is motivated by the credit risk domain and is discussed in more detail in this post.

Side note - the author's experience level at the time of writing

| Language | Level of experience |
| --- | --- |
| R | 9 years |
| Julia | 6 months |
| Python | beginner |

TL;DR

For such a simple optimization problem, R, Julia, and Python/SciPy will all do a competent job, so there is no clear winner. However, Julia has an edge, as it produces the most informative output and has the best support for truncated distributions.

| Language | Pros | Cons |
| --- | --- | --- |
| Julia | Easy to use; wonderful support for truncated distributions; easy-to-understand output | Sub-1s wait on the first run due to compilation latency |
| R | Easy to use | Model output not as readable as Julia's |
| Python | Easy to use | Model output not as readable as Julia's |

The optimization problem

Given observations Q_1, Q_2, ..., Q_n, we aim to find parameters μ and σ that optimize this likelihood function

L = \prod_i \left(\frac{\phi(Q_i,\mu,\sigma)}{\Phi(\max Q_t,\mu,\sigma)}\right)

Often we optimize the log-likelihood instead:

\log L = l = \left(\sum_i \log\phi(Q_i,\mu,\sigma)\right) - n\log\Phi(\max Q_t,\mu,\sigma)

In statistical-speak, this is a maximum likelihood estimation (MLE) for truncated normal distributions.
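
For context, the likelihood above comes straight from the truncated-normal density: an observation drawn from a normal distribution truncated above at \max Q_t has density

\frac{\phi(Q_i,\mu,\sigma)}{\Phi(\max Q_t,\mu,\sigma)}, \qquad Q_i \le \max Q_t,

and multiplying this density over the n observations gives L; taking logs gives l.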

side note - a more efficient numerical solution exists
This Ergashev et al., 2016 paper (behind a paywall) derives a necessary condition for when a solution exists, along with an equation in one parameter which, when solved (using simple uniroot techniques), recovers both μ and σ. From my testing, applying Ergashev's formula yields about a 50x speed-up over the R solution. However, in this blog post I aim to compare and contrast the optimization functions in Julia vs. R vs. Python, and hence I have chosen not to implement Ergashev's method.

Julia solution

Below is my Julia implementation using Optim.jl. In Julia, one can use Unicode symbols in variable names, so I have used μσ as a variable name. Julia also has a popular package called JuMP.jl for optimization problems. However, the Optim.jl package is more than adequate for such a simple problem, and I will only look at JuMP.jl in the future when dealing with more advanced optimization problems.

I have noted that the first run of the optimization takes under 1 second. This is a significant improvement on the 7.5 seconds reported in an earlier version of this post!

side note - removing compilation latency
If you want using Optim to have no latency at all, and the first optimize call to be fast, you can try PackageCompiler.jl to precompile the Optim.jl package via using PackageCompiler; compile_incremental(:Optim, force = false). Once it completes, the REPL prints instructions on how to use the compiled image for speedy using Optim and optimize calls.
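
As a concrete sketch of that workflow (the compile_incremental API shown here matches the PackageCompiler.jl version current at the time of writing; newer releases replace it with create_sysimage, and the generated paths will differ on your machine):

using PackageCompiler

# build an incremental system image with Optim's compiled code baked in
compile_incremental(:Optim, force = false)

# the REPL output points at the generated system image; restart Julia with it,
# e.g. julia -J <path-to-generated-sysimage>, and `using Optim` plus the first
# `optimize` call should then incur very little latency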

In the code below, I have hard-coded the values of Q_t that I would like to use in the MLE estimation.

using Distributions, Optim

# hard-coded data/observations
odr=[0.10,0.20,0.15,0.22,0.15,0.10,0.08,0.09,0.12]
Q_t = quantile.(Normal(0,1), odr)

# return a function that accepts `[mu, sigma]` as parameter
function neglik_tn(Q_t)
    maxx = maximum(Q_t)
    f(μσ) = -sum(logpdf.(Truncated(Normal(μσ[1],μσ[2]), -Inf, maxx), Q_t))
    f
end

neglikfn = neglik_tn(Q_t)

# optimize!
# start searching 
@time res = optimize(neglikfn, [mean(Q_t), std(Q_t)]) # 0.8 seconds
@time res = optimize(neglikfn, [mean(Q_t), std(Q_t)]) # 0.000137 seconds

# the \mu and \sigma estimates
Optim.minimizer(res) # [-1.0733250637041452,0.2537450497038758] # or

# use `fieldnames(typeof(res))` to see the list of field names that can be referenced via . (dot)
res.minimizer # [-1.0733250637041452,0.2537450497038758]

which gave me the following output, by far the most descriptive out of the Julia/R/Python-verse.

Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [-1.1300664159893685,0.22269345618402703]
 * Minimizer: [-1.0733250637041452,0.2537450497038758]
 * Minimum: -1.893080e+00
 * Iterations: 28
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 59

What was nice about Julia?

| Rating | Description |
| --- | --- |
| 👍 👍 | Truncated(DN, lower, upper) is a wonderfully simple way to define truncated distributions (see the sketch below) |
| 👍 | A logpdf function that works for any distribution |
| 👍 | Very clear and informative output |
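
To illustrate the first two points, here is a minimal sketch (not from the original post) showing that Truncated wraps any univariate distribution, and that logpdf (and even rand) work on the result:

using Distributions

# truncate a Gamma(2, 1) distribution to the interval [0.5, 3.0]
d = Truncated(Gamma(2.0, 1.0), 0.5, 3.0)

logpdf(d, 1.0)   # log-density of the truncated Gamma at 1.0
rand(d, 5)       # draw 5 samples from the truncated distribution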

What was not so nice about Julia?

| Rating | Description |
| --- | --- |
| Acceptable | Sub-1-second compilation latency on the first run |

R solution

R has a truncnorm package for dealing with truncated normals.

# hard-coded data/observations
odr = c(0.10, 0.20, 0.15, 0.22, 0.15, 0.10, 0.08, 0.09, 0.12)
x = qnorm(odr)

library(truncnorm)

# return a function that accepts c(mu, sigma) as its parameter
neglik_tn = function(x) {
  maxx = max(x)
  resfn = function(musigma) {
    -sum(log(dtruncnorm(x, a = -Inf, b = maxx, musigma[1], musigma[2])))
  }

  resfn
}

neglikfn = neglik_tn(x)

# optimize! starting from the sample mean and standard deviation
system.time(res <- optim(c(mean(x), sd(x)), neglikfn))
res

which gave the output

$par
[1] -1.0733354  0.2537339

$value
[1] -1.89308

$counts
function gradient 
      55       NA 

$convergence
[1] 0

$message
NULL

What was nice about R?

| Rating | Description |
| --- | --- |
| 👍 | Has a package for truncated normals |

What was not so nice about R?

| Rating | Description |
| --- | --- |
| 👎 | A few minor things: no log-density for the truncated normal; no easy way to define truncated versions of arbitrary distributions; sparse output from print |

Python solution

Even though I have no experience with Python, simple Google searches allowed me to come up with this solution. I have used the Anaconda distribution, which saved me a lot of hassle in terms of installing packages, as all the packages I need come pre-installed.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# hard-coded data/observations
odr = [0.10, 0.20, 0.15, 0.22, 0.15, 0.10, 0.08, 0.09, 0.12]
Q_t = norm.ppf(odr)

# return a function that accepts [mu, sigma] as its parameter, with the data baked in
def neglik_trunc_tn(Q_t):
    maxQ_t = max(Q_t)
    def neglik_trunc_fn(musigma):
        return -sum(norm.logpdf(Q_t, musigma[0], musigma[1])) + len(Q_t)*norm.logcdf(maxQ_t, musigma[0], musigma[1])
    return neglik_trunc_fn

# the likelihood function to optimize
neglik = neglik_trunc_tn(Q_t)

# optimize!
res = minimize(neglik, [np.mean(Q_t), np.std(Q_t)])
res

and these are the results

      fun: -1.8930804441641282
 hess_inv: array([[ 0.01759589,  0.00818596],
       [ 0.00818596,  0.00937868]])
      jac: array([ -3.87430191e-07,   3.33786011e-06])
  message: 'Optimization terminated successfully.'
     nfev: 40
      nit: 6
     njev: 10
   status: 0
  success: True
        x: array([-1.07334252,  0.25373624])

What was nice about Python?

| Rating | Description |
| --- | --- |
| 👍👍 | Easy to google solutions, even for beginners |

What was not so nice about Python?

| Rating | Description |
| --- | --- |
| 👎 | The output could be nicer, but that is a minor point |

Conclusion

For such a simple optimization problem, you can't go wrong choosing any of the three languages/ecosystems. All three let you define closures, which I find to be a good way to generate functions that embed the data they need, leaving you with a function that takes the distribution parameters as its only argument. For me, Julia is clearly the better choice, although not by a wide margin.

Comments
Evgeniy Pogrebnyak
5 years ago

I’m so glad to see the Julia precompilation issue surfacing, something not really advertised much about Julia. I also linked your post as a reference in my article about maximum likelihood.

ZJ
5 years ago

Well, I am happy to report that the compilation latency is not a problem any more!

Pietro Grandinetti, PhD
6 years ago

Thanks for sharing!
The example is simple enough to let you solve it manually, but for complex problems you’d probably prefer to use some parser that lets you code mathematical programming (e.g., picos [python], roi [R], yalmip [matlab]). And when you do that, you must rely on robust numeric solvers (e.g., SCIP [free], Cplex, MOSEK, …), which are usually C/C++ pieces of software called by the parser to solve the actual optimization. Therefore, there isn’t much of a difference either in terms of computation time (as the language you’re using is just an interface to the actual solver) or in terms of output (I mean, parsers’ outputs are pretty much all the same).
Cheers
