Machine Learning Data Scientist at Team Lead level

Julia tips for R useRs

Published Sep 21, 2017. Last updated Jul 16, 2020.

Tip 1 - You can use R code in Julia

Use the RCall package to run R code from within Julia, like this

using Pkg
Pkg.add("RCall") # this is the equivalent of install.packages
using RCall
R"var1 <- 1+2" # note the capitalised R, r"" is used for regex

@rget var1 # now var1 contains the value 3

For multiline R code, wrap the code in R""" """. You can push Julia variables into R using the @rput macro, as in the example below

using RCall

julia_x = 1:1_000_000

# this pushes the julia_x range and converts it into a vector in R
@rput julia_x 
R"""
y <- rnorm(10^6) 
var1 <- julia_x + y
"""

@rget var1

Tip 2 - Delete the last few elements of a vector/array using end

In R, to exclude the last few elements of a vector one would use code like this

l <- length(x)
x1 <- x[-((l-2):l)]

The above removes the last 3 elements from x (indices l-2, l-1 and l) and assigns the result to x1. In Julia one can use the end keyword to achieve the same result

x1 = x[1:end-3]

Note there is also no need for parentheses on either side of : in Julia, because arithmetic such as end-3 is evaluated before the range is built.
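As a small sketch of the same idea (the vector x and its values are made up for illustration), end can be used anywhere inside an index expression, not only as the upper bound of a range:

```julia
# Illustrative vector; the values are arbitrary
x = [10, 20, 30, 40, 50, 60]

# drop the last 3 elements: `end-3` is evaluated before `:` builds the range
x1 = x[1:end-3]      # [10, 20, 30]

# keep only the last 3 elements
x2 = x[end-2:end]    # [40, 50, 60]

# `end` on its own picks the last element
x3 = x[end]          # 60
```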

Tip 3 - Be patient with installation and first use of packages

In R using install.packages is usually fairly fast and once you have run library(some_package) then you can use the functions in the package right away.

Installing a package in Julia is not that fast yet, and this has been acknowledged as an issue (video link). Also, the first time you load a package via using Package1, Julia goes through a compilation step, which can seem to take a while if you are used to the convenience of R's library-and-go. Julia's compilation step is designed to make subsequent usage faster.

Tip 4 - Get "vectorized" code with `.` function suffix

In Julia, you can turn a function call on a single element into a call on every element of an array by adding the . (dot) suffix to the function name. This is called broadcasting in Julia. For example

# call f on x
f(x)

# call f on each element of the array
f.([x1, x2, x3])

It's important to note that performance is not the main reason for vectorization of function calls because sometimes a for-loop is faster in Julia. It is available as a language feature that makes it easy to run a function over a vector/array of elements.
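To make the dot syntax concrete, here is a minimal sketch. The function f and the input values are hypothetical, chosen only for illustration; the point is that f is written for scalars and the dot does the rest:

```julia
# A hypothetical scalar function, written with no thought for arrays
f(a, b) = a^2 + b

xs = [1, 2, 3]

# broadcast over one array and one scalar
r1 = f.(xs, 10)              # [11, 14, 19]

# broadcast over two arrays of the same length
r2 = f.(xs, [10, 20, 30])    # [11, 24, 39]

# operators take the dot as a prefix; chained dotted calls are fused into one loop
r3 = sqrt.(xs .^ 2 .+ 1)
```

Because the dotted calls in the last line are fused, no intermediate arrays are allocated for xs .^ 2 or for the addition.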

In R we are taught to use vectorized versions of our code for performance. For example instead of

x <- round(abs(rnorm(1000))*100000)

# pre-allocate a vector to store the results
res <- vector(mode = "numeric", length = length(x)) 
# using a for loop is not recommended as it is not vectorized
pt <- proc.time()
for(i in 1:length(x)) {
  res[i] <- mean(runif(x[i]))
}
# about ~3 seconds
data.table::timetaken(pt)

one should use the vectorized version using sapply

# recommended vectorized approach
pt <- proc.time()
res1 <- sapply(x, function(xx) {
  mean(runif(xx)) # same work as the loop body above
})
data.table::timetaken(pt)

Make sure both versions do the same work (runif(xx) for each element) before comparing them; the timings will vary by machine.

The equivalent in Julia is

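using Statistics # provides mean

# element-wise abs and round need the dot syntax
x = round.(Int, abs.(randn(1000)) .* 100_000)

function meanunif(xx)
  mean(rand(xx))
end

# for-loop Julia code
function meanunif_v(x)
  res = Vector{Float64}(undef, length(x)) # pre-allocate the results
  for (i, xx) in enumerate(x)
    res[i] = meanunif(xx)
  end
  res # return the results; the for loop itself evaluates to nothing
end
@time resA = meanunif_v(x)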

and the "vectorized" version is

# "vectorized" Julia code
@time resB = meanunif.(x)

I will let you figure out the speed of Julia yourself.

Tip 5 - `%>%` has Julia equivalents

In R the excellent magrittr package provides the pipe operator %>% so one can chain functions together for readability; one may also use the %>>% operator from the pipeR package which has some advantages.
Julia has similar functionality via the |> operator, which is less powerful as it can only deal with single-argument functions by default.

x = [0,1,2,5,3,1.5, 0, 0, -1]
x |> diff .|> sign |> diff .|> abs |> (v -> isequal.(v, 2)) |> sum |> (s -> isequal(s, 1))

The .|> means to vectorize |>, i.e. to apply the succeeding function to every element of the input that precedes it.

The equivalent in R is a bit more concise

library(magrittr)
# detect whether there is one and only one turning point in the values
x <- c(0,1,2,5,3,1.5, 0, 0, -1)
x %>% diff %>% sign %>% diff %>% abs %>% equals(2) %>% sum %>% equals(1)

Also, there is already a good deal of work on implementing better piping facilities in Julia. One notable example is Lazy.jl, which can recode the above as below

using Pkg
Pkg.add("Lazy")
using Lazy
x = [0,1,2,5,3,1.5, 0, 0, -1]
@> x diff xx->sign.(xx) diff xx->abs.(xx) xx->isequal.(xx,2) sum isequal(1)

There appears to be a bug(?) that prevents the use of the dotted form isequal.(2) here, which hopefully will be solved soon so the code can become

@> x diff sign diff abs isequal.(2) sum isequal(1)

As a side note, Julia's |> cannot be made as powerful as %>%, because %>% works like a Julia macro: it takes code and rewrites it. Julia draws a clearer line between ordinary code and macros, since macros must be called with an @ prefix (e.g. @macro_name). Hence the closest equivalent to %>% in Julia will necessarily be macro-based.

Tip 6 - The `@time` macro is the equivalent of `system.time`

You can use @time to gauge how fast a piece of code runs, similar to how system.time works in R. The first call to a function includes compilation time, so run it twice for a fairer reading.

@time begin
    randn(1000000) + randn(1000000)
end

function addxy(x,y)
    x .+ y
end

@time addxy(randn(1000000), randn(1000000))

For more advanced benchmarking, you are advised to use BenchmarkTools (next tip) instead of using the @time macro.

Tip 7 - Use `@benchmark` from BenchmarkTools.jl for benchmarking

function multxy(x,y)
    x .* y
end

# Pkg.add("BenchmarkTools") #only needs to be run once
using BenchmarkTools
@benchmark multxy(randn(1000000), randn(1000000))

# BenchmarkTools.Trial: 
#   memory estimate:  762.94 MiB
#   allocs estimate:  26
#   --------------
#   minimum time:     472.021 ms (0.77% GC)
#   median time:      501.675 ms (0.75% GC)
#   mean time:        519.823 ms (4.92% GC)
#   maximum time:     629.391 ms (16.82% GC)
#   --------------
#   samples:          10
#   evals/sample:     1

Tip 8 - Write large integers using `_` as visual aid

In R, writing large integers requires some mental gymnastics and good eyesight, as you need to count the number of zeros carefully, e.g.

one_million <- 10000000L

In Julia (like in Ruby) you can use _ as a visual aid so you can write

one_million = 1_000_000

Did you notice I accidentally put one too many 0 in the R code?

Tip 9 - `data.frame` is `DataFrame`

You can load the DataFrames.jl package via using DataFrames. Its DataFrame type is Julia's equivalent of data.frame.

Tip 10 - How does `a:b` work?

Firstly a:b does not generate a vector automatically; it generates a range. E.g. typeof(8:88) is UnitRange. To convert it to a vector you can use collect(a:b).

You also do not need parentheses on either side of : e.g. 1+1:2+2 will give 2:4 whereas in R it will give c(4,5).

Also, 5:2 counts down in R, but in Julia it is an empty range; you need to explicitly write 5:-1:2.
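The points above can be checked with a short sketch (the variable names are only illustrative):

```julia
r = 1:5
r isa UnitRange          # true: a range stores only its endpoints
v = collect(r)           # [1, 2, 3, 4, 5], an actual Vector

# arithmetic binds more tightly than `:`
a = 1+1:2+2              # 2:4

# a range whose start exceeds its stop is empty rather than counting down
e = isempty(5:2)         # true
d = collect(5:-1:2)      # [5, 4, 3, 2]
```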

Tip 11 - There is a built-in hash-table structure

In R the best way to build a hash table structure is to rely on environments, as other native structures are not hash-based and do not offer O(1) indexing. Julia like Python provides a Dict type that is essentially a hash-table. You can construct one based on a key-value pair declared as :key => value. E.g.

Dict(:item1 => 1, :item2 => 2)

will declare a Dict with :item1 and :item2 as keys and 1 and 2 as values.
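A brief sketch of everyday Dict operations follows; the key :item3 and the default value 0 are made up for illustration:

```julia
d = Dict(:item1 => 1, :item2 => 2)

first_value = d[:item1]         # 1, lookup by key
d[:item3] = 3                   # insert a new key-value pair
has_it = haskey(d, :item2)      # true
fallback = get(d, :nokey, 0)    # 0, a default instead of a KeyError
delete!(d, :item1)              # remove a key in place

# iterating yields key => value pairs; the order is not guaranteed
for (k, v) in d
    println(k, " => ", v)
end
```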

Tip 12 - List comprehension depends on the iterable interface

Julia has borrowed from Python's list comprehension syntax and in essence, has R's itertools built-in. In R a for loop can "comprehend" vectors and lists, e.g. in the below a will take the values item1 and item2 inside the loop body

for(a in list(item1,item2)) {
  doSomething(a)
}

In Julia

for a in any_instance_of_iterable_type
    doSomething(a)
end

where any_instance_of_iterable_type can be an instance of any type that defines the required iterate method. In particular, vectors, ranges, and Dicts are all iterables.

One thing to note is that Julia, unlike other languages like Go or Java, does not have a formal way to declare and "enforce" an interface, so there is no guarantee that any_instance_of_iterable_type has all the iterable methods implemented. On the other hand, the moment you implement iterate for your type, it becomes usable in for loops and comprehensions. It's a bit lax but flexible.
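As a minimal sketch of what implementing iterate looks like, assume a hypothetical type Countdown (not part of any library) that counts down from a starting value to 1:

```julia
# Hypothetical type for illustration: counts down from `start` to 1
struct Countdown
    start::Int
end

# iterate returns an (item, next_state) tuple, or nothing when finished;
# the default value of `state` covers the first call, iterate(c)
Base.iterate(c::Countdown, state = c.start) =
    state < 1 ? nothing : (state, state - 1)

# collect and comprehensions call length by default, so define it too
Base.length(c::Countdown) = max(c.start, 0)

for a in Countdown(3)
    println(a)    # prints 3, then 2, then 1
end

squares = [a^2 for a in Countdown(3)]   # [9, 4, 1]
```

Only iterate is needed for the for loop; length is defined here because collect and comprehensions assume it exists unless the type says otherwise.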

Tip 13 - `NULL` is `nothing` and `NA` is `missing`

R's NULL and NA have Julia equivalents in nothing and missing respectively.
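A short sketch of how the two behave, with made-up data; the comparisons to R in the comments are loose analogies rather than exact equivalences:

```julia
using Statistics  # provides mean

v = [1, missing, 3]

p = 1 + missing                  # missing: like NA, it propagates through arithmetic
m = ismissing(v[2])              # true
s = sum(skipmissing(v))          # 4, roughly sum(x, na.rm = TRUE) in R
avg = mean(skipmissing(v))       # 2.0
c = coalesce(v[2], 0)            # 0, replaces missing with a default

# nothing signals "no value", e.g. when a search finds no match
r = findfirst(isequal(10), [1, 2, 3])
n = r === nothing                # true
```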


I am a Machine Learning developer and Data Scientist Team Lead. I use R, Python, Julia, and Scala daily and have extensive experience with machine learning both at a user level and at a developer level. I develop a number of p...


Wouter den Hollander

6 years ago

Why not x1 <- x[1:(l-3)] for Tip 2. :D
