Julia tips for R useRs
Tip 1 - You can use R code in Julia
Use the RCall package to run R code from within Julia easily, like this:
using Pkg
Pkg.add("RCall") # this is the equivalent of install.packages
using RCall
R"var1 <- 1+2" # note the capitalised R, r"" is used for regex
@rget var1 # now var1 contains the value 3
For multiline R code, simply wrap the code in R""" """. You can also push Julia variables into R using the @rput macro; see the example below.
using RCall
julia_x = 1:1_000_000
# this pushes the julia_x range and converts it into a vector in R
@rput julia_x
R"""
y <- rnorm(10^6)
var1 <- julia_x + y
"""
@rget var1
Tip 2 - Delete the last few elements of a vector/array using end
In R, to exclude the last few elements of a vector one would use code like this
l <- length(x)
x1 <- x[-((l-2):l)]
The above removes the last 3 elements from x and assigns the result to x1. In Julia one can use the end keyword to achieve the same result
x1 = x[1:end-3]
Note that there is also no need for parentheses on either side of : in Julia, because arithmetic operators bind more tightly than : does.
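As a small sketch of how end behaves inside an index (the vector x here is made-up example data), the same keyword also makes it easy to keep, rather than drop, the last few elements:

```julia
x = collect(1:10)

# Drop the last 3 elements; `end` stands for the last index of x
x1 = x[1:end-3]

# Keep only the last 3 elements
last3 = x[end-2:end]

println(x1)     # [1, 2, 3, 4, 5, 6, 7]
println(last3)  # [8, 9, 10]
```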
Tip 3 - Be patient with installation and first use of packages
In R, install.packages is usually fairly fast, and once you have run library(some_package) you can use the functions in the package right away.
Installing a package in Julia is not that fast yet, and this has been acknowledged as an issue (video link). Also, the first time you load a package via using Package1, Julia goes through a compilation step, which can seem to take a while if you are used to the convenience of R's library-and-go. But Julia's compilation step is designed to make subsequent usage faster.
Tip 4 - Get "vectorized" code with the . function suffix
In Julia, you can turn a function call on a single element into a function call on an array of elements by adding the . (dot) suffix to the function name. This is called broadcasting in Julia. For example
# call f on x
f(x)
# call f on each element of the array
f.([x1, x2, x3])
It's important to note that performance is not the main reason for broadcasting function calls, because sometimes a for-loop is faster in Julia. Broadcasting is a language feature that makes it easy to run a function over a vector/array of elements.
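To make the dot suffix more concrete, here is a minimal sketch. The function squareplus and its inputs are made up for illustration; the point is that a function written for scalars can be broadcast over several arguments at once, and that the @. macro adds the dots to every call and operator in an expression.

```julia
# A scalar function: it knows nothing about arrays
squareplus(a, b) = a^2 + b

xs = [1, 2, 3]

# Broadcast over one array and one scalar
r1 = squareplus.(xs, 10)             # [11, 14, 19]

# Broadcast over two arrays of the same length
r2 = squareplus.(xs, [10, 20, 30])   # [11, 24, 39]

# Operators take a leading dot; @. adds the dots for a whole expression
r3 = @. xs^2 + 10                    # same as xs .^ 2 .+ 10
```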
In R we are taught to use vectorized versions of our code for performance. For example instead of
x <- round(abs(rnorm(1000))*100000)
# pre-allocate a vector to store the results
res <- vector(mode = "numeric", length = length(x))
# using a for loop is not recommended as it is not vectorized
pt <- proc.time()
for(i in 1:length(x)) {
res[i] <- mean(runif(x[i]))
}
# about ~3 seconds
data.table::timetaken(pt)
one should use the vectorized version using sapply
# recommended vectorized approach
pt <- proc.time()
res1 <- sapply(x, function(xx) {
  mean(runif(xx))
})
# about ~1 second
data.table::timetaken(pt)
On my laptop the sapply approach is about 2~3 times faster.
The equivalent in Julia is
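using Statistics # mean lives in the Statistics standard library
# round each |z| * 100000 to an Int, broadcasting over the array
x = round.(Int, abs.(randn(1000)) .* 100_000)
function meanunif(xx)
    mean(rand(xx))
end
# for-loop Julia code
function meanunif_v(x)
    res = Vector{Float64}(undef, length(x)) # pre-allocate the results
    for (i, xx) in enumerate(x)
        res[i] = meanunif(xx)
    end
    return res
end
@time resA = meanunif_v(x)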
The "vectorized" version is
# "vectorized" Julia code
@time resB = meanunif.(x)
I will let you figure out the speed of Julia yourself.
Tip 5 - %>% has Julia equivalents
In R the excellent magrittr package provides the pipe operator %>% so one can chain functions together for readability; one may also use the %>>% operator from the pipeR package, which has some advantages.
Julia has similar functionality via the |> operator, which is less powerful as it can only deal with single-argument functions by default.
x = [0,1,2,5,3,1.5, 0, 0, -1]
x |> diff .|> sign |> diff .|> abs |> (v -> isequal.(v, 2)) |> sum |> (s -> isequal(s, 1))
The .|> operator is the vectorized form of |>, i.e. it applies the succeeding function to every element of the input that precedes it.
The equivalent in R is a bit more concise
library(magrittr)
# a function to detect if there is one and only one turning point in the values
x <- c(0,1,2,5,3,1.5, 0, 0, -1)
x %>% diff %>% sign %>% diff %>% abs %>% equals(2) %>% sum %>% equals(1)
Also, a lot of work has already gone into implementing better piping facilities in Julia. One notable example is Lazy.jl, which can recode the above as below
using Pkg
Pkg.add("Lazy")
using Lazy
x = [0,1,2,5,3,1.5, 0, 0, -1]
@> x diff sign diff abs xx->isequal.(xx,2) sum isequal(1)
There appears to be a bug(?) that prevents the use of the broadcast form isequal. inside @>, which hopefully will be solved soon so that the code can become
@> x diff sign diff abs isequal.(2) sum isequal(1)
As a side note, it is impossible for Julia to make |> as powerful as %>%, because %>% works like a Julia macro in that it takes code and modifies it. In Julia, ordinary code and macros have a clearer distinction in that macros need to be called with a @ prefix (e.g. @macro_name), hence the closest equivalent to %>% in Julia will necessarily be macro-based.
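Without any macro, the single-argument restriction of |> can often be worked around with functions that return functions. The sketch below assumes Julia 1.0 or later, where calling isequal with one argument returns a one-argument comparison function; the names is_two and one_turning_point are made up for this illustration.

```julia
x = [0, 1, 2, 5, 3, 1.5, 0, 0, -1]

# isequal(2) returns a one-argument function y -> isequal(y, 2)
is_two = isequal(2)

# Every stage is now a single-argument function, so plain |> and .|> suffice
one_turning_point = x |> diff .|> sign |> diff .|> abs .|> is_two |> sum |> isequal(1)

println(one_turning_point)   # true
```

This stays within ordinary function calls, so it is less flexible than a macro such as @>, but it needs no extra package.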
Tip 6 - The @time macro is the equivalent of system.time
You can use @time to gauge how fast a piece of code runs, similar to how system.time works in R
@time begin
randn(1000000) + randn(1000000)
end
function addxy(x,y)
x .+ y
end
@time addxy(randn(1000000), randn(1000000))
For more advanced benchmarking, you are advised to use BenchmarkTools (next tip) instead of the @time macro.
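One thing to keep in mind when timing by hand is that the first call of a function includes its compilation, as mentioned in Tip 3. A minimal sketch using @elapsed, which returns the elapsed seconds as a number instead of printing them, makes this visible. The variable names are made up, and the actual timings depend on your machine.

```julia
addxy(x, y) = x .+ y

a = randn(1_000_000)
b = randn(1_000_000)

# The first call includes the time to compile addxy for these argument types
t_first = @elapsed addxy(a, b)

# Later calls with the same argument types reuse the compiled code
t_second = @elapsed addxy(a, b)

println("first call:  ", t_first, " seconds")
println("second call: ", t_second, " seconds")
```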
Tip 7 - Use @benchmark from BenchmarkTools.jl for benchmarking
function multxy(x,y)
x .* y
end
# Pkg.add("BenchmarkTools") #only needs to be run once
using BenchmarkTools
@benchmark multxy(randn(1000000), randn(1000000))
# BenchmarkTools.Trial:
# memory estimate: 762.94 MiB
# allocs estimate: 26
# --------------
# minimum time: 472.021 ms (0.77% GC)
# median time: 501.675 ms (0.75% GC)
# mean time: 519.823 ms (4.92% GC)
# maximum time: 629.391 ms (16.82% GC)
# --------------
# samples: 10
# evals/sample: 1
Tip 8 - Write large integers using _ as visual aid
In R, writing large integers requires some mental gymnastics and good eyesight, as you need to count the number of zeros carefully, e.g.
one_million <- 10000000L
In Julia (like in Ruby) you can use _ as a visual aid, so you can write
one_million = 1_000_000
Did you notice I accidentally put one too many 0 in the R code?
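The underscore is purely visual and does not change the value. As a quick sketch, it can also be used in floating-point and hexadecimal literals (the variable names tiny and mask are made up):

```julia
one_million = 1_000_000
println(one_million == 10^6)       # true

# Underscores also work in floating-point and hexadecimal literals
tiny = 0.000_000_005
mask = 0xdead_beef
println(tiny == 0.000000005)       # true
println(mask == 0xdeadbeef)        # true
```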
Tip 9 - data.frame is DataFrame
You can load the DataFrames.jl package via using DataFrames. It's Julia's equivalent of data.frame
Tip 10 - How does a:b work?
Firstly, a:b does not generate a vector automatically; it generates a range. E.g. typeof(8:88) is UnitRange{Int64}. To convert it to a vector you can use collect(a:b).
You also do not need parentheses on either side of :, e.g. 1+1:2+2 will give 2:4, whereas in R it will give c(4,5).
Also, 5:2 counts down in R, but in Julia it is an empty range; to count down you need to explicitly give a negative step, as in 5:-1:2.
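The points above can be checked with a short sketch like the following (the printed type assumes a 64-bit machine, where Int is Int64):

```julia
r = 8:88
println(typeof(r))          # UnitRange{Int64}

v = collect(1:5)            # materialise the range as a Vector
println(v)                  # [1, 2, 3, 4, 5]

# Arithmetic binds more tightly than `:`, so this is (1+1):(2+2)
println(1+1:2+2)            # 2:4

# A "backwards" range without a step is empty rather than descending
println(length(5:2))        # 0
println(collect(5:-1:2))    # [5, 4, 3, 2]
```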
Tip 11 - There is a built-in hash-table structure
In R the best way to build a hash-table structure is to rely on environments, as other native structures are not hash-based and do not offer O(1) indexing. Julia, like Python, provides a Dict type that is essentially a hash table. You can construct one from key-value pairs, each written as key => value; the keys below are Symbols such as :item1. E.g.
Dict(:item1 => 1, :item2 => 2)
will declare a Dict with :item1 and :item2 as keys and 1 and 2 as values.
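A brief sketch of everyday Dict usage follows. The contents of ages are hypothetical example data; haskey and get are the usual ways to deal with keys that may be absent.

```julia
d = Dict(:item1 => 1, :item2 => 2)

d[:item3] = 3                     # insert a new key
println(d[:item1])                # 1
println(haskey(d, :item9))        # false
println(get(d, :item9, 0))        # 0, the default for an absent key

# Keys need not be Symbols; strings work too
ages = Dict("alice" => 31, "bob" => 27)   # hypothetical data
for (name, age) in ages           # iteration order is not guaranteed
    println(name, " is ", age)
end
```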
Tip 12 - List comprehension depends on the iterable interface
Julia has borrowed Python's list comprehension syntax and, in essence, has R's itertools built in. In R a for loop can "comprehend" vectors and lists, e.g. in the code below a will take the values item1 and item2 inside the loop body
for(a in list(item1,item2)) {
doSomething(a)
}
In Julia
for a in any_instance_of_iterable_type
doSomething(a)
end
where any_instance_of_iterable_type can be an instance of any type that defines the required iterate method. Specifically, vectors, ranges, and Dicts are all iterable.
One thing to note is that Julia, unlike languages such as Go or Java, does not have a formal way to declare and "enforce" an interface. So there is no guarantee that any_instance_of_iterable_type has all the iterable methods implemented. On the other hand, the moment you implement iterate for your type, it can be used in for loops and comprehensions. It's a bit lax but flexible.
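As a minimal sketch of what implementing iterate looks like, the Countdown type below is a hypothetical type invented for this illustration. Defining length as well is optional for plain for loops, but collect and comprehensions rely on it by default.

```julia
# Countdown is a hypothetical type used only for this illustration
struct Countdown
    from::Int
end

# iterate returns (item, next_state), or nothing when the iteration is done.
# The default value of `state` covers the first call, iterate(c).
Base.iterate(c::Countdown, state = c.from) =
    state < 1 ? nothing : (state, state - 1)

# Lets collect and comprehensions pre-size their result
Base.length(c::Countdown) = max(c.from, 0)
Base.eltype(::Type{Countdown}) = Int

for i in Countdown(3)
    println(i)          # prints 3, then 2, then 1
end

squares = [i^2 for i in Countdown(3)]
println(squares)        # [9, 4, 1]
```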
Tip 13 - NULL is nothing and NA is missing
R's NULL and NA have Julia equivalents in nothing and missing respectively.
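The sketch below shows how the two values behave in practice, assuming Julia 1.0 or later. The vector v is made-up data, and skipmissing is roughly comparable in spirit to na.rm = TRUE in R.

```julia
using Statistics   # for mean

# nothing plays the role of R's NULL: "no value here"
result = findfirst(isequal(5), [1, 2, 3])   # there is no 5 in the vector
println(result === nothing)                 # true

# missing plays the role of R's NA and propagates through arithmetic
v = [1, missing, 3]
println(ismissing(v[2]))                    # true
println(ismissing(1 + missing))             # true
println(ismissing(sum(v)))                  # true

# skipmissing iterates over the non-missing values only
println(sum(skipmissing(v)))                # 4
println(mean(skipmissing(v)))               # 2.0
```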
Why not x1 <- x[1:(l-3)] for Tip 2. :D