Interview with Ray Phan: MATLAB & Image Processing Expert
Codementor Raymond Phan is an image processing expert who received a scholarship from the Canadian government for his work.
Ray developed a technology that sped up 2-D to 3-D footage conversion and thus made Hollywood quite happy indeed, as his work cut the conversion time for 20 minutes of footage from a month to a week.
Tell us a bit about yourself.
I’m an academic at heart. I graduated with my PhD in Electrical and Computer Engineering from Ryerson University in Toronto, Ontario, Canada last October (2013). My interests lie in multimedia signal processing (image & video), computer vision, machine learning and numerical methods. I graduated with a Bachelor’s in Computer Engineering in 2006 and a Master’s in Electrical and Computer Engineering in 2008, also from Ryerson. I did 11 years of school at the same university! My academic career has been pretty exciting. I received many prestigious honours, including the Ryerson Gold Medal, which is the most prestigious award a student can receive. It recognizes contributions made to the university through research, extra-curricular activities and other things that benefitted Ryerson. I was also a Vanier Graduate Scholar, which is the most prestigious Ph.D. scholarship to be received in Canada. It’s almost like the Fulbright scholarship based in the US, but this is specifically for Canadian Ph.D. students to study in Canada.
I went the academic route because I originally wanted to pursue a career in academia. I love doing research and also found a passion for teaching. I served as a part-time instructor in my department for about 5 years, teaching in a variety of areas from communication and control systems, to first year calculus and linear algebra, to applied mathematics courses (image processing, numerical analysis, etc.). However, by the time I graduated, I had been in research for such a long time that I figured it was time to try something else. Instead of staying in academia, I decided to go into industry to get better exposure to the real world, and if I ever decide to go back into academia, I’ll have a better perspective of what’s going on out there (or here actually) to further improve my teaching.
I currently work for Bublcam, a company that is creating the first ever commercially available 360 degree spherical capturing camera that is roughly the size of a baseball. Think of the Google Maps capture truck, but fitting into the palm of your hand. I’m currently the Lead Software Developer in Research and Algorithms, where I sit right in the middle between the hardware that captures the content on one side, and the software on the other side where the user ultimately sees the content through various user interfaces. My specific task is the research and design of algorithms that take the content from each camera viewpoint and dynamically stitch it together in a seamless way, so that the content appears as an immersive 360 degree experience. I also work on other areas of algorithm development that aren’t part of the core stitching methods, but that isn’t important right now.
At the end of the day, I’m an avid guitar player, I love to watch movies and I’m a huge gamer. I’m also a Coursera.org addict, and I love to learn new things. I have probably taken over 50 courses through that platform. One cool fact is that I learned Python through multiple Coursera courses, and I’m actually using Python right now at work as part of one of our conversion and processing pipelines.
Of all the projects you’re working on, which one are you most excited about?
I would have to say that my work with Bublcam is the most exciting project I’ve had the pleasure of working on. It has been quite a rush because Bublcam is a startup company, and so I’m involved with a lot of the core decisions with regard to the design of the product, as well as how the development of the product will be steered. On top of this, it really puts my training to the test, as it uses everything that I’ve learned through research and through my academic training. It has also kept my programming skills in check, and I’ve learned a lot of different technologies and applied them to various coding projects. These include Android development, some statistical analysis in R and image and video processing with Python.
What’s your proudest achievement so far?
I would have to say achieving the Vanier Graduate Scholarship. The Canadian government felt that my PhD research was important enough to Canadian society and its citizens that they awarded me such a prestigious honour, so that I could continue my research and have all of the resources I needed to finish my work. My PhD research focused on efficiently converting single-view 2D video into artificially created stereoscopic 3D content. Current methods are either too inaccurate because they’re automatic systems with no provision for error correction or user input, or they’re accurate but slow because they’re more manual methods and very time consuming.
My work bridged the gap between automated and manual methods by placing the user / human in the loop, but in a non-obstructive and more intuitive way.
With the user input, I used computer vision and signal processing methods and also developed my own algorithms to make the output as accurate as possible. This project has received a lot of attention, including a spot in Toronto Life magazine, coverage in the Toronto Star newspaper, as well as various articles in the Ryerson newsletters.
What tools do you use to get things done?
That totally depends on what programming languages and platforms I’m using.
- On Windows, I mostly develop in C++ and MATLAB. In that case, I’ll use Visual Studio with the Visual C++ compiler, and for MATLAB, I’ll just use the IDE and debugging tools that come with MATLAB.
- On Mac OS, I usually develop Android applications, with some smatterings of Python, R and MATLAB. For Android applications, I just use Eclipse with the Android Development Tools plugin. For Python, I use IPython to do some interactive computing, as well as NumPy and SciPy for numerically intensive methods and mathematically related tasks. For image processing and computer vision tasks, I’ll use Python OpenCV, as well as MATLAB when I feel that it’s more suitable to use that environment. As for editors, on Mac OS I’ll use Sublime Text as it has a lot of great features to help create code quickly. I use Sublime for all development, except for Android where I’ll just use Eclipse.
- On Linux, I predominantly code in Python and C, and so I’ll use IPython with NumPy/SciPy and the GCC compiler for C. As for editors on Linux, I go to the classic Vi, or if I’m coding late at night and want some colour, I’ll use Vim. If I want to demonstrate something, I’ll use Emacs, though some Linux purists may be cringing at this point in time.
What’s your favorite hack or what are some of the tricks you use to make your life easier?
One of the best things I’ve learned in my programming career is bitshifting. One of the cheapest ways to multiply or divide a number by a power of two is to bitshift to the left or right, which requires almost no work in comparison to actually multiplying or dividing. Bitshifting to the left by n bits multiplies your number by 2^n, and bitshifting to the right by n bits divides it by 2^n (discarding any remainder). With this, my favourite hack would be converting a colour pixel to grayscale with bitshifting. Let’s say you had a colour pixel with 8 bits per channel, so each value is an integer in the range [0,255]. A colour pixel is usually represented using red, green and blue, and a combination of these three gives you the colour visualized on the screen. Using the SMPTE standard, converting from colour to its grayscale (black and white) equivalent is:
out = 0.299*red + 0.587*green + 0.114*blue;
With this formula, you would have to convert your colours to a floating point type first, and then convert the output grayscale value back to an unsigned 8-bit integer. To get around this, you can stick with integer arithmetic and bitshifting to get an approximate but very close answer. You can replace the above statement with:
out = (77*red + 150*green + 29*blue) >> 8
>> 8 bit shifts a number to the right by 8 bits, which is equivalent to dividing by 256. Also, 77/256 is about 0.301 (close to 0.299), 150/256 is about 0.586 (close to 0.587) and 29/256 is about 0.113 (close to 0.114). You may get an inaccuracy of a couple of grayscale levels off what the true value is, but perceptually you won’t be able to notice the difference. The above operation is much cheaper in comparison to doing floating point multiplication, and on top of this, you don’t need to convert back and forth between integer and double!
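As a minimal sketch of the trick above, here it is in Python. The function names gray_float and gray_shift and the sample pixels are made up for illustration; only the weights and the shift come from the text. It prints both results side by side so you can compare them yourself.

```python
def gray_float(red, green, blue):
    """Reference conversion using the floating point weights from the text."""
    return int(round(0.299 * red + 0.587 * green + 0.114 * blue))


def gray_shift(red, green, blue):
    """Integer-only approximation: weights scaled by 256, then >> 8 divides by 256."""
    return (77 * red + 150 * green + 29 * blue) >> 8


# Shifting left by n bits multiplies by 2**n;
# shifting right by n bits divides by 2**n and drops the remainder.
print(5 << 3)   # same as 5 * 8
print(40 >> 3)  # same as 40 // 8

# Hypothetical sample pixels (red, green, blue), each channel in [0, 255].
for pixel in [(0, 0, 0), (255, 255, 255), (255, 0, 0), (12, 200, 97)]:
    print(pixel, gray_float(*pixel), gray_shift(*pixel))
```

Because 77 + 150 + 29 adds up to exactly 256, a pure white pixel still maps to 255 and the result never leaves the 8-bit range.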
What do you wish someone had told you when you first started coding?
Coding is definitely not for everyone, and it’s one of those things that takes patience and dedication to learn. Start learning how to code on your own first, before you start learning from someone else, whether it be an instructor, a professor or even a friend. The reason is that when you learn on your own, you get to learn things on your own time and in a way that you are able to understand. By learning from someone else, you are bound by their best practices, which may or may not be the best thing for you. When you learn on your own, you’re able to develop your own sense of style, and though this is still a subject of debate, coding is a great art form that shouldn’t be taken lightly.
Also, it is never too late to start learning how to code. It’s one of the most rewarding skills you could ever learn as the breadth and depth of programming languages as well as the various accepted techniques that are put into practice constantly change, so it keeps you learning and keeps you up to date.
What’s your approach to helping someone with a coding problem?
I first ask them what they generally want to accomplish, or what task they need to code up and solve. Once I figure out what they want, I ask them a bunch of questions, ranging from what they have tried, to whether they have tried these libraries or these technologies, to whether they’re familiar with this algorithm and so on. Once we figure this out, it then depends on what the person has already accomplished. If it’s a debugging problem, I’d walk through the code with them line-by-line until we found the source of the error. I’d tell them why the error is being caused, then write a quick solution.
What I will also do is let the person fill in the blanks. I’ll do some of the work, but I want the other person to learn too, so I’ll ask them questions and see if they know the answer. That way, this will reinforce what they’re learning in addition to helping them get their problem solved!
When people ask questions on StackOverflow, the questions I see fall into one of two categories:
- Debugging questions: Those who have written code and can’t figure out why they are not seeing the right results, or why they keep seeing a particular error.
- Algorithmic questions: Those who have a particular goal in mind and have some initial code set up, but don’t quite know how to proceed in completing a task, or finishing up coding an algorithm.
When it comes to debugging questions, I always locate the line that the error is encountered in (if they haven’t noticed already). Once I do that, I point them to the right docs on the functions that I will be using to help solve their problem so they can look it up in further detail, and I then write code that fixes their problem. After this, I give some general advice on what to do next, or if they want to extend the code into something more complicated.
When it comes to algorithmic questions, I give a brief overview of what I’m doing to help solve their problem, and then break up each part of the algorithm into steps. For each step, I show a snippet of code and any intermediate results that come up. I also give them reasons why I decided to make certain design choices, and why this approach would be better than other approaches. For any functions that are rather complicated, I give an explanation of what’s going on so they can understand what’s happening if they decide to use the code. Once this is all done, I then put the entire algorithm in a single code block so the question poser can copy and paste the code into their IDE / environment and run it so that they can reproduce the results on their own end.
The best way to learn how to code is to learn those concepts that are new to you, then code up problems that are related to those concepts that interest you.
Figure out the right algorithms or research how to accomplish the task, then write some code. Also, the way to become good at coding is to keep coding. The more you code, the better you’ll be at it! I also find it very informative to read up on how people code up solutions. You get to see their sense of style and you get to learn all of their neat tricks and tips that you could ultimately integrate into your code. Bitshifting was one of the things I learned on my own, and when reading other people’s code too!
What brought you to Codementor?
I was answering so many StackOverflow questions that some of the community encouraged me to help solve coding problems in a more serious way… and it doesn’t hurt to get some extra cash on the side too! Several of them pointed me to Codementor. I took a look at the website and loved the concept, and the fact that you could get real-time 1-on-1 help for coding problems and be able to interact with the person helping you. I wish this platform had been around when I was starting to learn how to code! Fully endorsing what Codementor stands for, I decided to sign up and be a mentor. I’ve had some sessions so far, and it makes me really happy to help people solve their problems. This is as close as I can get to teaching, but I’m able to do it on my schedule and without any other academic obligations. I will definitely stick around and be a Codementor to help others!