R parallelism and multithreading
January 25th, 2016, 03:47 AM, #11
R is slow largely because its garbage collector is backed by circular doubly linked lists, which are slow to traverse and likely to wipe out your CPU cache on each GC cycle. The GC is also triggered very frequently, because R creates many redundant temporary objects under the hood that become garbage almost immediately. Some of my colleagues use R extensively, so we maintain a custom R kernel that mitigates this issue. We found that, in most cases, the performance gain from rewriting R's generational GC exceeds that of naive parallelism with map-reduce constructs.
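To make the point about temporary objects concrete, here is a rough sketch in stock R (not our custom kernel). The function names `grow` and `vec` are made up for illustration, and the timings you see will depend on your machine and R version. Growing a vector with `c()` copies it on every iteration, so each pass leaves a dead temporary behind for the GC, while the vectorized form allocates the result once.

```r
# Hypothetical helpers for illustration only.

# Grows the result one element at a time: every c() call allocates a
# new vector and abandons the old one, which the GC must later sweep.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Vectorized equivalent: the result is allocated in one go.
vec <- function(n) seq_len(n)^2

n <- 20000

# gc.time() reports cumulative time R has spent in garbage collection;
# the difference before/after gives a rough feel for GC pressure.
before <- gc.time()
a <- grow(n)
gc_grow <- gc.time() - before

before <- gc.time()
b <- vec(n)
gc_vec <- gc.time() - before

stopifnot(identical(a, b))  # same answer either way
print(gc_grow)
print(gc_vec)
```

This only shows where the temporaries come from in user code. It says nothing about the GC internals themselves, which you would have to change at the interpreter level.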
Back when I knew very little about R, I felt that it lacked language constructs suitable for parallelism. Having spent more time reading R code written by people more experienced than I am, I've realized that it is actually inherently parallel, provided you are comfortable with a functional paradigm.
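As a minimal sketch of what I mean by that, using only the `parallel` package that ships with R: once the work is written as a pure function applied over a list, the sequential and parallel versions have almost the same shape. The function `slow_square` and the choice of 2 workers are placeholders I picked for the example, not anything from real code.

```r
library(parallel)

# Hypothetical per-element workload: a pure function of one input,
# with no shared mutable state.
slow_square <- function(x) {
  Sys.sleep(0.01)  # stand-in for real computation
  x^2
}

inputs <- 1:8

# Sequential, functional style: no loop counter, no accumulator.
seq_result <- lapply(inputs, slow_square)

# Parallel version: the same call shape, spread over worker processes.
cl <- makeCluster(2)  # 2 workers, chosen arbitrarily
par_result <- parLapply(cl, inputs, slow_square)
stopCluster(cl)

# Both versions return the same list of results.
stopifnot(identical(seq_result, par_result))
```

Because `slow_square` does not touch anything outside its argument, swapping `lapply` for `parLapply` does not change the result. Code written with explicit loops and shared accumulators usually needs restructuring before you can do the same.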
I would encourage people to use whatever language they're most comfortable learning, and worry about performance optimization later.