
Parallel R: Data Analysis in the Distributed World, EPUB eBook



Description

It's tough to argue with R as a high-quality, cross-platform, open source statistical software product, unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets, including three chapters on using R and Hadoop together. You'll learn the basics of Snow, Multicore, Parallel, Segue, RHIPE, and Hadoop Streaming, including how to find them, how to use them, when they work well, and when they don't.

With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.

  • Snow: works well in a traditional cluster environment
  • Multicore: popular for multiprocessor and multicore computers
  • Parallel: part of the upcoming R 2.14.0 release
  • R+Hadoop: provides low-level access to a popular form of cluster computing
  • RHIPE: uses Hadoop's power with R's language and interactive shell
  • Segue: lets you use Elastic MapReduce as a backend for lapply-style operations
