Infographic: Apache Spark Incredibly Faster than MapReduce and Mahout. Spark is not tied to the two-stage MapReduce paradigm and promises performance up to 100 times faster than Hadoop MapReduce for certain applications.
Python for Apache Spark. The pros and cons of using Scala vs. Python for programming against Apache Spark to solve big data problems.
Reviews of 'Apache Spark 2.0 with Scala' for learning Apache Spark | Hackr.io
Reviews of 'Taming Big Data with Apache Spark and Python' for learning Apache Spark | Hackr.io
Reviews of 'Spark and Python for Big Data with PySpark' for learning Apache Spark | Hackr.io
Reviews of 'Spark Screencasts' for learning Apache Spark | Hackr.io
Reviews of 'Apache Spark in Python: Beginner's Guide' for learning Apache Spark | Hackr.io
Building a natural language processing library for Apache Spark. The O'Reilly Data Show Podcast: David Talby on a new NLP library for Spark and why model development starts after a model gets deployed to production. When I first discovered and started using Apache Spark, a majority of the use cases I used it for involved unstructured text. The absence of libraries meant rolling my own NLP utilities and, in many cases, implementing a machine learning library (this was pre-deep learning and MLlib…