Running Spark 1.3 on Hadoop/YARN 2.4.0
Go to the Spark installation directory:
cd /usr/local/spark
then run the following command to start the Spark shell:
./bin/spark-shell
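Since this post is about running Spark on Hadoop/YARN 2.4.0, the shell can also be launched against the YARN cluster instead of local mode. A minimal sketch, assuming HADOOP_CONF_DIR points at your Hadoop configuration directory (the path below is an assumption, adjust it to your install):
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop   # assumed Hadoop config location
./bin/spark-shell --master yarn-client                # run the shell with executors on YARN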
Spark’s primary abstraction is a distributed collection of items called a Resilient Distributed Dataset (RDD). RDDs can be created from Hadoop InputFormats (such as HDFS files) or by transforming other RDDs. Let’s make a new RDD from the text of the README file in the Spark source directory:
scala> val textFile = sc.textFile("README.md")
textFile: spark.RDD[String] = spark.MappedRDD@2ee9b6e3
scala> textFile.count() // Number of items in this RDD
res0: Long = 98
scala> textFile.first() // First item in this RDD
res1: String = # Apache Spark
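Besides reading files, an RDD can also be built from an existing Scala collection using sc.parallelize. This is not part of the original quick-start steps, just a small sketch to show the alternative (output values are from a hypothetical session):
scala> val numbers = sc.parallelize(1 to 100)   // distribute a local collection as an RDD
scala> numbers.count()                          // Number of items in this RDD
res: Long = 100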
Now let's use the filter transformation to return a new RDD with a subset of the items in the file.
scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4af09
scala> textFile.filter(line => line.contains("Spark")).count() // How many lines contain "Spark"?
res3: Long = 15
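Transformations and actions can also be chained together. For example, a sketch in the same style (not shown in the original walkthrough) that finds the largest number of words on a single line of the README:
scala> textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)   // map each line to its word count, then reduce to the maximum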
Reference: http://spark.apache.org/docs/latest/quick-start.html