
Convert MapReduce code to Spark code

Mayank_Porwal (Pune, India), Member Posts: 1

Hi All,
I need to write a MapReduce-style job that runs on Spark. Could someone provide a job that calculates the SUM of a specific column for the file below?

b,t,b,ger,abl,djj,135,02,qbdas
a,l,p,vlo,mkn,oar,019,15,aaaaf
a,o,a,ndf,vvv,aeg,225,98,aynjn
w,i,s,zyb,amf,sqq,057,35,wsmhr
b,t,b,ger,abl,djj,135,02,qbdas
a,l,p,vlo,mkn,oar,019,15,aaaaf
a,o,a,ndf,vvv,aeg,225,98,aynjn

This is a 10 GB file stored in HDFS on a 10-node cluster.

I need the sum of column 7 for each distinct value in column 1.

Expected output:

b,270
a,488
w,57

I've already done this in Java with MapReduce. Can someone help me with similar Java code that can be executed on Spark?
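As a starting point, here is a sketch of the aggregation. The plain-Java helper below implements the exact per-record logic (key on field 1, sum field 7) and runs on the sample rows; the commented-out snippet shows how the same logic maps onto Spark's `mapToPair`/`reduceByKey`. The HDFS paths and variable names in the Spark part are placeholders, not taken from the original post.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnSum {

    // Sum field 7 (index 6) grouped by field 1 (index 0).
    // This is the same aggregation that Spark performs with
    // mapToPair(...).reduceByKey(Integer::sum).
    static Map<String, Integer> sumByKey(List<String> lines) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            sums.merge(f[0], Integer.parseInt(f[6]), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "b,t,b,ger,abl,djj,135,02,qbdas",
            "a,l,p,vlo,mkn,oar,019,15,aaaaf",
            "a,o,a,ndf,vvv,aeg,225,98,aynjn",
            "w,i,s,zyb,amf,sqq,057,35,wsmhr",
            "b,t,b,ger,abl,djj,135,02,qbdas",
            "a,l,p,vlo,mkn,oar,019,15,aaaaf",
            "a,o,a,ndf,vvv,aeg,225,98,aynjn");
        System.out.println(sumByKey(sample)); // prints {b=270, a=488, w=57}
    }

    /* The distributed version, assuming spark-core is on the classpath
       (hdfs paths below are placeholders):

       SparkConf conf = new SparkConf().setAppName("ColumnSum");
       JavaSparkContext sc = new JavaSparkContext(conf);

       JavaPairRDD<String, Integer> sums = sc
           .textFile("hdfs:///path/to/input")
           .mapToPair(line -> {
               String[] f = line.split(",");
               return new Tuple2<>(f[0], Integer.parseInt(f[6]));
           })
           .reduceByKey(Integer::sum);

       sums.saveAsTextFile("hdfs:///path/to/output");
    */
}
```

Because `reduceByKey` combines values on each node before shuffling, this stays efficient even for a 10 GB input spread across 10 nodes.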
