Convert MapReduce code to Spark code

Mayank_Porwal (Pune, India)

Hi All,
I need to write a MapReduce-style job that can run on Spark. Can someone please provide a job that calculates a SUM over a specific column for the file below:

b,t,b,ger,abl,djj,135,02,qbdas
a,l,p,vlo,mkn,oar,019,15,aaaaf
a,o,a,ndf,vvv,aeg,225,98,aynjn
w,i,s,zyb,amf,sqq,057,35,wsmhr
b,t,b,ger,abl,djj,135,02,qbdas
a,l,p,vlo,mkn,oar,019,15,aaaaf
a,o,a,ndf,vvv,aeg,225,98,aynjn

This is a 10 GB file stored on an HDFS cluster of 10 nodes.

I need to find the sum of column 7 for each distinct value in column 1.

Output:

b,270
a,488
w,57

I've already done this in MapReduce using Java. Can someone help me with equivalent Java code that can be executed on Spark?
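A minimal sketch of what such a job could look like with Spark's Java RDD API: `mapToPair` plays the role of the mapper (emit `(column1, column7)`) and `reduceByKey` the role of the reducer (sum per key). The class name, input and output HDFS paths are placeholders, not anything from the original post; values like `019` parse cleanly because `Integer.parseInt` ignores leading zeros.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ColumnSum {
    // Pure helpers so the parsing logic is testable without a cluster.
    static String key(String line) {
        return line.split(",")[0];          // column 1 (0-based index 0)
    }

    static int value(String line) {
        return Integer.parseInt(line.split(",")[6]);  // column 7 (0-based index 6)
    }

    public static void main(String[] args) {
        // Master URL is supplied by spark-submit, so it is not set here.
        SparkConf conf = new SparkConf().setAppName("ColumnSum");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<String, Integer> sums = sc
                .textFile("hdfs:///path/to/input.csv")        // placeholder path
                .mapToPair(l -> new Tuple2<>(key(l), value(l)))
                .reduceByKey(Integer::sum);                   // sum column 7 per key

            sums.map(t -> t._1 + "," + t._2)
                .saveAsTextFile("hdfs:///path/to/output");    // placeholder path
        }
    }
}
```

You would package this as a jar and launch it with `spark-submit --class ColumnSum`. On a 10-node cluster Spark will read the HDFS blocks in parallel, the same way the MapReduce version's input splits would be distributed.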
