I need to write a MapReduce-style job that runs on Spark. Can someone provide a job that calculates a SUM over a specific column of the file below:
b,t,b,ger,abl,djj,135,02,qbdas
a,l,p,vlo,mkn,oar,019,15,aaaaf
a,o,a,ndf,vvv,aeg,225,98,aynjn
w,i,s,zyb,amf,sqq,057,35,wsmhr
b,t,b,ger,abl,djj,135,02,qbdas
a,l,p,vlo,mkn,oar,019,15,aaaaf
a,o,a,ndf,vvv,aeg,225,98,aynjn
This is a 10GB file stored on an HDFS cluster of 10 nodes.
I need to find the sum of column 7 for each distinct value in column 1.
I've already done this in MapReduce using Java. Can someone help me with equivalent Java code that can be executed on Spark?
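Something along these lines is what I'm after; this is only a rough sketch using Spark's Java RDD API (`mapToPair` + `reduceByKey`), and the input path, app name, and class name are placeholders:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ColumnSum {
    public static void main(String[] args) {
        // "local[*]" is for local testing only; drop setMaster when
        // submitting to the cluster via spark-submit.
        SparkConf conf = new SparkConf()
                .setAppName("ColumnSum")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        sc.textFile("hdfs:///path/to/input")   // placeholder path
          .mapToPair(line -> {
              String[] f = line.split(",");
              // key = column 1 (index 0), value = column 7 (index 6)
              return new Tuple2<>(f[0], Long.parseLong(f[6]));
          })
          .reduceByKey(Long::sum)              // sum values per key
          .collect()
          .forEach(t -> System.out.println(t._1 + "\t" + t._2));

        sc.stop();
    }
}
```

The `mapToPair` step plays the role of the MapReduce mapper (emitting key/value pairs) and `reduceByKey` the role of the reducer, so the structure should map one-to-one onto the existing MapReduce job.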