Codementor Community (updated 2022-12-10T00:35:23+00:00)
https://www.codementor.io/community/topic/apache-spark

How to make joins in Spark Dataset API more type-safe.
https://www.codementor.io/vsimko/how-to-make-joins-in-spark-dataset-api-more-type-safe-1zwki1jjiq
Simplify type-safe outer joins in Apache Spark. Discover practical tips and advice for optimizing your code, overcoming common challenges, and taking your Spark development skills to the next level.
Posted 2022-12-10T00:35:23+00:00 by Dr. Viliam Simko

Create your Own %Magic Commands in Databricks
https://www.codementor.io/fusionet24/create-your-own-magic-commands-in-databricks-1w8y1pdpqe
Ever wanted to build your own %Magic commands in Databricks? Here is a handy guide on how to do that with Databricks Runtime 11.x+.
Posted 2022-08-26T13:46:14+00:00 by Scott Bell

STL and Holt from R to SparkR. To scale our machine learning… (by Shubham Raizada, Walmart Global Tech Blog, Medium)
https://www.codementor.io/raizsh/stl-and-holt-from-r-to-sparkr-to-scale-our-machine-learning-by-shubham-raizada-walmart-global-tech-blog-medium-1p4vuajbp2
Scaling machine learning algorithms in R with SparkR.
Posted 2021-12-31T14:50:19+00:00 by Shubham Raizada

Setting up Isolated Virtual Environments in SparkR
https://www.codementor.io/raizsh/setting-up-isolated-virtual-environments-in-sparkr-1p4vnur110
Motivation
With the increasing adoption of Spark for scaling ML pipelines, being able to install and deploy our own R libraries becomes especially important if we want to use UDFs.
In my previous...
Posted 2021-12-31T14:46:14+00:00 by Shubham Raizada

Writing tests for your spark code using FunSuite
https://www.codementor.io/phanikumaryadavilli/writing-tests-for-your-spark-code-using-funsuite-1kb9bty681
One of the questions most frequently asked on Stack Overflow and other forums by data engineers who build their data pipelines with Apache Spark is how to write test cases for them.
In this write-up,...
Posted 2021-07-30T20:00:25+00:00 by Phani Kumar Yadavilli

Apache Spark Java Tutorial: Simplest Guide to Get Started
https://www.codementor.io/martarey/apache-spark-java-tutorial-simplest-guide-to-get-started-1c1r38j7e6
Apache Spark Java Tutorial. Learn how to write a simple Spark application. No previous knowledge of Apache Spark is required.
Posted 2020-11-09T07:53:02+00:00 by Marta Rey

Enabling Spark UI and Ganglia for EMR Cluster
https://www.codementor.io/kushwahamit2016/enabling-spark-ui-and-ganglia-for-emr-cluster-xeqx6xkcf
Configure the Spark UI and Ganglia for an EMR cluster so you can view them in your browser.
Posted 2019-08-01T11:51:26+00:00 by Amit Kushwaha

Multi-Class Image Classification Using Transfer Learning With PySpark
https://www.codementor.io/innat_2k14/transfer-learning-with-pyspark-x5j8tpsn4
A demonstration, on a computer vision problem, of the power of combining two state-of-the-art technologies: deep learning and Apache Spark.
Posted 2019-07-23T09:23:42+00:00 by Mohammed Innat

Apache Spark vs Hadoop: Choosing the Right Framework
https://www.codementor.io/shubham853/apache-spark-vs-hadoop-choosing-the-right-framework-wof0yctt0
This blog post compares Apache Spark and Hadoop. It will give you an idea of which Big Data framework is the right choice in different scenarios.
Posted 2019-07-08T05:11:43+00:00 by Shubham Sinha

Click Through Rate Analysis using Spark
https://www.codementor.io/ayushpandey/click-through-rate-analysis-using-spark-vdel78grx
In recent years, programmatic advertising has been taking over the online advertising industry. To enable the automatic selling and purchasing of ad impressions between advertisers and publishers through...
Posted 2019-05-27T13:41:39+00:00 by Ayush Pandey

Building Machine Learning Data Pipeline using Apache Spark
https://www.codementor.io/ayushpandey/building-machine-learning-data-pipeline-using-apache-spark-umfe20qtn
Apache Spark is becoming increasingly popular in data science because of its ability to handle huge datasets and its capability to run computations in memory, which is...
Posted 2019-05-23T12:33:31+00:00 by Ayush Pandey

BigData, Cloud, ETL, SQL & NoSQL DB, Spark, Hive, Python, Scala, Kafka
https://www.codementor.io/bharathkumarreddydv/bigdata-cloud-etl-sql-nosql-db-spark-hive-python-scala-kafka-u2qs8lji3
Hands-on experience with Big Data, AWS, ETL, DWH, SQL and NoSQL databases, Kafka, and tuning clusters and applications to increase resource utilization while reducing time and expense...
Posted 2019-04-16T05:58:42+00:00 by Bharath Kumar Reddy D V

Apache Spark vs Hadoop: Choosing the Right Framework
https://www.codementor.io/shubham853/apache-spark-vs-hadoop-choosing-the-right-framework-t1huf01js
This blog post compares Apache Spark and Hadoop. It will give you an idea of which Big Data framework is the right choice in different scenarios.
Posted 2019-03-14T06:07:19+00:00 by Shubham Sinha

PySpark RDD - Backbone of PySpark
https://www.codementor.io/swateechand/pyspark-rdd-backbone-of-pyspark-srenzaa51
This PySpark RDD article covers RDDs, the building blocks of PySpark. It also explains various RDD operations and commands, along with a use case.
Posted 2019-03-05T07:53:39+00:00 by Swatee

Apache Spark Architecture | Distributed System Architecture Explained
https://www.codementor.io/nehavaidya/apache-spark-architecture-distributed-system-architecture-explained-sq8otgscp
This article on "Spark Architecture" will help you understand the components of the Spark ecosystem and give you a brief insight into Apache Spark's architecture.
Posted 2019-03-04T10:06:58+00:00 by Neha

How to set up PySpark for your Jupyter notebook
https://www.codementor.io/tirthajyotisarkar/how-to-set-up-pyspark-for-your-jupyter-notebook-p8dcfaxhz
In this brief tutorial, we go step by step through how to set up PySpark and all its dependencies on your system, and then how to integrate it with Jupyter notebook.
Posted 2018-11-12T19:01:29+00:00 by Tirthajyoti Sarkar

gradientgmm: Fast Gaussian Mixtures in Scala
https://www.codementor.io/nestorysanchez/gradientgmm-fast-gaussian-mixtures-in-scala-on9mdyzew
gradientgmm is a package for fast Gaussian mixture modelling in Scala through stochastic gradient descent. In this post, I show how it works.
Posted 2018-10-26T19:36:06+00:00 by Nestor Y. Sanchez

The Noob's Prelude to Hadoop: Part 1 - What is "Hadoop"???
https://www.codementor.io/josiah14/the-noob-s-prelude-to-hadoop-part-1-what-is-hadoop-gko650s5b
The Road Ahead: What This Series Will Cover. Any blog on Hadoop needs at least a brief introduction to the…
Posted 2018-02-09T17:03:57+00:00 by Josiah Berkebile

Docker 101
https://www.codementor.io/shubhamchaudhary/docker-101-delnbdrye
Set up a Spark cluster in Docker with 4 commands.
Posted 2017-10-31T16:09:53+00:00 by Shubham Chaudhary

Big Data Analysis Using PySpark
https://www.codementor.io/kunaldhawan93/big-data-analysis-using-pyspark-8y5yobb44
Learning Objectives
Introduction to PySpark
Understanding RDD, MapReduce
Sample Project - Movie Review Analysis
Why Spark
Lightning-Fast Processing
Real-Time Stream Processing
Ea...
Posted 2017-06-12T16:58:00+00:00 by Kunal Dhawan