BGDT1001 – Big Data and Spark
Length – 7 weeks, 21 hours
Lectures
- Introduction to Big Data and Spark
- Getting started with Google Colab and Python
- PySpark DataFrames and Spark SQL
- Batch analytics and Structured Streaming
- Spark with the Pandas API
- Spark with relational databases
- Docker and containerization
- Machine learning & AI with Spark
- Getting started with Databricks
- Integrate Spark with Hadoop
- Integrate Spark with Kafka
- NLP in Spark
- Spark with GraphX
- Extra industry case studies
BGDT1002 – Data Mining, Modeling and Machine Learning
Length – 7 weeks, 21 hours
Lectures
1. Introduction to Data Mining and Machine Learning
2. Data collection, cleansing and pre-processing
3. Sentiment Analysis
4. Supervised Learning vs Unsupervised Learning
5. Linear regression
6. Logistic regression
7. Generalized linear model (GLM)
8. Decision tree
9. Random Forest
10. Naïve Bayes
11. Clustering and Segmentation
12. Association
13. Data visualization and storytelling
14. Auto-Model and hyperparameter tuning
BGDT1003 – Big Data and Hadoop Ecosystem
Length – 7 weeks, 21 hours
Content
1. Introduction to Big Data and Hadoop
2. Introduction to Linux
3. Install & configure Hadoop on Google Colab
4. Install & configure Hadoop on Mac/Linux/Windows
5. Introduction to Cloud Hadoop with GCP
6. Cloudera QuickStart VM
7. Working with Hive
8. Working with Impala
9. Introduction to MapReduce
10. Working with HBase
11. Understanding big data formats
12. Integrate Spark with Hadoop
13. Extra industry case studies