BGDT1001 – Big Data and Spark

Length – 7 weeks, 21 hours

Lectures

  1. Introduction to Big Data and Spark

  2. Getting started with Google Colab and Python

  3. PySpark dataframes and Spark SQL

  4. Batch analytics and Structured Streaming

  5. Spark with the pandas API

  6. Spark with relational databases

  7. Docker and Containerization

  8. Machine learning & AI with Spark

  9. Getting started with Databricks

  10. Integrate Spark with Hadoop

  11. Integrate Spark with Kafka

  12. NLP in Spark

  13. Spark with GraphX

  14. Extra industry case studies

BGDT1002 – Data Mining, Modeling and Machine Learning

Length – 7 weeks, 21 hours
Lectures
1. Introduction to Data Mining and Machine Learning
2. Data collection, cleansing and pre-processing
3. Sentiment Analysis
4. Supervised Learning vs Unsupervised Learning
5. Linear regression
6. Logistic regression
7. Generalized linear model (GLM)
8. Decision tree
9. Random Forest
10. Naïve Bayes
11. Clustering and Segmentation
12. Association
13. Data visualization and storytelling
14. Auto-Model and hyperparameter tuning

BGDT1003 – Big Data and Hadoop Ecosystem

Length – 7 weeks, 21 hours
Lectures
1. Introduction to Big Data and Hadoop
2. Introduction to Linux
3. Install & configure Hadoop on Google Colab
4. Install & configure Hadoop on Mac/Linux/Windows
5. Introduction to cloud Hadoop on GCP
6. Cloudera QuickStart VM
7. Working with Hive
8. Working with Impala
9. Introduction to MapReduce
10. Working with HBase
11. Understanding big data formats
12. Integrate Spark with Hadoop
13. Extra industry case studies