Data Engineering Course
1. Introduction
Nowadays, data helps businesses solve many problems and operate more efficiently. Businesses constantly look for ways to reduce costs in areas such as marketing and customer care for long-running products, and to do so quickly, conveniently, and accurately. Data therefore plays a very important coordinating role. To meet these needs, trainees should have solid answers to the following questions:
- What is a Data Engineer?
- Why is a data engineer needed?
- What is the main job of a data engineer?
- Why is it necessary to store big data?
- What do we need to prepare to become a data engineer?
To get started in the data engineering world, let's begin with the first module (Building the Basic Data Platform), detailed below. Later modules will specialize in storage (Hadoop, S3), data modeling, and data quality. More details will follow.
2. Training targets
The course aims to help students understand:
- The data sources that big data systems frequently integrate.
- The most effective methods of data collection.
- Commonly used big data processing tools.
- How to design a basic data platform.
3. Knowledge and skills gained after the course
- Understand the job of a data engineer
- Be able to build a data platform with all the basic modules
- Understand the requirements of a big data system and be able to self-study other related parts
- Be able to apply these skills in a big data system or work directly in data analysis
4. Career Opportunities
- Build a foundation for growing as a data engineer
- Build big data infrastructure
- Gain the knowledge needed to pursue a Data Engineer or Data Analyst path
5. Training period
- Training time: 13 sessions
- Schedule: at the beginning of each month
| Time \ Day | Monday | Wednesday |
|---|---|---|
| 19h00 – 21h00 | x | x |
- Fee: 3.300.000 VNĐ
- Student discount: 10%
6. Target Trainees
You will have the confidence to work in any environment, from software outsourcing companies to software product companies.
7. Input requirements
- Familiar with working in a Unix environment (macOS or Linux)
- Basic knowledge of programming languages such as Python and Java/Scala
- Familiar with development tools such as IntelliJ (able to set up a Java environment on a Unix server or PC)
- Able to read and understand English
8. Content
| # | Items | Content | Timeline |
|---|---|---|---|
| 1 | Data Source | a. Apache Kafka; b. MySQL CDC; c. SFTP; d. Lab: set up Kafka, then produce and consume messages | 3 Sessions |
| 2 | Data Collection | a. The what and why of data collection; b. Apache NiFi (overview and architecture); c. Schema Registry and data formats; d. Lab: set up NiFi to consume data from Kafka and save it to files | 4 Sessions |
| 3 | Data Processing | a. Introduction to Spark SQL & Jupyter Notebooks; b. Using DataFrames & Datasets; c. Activity: Min, Max, Count, etc.; d. Collaborative Filtering in Spark, cache(), and persist(); e. Activity: using spark-submit to run Spark driver scripts; f. Using bash scripts and crontab in deployment | 4 Sessions |
| 4 | Data Platform | a. Architecture; b. Platform monitoring | 1 Session |
| 5 | Final test | Quiz + Demo | 1 Session |
9. Materials
- Compiled by B4USOLUTION
10. Study method
- 100% hands-on: each student works on their own computer.
- Theory and practice alternate: 2 theory sessions followed by 1 practice session (lab exercises / activities).
11. Evaluation
- One module test covering all content.
- The quiz is taken in the B4U BSOC app.
- Passing score: 45% or higher.
12. After completing the Data Engineer course
- Be able to build a big data system on your own, including the following components:
• Integrating with data sources
• Collecting data as required by the business
• Building a big data processing system
• Deploying and monitoring the system
- Be able to analyze the underlying data and draw conclusions from the data output
13. About the Trainer
