In today's fast-paced tech world, automating application deployment and infrastructure management is essential. This project shows how to set up a complete CI/CD pipeline, use AWS EKS for Kubernetes deployment, and integrate Grafana and Prometheus for monitoring, all with Terraform managing the infrastructure. By automating everything, you reduce manual work and improve the speed and reliability of your deployments.
Prerequisites
Before diving into this project, here are some skills and tools you should be familiar with:
[x] Clone the repository containing the Terraform code.
Note: Replace resource names and variables in the Terraform code as per your requirements.
[x] AWS Account: You'll need an AWS account to create resources like EC2 instances, EKS clusters, and more.
[x] Terraform Knowledge: Familiarity with Terraform to provision, manage, and clean up infrastructure.
[x] Basic Kubernetes (EKS): A basic understanding of Kubernetes, especially Amazon EKS, to deploy and manage containers.
[x] Docker Knowledge: Basic knowledge of Docker for containerizing applications.
[x] Grafana & Prometheus: Understanding of these tools to monitor applications and track performance.
[x] Jenkins: Knowledge of Jenkins for building and automating the CI/CD pipeline.
[x] GitHub: Experience with GitHub for version control and managing repositories.
[x] Command-Line Tools: Basic comfort with using the command line for managing infrastructure and services.
Setting Up the Infrastructure
I have written Terraform code that sets up the entire infrastructure automatically, including the installation of the required applications and tools and the creation of the EKS cluster.
Note ⇒ EKS cluster creation will take approx. 10 to 15 minutes.
⇒ An EC2 machine will be created, named "Jenkins-svr"
⇒ Jenkins Install
⇒ Docker Install
⇒ Trivy Install
⇒ Helm Install
⇒ Grafana Install using Helm
⇒ Prometheus Install using Helm
⇒ AWS CLI Install
⇒ Terraform Install
⇒ EKS Cluster Setup
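For reference, the installation is driven by user_data scripts in the repo's scripts folder. Below is a minimal sketch of what such a bootstrap script looks like; the package sources and exact steps are assumptions for illustration, not the repo's actual script (which also installs Trivy, the AWS CLI, Terraform, eksctl, and kubectl):

#!/bin/bash
# Hypothetical user_data bootstrap sketch; the real scripts live in the repo's "scripts" folder.
set -euxo pipefail
sudo apt-get update -y

# Jenkins (needs Java first)
sudo apt-get install -y openjdk-17-jre-headless
curl -fsSL https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key | sudo tee /usr/share/keyrings/jenkins-keyring.asc > /dev/null
echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc] https://pkg.jenkins.io/debian-stable binary/" | sudo tee /etc/apt/sources.list.d/jenkins.list > /dev/null
sudo apt-get update -y && sudo apt-get install -y jenkins

# Docker, and let the default user run it
sudo apt-get install -y docker.io
sudo usermod -aG docker ubuntu

# Helm
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Grafana + Prometheus via the kube-prometheus-stack chart, release name "stable"
# (matches the stable-grafana / stable-kube-prometheus-* services seen later).
# This step runs only after the EKS cluster is up and kubeconfig is configured.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install stable prometheus-community/kube-prometheus-stack -n prometheus --create-namespace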
EC2 Instance Creation
First, we'll create the necessary virtual machines using the Terraform code.
Once you clone the repo, go to the folder "19.Real-Time-DevOps-Project/Terraform_Code/Code_IAC_Terraform_box" and run the Terraform commands.
cd Terraform_Code/Code_IAC_Terraform_box
$ ls -l
dar--l 13/12/24 11:23 AM All_Pipelines
dar--l 12/12/24 4:38 PM k8s_setup_file
dar--l 11/12/24 2:48 PM scripts
-a---l 11/12/24 2:47 PM 507 .gitignore
-a---l 13/12/24 9:00 AM 7238 main.tf
-a---l 11/12/24 2:47 PM 8828 main.txt
-a---l 11/12/24 2:47 PM 1674 MYLABKEY.pem
-a---l 11/12/24 2:47 PM 438 variables.tf
Note ⇒ Make sure to run main.tf from inside the folder:
19.Real-Time-DevOps-Project/Terraform_Code/Code_IAC_Terraform_box/
Run the main.tf file using the following Terraform commands.
terraform init
terraform fmt
terraform validate
terraform plan
terraform apply
# optional: terraform apply --auto-approve
Once you run the Terraform commands, check the following things to ensure everything is set up correctly.
Inspect the cloud-init logs:
Once connected to the EC2 instance, you can check the status of the user_data script by inspecting the log files.
# Primary log file for cloud-init
sudo tail -f /var/log/cloud-init-output.log
or
sudo cat /var/log/cloud-init-output.log | more
If the user_data script runs successfully, you will see output logs and any errors encountered during execution.
If there’s an error, this log will provide clues about what failed.
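In addition to tailing the log file, cloud-init can report its own status directly; the --wait flag blocks until the user_data run finishes:

# Block until cloud-init (and therefore the user_data script) has finished
cloud-init status --wait
# Show more detail, including errors from any failed stage
cloud-init status --long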
Outcome of "cloud-init-output.log":
From Terraform:
Verify the Installation
- [x] Docker version
ubuntu@ip-172-31-95-197:~$ docker --version
Docker version 24.0.7, build 24.0.7-0ubuntu4.1
ubuntu@ip-172-31-94-25:~$ docker ps -a
- [x] trivy version
ubuntu@ip-172-31-89-97:~$ trivy version
Version: 0.55.2
- [x] Helm version
ubuntu@ip-172-31-89-97:~$ helm version
version.BuildInfo{Version:"v3.16.1", GitCommit:"5a5449dc42be07001fd5771d56429132984ab3ab", GitTreeState:"clean", GoVersion:"go1.22.7"}
- [x] Terraform version
ubuntu@ip-172-31-89-97:~$ terraform version
Terraform v1.9.6
on linux_amd64
- [x] eksctl version
ubuntu@ip-172-31-89-97:~$ eksctl version
0.191.0
- [x] kubectl version
ubuntu@ip-172-31-89-97:~$ kubectl version
Client Version: v1.31.1
Kustomize Version: v5.4.2
- [x] aws cli version
ubuntu@ip-172-31-89-97:~$ aws --version
(Note: plain aws version is not a valid subcommand and only prints the usage help; use aws --version.)
- [x] Verify the EKS cluster
On the Jenkins virtual machine, go to the directory k8s_setup_file and view apply.log (cat apply.log) to verify whether the cluster was created.
ubuntu@ip-172-31-90-126:~/k8s_setup_file$ pwd
/home/ubuntu/k8s_setup_file
ubuntu@ip-172-31-90-126:~/k8s_setup_file$ cd ..
After Terraform deploys on the instance, it's time to set up the cluster. SSH into the instance and run:
aws eks update-kubeconfig --name <cluster-name> --region <region>
Once the EKS cluster is up, run the following command so kubectl can interact with it.
aws eks update-kubeconfig --name balraj-cluster --region us-east-1
The aws eks update-kubeconfig command configures your local kubectl tool to interact with an Amazon EKS (Elastic Kubernetes Service) cluster. It updates or creates a kubeconfig file containing the authentication information kubectl needs to communicate with your specified EKS cluster.
What happens when you run this command:
The AWS CLI retrieves the required connection information for the EKS cluster (such as the API server endpoint and certificate) and updates the kubeconfig file located at ~/.kube/config (by default). It configures the authentication details needed to connect kubectl to your EKS cluster using IAM roles. After running this command, you will be able to interact with your EKS cluster using kubectl commands such as kubectl get nodes or kubectl get pods.
kubectl get nodes
kubectl cluster-info
kubectl config get-contexts
Set Up Jenkins
Go to the Jenkins EC2 instance and access Jenkins via http://<your-server-ip>:8080.
- Retrieve the initial admin password using:
sudo cat /var/lib/jenkins/secrets/initialAdminPassword
Install plugins in Jenkins
Go to Manage Jenkins > Plugins. Under the Available tab, search for plugins available for download from the configured Update Center.
The following plugins need to be installed:
SonarQube Scanner
NodeJS
Pipeline: Stage View
Blue Ocean
Eclipse Temurin installer
Docker
Docker Commons
Docker Pipeline
Docker API
docker-build-step
Prometheus metrics
Note ⇒ Restart Jenkins for the plugins to take effect.
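If you prefer scripting over clicking through the UI, the Jenkins CLI can install the same plugins; the plugin IDs below are their usual short names and are worth verifying against the Update Center before running:

# Fetch the CLI jar from the running controller
wget http://<your-server-ip>:8080/jnlpJars/jenkins-cli.jar
# Install the plugins listed above (IDs assumed), then restart safely
java -jar jenkins-cli.jar -s http://<your-server-ip>:8080/ -auth admin:<api-token> \
  install-plugin sonar nodejs pipeline-stage-view blueocean temurin-installer \
  docker-plugin docker-commons docker-workflow docker-java-api docker-build-step prometheus
java -jar jenkins-cli.jar -s http://<your-server-ip>:8080/ -auth admin:<api-token> safe-restart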
Create a Webhook in SonarQube
- Access SonarQube at http://<public-IP-address-of-Jenkins>:9000
Click on Administration > Configuration > Webhooks
Name: sonarqube-webhook
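The same webhook can also be created non-interactively through SonarQube's web API; the token and hosts below are placeholders:

# Create the webhook via the SonarQube web API (admin token as the curl username)
curl -u <sonar-admin-token>: -X POST "http://<public-IP-of-Jenkins>:9000/api/webhooks/create" \
  -d "name=sonarqube-webhook" \
  -d "url=http://<public-IP-of-Jenkins>:8080/sonarqube-webhook/"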
Create a token in SonarQube
- Administration > Security > Users > Create a new token
Configure the SonarQube credential in Jenkins.
Dashboard > Manage Jenkins > Credentials > System > Global credentials (unrestricted)
Configure AWS credentials (Access & Secret Keys) in Jenkins.
Dashboard > Manage Jenkins > Credentials > System > Global credentials (unrestricted)
Configure/Integrate SonarQube in Jenkins
Dashboard > Manage Jenkins > System
Configure JDK, SonarQube Scanner, and NodeJS
- To configure JDK: Dashboard > Manage Jenkins > Tools
- To configure SonarQube Scanner: Dashboard > Manage Jenkins > Tools
- To configure NodeJS: Dashboard > Manage Jenkins > Tools
Note ⇒ Select NodeJS 16.20.0 as required by the project; it won't work on NodeJS 23.x.
- To configure Docker: Dashboard > Manage Jenkins > Tools
Build a Pipeline
Here is the pipeline script.
Build the deployment pipeline.
Run the pipeline; the first time it will fail, so rerun it with parameters.
- I ran the pipeline, but it failed with the below error message.
Solution:
sudo su - ansadmin
sudo usermod -aG docker $USER && newgrp docker
sudo usermod -aG docker jenkins && newgrp docker
I ran the pipeline again, but it failed with the same error message. I found the solution below.
Solution:
Jenkins service needs to be restarted.
sudo systemctl restart jenkins
I reran the pipeline, and it went well, and the build was completed successfully.
Build status:
Application status in SonarQube
The Quality Gate status failed because of a NodeJS version mismatch, as I was using the latest version of NodeJS (23.x).
I removed NodeJS 23.x and installed NodeJS 16.
Note ⇒ You won't face this issue because I have updated the Terraform code.
sudo apt-get remove -y nodejs
curl -fsSL https://deb.nodesource.com/setup_16.x | sudo -E bash -
sudo apt-get install -y nodejs
Rerun the pipeline; the Quality Gate status changes from failed to passed.
The Cleanup Old Images from ECR stage checks whether there are more than 3 images in the repository and deletes the older ones if necessary.
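A minimal sketch of how that stage can be scripted with the AWS CLI, assuming a hypothetical repository name:

#!/bin/bash
# Keep only the 3 newest images in the ECR repository; delete the rest.
REPO="amazon-prime"   # hypothetical repository name
KEEP=3

# Image digests sorted oldest-first
DIGESTS=$(aws ecr describe-images --repository-name "$REPO" \
  --query 'sort_by(imageDetails,&imagePushedAt)[*].imageDigest' --output text)

COUNT=$(echo "$DIGESTS" | wc -w)
if [ "$COUNT" -gt "$KEEP" ]; then
  # Everything except the $KEEP newest digests is deleted
  for d in $(echo "$DIGESTS" | tr '\t' '\n' | head -n $((COUNT - KEEP))); do
    aws ecr batch-delete-image --repository-name "$REPO" --image-ids imageDigest="$d"
  done
fi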
Set Up ArgoCD
- Run the following commands to verify the pods and service types:
kubectl get pods -n argocd
kubectl get svc -n argocd
kubectl get pods -n prometheus
kubectl get service -n prometheus
ubuntu@bootstrap-svr:~$ kubectl get pods -n argocd
NAME READY STATUS RESTARTS AGE
argocd-application-controller-0 1/1 Running 0 40m
argocd-applicationset-controller-64f6bd6456-79k4l 1/1 Running 0 40m
argocd-dex-server-5fdcd9df8b-85dl7 1/1 Running 0 40m
argocd-notifications-controller-778495d96f-lsmww 1/1 Running 0 40m
argocd-redis-69fd8bd669-qd4qs 1/1 Running 0 40m
argocd-repo-server-75567c944-cwrdv 1/1 Running 0 40m
argocd-server-5c768cdd96-wh4t5 1/1 Running 0 40m
ubuntu@bootstrap-svr:~$ kubectl get svc -n argocd
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
argocd-applicationset-controller ClusterIP 172.20.37.85 <none> 7000/TCP,8080/TCP 41m
argocd-dex-server ClusterIP 172.20.185.246 <none> 5556/TCP,5557/TCP,5558/TCP 41m
argocd-metrics ClusterIP 172.20.6.170 <none> 8082/TCP 41m
argocd-notifications-controller-metrics ClusterIP 172.20.36.121 <none> 9001/TCP 41m
argocd-redis ClusterIP 172.20.104.129 <none> 6379/TCP 41m
argocd-repo-server ClusterIP 172.20.184.189 <none> 8081/TCP,8084/TCP 41m
argocd-server ClusterIP 172.20.150.224 <none> 80/TCP,443/TCP 41m
argocd-server-metrics ClusterIP 172.20.208.97 <none> 8083/TCP 41m
ubuntu@bootstrap-svr:~$
ubuntu@bootstrap-svr:~$ kubectl get pods -n prometheus
NAME READY STATUS RESTARTS AGE
alertmanager-stable-kube-prometheus-sta-alertmanager-0 2/2 Running 0 42m
prometheus-stable-kube-prometheus-sta-prometheus-0 2/2 Running 0 42m
stable-grafana-6c67f4cb8d-k4bpb 3/3 Running 0 42m
stable-kube-prometheus-sta-operator-74dcfb4f9c-2vwqr 1/1 Running 0 42m
stable-kube-state-metrics-6d6d5fcb75-w8k4l 1/1 Running 0 42m
stable-prometheus-node-exporter-8tqgh 1/1 Running 0 42m
stable-prometheus-node-exporter-jkkkf 1/1 Running 0 42m
ubuntu@bootstrap-svr:~$ kubectl get service -n prometheus
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 42m
prometheus-operated ClusterIP None <none> 9090/TCP 42m
stable-grafana ClusterIP 172.20.21.160 <none> 80/TCP 42m
stable-kube-prometheus-sta-alertmanager ClusterIP 172.20.20.12 <none> 9093/TCP,8080/TCP 42m
stable-kube-prometheus-sta-operator ClusterIP 172.20.69.94 <none> 443/TCP 42m
stable-kube-prometheus-sta-prometheus ClusterIP 172.20.199.20 <none> 9090/TCP,8080/TCP 42m
stable-kube-state-metrics ClusterIP 172.20.52.146 <none> 8080/TCP 42m
stable-prometheus-node-exporter ClusterIP 172.20.40.154 <none> 9100/TCP 42m
- Run these commands to change the service type from ClusterIP to LoadBalancer:
kubectl patch svc stable-kube-prometheus-sta-prometheus -n prometheus -p '{"spec": {"type": "LoadBalancer"}}'
kubectl patch svc stable-grafana -n prometheus -p '{"spec": {"type": "LoadBalancer"}}'
kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'
Verify status now.
- Now it's time to run the script to get the ArgoCD and Grafana access details.
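A minimal sketch of what such a script does, assuming the namespaces and release names used above:

#!/bin/bash
# ArgoCD: LoadBalancer hostname and initial admin password
ARGOCD_URL=$(kubectl get svc argocd-server -n argocd \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
ARGOCD_PASS=$(kubectl get secret argocd-initial-admin-secret -n argocd \
  -o jsonpath='{.data.password}' | base64 -d)
echo "ArgoCD:  http://${ARGOCD_URL}  (user: admin, password: ${ARGOCD_PASS})"

# Grafana: LoadBalancer hostname and admin password from the Helm-created secret
GRAFANA_URL=$(kubectl get svc stable-grafana -n prometheus \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
GRAFANA_PASS=$(kubectl get secret stable-grafana -n prometheus \
  -o jsonpath='{.data.admin-password}' | base64 -d)
echo "Grafana: http://${GRAFANA_URL}  (user: admin, password: ${GRAFANA_PASS})"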
Once you access the ArgoCD URL, create an application with the following settings:
Application Name: amazon-prime-app
Project Name: default
Sync Policy: Automatic (select Prune Resources and Self Heal)
Repository URL: https://github.com/mrbalraj007/Amazon-Prime-Clone-Project.git
Revision: main
Path: k8s_files (where the Kubernetes manifests reside)
Cluster URL: select the default cluster
Namespace: default
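Equivalently, the same application can be created from the command line with the argocd CLI after logging in; a sketch using the settings above:

# Log in to the ArgoCD API server (hostname/password from the access-details script)
argocd login <argocd-server-hostname> --username admin --password <initial-admin-password> --insecure

# Create the application with the settings listed above
argocd app create amazon-prime-app \
  --project default \
  --repo https://github.com/mrbalraj007/Amazon-Prime-Clone-Project.git \
  --revision main \
  --path k8s_files \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace default \
  --sync-policy automated --auto-prune --self-heal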
Update the latest image name in deployment.yml.
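One way to do that is a quick in-place edit followed by a push, since ArgoCD syncs from the repo; the registry path and tag below are placeholders:

# Point the manifest at the newly pushed image (placeholders; adjust to your ECR repo/tag)
sed -i "s|image: .*|image: <account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:<new-tag>|" \
  k8s_files/deployment.yml
git add k8s_files/deployment.yml && git commit -m "Update image tag" && git push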
Verify the app status.
Verify the pods & services status.
Click on the hostnames (URL details) from the service and access it in the browser.
http://af70e2590416f4788be765b667bb8175-2006799998.us-east-1.elb.amazonaws.com:3000/
Congratulations :-) the application is working and accessible.
- Access Prometheus/Grafana and create a custom dashboard in Prometheus/Grafana.
Dashboard in Grafana
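Before building panels, it's worth sanity-checking that Prometheus is scraping targets; its HTTP API can be queried directly (the hostname is a placeholder, and jq is optional for readable output):

# Every healthy scrape target reports the metric "up" with value 1
curl -s "http://<prometheus-lb-hostname>:9090/api/v1/query?query=up" | jq .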
Clean up the images and deployment using the pipeline.
- Here is the Updated pipeline
The pipeline will partially fail because KMS keys take some days to be deleted automatically.
- Verify the pods and services in EKS. If they are not deleted, change the services back to ClusterIP and rerun the pipeline.
kubectl patch svc stable-kube-prometheus-sta-prometheus -n prometheus -p '{"spec": {"type": "ClusterIP"}}'
kubectl patch svc stable-grafana -n prometheus -p '{"spec": {"type": "ClusterIP"}}'
kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "ClusterIP"}}'
kubectl patch svc singh-app -p '{"spec": {"type": "ClusterIP"}}'
Environment Cleanup:
As we are using Terraform, we will use the following commands to delete everything.
Delete all deployments/services first:
kubectl delete deployment.apps/singh-app
kubectl delete service singh-app
kubectl delete service/singh-service
Then delete the EKS cluster, and finally the virtual machine.
To delete the AWS EKS cluster
- Log in to the bootstrap EC2 instance, change the directory to ~/k8s_setup_file, and run the following command to delete the cluster.
sudo su - ubuntu
cd ~/k8s_setup_file
sudo terraform destroy --auto-approve
Now it's time to delete the virtual machine.
Go to the folder "19.Real-Time-DevOps-Project/Terraform_Code/Code_IAC_Terraform_box" and run the Terraform command.
cd Terraform_Code/
$ ls -l
Mode LastWriteTime Length Name
---- ------------- ------ ----
da---l 26/09/24 9:48 AM Code_IAC_Terraform_box
terraform destroy --auto-approve
Conclusion
By combining Terraform, Jenkins, EKS, Docker, and monitoring tools like Grafana and Prometheus, this project automates the entire process of infrastructure management, application deployment, and monitoring. The use of a cleanup pipeline ensures that resources are removed when no longer needed, helping to reduce costs. This approach offers a scalable, efficient, and cost-effective solution for modern application deployment and management.