Description:
- Requirements
- • Bachelor’s degree in Computer Science, a related engineering field, or equivalent experience
- • 4+ years of experience in public cloud infrastructure, especially Azure and AWS.
- • Good understanding of cloud infrastructure and its different deployment models
- • Familiarity with cloud networking and security solutions such as load balancers, firewalls, WAF, CSPM, and security groups
- • Good understanding of identity and access management solutions such as Active Directory, Azure AD, conditional access, IAM, and other vendor-specific solutions
- • Good understanding of Linux- and Windows-based systems
- • Understanding of SQL and NoSQL databases, including IaaS and PaaS models
- • Experience with policy management, governance, monitoring, and alerting
- • Knowledge of microservices, DevOps, and IaC (Terraform and Ansible)
- • An Azure AZ-104 or AWS administrator certification would be an advantage
- • Excellent communication and interpersonal skills
- Job responsibilities
- • Assist application teams in deploying various solutions to the cloud environment.
- • Maintain infrastructure security and governance in line with client requirements and standards.
- • Support other team members (database, network, security, etc.) in configuring and maintaining their respective solutions.
- • Actively participate in discussions on new solution implementations, design creation, and all other cloud infrastructure topics.
- • Handle POC deployments, documentation, and technical presentations.
Requirements:
- Linux Hosting and Administration
- • Install, configure, and maintain Linux servers, ensuring optimal performance and security.
- • Handle Linux-based hosting solutions, including web servers, databases, and other services.
- • Apply patches and updates to Linux servers as required, and automate routine tasks.
- • Monitor system performance, troubleshoot issues, and conduct root cause analysis for any server downtime.
- Kubernetes Operations
- • Deploy, manage, and maintain containerized applications using Kubernetes.
- • Create and manage Kubernetes manifests, Helm charts, and operators for complex application architectures.
- • Scale applications based on resource utilization and requirements.
- • Monitor the health and performance of Kubernetes clusters and take corrective actions as needed.
- DevOps Integration
- • Implement and maintain CI/CD pipelines for automated testing and deployments.
- • Assist in incorporating containerization and orchestration into the DevOps process.
- Rancher/OpenShift Expertise (Nice to Have)
- • Experience in deploying and managing Kubernetes clusters using Rancher or OpenShift.
- • Implement monitoring, logging, and auto-scaling solutions in Rancher or OpenShift environments.
- Application Support
- • Gain a thorough understanding of the applications running within containers to provide first-level application support.
- • Collaborate with development teams to debug application issues in staging and production environments.
- Azure Infrastructure
- • Deploy and manage resources on Azure, including but not limited to VMs, databases, and Kubernetes clusters.
- • Implement Infrastructure as Code practices using Azure Resource Manager (ARM) templates or Terraform.
- Monitoring and Alerting Using Open-Source Tools (Any one of the following)
- • ELK Stack
- o Implement and manage the ELK (Elasticsearch, Logstash, Kibana) stack for real-time log aggregation, monitoring, and analysis.
- o Customize Kibana dashboards for different system metrics and logs to aid in quick issue resolution.
- • Grafana
- o Develop and maintain Grafana dashboards to visualize key performance indicators and system metrics.
- o Integrate Grafana with other data sources and monitoring tools for comprehensive analytics.
- • Loki
- o Set up and manage Loki for aggregating and storing logs.
- o Integrate Loki with Grafana for unified querying and visualization of metrics and logs.
- • Prometheus
- o Deploy and configure Prometheus for monitoring system and application metrics.
- o Create custom Prometheus queries and alerts to catch anomalies and system performance issues.
- • Mimir/Cortex (preferable)
- o Implement Mimir or Cortex for enhanced long-term storage and scalability of Prometheus metrics.