Job Description:
We are seeking a highly skilled Python Developer with hands-on experience developing and deploying cloud-native applications on AWS and Azure, and integrating observability and monitoring solutions built on Grafana, the ELK Stack, and Prometheus. The role suits a backend or infrastructure-focused engineer who enjoys building scalable systems and ensuring high availability and performance visibility across distributed environments.
Key Responsibilities:
- Design, develop, and maintain backend services and automation scripts using Python.
- Deploy and manage applications on AWS and Azure using native services (EC2, Lambda, S3, Azure Functions, Azure Blob Storage, etc.).
- Integrate and configure observability tools, including Grafana, Prometheus, and the ELK Stack (Elasticsearch, Logstash, Kibana), for real-time metrics and logs.
- Develop monitoring and alerting solutions for microservices, APIs, and system infrastructure.
- Automate infrastructure and deployment tasks using IaC tools (Terraform, CloudFormation, ARM templates).
- Collaborate with DevOps and SRE teams to ensure system reliability, scalability, and performance optimization.
- Implement logging, tracing, and monitoring standards to improve troubleshooting and root cause analysis.
Required Skills & Experience:
- 6-8 years of professional experience in Python development.
- Strong experience with AWS (e.g., EC2, S3, Lambda, CloudWatch) and Azure (e.g., Azure Functions, App Service, Azure Monitor).
- Proficiency with Grafana dashboards and data sources (Prometheus, CloudWatch, etc.).
- Solid understanding of the ELK Stack (Elasticsearch, Logstash, Kibana) for centralized logging.
- Deep knowledge of Prometheus for metrics collection and integration with alerting pipelines.
- Experience building RESTful APIs, backend services, and CLI tools in Python.
- Familiarity with Docker, Kubernetes, and CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, etc.).
- Comfort working in Linux-based environments and with shell scripting.
Nice to Have:
- Experience with OpenTelemetry, distributed tracing (Jaeger, Zipkin), or APM tools (Elastic APM, Datadog).
- Familiarity with security monitoring, SIEM tools, or compliance reporting.
- Exposure to serverless computing, event-driven architecture, and streaming data (e.g., Kafka, Kinesis).
- Knowledge of Python testing frameworks (pytest, unittest) and static analysis tools (mypy, flake8).