1 month ago
GCS GLOBAL CAPABILITY LIMITED

ML Ops Engineer (AWS / Terraform)

Engineering & Technology

IT & Telecoms Confidential

Job Summary

We are seeking an experienced ML Ops Engineer to help scale the deployment and management of multiple AI models across AWS.

  • Experience Level: Mid level
  • Experience Length: 3 years

Job Description/Requirements

Location: Remote
Work Hours: 11:30 AM – 8:30 PM EAT (8:30 AM – 5:30 PM GMT)
Employment Type: Permanent or Contract
Compensation: Salary dependent on experience, skill set, and project scope

About the Role
We are seeking an experienced ML Ops Engineer to help scale the deployment and management of multiple AI models across AWS. You will join a growing team that has developed a suite of image-based machine learning models — including classification, recognition, and prediction systems — and now needs to operationalise these models efficiently and securely in production environments.
This role sits at the core of the platform and infrastructure strategy. You will be responsible for designing scalable deployment pipelines, building and managing infrastructure using Terraform, and ensuring the entire ML lifecycle, from experimentation to production, runs efficiently, securely, and cost-effectively.
The ideal candidate is proactive, detail-oriented, and confident working in a fast-paced, cloud-first, international environment.

Key Responsibilities
• Design and implement scalable, automated infrastructure for deploying ML models in AWS using Terraform.
• Manage and optimise existing AWS environments (SageMaker, ECS/EKS, Lambda, Batch, and GPU-backed instances).
• Build and maintain CI/CD pipelines for ML model delivery and monitoring.
• Ensure infrastructure supports both real-time inference and batch processing workloads.
• Collaborate closely with Data Scientists and Engineers to productionise models efficiently.
• Monitor system performance and costs, identifying opportunities for optimisation and automation.
• Maintain infrastructure reliability, security, and compliance with best practices.

Skills & Experience
• A minimum of 3 years (ideally 3–5) of proven experience in ML Ops, DevOps, or Cloud Infrastructure Engineering, preferably in large-scale production environments.
• Extensive hands-on experience with Terraform, including provisioning and managing complex AWS environments.
• Strong knowledge of AWS services relevant to ML Ops:
  • SageMaker for model training and deployment
  • ECS/EKS or Elastic Beanstalk for containerised workloads
  • Lambda and Batch for inference pipelines
  • S3, CloudWatch, IAM, Glue, and related orchestration tools
• Proven experience deploying GPU-accelerated ML models in production.
• Solid understanding of ML model lifecycle management, including versioning, packaging, and scaling.
• Proficiency in Python, with experience using FastAPI or Flask for serving models.
• Strong understanding of CI/CD, Infrastructure as Code, and DevOps principles.
• AWS or Terraform certifications are highly regarded.
• Familiarity with Kubernetes, Docker, and MLflow is an advantage.
• Experience in AWS cost optimisation and performance tuning is preferred.
