SRE / FinOps (Shared)
At Fluor, we are proud to design and build projects and careers. We are committed to fostering a welcoming and collaborative work environment that encourages big-picture thinking, brings out the best in our employees, and helps us develop innovative solutions that contribute to building a better world together. If this sounds like a culture you would like to work in, you’re invited to apply for this role.
Job Description
Role Overview
The FinOps Specialist for Data & AI Cloud Deployments ensures efficient, cost-optimized, and compliant operation of Fluor’s Azure-based AI and data platforms. This role leads cloud financial governance for AI workloads, including SLO monitoring, runbook development, GPU and high-performance storage capacity planning, and operational visibility across model training, inference, and data engineering environments. The specialist owns cloud savings targets and is responsible for reporting realized savings, not just tracking spend, while partnering with Business Units to enforce cost accountability. Working closely with platform engineering, AI engineering, data teams, and IT operations, the role aligns on cost-versus-performance tradeoffs with application teams to ensure predictable financial management and reliable AI system performance at enterprise scale.
Key Responsibilities
- Monitor, analyze, and optimize cloud spend across Data & AI workloads using Azure Cost Management & Budgets.
- Execute cost-saving recommendations hands-on, including:
  - Reserved Instances / Savings Plans optimization
  - Elimination of unused resources
  - Spot GPU/VM usage
  - Azure Advisor–based cost remediation
  - Azure Policy enforcement for cost governance
  - RI coverage and utilization analyses
- Implement cost showback/chargeback models to provide transparent cost allocation to teams and projects.
- Develop and maintain runbooks for cloud operations, incident response, and AI platform reliability.
- Define, measure, and monitor SLOs/SLIs for AI services, including model endpoints, vector stores, GPUs, and storage.
- Build and maintain operational dashboards using Azure Monitor, Workbooks, Power BI, and Logs.
- Perform capacity planning for GPUs, memory, networking, and storage, ensuring availability for AI training and inference.
- Work with engineering teams to optimize resource provisioning, autoscaling, and workload right‑sizing.
- Identify cost anomalies and optimization opportunities, and enforce governance through policies and alerts.
- Support forecasting of monthly/quarterly cloud consumption for AI workloads.
- Collaborate with Security, Architecture, and Platform Engineering to ensure compliance with standards and guardrails.
- Maintain operational documentation and provide knowledge sharing to teams consuming AI platform services.
- Participate in incident reviews, RCA, and improvement planning for platform reliability and cost governance.
Basic Job Requirements
- Typically 5-7 years of experience in cloud operations, FinOps, platform engineering, or similar roles.
- Hands-on experience with Azure cost management tools and monitoring systems.
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or related field.
- Experience with cloud FinOps or cloud operations in a large enterprise environment.
- Strong hands-on knowledge of Azure Monitor, Azure Cost Management + Budgets, and Azure Workbooks.
- Experience using Grafana for operational dashboards.
- Demonstrated ability to work with cloud billing data, cost analytics, budgets, and forecasting.
- Practical understanding of cloud resource provisioning (VMs, AKS, GPUs, storage tiers, networking).
- Familiarity with AI/LLM deployments, inference endpoints, or data platform operations.
- Strong analytical and problem-solving skills with attention to detail.
- Clear communication skills for working with engineering, finance, and leadership teams.
- FinOps Certified Practitioner (preferred but not required).
- Azure certifications (e.g., AZ-900) are beneficial.
Other Job Requirements
Preferred Qualifications
- Experience with FinOps Foundation frameworks or Cloud Financial Management certifications.
- Understanding of MLOps, AI platform operations, or GPU workload management.
- Exposure to Azure AI Studio/AI Foundry, vector databases, and AI pipelines.
- Experience with automation using Python, PowerShell, or Terraform.
- Ability to interpret usage telemetry and performance metrics for optimization decisions.
To Be Considered, Candidates:
Must be authorized to work in the country where the position is located.
We are an equal opportunity employer. All qualified individuals will receive consideration for employment without regard to race, color, age, sex, sexual orientation, gender identity, religion, national origin, disability, veteran status, genetic information, or any other criteria protected by governing law.