Google

Staff Software Engineer, Machine Learning Infrastructure, Google Cloud

Job Description

  • Design, implement and advance the telemetry capabilities needed for monitoring and evaluating the fleet-wide efficiency of ML resources (TPUs and GPUs). This includes identifying the right underlying signals, devising the right high-level metrics of interest, and creating common dashboards for highlighting fleet-wide performance and efficiency.
  • Identify opportunities to improve the efficiency of the ML fleet, and build the solutions and capabilities needed to realize them.
  • Build reporting and analytics solutions with key partners, and provide in-depth analysis of the metrics to improve the operation and utilization of ML resources.
  • Drive collaboration with various teams (across different PAs) as needed to accomplish the efficiency improvement goals.
  • Lead junior SWEs towards delivering project goals.

Minimum qualifications:

  • Bachelor's degree in Computer Science or a related technical field or equivalent practical experience.
  • 8 years of experience with software development in one or more programming languages, and with data structures/algorithms.
  • 5 years of experience testing and launching software products, and 3 years of experience with software design and architecture.
  • 5 years of experience with machine learning algorithms and tools (e.g., TensorFlow), artificial intelligence, deep learning or natural language processing.

Preferred qualifications:

  • Experience with Kubernetes, Google Kubernetes Engine, GPU Programming, TensorFlow, and Cloud.
  • Experience analyzing ML model performance, working on LLM prompting, or training or developing LLMs.
  • Experience with and knowledge of CPU/GPU architecture or hardware accelerators.
  • Ability to quickly adapt to new tools, frameworks, and languages.

Google Cloud's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google Cloud's needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. You will anticipate our customer needs and be empowered to act like an owner, take action and innovate. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.

In this role, you will join a team that's part of Google's Core ML organization, focused on optimizing Google's Machine Learning resources. You will help develop monitoring tools and dashboards to track the performance and efficiency of TPUs and GPUs, which are used across all Google products. This data helps to improve resource allocation, identify areas for improvement, and drive efficiency gains across Google's products.

The ML, Systems, & Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do - from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

