
Publication Number

2411046

 

Page Numbers

1-7

 

Paper Details

GPU Acceleration Techniques for Optimizing AI-ML Inference in the Cloud

Authors

Charan Shankar Kummarapurugu

Abstract

The demand for real-time Artificial Intelligence (AI) and Machine Learning (ML) inference in cloud environments has grown substantially in recent years. However, delivering high-performance inference at scale remains a challenge due to the computational intensity of AI/ML workloads. General-purpose CPUs often struggle to meet the latency and throughput requirements of modern AI/ML applications. This paper explores the application of Graphics Processing Units (GPUs) to accelerate inference tasks, particularly in cloud environments, where dynamic and scalable resources are essential. We review current GPU-based optimization techniques, focusing on reducing inference latency and enhancing cost-effectiveness. The proposed approach integrates distributed GPU resource management with AI-driven prediction models to balance workloads efficiently across multiple cloud platforms. Experiments conducted on AWS, Azure, and Google Cloud demonstrate that GPU acceleration can reduce inference latency by up to 40% while improving cost efficiency by 30%, compared to CPU-only implementations. These findings highlight the potential of GPU acceleration to transform AI-ML inference in the cloud, making it more scalable and accessible for a wide range of applications.
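The abstract's core idea — routing inference requests across GPU pools on multiple clouds using predicted latency and cost — can be illustrated with a minimal sketch. The pool names, the toy latency model, and the cost weighting below are all illustrative assumptions, not details from the paper:

```python
from dataclasses import dataclass

@dataclass
class GpuPool:
    name: str             # hypothetical pool label, e.g. "aws-g4dn"
    cost_per_sec: float   # relative cost of GPU time on this pool
    queue_ms: float = 0.0 # predicted queueing delay, updated as work is assigned

def predict_latency_ms(pool: GpuPool, batch: int) -> float:
    """Toy latency model: fixed launch overhead plus a per-item cost,
    on top of the pool's current predicted queue delay."""
    return pool.queue_ms + 2.0 + 0.5 * batch

def dispatch(pools: list[GpuPool], batch: int, cost_weight: float = 0.1) -> str:
    """Pick the pool minimizing predicted latency plus a cost penalty,
    then bump that pool's queue estimate so load spreads over time."""
    best = min(
        pools,
        key=lambda p: predict_latency_ms(p, batch) + cost_weight * p.cost_per_sec * batch,
    )
    best.queue_ms += 0.5 * batch  # the chosen pool gets busier
    return best.name

pools = [GpuPool("aws-g4dn", 1.0), GpuPool("azure-nc", 1.2), GpuPool("gcp-t4", 0.9)]
assignments = [dispatch(pools, batch=8) for _ in range(4)]
```

Because each dispatch inflates the chosen pool's predicted queue delay, successive requests naturally spread across providers rather than piling onto the single cheapest pool — the same balancing effect the paper attributes to its prediction-driven resource manager.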

Keywords

GPU acceleration, AI/ML inference, cloud computing, performance optimization, cost-efficiency

Citation

Charan Shankar Kummarapurugu. GPU Acceleration Techniques for Optimizing AI-ML Inference in the Cloud. IJIRCT, Volume 8, Issue 6, 2022, pp. 1-7. https://www.ijirct.org/viewPaper.php?paperId=2411046
