Posted On: Mar 13, 2024
The Amazon S3 Connector for PyTorch now supports saving PyTorch Lightning model checkpoints directly to Amazon S3, reducing the cost and improving the performance of your machine learning training jobs. PyTorch Lightning is an open-source framework that provides a high-level interface for training with PyTorch. The Amazon S3 Connector for PyTorch automatically optimizes S3 requests to improve data loading and checkpoint performance for your training workloads. Saving PyTorch Lightning model checkpoints is up to 40% faster with the Amazon S3 Connector for PyTorch than writing to Amazon EC2 instance storage.
The Amazon S3 Connector for PyTorch delivers a new implementation of PyTorch Lightning's checkpoint primitive that you can use to save machine learning model checkpoints directly to Amazon S3. Model checkpointing typically requires pausing training jobs, so the time needed to save a checkpoint adds directly to overall training time. With this integration, you can save, load, and delete checkpoints in Amazon S3 directly from your PyTorch Lightning training jobs.
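To make the save, load, and delete operations concrete, here is a minimal Python sketch of what a checkpoint primitive backed by object storage might look like. It is not the connector's implementation: `InMemoryObjectStore` and `ObjectStoreCheckpointIO` are hypothetical names, the in-memory dictionary stands in for an S3 bucket, and the bucket URI is a placeholder. The method names are modeled on the save, load, and remove operations of PyTorch Lightning's checkpoint I/O plugin interface, and the example uses only the Python standard library so it runs without AWS credentials.

```python
import io
import pickle
from typing import Any, Dict


class InMemoryObjectStore:
    """Hypothetical stand-in for an S3 bucket: maps object URIs to bytes."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}


class ObjectStoreCheckpointIO:
    """Illustrative checkpoint primitive with save, load, and remove operations.

    This is an assumption-laden sketch, not the connector's actual class.
    """

    def __init__(self, store: InMemoryObjectStore) -> None:
        self.store = store

    def save_checkpoint(self, checkpoint: Dict[str, Any], path: str) -> None:
        # Serialize into a buffer and write the bytes straight to the
        # object store, without staging a file on local disk first.
        buffer = io.BytesIO()
        pickle.dump(checkpoint, buffer)
        self.store.objects[path] = buffer.getvalue()

    def load_checkpoint(self, path: str) -> Dict[str, Any]:
        # Read the object's bytes back and deserialize them.
        return pickle.loads(self.store.objects[path])

    def remove_checkpoint(self, path: str) -> None:
        # Delete the object if it exists; a missing key is not an error here.
        self.store.objects.pop(path, None)


store = InMemoryObjectStore()
checkpoint_io = ObjectStoreCheckpointIO(store)

# Placeholder URI; a real job would point at a bucket and prefix it owns.
uri = "s3://example-bucket/checkpoints/epoch=0.ckpt"
checkpoint = {"epoch": 0, "state_dict": {"weight": [0.1, 0.2]}}

checkpoint_io.save_checkpoint(checkpoint, uri)
restored = checkpoint_io.load_checkpoint(uri)
checkpoint_io.remove_checkpoint(uri)
```

In an actual training job, you would pass the connector's own checkpoint implementation to the PyTorch Lightning trainer instead of a class like this one; the project's GitHub page documents the exact import path and constructor arguments.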
Amazon S3 Connector for PyTorch is an open source project. To get started, visit the GitHub page.