Posted On: Dec 4, 2023
Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. You can now import tabular, time-series, image, and text data from 50+ data sources, generate Data Quality and Insights reports, and transform data using 300+ built-in operators to build and use machine learning (ML) models, all without writing any code. Through this integration, you can accelerate data preparation for ML from weeks to minutes using SageMaker Canvas.
Aggregating, analyzing, and transforming large amounts of data are critical yet often the most time-consuming parts of the ML workflow. Customers can now use the Data Quality and Insights report to quickly analyze and visualize data and identify issues that could impact model quality, then clean data and create features for ML using 300+ transformations backed by Spark. Customers can also create a visual data preparation flow in SageMaker Canvas and import data from over 50 data sources, including Amazon S3, Amazon Athena, Amazon Redshift, Salesforce Data Cloud, and Snowflake. Once the data is prepared, customers can scale the data preparation steps to run on distributed Spark processing jobs, export the dataset to train models, or predict outcomes with ready-to-use machine learning and foundation models. Alternatively, they can export their data workflow as a step in a SageMaker pipeline to engineer features, train models, or transform data in near real time for inference in SageMaker Studio.
The new data preparation capabilities are available in all AWS Regions where SageMaker Canvas is supported. For more information, see the blog and the AWS technical documentation.