AWS Glue upgrades Spark engines, backs Ray framework

Jean J. White

AWS Glue, a serverless data integration service from Amazon Web Services, gets upgraded Python and Apache Spark engines in version 4.0, released this week.

The upgrade adds engines for Python 3.10 and Apache Spark 3.3.0. Both engines include performance enhancements and bug fixes, with Spark offering capabilities such as row-level runtime filtering and improved error messages.

New engine plugins in Glue 4.0 support the Ray compute framework, the Cloud Shuffle Service for Spark, and Adaptive Query Execution. The release also supports Pandas, the Python-based data analysis and manipulation library. New data format support covers Apache Hudi, Apache Iceberg, and Delta Lake. Glue 4.0 also includes the Parquet vectorized reader, which supports additional encodings and data types.

AWS Glue provides data discovery, data preparation, data transformation, and data integration capabilities, with autoscaling based on workload size. AWS said Glue now also offers visual transforms that let customers build business-specific ETL logic and share it among teams.

AWS also announced a preview of AWS Glue for Ray as a new engine option. Data engineers can use AWS Glue for Ray to process large data sets with Python and popular Python libraries, with Python code distributed for processing across multi-node clusters.

Glue 4.0 is available now in several US regions, including Ohio, Northern Virginia, and Northern California.

Copyright © 2022 IDG Communications, Inc.
