IBM is introducing a new solution that simplifies the integration, scaling, and acceleration of complex multi-step analytics and machine learning pipelines on the hybrid multi-cloud.
Dubbed CodeFlare, the platform is an open-source framework for simplifying the integration and efficient scaling of big data and AI workflows on the hybrid cloud. It is built on top of Ray, an emerging open-source distributed computing framework for machine learning applications, and extends Ray's capabilities by adding specific elements to make scaling workflows easier.
To create a machine learning model today, researchers and developers first have to train and optimize it, which might involve data cleaning, feature extraction, and model optimization. CodeFlare simplifies this process with a Python-based interface for what’s called a pipeline, making it simpler to integrate, parallelize, and share data.
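To make the idea of a pipeline concrete, the following is a minimal sketch in plain Python. It does not use the CodeFlare or Ray APIs. The stage functions (`clean`, `extract_features`) and the `run_pipeline` helper are hypothetical names chosen for illustration, and a standard-library thread pool stands in for a distributed backend.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stages; these names are illustrative and are not
# part of CodeFlare or Ray.
def clean(record):
    """Data cleaning: trim whitespace and normalize case."""
    return record.strip().lower()

def extract_features(record):
    """Feature extraction: a toy feature vector (length, vowel count)."""
    return (len(record), sum(ch in "aeiou" for ch in record))

def run_pipeline(records, stages, workers=4):
    """Run the stages in order, processing records in parallel within each stage."""
    data = list(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for stage in stages:
            # pool.map preserves input order, so results stay aligned with records.
            data = list(pool.map(stage, data))
    return data

features = run_pipeline(["  Hello ", "WORLD"], [clean, extract_features])
print(features)
```

Each stage is an ordinary Python function, so adding or reordering steps only means editing the list passed to `run_pipeline`. A framework such as CodeFlare would aim to offer the same kind of composition while distributing the work across a cluster.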
The goal of the new framework is to unify pipeline workflows across multiple platforms without requiring data scientists to learn a new workflow language.
CodeFlare pipelines run with ease on IBM’s new serverless platform, IBM Cloud Code Engine, and on Red Hat OpenShift.
Users can deploy CodeFlare just about anywhere, extending the benefits of serverless to data scientists and AI researchers.
The framework also makes it easier to integrate and bridge with other cloud-native ecosystems by providing adapters to event triggers (such as the arrival of a new file) and by loading and partitioning data from a wide range of sources, such as cloud object storage, data lakes, and distributed filesystems.
CodeFlare should also spare developers from duplicating their efforts or struggling to figure out what colleagues have done in the past to get a certain pipeline to run.
With CodeFlare, IBM aims to give data scientists richer tools and APIs that they can use with more consistency, allowing them to focus more on their actual research than on configuration and deployment complexity, according to the company.
IBM will continue to evolve CodeFlare to support increasingly complex pipelines. The company plans to provide enhanced fault tolerance and consistency, improve integration and data management for external sources, and add support for pipeline visualization.
Source: dbta.com