OneFlow v0.8.0 Is Out!

OneFlow
Aug 2, 2022


We are thrilled to announce the release of OneFlow v0.8.0. This update contains 523 commits. For the full changelog, please check out: https://github.com/Oneflow-Inc/oneflow/releases/tag/v0.8.0. We welcome you to install OneFlow v0.8.0 for a new user experience. Your feedback will be much appreciated!

Highlights and optimizations in this release:

1. PyTorch API compatibility

OneFlow v0.8.0 provides more and better PyTorch-compatible APIs. In v0.8.0, a series of new features and interfaces compatible with PyTorch 1.10.0 are in place, including 68 new APIs aligned with PyTorch and 84 bug fixes that ensure better compatibility between operators and interfaces, allowing users to migrate more PyTorch models to OneFlow with just one click.
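As a quick illustration (a minimal sketch, not taken from the release notes; the model and shapes are placeholders), PyTorch-style code can typically run on OneFlow simply by switching the import, since the listed APIs are aligned:

    # Minimal sketch of the aligned API surface: the same PyTorch-style code,
    # but importing oneflow instead of torch.
    import oneflow as torch   # drop-in style import of the PyTorch-compatible API
    import oneflow.nn as nn

    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
    x = torch.randn(8, 16)
    loss = model(x).sum()
    loss.backward()           # autograd works through the aligned API
    print(loss.item())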

2. Wider support of global operators

Operators now support Global Tensor more widely and efficiently. We fixed 28 bugs related to Global Tensor and added 180 unit tests for global operators, making it faster and easier to develop distributed models with Global Tensor.
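For context, a global tensor is built by attaching a placement and an SBP signature to a tensor; the sketch below shows the basic pattern (shapes are illustrative, and the script is meant to be launched on 2 GPUs with oneflow.distributed.launch):

    import oneflow as flow

    # Run with: python3 -m oneflow.distributed.launch --nproc_per_node 2 demo.py
    placement = flow.placement("cuda", ranks=[0, 1])   # which devices hold the data
    sbp = flow.sbp.split(0)                            # split along dim 0 across ranks

    x_local = flow.randn(4, 8)                         # each rank's local shard
    x_global = x_local.to_global(placement=placement, sbp=sbp)

    # Global tensors go through ordinary operators; OneFlow handles the
    # distributed execution according to placement and SBP.
    y = flow.relu(x_global)
    print(y.shape)   # logical (global) shape: (8, 8)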

3. Better performance

The advanced features of Graph have been improved for better performance:

  • In addition to the original ZeRO-DP, ZeRO can now be combined with model parallelism, 2-D parallelism, and 3-D parallelism to further reduce memory overhead.
  • Added a new pipeline parallelism API for Graph to simplify the configuration for pipeline parallelism and accelerate training when using pipeline parallelism and 3-D parallelism.
  • Added debugging features in multiple dimensions, including logical graphs, light plans (physical graphs), memory analysis, and Python stack information, to further improve the efficiency of Graph.debug (see the sketch after this list).
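For reference, the debug output is enabled through Graph.debug on an nn.Graph instance; here is a minimal sketch (the wrapped model and the verbosity level are illustrative):

    import oneflow as flow
    import oneflow.nn as nn

    class LinearGraph(flow.nn.Graph):
        def __init__(self, model):
            super().__init__()
            self.model = model

        def build(self, x):
            return self.model(x)

    model = nn.Linear(16, 8).to("cuda")
    graph = LinearGraph(model)
    graph.debug(1)   # print graph construction/compilation info; higher levels print more
    y = graph(flow.randn(4, 16, device="cuda"))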

With this release, the combination of OneFlow v0.8.0 and LiBai v0.2.0 delivers higher computation speeds for GPT and BERT under 3-D parallelism across multiple configurations, surpassing Megatron-LM with the same configurations. (For more details, see: https://libai.readthedocs.io/en/latest/tutorials/get_started/Benchmark.html).

4. OneEmbedding component

OneEmbedding is an extended component specifically designed for large-scale recommender systems. It boasts excellent performance, extensibility, and flexibility. Its features include:

  • Support for tiered storage and dynamic expansion of Embedding so users can expand the Embedding capacity at a lower cost.
  • Support for mixed parallelism strategies so users can easily extend models to multi-node multi-GPU scenarios.
  • Quantized communication compression. It can reduce communication volume and accelerate training in parallel scenarios by applying quantization to the communicated data.
  • Efficient data pipeline. Parts of the model that do not depend on the incoming data can be scheduled earlier in execution to save time.
  • Support for Automatic Mixed Precision training. It can transfer part of the computations in model training into FP16 computations, so as to reduce memory usage and accelerate training without loss of convergence accuracy.
  • Inclusion of a series of high-performance CUDA operators for common operations in recommender system models.
  • Support for flexible model building.

API Documentation: https://docs.oneflow.org/en/master/cookies/one_embedding.html
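As a rough construction sketch based on the linked documentation (the path, sizes, and initializer range below are placeholders, not recommended values):

    import oneflow as flow

    # One embedding table with a uniform initializer; tiered (GPU cache + SSD) storage.
    tables = [
        flow.one_embedding.make_table_options(
            flow.one_embedding.make_uniform_initializer(low=-0.05, high=0.05)
        )
    ]
    store_options = flow.one_embedding.make_cached_ssd_store_options(
        cache_budget_mb=8192,                  # GPU cache size
        persistent_path="/path/to/ssd_dir",    # placeholder path for persistent storage
        capacity=40_000_000,                   # placeholder vocabulary size
    )
    embedding = flow.one_embedding.MultiTableEmbedding(
        name="sparse_embedding",
        embedding_dim=128,
        dtype=flow.float,
        key_type=flow.int64,
        tables=tables,
        store_options=store_options,
    ).to("cuda")
    # In practice the embedding is used inside an nn.Graph together with one of the
    # optimizers supported by OneEmbedding; see the API documentation above.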

5. Multi-device adaptation

OneFlow v0.8.0 provides a neat, efficient, and easily extensible hardware abstraction layer, EP (Execution Provider), to adapt to different hardware. With this abstraction layer in place, no module of the framework needs to be modified to support a new hardware device, and the framework does not need to care about the implementation details of the underlying hardware.

To bring up a new hardware device, users only need to implement the set of interfaces defined by the hardware abstraction layer according to the characteristics of that device.

EP also defines a set of basic computing interfaces, called primitives, on top of which kernels can be re-implemented. Primitives offer more flexible interfaces than the runtime interfaces provided by EP; the interfaces are independent of each other, and each one represents a computing capability that a given kind of hardware device can provide.

6. Debugging tool stack

New debugging tools: OneFlow-Profiler and AutoProf.

OneFlow-Profiler is a tool for collecting performance information during framework execution. It records the execution time of operators and system components, memory allocation, and the corresponding inputs and parameters of operators. This information helps developers find the main sources of overhead in framework execution and implement targeted optimizations. (Oneflow-Inc/oneflow#8047)
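The intended usage follows the familiar profiler pattern; the sketch below assumes an interface that mirrors torch.profiler, and the names in it are assumptions on our part (see the linked PR for the authoritative API):

    import oneflow as flow
    import oneflow.profiler as profiler   # assumed module path

    x = flow.randn(1024, 1024, device="cuda")
    w = flow.randn(1024, 1024, device="cuda")

    with profiler.profile() as prof:                    # collect op/kernel timings (assumed)
        with profiler.record_function("matmul_block"):  # label a code region (assumed)
            y = flow.matmul(x, w)

    print(prof.key_averages())                          # summarized per-op costs (assumed)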

AutoProf is a framework for testing the performance of OneFlow and PyTorch operators. It provides an elegant and efficient way to check the alignment between OneFlow APIs and PyTorch APIs and to conveniently compare their performance. (Oneflow-Inc/oneflow#8207)

7. Error messages

Improved error messages with more details and refactored the exception handling.

8. API documentation

Made over 20 revisions to the OneFlow API documentation, restructured it by feature, and added further elaboration on modules and environment variables, including oneflow.nn.graph, oneflow.one_embedding, and oneflow.autograd, in addition to the general operator APIs.

To view the full changelog, please check out the OneFlow v0.8.0 release notes.

Feel free to visit OneFlow on GitHub and follow us on Twitter and LinkedIn.

Also, feel free to join our Discord group to discuss OneFlow, ask questions, and connect with OneFlow contributors and users around the world.

Written by OneFlow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. https://github.com/Oneflow-Inc/oneflow
