OneFlow v0.6.0 has been released!

OneFlow is a performance-centered and open-source deep learning framework.

We welcome you to try OneFlow v0.6.0 and would love to hear your feedback!

This version mainly updates three parts: the framework, models, and OneFlow-ONNX. Highlights include:

  • Performance optimizations in static graphs, dynamic graphs, operators, memory footprint, and more
  • A larger number of common operators
  • Improvements in static graphs and ConsistentTensor
  • Serving functionality as Nvidia Triton’s backend
  • Richer visual pre-training models similar to torchvision and timm
  • Better OneFlow-ONNX conversion functionality

The following are the detailed release notes.


1. Performance Optimization of nn.Graph

Compared to v0.5.0, nn.Graph in v0.6.0 delivers a 10% speedup in training on models such as ResNet AMP and WDL.

  • Optimized nn.Graph’s performance in high frequency iterative training scenarios
  • Redesigned nn.Graph’s scheduling instructions and refactored the interaction logic between the Actor Graph and the Eager VM, so that the Graph’s runtime execution is asynchronous and overlaps with Python-side input/output Tensor handling as much as possible

2. Performance Optimization of Eager

Compared to v0.5.0, OneFlow Eager’s training speed in v0.6.0 increases dramatically in small-batch scenarios.

  • Optimized the scheduling logic for virtual machines
  • Optimized get/set item
  • Optimized tensor.numel()
  • Optimized oneflow.Size()

3. Performance Optimization of Operators

Optimized operators that affect the performance of new models, significantly improving the training speed of these models.

4. Performance Optimization of Eager’s Memory Occupation

Optimized the memory footprint of some operators during network training, allowing the same computing device to run larger models or more data.

  • Optimized the backward memory occupation of broadcast binary operators
  • Optimized the backward memory occupation of Slice operator
  • Optimized the memory occupation of LayerNorm operator

5. More Useful Features to Static Computation Graph (nn.Graph)

The newly added features relate to the efficiency, debugging, completeness, and usability of static graphs.

To help the debugging of static graphs, we added the following features:

  • debug mode: calling graph.debug(1) prints more information about the graph composition
  • Provided the environment variable ONEFLOW_DEBUG_PASS to show the changes in the computed graph before and after compile-time optimization
  • Added user-readable thread naming information to Nsight Profile for locating and retrieving target key thread locations
  • Added many static graph test cases and added automatic nn.Graph tests that accompany Eager tests

Provided save() and load() interfaces to support the deployment of models (Serving) using nn.Graph.

To enable AMP acceleration on GPUs with TensorCores, the environment variable ONEFLOW_ENABLE_NHWC is provided to make CNN-related operators compute in channels-last (NHWC) layout.
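As a sketch of how the variable is used per run (train.py is a placeholder script name):

```shell
# Hypothetical invocation: enable channels-last (NHWC) computation for
# CNN-related operators so AMP can make better use of TensorCores.
ONEFLOW_ENABLE_NHWC=1 python3 train.py
```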

Enabled nn.Graph to support more usage scenarios:

  • Supported the Sparse Update Optimizer for sparse parameter updates in WDL scenarios
  • Supported using the following nn.Module containers with nn.Graph: Sequential, ModuleList, ModuleDict, ParameterList, and ParameterDict
  • Supported creating the Optimizer in the __init__ function of nn.Graph
  • Supported multiple parameters sharing the same Tensor with nn.Graph
  • Supported scenarios where the actual number of processes is greater than the number of GPU devices
  • Supported more inplace execution for Consistent SBP inference under nn.Graph

6. A Larger Number of Operators

7. User-Defined autograd.Function

Users can customize autograd.Function just as in PyTorch.

8. Added Basic Serving Functionality

Serving functionality of models is provided by OneFlow as Nvidia Triton’s backend.

9. Added Some Functionalities of Tensor (ConsistentTensor)

  • Supported Tensor using 2-D SBP to represent arbitrary hybrid parallelism (such as a Linear operation that runs data parallelism along the row direction of the device matrix and model parallelism along the column direction)
  • Supported Tensor’s conversion from arbitrary 1-D SBP to 2-D SBP (the network consists of a mixture of 1-D parallel and 2-D parallel)
  • Supported constructing ConsistentTensor from numpy
  • oneflow.from_numpy()
  • oneflow.numel()
  • tensor.expand_as()


Released flowvision 0.0.54.

1. Richer Visual Pre-training Models

Image Classification

  • CNN series: ResNet, DenseNet, VGG, ResNext, EfficientNet, etc
  • Vision Transformer series: ViT, PVT, Swin-Transformer, etc
  • Vision MLP series: Mlp-Mixer, Res-MLP, g-MLP, etc

Object Detection

  • SSD, SSDLite
  • Faster R-CNN
  • RetinaNet

Image Segmentation

  • FCN
  • DeepLabV3

Style Transfer

  • StyleNet: Supported styles sketch, candy, mosaic, rain_princess, and udnie

2. Implemented Data Augmentation Operations Similar to torchvision

For data augmentation operations like CenterCrop and ColorJitter that mirror torchvision, developers can run import flowvision as torchvision to use flowvision as a drop-in replacement in most scenarios.

3. Implemented Advanced Data Augmentation Operations Similar to timm

Implemented advanced data augmentation operations from timm:

  • Mixup
  • CutMix
  • Random-Erasing
  • AutoAugment
  • RandAugment
  • AugMix

4. Separated the Layers Module and Provided a Plug-and-play Block when Building a Model


  • Implemented plug-and-play attention modules like Non-Local, SELayer, CBAM, BAM, ECA, etc
  • Provided modules that might be used for model building like PatchEmb, Pooler, ConvBnAct, etc
  • Provided regularization modules such as drop-path, drop-block, and stochastic depth to improve model generalization ability
  • Provided separate files such as activation and weight_init to organize components like activation functions and initialization methods

OneFlow-ONNX Conversion

Updated OneFlow to ONNX toolkit:

  • Supported converting OneFlow models to ONNX models in CPU or GPU mode
  • Added test cases for operators and models to align all classification models in the flowvision library
  • Fixed onnx-runtime bugs during PReLU conversion
  • Compatible with onnx-runtime v1.9.0 and later versions
  • Released the v0.5.4 oneflow-onnx package; developers can run pip install oneflow-onnx to try it out

Full changelog link:

Welcome to visit OneFlow on GitHub and follow us on Twitter and LinkedIn.

Also, welcome to join our Discord group to discuss and ask OneFlow related questions, and connect with OneFlow contributors and users all around the world.



