Explore MLIR Development Process: Take OneFlow as an Example

OneFlow · Jan 21, 2022

Written by Xiaoyu Zhang; Translated by Xiaozhen Liu, Chenyang Xu, Yakun Zhou

This article describes how OneFlow works with MLIR, how to add a graph-level Pass to OneFlow IR, how OneFlow Operations automatically become MLIR Operations, and why OneFlow IR can use MLIR to accelerate computations.

For more details, please refer to GitHub projects: https://github.com/Oneflow-Inc/oneflow, https://github.com/BBuf/tvm_mlir_learn.

"Op" and "Operation" refer to the same thing throughout this article.

How OneFlow Works with MLIR

Introducing MLIR as OneFlow's IR has many advantages. First, it can replace the hand-written C++ Operation definitions in OneFlow, which reduces development difficulty. Second, it can reduce some container-related overhead in the Operation definitions. Third, we can use the infrastructure maintained by MLIR (i.e., its multiple Dialects) to accelerate graph computation. The computation graph can be either an Eager computation graph or a Lazy computation graph. Since the work on accelerating Eager computation graphs with MLIR (i.e., oneflow.jit.xxx) has not been officially released yet, this article takes the Lazy computation graph (Job) as an example to introduce how OneFlow and MLIR are combined.

First, we need to compile OneFlow with MLIR enabled. The compile command is as follows:
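
(A sketch assuming a typical Linux CMake build; the WITH_MLIR flag enables the oneflow/ir subproject, and the exact flags may differ across OneFlow versions.)

```bash
git clone https://github.com/Oneflow-Inc/oneflow.git
cd oneflow && mkdir build && cd build
# WITH_MLIR=ON builds OneFlow together with its MLIR infrastructure
cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_MLIR=ON
make -j$(nproc)
```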

Then write a test case:
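
(The sketch below is not the article's original script; it builds a minimal Lazy graph with oneflow.nn.Graph and runs it once so that the round-trip Pass described later fires.)

```python
import os
# Enable the Job <-> MLIR round trip before OneFlow compiles the graph.
os.environ["ONEFLOW_MLIR_ENABLE_ROUND_TRIP"] = "1"

import oneflow as flow

class LinearGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.linear = flow.nn.Linear(4, 4)

    def build(self, x):
        return self.linear(x)

graph = LinearGraph()
x = flow.randn(1, 4)
y = graph(x)  # compiling and running the graph dumps files under ./log
print(y.shape)
```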

After running this case, a log folder is generated in the current working directory. It contains an ir_pass folder, which records the computation graphs (*.prototxt) before and after OneFlow's MLIR optimization, as well as the MLIR expressions (*.mlir). There is also a *.mlir.dot file that can be opened with Graphviz to visualize the computation graphs of the MLIR expressions. Note that if OneFlow is running a training task, this log folder contains not only the forward computation graph and its MLIR expressions, but also the backward computation graph and its MLIR expressions. So MLIR can work throughout the entire neural network execution process. This is an important point that distinguishes OneFlow from forward-only inference frameworks: training can also be accelerated.

The file oneflow/api/python/ir.cpp contains the following code:
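
(An abridged sketch of the key registrations; see the file for the exact code.)

```cpp
// Register the Job <-> MLIR round trip as ordinary OneFlow Job Passes,
// once before and once after automatic differentiation.
REGISTER_JOB_PASS("IRRoundTripBeforeAD", IRRoundTrip<kBeforeAD>);
REGISTER_JOB_PASS("IRRoundTrip", IRRoundTrip<kAfterAD>);
```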

Here, kBeforeAD means before the backward pass (automatic differentiation), and kAfterAD means after it. The link between the OneFlow computation graph and MLIR is established by registering the round-trip conversion between a OneFlow Job and MLIR as a OneFlow Job Pass. When running OneFlow scripts, if you want MLIR to act on OneFlow computation graphs, set the ONEFLOW_MLIR_ENABLE_ROUND_TRIP=1 environment variable.

Next, establishing a link between the OneFlow computation graph and MLIR means a one-to-one conversion between the Operations in the OneFlow computation graph and the Operations in MLIR. Since Operations in MLIR are defined under a Dialect, following MLIR's general integration approach, we implement a OneFlow Dialect and a one-to-one mapping from OneFlow Operations to OneFlow Dialect Operations. For how to define the OneFlow Dialect and its Operations, you can refer to the official MLIR documentation on the Operation Definition Specification (ODS). Besides the Dialect and Operations, some other things need to be defined. For example, the mapping from OneFlow data types to MLIR data types is defined in oneflow/ir/include/OneFlow/OneFlowEnums.td, and some generic front-end interfaces of OneFlow Dialect Operations are defined in oneflow/ir/include/OneFlow/OneFlowInterfaces.td. Here we take the Reshape Operation as an example to briefly illustrate its components:
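
(An abridged sketch; the real definition in oneflow/ir/include/OneFlow/OneFlowOps.td carries more traits and attributes.)

```tablegen
def OneFlow_ReshapeOp : OneFlow_BaseOp<"reshape",
    [NoSideEffect, DeclareOpInterfaceMethods<UserOpCompatibleInterface>]> {
  let input = (ins AnyType:$in);     // the input tensor
  let output = (outs AnyType:$out);  // the reshaped output tensor
  let attrs = (ins
    AnyI64ElementsAttr:$shape        // the target shape attribute
  );
}
```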

In OneFlow_ReshapeOp, OneFlow is the name of the Dialect, followed by ReshapeOp, the name of the Operation under that Dialect. The Operation inherits from the OneFlow_BaseOp base class, declares its constraints and front-end interfaces, and then defines its input, output, and attributes. Note that the definition of the OneFlow Dialect Operation matches that of the OneFlow User Op exactly, which guarantees the legality of the conversion between OneFlow and MLIR in both directions. The OneFlow Reshape Operation is defined as follows:
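
(A simplified sketch of the User Op registration on the OneFlow side; the shape-inference and SBP functions are elided.)

```cpp
REGISTER_USER_OP("reshape")
    .Input("in")
    .Output("out")
    .Attr<Shape>("shape")
    .SetTensorDescInferFn([](user_op::InferContext* ctx) -> Maybe<void> {
      // Infer the output tensor description from the shape attribute
      // (details elided in this sketch).
      return Maybe<void>::Ok();
    });
```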

The conversion between a OneFlow Job and MLIR is implemented in oneflow/ir/oneflow-translate. The main task is to traverse the OpGraph of the Job, process its nodes and edges, and finally convert them into an MLIR expression. After the computation is completed, the Job can be rewritten based on the MLIR expressions. The overall logic is complicated because it has to deal with the conversion of various types of Operations and edges in the OneFlow Job OpGraph. This article will not explain it in depth here, because it is not the point this article wants to discuss. If you are interested, you can read the code directly.

How OneFlow IR Executes

The Operation definition above used Reshape as an example. If you browse oneflow/ir/include/OneFlow/OneFlowOps.td, you will find that a OneFlow_MlirJitOp is also defined. This custom Operation is used to execute MLIR expressions: its CPU and GPU Kernels (the source code is in oneflow/ir/oneflow-extension/extension.cpp) load the JIT execution engine provided by MLIR to run the final LLVM IR. So where does the LLVM IR come from? It is obtained by lowering the OneFlow MLIR expression step by step. The specific lowering process is as follows:
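
(A hypothetical sketch of the pipeline; the exact pass names depend on the MLIR version OneFlow is built against, and bufferization steps are omitted.)

```cpp
#include "mlir/Pass/PassManager.h"

// Progressively lower a module: OneFlow -> Tosa -> Linalg -> loops -> LLVM.
void LowerOneFlowModuleToLLVM(mlir::ModuleOp module) {
  mlir::PassManager pm(module.getContext());
  pm.addPass(mlir::oneflow::createLowerOneFlowToTosaPass());  // OneFlow -> Tosa
  pm.addNestedPass<mlir::FuncOp>(
      mlir::tosa::createTosaToLinalgOnTensors());             // Tosa -> Linalg
  pm.addNestedPass<mlir::FuncOp>(
      mlir::createConvertLinalgToLoopsPass());                // Linalg -> loops
  pm.addPass(mlir::createLowerToLLVMPass());                  // -> LLVM dialect
  (void)pm.run(module);
}
```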

As you can see, the OneFlow Dialect is first lowered to the Tosa Dialect, then to the Linalg Dialect, after that to the Loop Dialect, and finally to LLVM IR. During this progressive lowering, loop-nest transformations in Dialects such as Linalg can improve the performance of the final IR. The lowering here is triggered when OneFlow calls the Kernel of MlirJitOp (oneflow/ir/oneflow-extension/extension.cpp), and the call is also added to the optimization process as an MLIR Pass. The implementation of the JIT call process Pass can be simplified as:
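
(A rough sketch with illustrative names, not the exact OneFlow internals: lower the module with the pipeline above, create an MLIR ExecutionEngine, and invoke the JIT-compiled function.)

```cpp
#include "mlir/ExecutionEngine/ExecutionEngine.h"
#include "llvm/Support/Error.h"

void RunJitFunction(mlir::ModuleOp module, llvm::StringRef funcName,
                    llvm::MutableArrayRef<void*> packedArgs) {
  LowerOneFlowModuleToLLVM(module);  // the pipeline sketched above
  auto expectedEngine = mlir::ExecutionEngine::create(module);
  if (!expectedEngine) {
    llvm::consumeError(expectedEngine.takeError());  // error handling elided
    return;
  }
  auto engine = std::move(*expectedEngine);
  // invokePacked runs the JIT-compiled function on a packed argument list
  // built from the Kernel's input/output tensors.
  if (llvm::Error err = engine->invokePacked(funcName, packedArgs)) {
    llvm::consumeError(std::move(err));  // error handling elided
  }
}
```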

However, there are still two problems to be solved in this process:

  • The first problem is how to fuse Ops. The JIT execution process above only performs a straightforward lowering; if there are Operations in the OneFlow Dialect that can be fused, what should we do? That is quite simple: we use MLIR’s DRR rules and write a series of fuse Patterns in oneflow/ir/include/OneFlow/OneFlowPatterns.td with TableGen syntax. For example, since the two Ops bias_add and gelu can be fused into the fused_bias_add_gelu Op in OneFlow, we can write a rule like the one sketched below:
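
(A simplified DRR sketch; the real Pattern also threads through op names, device tags, and the remaining attributes, and checks that the device is a GPU.)

```tablegen
// Rewrite gelu(bias_add(a, b)) into a single fused op.
def FusedBiasAddGeluPattern : Pat<
  (OneFlow_GeluOp (OneFlow_BiasAddOp $a, $b, $axis)),
  (OneFlow_FusedBiasAddGeluOp $a, $b, $axis)
>;
```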

Here the expression matching and rewriting are done based on MLIR’s DRR rules. If the current device is a GPU and adjacent bias_add and gelu Ops are found, they are fused into a single fused_bias_add_gelu Op, which reduces reads and writes of CUDA global memory and improves execution efficiency.

  • The second problem is how to let some of OneFlow’s Operations enjoy more of the optimizations in the MLIR infrastructure. When the multi-level Dialects are lowered layer by layer, each sub-function of OneFlow’s MLIR expression is lowered to the Tosa Dialect first. If an Operation in a sub-function has no conversion to the Tosa Dialect defined, the sub-function cannot be lowered to Tosa, and consequently cannot be lowered further to the Linalg Dialect or enjoy the optimizations brought by loop transformations (analogous to TVM’s schedule optimizations). To handle this, we define an additional Pass that outlines the Ops that can be converted to Tosa into a separate function, and then generates an oneflow.mlir_jit Op to call this function, as sketched below:
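
(An abridged example with illustrative shapes and elided attributes: the fusable cast and scalar_mul_by_tensor Ops are extracted into a standalone function, and the original Job calls it through an oneflow.mlir_jit Op whose mlir_assembly attribute carries this module.)

```mlir
module {
  func @Cast_1__FUSE__ScalarMulByTensor_2(%arg0: tensor<96x96xi64>,
                                          %arg1: tensor<1xf32>) -> tensor<96x96xf32> {
    %0 = "oneflow.cast"(%arg0) : (tensor<96x96xi64>) -> tensor<96x96xf32>
    %1 = "oneflow.scalar_mul_by_tensor"(%0, %arg1)
        : (tensor<96x96xf32>, tensor<1xf32>) -> tensor<96x96xf32>
    return %1 : tensor<96x96xf32>
  }
}
```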

We manually implemented the conversion of the MulCast Pattern from the OneFlow Dialect to the Tosa Dialect. Finally, adding this Pass to the optimization process lets this Pattern in the MLIR expression pass through the two levels of Dialect, Tosa and Linalg, and obtain their optimization opportunities.

Summary

In this article, we took OneFlow as an example to illustrate the actual running process of MLIR, i.e., how to execute and accelerate a deep learning framework’s computation graph with MLIR. Comments and suggestions are most welcome!

I hope this article will help you in your deep learning projects😊. If you want to try out these features of OneFlow, you can follow the method described in this article. If you have any questions, comments💡, or suggestions for improvement, please feel free to leave them in the comments section below. In future articles, we’ll introduce more details of OneFlow.

Related articles:

  1. How to Choose the Grid Size and Block Size for a CUDA Kernel?
  2. OneFlow v0.6.0 came out!

Welcome to visit OneFlow on GitHub and follow us on Twitter and LinkedIn.

Also, welcome to join our Discord group to discuss and ask OneFlow related questions, and connect with OneFlow contributors and users all around the world.
