Error Handling in OneFlow: Maybe
Written by Xinqi Li, twice, Chi Yao; Translated by Kaiyan Wang, Wenwen Dong
Error Handling Dilemmas in C++
The mechanisms for error handling in C++ programming are roughly divided into two types:
- Exception handling
- Functions returning error codes
Exception Handling
Consider a function get_cute_cat that passes img through a chain of processing steps, such as crop_to_cat and add_rainbow, and returns a "cute_cat" image.
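A minimal sketch of such a pipeline is given here for illustration. The Image type is a hypothetical stand-in that only records which steps were applied, and only the two steps named in the text are included.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a real image type: it only records
// which processing steps have been applied to it.
struct Image {
    bool has_cat = true;
    std::vector<std::string> steps;
};

// Crops the image to the cat. Nothing here handles the "no cat" case.
Image crop_to_cat(Image img) {
    img.steps.push_back("crop_to_cat");
    return img;
}

Image add_rainbow(Image img) {
    img.steps.push_back("add_rainbow");
    return img;
}

// Each step feeds the next one; no step is able to report a failure.
Image get_cute_cat(const Image& img) {
    return add_rainbow(crop_to_cat(img));
}
```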
Such code has no error handling at all, so the behavior of the program is unpredictable when crop_to_cat finds no cat in the image, or in other unexpected circumstances. Exception handling can be added to cover these cases.
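One way this might look is sketched below. The Image type, the has_cat flag, and the process caller are assumptions made for illustration; only crop_to_cat, add_rainbow and get_cute_cat come from the text.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a real image type.
struct Image {
    bool has_cat = true;
    std::vector<std::string> steps;
};

// Throws when there is no cat to crop to.
Image crop_to_cat(Image img) {
    if (!img.has_cat) { throw std::runtime_error("no cat in image"); }
    img.steps.push_back("crop_to_cat");
    return img;
}

Image add_rainbow(Image img) {
    img.steps.push_back("add_rainbow");
    return img;
}

// The "correct path" stays a plain chain of calls.
Image get_cute_cat(const Image& img) {
    return add_rainbow(crop_to_cat(img));
}

// Hypothetical caller: the failure is handled in one place.
std::string process(const Image& img) {
    try {
        const Image cat = get_cute_cat(img);
        return "ok: " + std::to_string(cat.steps.size()) + " steps";
    } catch (const std::runtime_error& e) {
        return std::string("error: ") + e.what();
    }
}
```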
However, exception handling in C++ has its own drawbacks, such as a loss of efficiency. In the normal flow (i.e. when no exception is thrown), using try...catch incurs essentially no loss of efficiency; but when an exception is thrown, stack unwinding is costly, especially if exceptions occur frequently. For example, if 50% of the images in the example above had no cat, the program would be much less efficient. (Some readers may suggest that in such a case it makes more sense to move the exceptional logic into the normal program flow, but that is an afterthought.)
In addition to efficiency issues, it is hard to make C++ programs exception-safe. If a library claims to be exception-safe, its author has probably spent several times as long maintaining that guarantee as implementing the functionality itself.
On this point, the Google C++ Style Guide lists the advantages and disadvantages of exceptions in detail and gives its own view: most of Google's projects do not use exceptions for practical reasons (although it might be different if starting from scratch), and use error code checking or assertions instead.
Large C++ projects such as LLVM have also designed their own error-handling mechanisms to avoid the use of exceptions.
Function Returns Error Code
The first problem with using functions that return error codes for error handling is that it is easy to confuse the expected result with an error code that indicates an unexpected situation.
Take integer division as an example: the divisor cannot be zero, so a division function has to return some error code when it is. No matter which value is chosen as the error code, it "crowds out" a correct result. For example, if -1 is the error code, how can the result of calling div(5,-5) be represented? To solve such problems, the result has to be passed back through a parameter or by some other means.
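The collision can be made concrete with a short sketch. The function names div_with_code and div_with_out_param are hypothetical, chosen so they do not clash with the standard library's div.

```cpp
// Error-code version: -1 doubles as "division by zero", so a caller
// cannot tell a failed call apart from a legitimate result of -1.
int div_with_code(int x, int y) {
    if (y == 0) { return -1; }
    return x / y;
}

// Out-parameter version: the return value only reports success, and
// the quotient travels through a pointer. This removes the ambiguity,
// but the call can no longer be used directly inside an expression.
bool div_with_out_param(int x, int y, int* result) {
    if (y == 0) { return false; }
    *result = x / y;
    return true;
}
```

With the first version, div_with_code(5, -5) and div_with_code(5, 0) return the same value, which is exactly the ambiguity described above.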
In addition, checking error codes with if statements interferes with the normal logic of the program and breaks up its logical flow. Take get_cute_cat from earlier as an example: once an error check is added after every step, the logic of the program's "correct path" is broken up by if(error), and the code no longer reads smoothly.
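The effect can be seen in a sketch like the following. The Image type and the ErrorCode enum are hypothetical; the point is only how the checks interleave with the actual work.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration.
struct Image {
    bool has_cat = true;
    std::vector<std::string> steps;
};
enum class ErrorCode { kOk, kNoCat };

ErrorCode crop_to_cat(const Image& in, Image* out) {
    if (!in.has_cat) { return ErrorCode::kNoCat; }
    *out = in;
    out->steps.push_back("crop_to_cat");
    return ErrorCode::kOk;
}

ErrorCode add_rainbow(const Image& in, Image* out) {
    *out = in;
    out->steps.push_back("add_rainbow");
    return ErrorCode::kOk;
}

// Every step is followed by a check, so the two lines of real work
// are buried between the error-handling branches.
ErrorCode get_cute_cat(const Image& img, Image* result) {
    Image cropped;
    ErrorCode error = crop_to_cat(img, &cropped);
    if (error != ErrorCode::kOk) { return error; }

    error = add_rainbow(cropped, result);
    if (error != ErrorCode::kOk) { return error; }

    return ErrorCode::kOk;
}
```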
TensorFlow uses error handling based on error codes: errors are handled through a general Status class together with a set of macros related to TF_CHECK_OK.
TensorFlow developers wrap a call in the TF_CHECK_OK macro to check the status it returns. If the status is not OK, a QFATAL level event is triggered, terminating the program.
This somewhat mitigates the problem of if(error) breaking up the correct flow logic. However, the limitations of TF_CHECK_OK are also obvious:
- Methods that can be checked by TF_CHECK_OK can only return Status.
- Method calls wrapped in TF_CHECK_OK can only be used as statements, e.g. TF_CHECK_OK(Foo(...));, and cannot continue to participate in operations as expressions (e.g. const auto data = TF_CHECK_OK(Foo(...)) is wrong). This is actually a consequence of the previous restriction.
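Both limitations can be illustrated with a simplified sketch of the pattern. This is not TensorFlow's code: the Status class, the MY_CHECK_OK macro, SafeDiv and HalfOf below are all hypothetical and only mimic the general idea of a status type plus a checking macro.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical, heavily simplified status type.
class Status {
 public:
  Status() = default;  // default-constructed means OK
  explicit Status(std::string msg) : ok_(false), msg_(std::move(msg)) {}
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  bool ok_ = true;
  std::string msg_;
};

// Hypothetical check macro: aborts the program if the status is not OK.
// The do/while wrapper makes it a statement, so it yields no value.
#define MY_CHECK_OK(expr)                                \
  do {                                                   \
    const Status status_ = (expr);                       \
    if (!status_.ok()) {                                 \
      std::fprintf(stderr, "Check failed: %s\n",         \
                   status_.message().c_str());           \
      std::abort();                                      \
    }                                                    \
  } while (0)

// The return slot is taken by Status, so the quotient has to travel
// through an out-parameter.
Status SafeDiv(int x, int y, int* result) {
  if (y == 0) { return Status("division by zero"); }
  *result = x / y;
  return Status();
}

int HalfOf(int x) {
  int result = 0;
  MY_CHECK_OK(SafeDiv(x, 2, &result));  // fine: used as a statement
  // const int r = MY_CHECK_OK(...);    // would not compile
  return result;
}
```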
Elegant Handling in Haskell: Just Return
The functional language Haskell has already provided an elegant and consistent way of handling errors: Maybe. Let’s take a look at how to implement a division with error handling in Haskell (the code here is written as close to imperative style as possible, considering the larger number of readers who are familiar with imperative programming).
The return type of such a function is a Maybe type. Its defining characteristic is that, when the result is the expected one, the value inside Maybe is the real value wrapped in Just (the "Just" in Haskell has the meaning of "just return it"). In the case of an error, the value in the Maybe type is Nothing.
The real value wrapped in Just cannot take part in calculations directly, but it can be retrieved by calling fromJust (in fact, >>= is more important, but we won't cover it in this article). If the value is Nothing, an exception will be thrown when using fromJust.
The significant benefit of Maybe is that the result of the function call can continue to be used as an expression while the error is being handled.
Taking this to a higher level, Maybe is actually one of the representative Monads in Haskell. Monads can be thought of as different data types and control structures that all provide a consistent interface. Maybe is an ADT that can hold either normal data or an error, i.e. Union<T, Error>.
Following the success of Haskell's Maybe, similar data types have appeared in many other languages. In C++, std::optional was introduced in C++17, but the proposal to give it monadic operations was not adopted until C++23. As a result, OneFlow implements Maybe in its own way.
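For comparison, here is a minimal sketch of the division example written with C++17's std::optional. The helper div_then_increment is hypothetical; it shows that, without monadic operations, each intermediate result still has to be checked by hand, and that std::optional carries no error information beyond "empty".

```cpp
#include <optional>

// Returns the quotient, or an empty optional when the divisor is zero.
std::optional<int> safediv(int x, int y) {
    if (y == 0) { return std::nullopt; }
    return x / y;
}

// Hypothetical follow-up computation: the empty case must be
// propagated manually before the value can be used.
std::optional<int> div_then_increment(int x, int y) {
    const std::optional<int> quotient = safediv(x, y);
    if (!quotient) { return std::nullopt; }
    return *quotient + 1;
}
```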
Maybe in OneFlow
Having understood the use of Maybe in Haskell, it is easy to understand how the Maybe class in OneFlow can be used in conjunction with the JUST macro. We will show you how to use them in OneFlow and then explain the implementation principles behind them.
Using OneFlow Maybe
Here, returning to the integer division example, safediv can be implemented with OneFlow's Maybe, using CHECK_NE_OR_RETURN to reject a zero divisor.
CHECK_NE_OR_RETURN and similar macros (such as CHECK_EQ_OR_RETURN for equality and CHECK_NOTNULL_OR_RETURN for non-null pointers) are all defined in maybe.h and are used to check expressions.
Any call to a function or method that returns Maybe needs to be wrapped in the JUST macro. The result of JUST is an expression rather than a statement, so it can take part directly in subsequent evaluation.
Principle of Implementation
The Maybe<T> class in OneFlow is defined in maybe.h and, as mentioned above, it is essentially Union<T, Error>. maybe.h provides template specializations for several kinds of data types that all expose the same interface, which allows the Maybe<T> container to hold them (or errors). These types include:
- Maybe<void>: This corresponds to the original void return type and is equivalent to TensorFlow's Status.
- Maybe: User-defined data type (class/struct). When data is retrieved from
Maybe, the type is
- Maybe: Scalar data type in C++, and when the data is retrieved from
Maybe, the type is
- Maybe: Reference data type in C++, and when the data is retrieved from
Maybe, the type is the reference itself.
The main interfaces provided by Maybe<T> are:
- IsOk(): whether there is normal data in Maybe.
- error(): gets the error message.
- Data_YouAreNotAllowedToCallThisFuncOutsideThisFile(): gets the data for the normal process. The name is deliberately unwieldy to discourage users from calling it directly.
JUST, which fetches the data from Maybe, is actually a macro defined in just.h.
Leaving aside the type-safety helper (RemoveRValConst), the reserved intermediate layer (__JustStackCheckWrapper__) and the overloaded functions for compatibility with other data types (JustIsOk), the core logic of the JUST macro is straightforward.
The result returned by the function call wrapped in JUST (of type Maybe<T>) is first stored in value_to_check and then checked for errors with if (!value_to_check.IsOk()). If an error occurred, the error stack is recorded and the error is returned directly from the enclosing function.
If everything goes well, statement expression syntax makes value_to_check the value of the entire statement block (from ({ to })), and Data_YouAreNotAllowedToCallThisFuncOutsideThisFile is then called on it to retrieve the normal data.
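The mechanism described above can be sketched with a toy version. This is not OneFlow's implementation: the Error struct, the std::variant-based Maybe<T>, the short Data() accessor and the two functions are assumptions made for illustration, and error-stack logging is left out. The sketch relies on GNU statement expressions, so it assumes GCC or Clang in C++17 mode.

```cpp
#include <string>
#include <utility>
#include <variant>

// Toy error type; the real one carries much more information.
struct Error {
  std::string msg;
};

// Toy Maybe<T>: holds either a T or an Error.
template <typename T>
class Maybe {
 public:
  Maybe(T value) : data_(std::move(value)) {}      // implicit on purpose
  Maybe(Error error) : data_(std::move(error)) {}  // implicit on purpose
  bool IsOk() const { return std::holds_alternative<T>(data_); }
  const Error& error() const { return std::get<Error>(data_); }
  const T& Data() const { return std::get<T>(data_); }

 private:
  std::variant<T, Error> data_;
};

// Toy JUST: evaluate the call, return the error from the enclosing
// function if there is one, otherwise yield the wrapped value.
#define JUST(...)                          \
  ({                                       \
    auto value_to_check = (__VA_ARGS__);   \
    if (!value_to_check.IsOk()) {          \
      return value_to_check.error();       \
    }                                      \
    value_to_check;                        \
  }).Data()

Maybe<int> SafeDiv(int x, int y) {
  if (y == 0) { return Error{"division by zero"}; }
  return x / y;
}

// JUST(...) is an expression, so its result can be used in arithmetic.
Maybe<int> DivThenIncrement(int x, int y) {
  int result = JUST(SafeDiv(x, y)) + 1;
  return result;
}
```

Because the return inside the macro belongs to the enclosing function, DivThenIncrement(1, 0) never reaches the addition and simply hands the error back to its caller.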
As an aside, the statement expression used here is a GNU extension, supported by GCC and Clang but not yet by MSVC. In build environments where statement expressions are not supported, JUST may fall back to using exceptions. Interested readers are welcome to raise an issue or PR in the OneFlow-Inc/oneflow repository.
The above approach ensures that a call wrapped in JUST can be used as an expression while still letting the current function return early if an error occurs. For example, if an error occurs at float x2 = JUST(sqrt(x1)); inside a function rsqrt, rsqrt returns immediately and none of its remaining statements are executed. Beyond this basic benefit of handling errors while remaining usable as an expression, OneFlow's Maybe<T> in conjunction with JUST has some other advantages.
Error Stack Constructed by the JUST Chain
For Maybe<T> as a function return value, OneFlow adopts the following conventions:
- No function may take Maybe<T> as an input parameter.
- Every call to a function whose return value is Maybe<T> must be wrapped in JUST (or in another macro provided by the OneFlow error checking mechanism).
In this way, if an error occurs, the error returned at the innermost JUST is propagated upwards layer by layer, building an error stack. For example, when flow.gather is used incorrectly on the Python side, the resulting error message records the path from the location of the C++ code that triggered the error all the way up to the Python layer, with friendly error messages.
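How a chain of JUST calls can accumulate such a stack is sketched below with a toy model. It is not OneFlow's implementation: the Error struct with its stack field, the toy Maybe<T>, and the functions SafeSqrt, Rsqrt and Normalize are all hypothetical, and the real frames contain richer location information. GCC or Clang in C++17 mode is assumed because of the statement expression.

```cpp
#include <cmath>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Toy error type: a message plus a stack, innermost frame first.
struct Error {
  std::string msg;
  std::vector<std::string> stack;
};

// Toy Maybe<T>: holds either a T or an Error.
template <typename T>
class Maybe {
 public:
  Maybe(T value) : data_(std::move(value)) {}
  Maybe(Error error) : data_(std::move(error)) {}
  bool IsOk() const { return std::holds_alternative<T>(data_); }
  const Error& error() const { return std::get<Error>(data_); }
  const T& Data() const { return std::get<T>(data_); }

 private:
  std::variant<T, Error> data_;
};

// Toy JUST: on error, append "<enclosing function>: <call text>" to
// the stack and return early from the enclosing function.
#define JUST(...)                                                    \
  ({                                                                 \
    auto value_to_check = (__VA_ARGS__);                             \
    if (!value_to_check.IsOk()) {                                    \
      Error propagated = value_to_check.error();                     \
      propagated.stack.push_back(std::string(__func__) +             \
                                 ": " #__VA_ARGS__);                 \
      return propagated;                                             \
    }                                                                \
    value_to_check;                                                  \
  }).Data()

Maybe<float> SafeSqrt(float x) {
  if (x < 0) { return Error{"sqrt of a negative number", {"SafeSqrt"}}; }
  return std::sqrt(x);
}

Maybe<float> Rsqrt(float x) {
  float root = JUST(SafeSqrt(x));  // returns early if SafeSqrt fails
  return 1.0f / root;
}

Maybe<float> Normalize(float value, float variance) {
  return value * JUST(Rsqrt(variance));
}
```

Calling Normalize(1.0f, -4.0f) in this sketch yields an error whose stack lists SafeSqrt, then the JUST in Rsqrt, then the JUST in Normalize. If any call in the chain were made without JUST, the corresponding frame would be missing, which is why the conventions above matter.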
Some readers may be wondering: what if a developer implements a function that returns Maybe<T> but forgets to wrap the call in JUST? Doesn't that result in a broken error stack?
In fact, telling developers to use JUST is a bit like telling programmers who use exceptions to remember RAII: it sounds reasonable but is prone to omissions.
However, OneFlow has developed a set of static analysis tools based on LLVM (https://github.com/llvm/llvm-project/compare/main...Oneflow-Inc:maybe) around the JUST conventions to ensure that developers' code follows them. Details of the implementation will be covered in subsequent articles.
The open source distributed deep learning framework OneFlow draws on ideas from the functional programming language Haskell to build the Maybe<T> ADT and a set of infrastructure around it. With this infrastructure, developers can write high-quality, fault-tolerant code without spending extra effort on error handling.
OneFlow has also built static analysis tools based on LLVM to ensure that developers are using OneFlow's Maybe<T> correctly.
References
- Google C++ Style Guide Summary of Exceptions: https://google.github.io/styleguide/cppguide.html#Exceptions
- Error Handling in LLVM: https://llvm.org/docs/ProgrammersManual.html#error-handling
- Haskell Wikibook on Maybe: https://en.wikibooks.org/wiki/Haskell/Libraries/Maybe
- HaskellWiki on Monad: https://wiki.haskell.org/Monad
- The Question Mark Operator (?) in Rust: https://doc.rust-lang.org/reference/expressions/operator-expr.html#the-question-mark-operator
- The std::optional in C++: https://en.cppreference.com/w/cpp/utility/optional
- C++ Optional Monadic Operations, Published Proposal: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p0798r6.html