Error Handling in OneFlow: Maybe

OneFlow
Nov 19, 2021

Written by Xinqi Li, twice, Chi Yao; Translated by Kaiyan Wang, Wenwen Dong

Error Handling Dilemmas in C++

The mechanisms for error handling in C++ programming are roughly divided into two types:

  • Exceptions
  • Functions returning error codes

Exceptions

Consider the implementation of get_cute_cat in the following code: img is passed through crop_to_cat, add_bow_tie, make_eyes_sparkle, make_smaller, and add_rainbow in turn, and the result is a "cute_cat" image.
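
As a minimal sketch of such a pipeline, assuming a toy Image type that only records which steps ran (the Image struct, the apply_step helper and usage_example are illustrative assumptions, not part of any real image library):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a real image type: it only records which steps ran.
struct Image {
  bool has_cat = true;
  std::vector<std::string> steps;
};

// Hypothetical helper that "applies" a named step by recording it.
Image apply_step(const Image& img, const char* name) {
  Image out = img;
  out.steps.push_back(name);
  return out;
}

// None of these steps checks whether its input is usable.
Image crop_to_cat(const Image& img) { return apply_step(img, "crop_to_cat"); }
Image add_bow_tie(const Image& img) { return apply_step(img, "add_bow_tie"); }
Image make_eyes_sparkle(const Image& img) { return apply_step(img, "make_eyes_sparkle"); }
Image make_smaller(const Image& img) { return apply_step(img, "make_smaller"); }
Image add_rainbow(const Image& img) { return apply_step(img, "add_rainbow"); }

// The whole pipeline is one expression, with no error handling at all.
Image get_cute_cat(const Image& img) {
  return add_rainbow(make_smaller(make_eyes_sparkle(add_bow_tie(crop_to_cat(img)))));
}

void usage_example() {
  Image no_cat;
  no_cat.has_cat = false;
  // Nothing notices the missing cat: all five steps still run.
  assert(get_cute_cat(no_cat).steps.size() == 5);
}
```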

The above code has no error handling at all, so the program's behavior is unpredictable when crop_to_cat finds no cat in img, or in other unexpected circumstances. In this case, exception handling can be added:
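
A sketch of what this might look like, again assuming a toy Image type; the choice of std::runtime_error and the describe_cute_cat caller are assumptions made for illustration:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a real image type: it only records which steps ran.
struct Image {
  bool has_cat = true;
  std::vector<std::string> steps;
};

Image apply_step(const Image& img, const char* name) {
  Image out = img;
  out.steps.push_back(name);
  return out;
}

// The only step that can fail in this sketch: it throws when there is no cat.
Image crop_to_cat(const Image& img) {
  if (!img.has_cat) { throw std::runtime_error("no cat found in image"); }
  return apply_step(img, "crop_to_cat");
}
Image add_bow_tie(const Image& img) { return apply_step(img, "add_bow_tie"); }
Image make_eyes_sparkle(const Image& img) { return apply_step(img, "make_eyes_sparkle"); }
Image make_smaller(const Image& img) { return apply_step(img, "make_smaller"); }
Image add_rainbow(const Image& img) { return apply_step(img, "add_rainbow"); }

// The "correct path" stays a single expression; errors travel as exceptions.
Image get_cute_cat(const Image& img) {
  return add_rainbow(make_smaller(make_eyes_sparkle(add_bow_tie(crop_to_cat(img)))));
}

// The caller decides where to catch; here the failure becomes a message.
std::string describe_cute_cat(const Image& img) {
  try {
    const Image cat = get_cute_cat(img);
    return "cute cat after " + std::to_string(cat.steps.size()) + " steps";
  } catch (const std::runtime_error& e) {
    return std::string("error: ") + e.what();
  }
}

void usage_example() {
  Image no_cat;
  no_cat.has_cat = false;
  assert(describe_cute_cat(no_cat) == "error: no cat found in image");
}
```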

However, exception handling in C++ has its own drawbacks, such as a loss of efficiency. In the normal flow (i.e. when no exception is thrown), using try...catch incurs essentially no run-time overhead; but when an exception is thrown, stack unwinding is expensive, especially when exceptions occur frequently. For example, if 50% of the images in the above example had no cat, the program would be much less efficient. (Some readers may point out that such a frequent case belongs in the normal program flow rather than in an exception, but that is usually only obvious in hindsight.)

In addition to efficiency issues, it is hard to make C++ programs exception-safe. If a library claims to be exception-safe, its author has probably spent several times as long maintaining that guarantee as implementing the functionality itself.

On this point, the Google C++ Style Guide lists the advantages and disadvantages of exceptions in detail and gives its own view: for practical reasons, most of Google’s projects do not use exceptions (although the decision might be different if starting from scratch), and rely on error code checking or assertions instead.

Large C++ projects such as LLVM have also designed their own error-handling mechanisms to avoid the use of exceptions.

Functions Returning Error Codes

The first problem with using functions that return error codes for error handling is that it is easy to confuse the expected result with an error code that indicates an unexpected situation.

Taking integer division as an example, the divisor cannot be zero:
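
A minimal sketch of such a function follows. The name div_or_code and the choice of -1 as the error code are assumptions made for illustration (a different name is used because the C standard library already declares div):

```cpp
#include <cassert>

// Error code chosen for illustration only.
constexpr int kDivError = -1;

// Returns a / b, or kDivError when the divisor is zero.
int div_or_code(int a, int b) {
  if (b == 0) { return kDivError; }
  return a / b;
}

void usage_example() {
  // A failed call and a legitimate result are indistinguishable to the caller.
  assert(div_or_code(1, 0) == div_or_code(5, -5));
}
```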

Whatever value is chosen as the error code, it “crowds out” a correct result. For example, if -1 is the error code, how can the result of calling div(5, -5) be represented? To solve such problems, output parameters or other workarounds have to be used.

In addition, error-code checks (if statements) interfere with the normal logic of the program and break up its flow. Take get_cute_cat from above as an example, with error checks added:
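
A sketch of the error-code version, assuming a toy Image type, a hypothetical ErrorCode enum, and output parameters for the results:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a real image type: it only records which steps ran.
struct Image {
  bool has_cat = true;
  std::vector<std::string> steps;
};

// Hypothetical error codes for this sketch.
enum ErrorCode { kOk = 0, kNoCatFound = 1 };

ErrorCode apply_step(const Image& in, const char* name, Image* out) {
  *out = in;
  out->steps.push_back(name);
  return kOk;
}

ErrorCode crop_to_cat(const Image& in, Image* out) {
  if (!in.has_cat) { return kNoCatFound; }
  return apply_step(in, "crop_to_cat", out);
}
ErrorCode add_bow_tie(const Image& in, Image* out) { return apply_step(in, "add_bow_tie", out); }
ErrorCode make_eyes_sparkle(const Image& in, Image* out) { return apply_step(in, "make_eyes_sparkle", out); }
ErrorCode make_smaller(const Image& in, Image* out) { return apply_step(in, "make_smaller", out); }
ErrorCode add_rainbow(const Image& in, Image* out) { return apply_step(in, "add_rainbow", out); }

// Every step is followed by a check, so the five-step pipeline is
// interleaved with error-handling branches.
ErrorCode get_cute_cat(const Image& img, Image* cute_cat) {
  Image cropped, with_tie, sparkling, smaller;
  ErrorCode error = crop_to_cat(img, &cropped);
  if (error != kOk) { return error; }
  error = add_bow_tie(cropped, &with_tie);
  if (error != kOk) { return error; }
  error = make_eyes_sparkle(with_tie, &sparkling);
  if (error != kOk) { return error; }
  error = make_smaller(sparkling, &smaller);
  if (error != kOk) { return error; }
  return add_rainbow(smaller, cute_cat);
}

void usage_example() {
  Image no_cat;
  no_cat.has_cat = false;
  Image result;
  assert(get_cute_cat(no_cat, &result) == kNoCatFound);
}
```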

The logic of the program’s “correct path” is broken up by if (error), so the code no longer reads smoothly.

TensorFlow uses error handling based on error codes. It handles errors through a general-purpose Status class and a set of macros related to TF_CHECK_OK.

The core implementation is as follows:

TensorFlow developers use the TF_CHECK_OK macro to check the Status returned by a called function. If the status is not OK, a FATAL or QFATAL level event is triggered, terminating the program.

This somewhat mitigates the problem of if(error) breaking the correct flow logic. However, the limitations of TF_CHECK_OK are also obvious:

  • Methods that can be checked by TF_CHECK_OK can only return the Status class.
  • Method calls wrapped in TF_CHECK_OK can only be used as statements, e.g. TF_CHECK_OK(Foo(...)); , but cannot continue to participate in operations as expressions (e.g. const auto data = TF_CHECK_OK(Foo(...)) is wrong). This is actually a derivative problem of the previous restriction.
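
To make the pattern and its limitations concrete, here is a simplified sketch. It is not TensorFlow's code: the Status class below is a bare-bones stand-in, and SKETCH_CHECK_OK and CheckedDiv are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <string>
#include <utility>

// Simplified stand-in for a status class (not TensorFlow's actual Status).
class Status {
 public:
  Status() = default;  // default-constructed means OK
  explicit Status(std::string message) : ok_(false), message_(std::move(message)) {}
  bool ok() const { return ok_; }
  const std::string& message() const { return message_; }

 private:
  bool ok_ = true;
  std::string message_;
};

// Hypothetical macro in the spirit of TF_CHECK_OK: abort on a non-OK status.
// It expands to a do/while statement, so it cannot appear inside an expression.
#define SKETCH_CHECK_OK(expr) \
  do { \
    const Status sketch_status_ = (expr); \
    if (!sketch_status_.ok()) { \
      std::cerr << "FATAL: " << sketch_status_.message() << "\n"; \
      std::abort(); \
    } \
  } while (false)

// Because the return value is reserved for Status, the quotient has to
// travel through an output parameter.
Status CheckedDiv(int a, int b, int* out) {
  if (b == 0) { return Status("division by zero"); }
  *out = a / b;
  return Status();
}

void usage_example() {
  int quotient = 0;
  SKETCH_CHECK_OK(CheckedDiv(6, 3, &quotient));  // fine as a statement
  // const auto q = SKETCH_CHECK_OK(CheckedDiv(6, 3, &quotient));  // would not compile
  assert(quotient == 2);
}
```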

Elegant Handling in Haskell: Just Return

The functional language Haskell has already provided an elegant and consistent way of handling errors: Maybe. Let’s take a look at how to implement a division with error handling in Haskell (the code here is written as close to imperative style as possible, considering the larger number of readers who are familiar with imperative programming).

The return type of such a function is Maybe:

The defining characteristic of Maybe is that, when the result is the expected one, the value inside Maybe is the real value wrapped in Just (the "Just" in Haskell means "just return it").

In the case of an error, the value in the Maybe type is Nothing:

The real value wrapped in Just cannot be used in calculations directly, but can be retrieved by calling fromJust (in fact, >>= is more important, but we won't cover it in this article):

If the value is Nothing, an exception will be thrown when using fromJust:
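
For readers more at home in C++, the behavior described above has a rough analog in std::optional (C++17). The sketch below is not Haskell and not OneFlow code; safe_div and from_just are hypothetical names chosen to mirror the Haskell vocabulary.

```cpp
#include <cassert>
#include <optional>

// Analog of a Haskell function returning Maybe Int.
std::optional<int> safe_div(int a, int b) {
  if (b == 0) { return std::nullopt; }  // plays the role of Nothing
  return a / b;                         // plays the role of Just (a / b)
}

// Analog of fromJust: value() throws std::bad_optional_access when empty.
int from_just(const std::optional<int>& maybe) { return maybe.value(); }

void usage_example() {
  // The unwrapped value can take part in further arithmetic.
  assert(from_just(safe_div(6, 3)) + 1 == 3);
  assert(!safe_div(1, 0).has_value());
}
```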

The significant benefit of Maybe is that the result of the function call can continue to be used as an expression while the error is being handled.

Taking this to a higher level, Maybe is actually one of the best-known examples of a Monad in Haskell. Monads can be thought of as a consistent interface shared by different data types and control structures.

More abstractly, Maybe is an ADT that can be loaded with both normal data and error messages, i.e. Union<T, Error>.
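
In C++ terms, the Union<T, Error> idea could be sketched with std::variant (C++17). The Error struct and the Result alias below are illustrative assumptions, not OneFlow's definitions:

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical error type carrying a message.
struct Error {
  std::string message;
};

// A value of this type holds either a T or an Error, never both.
template <typename T>
using Result = std::variant<T, Error>;

Result<int> safe_div(int a, int b) {
  if (b == 0) { return Error{"division by zero"}; }
  return a / b;
}

void usage_example() {
  assert(std::holds_alternative<int>(safe_div(6, 3)));
  assert(std::holds_alternative<Error>(safe_div(1, 0)));
}
```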

Due to the success of Haskell Maybe, similar data types are available in many other languages:

In C++, however, std::optional only arrived in C++17, and monadic operations on std::optional (and_then, transform, or_else) were not added until C++23. As a result, OneFlow implements Maybe in its own way.

Maybe in OneFlow

Having understood the use of Maybe in Haskell, it is easy to understand how the Maybe class in OneFlow can be used in conjunction with the JUST macro. We will show you how to use them in OneFlow and then explain the implementation principles behind them.

Using OneFlow Maybe

Here, using the example of integer division again, safediv is implemented with OneFlow Maybe as follows:

CHECK_NE_OR_RETURN and similar macros (such as CHECK_EQ_OR_RETURN for equality checks and CHECK_NOTNULL_OR_RETURN for non-null checks) are all defined in maybe.h for expression checking.

Any call to a function or method that returns a Maybe type needs to be wrapped in the JUST macro. The result of JUST is an expression rather than a statement, and can therefore be directly involved in the subsequent evaluation.
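
The usage pattern can be sketched in a self-contained way as follows. The Maybe, JUST and CHECK_NE_OR_RETURN definitions here are heavily simplified stand-ins written for this example (OneFlow's real ones live in its own headers and are considerably richer), Data() is a shorthand accessor, and SafeDiv and DivTwiceAndAddOne are hypothetical functions. The macro relies on GNU statement expressions, so it assumes GCC or Clang.

```cpp
#include <cassert>
#include <string>
#include <utility>

// --- Simplified stand-ins, NOT OneFlow's real definitions ---
struct Error {
  std::string msg;
};

template <typename T>
class Maybe {
 public:
  Maybe(T data) : ok_(true), data_(std::move(data)) {}
  Maybe(Error error) : ok_(false), data_(), error_(std::move(error)) {}
  bool IsOk() const { return ok_; }
  const Error& error() const { return error_; }
  const T& Data() const { return data_; }  // shorthand accessor for this sketch

 private:
  bool ok_;
  T data_;
  Error error_;
};

// Returns an Error from the enclosing function when a == b.
#define CHECK_NE_OR_RETURN(a, b) \
  do { \
    if ((a) == (b)) { return Error{"Check failed: " #a " != " #b}; } \
  } while (false)

// Evaluates to the wrapped data, or returns the error from the enclosing function.
#define JUST(...) \
  ({ \
    auto&& value_to_check = (__VA_ARGS__); \
    if (!value_to_check.IsOk()) { return value_to_check.error(); } \
    value_to_check.Data(); \
  })

Maybe<int> SafeDiv(int a, int b) {
  CHECK_NE_OR_RETURN(b, 0);
  return a / b;
}

// JUST yields a plain int, so the result takes part in further arithmetic.
Maybe<int> DivTwiceAndAddOne(int a, int b, int c) {
  int quotient = JUST(SafeDiv(a, b));
  return JUST(SafeDiv(quotient, c)) + 1;
}

void usage_example() {
  assert(DivTwiceAndAddOne(12, 3, 2).Data() == 3);
  assert(!DivTwiceAndAddOne(12, 0, 2).IsOk());
}
```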

Principle of Implementation

The Maybe<T> class in OneFlow is defined in maybe.h and, as mentioned above, it is essentially Union<T, Error>. maybe.h provides template specializations for different kinds of data types; they all expose the same interface, which allows the Maybe<T> container to hold either data of those types or an error. These types include:

  • Maybe<void>: This corresponds to the original void return type and is equivalent to TensorFlow's Status class.
  • Maybe<ClassType>: User-defined data type (class/struct). When data is retrieved from Maybe, the type is shared_ptr<ClassType>.
  • Maybe<ScalarType>: Scalar data type in C++. When the data is retrieved from Maybe, the type is ScalarType itself.
  • Maybe<T&>: Reference data type in C++. When the data is retrieved from Maybe, the type is the reference itself.

The main interfaces provided by Maybe<T> are:

  • IsOk(): whether there is normal data in Maybe<T>
  • error(): gets the error message
  • Data_YouAreNotAllowedToCallThisFuncOutsideThisFile(): gets the data for the normal process. The name is deliberately unwieldy to deter users from calling it directly.
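
The three interfaces above could be sketched on top of std::variant as follows. This is an illustration under simplifying assumptions, not OneFlow's implementation: it uses a hypothetical Error struct, ignores the per-type specializations, and returns the data by const reference for every T.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical error type; OneFlow's real error type is richer.
struct Error {
  std::string msg;
};

template <typename T>
class Maybe {
 public:
  Maybe(T data) : data_or_error_(std::move(data)) {}
  Maybe(Error error) : data_or_error_(std::move(error)) {}

  // Whether normal data (rather than an error) is stored.
  bool IsOk() const { return std::holds_alternative<T>(data_or_error_); }

  // The stored error; only meaningful when IsOk() is false.
  const Error& error() const { return std::get<Error>(data_or_error_); }

  // The stored data; only meaningful when IsOk() is true.
  const T& Data_YouAreNotAllowedToCallThisFuncOutsideThisFile() const {
    return std::get<T>(data_or_error_);
  }

 private:
  std::variant<T, Error> data_or_error_;
};

void usage_example() {
  Maybe<int> ok(42);
  Maybe<int> bad(Error{"boom"});
  assert(ok.IsOk());
  assert(!bad.IsOk());
}
```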

The JUST that fetches data from Maybe is actually a macro defined in just.h:

Leaving aside the above type-safe operations (RemoveRValConst), the reserved intermediate layer (__JustStackCheckWrapper__) and the overloaded functions for compatibility with other data types (JustIsOk), the core logic of the JUST macro is actually as follows:

We can see that the result (of type Maybe<T>) returned by the function call wrapped in JUST is first stored as value_to_check and then checked for errors with if (!value_to_check.IsOk()). If an error occurred, the error stack is recorded and the error is returned directly.

If everything goes well, the statement expression syntax makes the last expression the value of the entire statement block (from ({ to })): Data_YouAreNotAllowedToCallThisFuncOutsideThisFile is called on value_to_check to retrieve the normal data from it.
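
The steps just described can be sketched as a self-contained example. The Maybe class and Error struct are simplified stand-ins, error-stack recording is omitted, and ParseDigit and SumDigits are hypothetical functions; only the shape of the macro follows the description above. It requires a compiler with GNU statement expressions (GCC or Clang).

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical error type for this sketch.
struct Error {
  std::string msg;
};

// Simplified stand-in for Maybe<T>.
template <typename T>
class Maybe {
 public:
  Maybe(T data) : ok_(true), data_(std::move(data)) {}
  Maybe(Error error) : ok_(false), data_(), error_(std::move(error)) {}
  bool IsOk() const { return ok_; }
  const Error& error() const { return error_; }
  const T& Data_YouAreNotAllowedToCallThisFuncOutsideThisFile() const { return data_; }

 private:
  bool ok_;
  T data_;
  Error error_;
};

// Core logic only: store the result, return early on error, otherwise
// let the data become the value of the whole ({ ... }) block.
#define JUST(...) \
  ({ \
    auto&& value_to_check = (__VA_ARGS__); /* evaluate the call once */ \
    if (!value_to_check.IsOk()) { \
      return value_to_check.error(); /* leave the enclosing function */ \
    } \
    value_to_check.Data_YouAreNotAllowedToCallThisFuncOutsideThisFile(); \
  })

Maybe<int> ParseDigit(char c) {
  if (c < '0' || c > '9') { return Error{"not a digit"}; }
  return c - '0';
}

// Both JUST results are plain ints inside one arithmetic expression.
Maybe<int> SumDigits(char a, char b) {
  return JUST(ParseDigit(a)) + JUST(ParseDigit(b));
}

void usage_example() {
  assert(SumDigits('2', '5').IsOk());
  assert(!SumDigits('2', 'x').IsOk());
}
```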

As an aside, the statement expression used here is a GNU extension syntax, supported by GCC and Clang, but not yet by MSVC. In build environments where statement expressions are not supported, JUST may fall back to using exceptions. Interested readers are welcome to raise an issue or PR to the OneFlow-Inc/oneflow repository.

The above approach ensures that a call wrapped in JUST can be used as an expression and, at the same time, lets the current function return early if an error occurs. Consider the following code for example:

If there is an error in float x2 = JUST(sqrt(x1));, the rsqrt function returns immediately and the div after sqrt is not executed.

OneFlow's Maybe<T> in conjunction with JUST has some other advantages in addition to the basic benefit of being used directly as an expression while handling errors.
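
A self-contained sketch of this early-return behavior is shown below. Maybe, Error and JUST are simplified stand-ins rather than OneFlow's definitions, Data() is a shorthand accessor, and the functions sit in a hypothetical sketch namespace so that sqrt and div do not clash with the standard library. GCC or Clang is assumed because of the statement expression.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <utility>

// --- Simplified stand-ins, NOT OneFlow's real definitions ---
struct Error {
  std::string msg;
};

template <typename T>
class Maybe {
 public:
  Maybe(T data) : ok_(true), data_(std::move(data)) {}
  Maybe(Error error) : ok_(false), data_(), error_(std::move(error)) {}
  bool IsOk() const { return ok_; }
  const Error& error() const { return error_; }
  const T& Data() const { return data_; }  // shorthand accessor for this sketch

 private:
  bool ok_;
  T data_;
  Error error_;
};

#define JUST(...) \
  ({ \
    auto&& value_to_check = (__VA_ARGS__); \
    if (!value_to_check.IsOk()) { return value_to_check.error(); } \
    value_to_check.Data(); \
  })

namespace sketch {

Maybe<float> sqrt(float x) {
  if (x < 0.0f) { return Error{"sqrt of a negative number"}; }
  return std::sqrt(x);
}

Maybe<float> div(float a, float b) {
  if (b == 0.0f) { return Error{"division by zero"}; }
  return a / b;
}

Maybe<float> rsqrt(float x1) {
  float x2 = JUST(sqrt(x1));   // on error, rsqrt returns right here
  return JUST(div(1.0f, x2));  // only reached when sqrt succeeded
}

}  // namespace sketch

void usage_example() {
  assert(sketch::rsqrt(4.0f).IsOk());
  // The error comes from sqrt, so div was never called.
  assert(sketch::rsqrt(-1.0f).error().msg == "sqrt of a negative number");
}
```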

Error Stack Constructed by the JUST Chain

When using Maybe<T> as a function return value in OneFlow, we make the following conventions:

  1. No function may take Maybe<T> as an input parameter.
  2. For all functions with Maybe<T> as their return value, the call must be wrapped in JUST (or a macro provided by the OneFlow error checking mechanism such as CHECK_JUST).

In this way, if an error occurs, the error return of the innermost JUST function is propagated upwards layer by layer, building an error stack. For example, try using flow.gather incorrectly on the Python side:

The resulting error message is recorded from the location of the C++ code that triggered the error, all the way to the Python layer, with friendly error messages.
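
The layer-by-layer propagation can be illustrated with a small C++ sketch. Everything in it is an assumption made for illustration: the Error struct with a stack field, the frame format, the Data() shorthand, and the Inner/Middle/Outer functions. OneFlow's real error stack records more detail than this. GCC or Clang is assumed because of the statement expression.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical error type that carries the frames collected on the way up.
struct Error {
  std::string msg;
  std::vector<std::string> stack;
};

// Simplified stand-in for Maybe<T>.
template <typename T>
class Maybe {
 public:
  Maybe(T data) : ok_(true), data_(std::move(data)) {}
  Maybe(Error error) : ok_(false), data_(), error_(std::move(error)) {}
  bool IsOk() const { return ok_; }
  const Error& error() const { return error_; }
  const T& Data() const { return data_; }  // shorthand accessor for this sketch

 private:
  bool ok_;
  T data_;
  Error error_;
};

// On failure, append "<enclosing function>: <wrapped call>" and return early.
#define JUST(...) \
  ({ \
    auto&& value_to_check = (__VA_ARGS__); \
    if (!value_to_check.IsOk()) { \
      Error error_with_frame = value_to_check.error(); \
      error_with_frame.stack.push_back(std::string(__func__) + ": " #__VA_ARGS__); \
      return error_with_frame; \
    } \
    value_to_check.Data(); \
  })

Maybe<int> Inner(int index) {
  if (index < 0 || index >= 4) { return Error{"index out of range", {}}; }
  return index;
}
Maybe<int> Middle(int index) { return JUST(Inner(index)) * 2; }
Maybe<int> Outer(int index) { return JUST(Middle(index)) + 1; }

void usage_example() {
  Maybe<int> failed = Outer(7);
  // One frame per JUST between the failing call and the outermost caller.
  assert(!failed.IsOk());
  assert(failed.error().stack.size() == 2);
}
```

Because every caller wraps the call in JUST, each layer adds exactly one frame; a call that skipped JUST would leave a gap in the recorded stack.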

Some readers may be thinking: in case a developer implements a function that returns Maybe<T> but forgets to use the JUST wrapper when calling it, doesn't that result in a broken error stack?

In fact, telling developers to use JUST is a bit like telling programmers who use exceptions to remember RAII, which sounds reasonable but is prone to omissions.

However, OneFlow has developed a set of static analysis tools around the Maybe<T> and JUST conventions based on LLVM (https://github.com/llvm/llvm-project/compare/main...Oneflow-Inc:maybe) to ensure that developers' code follows the JUST conventions.

Details of the implementation will be covered in subsequent articles.

Conclusion

The open source distributed deep learning framework OneFlow draws on ideas from the functional programming language Haskell to build the Maybe<T> ADT and a set of infrastructure around it. With this infrastructure, developers can write high quality and fault-tolerant code without spending extra effort on error handling.

OneFlow has also built static analysis tools based on LLVM to ensure that developers are using OneFlow Maybe<T> correctly.

I hope this article will help you in your deep learning projects😊. If you want to experience the functions of OneFlow, you can follow the method described in this article. If you have any questions or comments💡 about use, please feel free to leave a comment in the comments section below. Please do the same if you have any comments, remarks or suggestions for improvement. In future articles, we’ll introduce more details of OneFlow.

Related articles:

  1. How to Implement a Permute/Transpose Op 6 Times Faster Than PyTorch?
  2. Automatic Type Promotion in OneFlow

Welcome to visit OneFlow on GitHub and follow us on Twitter and LinkedIn.

Also, welcome to join our Discord group to discuss and ask OneFlow related questions, and connect with OneFlow contributors and users all around the world.

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. https://github.com/Oneflow-Inc/oneflow