Written by Xinqi Li, twice, Chi Yao; Translated by Kaiyan Wang, Wenwen Dong
Error Handling Dilemmas in C++
The mechanisms for error handling in C++ programming are roughly divided into two types:
- Exceptions
- Functions returning error codes
Exceptions
Consider a function get_cute_cat that processes img with crop_to_cat, add_bow_tie, make_eyes_sparkle, make_smaller, and add_rainbow in turn, returning a "cute_cat" image.
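A minimal sketch of what such a function might look like is given here. The image struct and the with_step helper are hypothetical stand-ins that only record which steps ran; they are not part of any real image library.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an image type; it only records the steps applied.
struct image {
    bool has_cat = true;
    std::vector<std::string> steps;
};

// Hypothetical helper: returns a copy of img with one more step recorded.
image with_step(image img, const char* step) {
    img.steps.push_back(step);
    return img;
}

image crop_to_cat(const image& img)       { return with_step(img, "crop_to_cat"); }
image add_bow_tie(const image& img)       { return with_step(img, "add_bow_tie"); }
image make_eyes_sparkle(const image& img) { return with_step(img, "make_eyes_sparkle"); }
image make_smaller(const image& img)      { return with_step(img, "make_smaller"); }
image add_rainbow(const image& img)       { return with_step(img, "add_rainbow"); }

// No error handling: nothing here says what happens if img contains no cat.
image get_cute_cat(const image& img) {
    return add_rainbow(
        make_smaller(
            make_eyes_sparkle(
                add_bow_tie(
                    crop_to_cat(img)))));
}
```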
The above code has no error handling at all, so the program's behavior is unpredictable when crop_to_cat finds no cat in img, or in other unexpected circumstances. In this case, exception handling can be added:
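One way this could look is sketched here, assuming crop_to_cat signals a missing cat by throwing std::runtime_error. The image struct, the with_step helper and the try_get_cute_cat wrapper are hypothetical names introduced only for illustration.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an image type; it only records the steps applied.
struct image {
    bool has_cat = true;
    std::vector<std::string> steps;
};

image with_step(image img, const char* step) {
    img.steps.push_back(step);
    return img;
}

// Assumption: a missing cat is reported by throwing an exception.
image crop_to_cat(const image& img) {
    if (!img.has_cat) throw std::runtime_error("no cat found in image");
    return with_step(img, "crop_to_cat");
}
image add_bow_tie(const image& img)       { return with_step(img, "add_bow_tie"); }
image make_eyes_sparkle(const image& img) { return with_step(img, "make_eyes_sparkle"); }
image make_smaller(const image& img)      { return with_step(img, "make_smaller"); }
image add_rainbow(const image& img)       { return with_step(img, "add_rainbow"); }

// The "correct path" stays unchanged; errors travel as exceptions.
image get_cute_cat(const image& img) {
    return add_rainbow(make_smaller(make_eyes_sparkle(add_bow_tie(crop_to_cat(img)))));
}

// The caller decides what to do when any step throws.
bool try_get_cute_cat(const image& img, image& result) {
    try {
        result = get_cute_cat(img);
        return true;
    } catch (const std::exception&) {
        return false;  // for example, skip this image
    }
}
```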
However, exception handling in C++ has its own drawbacks, such as a loss of efficiency. In the normal flow (i.e. when no exception is thrown), using try...catch typically costs little or nothing at run time; but when an exception is thrown, stack unwinding is expensive, especially when exceptions occur frequently (for example, if 50% of the images in the above example had no cat, the program would be much less efficient. Of course, some readers may suggest that in such a case the "exceptional" logic should be moved into the normal program flow, but that is a separate discussion).
In addition to efficiency issues, it is hard to make C++ programs exception-safe. If a library claims to be exception-safe, its author has probably spent several times as long maintaining that “exception-safe” title as implementing the functionality itself.
On this topic, the Google C++ Style Guide lists the advantages and disadvantages of exceptions in detail and gives its own view: for practical reasons, most of Google’s projects do not use exceptions (although it might be different if starting from scratch), and rely on error code checking or assertions instead.
Large C++ projects such as LLVM have also designed their own error handling mechanisms to avoid the use of exceptions.
Functions Returning Error Codes
The first problem with using functions that return error codes for error handling is that it is easy to confuse the expected result with an error code that indicates an unexpected situation.
Taking integer division as an example, the divisor cannot be zero:
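A minimal sketch of such a function, assuming -1 is chosen as the error code. The demo namespace is an assumption made here only to avoid clashing with the C library's own div function.

```cpp
namespace demo {

// Returns the quotient, or -1 as an error code when the divisor is zero.
int div(int x, int y) {
    if (y == 0) return -1;
    return x / y;
}

}  // namespace demo
```

With this design, demo::div(5, 0) and demo::div(5, -5) both return -1, so the caller cannot tell the error apart from a legitimate result.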
No matter which value is chosen as the error code, it will “crowd out” a correct result. For example, if -1 is the error code, how should the result of calling div(5,-5) be represented? To solve such problems, output parameters or other workarounds have to be used.
In addition, the if statements that check error codes interfere with the normal logic of the program and break up its logical flow. Taking the get_cute_cat above as an example, if error checks are added:
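The following is a sketch of what this could look like, under an assumed convention where each step returns 0 on success and a non-zero error code on failure, and writes its result through an output parameter. The image struct and the run_step helper are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an image type; it only records the steps applied.
struct image {
    bool has_cat = true;
    std::vector<std::string> steps;
};

// Hypothetical helper: copies in to out, records the step, reports success.
int run_step(const char* name, const image& in, image& out) {
    out = in;
    out.steps.push_back(name);
    return 0;
}

int crop_to_cat(const image& in, image& out) {
    if (!in.has_cat) return 1;  // assumed error code 1: no cat found
    return run_step("crop_to_cat", in, out);
}
int add_bow_tie(const image& in, image& out)       { return run_step("add_bow_tie", in, out); }
int make_eyes_sparkle(const image& in, image& out) { return run_step("make_eyes_sparkle", in, out); }
int make_smaller(const image& in, image& out)      { return run_step("make_smaller", in, out); }
int add_rainbow(const image& in, image& out)       { return run_step("add_rainbow", in, out); }

// Every call on the "correct path" is followed by an if(error) check.
int get_cute_cat(const image& img, image& result) {
    image cropped, with_tie, sparkling, smaller;
    int error = crop_to_cat(img, cropped);
    if (error) return error;
    error = add_bow_tie(cropped, with_tie);
    if (error) return error;
    error = make_eyes_sparkle(with_tie, sparkling);
    if (error) return error;
    error = make_smaller(sparkling, smaller);
    if (error) return error;
    return add_rainbow(smaller, result);
}
```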
The logic of the program’s “correct path” is broken up by if(error), so the code no longer reads smoothly.
TensorFlow uses error handling based on error codes: it handles errors through a general Status class and a set of macros related to TF_CHECK_OK.
The core implementation is as follows:
TensorFlow developers use the TF_CHECK_OK macro to check the status returned by the called function. If the status is not OK, a FATAL or QFATAL level event is triggered, terminating the program.
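The idea can be illustrated with a simplified sketch. This is not TensorFlow's actual code: the Status class below, the SKETCH_CHECK_OK macro and the SaveCheckpoint function are hypothetical, and the fatal log event is approximated by printing to stderr and calling std::abort.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical, simplified status type (not TensorFlow's Status class).
class Status {
public:
    Status() = default;  // default-constructed status means "OK"
    explicit Status(std::string message)
        : ok_(false), message_(std::move(message)) {}
    bool ok() const { return ok_; }
    const std::string& message() const { return message_; }

private:
    bool ok_ = true;
    std::string message_;
};

// Hypothetical macro in the spirit of TF_CHECK_OK: terminate on a non-OK status.
// It expands to a statement, so it cannot yield a value to the caller.
#define SKETCH_CHECK_OK(expr)                                       \
    do {                                                            \
        const Status sketch_status = (expr);                        \
        if (!sketch_status.ok()) {                                  \
            std::fprintf(stderr, "Fatal: %s\n",                     \
                         sketch_status.message().c_str());          \
            std::abort();                                           \
        }                                                           \
    } while (0)

// Hypothetical function that reports failure through its return value.
Status SaveCheckpoint(bool disk_available) {
    if (!disk_available) return Status("disk not available");
    return Status();
}
```

A call such as SKETCH_CHECK_OK(SaveCheckpoint(true)); reads as a single line on the correct path, while a failing call ends the program.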
This somewhat mitigates the problem of if(error) breaking up the correct flow of logic. However, the limitations of TF_CHECK_OK are also obvious:
- Methods that can be checked by TF_CHECK_OK can only return the Status class.
- Method calls wrapped in TF_CHECK_OK can only be used as statements, e.g. TF_CHECK_OK(Foo(...));, but cannot continue to participate in operations as expressions (e.g. const auto data = TF_CHECK_OK(Foo(...)) is wrong). This is actually a consequence of the previous restriction.
Elegant Handling in Haskell: Just Return
The functional language Haskell has already provided an elegant and consistent way of handling errors: Maybe. Let’s take a look at how to implement a division with error handling in Haskell (the code here is written as close to imperative style as possible, considering the larger number of readers who are familiar with imperative programming).
The return type of such a function is Maybe:
The defining characteristic of Maybe is that, when the result is the expected one, the value inside Maybe is the real value wrapped in Just (the "Just" in Haskell has the meaning of "just return it").
In the case of an error, the value in the Maybe type is Nothing:
The real value wrapped in Just cannot be used in calculations directly, but can be retrieved by calling fromJust (in fact, >>= is more important, but we won't cover it in this article):
If the value is Nothing, an exception will be thrown when using fromJust:
The significant benefit of Maybe is that the result of the function call can continue to be used as an expression while the error is being handled.
Taking this to a higher level, Maybe is actually a representative example of a monad in Haskell. Monads can be thought of as a consistent interface provided by different data types and control structures.
More abstractly, Maybe is an ADT that can hold either normal data or an error, i.e. Union<T, Error>.
Due to the success of Haskell's Maybe, similar data types are available in many other languages:
In C++, however, std::optional was only introduced in C++17, and the proposal to give std::optional monadic operations was not adopted until C++23. As a result, OneFlow implements Maybe in its own way.
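For comparison, here is a small sketch of the division example written with C++17 std::optional. The function names safe_div and div_twice are hypothetical. Calling value() on an empty optional throws std::bad_optional_access, which is roughly analogous to fromJust failing on Nothing.

```cpp
#include <optional>

// Returns the quotient, or an empty optional when the divisor is zero.
std::optional<int> safe_div(int x, int y) {
    if (y == 0) return std::nullopt;
    return x / y;
}

// Without monadic helpers, each intermediate result is unwrapped by hand.
std::optional<int> div_twice(int x, int y, int z) {
    std::optional<int> first = safe_div(x, y);
    if (!first) return std::nullopt;
    return safe_div(*first, z);
}
```

The manual check in div_twice is the kind of plumbing that the monadic operations, or a macro such as JUST, are meant to remove.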
Maybe in OneFlow
Having understood the use of Maybe in Haskell, it is easy to understand how the Maybe class in OneFlow can be used in conjunction with the JUST macro. We will show how to use them in OneFlow and then explain the implementation principles behind them.
Using OneFlow Maybe
Here, using the example of integer division again, safediv is implemented with OneFlow Maybe as follows:
CHECK_NE_OR_RETURN and other macros (such as the equality check CHECK_EQ_OR_RETURN, the non-null check CHECK_NOTNULL_OR_RETURN, etc.) are all defined in maybe.h for checking expressions.
Any call to a function or method that returns Maybe needs to be wrapped in the JUST macro. The result of JUST is an expression rather than a statement, and can therefore take part directly in subsequent evaluation.
Principle of Implementation
The Maybe<T> class in OneFlow is defined in maybe.h, and as mentioned above, it is essentially Union<T, Error>. maybe.h provides template specializations for several kinds of data types, all exposing the same interface, which allows the Maybe<T> container to hold them (and errors). These types include:
- Maybe<void>: Corresponds to the original void return type and is equivalent to TensorFlow's Status class.
- Maybe<ClassType>: User-defined data type (class/struct). When data is retrieved from Maybe, the type is shared_ptr<ClassType>.
- Maybe<ScalarType>: Scalar data type in C++. When data is retrieved from Maybe, the type is ScalarType itself.
- Maybe<T&>: Reference data type in C++. When data is retrieved from Maybe, the type is the reference itself.
The main interfaces provided by Maybe<T> are:
- IsOk(): whether there is normal data in Maybe<T>
- error(): gets the error message
- Data_YouAreNotAllowedToCallThisFuncOutsideThisFile(): gets the data for the normal process. The name is deliberately unwieldy to discourage users from calling it directly.
The JUST that fetches data from Maybe is actually a macro defined in just.h.
Leaving aside the type-safety operations (RemoveRValConst), the reserved intermediate layer (__JustStackCheckWrapper__) and the overloaded functions for compatibility with other data types (JustIsOk), the core logic of the JUST macro is actually as follows:
We can see that the function call wrapped in JUST returns a result (of type Maybe<T>), which is first stored as value_to_check and then checked for errors (conceptually, if (!value_to_check.IsOk())). If an error occurred, the error stack is recorded and the error is returned directly.
If everything goes well, the statement expression syntax is used to make value_to_check the value of the entire statement block (from ({ to })), and Data_YouAreNotAllowedToCallThisFuncOutsideThisFile is called to retrieve the normal data from it.
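The mechanism described above can be sketched in a self-contained form. Everything in this block is a hypothetical simplification and not OneFlow's actual code: the Error struct, the reduced Maybe<T> with a short Data() accessor, this version of the JUST macro, and the safediv and average_ratio functions. It relies on GNU statement expressions, so it assumes GCC or Clang.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical error type: a message plus the call sites it passed through.
struct Error {
    std::string message;
    std::vector<std::string> stack;
};

// Hypothetical, heavily simplified Maybe<T>: holds either a value or an error.
template <typename T>
class Maybe {
public:
    Maybe(T value) : value_(std::move(value)) {}
    Maybe(std::shared_ptr<Error> error) : error_(std::move(error)) {}
    bool IsOk() const { return error_ == nullptr; }
    std::shared_ptr<Error> error() const { return error_; }
    const T& Data() const { return value_; }

private:
    T value_{};
    std::shared_ptr<Error> error_;
};

// Sketch of the core logic: evaluate, check, return early on error (recording
// the call site), otherwise yield the stored result as the block's value.
#define JUST(...)                                                     \
    ({                                                                \
        auto value_to_check = (__VA_ARGS__);                          \
        if (!value_to_check.IsOk()) {                                 \
            auto err = value_to_check.error();                        \
            err->stack.push_back(std::string(__func__) +              \
                                 ": " #__VA_ARGS__);                  \
            return err;                                               \
        }                                                             \
        value_to_check;                                               \
    }).Data()

Maybe<int> safediv(int x, int y) {
    if (y == 0) {
        return std::make_shared<Error>(Error{"division by zero", {}});
    }
    return x / y;
}

// JUST is used as an expression: its value is the unwrapped int.
Maybe<int> average_ratio(int a, int b, int c) {
    int total = JUST(safediv(a, c)) + JUST(safediv(b, c));
    return total / 2;
}
```

In this sketch, average_ratio(1, 2, 0) returns the error produced by safediv, with one stack entry naming average_ratio as the place where the failure was observed.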
As an aside, the statement expression used here is a GNU extension, supported by GCC and Clang, but not yet by MSVC. In compilation environments where statement expressions are not supported, JUST may fall back to using exceptions. Interested readers are welcome to raise an issue or PR in the OneFlow-Inc/oneflow repository.
The above approach ensures that a call wrapped in JUST can both be used as an expression and allow the current function to return early if an error occurs. Consider the following code for example:
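A self-contained sketch of such code follows. The Error struct, the reduced Maybe<T>, this version of JUST, the demo namespace and the div_calls counter are all assumptions made for illustration, not OneFlow's implementation; the counter exists only to make the early return observable. GCC or Clang is assumed because of the statement expression.

```cpp
#include <cmath>
#include <string>
#include <utility>

// Sketch of JUST: return the error early, otherwise yield the unwrapped value.
#define JUST(...)                                 \
    ({                                            \
        auto value_to_check = (__VA_ARGS__);      \
        if (!value_to_check.IsOk()) {             \
            return value_to_check.error();        \
        }                                         \
        value_to_check;                           \
    }).Data()

namespace demo {

struct Error { std::string message; };

// Hypothetical, simplified Maybe<T> holding either a value or an Error.
template <typename T>
class Maybe {
public:
    Maybe(T value) : value_(value) {}
    Maybe(Error error) : ok_(false), error_(std::move(error)) {}
    bool IsOk() const { return ok_; }
    const Error& error() const { return error_; }
    T Data() const { return value_; }

private:
    T value_{};
    bool ok_ = true;
    Error error_;
};

int div_calls = 0;  // counts how often div actually runs

Maybe<float> sqrt(float x) {
    if (x < 0.0f) return Error{"sqrt of a negative number"};
    return std::sqrt(x);
}

Maybe<float> div(float x, float y) {
    ++div_calls;
    if (y == 0.0f) return Error{"division by zero"};
    return x / y;
}

Maybe<float> rsqrt(float x1) {
    float x2 = JUST(sqrt(x1));       // returns from rsqrt here on error
    float x3 = JUST(div(1.0f, x2));  // not reached if sqrt failed
    return x3;
}

}  // namespace demo
```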
If there is an error in float x2 = JUST(sqrt(x1));, the execution of the rsqrt function will terminate and the div after sqrt will not be executed.
OneFlow's Maybe<T> in conjunction with JUST has some other advantages in addition to the basic benefit of being used directly as an expression while handling errors.
Error Stack Constructed by the JUST Chain
When using Maybe<T> as a function return value in OneFlow, we follow these conventions:
- No function may take Maybe<T> as an input parameter.
- For all functions with Maybe<T> as their return value, the call must be wrapped in JUST (or in a macro provided by the OneFlow error checking mechanism, such as CHECK_JUST).
In this way, if an error occurs, the error returned at the innermost JUST is propagated upwards layer by layer, building an error stack. For example, try using flow.gather incorrectly on the Python side:
The resulting error message is recorded from the location of the C++ code that triggered the error, all the way to the Python layer, with friendly error messages.
Some readers may be wondering: if a developer implements a function that returns Maybe<T> but forgets to use the JUST wrapper when calling it, doesn't that result in a broken error stack?
In fact, telling developers to use JUST is a bit like telling programmers who use exceptions to remember RAII: it sounds reasonable but is prone to omissions.
However, OneFlow has developed a set of LLVM-based static analysis tools around the Maybe<T> and JUST conventions (https://github.com/llvm/llvm-project/compare/main...Oneflow-Inc:maybe) to ensure that developers' code follows the JUST conventions.
Details of the implementation will be covered in subsequent articles.
Conclusion
The open source distributed deep learning framework OneFlow draws on ideas from the functional programming language Haskell to build the Maybe<T> ADT and a set of infrastructure around it. With this infrastructure, developers can write high-quality, fault-tolerant code without spending extra effort on error handling.
OneFlow has also built static analysis tools based on LLVM to ensure that developers are using OneFlow Maybe<T> correctly.
References
- Google C++ Style Guide Summary of Exceptions: https://google.github.io/styleguide/cppguide.html#Exceptions
- Error Handling in LLVM: https://llvm.org/docs/ProgrammersManual.html#error-handling
- Haskell Wikibook on Maybe: https://en.wikibooks.org/wiki/Haskell/Libraries/Maybe
- HaskellWiki on Monad: https://wiki.haskell.org/Monad
- The Question Mark Operator (?) in Rust: https://doc.rust-lang.org/reference/expressions/operator-expr.html#the-question-mark-operator
- The std::optional in C++: https://en.cppreference.com/w/cpp/utility/optional
- C++ Optional Monadic Operations (Published Proposal): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p0798r6.html
I hope this article will help you in your deep learning projects😊. If you want to experience the functions of OneFlow, you can follow the method described in this article. If you have any questions or comments💡 about use, please feel free to leave a comment in the comments section below. Please do the same if you have any comments, remarks or suggestions for improvement. In future articles, we’ll introduce more details of OneFlow.