List 1: Framework Analysis
Keywords: OneFlow, GPT-3
OneFlow Made Training GPT-3 Easier (Part 1)
Correct Level of Abstraction for Distributed Deep Learning Frameworks (Part 2)
Runtime of OneFlow Based on Boxing and Actor Model (Part 3)
The Limitations of Existing Deep Learning Frameworks: Resource Dependency
The Limitations of Existing Deep Learning Frameworks: Data Movement
The Limitations of Existing Deep Learning Frameworks: Dynamic Scheduling
List 2: Industry View
Keywords: AI Industry, Software System
Combating Software System Complexity: Entities Should Not Be Multiplied Unnecessarily
Combating Software System Complexity: Conceptual Integrity and Uniform Metaphor
Combating Software System Complexity: Appropriate Abstraction Layer
On the AI Industrialization Dilemma and the Inspiration from Database Standardization
How to Go Beyond Data Parallelism and Model Parallelism: Starting from GShard
What Should an Optimal Point-to-Point Communication Library Be? (Part 1)
What Should an Optimal Point-to-Point Communication Library Be? (Part 2)
List 3: Technique Sharing
Keywords: PyTorch, CUDA, Operator Optimization
Quantization-Aware Training in Deep Learning Frameworks and Its Implementation in OneFlow
Adding Expand and Repeat Ops into OneFlow
Adding Unfold and Fold Ops into OneFlow
Automatic Type Promotion in OneFlow
How to Implement a Permute/Transpose Op 6 Times Faster Than PyTorch?
Error Handling in OneFlow: Maybe
How to Implement an Efficient Softmax CUDA Kernel — OneFlow Performance Optimization
How to Implement an Efficient LayerNorm CUDA Kernel — OneFlow Performance Optimization
OneFlow’s Optimization of CUDA Elementwise Template Library: Practical, Efficient, and Extensible