OneFlow
6 min read · Dec 1, 2022


Text to Image in less than 1 Second, Probably the Fastest Open Source Stable Diffusion Ever

OneFlow has refreshed the SOTA inference performance of Stable Diffusion. On the A100 GPU, whether PCIe 40GB or SXM 80GB, OneFlow Stable Diffusion leads the performance results against other deep learning frameworks/compilers.

The first automobile in the world ran at a speed of merely 16 km/h, easily beaten by a normal carriage. That’s why the initial cars were nothing more than “a cool toy” for quite a long time. AI text-to-image generators were born similarly.

AI art generators started with portrait stylization, an image-processing function mostly designed for entertainment. People used it to smooth their skin in photos and generate fun avatars, but soon lost interest, as many social media trends go.

But then came the real game changer: diffusion models. These models spare painters and designers the trouble of deciding on colors and composition before drawing. They can simply tell the AI model what they want, and it will generate beautifully crafted images from scratch as required.

However, like the early cars, if diffusion models could only run at unsatisfactory speeds, they would never go beyond toys to become real production tools for humankind.

At first, AI art generators took days to produce an image, then hours, then minutes. They were getting faster and faster, but the question is, how fast do they have to be before they can be put into the everyday toolkit for professional artists and even the general public?

No specific answer has revealed itself. But now, it’s safe to say that with the newly released OneFlow Stable Diffusion, the day is on the horizon!

A landmark event came when OneFlow recently accelerated Stable Diffusion into the era of “one-second generation” for the first time, after which the AI community started a race to speed up the model. Just now, OneFlow has refreshed the SOTA record again.

OneFlow Stable Diffusion: faster than fast

On November 7th, OneFlow announced that Stable Diffusion had achieved “one-second generation” for the first time. In comparisons across various hardware and other frameworks, OneFlow pushed the inference performance of Stable Diffusion to a brand-new SOTA.

The following charts show the inference performance of Stable Diffusion on A100 (SXM 80GB / PCIe 40GB) using 4 deep learning frameworks/compilers (PyTorch, TensorRT, AITemplate, and OneFlow).

On A100 (SXM 80GB / PCIe 40GB), the OneFlow Stable Diffusion inference speeds are at least 15% faster than the second best.

Notably, on A100 SXM 80GB, OneFlow Stable Diffusion reaches a groundbreaking inference speed of 50 it/s, which means that the required 50 rounds of sampling to generate an image can be done in exactly 1 second.

A week later, Meta AITemplate improved the performance of Stable Diffusion, and its speed surpassed OneFlow.

There is no end to performance optimization, and OneFlow is also constantly iterating its products. Two weeks later, OneFlow made a further performance upgrade to Stable Diffusion, and once again surpassed the results of AITemplate.

As of now, OneFlow is once again the fastest. On the A100 GPU, whether PCIe 40GB or SXM 80GB, OneFlow improves on its previous performance results by more than 10%.

Showcase

With OneFlow Stable Diffusion, you can turn your wildest imagination into stunning artworks. Here are a few examples to show you what it can do:

A shockingly realistic sunshine beach with a coconut tree
A hamster firefighter and a rabbit-eared dog
An astronaut eating hotpot on Mars
Future another-world AIs
The OneFlow Dragon Balls

Come try OneFlow Stable Diffusion and make your own masterpiece! If you don’t have any prompt ideas for now, you may find inspiration from Lexica, a gallery of AI-generated paintings with their corresponding prompts.

Seamless integration into PyTorch ecosystem to enable easy model transfer

Users can convert the PyTorch Stable Diffusion from Hugging Face into OneFlow Stable Diffusion with simple modifications to three lines of code: just replace import torch with import oneflow as torch, and StableDiffusionPipeline with OneFlowStableDiffusionPipeline.

Such effortless model transfer is made possible by two facts about OneFlow Stable Diffusion:

  • OneFlowStableDiffusionPipeline.from_pretrained is compatible with PyTorch weights.
  • OneFlow APIs are intrinsically aligned with PyTorch, so no changes are needed to the expressions torch.autocast and torch.float16 for them to work after import oneflow as torch.

As shown above, OneFlow is seamlessly integrated into the PyTorch ecosystem. This enables easy transfer of not only Stable Diffusion but also many other models to OneFlow. For example, you may transfer most Torchvision models to Flowvision via import oneflow as torch.

In addition, users can enable the mock torch feature by running eval $(oneflow-mock-torch) from the command line, so that all import torch statements in subsequent Python scripts automatically point to oneflow.

How to run OneFlow Stable Diffusion

To try and generate images with OneFlow Stable Diffusion using Docker, all you need is to execute the following snippet:

docker run --rm -it \
--gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-v ${HF_HOME}:${HF_HOME} \
-v ${PWD}:${PWD} \
-w ${PWD} \
-e HF_HOME=${HF_HOME} \
-e HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN} \
oneflowinc/oneflow-sd:cu112 \
python3 /demos/oneflow-t2i.py # --prompt "a photo of an astronaut riding a horse on mars"

For further details, please check: https://github.com/Oneflow-Inc/diffusers/wiki/How-to-Run-OneFlow-Stable-Diffusion

What next

In the coming months, the OneFlow team will work on merging the code in OneFlow’s forks of the Diffusers and Transformers repositories into the corresponding upstream repositories in Hugging Face. This is the first time OneFlow has developed models by contributing to the Transformers/Diffusers backends. Developers are more than welcome to share their input with us on GitHub.

It is noteworthy that OneFlow’s compiler has played a pivotal role in accelerating OneFlow Stable Diffusion. The compiler allows models built with a PyTorch-style frontend to run faster on NVIDIA GPUs. More technical details will be unveiled in our future posts.

Other articles:

1. Using Global Tensor to Program on Multi-Device Multi-GPU: Basic Operations

2. LiBai Model Library to Train Large Models More Easily and Efficiently

Welcome to visit OneFlow on GitHub and follow us on Twitter and LinkedIn.

Also, welcome to join our Discord group to discuss and ask OneFlow-related questions, and to connect with OneFlow contributors and users all around the world.


OneFlow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. https://github.com/Oneflow-Inc/oneflow