Rethinking training of 3D GANs
Summary
We are witnessing a surge of works on building and improving 3D-aware generators. To induce a 3D-aware bias, such models rely on volumetric rendering, which is expensive to employ at high resolutions. The dominant strategy to address the scaling issue is to train a separate 2D decoder to upsample a low-resolution volumetrically rendered representation. But this solution comes at a cost: not only does it break multi-view consistency (e.g., shape and texture change when the camera moves), but it also learns geometry only at low fidelity. In this work, we take a different route to 3D synthesis and develop a non-upsampler-based generator with state-of-the-art image quality and high-resolution geometry, which trains faster than its upsampler-based counterparts.
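To make the cost argument concrete, here is a rough back-of-the-envelope sketch (the resolutions and the samples-per-ray count are illustrative assumptions, not the exact numbers of any particular model): volumetric rendering evaluates the radiance field once per sample on every ray, so the number of queries grows quadratically with image resolution, which is what pushes upsampler-based models to render at a low resolution and decode in 2D.

```python
def num_field_queries(resolution: int, samples_per_ray: int = 48) -> int:
    """Radiance-field evaluations needed to volumetrically render one image
    (one ray per pixel, `samples_per_ray` sample points per ray)."""
    return resolution * resolution * samples_per_ray

print(num_field_queries(512))  # 12,582,912 queries: costly inside a GAN training loop
print(num_field_queries(64))   # 196,608 queries: cheap, but a 2D decoder must then
                               # upsample the 64x64 rendering to the target resolution
```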
Note: please use the latest version of Chrome/Chromium or Safari to watch the videos (alternatively, you can download a video and watch it offline). Some of the videos may be displayed incorrectly in other web browsers (e.g., Firefox).
Random samples on FFHQ
Random samples on Cats
Random samples on Megascans Plants
Random samples on Megascans Food
Latent interpolations on Megascans Plants
Latent interpolations on Megascans Food
Background separation
In contrast to upsampler-based models, our generator is purely NeRF-based, so it can directly incorporate advancements from the NeRF literature. In this example, we simply copy-pasted the background separation code from NeRF++, which models the background via the inverted sphere parametrization. For this experiment, we did not use pose conditioning in the discriminator (which we use for FFHQ and Cats to avoid flat surfaces; without it, we suffer the same issues as EG3D and GRAM) and found that when background separation is enabled, the generator learns to produce non-flat surfaces on its own, i.e., without direct guidance from the discriminator.
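For reference, the inverted sphere parametrization boils down to a few lines. The sketch below is a minimal re-implementation of the idea in PyTorch (the function name and tensor shapes are our own; the actual code was taken from the NeRF++ repository): background points outside the unit sphere are mapped to a bounded 4D representation (x/r, y/r, z/r, 1/r), which the background field consumes instead of raw, unbounded coordinates.

```python
import torch

def inverted_sphere_parametrization(points: torch.Tensor) -> torch.Tensor:
    """Map background points (outside the unit sphere) to the bounded
    NeRF++-style representation (x', y', z', 1/r), where (x', y', z') is
    the point projected onto the unit sphere and r = ||p||.

    points: [..., 3] world-space coordinates with norm >= 1.
    returns: [..., 4] tensor with every component in [-1, 1].
    """
    r = points.norm(dim=-1, keepdim=True).clamp(min=1.0)  # distance to the origin
    return torch.cat([points / r, 1.0 / r], dim=-1)       # unit-sphere direction + inverse depth
```

Because the output is bounded, the background MLP only ever sees inputs in [-1, 1], which is what lets a separate network cover an unbounded scene behind the foreground object.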