Here is how Nvidia’s DLSS 3 works

Nvidia’s RTX 40-series graphics cards will be available in a few weeks, but among all the hardware upgrades is what could be Nvidia’s golden egg: DLSS 3. It’s much more than an update to Nvidia’s popular DLSS (Deep Learning Super Sampling) feature, and it could define Nvidia’s next generation far more than the graphics cards themselves.

AMD has been working hard to get its FidelityFX Super Resolution (FSR) on par with DLSS, and over the past several months it has succeeded. DLSS 3 appears to be changing that dynamic, and FSR may not be able to catch up anytime soon.

How does DLSS 3 work?

You might think that DLSS 3 is a completely new version of DLSS, but it isn’t. Or, at the very least, it is not entirely novel. The same super-resolution technology that is currently available in DLSS titles serves as the foundation of DLSS 3, and Nvidia will presumably continue to improve it with new versions.

According to Nvidia, the super-resolution portion of DLSS 3 will now be available as a separate option in the graphics settings.

Frame generation is the new feature. DLSS 3 generates every other frame entirely on its own, which means roughly seven out of every eight pixels you see are generated rather than rendered. The flow chart below shows an illustration of this. At 4K, your GPU only renders the pixels of a 1080p image and uses that information not only for the current frame but also for the next one.
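The seven-out-of-eight figure follows from simple arithmetic, and a small sketch makes it concrete. The function name and parameters here are made up for illustration, and the numbers assume a 1080p internal render upscaled to 4K with one generated frame for every rendered one.

```python
from fractions import Fraction

def rendered_pixel_fraction(render_res, output_res, generated_per_rendered=1):
    """Fraction of displayed pixels that the GPU renders conventionally."""
    render_w, render_h = render_res
    out_w, out_h = output_res
    # Super resolution: share of each rendered frame's pixels that are truly rendered.
    spatial = Fraction(render_w * render_h, out_w * out_h)
    # Frame generation: share of displayed frames that are rendered at all.
    temporal = Fraction(1, 1 + generated_per_rendered)
    return spatial * temporal

rendered = rendered_pixel_fraction((1920, 1080), (3840, 2160))
generated = 1 - rendered
print(rendered, generated)  # 1/8 7/8
```

A 1080p image has a quarter of the pixels of a 4K image, and only half of the displayed frames are rendered, so one eighth of what you see comes from traditional rendering.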

According to Nvidia, frame generation will be a separate toggle from super-resolution. This is because frame generation is currently limited to RTX 40-series GPUs, whereas super-resolution will continue to work on all RTX graphics cards, even in games that have been updated to DLSS 3. It should go without saying, but if half of your frames are completely generated, your performance will skyrocket.

But frame generation isn’t just some AI secret sauce. Motion vectors are an important input for upscaling in DLSS 2 and tools like FSR. Motion vectors describe where objects move from one frame to the next, but they only apply to geometry in a scene.

To avoid visual artifacts, elements with no 3D geometry, such as shadows, reflections, and particles, have traditionally been masked out of the upscaling process.

When an AI generates an entirely unique frame, masking isn’t an option, which is where the Optical Flow Accelerator in RTX 40-series GPUs comes in. Optical flow is similar to a motion vector, except the graphics card tracks the movement of individual pixels from frame to frame. This optical flow field, along with motion vectors, depth, and color, feeds into the AI-generated frame.
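To get a feel for what a per-pixel flow field makes possible, here is a toy sketch that pushes each pixel halfway along its flow vector to approximate an in-between frame. This is a simple forward warp written for illustration, not Nvidia's method: DLSS 3 uses a trained network that also takes motion vectors, depth, and color into account. The function name and the tiny test frame are assumptions.

```python
def warp_halfway(frame, flow, background=0):
    """Forward-warp `frame` halfway along `flow` to approximate a midpoint frame.

    frame: 2D list of pixel values.
    flow:  2D list of (dx, dy) displacements, in pixels, from this frame to the next.
    """
    height, width = len(frame), len(frame[0])
    mid = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if frame[y][x] == background:
                continue  # toy simplification: only foreground pixels are moved
            dx, dy = flow[y][x]
            # Move each pixel half of its frame-to-frame displacement.
            nx, ny = x + round(dx / 2), y + round(dy / 2)
            if 0 <= nx < width and 0 <= ny < height:
                mid[ny][nx] = frame[y][x]
    return mid

# One bright pixel that travels 4 pixels to the right by the next frame.
frame = [[0, 0, 0, 0, 0],
         [9, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
flow = [[(0, 0)] * 5 for _ in range(3)]
flow[1][0] = (4, 0)

mid = warp_halfway(frame, flow)
print(mid[1])  # [0, 0, 9, 0, 0]
```

The generated frame places the pixel at the midpoint of its path. A real implementation also has to deal with occlusions, holes, and pixels that collide, which is where the extra inputs and the trained model come in.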

Although this all sounds positive, there is a major issue with AI-generated frames: they increase latency. The AI-generated frame never passes through your PC. It’s a “fake” frame, so you won’t see it on traditional FPS readouts in games or tools like FRAPS.

So, despite having so many extra frames, latency does not decrease, and in fact, due to the computational overhead of optical flow, latency increases. As a result, DLSS 3 requires Nvidia Reflex to compensate for the increased latency.

Normally, your CPU keeps a render queue for your graphics card to ensure that your GPU is never idle (that would cause stutters and frame rate drops).

Reflex removes the render queue and syncs your GPU and CPU so that the GPU begins processing instructions as soon as your CPU can send them. Nvidia claims that when used on top of DLSS 3, Reflex can sometimes result in a latency reduction.
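A rough way to see why the render queue matters for latency is a toy model in which every frame waiting in the queue adds one frame time of delay. That one-frame-per-queued-frame relationship is a simplifying assumption for illustration, not a measurement of Reflex, and the function name and numbers are hypothetical.

```python
def input_latency_ms(render_fps, queued_frames):
    """Toy estimate: one frame time to render, plus one per frame waiting in the queue."""
    frame_time_ms = 1000.0 / render_fps
    return frame_time_ms * (1 + queued_frames)

with_queue = input_latency_ms(render_fps=50, queued_frames=2)
without_queue = input_latency_ms(render_fps=50, queued_frames=0)
print(with_queue, without_queue)  # 60.0 20.0
```

In this model, emptying the queue claws back far more latency than a modest per-frame overhead would add, which fits Nvidia's claim that Reflex on top of DLSS 3 can sometimes produce a net reduction.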

The Effect of AI-Powered Performance

AMD’s FSR 2.0 does not use AI, and, as noted earlier, it demonstrates that you can achieve the same quality as DLSS using algorithms rather than machine learning. With its unique frame generation capabilities and the addition of optical flow, DLSS 3 changes that.

Optical flow isn’t a new concept; it’s been around for decades and has applications ranging from video editing to self-driving cars. Calculating optical flow with machine learning is still relatively new, however, made practical by the growth in datasets available for training AI models. The reason for using AI is simple: with enough training, it produces fewer visual errors and has less overhead at runtime.

DLSS runs at runtime, while the game is being rendered. It is possible to create an algorithm that does not rely on machine learning to estimate how each pixel moves from one frame to the next, but it is computationally expensive, which defeats the purpose of supersampling in the first place.
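To give a sense of that cost, here is a deliberately naive non-ML approach: an exhaustive block-matching search that finds where one pixel's neighborhood moved, while counting how many pixel comparisons it takes. Practical classical optical flow methods are considerably smarter than this, so treat it as a worst-case illustration. The function name, search radius, and patch size are all assumptions.

```python
def match_pixel(prev, nxt, x, y, radius=2, half_patch=1):
    """Exhaustively search for where the patch around (x, y) in `prev` moved to in `nxt`.

    Returns ((dx, dy), comparisons), scoring candidates by sum of absolute differences.
    """
    height, width = len(prev), len(prev[0])

    def pixel(img, px, py):
        # Clamp coordinates so patches near the border stay in bounds.
        return img[min(max(py, 0), height - 1)][min(max(px, 0), width - 1)]

    best, best_cost, comparisons = (0, 0), None, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cost = 0
            for oy in range(-half_patch, half_patch + 1):
                for ox in range(-half_patch, half_patch + 1):
                    cost += abs(pixel(prev, x + ox, y + oy)
                                - pixel(nxt, x + dx + ox, y + dy + oy))
                    comparisons += 1
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, comparisons

# A bright pixel at (x=1, y=2) moves two pixels to the right.
prev = [[0] * 5 for _ in range(5)]
nxt = [[0] * 5 for _ in range(5)]
prev[2][1] = 9
nxt[2][3] = 9

displacement, comparisons = match_pixel(prev, nxt, x=1, y=2)
print(displacement, comparisons)  # (2, 0) 225
```

Even with this tiny search window, doing the same for every pixel of a 3840 x 2160 frame would take about 1.87 billion comparisons per frame pair, and a window this small could only track motion of up to two pixels.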

With an AI model that doesn’t require a lot of horsepower and has enough training data behind it, you can achieve high-quality optical flow that can be executed at runtime. Rest assured, Nvidia has plenty of training data to work with.

This leads to an increase in frame rate even in CPU-limited games. Supersampling only affects your resolution, which is almost entirely determined by your GPU. DLSS 3 can double frame rates in games even if the CPU is completely bottlenecked, because the generated frame bypasses CPU processing. That’s impressive, and it’s only possible with AI right now.
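A toy model shows why frame generation helps where upscaling alone cannot. It assumes the rendered frame rate is capped by whichever of the CPU or GPU is slower, and that frame generation adds one generated frame per rendered frame. The function name and the frame rates are made up for illustration.

```python
def displayed_fps(cpu_fps, gpu_fps, frame_generation=False):
    """Toy model: the slower of CPU and GPU sets the rendered frame rate."""
    rendered = min(cpu_fps, gpu_fps)
    # Generated frames never touch the CPU, so they add on top of the rendered rate.
    return rendered * 2 if frame_generation else rendered

cpu_limit = 60  # hypothetical CPU-bound game

baseline = displayed_fps(cpu_limit, gpu_fps=90)
upscaled = displayed_fps(cpu_limit, gpu_fps=150)  # super resolution speeds up only the GPU
with_frame_gen = displayed_fps(cpu_limit, gpu_fps=150, frame_generation=True)
print(baseline, upscaled, with_frame_gen)  # 60 60 120
```

Making the GPU faster changes nothing once the CPU is the limit, while the generated frames double what reaches the display.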

Why is FSR 2.0 falling behind now?

With FSR 2.0, AMD has truly accomplished the impossible. It looks great, and the fact that it is brand-agnostic makes it even better. Since I first saw it in Deathloop, I’ve been ready to abandon DLSS in favor of FSR 2.0. But, as much as I like FSR 2.0 and think it’s a great piece of technology from AMD, it won’t catch up to DLSS 3 anytime soon.

To begin with, developing an algorithm capable of tracking each pixel between frames without artifacts is difficult enough, especially in a 3D environment with dense fine detail (Cyberpunk 2077 is a prime example).

It is possible but difficult. The larger issue is how bloated that algorithm would have to be. It’s a lot to ask to track each pixel through 3D space, perform the optical flow calculation, generate a frame, and clean up any errors that occur along the way.

It’s even more difficult to get that to run while a game is running and still provide a frame rate improvement on the level of FSR 2.0 or DLSS. Even with dedicated processors and a trained model, Nvidia must rely on Reflex to compensate for the higher latency imposed by optical flow. FSR would most likely trade too much latency to generate frames without that hardware or software.

I have no doubt that AMD and other developers will get there—or find another way around the problem—but it may be a few years. It’s difficult to say right now.

What is obvious is that DLSS 3 appears to be very exciting. Of course, we will have to wait for it to arrive before we can verify Nvidia’s performance claims and see how image quality holds up.

So far, we only have a short video from Digital Foundry showcasing DLSS 3 footage, which I highly recommend watching until we see more third-party testing. DLSS 3 appears to be promising from our current vantage point.

This is an article from ReSpec, a biweekly column that includes discussions, advice, and in-depth reporting on the technology behind PC gaming.
