Trained using billions of noisy image samples from DxO’s unique dataset, DeepPRIME improves image quality by up to two stops.

1. Introduction

DxO has been committed to making the world’s highest-quality raw converter for more than 18 years. Today, DxO unveils the successor to its award-winning PRIME denoising technology. By leveraging deep learning, DeepPRIME improves image quality by up to two ISO stops compared to PRIME. “The most challenging task in deep learning is gathering training data,” explains Wolf Hauser, DxO’s Image Science Director. “Since the company’s founding, DxO has been amassing a dataset of raw images that contains countless shots from all cameras at all ISO levels. While these images were originally intended for calibration, it turned out they were exactly what we needed to create DeepPRIME. We developed a method that allowed us to extract billions of noisy samples from this dataset and use them to train our neural network.”

Demosaicing and denoising are two key steps involved in converting raw image sensor data into beautiful photographs. In DxO PhotoLab 4, these two challenges have been solved utilizing deep learning, a method adopted from the field of artificial intelligence. This approach is significantly different from the algorithms based on mathematical models that are commonly used in other raw converters. DxO harnessed its expertise in raw conversion, its unique dataset, and a substantial amount of computing power to train a deep neural network to jointly apply demosaicing and denoising. The insights the neural network has gained are significantly more effective than any model designed by humans. “We don’t know if DeepPRIME actually resembles the human brain,” says Wolf Hauser, “but its results look extremely convincing to the human eye.”

2. The challenges of raw conversion

2.1. Definition

Raw conversion refers to the image processing algorithms involved in converting raw image sensor data into a defect-free RGB image. This includes dark signal subtraction, defective pixel removal, demosaicing, and denoising, as well as the correction of optical flaws such as vignetting and distortion. It does not include subjective adjustments such as white balance, color rendering, or contrast enhancement, and it obviously does not include any local retouching.

When it comes to correcting optical flaws, the main challenge is calibrating the correction parameters. DxO is known to have the most precise lens and camera calibration in the industry, and we have invested millions to ensure we remain the leader in this domain. Once the defect is modeled and measured, the correction is relatively simple, because the inverse function can be applied as a correction.

For example, to correct lens shading, once you know the attenuation factor at each position in the image, simply dividing by this attenuation factor yields the corrected image. There is only one solution. Mathematicians call such a problem well-posed.
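As a minimal sketch of this well-posed case, the snippet below applies a made-up attenuation map to a small uniform scene and then inverts it by division. The 3x3 values and the function name `correct_shading` are illustrative assumptions, not calibration data or code from DxO PhotoLab.

```python
def correct_shading(observed, attenuation):
    """Undo lens shading by dividing each pixel by its known attenuation factor."""
    return [[pix / att for pix, att in zip(row_obs, row_att)]
            for row_obs, row_att in zip(observed, attenuation)]

# Hypothetical 3x3 uniform gray scene and a made-up attenuation map
# (darker corners, full brightness in the center).
scene = [[100.0] * 3 for _ in range(3)]
attenuation = [[0.5, 0.8, 0.5],
               [0.8, 1.0, 0.8],
               [0.5, 0.8, 0.5]]

# What the sensor would record through the shaded lens.
observed = [[s * a for s, a in zip(row_s, row_a)]
            for row_s, row_a in zip(scene, attenuation)]

# Dividing by the attenuation recovers the scene (up to rounding).
corrected = correct_shading(observed, attenuation)
```

Because every attenuation factor is known and non-zero, the division has exactly one result, which is what makes this correction straightforward once the calibration exists.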

Demosaicing and denoising, on the other hand, are ill-posed problems. Even when you know the structure of the Bayer filter or the intensity of the noise, you cannot simply “invert” these defects. In demosaicing, two thirds of the color information in the scene has simply not been observed by the sensor, because each pixel records only one of the three color channels. There are infinitely many textures which, observed through a Bayer filter, all look exactly the same. Which one should the algorithm choose?

Figure 1. The observation through a Bayer filter can be explained by infinitely many textures in the scene, three of which are shown. Mathematically, all are equally compatible with the Bayer pattern, but some look more plausible than others.
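The ambiguity can be reproduced in a few lines. The sketch below assumes an RGGB layout (one common arrangement of the Bayer filter) and builds two different scenes that yield exactly the same mosaic. The helper names and pixel values are hypothetical.

```python
def bayer_channel(y, x):
    """Channel index (0=R, 1=G, 2=B) observed at pixel (y, x), assuming an RGGB layout."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2

def mosaic(rgb):
    """Keep only the channel the Bayer filter lets through at each pixel."""
    return [[rgb[y][x][bayer_channel(y, x)] for x in range(len(rgb[0]))]
            for y in range(len(rgb))]

# Scene A: a perfectly uniform patch.
scene_a = [[(120, 80, 40) for _ in range(4)] for _ in range(4)]

# Scene B: identical in the observed channel of each pixel,
# but with different values in the two unobserved channels.
scene_b = [[tuple(v if c == bayer_channel(y, x) else 255 - v
                  for c, v in enumerate(scene_a[y][x]))
            for x in range(4)]
           for y in range(4)]

same_observation = mosaic(scene_a) == mosaic(scene_b)   # True
same_scene = scene_a == scene_b                         # False
```

Scene A is a flat patch, while scene B is a colorful checkerboard. The sensor data cannot tell them apart, so only prior knowledge about what scenes usually look like can favor one over the other.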

In denoising, the challenge is that you can only calibrate the noise intensity, called the standard deviation, but the actual noise is random and changes in each image. To complicate matters further, some textures have a very noise-like structure, producing variations that should not be removed. As such, a noisy observation can be explained by an infinite number of combinations of noise and noise-free images. Which of these noise-free images should the algorithm choose? Obviously, to our human eyes, some solutions look more plausible than others. The algorithm should pick the most plausible solution. To that end, it needs a model for not only the defect that needs correcting, but also the underlying scene.

Figure 2. The noisy observation can be explained by infinitely many combinations of underlying scene content and noise, three of which are shown. Mathematically, all are equally compatible with the noise model, but some look more plausible than others.
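A toy illustration of the same point, using four made-up pixel values: the observation below can be split into "scene plus noise" in several ways, and each split reproduces the observed data exactly.

```python
# Four hypothetical noisy pixel values.
observed = [12.0, 9.0, 11.0, 8.0]

# Explanation A: the scene is flat and all variation is noise.
scene_a = [10.0, 10.0, 10.0, 10.0]
# Explanation B: the scene is textured and there is no noise at all.
scene_b = list(observed)
# Explanation C: something in between.
scene_c = [11.0, 9.5, 10.5, 9.0]

def residual_noise(observation, scene):
    """Noise that must be assumed for this scene to explain the observation."""
    return [o - s for o, s in zip(observation, scene)]

explanations = {name: residual_noise(observed, scene)
                for name, scene in (("A", scene_a), ("B", scene_b), ("C", scene_c))}

# Every explanation reproduces the observation when scene and noise are added back.
for scene, name in ((scene_a, "A"), (scene_b, "B"), (scene_c, "C")):
    rebuilt = [s + n for s, n in zip(scene, explanations[name])]
    print(name, rebuilt == observed)
```

Nothing in the observation itself indicates which scene is the right one; that decision has to come from a model of the scene.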

2.2. A mathematical model for the world

Since every scene is different, defining a model that works for all scenes comes down to defining a model for what the world looks like—something that is far from obvious. A very basic model could be to suppose that “the world is flat”, i.e. that each pixel is similar to its neighboring pixels. While this is true for many pixels, it is obviously wrong along edges.

An algorithm based on this model would therefore work well in homogeneous areas but fail on edges. We could refine the model by supposing that the world is composed of flat areas and horizontal or vertical edges. But that would still fail on diagonals, textures, gradients, etc. Most of the progress in demosaicing and denoising over the last 20 years was obtained by researchers proposing more and more sophisticated models for the world.
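The failure mode of the “flat world” model is easy to show in one dimension. The sketch below is a deliberately naive neighbor-averaging filter applied to a noise-free step edge. It illustrates the model described above and is not an algorithm used in DxO PhotoLab.

```python
def flat_world_denoise(signal):
    """Replace each sample by the mean of itself and its two neighbors.

    This encodes the assumption that every pixel resembles its neighbors.
    Samples at the borders reuse their own value for the missing neighbor.
    """
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append((left + signal[i] + right) / 3)
    return out

# A noise-free signal with one sharp edge between a dark and a bright area.
clean = [10.0] * 5 + [100.0] * 5
result = flat_world_denoise(clean)

# Error introduced by the filter at each position.
errors = [abs(r - c) for r, c in zip(result, clean)]
print(errors)
```

Inside the two homogeneous areas the filter leaves the signal untouched, but the two samples on either side of the edge are pulled toward each other, which is the blurring that more sophisticated models try to avoid.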

The two denoising methods in DxO PhotoLab, High-Quality denoising and DxO PRIME, are examples of this approach. They share the same model for the defect itself (i.e. noise intensity, which is calibrated for each camera as a function of its ISO and gray level). However, DxO PRIME uses a much more sophisticated model for the world, which helps in distinguishing between noise and details. We would like to further improve it using an even more sophisticated model, but our current model is already amazingly complex and difficult to comprehend.

2.3. The complexity challenge

In order to keep the complexity at a reasonable level, the process of raw conversion is typically divided into several distinct challenges that are first solved individually: removal of isolated defective pixels from noise-free images, demosaicing of images without noise or optical flaws, and denoising of otherwise perfect color images. Researchers in particular like to compete in solving one of these well-defined and distinct challenges.

To implement a full-fledged raw converter, however, all these separate solutions must be combined, and this combination in itself is far from trivial. For instance, when denoising is applied before demosaicing, you have to deal with the fact that you don’t know the color of each pixel yet. When it is applied after demosaicing, on the other hand, the noise is no longer independent between neighboring pixels, even though all denoising algorithms are based on this hypothesis. When deciding which process should be applied first, there is simply no good answer.
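The loss of independence after demosaicing can be checked numerically. The sketch below uses a one-dimensional toy stand-in for demosaicing (every second sample is treated as missing and linearly interpolated, a simplifying assumption) and compares the correlation between neighboring samples before and after.

```python
import random

def correlation(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

random.seed(0)
# Independent ("white") Gaussian noise, as assumed for raw sensor data.
noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
raw_corr = correlation(noise[:-1], noise[1:])

# Toy demosaicing of one channel: odd positions were not observed,
# so they are filled with the average of their two observed neighbors.
interpolated = list(noise)
for i in range(1, len(interpolated) - 1, 2):
    interpolated[i] = 0.5 * (noise[i - 1] + noise[i + 1])
demosaiced_corr = correlation(interpolated[:-1], interpolated[1:])

print(round(raw_corr, 3), round(demosaiced_corr, 3))
```

The first value stays close to zero, while the second is clearly positive: each interpolated sample shares noise with its neighbors, so a denoiser that assumes independent noise is working from a wrong premise.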

Whichever order you choose, the algorithms must be adapted. For DxO PRIME, we started by creating a denoising algorithm that could achieve state-of-the-art results. We then adapted this algorithm so it could be used before demosaicing. After that, we turned it into a multi-scale process to speed it up. This is how the entire industry works. Actually, no algorithm is used exactly as intended by the researchers who first designed it, and all this tinkering and tweaking typically has more impact on the overall image quality than the algorithm itself.

In conclusion, dividing the process into separate challenges is not the best approach. We should consider raw conversion in its entirety and search for the overall best solution. However, the complexity of such a solution is beyond the capacity of the human brain.

3. The promises of machine learning

3.1. Definition

Traditionally, in order to perform a certain task, computer scientists would first create a series of operations, a so-called algorithm. They would then program a computer to apply this algorithm. This is how we have created DxO PhotoLab up until today.

Machine learning, on the other hand, aims at achieving a certain task without explicit programming, relying on the computer to find an appropriate solution by itself. It is considered a subset of artificial intelligence (AI). The solutions found by computers generally require more arithmetic instructions and are thus slower to compute than human-designed algorithms, but they allow computers to perform tasks that are hard or impossible to describe mathematically, such as image and speech recognition.

Machine learning algorithms build complex models by looking at countless examples, known as training data. When each example consists of an input and the desired output, this is called supervised learning.

3.2. Deep neural networks

Using machine learning, the computer is no longer explicitly programmed to solve a specific task, such as image recognition. Still, a program is necessary to the process. This particular program allows the machine to learn so it can build its own model to solve the task. There are many models for machine learning, and one specific family is called deep neural networks. Originally invented in the 1990s, deep neural networks only recently became usable in a practical sense, because of their computational complexity and the huge amount of training data that is required to make them work.

A neural network tries to imitate the structure of the brain in order to make decisions. It is basically a huge network of simple mathematical operations involving input data and many, many parameters. Here, machine learning means nothing more and nothing less than letting the computer determine the best value for each of these parameters.
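To make “letting the computer determine the best value for each parameter” concrete, here is a minimal sketch with only two parameters, trained by gradient descent on input/output pairs. The rule generating the examples (output = 2 × input + 1), the learning rate, and the iteration count are arbitrary assumptions. A real deep neural network follows the same principle with millions of parameters.

```python
# Supervised training data: each example pairs an input with the desired output.
# The desired outputs follow a hypothetical rule, 2 * x + 1, that the program is never told.
examples = [(i / 10, 2 * (i / 10) + 1) for i in range(-10, 11)]

# A "network" with a single operation and two parameters: prediction = w * x + b.
w, b = 0.0, 0.0
learning_rate = 0.1

for _ in range(2000):
    grad_w = grad_b = 0.0
    for x, target in examples:
        error = (w * x + b) - target
        # Gradient of the mean squared error with respect to each parameter.
        grad_w += 2 * error * x / len(examples)
        grad_b += 2 * error / len(examples)
    # Move each parameter a small step in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 4), round(b, 4))
```

The program contains no statement of the rule itself. The parameter values emerge purely from the examples, which is why the quality and quantity of training data matter so much.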

The first generation of “shallow” neural networks was typically used as a final step in image recognition algorithms. The first step involved the computation of all sorts of features, or bits of information that might be useful in recognizing an object, such as the local average, local gradients, local histogram, etc. Researchers spent a significant amount of time trying out which features worked best for which task. Next, a neural network was trained to decide what object was depicted in the image based on these features.

Deep neural networks, on the other hand, can accomplish the entire task from start to finish and are able to learn which features are appropriate for the task. Their structure vaguely resembles that of the human visual cortex; these networks start by computing simple, local features, and aggregate more and more abstract features over an increasingly large part of the image, culminating in the final decision step. They require much more computing power and parameters than the first generation of neural networks, and many more training examples are required to determine all these parameters.

By the early 2010s, technology had reached a point where deep neural networks became usable in practice. Graphics processors were powerful enough to train them. Cloud storage made it possible to gather and store huge amounts of training data. And it turned out that letting the computer itself determine which features work best for the given task significantly improved its performance.

3.3. The frontier between computer vision and signal processing is vanishing

Following the huge success deep neural networks had in image and speech recognition, researchers attempted to apply this technology to other challenges, including comparatively low-level signal and image processing tasks. The first scientific publication on using these networks for image denoising dates back to 2013, but the results were not compelling because the researchers had not yet figured out the right network structure.

The first promising results on the use of deep neural networks for both image demosaicing and denoising were published in 2016 and 2017. The researchers used a network architecture very similar to the one that had proven successful for image recognition, but rather than trying to make an overarching decision, they trained their networks to predict the demosaiced or denoised values for each pixel.

We do not know exactly how neural networks achieve such tasks, but it is unlikely they derive explicit mathematical models for either the defect or the world. It seems more likely that they develop heuristic knowledge about what the defects and the world look like by examining billions of examples. To describe the defects, this approach might be less efficient than the explicit mathematical model, but it turns out to be amazingly powerful for modeling the world.

The observation that the world can be better described by heuristic knowledge than by a mathematical model obviously aligns with our intuitive understanding. It is exactly the reason why deep neural networks excel at image recognition.

4. DxO’s “DeepPRIME” project

4.1. A holistic approach

We at DxO are proud to provide our raw converter with the best denoising algorithms on the market. As we want to remain a leader in this domain, we watched the arrival of this new technique with great interest.

Papers published in 2016 and 2017 proposed solving the challenges of demosaicing and denoising through machine learning, but they continued to view them as two separate problems.

In 2017, we considered replacing our existing denoising and demosaicing algorithms with new versions based on machine learning. We could have tried to take the best available denoising neural network and adapt it to raw image editing and our multi-scale approach. But we saw the opportunity to make a more fundamental change.

The idea of DeepPRIME was born. This extensive deep neural network is capable of transforming raw images directly into RGB images by applying defective pixel removal, demosaicing, and multi-scale denoising all at the same time. Rather than solve individual problems in an isolated way, as had been the norm for the past two decades, the technology approaches the problem holistically, a feat that was unimaginable before the advent of machine learning.

Figure 3. Top: Traditional step-by-step raw conversion pipeline, for example HQ denoising in DxO PhotoLab. Bottom: DeepPRIME holistic raw conversion in DxO PhotoLab 4.

This approach has several advantages. It is numerically more efficient than multiple dedicated neural networks, since the learned features can be computed once and be shared by all the tasks. It was easier to integrate into our processing pipeline because it only involved adding a single monolithic block. Most importantly, however, machine learning technology is better and much faster than humans when it comes to tweaking and tinkering with the algorithm so that it can be applied to raw images and as a multi-scale process, resulting in another significant leap in image quality.

4.2. Learning to recognize noise

Supervised machine learning requires a big dataset of training examples, each containing the input to the neural network and the desired output. In computer vision, for image classification or segmentation, researchers typically use real images as inputs and craft the desired outputs manually. While this is already a lot of work in their domain, it is completely impossible for image restoration. Manually defining the desired output value for each pixel in a single, noisy, multi-megapixel raw image would not only take months, it also would yield bad results, because even humans cannot perfectly restore such an image.

The answer researchers found to this problem is to start with a defect-free image, which is used as the expected result during the training, and simulate the defect that needs correcting, which is used as the input during the training. For raw conversion, that means introducing a Bayer mosaic and adding noise and pixel defects. However, this is more complicated than it sounds.
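The simulation approach described above can be sketched as follows. This is a simplified illustration only: it assumes an RGGB Bayer layout, plain Gaussian noise with a fixed standard deviation, and randomly placed “hot” pixels. The function and parameter names are hypothetical, and, as the following paragraphs explain, noise this simple does not match real cameras well enough.

```python
import random

def bayer_channel(y, x):
    """Channel index (0=R, 1=G, 2=B) observed at pixel (y, x), assuming an RGGB layout."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2

def make_training_pair(clean_rgb, sigma, defect_rate, rng):
    """Return (network input, desired output) built from one defect-free RGB image."""
    noisy_raw = []
    for y, row in enumerate(clean_rgb):
        out_row = []
        for x, pixel in enumerate(row):
            # Bayer mosaic: keep one channel per pixel, then add simulated noise.
            value = pixel[bayer_channel(y, x)] + rng.gauss(0.0, sigma)
            # Occasionally simulate a defective pixel stuck at full brightness.
            if rng.random() < defect_rate:
                value = 255.0
            out_row.append(value)
        noisy_raw.append(out_row)
    return noisy_raw, clean_rgb

rng = random.Random(42)
clean = [[(120.0, 80.0, 40.0)] * 4 for _ in range(4)]
net_input, desired_output = make_training_pair(clean, sigma=5.0, defect_rate=0.01, rng=rng)
```

The input has one value per pixel, like a raw file, while the desired output keeps all three channels. Repeating this over many images and noise levels yields as many training pairs as needed without any manual labeling.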

Classic image restoration methods are based on mathematically meaningful noise models. To a certain extent, this means they are resilient even when deviations occur between the model used to design the algorithm and the noise found in real images. But in machine learning, where the distinction between noise and image content is learned from countless observations, even the slightest difference between the noise observed during training and the noise encountered in real images makes the system fail.

In theory, noise in raw images should follow some basic physical laws, and it should be fairly simple to synthesize noise that resembles noise in real photographs. In practice, however, there are many second-order noise sources that are hard to characterize. On top of that, most camera manufacturers do apply some digital processing to their so-called “raw” files, which introduces further differences with respect to the theory. The noise characteristics are thus specific to each brand, model, ISO, and exposure time. Looking for a mathematical model for that is (almost) as hopeless as looking for a mathematical model for the world.

Figure 4. In theory, noise in raw images should follow a Gaussian distribution and be spatially white. But for most digital cameras, this does not hold true in practice. The example shows what we observe for the Canon EOS 1D Mark IV at ISO 102400.
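For reference, the textbook version of those “basic physical laws” is often written as a noise variance that grows linearly with the signal (shot noise) plus a constant term (read noise). The sketch below simulates this idealized model with a Gaussian approximation. The `gain` and `read_noise` values are made up, and this is the simple theory that real cameras deviate from, not the characterization DxO uses.

```python
import random
import statistics

def noise_std(signal, gain, read_noise):
    """Standard deviation predicted by the idealized model.

    variance = gain * signal + read_noise ** 2   (all values in digital numbers)
    """
    return (gain * signal + read_noise ** 2) ** 0.5

def simulate_pixel(signal, gain, read_noise, rng):
    """Draw one noisy observation using a Gaussian approximation of the model."""
    return rng.gauss(signal, noise_std(signal, gain, read_noise))

rng = random.Random(0)
gain, read_noise = 0.5, 2.0   # hypothetical values for one camera at one ISO setting

for gray_level in (10.0, 100.0, 1000.0):
    samples = [simulate_pixel(gray_level, gain, read_noise, rng) for _ in range(20000)]
    measured = statistics.pstdev(samples)
    print(gray_level, round(noise_std(gray_level, gain, read_noise), 2), round(measured, 2))
```

In this idealized setting, two numbers per ISO setting describe the noise at every gray level. The difficulty described in this section is that real raw files add effects this model does not capture.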

This is where DxO’s more than 18 years of expertise in image sensor characterization were decisive for the project’s success. We have seen most of the world’s image sensors, from low-quality embedded image sensors to high-end professional photography gear.

We have thousands of raw images taken with each camera, showing the same test scenes over and over again, at different ISOs and exposure times—some with plenty of noise, some with almost none. Our researchers eventually figured out how this unique base of hundreds of thousands of images could be utilized as training data.

Thanks to this method, our deep neural network not only acquired heuristic knowledge about what the world looks like, it also became a true expert in recognizing noise associated with digital image sensors. As a result of this extensive knowledge, it has an unprecedented capacity to distinguish between noise and details. The result is noise-free images with an incredible amount of detail, and we believe this will allow our users to push the limits of their cameras even further.

Figure 5. In this image shot at high ISO (Nikon D4 – ISO 6400), a large amount of noise appears in low-light areas. After processing with DxO DeepPRIME, the highly detailed areas retain all the original information, and the noise is completely removed.

4.3. Machine learning outside of the cloud

Many companies use machine learning and AI as a pretext to make their users upload their entire photo collection to the cloud. At DxO, we care deeply about your privacy and, most importantly, about your subjects’ privacy. The safest and best way to protect this privacy is to keep your photos on your own computer.

This is why we have worked hard to ensure DeepPRIME can run on your computer. We have worked with Microsoft, Apple, and all major hardware manufacturers to make it as fast as possible on your hardware. Nevertheless, we are not able to change the laws of physics. If you have an integrated GPU, and if it is already several years old, DeepPRIME will be significantly slower than PRIME because it requires more than twenty times as many computing operations.

That is why we are continuing to offer PRIME as an option, so you can choose to use DeepPRIME on only your most valuable images. However, if you are lucky enough to have a good and recent GPU, DeepPRIME actually runs much faster than PRIME, and you should definitely make it your default setting.

If you want to discover DeepPRIME in DxO PhotoLab 4, you can start your free 30-day trial.
