Page updated: April 14, 2020
Author: Curtis Mobley

# Importance Sampling

As we saw on the previous Error Estimation page, the standard error of the mean in a Monte Carlo estimate depends on $1/\sqrt{N}$, where $N$ is the number of samples. In the present discussion, $N$ is the number of photon packets reaching a detector. In addition, we have seen that photons that are traced but do not intersect the detector—i.e., do not generate a sample of the underlying pdf—do not contribute to the answer and are wasted calculations.

The general topic of importance sampling considers ways to generate and trace photons so that more photons reach the detector, thus increasing $N$ and thereby reducing the statistical error in the estimated quantity, while simultaneously reducing the number of wasted photons.

### Theory of Importance Sampling

Let $pdf(x)$ be the probability distribution function for variable $x$. The basic idea of importance sampling is to "oversample" the parts of the pdf that are most "important," i.e., those parts of the pdf that send photons in the direction of the detector. The parts of the pdf that send photons (or photon packets) in directions away from the detector are undersampled. Thus more photons are sent towards the detector, which increases the number detected and therefore reduces the variance in the estimated quantity, and fewer are sent in directions that never reach the detector. However, this process samples the pdf in a biased manner compared to the physical process being studied. To account for the biased sampling of the pdf, the weight of each photon is adjusted to keep the final answer correct.

Mathematically, importance sampling is described as follows. The mean of any function $f(x)$ of random variable $x$ is by definition

 $⟨f⟩ = \int f(x)\, pdf(x)\, dx \,,$

where the integration is over the full range of $x$ (e.g., over 0 to $\pi$ if $x$ is the polar scattering angle). This can be rewritten as

$⟨f⟩ = \int f(x)\,\frac{pdf(x)}{pdf_b(x)}\, pdf_b(x)\, dx \equiv \int f(x)\, w(x)\, pdf_b(x)\, dx = {⟨fw⟩}_b \,.$

$pdf_b(x)$ is the biased pdf used to generate random values of $x$, and $w(x)$ is the weight given to each biased sample of $x$. The weight $w(x)$ is just the ratio of the unbiased to the biased pdf. Note that the estimate of the mean of $f$ when sampled by the correct, unbiased pdf is the same as the estimate of the weighted function, $fw$, when sampled with the biased pdf. The estimate $⟨f⟩ = {⟨fw⟩}_b$ is thus unbiased, even though $pdf_b$ is biased. Because the sampling is done with a biased pdf, importance sampling is sometimes called biased sampling; however, since the estimate of $⟨f⟩$ remains unbiased, the term importance sampling is preferred in the recent literature.

If the biased pdf is chosen well, it will increase the number of detected photons and thereby reduce the error in the estimate. However, each detected photon will have a smaller weight, so that the product of more detected photons times a smaller weight for each remains the same as for a smaller number of higher weighted photons that would be generated by the original pdf. As we will see in the following numerical examples, this basic idea often works well. However, in practice, the biasing can sometimes be pushed to extremes and actually increase the error in the estimate.
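The identity $⟨f⟩ = {⟨fw⟩}_b$ is easy to check numerically. The short Python sketch below uses a pair of hypothetical pdfs chosen only because they are easy to sample by inverting their CDFs (they are not taken from this page): the true pdf is $pdf(x) = 2x$ on $[0,1]$, the biased pdf $pdf_b(x) = 4x^3$ oversamples large $x$, and the quantity estimated is the mean of $f(x) = x^3$, whose exact value is $2/5$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10**6

# Unbiased sampling: pdf(x) = 2x on [0,1]; CDF = x^2, so x = sqrt(u).
x = np.sqrt(rng.random(N))
est_unbiased = np.mean(x**3)        # <f> with the true pdf

# Biased sampling: pdf_b(x) = 4x^3; CDF = x^4, so x = u^(1/4).
xb = rng.random(N)**0.25
w = (2.0 * xb) / (4.0 * xb**3)      # weight w = pdf / pdf_b
est_biased = np.mean(xb**3 * w)     # <f w>_b

# Both estimates converge to the exact mean, 2/5 = 0.4.
print(est_unbiased, est_biased)
```

Here the biased estimate also has the smaller variance, because $f(x)\,w(x) = x/2$ varies less over the samples than $f(x)$ does; the ideal biased pdf is proportional to $f \cdot pdf$, in which case every weighted sample would be identical.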

### Embedded Point Source Example

The first numerical example is based on Gordon (1987). This paper provides a good starting point for examining in detail the advantages and pitfalls of importance sampling.

Gordon was interested in computing the irradiance pattern of upwelling light at the sea surface generated by an isotropic point source at some depth in the water. Figure 1 shows the geometry of his problem. The left panel of the figure shows an isotropic point source at depth $z$ emitting photons equally in all directions. For an isotropic source, half of all photons are emitted in downward directions. Few of those photons will be scattered in directions that eventually reach the sea surface. The same holds true for many photons emitted in upward, but nearly horizontal directions. Such photons are illustrated by the red arrows. Only photons emitted in nearly straight upward directions are likely to reach the surface and contribute to the estimate of upwelling irradiance $E_u(r)$ as a function of radial distance $r$ away from the source location. The green arrow illustrates such a photon.

To increase the number of photons emitted almost straight upward, Gordon chose a biased pdf for the probability of emission of a photon at polar angle $\theta$, measured from $\theta = 0$ in the nadir direction ($z$ is measured positive downward from the sea surface, and $\theta$ is measured from the $+z$ direction). This isotropic emission is mathematically the same as isotropic scattering. Recall from the Introduction that the pdf for polar scattering angle $\psi$ is $2\pi \tilde{\beta} \sin\psi$. Taking $\psi$ for the polar angle of emission $\theta$ and recalling that the phase function for isotropic scattering is $\tilde{\beta} = 1/(4\pi)$ shows that the pdf for the polar emission angle is $pdf(\theta) = \frac{1}{2}\sin\theta$. Gordon wished to have a biased pdf $pdf_b(\theta)$ that would emit more photons into upward directions $90\,\mathrm{deg} < \theta \le 180\,\mathrm{deg}$. There are many functions that could give such a behavior, but the one Gordon chose is

 $pdf_b(\theta) = \frac{\sqrt{1-\epsilon^2}}{\pi\,(1+\epsilon\cos\theta)} \,,$ (1)

where $0 \le \epsilon < 1$ is a parameter to be determined as described below. The weighting function is then

 $w(\theta) = \frac{pdf(\theta)}{pdf_b(\theta)} = \frac{\pi}{2\sqrt{1-\epsilon^2}}\,(1+\epsilon\cos\theta)\sin\theta \,.$

(Gordon did not say why he chose this particular function, nor did he give the value of $\epsilon$ used in his calculations.)
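Although Gordon did not describe his sampling scheme, Eq. (1) has a closed-form cumulative distribution, $F(\theta) = (2/\pi)\arctan\left(\sqrt{(1-\epsilon)/(1+\epsilon)}\,\tan(\theta/2)\right)$, so biased emission angles can be drawn directly by inverting it. A minimal Python sketch (the variable names and the check on the mean weight are illustrative, not Gordon's):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 2 * 10**5
eps = 0.8

# Unbiased isotropic emission: pdf(theta) = (1/2) sin(theta),
# CDF = (1 - cos(theta))/2, so theta = arccos(1 - 2u).
theta_iso = np.arccos(1.0 - 2.0 * rng.random(N))

# Biased emission from Eq. (1): inverting its CDF gives
# theta = 2 arctan( sqrt((1+eps)/(1-eps)) * tan(pi u / 2) ).
u = rng.random(N)
theta_b = 2.0 * np.arctan(np.sqrt((1 + eps) / (1 - eps)) * np.tan(0.5 * np.pi * u))

# Weight of each biased sample: w = pdf(theta) / pdf_b(theta).
w = np.pi / (2.0 * np.sqrt(1.0 - eps**2)) * (1.0 + eps * np.cos(theta_b)) * np.sin(theta_b)

# The mean weight must be 1 (it estimates the integral of the unbiased pdf),
# while biasing raises the upward-emitted fraction (theta > 90 deg)
# from 0.5 to about 0.80 for eps = 0.8.
print(np.mean(w))
print(np.mean(theta_iso > 0.5 * np.pi))
print(np.mean(theta_b > 0.5 * np.pi))
```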

The right panel of Fig. 1 illustrates biased emission, with many more photons being emitted into upward directions and reaching the sea surface, and relatively few photons being emitted into downward directions.

The red curves in Fig. 2 show the unbiased emission function $pdf(\theta) = \frac{1}{2}\sin\theta$, for which each emitted photon is given an initial weight of $w = 1$. The other curves show the biased functions $pdf_b(\theta)$ and the angle-dependent initial weights for several values of $\epsilon$. Note that as $\epsilon$ approaches 1, most photons are emitted at angles near $\theta = 180$ deg, i.e., in near-zenith directions. However, photons emitted at angles near 180 deg are given very small weights, whereas the relatively few photons emitted in roughly horizontal and downward directions are given large weights (except for photons emitted near $\theta = 0$).

Monte Carlo simulations were performed for an unbiased isotropic source and for biased sources with various $\epsilon$ values. Figure 3 shows the results for a point source at $\tau = 5$ optical depths below the surface and for water IOPs defined by a Fournier-Forand phase function with a backscatter fraction of 0.02 and an albedo of single scattering of $\omega_o = 0.8$. Independent runs were done for isotropic emission and for values of $\epsilon = 0.8$ and 0.99 in Eq. (1). Each run had $10^6$ photons emitted by the source. The photons reaching the sea surface were tallied for a ring (or "bullseye") detector centered above the source, with rings of equal radial width $\Delta r = 1$ in optical distance.

The left panel of the figure shows the number of photons received by each detector ring. For the isotropic source, at most about 40,000 photons were received by any ring (near radius = 5), and fewer than 10,000 photons were received by the innermost ring. As $\epsilon$ increases, the photons are emitted in a more and more vertical direction, and the numbers of photons received by the inner rings increase. The innermost ring received about 94,000 photons for $\epsilon = 0.8$ and 270,000 photons for $\epsilon = 0.99$. We thus naively expect that the standard error of the estimated irradiance in the innermost ring is improved by a factor of $\sqrt{270{,}000/10{,}000} \approx 5$ with $\epsilon = 0.99$, compared to unbiased isotropic emission. Or, conversely, we can get the same number of detected photons for one-fifth of the emitted photons, i.e., for one-fifth of the computer time.

However, there is no free lunch. The outer rings, beyond about 5 or 6 optical distances in these simulations, received fewer photons with biased emission than with isotropic emission. Thus the variance in the estimated irradiance for the outer rings actually increases when a biased source emission is used. This is simply because the biased near-zenith emission and subsequent near-forward scattering send most of the photons towards the inner rings as $\epsilon$ nears 1.

The middle panel of the figure shows the fraction of the emitted power that is detected by each ring. Although the number of photon packets received by each ring varies with the value of $\epsilon$, the power received does not (except for statistical noise, which is almost unnoticeable in this figure). The right panel shows the irradiance $E_u(r)$ received in each ring normalized by the total power emitted by the source, $\Phi_{emit}$. This is just the values of the middle panel divided by the area of each ring. This panel corresponds to Fig. 6 in Gordon's paper, except that the IOPs are different.

There is more to be learned from this simple example. Suppose we are interested only in the magnitude of the irradiance received at the ocean surface directly above the source. We might then consider only the power received by the innermost ring of the detector seen in the previous figure, i.e., detector radius $\le 1$ in those plots.

Consider the variance in the estimated fraction of the emitted power detected by the innermost ring of the previous figure (i.e., of the surface irradiance, after dividing by the constant collection area), as a function of $\epsilon$. Simulations were performed for isotropic emission and for values of $\epsilon$ from 0.0 to 0.999 (note that $\epsilon = 0$ in Eq. 1 is not the same as isotropic emission). Each simulation comprised 100 runs with $10^4$ photon packets emitted per run. The red dots in Fig. 4 show the standard deviation of the estimated fraction of emitted power received by the innermost ring, as computed from the values of the 100 runs. The open red circles show the corresponding results from an independent set of 100 runs, which started with a different seed for random number generation. Those two sets of points give a qualitative idea of the amount of statistical noise in these results. For isotropic emission, an average of only 99 photons reached the detector out of the 10,000 emitted in each run (for the first set of runs with the first seed; for the second set the average was 101). The mean of the 100 runs was $3.257 \cdot 10^{-3}$ and the standard deviation was $3.493 \cdot 10^{-4}$. When biased emission is used, the number of detected photons increases with $\epsilon$, as shown by the green dots and right-hand axis in the figure. The standard deviation decreases with increasing $\epsilon$ up to values around $\epsilon = 0.8$ or 0.9. However, beyond $\epsilon = 0.9$ the standard deviation begins to increase, and even becomes greater than for isotropic emission when $\epsilon = 0.999$, even though almost 4,000 photons, or 40% of those emitted, reach the detector. This seems to contradict the idea that increasing the number of detected photons reduces the variance.

It was already mentioned on the previous page on error estimation that the dependence of the standard deviation of an estimate on $1/\sqrt{N}$ assumes that "all else is the same" in the simulation. In the present case, we are changing the physics of the simulation (namely the emission pattern of the source) by using a biased source, in which case all else is not the same. As was shown in the theory section, this change does not affect the estimated mean value, but it does affect the variance of that estimate over many runs. The use of a biased emission pattern does increase the number of detected photons and thereby reduce the standard deviation of the estimated quantity, but only up to a point. When $\epsilon$ is very close to 1, only a very few photons are emitted in downward directions. However, those photons can have very large weights: for $\epsilon = 0.999$ the maximum weight is 45.6. Photons emitted within 10 deg of the zenith ($\theta > 170$ deg in Fig. 2) have weights less than 0.07, and those emitted within 5 deg of the zenith have weights less than 0.008. Thus the downward-emitted photons can have weights hundreds or even thousands of times larger than those of the vast majority of upward-emitted photons. The large numbers of low-weight detected photons do reduce the error in the estimate. However, when $\epsilon$ is very close to 1, one of the downward-emitted but very-large-weight photons may occasionally be backscattered in just the right direction to hit the detector and make a large contribution to the fraction of the emitted power detected. Such photons are few in number, so their contribution fluctuates from simulation to simulation, and the difference of even a few such photons may significantly change the estimate. These rare photons then increase the standard deviation of the computed quantity, and the benefits of the biased source emission are lost.
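This weight-driven growth of the variance can be reproduced with a stripped-down numerical experiment that ignores the water and the detector entirely and simply estimates the mean photon weight, whose exact value is 1, from samples of the biased pdf of Eq. (1). This is only a proxy for the detected-power estimate, but it shows the same tail behavior: the run-to-run scatter is modest at $\epsilon = 0.8$ and much larger at $\epsilon = 0.999$.

```python
import numpy as np

def std_of_mean_weight(eps, n_runs=200, n_phot=10**4, seed=7):
    """Standard deviation, over n_runs runs, of the estimated mean weight
    for emission angles drawn from the biased pdf of Eq. (1)."""
    rng = np.random.default_rng(seed)
    means = []
    for _ in range(n_runs):
        u = rng.random(n_phot)
        theta = 2.0 * np.arctan(np.sqrt((1 + eps) / (1 - eps)) * np.tan(0.5 * np.pi * u))
        w = np.pi / (2.0 * np.sqrt(1.0 - eps**2)) * (1.0 + eps * np.cos(theta)) * np.sin(theta)
        means.append(w.mean())  # each run's estimate of <w>_b = 1
    return float(np.std(means))

s_mid = std_of_mean_weight(0.8)
s_extreme = std_of_mean_weight(0.999)
print(s_mid, s_extreme)  # the eps = 0.999 runs scatter several times more
```

The extra scatter at $\epsilon = 0.999$ comes entirely from the rare large-weight samples near $\theta = 60$ deg, mirroring the behavior of the detected-power estimates described above.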

The blue dots in Fig. 4 show the reduction in the standard deviation expected due only to the increased numbers of detected photons, i.e., a dependence on $1/\sqrt{N_\epsilon/N_{iso}}$. The actual improvement is less than the value that would be obtained if the runs were just emitting and detecting more photons from an isotropically emitting source (i.e., for the "all else is the same" situation). Nevertheless, the reduction in the standard deviation is roughly a factor of two, which is significant for many applications.

Figure 5 shows the numbers of detected photons binned by the weights of the detected photons for three of the runs, for $\epsilon = 0.999$ and 0.8. For $\epsilon = 0.999$, the left panel shows that most of the detected photons have weights in the range of $10^{-5}$ to 0.1. However, a very few (2, 4, and 5 photon packets in these particular runs) have weights between 1 and 10. One photon with a weight of 1 contributes as much to the estimated irradiance as 1000 photons with a weight of $10^{-3}$. The numbers of medium-weight photons are stable for the different runs, but the numbers of the highest-weight photons fluctuate greatly. The fluctuation in the number of high-weight photons begins to increase the variance for $\epsilon$ values very near 1. The numbers of very-low-weight photons also vary greatly from one run to the next, but those fluctuations have a negligible effect on the variance because the weights are so small. The right panel of this figure shows that for $\epsilon = 0.8$, most of the detected photons have weights in the range of 0.001 to 0.1, the numbers of photons in each bin are almost the same for each run, and there are no photons with weights greater than 1. Thus $\epsilon = 0.8$ gives a stable number of medium-weight photons and a reduction in the standard deviation of the estimated detected weight, as desired. Note that in each case the estimate of the fraction of power received is $(3.27 \pm 0.01) \cdot 10^{-3}$, so the estimated mean is almost independent of $\epsilon$.

Figure 6 gives yet another view of this situation. The left panel shows the histogram of the 100 estimates of the fraction of emitted power detected by the innermost ring for isotropic emission. The middle panel shows a tighter histogram for $\epsilon = 0.8$, i.e., a reduced error in the estimate. The right panel shows the histogram for $\epsilon = 0.999$; the spread is now wider and the error in the estimate has increased.

The behavior of the estimated error when doing importance sampling can be succinctly summed up as *the tails of the distribution matter*. That is to say, importance sampling can in many cases give more detected photons and a corresponding decrease in the error of the estimated quantity. However, the biasing can be pushed too far. In the present case, if $\epsilon$ is too close to 1, the rare but high-weight photons—those in the high-weight tail of the distribution of photon weights in Fig. 5—can overpower the reduction of variance obtained by the numerous but low-weight photons in the main part of the distribution of photon weights.

There is no general rule for determining how much biasing can be used or what value of a parameter like $\epsilon$ gives the minimum error in the estimated quantity. However, the broad near-minimum seen in Fig. 4 shows that any value of $\epsilon \le 0.9$ will give a better estimate than naive isotropic emission, and fortunately the exact value used is not critical. A suitable value of a biasing parameter such as $\epsilon$ must be determined by numerical experimentation for a particular class of problems. However, this effort is worthwhile if many simulations must be made and computer time or accuracy is critical.

### Backscatter Example

Now consider an example of using importance sampling in the simulation of backscatter, as might be needed for the design and evaluation of an instrument to measure the backscatter coefficient $b_b$. This example is inspired by the Monte Carlo simulations used in Gainusa Bogdan and Boss (2011), who did not use importance sampling.

The generic geometry for simulating a sensor is shown in Fig. 7. This figure illustrates a source emitting photon packets in a collimated beam. The distance a photon packet travels between interactions with the medium is randomly determined using the photon path length algorithm described in the Introduction Chapter. Those photon packets are then scattered according to the chosen phase function $\tilde{\beta}(\psi, \chi)$, where $\psi$ is the polar scattering angle and $\chi$ is the azimuthal scattering angle. Photon tracking is done by the "Type 2" technique of the chapter on Photon Tracking. That is, at each scattering, the photon weight (initially $w = 1$) is multiplied by the albedo of single scattering $\omega_o = b/c$, where $b$ is the scattering coefficient and $c$ is the beam attenuation coefficient. This accounts for energy loss due to absorption.

Oceanic phase functions are highly peaked in the forward scattering direction. Thus most photon packets undergo multiple forward scatterings and continue to travel away from the source and detectors. The green arrows in Fig. 7 illustrate such a photon trajectory. Only rarely will a photon be backscattered in just such a manner that it is eventually detected, as illustrated by the red arrows in the figure. For typical oceanic conditions, only 0.5 to 3% of photons are backscattered in any given interaction. For a spatially small detector, very few of the backscattered photons will actually intersect the detector. The end result is that almost every photon emitted from the source represents wasted computation, because the photon never reaches the detector.

These simulations can be improved by use of importance sampling, as follows. The photon packets are emitted by the source according to whatever distribution is chosen (e.g., an idealized collimated point source, or a beam profile and distribution of emitted directions that describes a particular light source such as an LED). Those photons travel an initial distance determined by the beam attenuation coefficient and the random number drawn. Then, on the first scattering only, the scattered direction is determined using a biased phase function $\tilde{\beta}_b$ that gives an increased number of backscattered photons. This "reverses" many of the initial photons, all of which are traveling away from the detector. Subsequent scatterings then use the normal phase function for the water body. This gives an increased number of photon packets traveling in the general direction of the detector, and thus an increased number of detected photons. This is illustrated in Fig. 8. The green arrow is the initial photon emitted by the source. The blue arrow is the first-scattered photon, whose direction is determined using the biased phase function. The red arrows are subsequent scatterings of the photon.

In oceanic simulations, the unbiased pdf is highly forward scattering. To reverse some of the initial photons, we use a biased pdf that enhances backscatter. The one-term Henyey-Greenstein (OTHG) phase function with asymmetry parameter $g = ⟨\cos\psi⟩$,

 $\tilde{\beta}_{OTHG}(\psi) = \frac{1-g^2}{4\pi\,(1+g^2-2g\cos\psi)^{3/2}} \,,$

gives a convenient analytical phase function to use for $\tilde{\beta}_b$. The parameter $g$ plays the same role as $\epsilon$ in the previous example. A value of $g = 0$ gives isotropic scattering (50% backscatter); a negative $g$ gives more backscatter than forward scatter. Numerical investigations show that the results are not sensitive to the exact form of the biased phase function, so long as the biasing is not pushed to extremes (such as using $g = -0.9$ in the OTHG, which gives 98% backscatter). Just as in the previous example, if the biasing is pushed too far, the small-angle forward-scattered photons have very large weights. The rare photon of this kind that is subsequently backscattered (by normal scattering) into the detector gives a large fluctuation in the computed mean, which offsets the reduction in variance obtained by detecting many more low-weight photons.
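The backscatter fractions quoted in this section follow from a closed-form integral of the OTHG phase function over the backward hemisphere (a standard result, reproduced here for reference):

 $\tilde{b}_b(g) = \int_{\pi/2}^{\pi} 2\pi\,\tilde{\beta}_{OTHG}(\psi)\,\sin\psi\,d\psi = \frac{1-g^2}{2g}\left[\frac{1}{\sqrt{1+g^2}} - \frac{1}{1+g}\right] \,.$

Evaluating this expression gives $\tilde{b}_b(-0.9) = 0.977$, $\tilde{b}_b(-0.3) = 0.714$, and $\tilde{b}_b(+0.3) = 0.286$, consistent with the values used here and below.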

The green curve of Fig. 9 shows a Fournier-Forand phase function with a backscatter fraction of $\tilde{b}_b = 0.0183$; this phase function is typical of ocean water and gives a good fit to the Petzold average-particle phase function. The red curve shows an OTHG phase function with $g = -0.3$, which gives a backscatter fraction of $\tilde{b}_b = 0.714$. When used as the biased phase function $\tilde{\beta}_b$, this OTHG gives 40 times more backscattered photons at the first scattering event than does the Fournier-Forand phase function. The blue curve shows the corresponding weighting function $w(\psi) = \tilde{\beta}_{FF}(\psi)/\tilde{\beta}_{OTHG}(\psi)$. Note that the biased backscattered photon packets are given weights less than 0.1, whereas the very-small-angle forward-scattered photons can have weights as large as 1000.
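The weight bookkeeping for a biased first scattering can be sketched in a few lines. Since the Fournier-Forand function is not reproduced here, this sketch substitutes a strongly forward-peaked OTHG with $g = 0.8$ for the true phase function (an assumption for illustration only) and biases the first scattering with an OTHG with $g = -0.3$. The invariant being checked is that the expected photon weight after the first scattering is still $\omega_o$, so the biasing changes which directions are sampled but not the mean.

```python
import numpy as np

def hg_pdf_mu(g, mu):
    """OTHG pdf of mu = cos(psi): p(mu) = (1 - g^2) / (2 (1 + g^2 - 2 g mu)^(3/2))."""
    return (1.0 - g**2) / (2.0 * (1.0 + g**2 - 2.0 * g * mu) ** 1.5)

def hg_sample_mu(g, u):
    """Inverse-CDF sample of mu = cos(psi) from the OTHG phase function."""
    s = (1.0 - g**2) / (1.0 - g + 2.0 * g * u)
    return (1.0 + g**2 - s**2) / (2.0 * g)

rng = np.random.default_rng(11)
N = 10**6
omega0 = 0.8   # single-scattering albedo b/c, as in the simulations above
g_true = 0.8   # stand-in for the forward-peaked oceanic phase function
g_bias = -0.3  # biased OTHG used on the first scattering only

# First scattering: draw the direction from the *biased* phase function,
# then multiply the weight by omega0 (absorption) and by the bias
# correction pdf/pdf_b (the sin(psi) factors cancel in the ratio).
mu = hg_sample_mu(g_bias, rng.random(N))
w = omega0 * hg_pdf_mu(g_true, mu) / hg_pdf_mu(g_bias, mu)

print(np.mean(mu < 0.0))  # backscattered fraction at first scattering, ~0.714
print(np.mean(w))         # mean weight is still omega0 = 0.8
```

Subsequent scatterings use the true phase function with no extra weight factor, so only this one correction is applied per photon.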

Figure 10 and Table 1 show example simulations with and without biased first scatterings. The target was an annular ("bullseye") detector like the one in Fig. 7, but with five rings, each of 1 cm radial width. This is similar to the proposed sensor design in Gainusa Bogdan and Boss (2011). The water properties were defined by a Fournier-Forand phase function with a backscatter fraction of 0.0183, which is shown in Fig. 9. The absorption coefficient was $a = 0.2\ \mathrm{m}^{-1}$ and the scattering coefficient was $b = 0.8\ \mathrm{m}^{-1}$. The albedo of single scattering is then $\omega_o = 0.8$, which is typical of ocean waters at blue or green wavelengths. Three simulations were done: one without biased scattering and two with first scatterings biased by OTHG phase functions with either $g = +0.3$ or $g = -0.3$. The OTHG with $g = +0.3$ has a backscatter fraction of $\tilde{b}_b = 0.286$, and $g = -0.3$ has $\tilde{b}_b = 0.714$. The left panel of the figure shows the numbers of photons received by each detector ring; the right panel is the percent of emitted power received by each ring. Note that 12.9 times more photons reach the detector when biased first scattering is used with $g = +0.3$ (green curve), and 52.6 times more for $g = -0.3$ (blue curve), than for unbiased scattering (red curve). However, the fraction of emitted power received by the detector (over all 5 rings) remains the same, 0.0645%, to within less than 0.5% of Monte Carlo noise.

| First Scattering | $N_{det}$ | $N_{det}$ Increase Factor | % Emitted Power Detected (all rings) |
| --- | --- | --- | --- |
| unbiased | 84,645 | — | 0.0648 |
| biased, $g = +0.3$ | 1,093,114 | 12.9 | 0.0645 |
| biased, $g = -0.3$ | 4,451,875 | 52.6 | 0.0642 |

Table 1: Comparison of detected power for biased and unbiased first scatterings. Each run had $10^8$ photon packets emitted by the source as a collimated beam. $N_{det}$ is the number of photon packets that reached the detector (total for all 5 detector rings).

Conversely, for a given number of detected photons, biased first scattering allows that number to be detected with fewer photons being emitted and traced to completion, i.e., with less computer time. Running the Monte Carlo code with unbiased scattering until 100,000 photons reached the detector (total over all 5 rings) required the emission of $1.18 \cdot 10^8$ photon packets and a total run time of 6632 sec (on a 2.4 GHz PC). However, use of biased first scattering with $g = -0.3$ required emission of only $2.23 \cdot 10^6$ photons and a total run time of 105 seconds. The percent of emitted power that was detected was the same to within 3% in both cases, but the run-time savings was a factor of 63.

[Additional pages on Monte Carlo topics such as forced collisions and backward ray tracing are under development.]