Listen:

**Update:** Re-ran with a 6th algorithm, results and discussion reflect the performance of this.

I was reading this article by Kudelski Security Research from 2020 about modulo bias, and I found it quite interesting. As the article suggests, modulo bias is a common security issue with security reviews, and I suspect I have also seen it appear in my experiments too.

The basic idea is that you have some random variable $x$ and perform a modulo operation to limit it to the range of $y$. We assume $x \geq 0$ and $y > 0$, and calculate the bounded random value with $x \bmod y$.

In these experiments we will be working on small random values where $x \leq 255$.

First things first, I want to replicate the results the author of the article generated. As in the experiments in the article, we will be be displaying the results for $y = 107$.

These programs are run using:

0001$python3<program_name><y_value>

The idea of this is just to show we produce random numbers correctly.

0002 import pandas 0003 import os 0004 import time 0005 import sys 0006 import matplotlib.pyplot as p 0007 0008 alg="1-bias" 0009 mod=int(sys.argv[1])0010 0011 start=time.time()0012 s=pandas.DataFrame({'value':list(os.urandom(1000000))})0013 end=time.time()0014 0015 s['value'].value_counts().sort_index().plot(figsize=(20,5),kind='bar')0016 p.savefig(alg+"-"+str(mod)+".png")0017 0018 print(alg,mod,(end-start)*10**3,s.std(axis=0,skipna=True).value)

As you can see I made some changes to the Python code to actually produce the results and give you an idea of how we will also represent the other algorithms. Note that we are also printing some other information - this will be important later.

Thankfully, the local system *does* produce random numbers correctly. This may sound sarcastic, but Intel’s RDRAND contained a bug that meant it had to be run multiple times to produce an actually random number. AMD’s RDRAND *also* contained a bug meaning it would return a default value instead of a random number.

This program is for showing the bias visually.

0019 import pandas 0020 import os 0021 import time 0022 import sys 0023 import matplotlib.pyplot as p 0024 0025 alg="2-bias" 0026 mod=int(sys.argv[1])0027 0028 start=time.time()0029 s=pandas.DataFrame({'value':[x%mod for x in os.urandom(1000000)]})0030 end=time.time()0031 0032 s['value'].value_counts().sort_index().plot(figsize=(20,5),kind='bar')0033 p.savefig(alg+"-"+str(mod)+".png")0034 0035 print(alg,mod,(end-start)*10**3,s.std(axis=0,skipna=True).value)

As you can see, we have now introduced a bias into the output despite the input randomness being pretty good.

So, let’s look at how we can deal with this!

The most obvious solution is to test the received random number and request a new one if it doesn’t meet our criteria.

0036 import pandas 0037 import os 0038 import time 0039 import sys 0040 import matplotlib.pyplot as p 0041 0042 alg="3-reject" 0043 mod=int(sys.argv[1])0044 0045 values=[]0046 start=time.time()0047 while(len(values)<=1000000):0048 x=int.from_bytes(os.urandom(1),byteorder='little')0049 if x<mod:0050 values.append(x)0051 s=pandas.DataFrame({'value':values})0052 end=time.time()0053 0054 s['value'].value_counts().sort_index().plot(figsize=(20,5),kind='bar')0055 p.savefig(alg+"-"+str(mod)+".png")0056 0057 print(alg,mod,(end-start)*10**3,s.std(axis=0,skipna=True).value)

As you can see, this works really quite well! The only problem is that for low values of $y$, it can take a long time to randomly produce a value that meets the requirement. For example, for $y = 1$, probability it $p(y = 1) = \frac{1}{256}$. We will plot the runtime shortly.

This method overcomes the issues of the previous low values of $y$, by increasing the search space to be a multiple of $y$. This in turn reduces the amount of rejection done without compromising the random output, increasing the running speed greatly.

0058 import pandas 0059 import os 0060 import time 0061 import sys 0062 import matplotlib.pyplot as p 0063 0064 alg="4-reject-reduce" 0065 mod=int(sys.argv[1])0066 mul=mod 0067 while mul+mod<255:0068 mul+=mod 0069 0070 values=[]0071 start=time.time()0072 while(len(values)<=1000000):0073 x=int.from_bytes(os.urandom(1),byteorder='little')0074 if x<mul:0075 values.append(x%mod)0076 s=pandas.DataFrame({'value':values})0077 end=time.time()0078 0079 s['value'].value_counts().sort_index().plot(figsize=(20,5),kind='bar')0080 p.savefig(alg+".png")0081 0082 print(alg,mod,(end-start)*10**3,s.std(axis=0,skipna=True).value)

And as we can see, the results are quite nice.

The problem with *rejection & reduction* is that you cannot be sure it *ever* actually completes. Whilst it would be incredibly unlikely, it could theoretically run indefinitely. What we need is a constant-time algorithm.

This is my initial attempt at creating such. We store the previous result and XOR it with our random variable $x$ if $x$ is not within bounds. This of course means there is still a bias, but the point is that there is *less* of a bias.

0083 import pandas 0084 import os 0085 import time 0086 import sys 0087 import matplotlib.pyplot as p 0088 0089 alg="5-constant" 0090 mod=int(sys.argv[1])0091 mul=mod 0092 while mul+mod<255:0093 mul+=mod 0094 0095 values=[]0096 y=00097 start=time.time()0098 while(len(values)<=1000000):0099 x=int.from_bytes(os.urandom(1),byteorder='little')0100 if x>=mul:0101 x^=y 0102 y=x%mod 0103 values.append(y)0104 s=pandas.DataFrame({'value':values})0105 end=time.time()0106 0107 s['value'].value_counts().sort_index().plot(figsize=(20,5),kind='bar')0108 p.savefig(alg+".png")0109 0110 print(alg,mod,(end-start)*10**3,s.std(axis=0,skipna=True).value)

As we can see, the distribution is not quite as nice as either of the rejection algorithms:

But what we should gain is constant running time.

This is similar to the previous algorithm, except instead of XOR, we minus the previous value instead. We know that the previous value was likely less than the modulo value, and in the case that we see the generated random value $x$ is greater than the modulo value, we can minus the previous value and have a result within range.

0111 import pandas 0112 import os 0113 import time 0114 import sys 0115 import matplotlib.pyplot as p 0116 0117 alg="6-constant" 0118 mod=int(sys.argv[1])0119 mul=mod 0120 while mul+mod<255:0121 mul+=mod 0122 0123 values=[]0124 y=00125 start=time.time()0126 while(len(values)<=1000000):0127 x=int.from_bytes(os.urandom(1),byteorder='little')0128 if x>=mul:0129 # NOTE: y was likely successful in the last spin, so y < mul likely holds 0130 # true. We know that x >= mul, so x - y should in most cases yield a 0131 # result < mul. 0132 x-=y 0133 y=x%mod 0134 values.append(y)0135 s=pandas.DataFrame({'value':values})0136 end=time.time()0137 0138 s['value'].value_counts().sort_index().plot(figsize=(20,5),kind='bar')0139 p.savefig(alg+".png")0140 0141 print(alg,mod,(end-start)*10**3,s.std(axis=0,skipna=True).value)

As you can see, we get a *very* nice distribution at this range that runs in constant time, that appears to be without significant bias:

Obviously more comprehensive testing required…

Each of these algorithms was run for 10 runs at each modulo value, where $0 < y <= 255$. This was run over several days, switching between algorithms (as there is currently a large day-night temperature differential which would affect CPU clock speed).

The run time for `1-bias`

and `2-bias`

are simply there for comparison. As they do not operate on the Pandas DataFrame data structure they are awarded some speed-up. Most important of note for these values is that they run near constant time.

For `3-reject`

we see that as expected, for lower modulo values it can take significant time to happen upon a value that occurs within bounds. For this reason we had to display the values on a logarithmic scale, as it took on average in excess of 1,000 seconds (greater than 16 minutes) per experiment. As the modulo value increases, the probability of rejection decreases, and so does the time required.

When looking at `4-reject-reduce`

we see a largely reduced run time, although we see a “stairs” pattern. These represent increased rejection values for given values of $y$. In some systems this may provide probabilistic information regarding random values chosen, or the state of a pseudo random number generator (PRNG).

With `5-constant`

the goal was to generate random values with near constant time, which appears to have been achieved ^{1}. It would be difficult for a malicious actor to perform a timing-based attack based on the delay generated here.

Lastly the `6-constant`

algorithm is a lot more reliable for constant time work, offering the flattest line we see yet.

For completeness we see `1-bias`

, with a perfect standard deviation as it doesn’t limit the values within bounds. For the other algorithms, we see the deviation increase as the modulo value $y$ increases, as expected. `3-reject`

is our ‘perfect’ line, as it excepts nothing but truly random values within the provided bounds, at the cost of time. Equally, `4-reject-reduce`

simply is an iteration that reduces the number of rejections performed.

As expected, `2-bias`

deviates greatly, showing the bias of the output depends on the modulo value of $y$. The algorithm `5-constant`

deviates somewhat, but not significantly, when compared to the deviation posed with the problematic `2-bias`

implementation.

The `6-constant`

algorithm of the other hand overcomes the issues of the `5-constant`

algorithm by selecting random values within the proper range, rather than trying to generate a new random value. As such the deviation is a lot tighter and appears to be cryptographically appropriate.

As you can see in the results graphs, the modulo bias problem is quite significant. I present an algorithm of near constant time running that mostly eliminates the modulo bias problem. This may be useful for systems that need to generate *many* bounded random numbers with the same modulo value $y$.

One way in which this could be used is by checking whether the previously generated random number had the same modulo, and picking a generation technique based on this fact:

0142 prev_mod=00143 def gen_rand(upper_bound=255):0144 x=00145 if upper_bound>0:0146 if prev_mod==upper_bound:0147 x=..# Run constant algorithm 0148 else:0149 x=..# Run reject and reduction 0150 prev_mod=upper_bound 0151 return x

I believe there may still be better strategies available for generating bounded random numbers that do not rely on repeated modulo values. In the future I may investigate other strategies, perhaps even looking to implement them in standard libraries for a proper comparison.

To prove this we would need to look at the distribution of the time taken to ensure it is reliable, and doesn’t just produce constant time on average.↩︎