Listen:

A problem that was posed to be years ago (paraphrasing), is the following ^{1}:

A King’s subjects collect 50 women, each a potential bride. He meets with them one at a time, and must decide whether to propose or not to the woman before him. If he rejects her, she will not accept his proposal in the future. He cannot see them all at the same time, only as they are presented to him in some unknown order. If he accepts none, he ends up with no bride at all.

What is the best algorithm for him to apply to get the ‘best’ bride?

Note that ‘best’ here could be beauty, or it could be the result of some personal connection made with each woman. The measure he uses is not important, the important part is that the King has a method to rate each woman based on his internal measure.

The source code to this is given below, as well as the raw results.

In our experiment, we make the King select from 50 brides (this is a number I remember) and perform 10,000 runs to get some statistics. Each set of brides for each run is randomly given a rating from a uniform distribution.

Each bride has a rating of `0.0`

to `1.0`

. The average minimum rating is close to `0.0`

, the average maximum rating is close to `1.0`

, the average rating is close to `0.5`

.

The easy solution is for the King to simply select the first woman that comes to him. In Python this would look as follows:

0001 def alg_first(x,n,bride):0002 if x==0:0003 return bride 0004 return None

In this case, when the proposed algorithm returns `None`

, it means that no selection was made, and the woman is rejected. If it returns `bride`

then the King proposes to the woman.

**Result:** In my experiments, this algorithm performs very averagely, with an average rating of `0.5027437360986446`

.

The ‘optimal’ solution I was told years ago was to select the best bride that is better than the previous best. In reality, you would reject your first bride in all scenarios, and then select the best that is better than her. In Python this looks as follows:

0005 nextbest_prev=None 0006 def alg_nextbest(x,n,bride):0007 global nextbest_prev 0008 if x==0:0009 nextbest_prev=bride 0010 else:0011 if bride>nextbest_prev:0012 return bride 0013 return None

Something to note here is that it’s entirely possible that the best bride is the first one you meet, therefore by running this algorithm you run a risk of having no bride at all.

**Result:** In my experiments, this algorithm performs quite well, with an average rating of `0.7478398367763861`

. Worth noting though that there is an approximately 2.24% chance of not selecting a bride at all.

The algorithm I propose relies on you having some insider knowledge of the distribution of the women to be judged. Given there are 50 brides, and that they have a uniform random rating, if you set a threshold of `0.75`

as a minimum, you know that on average 12.5 of these potential brides should rate at least this. The probability of none of them being higher than this rating is acceptably low (I didn’t see it in my experiments).

The Python code for this would look like so:

0014 def alg_firstthresh(x,n,bride):0015 if bride>=0.75:0016 return bride 0017 return None

**Result:** In my experiments, this algorithm selects a bride with an average rating of `0.8744221248261387`

, with approximately 0% occurrences of not finding a bride.

When answering “*which is best?*”, this is not quite so simple. If you want to always ensure you select a bride, simply picking the *first* is guaranteed to do this. If your goal is to select the best bride you can, without knowing anything about the distribution of ratings and a small probability of not selecting a bride is acceptable, the *next best* is a good algorithm. On the other hand, if you do have information about your distribution, you can quite clearly reliably do better by selecting the *first* that passes a *threshold*.

Some other algorithms likely exist for even better performance:

- If you use the first N women to sample your distribution, and they are given to you randomly, you could select a better threshold value without knowing the distribution.
- Rather than picking the first that is better than the threshold, potentially you could pick the next best bride greater than the threshold, although with the warning that you will end up with a significant probability of selecting no bride.

*Why should I care?* In this scenario, a King is being presented with brides, but this is easily applicable to other areas. Perhaps you are looking for a job, you make multiple applications and have multiple successful interviews ending with multiple offers. Maybe you are looking to purchase a second-hand vehicle and you are looking to get the best bang-for-buck. There are many scenarios this could apply.

*Off-topic:* A good friend of mine is looking to get married soon, and I thought I could make a joke about his selection policy. I imagine it would go over the heads of 95% of people in the group and those who understand it may not find it funny. Still, I’m sure there is a great joke in there somewhere I could brew for another day.

*What did we learn?* Generally, the more information you have available to you, the better the decision you can make. Knowing nothing, we select the first bride. Knowing just the score of the first bride, we select the next best. Knowing the approximate distribution of brides, we select the first past some threshold. If we had prior information about every bride, we could even just select the best one straight away.

When making decisions, we should try to be as informed as possible.

0018 import random 0019 import sys 0020 0021 def gen_brides(num):0022 brides=[]0023 for n in range(num):0024 brides+=[random.random()]0025 return brides 0026 0027 def min_brides(brides):0028 val=brides[0]0029 for bride in brides:0030 if bride<val:0031 val=bride 0032 return val 0033 0034 def max_brides(brides):0035 val=brides[0]0036 for bride in brides:0037 if bride>val:0038 val=bride 0039 return val 0040 0041 def avg_brides(brides):0042 summ=0.00043 for bride in brides:0044 summ+=bride 0045 return summ/len(brides)0046 0047 def alg_first(x,n,bride):0048 if x==0:0049 return bride 0050 return None 0051 0052 nextbest_prev=None 0053 def alg_nextbest(x,n,bride):0054 global nextbest_prev 0055 if x==0:0056 nextbest_prev=bride 0057 else:0058 if bride>nextbest_prev:0059 return bride 0060 return None 0061 0062 def alg_firstthresh(x,n,bride):0063 if bride>=0.75:0064 return bride 0065 return None 0066 0067 def test_alg(alg,brides):0068 for x in range(len(brides)):0069 result=alg(x,len(brides),brides[x])0070 if result!=None:0071 return result 0072 return None 0073 0074 def main(args):0075 # Setup variables initially 0076 num_runs=100000077 num_brides=500078 results=[]0079 # Generate experiments 0080 for run in range(num_runs):0081 brides=gen_brides(num_brides)0082 brides_min=min_brides(brides)0083 brides_max=max_brides(brides)0084 brides_avg=avg_brides(brides)0085 brides_first=test_alg(alg_first,brides)0086 brides_nextbest=test_alg(alg_nextbest,brides)0087 brides_firstthresh=test_alg(alg_firstthresh,brides)0088 results+=[[brides_min,brides_max,brides_avg,brides_first,brides_nextbest,brides_firstthresh]]0089 # Display results 0090 print("num_runs",num_runs)0091 print("num_brides",num_brides)0092 print("---")0093 print([0094 "avg_min",0095 "avg_max",0096 "avg_avg",0097 "avg_first",0098 "avg_nextbest",0099 "avg_firstthresh",0100 "avg_first_fail",0101 "avg_nextbest_fail",0102 "avg_firstthresh_fail" 0103])0104 avg_min=00105 avg_max=00106 avg_avg=00107 avg_first=00108 avg_nextbest=00109 avg_firstthresh=00110 avg_first_fail=00111 avg_nextbest_fail=00112 avg_firstthresh_fail=00113 for result in results:0114 avg_min+=result[0]0115 avg_max+=result[1]0116 avg_avg+=result[2]0117 if result[3]!=None:0118 avg_first+=result[3]0119 else:0120 avg_first_fail+=10121 if result[4]!=None:0122 avg_nextbest+=result[4]0123 else:0124 avg_nextbest_fail+=10125 if result[5]!=None:0126 avg_firstthresh+=result[5]0127 else:0128 avg_firstthresh_fail+=10129 avg_min/=num_runs 0130 avg_max/=num_runs 0131 avg_avg/=num_runs 0132 avg_first/=(num_runs-avg_first_fail)0133 avg_nextbest/=(num_runs-avg_nextbest_fail)0134 avg_firstthresh/=(num_runs-avg_firstthresh_fail)0135 avg_first_fail/=num_runs 0136 avg_nextbest_fail/=num_runs 0137 avg_firstthresh_fail/=num_runs 0138 print([0139 avg_min,0140 avg_max,0141 avg_avg,0142 avg_first,0143 avg_nextbest,0144 avg_firstthresh,0145 avg_first_fail,0146 avg_nextbest_fail,0147 avg_firstthresh_fail 0148])0149 print("---")0150 print(["min","max","avg","first","nextbest","firstthresh"])0151 for result in results:0152 print(result)0153 return 0154 0155 if__name__=="__main__":0156 main(sys.argv)

Please download the provided file for the raw results of this experiment.

If anybody knows the name of this problem, I would like to know.↩︎