Coffee Space – Coffee Space

King’s Brides

Problem

A problem that was posed to be years ago (paraphrasing), is the following ¹:

A King’s subjects collect 50 women, each a potential bride. He meets with them one at a time, and must decide whether to propose or not to the woman before him. If he rejects her, she will not accept his proposal in the future. He cannot see them all at the same time, only as they are presented to him in some unknown order. If he accepts none, he ends up with no bride at all.

What is the best algorithm for him to apply to get the ‘best’ bride?

Note that ‘best’ here could be beauty, or it could be the result of some personal connection made with each woman. The measure he uses is not important, the important part is that the King has a method to rate each woman based on his internal measure.

Experiment

The source code to this is given below, as well as the raw results.

In our experiment, we make the King select from 50 brides (this is a number I remember) and perform 10,000 runs to get some statistics. Each set of brides for each run is randomly given a rating from a uniform distribution.

Each bride has a rating of 0.0 to 1.0. The average minimum rating is close to 0.0, the average maximum rating is close to 1.0, the average rating is close to 0.5.

First

The easy solution is for the King to simply select the first woman that comes to him. In Python this would look as follows:

0001 def alg_first(x, n, bride) :
0002   if x == 0 :
0003     return bride
0004   return None

In this case, when the proposed algorithm returns None, it means that no selection was made, and the woman is rejected. If it returns bride then the King proposes to the woman.

Result: In my experiments, this algorithm performs very averagely, with an average rating of 0.5027437360986446.

Next Best

The ‘optimal’ solution I was told years ago was to select the best bride that is better than the previous best. In reality, you would reject your first bride in all scenarios, and then select the best that is better than her. In Python this looks as follows:

0005 nextbest_prev = None
0006 def alg_nextbest(x, n, bride) :
0007   global nextbest_prev
0008   if x == 0 :
0009     nextbest_prev = bride
0010   else :
0011     if bride > nextbest_prev :
0012       return bride
0013   return None

Something to note here is that it’s entirely possible that the best bride is the first one you meet, therefore by running this algorithm you run a risk of having no bride at all.

Result: In my experiments, this algorithm performs quite well, with an average rating of 0.7478398367763861. Worth noting though that there is an approximately 2.24% chance of not selecting a bride at all.

First Threshold

The algorithm I propose relies on you having some insider knowledge of the distribution of the women to be judged. Given there are 50 brides, and that they have a uniform random rating, if you set a threshold of 0.75 as a minimum, you know that on average 12.5 of these potential brides should rate at least this. The probability of none of them being higher than this rating is acceptably low (I didn’t see it in my experiments).

The Python code for this would look like so:

0014 def alg_firstthresh(x, n, bride) :
0015   if bride >= 0.75 :
0016     return bride
0017   return None

Result: In my experiments, this algorithm selects a bride with an average rating of 0.8744221248261387, with approximately 0% occurrences of not finding a bride.

Discussion

When answering “which is best?”, this is not quite so simple. If you want to always ensure you select a bride, simply picking the first is guaranteed to do this. If your goal is to select the best bride you can, without knowing anything about the distribution of ratings and a small probability of not selecting a bride is acceptable, the next best is a good algorithm. On the other hand, if you do have information about your distribution, you can quite clearly reliably do better by selecting the first that passes a threshold.

Some other algorithms likely exist for even better performance:

If you use the first N women to sample your distribution, and they are given to you randomly, you could select a better threshold value without knowing the distribution.
Rather than picking the first that is better than the threshold, potentially you could pick the next best bride greater than the threshold, although with the warning that you will end up with a significant probability of selecting no bride.

Why should I care? In this scenario, a King is being presented with brides, but this is easily applicable to other areas. Perhaps you are looking for a job, you make multiple applications and have multiple successful interviews ending with multiple offers. Maybe you are looking to purchase a second-hand vehicle and you are looking to get the best bang-for-buck. There are many scenarios this could apply.

Off-topic: A good friend of mine is looking to get married soon, and I thought I could make a joke about his selection policy. I imagine it would go over the heads of 95% of people in the group and those who understand it may not find it funny. Still, I’m sure there is a great joke in there somewhere I could brew for another day.

What did we learn? Generally, the more information you have available to you, the better the decision you can make. Knowing nothing, we select the first bride. Knowing just the score of the first bride, we select the next best. Knowing the approximate distribution of brides, we select the first past some threshold. If we had prior information about every bride, we could even just select the best one straight away.

When making decisions, we should try to be as informed as possible.

Source Code

0018 import random
0019 import sys
0020 
0021 def gen_brides(num) :
0022   brides = []
0023   for n in range(num) :
0024     brides += [ random.random() ]
0025   return brides
0026 
0027 def min_brides(brides) :
0028   val = brides[0]
0029   for bride in brides :
0030     if bride < val :
0031       val = bride
0032   return val
0033 
0034 def max_brides(brides) :
0035   val = brides[0]
0036   for bride in brides :
0037     if bride > val :
0038       val = bride
0039   return val
0040 
0041 def avg_brides(brides) :
0042   summ = 0.0
0043   for bride in brides :
0044     summ += bride
0045   return summ / len(brides)
0046 
0047 def alg_first(x, n, bride) :
0048   if x == 0 :
0049     return bride
0050   return None
0051 
0052 nextbest_prev = None
0053 def alg_nextbest(x, n, bride) :
0054   global nextbest_prev
0055   if x == 0 :
0056     nextbest_prev = bride
0057   else :
0058     if bride > nextbest_prev :
0059       return bride
0060   return None
0061 
0062 def alg_firstthresh(x, n, bride) :
0063   if bride >= 0.75 :
0064     return bride
0065   return None
0066 
0067 def test_alg(alg, brides) :
0068   for x in range(len(brides)) :
0069     result = alg(x, len(brides), brides[x])
0070     if result != None :
0071       return result
0072   return None
0073 
0074 def main(args) :
0075   # Setup variables initially
0076   num_runs = 10000
0077   num_brides = 50
0078   results = []
0079   # Generate experiments
0080   for run in range(num_runs) :
0081     brides = gen_brides(num_brides)
0082     brides_min = min_brides(brides)
0083     brides_max = max_brides(brides)
0084     brides_avg = avg_brides(brides)
0085     brides_first = test_alg(alg_first, brides)
0086     brides_nextbest = test_alg(alg_nextbest, brides)
0087     brides_firstthresh = test_alg(alg_firstthresh, brides)
0088     results += [ [brides_min, brides_max, brides_avg, brides_first, brides_nextbest, brides_firstthresh] ]
0089   # Display results
0090   print("num_runs", num_runs)
0091   print("num_brides", num_brides)
0092   print("---")
0093   print([
0094     "avg_min",
0095     "avg_max",
0096     "avg_avg",
0097     "avg_first",
0098     "avg_nextbest",
0099     "avg_firstthresh",
0100     "avg_first_fail",
0101     "avg_nextbest_fail",
0102     "avg_firstthresh_fail"
0103   ])
0104   avg_min = 0
0105   avg_max = 0
0106   avg_avg = 0
0107   avg_first = 0
0108   avg_nextbest = 0
0109   avg_firstthresh = 0
0110   avg_first_fail = 0
0111   avg_nextbest_fail = 0
0112   avg_firstthresh_fail = 0
0113   for result in results :
0114     avg_min += result[0]
0115     avg_max += result[1]
0116     avg_avg += result[2]
0117     if result[3] != None :
0118       avg_first += result[3]
0119     else :
0120       avg_first_fail += 1
0121     if result[4] != None :
0122       avg_nextbest += result[4]
0123     else :
0124       avg_nextbest_fail += 1
0125     if result[5] != None :
0126       avg_firstthresh += result[5]
0127     else :
0128       avg_firstthresh_fail += 1
0129   avg_min /= num_runs
0130   avg_max /= num_runs
0131   avg_avg /= num_runs
0132   avg_first /= (num_runs - avg_first_fail)
0133   avg_nextbest /= (num_runs - avg_nextbest_fail)
0134   avg_firstthresh /= (num_runs - avg_firstthresh_fail)
0135   avg_first_fail /= num_runs
0136   avg_nextbest_fail /= num_runs
0137   avg_firstthresh_fail /= num_runs
0138   print([
0139     avg_min,
0140     avg_max,
0141     avg_avg,
0142     avg_first,
0143     avg_nextbest,
0144     avg_firstthresh,
0145     avg_first_fail,
0146     avg_nextbest_fail,
0147     avg_firstthresh_fail
0148   ])
0149   print("---")
0150   print(["min", "max", "avg", "first", "nextbest", "firstthresh"])
0151   for result in results :
0152     print(result)
0153   return
0154 
0155 if __name__ == "__main__" :
0156   main(sys.argv)

Raw Results

Please download the provided file for the raw results of this experiment.

If anybody knows the name of this problem, I would like to know.↩︎