Coffee Space


Listen:

Prisoner Game

Preview Image

Background

After watching a recent Veritasium video:

The 100 Prisoners Riddle feels completely impossible even once you know the answer.

The challenge is the following (2:46):

  • 100 prisoners numbered 1-100
  • Slips with their numbers are randomly placed in 100 boxes in a room
  • Each prisoner may enter the room one at a time and check the 50 boxes
  • They must leave the room exactly as they found it and can’t communicate with the others after
  • If all 100 prisoners find their number during their turn in the room, they will all be freed. But if even one fails, they will all be executed.

What is their best strategy?

Well, it sounds like something that can be coded up and computed to learn more…

I should add that I immediately come to this solution at the moment of pausing the video. I believe I have now seen enough similar problems to reach this solution.

Implementation

The following is the Python code I came up with. Of course it is quite simple, but also quite extensible.

0001 import random
0002 import sys
0003 
0004 def gen_boxes(num) :
0005   boxes = list(range(0, num))
0006   random.shuffle(boxes)
0007   return boxes
0008 
0009 def search_self(boxes, check, p) :
0010   x = p
0011   while check > 0 :
0012     check -= 1
0013     if boxes[x] == p :
0014       return 1.0
0015     else :
0016       x = boxes[x]
0017   # Nothing found
0018   return 0.0
0019 
0020 def main(args) :
0021   # Setup variables
0022   check = 50
0023   num = 100
0024   mode = "self"
0025   runs = 1
0026   # Loop arguments
0027   x = 1
0028   while x < len(args) :
0029     if args[x] == "-c" or args[x] == "--check" :
0030       x += 1
0031       check = int(args[x])
0032     elif args[x] == "-h" or args[x] == "--help" :
0033       print("help")
0034       print("")
0035       print("  -c  --check  Set the number of elements to check")
0036       print("                 <INT> The max checks to perform (default: " + str(check) + ")")
0037       print("  -h  --help   Display this help")
0038       print("  -n  --num    Set the number")
0039       print("                 <INT> The number of prisoners in the population (default: " + str(num) + ")")
0040       print("  -m  --mode   Set the mode of exploration")
0041       print("                 <STR> The mode to be used by prisoners (default: " + mode + ")")
0042       print("                         'self' Find own number in room")
0043       print("  -r  --runs   Set the repetitions of the experiment (default: " + str(runs) + ")")
0044       print("                 <INT> The number of experiment repeats")
0045     elif args[x] == "-n" or args[x] == "--num" :
0046       x += 1
0047       num = int(args[x])
0048     elif args[x] == "-m" or args[x] == "--mode" :
0049       x += 1
0050       mode = args[x]
0051     elif args[x] == "-r" or args[x] == "--runs" :
0052       x += 1
0053       runs = int(args[x])
0054     else :
0055       Utils.err("Error: Unknown argument '" + args[x] + "'")
0056     x += 1
0057   # Perform the number of experiments
0058   r = runs
0059   passed = 0.0
0060   while r > 0 :
0061     boxes = gen_boxes(num)
0062     succeed = 0.0
0063     # Send in each prisoner
0064     for p in range(0, num) :
0065       if mode == "self" :
0066         succeed += search_self(boxes, check, p)
0067       else :
0068         print("Unknown mode")
0069         sys.exit(1)
0070     if succeed == num :
0071       passed += 1.0
0072     r -= 1
0073   print(
0074     mode + "," +
0075     str(check) + "," +
0076     str(num) + "," +
0077     str(runs) + "," +
0078     str(passed) + "," +
0079     str(runs) + "," +
0080     str((passed / runs) * 100.0) + "%"
0081   )
0082   return
0083 
0084 if __name__ == "__main__" :
0085   main(sys.argv)

Tests

With this, we perform the following tests (they take a while):

0086 RESULTS="$(date '+results-%F-%H-%M-%S.csv')"
0087 
0088 CHECK=50
0089 NUM=100
0090 MODE="self"
0091 RUNS=1000000
0092 
0093 # Vary number of prisoners, maintain 50% ratio
0094 python3 main.py -c    1    -n     2    -m $MODE -r 10000000000 >> $RESULTS
0095 python3 main.py -c    5    -n    10    -m $MODE -r   100000000 >> $RESULTS
0096 python3 main.py -c   50    -n   100    -m $MODE -r     1000000 >> $RESULTS
0097 python3 main.py -c  500    -n  1000    -m $MODE -r       10000 >> $RESULTS
0098 python3 main.py -c 5000    -n 10000    -m $MODE -r         100 >> $RESULTS
0099 
0100 # Vary number of checks available to prisoners
0101 python3 main.py -c   0 -n 100 -m $MODE -r $RUNS >> $RESULTS
0102 python3 main.py -c  10 -n 100 -m $MODE -r $RUNS >> $RESULTS
0103 python3 main.py -c  20 -n 100 -m $MODE -r $RUNS >> $RESULTS
0104 python3 main.py -c  30 -n 100 -m $MODE -r $RUNS >> $RESULTS
0105 python3 main.py -c  40 -n 100 -m $MODE -r $RUNS >> $RESULTS
0106 python3 main.py -c  50 -n 100 -m $MODE -r $RUNS >> $RESULTS
0107 python3 main.py -c  60 -n 100 -m $MODE -r $RUNS >> $RESULTS
0108 python3 main.py -c  70 -n 100 -m $MODE -r $RUNS >> $RESULTS
0109 python3 main.py -c  80 -n 100 -m $MODE -r $RUNS >> $RESULTS
0110 python3 main.py -c  90 -n 100 -m $MODE -r $RUNS >> $RESULTS
0111 python3 main.py -c 100 -n 100 -m $MODE -r $RUNS >> $RESULTS

Note that some of the experiments have reduced runs, due to them taking a long time to run.

Results

And now for the results…

Attempts vs Wins

Obviously it appears to be more advantageous to have more checks. 0 checks means each prisoner walks into the room of boxes and instantly fails. 100% checks means that each prisoner gets to check every box. This also tells us something else - that each ‘loop’ always contains every box.

Another interesting observation here is that the reward for checking more boxes is not linear. Until you check approximately 20% of the boxes, there appears to be no reward at all.

Prisoners vs Wins

The next graph shows us how increasing the number of prisoners affects this result. As the video points out, we seem to approach a set value. (Note how little runs were possible on those end values, hence a lot more noise.)

And the following are the raw results encase you want to generate your own graphs:

0112 self,1,2,10000000000,5000046071.0,10000000000,50.000460710000006%
0113 self,5,10,100000000,35438004.0,100000000,35.438004%
0114 self,50,100,1000000,312591.0,1000000,31.2591%
0115 self,500,1000,10000,3073.0,10000,30.73%
0116 self,5000,10000,100,33.0,100,33.0%
0117 self,0,100,1000000,0.0,1000000,0.0%
0118 self,10,100,1000000,0.0,1000000,0.0%
0119 self,20,100,1000000,466.0,1000000,0.0466%
0120 self,30,100,1000000,25901.0,1000000,2.5901%
0121 self,40,100,1000000,134628.0,1000000,13.4628%
0122 self,50,100,1000000,311481.0,1000000,31.1481%
0123 self,60,100,1000000,492823.0,1000000,49.2823%
0124 self,70,100,1000000,645143.0,1000000,64.5143%
0125 self,80,100,1000000,777606.0,1000000,77.7606%
0126 self,90,100,1000000,895677.0,1000000,89.56769999999999%
0127 self,100,100,1000000,1000000.0,1000000,100.0%

Further Questions

So the next questions are:

  1. How does this compare to a random search?
  2. Can we leverage the fact that we know that the experiment is continuing in order to search faster?