After watching a recent Veritasium video:
The 100 Prisoners Riddle feels completely impossible even once you know the answer.
The challenge is the following (2:46):
- 100 prisoners numbered 1-100
- Slips with their numbers are randomly placed in 100 boxes in a room
- Each prisoner may enter the room one at a time and check the 50 boxes
- They must leave the room exactly as they found it and can’t communicate with the others after
- If all 100 prisoners find their number during their turn in the room, they will all be freed. But if even one fails, they will all be executed.
What is their best strategy?
Well, it sounds like something that can be coded up and computed to learn more…
I should add that I immediately come to this solution at the moment of pausing the video. I believe I have now seen enough similar problems to reach this solution.
The following is the Python code I came up with. Of course it is quite simple, but also quite extensible.
0001 import random 0002 import sys 0003 0004 def gen_boxes(num) : 0005 boxes = list(range(0, num)) 0006 random.shuffle(boxes) 0007 return boxes 0008 0009 def search_self(boxes, check, p) : 0010 x = p 0011 while check > 0 : 0012 check -= 1 0013 if boxes[x] == p : 0014 return 1.0 0015 else : 0016 x = boxes[x] 0017 # Nothing found 0018 return 0.0 0019 0020 def main(args) : 0021 # Setup variables 0022 check = 50 0023 num = 100 0024 mode = "self" 0025 runs = 1 0026 # Loop arguments 0027 x = 1 0028 while x < len(args) : 0029 if args[x] == "-c" or args[x] == "--check" : 0030 x += 1 0031 check = int(args[x]) 0032 elif args[x] == "-h" or args[x] == "--help" : 0033 print("help") 0034 print("") 0035 print(" -c --check Set the number of elements to check") 0036 print(" <INT> The max checks to perform (default: " + str(check) + ")") 0037 print(" -h --help Display this help") 0038 print(" -n --num Set the number") 0039 print(" <INT> The number of prisoners in the population (default: " + str(num) + ")") 0040 print(" -m --mode Set the mode of exploration") 0041 print(" <STR> The mode to be used by prisoners (default: " + mode + ")") 0042 print(" 'self' Find own number in room") 0043 print(" -r --runs Set the repetitions of the experiment (default: " + str(runs) + ")") 0044 print(" <INT> The number of experiment repeats") 0045 elif args[x] == "-n" or args[x] == "--num" : 0046 x += 1 0047 num = int(args[x]) 0048 elif args[x] == "-m" or args[x] == "--mode" : 0049 x += 1 0050 mode = args[x] 0051 elif args[x] == "-r" or args[x] == "--runs" : 0052 x += 1 0053 runs = int(args[x]) 0054 else : 0055 Utils.err("Error: Unknown argument '" + args[x] + "'") 0056 x += 1 0057 # Perform the number of experiments 0058 r = runs 0059 passed = 0.0 0060 while r > 0 : 0061 boxes = gen_boxes(num) 0062 succeed = 0.0 0063 # Send in each prisoner 0064 for p in range(0, num) : 0065 if mode == "self" : 0066 succeed += search_self(boxes, check, p) 0067 else : 0068 print("Unknown mode") 0069 sys.exit(1) 0070 if succeed == num : 0071 passed += 1.0 0072 r -= 1 0073 print( 0074 mode + "," + 0075 str(check) + "," + 0076 str(num) + "," + 0077 str(runs) + "," + 0078 str(passed) + "," + 0079 str(runs) + "," + 0080 str((passed / runs) * 100.0) + "%" 0081 ) 0082 return 0083 0084 if __name__ == "__main__" : 0085 main(sys.argv)
With this, we perform the following tests (they take a while):
0086 RESULTS="$(date '+results-%F-%H-%M-%S.csv')" 0087 0088 CHECK=50 0089 NUM=100 0090 MODE="self" 0091 RUNS=1000000 0092 0093 # Vary number of prisoners, maintain 50% ratio 0094 python3 main.py -c 1 -n 2 -m $MODE -r 10000000000 >> $RESULTS 0095 python3 main.py -c 5 -n 10 -m $MODE -r 100000000 >> $RESULTS 0096 python3 main.py -c 50 -n 100 -m $MODE -r 1000000 >> $RESULTS 0097 python3 main.py -c 500 -n 1000 -m $MODE -r 10000 >> $RESULTS 0098 python3 main.py -c 5000 -n 10000 -m $MODE -r 100 >> $RESULTS 0099 0100 # Vary number of checks available to prisoners 0101 python3 main.py -c 0 -n 100 -m $MODE -r $RUNS >> $RESULTS 0102 python3 main.py -c 10 -n 100 -m $MODE -r $RUNS >> $RESULTS 0103 python3 main.py -c 20 -n 100 -m $MODE -r $RUNS >> $RESULTS 0104 python3 main.py -c 30 -n 100 -m $MODE -r $RUNS >> $RESULTS 0105 python3 main.py -c 40 -n 100 -m $MODE -r $RUNS >> $RESULTS 0106 python3 main.py -c 50 -n 100 -m $MODE -r $RUNS >> $RESULTS 0107 python3 main.py -c 60 -n 100 -m $MODE -r $RUNS >> $RESULTS 0108 python3 main.py -c 70 -n 100 -m $MODE -r $RUNS >> $RESULTS 0109 python3 main.py -c 80 -n 100 -m $MODE -r $RUNS >> $RESULTS 0110 python3 main.py -c 90 -n 100 -m $MODE -r $RUNS >> $RESULTS 0111 python3 main.py -c 100 -n 100 -m $MODE -r $RUNS >> $RESULTS
Note that some of the experiments have reduced runs, due to them taking a long time to run.
And now for the results…
Obviously it appears to be more advantageous to have more checks. 0 checks means each prisoner walks into the room of boxes and instantly fails. 100% checks means that each prisoner gets to check every box. This also tells us something else - that each ‘loop’ always contains every box.
Another interesting observation here is that the reward for checking more boxes is not linear. Until you check approximately 20% of the boxes, there appears to be no reward at all.
The next graph shows us how increasing the number of prisoners affects this result. As the video points out, we seem to approach a set value. (Note how little runs were possible on those end values, hence a lot more noise.)
And the following are the raw results encase you want to generate your own graphs:
0112 self,1,2,10000000000,5000046071.0,10000000000,50.000460710000006% 0113 self,5,10,100000000,35438004.0,100000000,35.438004% 0114 self,50,100,1000000,312591.0,1000000,31.2591% 0115 self,500,1000,10000,3073.0,10000,30.73% 0116 self,5000,10000,100,33.0,100,33.0% 0117 self,0,100,1000000,0.0,1000000,0.0% 0118 self,10,100,1000000,0.0,1000000,0.0% 0119 self,20,100,1000000,466.0,1000000,0.0466% 0120 self,30,100,1000000,25901.0,1000000,2.5901% 0121 self,40,100,1000000,134628.0,1000000,13.4628% 0122 self,50,100,1000000,311481.0,1000000,31.1481% 0123 self,60,100,1000000,492823.0,1000000,49.2823% 0124 self,70,100,1000000,645143.0,1000000,64.5143% 0125 self,80,100,1000000,777606.0,1000000,77.7606% 0126 self,90,100,1000000,895677.0,1000000,89.56769999999999% 0127 self,100,100,1000000,1000000.0,1000000,100.0%
So the next questions are: