I recently saw this article titled “How To Download All of Wikipedia onto a USB Flash Drive in 2022”. As others have commented, it is really only surface deep and could not be considered an explanation at all.
But it did get me thinking: I should document my solution for hosting Wikipedia.
Firstly, I used to like Kiwix, but the new version uses a software stack I really don’t like. They now use JavaScript! Eurgh. The old version used to be C++! Besides this, I really don’t like the install and compile process either.
I found ZIMply whilst searching around, but found the master branch to be mostly broken. The version2 branch, on the other hand, is pretty cool and works well.
I wrote a wrapper script to do everything I wanted, and then a repository around the idea with the following structure:
    ./
    ZIMply/
    archive/
    run.py
The ZIM files are stored in archive/ and the ZIMply repository is git clone’d into the ZIMply/ folder. run.py looks like the following:
    import sys

    # Make the cloned ZIMply module importable without installing it.
    sys.path.append("ZIMply/zimply")

    from zimply import ZIMServer

    args = sys.argv[1:]

    if len(args) < 1:
        print("Required ZIM file")
        sys.exit(1)

    zim = args[0]
    zim_idx = zim + ".idx"  # index file is kept next to the archive
    zim_temp = "ZIMply/zimply/template.html"
    zim_serv_ip = ""
    zim_serv_port = 4444
    zim_serv_enc = "utf-8"

    # TODO: In the future it would be good to serve multiple ZIM files from the
    #       same IP address.

    ZIMServer(
        zim,
        index_file=zim_idx,
        template=zim_temp,
        ip_address=zim_serv_ip,
        port=zim_serv_port,
        encoding=zim_serv_enc,
    )
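On the TODO above: since run.py hardcodes port 4444, one possible approach is to run one server process per archive, each on its own port of the same IP address. The following is only a rough sketch of the port planning part. The name plan_ports is hypothetical, and run.py as shown would need an extra argument for the port before this could actually drive it.

```python
from pathlib import Path


def plan_ports(archive_dir, base_port=4444):
    """Map each .zim file in archive_dir to its own port, in sorted order.

    Hypothetical helper: it only plans the assignment, it starts nothing.
    """
    zims = sorted(Path(archive_dir).glob("*.zim"))
    return {str(z): base_port + i for i, z in enumerate(zims)}


if __name__ == "__main__":
    # Print the planned port next to each archive in the archive/ folder.
    for zim, port in plan_ports("archive").items():
        print(port, zim)
```

Sorting the files keeps the assignment stable between runs, as long as the set of archives does not change.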
I then download all of the ZIM archives I want. In my case (from the Kiwix site):
    $ ls -lah *.zim
    301M developer.mozilla.org_en_all_2022-05.zim
    20M  iot.stackexchange.com_en_all_2022-05.zim
    685M mathoverflow.net_en_all_2022-07.zim
    1.3G programmers.stackexchange.com_en_all_2017-10.zim
    62M  robotics.stackexchange.com_en_all_2022-05.zim
    1.6G serverfault.com_en_all_2022-05.zim
    482M softwareengineering.stackexchange.com_en_all_2022-05.zim
    1.2G unix.stackexchange.com_en_all_2022-05.zim
    2.4G wikipedia_uk_all_mini_2022-10.zim
Then to start an archive I simply run:
    python3 run.py archive/softwareengineering.stackexchange.com_en_all_2022-05.zim
Which produces:
Happy days.
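The archive file names are long, so a small convenience on top of the command above could be to look an archive up by the start of its name. This is just a sketch, not part of the original run.py. The function find_archive is a hypothetical helper, and it assumes the archive/ layout described earlier.

```python
from pathlib import Path


def find_archive(name, archive_dir="archive"):
    """Return the single .zim file in archive_dir whose name starts with name.

    Hypothetical helper: raises ValueError if there is no match or more
    than one, rather than guessing which archive was meant.
    """
    matches = sorted(Path(archive_dir).glob(name + "*.zim"))
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one match for {name!r}, found {len(matches)}"
        )
    return str(matches[0])
```

With something like this, find_archive("softwareengineering") would give the full path to pass on to ZIMServer, and an ambiguous prefix fails loudly.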