Coffee Space


Offline Wikipedia

I recently saw this article titled: “How To Download All of Wikipedia onto a USB Flash Drive in 2022”. As others have commented, it is really only surface-level and could not be considered an explanation at all.

But it did get me thinking: I should document my own solution for hosting Wikipedia.

Firstly, I used to like Kiwix, but the new version uses a software stack I really don’t like. They now use JavaScript! Eurgh. The old version used to be C++! Besides this, I really don’t like the install and compile process.

Enter ZIMply

I found this whilst searching around, but found the master branch to be mostly broken. The version2 branch on the other hand is pretty cool and works well.

I wrote a wrapper script to do everything I wanted, and then a repository around the idea with the following structure:

./
  ZIMply/
  archive/
  run.py

The ZIM files are stored in archive/ and the ZIMply repository is git clone’d into ZIMply/. run.py looks like the following:

import sys

# Make the cloned ZIMply module importable without installing it
sys.path.append("ZIMply/zimply")

from zimply import ZIMServer

args = sys.argv[1:]

if len(args) < 1:
  print("Required ZIM file")
  sys.exit(1)

zim = args[0]
zim_idx = zim + ".idx"  # index file path, next to the ZIM file
zim_temp = "ZIMply/zimply/template.html"
zim_serv_ip = ""
zim_serv_port = 4444
zim_serv_enc = "utf-8"

# TODO: In the future it would be good to serve multiple ZIM files from the
#       same IP address.

ZIMServer(zim, index_file=zim_idx, template=zim_temp,
          ip_address=zim_serv_ip, port=zim_serv_port, encoding=zim_serv_enc)
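One possible way to approach the TODO, without touching ZIMply itself, is to give every archive its own port and start one server per file. The following is only a sketch of the port-planning half of that idea. The helper name `plan_ports` and the choice of consecutive ports starting at 4444 are assumptions for illustration, not part of ZIMply or of the script above.

```python
from pathlib import Path


def plan_ports(archive_dir, base_port=4444):
    """Map each .zim file in archive_dir to its own port.

    Files are sorted by name so the mapping stays stable between runs
    (hypothetical helper, not part of ZIMply).
    """
    zims = sorted(Path(archive_dir).glob("*.zim"))
    return {str(zim): base_port + offset for offset, zim in enumerate(zims)}


if __name__ == "__main__":
    # Print the planned layout for the archive/ folder used in this post.
    for zim, port in plan_ports("archive").items():
        print(port, zim)
```

To actually use the plan, run.py would need to accept the port as an optional second argument (it is hard-coded to 4444 above), and each archive would then be started as its own process.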

I then download all of the ZIM archives I want, in my case (from the Kiwix site):

$ ls -lah *.zim
301M developer.mozilla.org_en_all_2022-05.zim
 20M iot.stackexchange.com_en_all_2022-05.zim
685M mathoverflow.net_en_all_2022-07.zim
1.3G programmers.stackexchange.com_en_all_2017-10.zim
 62M robotics.stackexchange.com_en_all_2022-05.zim
1.6G serverfault.com_en_all_2022-05.zim
482M softwareengineering.stackexchange.com_en_all_2022-05.zim
1.2G unix.stackexchange.com_en_all_2022-05.zim
2.4G wikipedia_uk_all_mini_2022-10.zim

Then to start an archive I simply run:

python3 run.py archive/softwareengineering.stackexchange.com_en_all_2022-05.zim

Which produces:

[Screenshot: ZIMply homepage]
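If there is no browser to hand (on a headless box, say), a quick way to confirm that something is answering on the configured port is a plain TCP connection attempt. This is a minimal sketch using only the standard library. The function name `is_listening` is made up for illustration, and it assumes the server is reachable on 127.0.0.1 at port 4444 as set in run.py.

```python
import socket


def is_listening(host="127.0.0.1", port=4444, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("server up" if is_listening() else "nothing on port 4444")
```

This only shows that a socket is open, not that the ZIM file loaded correctly, so it is a first check rather than a full health check.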

Happy days.