Coffee Space 
Recently I was looking for a Python function that could take a string of
HTML and remove the <script> tags, along with everything inside them. For
example, you may have the following:
html = "Hello world! <script>alert('Hi!');</script> Test."
We want to strip everything between the tags <script> and </script>,
including the tags themselves. I wrote a function called strip_between()
that does this:
# strip_between()
#
# Strips out a string inclusively between two strings until no matches are
# found.
#
# NOTE: There is an assumption made that the strings are found and only found
# in matching pairs. If b appears before a, it doesn't make sense to remove
# between b and a, as order matters, so b is only searched for after a.
#
# @param s The string to be searched.
# @param a The first string to search for.
# @param b The second string to search for.
# @return The string with every match removed.
def strip_between(s, a, b):
    # Empty delimiters make no sense here (two empty ones would loop forever)
    if not a or not b:
        return s
    z = 0
    while True:
        # Find the next opening string, starting where the last removal was
        i = s.find(a, z)
        if i < 0:
            break
        # Only look for the closing string after the opening string
        j = s.find(b, i + len(a))
        if j < 0:
            break
        # Remove everything from the start of a to the end of b
        s = s[:i] + s[(j + len(b)):]
        # The text that followed the removed section now starts at i
        z = i
    return s
It takes the input string s, the opening string a and the closing string
b, and removes everything from a up to and including the next b. It keeps
doing this until no more matches are found.
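The same "remove everything from a to the first b after it" idea can also be written with Python's standard re module, which replaces the manual loop with a single non-greedy substitution. This is a sketch of an alternative, not the function above: strip_between_re() is a hypothetical name, and while it gives the same result on well-formed input like our example, it has not been checked against every edge case of the loop version.

```python
import re


def strip_between_re(s, a, b):
    # re.escape() makes a and b match literally, even if they contain
    # regex metacharacters. The non-greedy ".*?" stops at the first b
    # after each a, and re.DOTALL lets "." match newlines so multi-line
    # script bodies are removed too.
    pattern = re.escape(a) + ".*?" + re.escape(b)
    return re.sub(pattern, "", s, flags=re.DOTALL)


html = "Hello world! <script>alert('Hi!');</script> Test."
print(strip_between_re(html, "<script", "</script>"))  # prints "Hello world!  Test."
```

re.sub() already replaces every non-overlapping match in one pass, so there is no need to loop until the string stops shrinking.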
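We can now run this on our test input. Note that the opening string is
"<script" without the closing ">", so a tag with attributes such as
<script type="text/javascript"> is matched too: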
result = strip_between(html, "<script", "</script>")
print(result)
Hello world!  Test.
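Since the original goal was to parse HTML, it is worth noting that plain string matching is case-sensitive, so an upper-case <SCRIPT> tag slips through. One option is to use the standard library's html.parser.HTMLParser and re-emit everything except script elements. This is a minimal sketch under that assumption: ScriptStripper and strip_scripts() are hypothetical names, and it only re-emits the constructs it handles (tags, text, entity references, comments and declarations), so things like processing instructions are dropped.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emits the HTML it is fed, minus <script> elements and their contents."""

    def __init__(self):
        # Report entity references separately instead of decoding them into text
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <SCRIPT> is caught too
        if tag == "script":
            self.in_script = True
        elif not self.in_script:
            # Keep the start tag exactly as it appeared in the input
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; a self-closing <script/> is dropped
        if tag != "script" and not self.in_script:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif not self.in_script:
            # End tags are rebuilt from the (lower-cased) tag name
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside a <script> element is skipped
        if not self.in_script:
            self.parts.append(data)

    def handle_entityref(self, name):
        if not self.in_script:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if not self.in_script:
            self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.in_script:
            self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        # e.g. <!DOCTYPE html>
        self.parts.append(f"<!{decl}>")


def strip_scripts(html):
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered in the parser
    return "".join(parser.parts)


print(strip_scripts('<p>A &amp; B</p><SCRIPT type="text/javascript">x()</SCRIPT>'))
# prints "<p>A &amp; B</p>"
```

The trade-off is that end tags come back in lower case, but the parser, rather than string matching, decides where each script element starts and ends.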
Hopefully you find this useful!