The problem you're experiencing comes from the fact that numpy.append creates a complete copy of the array every time it is called: a new block of memory is allocated and every existing element is copied over. If it is called often, or on large arrays, this gets expensive and should definitely be avoided.
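A quick illustration of that copying behaviour (a minimal, made-up example):

import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)  # allocates a brand-new array and copies all elements
print(a)             # [1 2 3]  -- the original array is unchanged
print(b)             # [1 2 3 4]
# Appending in a loop therefore copies the whole array on every iteration,
# so the total cost grows quadratically with the number of elements.

There are two ways to solve this issue: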
1. If you don't know in advance how big the arrays are going to be, you can collect the values in a Python list during processing and convert it to a NumPy array afterwards. Python lists grow in place with amortized constant-time appends, so they are much better suited to repeated appending (see the first sketch below). The downside is higher memory usage, because you can't use NumPy's more compact data types during the collection phase.
2. If you do know how big the arrays are going to be, you can preallocate the whole array with numpy.zeros, creating a sort of empty array that you fill in by index as you progress (see the second sketch below). This may require changes to your processing algorithm and could therefore be slower, but it is much more memory efficient.
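A minimal sketch of option 1, assuming a simple processing loop (the names and the computation are just placeholders):

import numpy as np

values = []                  # plain Python list: appends are amortized O(1)
for i in range(1000):        # stand-in for your actual processing loop
    values.append(i * 0.5)
result = np.array(values, dtype=np.float64)  # convert once at the end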
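And a sketch of option 2, preallocating with numpy.zeros when the final size is known (again with placeholder values):

import numpy as np

n = 1000                                 # known final size
result = np.zeros(n, dtype=np.float64)   # allocate the whole array once
for i in range(n):                       # fill by index instead of appending
    result[i] = i * 0.5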
Python has a simple library for this: lxml. With it, a sample script for looking up information on a website could look like this (url, fn, and the XPath expression are placeholders you'd adapt):
from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib import urlopen
from lxml import etree

url = "https://example.com"  # the page to scrape (placeholder)
fn = "page.html"             # local file used to cache the downloaded HTML

# Download the page and save it to disk
sock = urlopen(url)
html = sock.read()           # returns bytes
sock.close()

f = open(fn, "wb")           # binary mode, since read() returned bytes
f.write(html)
f.close()

# Parse the cached file with lxml's fault-tolerant HTML parser
htmlParser = etree.HTMLParser()
tree = etree.parse(fn, htmlParser)

# Extract the information with an XPath expression of your choice,
# e.g. the page title:
info = tree.xpath("//title/text()")