The problem you're experiencing comes from the fact that np.append, as implemented in NumPy, creates a new copy of the array every time it is called, so the whole array has to be allocated in memory again. This should definitely be avoided, especially when it is called often and on large arrays. There are two ways to solve this:

1. If you don't know how big the arrays are going to be, you can use Python lists during processing and convert them to NumPy arrays afterwards. Python lists grow in place and are therefore much faster for appending. The downside is higher memory usage during processing, as you can't use NumPy's more compact data types.

2. If you know how big the arrays are going to be, you can preallocate the whole array with NumPy's zeros function, creating a sort of empty array that you then fill with values as you progress. This may require changing your processing algorithm and could therefore be slower, but it can be far more memory efficient.
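As a rough sketch of both options, assuming small non-negative integer values that fit into uint32: the helper names build_with_list and build_preallocated are made up for illustration and are not part of NumPy. The first lines also show that np.append leaves the original array untouched and returns a separate copy.

```python
import numpy as np

# np.append never grows an array in place: it returns a new, longer copy.
a = np.array([1, 2, 3], dtype=np.uint32)
b = np.append(a, np.uint32(4))  # a still has 3 elements, b has 4


def build_with_list(values, dtype=np.uint32):
    """Option 1 (hypothetical helper): collect in a list, convert once."""
    buffer = []
    for v in values:
        buffer.append(v)  # the list grows in place, no full copy per call
    # A single conversion at the end yields the compact NumPy dtype.
    return np.array(buffer, dtype=dtype)


def build_preallocated(values, size, dtype=np.uint32):
    """Option 2 (hypothetical helper): allocate once, then fill by index."""
    out = np.zeros(size, dtype=dtype)  # one allocation of the final size
    for i, v in enumerate(values):
        out[i] = v  # writes into the existing buffer
    return out


from_list = build_with_list([5, 6, 7])
from_prealloc = build_preallocated([5, 6, 7], size=3)
```

For an inverted index, the same idea would apply per dictionary entry: keep a list per term while processing, and convert each list to an array once the text has been read.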


NumPy array append slow

For a university project I need to implement an inverted index in Python. It consists of a dictionary containing NumPy arrays, to which I append new values as I process more text. We've been told to use NumPy arrays instead of regular Python lists to decrease memory usage, since NumPy allows smaller, explicitly sized data types. However, appending to an array and then swapping the new array into the dictionary proves to be terribly slow. Is there a better way to implement this?