The problem you're experiencing comes from the fact that array.append, as implemented in NumPy, actually creates a new copy of the array every time called, which must allocated in the memory completely new, especially if called often and on large arrays this should definitely be avoided. There are two solutions to solve this issue:
1. If you don't know how big the arrays are going to be, you can use python lists during the processing and then convert them to NumPy arrays afterwards. Python lists alter in memory and are therefore much more performant for using append. The downside of this is that you have larger memory usage, as you can't use NumPys more efficient data types.
2. If you know how big the arrays are going to be you can preallocate the whole array, by using the zeros method of NumPy and create sort of an empty array which you then add the values to as you progress. This may requires changing of your processing algorithm and therefore could be slower, but it can be way more memory efficient.
Thank you for this explanation! I had the same problem in a course this autumn & your explanation help alot
Thanks for bringing this up, as you explained the list filling process stays within the list itself, and no new lists are generated, that is why lists can be faster. However, as you said depending on the data size sometimes NumPy arrays can act faster too.
Comments