Very large matrices using Python and NumPy
NumPy is a very useful library, and from using it I have found that it handles fairly large matrices (10000 x 10000) easily but struggles with anything much larger (for example, trying to create a 50000 x 50000 matrix). Obviously, this is because of the memory requirements.
Is there some way to create huge matrices in NumPy (say, 1 million by 1 million) without needing several terabytes of RAM?
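To put rough numbers on the memory problem, here is a small back-of-the-envelope sketch. It assumes a dense matrix of 8-byte elements (float64, NumPy's default dtype); the helper name `dense_bytes` is made up for this illustration.

```python
def dense_bytes(n, itemsize=8):
    """Bytes needed for a dense n x n matrix with `itemsize` bytes per element."""
    return n * n * itemsize

# Sizes mentioned in the question, reported in decimal gigabytes (1 GB = 10**9 bytes).
sizes_gb = {n: dense_bytes(n) / 1e9 for n in (10_000, 50_000, 1_000_000)}

for n, gb in sizes_gb.items():
    print(f"{n} x {n}: {gb:,.1f} GB")
```

So under that assumption the 10000 x 10000 case needs under 1 GB, the 50000 x 50000 case about 20 GB, and the 1 million by 1 million case about 8 TB, which is why the data has to live somewhere other than RAM.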
PyTables and NumPy are the way to go.
PyTables will store the data on disk in HDF format, with optional compression. My datasets often get 10x compression, which is handy when dealing with tens of millions of rows. It is also very fast; my 5-year-old laptop can crunch through data doing SQL-like GROUP BY aggregation at 1,000,000 rows/second. Not bad for a Python-based solution!
Accessing the data as a NumPy recarray again is as simple as:
data = table[row_from:row_to]
The HDF library takes care of reading in the relevant chunks of data and converting them to NumPy.
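For context, here is a minimal end-to-end sketch of how such a table might be written and then sliced. It is only an illustration under assumptions: the file name, the table name `samples`, the two-column record layout, and the row counts are all made up, and the compression settings are just one reasonable choice.

```python
import os
import tempfile

import numpy as np
import tables  # PyTables

# Hypothetical record layout, chosen only for this example.
row_dtype = np.dtype([("id", np.int64), ("value", np.float64)])

path = os.path.join(tempfile.mkdtemp(), "demo.h5")
filters = tables.Filters(complevel=5, complib="zlib")  # optional compression

with tables.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "samples", description=row_dtype, filters=filters)
    # Append in batches so the full dataset never has to sit in RAM at once.
    batch = 50_000
    for start in range(0, 200_000, batch):
        chunk = np.empty(batch, dtype=row_dtype)
        chunk["id"] = np.arange(start, start + batch)
        chunk["value"] = chunk["id"] * 0.5
        table.append(chunk)
    table.flush()

with tables.open_file(path, mode="r") as h5:
    table = h5.root.samples
    total_rows = table.nrows
    row_from, row_to = 120_000, 120_010
    # Only the chunks covering this slice are read and converted to NumPy.
    data = table[row_from:row_to]

print(total_rows, data["id"][0], data["value"][-1])
```

The slice comes back as a NumPy structured array, so fields can be pulled out by name (`data["value"]`) and passed to ordinary NumPy functions.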