Big data refers to tasks that involve so much data that processing it requires special programming techniques. For instance, the term has traditionally been used for situations in which the data exceeds the amount of RAM available on a single machine.
Informally the term is used in many contexts, from data too big to fit in a spreadsheet, to data too big to fit in the RAM of a standard operating environment (SOE) laptop. Even if a project has enough data that you are forced to make technical changes to accommodate it, you still may not have a Big Data problem; there are multiple approaches one can try first. It is only when a task is completely infeasible to run on one server that you have to investigate heavy-duty Big Data techniques.
Due to the rapid decrease in the cost of memory (and processing power) over the years, many situations that were once considered Big Data problems can now fit on a single server. We have been involved in projects where we provisioned servers with a terabyte of RAM in order to do in-memory processing of data. Sometimes a bigger server is cheaper than adopting a full Big Data pipeline, and sometimes it is not. We can help you with these sorts of decisions; feel free to contact us to discuss your needs in this space.
You want to deal with bulk data from your Python program. You realise that looping over every cell of a huge array in Python code would be slow and clumsy. You would also like the convenience of many kinds of canned routines to transform your data easily and efficiently. Enter NumPy!

Published on September 16th, 2018 by Nick Downing.
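As a quick illustration (a hypothetical snippet, not taken from the article), here is the same element-wise transform written both ways: a Python-level loop that touches every cell individually, and a single vectorised NumPy expression that does the work in optimised C:

```python
import numpy as np

# Hypothetical data: a million sensor readings to scale and offset.
readings = np.arange(1_000_000, dtype=np.float64)

# Pure-Python style (slow): one interpreted operation per element.
scaled_slow = [x * 0.5 + 1.0 for x in readings]

# NumPy style (fast): one vectorised expression over the whole array.
scaled_fast = readings * 0.5 + 1.0

# Both produce the same result; only the speed differs.
assert np.allclose(scaled_slow, scaled_fast)
print(scaled_fast.mean())  # → 250000.75
```

The names and numbers here are illustrative; the point is that NumPy replaces an explicit Python loop with a whole-array operation, which is typically orders of magnitude faster on large arrays.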