Can GPU-Based Data Handling Take Over From CPUs in Big Databases?
Hadoop and Spark are two widely used frameworks for storing and processing big data on clusters. Each has its own advantages, but sometimes you need something more specialized and optimized for even bigger workloads. That is where Kinetica, previously known as GPUdb, comes in with its distinctive and supposedly revolutionary approach of GPU-based data handling.
Kinetica claims that its database harnesses massively distributed GPUs to deliver up to 1,000 times better real-time analytics performance. That sounds impressive on paper, but whether it holds up in practice is another question, especially across the wider range of big data applications and uses.
Kinetica isn’t exactly the new kid on the block. It has already proven itself by building the terrorist-tracking database used by the US government, and more recently through a partnership with the US Postal Service to improve operations and reduce mail fraud.
The service pulls in data from more than 213,000 scanning devices across the US to support the delivery of more than 100 billion pieces of mail a year. That is 200 times more data than the traditional relational database the Postal Service previously had in place was handling.
While seemingly diverse, such workloads strike the sweet spot of GPUs, as Todd Mostak, founder and CEO of MapD, wrote: “GPUs excel at tasks requiring large amounts of arithmetically intense calculations, such as visual simulations, hyper-fast database transactions, computer vision and machine learning tasks.”
GPUs have a clear advantage for certain workloads, including deep learning, and that is where the real trick lies: figuring out where GPU-oriented databases fit in the system. According to Jared Rosoff, director of engineering at VMware, a single GPU contains thousands of cores optimized for matrix math operations, which are exactly the kind of operations deep learning depends on.
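To make the "thousands of cores" idea concrete, here is a minimal sketch in plain Python of the data-parallel pattern that such hardware rewards: split a column into chunks, let each chunk be filtered and summed independently, then combine the partial results. It runs sequentially on a CPU and only mimics the structure of the work; the function names and the sample column are hypothetical and are not taken from Kinetica or any other product.

```python
def chunked(values, n_chunks):
    """Split a list into contiguous chunks of roughly equal size."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_where(chunk, threshold):
    """Work for one hypothetical core: filter and sum its own chunk.

    No chunk needs data from any other chunk, which is what makes
    this kind of query easy to spread across many cores.
    """
    return sum(v for v in chunk if v > threshold)


def sum_where(values, threshold, n_chunks=4):
    """Sum all values above threshold by combining per-chunk partial sums."""
    partials = [partial_sum_where(c, threshold) for c in chunked(values, n_chunks)]
    return sum(partials)  # final combine step


if __name__ == "__main__":
    column = list(range(10))  # made-up sample data
    print(sum_where(column, threshold=4, n_chunks=3))
```

On a GPU, each call to `partial_sum_where` would correspond to work handed to a separate group of cores; the sketch only shows why a filter-and-aggregate query splits up so cleanly.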
However, for processes outside deep learning and visualization, once the cost of the database is factored in, the tried-and-tested CPU-oriented database is usually the better choice. Using CPU power for such tasks tends to be cheaper, thanks to companies like Intel, which are efficient at delivering great CPU power while keeping costs low.
There is also the matter of industry support. CPU-oriented databases have been in use for decades, and few organizations are ready to switch to GPU-based ones just yet, since most software cannot take full advantage of the parallelism that GPUs offer.
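Amdahl's law gives a rough feel for why software that is only partly parallel gains little from thousands of cores. The sketch below is illustrative only: the parallel fractions and the core count are made-up inputs, not measurements of any real database.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction: share of the runtime that can run in parallel (0 to 1).
    n_cores: number of cores available for the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # Hypothetical workloads, each run on 1,000 cores.
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, round(amdahl_speedup(fraction, 1000), 2))
```

With these assumed inputs, a workload that is half serial cannot even double its speed no matter how many cores are added, while one that is 99% parallel still tops out at roughly 90 times faster on 1,000 cores.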