InsiderPro November 15, 2020 – Looking at Both Sides of the Data Lake Argument

Some see it as the new data warehouse for the Big Data era, while others see a mess that can easily turn into a swamp.

For decades, the data warehouse was the go-to technology for storing large amounts of data for querying and data mining. It should not be confused with the venerable database, which has a different mode of operation and use.

The data lake arrived with the advent of Big Data. The concept was coined in 2010 by James Dixon, founder of Pentaho (now a part of Hitachi Vantara), in a blog post announcing his company’s first Hadoop-based release. He argued that data marts, aka data warehouses, had several problems, such as size restrictions that narrow research parameters.

“If you think of a Data Mart as a store of bottled water, cleansed and packaged and structured for easy consumption, the Data Lake is a large body of water in a more natural state. The contents of the Data Lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples,” he wrote.

Data lakes are often compared to data warehouses, but the two share only one trait: both are for storing and later analyzing massive amounts of data.

“Is the data lake the new data warehouse? Yes and no,” says Steve Tcherchian, CISO for XYPRO Technology, a cybersecurity vendor for mission-critical applications. “They can be used as a data warehouse, but if they are not used correctly, they are data graveyards.”

To read the full interview, visit idginsiderpro.com.