What are Common use cases for Hadoop

What are the common use cases for Hadoop?

Hadoop has many operational as well as strategic use cases across multiple industries.
The 3 main categories are:

  1. Transformation: Hadoop helps you transform large amounts of data more quickly, reliably, and affordably (e.g., for loading into the data warehouse).
  2. Active archiving: HDFS gives you access to data that would otherwise be taken offline (e.g., to tape) due to the high cost of actively managing it.
  3. Exploration/Analytics: Hadoop lets you analyze and get value from data that otherwise could not be easily modelled in rigid relational systems

Dodd-Frank Compliance at a bank

A leading retail bank is using Cloudera and Datameer to validate data accuracy and quality to comply with regulations like Dodd-Frank

Problem: The previous solution using Teradata and IBM Netezza was time consuming and complex, and the data mart approach didn’t provide the data completeness required for determining overall data quality.

Solution: A Cloudera + Datameer platform allows analyzing trillions of records which currently result in approximately one terabyte per month of reports. The results are reported through a data quality dashboard.

Trucking data @ US Xpress

US Xpress - one of the largest trucking companies in US - is using Hadoop to store sensor data from their trucks. The intelligence they mine out of this, saves them $6 million / year in fuel cost alone.

Problem: Collecting and and storing 100s of data points from thousands of trucks, plus lots of geo data.

Solution: Hadoop allows storing enormous amount of sensor data. Also Hadoop allows querying / joining this data with other data sets.

NetApp

NetApp collects diagnostic data from its storage systems deployed at customer sites. This data is used to analyze the health of NetApp systems.

Problem: NetApp collects over 600,000 data transactions weekly, consisting of unstructured logs and system diagnostic information. Traditional data storage systems proved inadequate to capture and process this data.

Solution: A Cloudera Hadoop system captures the data and allows parallel processing of data

Storing and processing Medical Records

Problem: A health IT company instituted a policy of saving seven years of historical claims and remit data, but its in-house database systems had trouble meeting the data retention requirement while processing millions of claims every day

Solution: A Hadoop system allows archiving seven years’ claims and remit data, which requires complex processing to get into a normalized format, logging terabytes of data generated from transactional systems daily, and storing them in CDH for analytical purposes