Waterline Data is a Hortonworks Technology Partner and recently earned HDP Certification and YARN Ready status for its solution, which automates the inventory of data assets in the data lake, enables data governance, and gives data engineers and data scientists self-service access to find and understand their data. To learn more, join the upcoming webinar on May 6, or download the Sandbox tutorial or the joint whitepaper. Our guest blogger is Oliver Claude, CMO at Waterline Data.
Apache Hadoop promises to unlock new business value for enterprises. Hadoop provides a powerful platform for data science and analytics, where data engineers and data scientists can draw on data from many external and internal sources to uncover new insights. Data stored in Hadoop is available through a centralized architecture that allows access from any application and by any user. This type of deployment is often called a data lake.
This power also presents new challenges, particularly as data lakes grow. On the one hand, the business wants more and more self-service; on the other, IT is trying to keep up with the demand for data while maintaining architecture and data governance standards. In other words, self-service needs to be combined with automation and governance.
Automation and Governance
A useful analogy for such a solution is Amazon.com.
Amazon.com is supported by a complete and automated inventory and catalog of all the products. Amazon.com also makes it very easy for users to find, understand, and get the products they want. Lastly, there is end-to-end governance to ensure accurate product information and secure transactions.
Amazon.com inspired us at Waterline Data to build a product that works like Amazon.com for data in Hadoop and the Hortonworks Data Platform (HDP).
Waterline Data provides a unique combination of automation and machine learning to:
- Automatically inventory every file and field in the entire data lake
- Let data engineers and data scientists find and understand the best-suited and most trusted data without having to explore each file manually
- Provision the data securely
- Enable data governance throughout, including the discovery of data lineage, compliance metadata, and business metadata
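To make the idea of an automated file-and-field inventory concrete, here is a minimal Python sketch. It is not Waterline Data's implementation and uses none of its APIs: it assumes CSV files on a local directory rather than HDFS, and the function names `infer_type` and `inventory` are hypothetical, chosen only for illustration. It walks a directory tree and records the most common inferred type for each field in each file.

```python
import csv
import os
from collections import Counter


def infer_type(value):
    """Classify one raw string value as 'empty', 'int', 'float', or 'string'."""
    if value == "":
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return name
        except ValueError:
            pass
    return "string"


def inventory(root):
    """Walk root and return {file path: {field name: most common inferred type}}.

    Illustrative assumption: only local *.csv files with a header row are profiled.
    """
    catalog = {}
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(".csv"):
                continue
            path = os.path.join(folder, name)
            with open(path, newline="") as handle:
                reader = csv.DictReader(handle)
                # One counter of inferred types per field named in the header.
                counts = {field: Counter() for field in reader.fieldnames or []}
                for row in reader:
                    for field in counts:
                        # Short rows yield None for missing fields; treat as empty.
                        counts[field][infer_type(row[field] or "")] += 1
            catalog[path] = {
                field: (c.most_common(1)[0][0] if c else "unknown")
                for field, c in counts.items()
            }
    return catalog
```

Calling `inventory("/path/to/files")` returns a dictionary that a search or governance layer could index. A real data lake inventory would also need to handle other file formats, sampling of very large files, and richer metadata such as lineage, none of which this sketch attempts.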
Waterline Data is HDP Certified
Waterline Data has also invested in optimizing the product for the Hortonworks Data Platform and is an HDP Certified Technology Partner. As a result, Waterline Data running on HDP helps turn the data lake into a business-ready data lake and prevents a data swamp from forming.
Hortonworks Sandbox
You can get hands-on with Waterline Data Inventory on a Hortonworks cluster by downloading the Waterline on Hortonworks Sandbox and its tutorial, which show how to find, understand, and govern data in Hadoop.
Download the Sandbox Tutorial with Waterline Data: Manage Your Data Lake More Efficiently.
Join the Webinar
Waterline Data and Hortonworks will host a webinar on May 6 at 10 am PT: “Implementing a Data Lake with Enterprise Grade Data Governance.” Register here.
Learn More
- Download the joint whitepaper: Waterline and Hortonworks on the Modern Data Architecture
- Visit our websites: Hortonworks, the Waterline partner website, or the Waterline website
The post Find, Understand, and Govern Data in Hadoop appeared first on Hortonworks.