Recently, I had the opportunity to speak in detail with Julie Smith and Audrey Hammonds of Innovative Architects about Azure Data Lake. Julie and Audrey are both Microsoft Data Platform MVPs and contributors to the deeply informative Datachix Blog.
Azure Data Lake is at the forefront of Big Data innovation in the cloud. Imagine a future where decision makers can ask high-level questions and computers can answer in complex and intuitive ways. The access and processing capabilities of Big Data will completely change the way we do business.
We now have the tools needed to answer questions we haven’t even asked.
How it all began
Database architects need to be very selective in how they invest their time learning new technologies. They have to decide which new solutions offer customers a different value from the established platforms, which technologies will see good adoption rates, and which will remain viable for the long term. Azure Data Lake was designed with these priorities in mind.
Azure Data Lake is a very young service for Microsoft, having only entered public preview in October of 2015. “It looks a lot more sophisticated than what you would expect to see with a brand new product,” says Julie. “To me, it’s pretty clear, this actually has had a lot of love.” Upon investigation, this unexpected level of product maturity makes sense. Azure Data Lake grew from Microsoft’s experience building and using their internal Big Data solution called Cosmos, which boasted clusters as large as 50,000 servers processing hundreds of petabytes of data daily.
Here’s what Julie and Audrey said really makes Azure Data Lake stand out:
- It’s more robust and sophisticated than competing products, particularly for something so new
- It features nicer tooling, including user interfaces to load data, run queries, and perform exploratory analysis
- The new query language, U-SQL, has a familiar feel to anyone who knows C# or SQL
- It’s packaged to feel accessible to traditional application and database developers
“I think what Azure Data Lake solves is the barrier to entry for Big Data.”
– Julie Smith, Consultant at Innovative Architects
Proponents of Big Data have been trying to get their companies to implement large-scale analytics in some form, but many companies just don’t know how to get started. Julie explained it this way: “At the core, Azure Data Lake is really just HDInsight, which is Hadoop, but what it has now is a familiar-feeling place to start. Just like SSMS for relational data, with Azure Data Lake you don’t have to think about infrastructure and architecture like you would if you were implementing Hadoop for yourself.”
She went on to say, “Hadoop can be intimidating because new users are faced with a blank console window where you have to know how to submit query jobs in a Java-like language. Azure Data Lake has an amazing way where you can create queries with IntelliSense, and you can just start using it without having to deal with all that overwhelming brand-new technology.”
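To give a rough sense of that familiar feel, here is a minimal U-SQL sketch in the style of Microsoft's introductory examples. The file paths and column names are illustrative, not taken from any real dataset: the set-based shape will look familiar to SQL developers, while the types and expressions (like `int`, `string`, and `==`) come straight from C#.

```
// Hypothetical example: read a delimited file from the lake,
// aggregate it, and write the result back out.
// Paths and column names are made up for illustration.
@searchlog =
    EXTRACT UserId int,
            Region string,
            Query  string
    FROM "/input/searchlog.csv"
    USING Extractors.Csv();

@results =
    SELECT Region,
           COUNT(*) AS QueryCount
    FROM @searchlog
    WHERE Region == "en-us"   // C#-style equality expression
    GROUP BY Region;

OUTPUT @results
    TO "/output/regioncounts.csv"
    USING Outputters.Csv();
```

The point Julie makes holds even in this tiny sketch: a developer who knows SQL and C# can read the whole script on day one, without first learning a Java-like MapReduce API or standing up a cluster.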
As for the infrastructure benefits that Azure Data Lake has compared to any on-premises solution, Audrey says “We’re thinking not only about how to write the code or move the data or store the data, but we’re thinking about how to fit into a business process. We’re thinking about our end users and about their budgets. When you’re in a situation where you have a whole new unfamiliar technology landscape you fall into a very tactical way of thinking: servers and infrastructure and memory and processors and disk space.”
“I think what Azure Data Lake will bring to the Big Data community is the ability to bypass the tactical concerns and focus on solving business problems immediately.”
– Audrey Hammonds, Consultant at Innovative Architects
I asked Julie and Audrey if Azure Data Lake will completely replace the need for creating any new data warehouses. Audrey replied, “A data warehouse is for answering the questions you know you have. A data lake is for answering the questions you don’t know you have, yet. A data lake is not for everyone because an organization may not be able to make the ongoing investment in human resources to be able to effectively make use of it.”
“We see business users out there who are intimidated by self-service BI and a data lake is like self-service BI on steroids.”
– Audrey Hammonds, Consultant at Innovative Architects
The most technical portions of owning a data warehouse come at the beginning, so a consulting firm can create a data warehouse project with a defined beginning, middle, and end and leave their customer with a product of well-defined ongoing value. In the short term, Azure Data Lake may be a slightly harder value proposition for Microsoft Partners as long as “answering the questions you don’t know you have yet” remains in the specialized domain of data scientists that customers may not have on staff. It’s true that without a data scientist, a data lake becomes a data vault.
Big Data: the final frontier
We all agreed, this is where the next pieces of the Azure data story will come in. Imagine putting Azure Data Lake together with evolving technologies in:
- Power BI, for intuitive self-service reporting
- Cortana Analytics Suite, with machine learning and natural-language query
- Project GigJam, for presenting AI-created LOB applications from apparently unrelated datasets
It’s not hard to picture a future that looks a lot like what was promised on Star Trek, where decision makers ask high-level questions of computers that give answers seeming to require leaps of intuition, but which really come from the ability to process and connect massive amounts of information. When that happens, everyone will have the tools needed to answer the questions they don’t know they have yet.
Take the next step
Have you thought about how you can help your customers tackle their Big Data problems with Azure Data Lake? We’d love to hear your thoughts; please comment below.