Lack of talent should never hold back advanced analytics in Hadoop or the cloud
In every sector, one thing is certain. Enterprises want to be data-driven but are constrained by the lack of human talent.
There are few areas where this shortage is more frustrating than in the use of Hadoop, the distributed framework that allows for processing of huge data sets across clusters of commodity computers using simple programming models.
This shortage of skills has become all the more acute as the use of Amazon Web Services (AWS) has grown and, with it, the development of multi-tenant environments, says Mike Whelan, director of Technology – International Big Data Centre of Excellence at Teradata.
The dearth of expertise becomes a barrier when enterprises want to integrate advanced analytics into their existing infrastructure. Massive volumes of data are being generated by sensors and mobile devices, and are increasingly stored in customised architectures that span Hadoop and the cloud.
Organisations want to use this data as quickly and efficiently as possible to solve business-critical problems such as customer churn, the provision of next-best offers, the mapping of multi-channel behaviour and the development of “customer 360” use cases. Yet the advanced and algorithmic analytics needed to find new insights and predict future outcomes require hard-to-find data science skills.
The skills shortage is relative, of course, but a lot has been said and written on this point in the last couple of years. Gartner, for example, found in its survey last year that the skills gap remained a major barrier to Hadoop adoption for 57% of respondents.
With these looming difficulties in mind, much attention has been given to the tools required to extract insights from the mass of data stored in Hadoop.
From the early days of MapReduce and Hive, through newer SQL-on-Hadoop tools such as Presto, to the rise of Apache Spark, good, iterative steps have been taken to make analytics on Hadoop easier.
Now companies like Teradata are taking this much further, giving businesses the flexibility to accelerate revenue-generating insights from their data, wherever it resides, including AWS.
In Hadoop, this is achieved by executing analytics natively inside the cluster, integrated fully with YARN and managed by the YARN resource manager. This is an important step in raising the use and value of Hadoop for enterprises that have invested so much in the infrastructure.
In AWS the new approach makes it possible to provision an analytic environment and experiment with advanced analytics without large capital expenditure, speeding up the time to value.
The result is that enterprises have even more choice, gaining a return from data that they might not be fully realising today.
Enabling work at scale
At its essence, it is all about connecting analysts with big data at scale. Putting it baldly, most open-source advanced analytics packages are not designed with business analysts in mind, making it very hard for them to access the data beyond simple business intelligence and reporting use-cases. Although tools have been adapted for use with Hadoop, they are not designed to run on Hadoop and as a result, often require data to be pulled out on to a dedicated platform, making working at scale very difficult.
The new simplified approach to advanced analytics, running natively in Hadoop, gives analysts the expanded SQL repertoire to undertake some seriously valuable tasks.
For example, they can conduct path analysis to understand the customer journey to conversion, or graph analytics that reveal influencer networks. Machine learning can be let loose on sensor data to predict part failures, and many other advanced analytic use cases can be run directly against Hadoop data.
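The idea behind path analysis can be sketched in a few lines of plain Python. This is only an illustration of the technique, not Teradata's implementation: the sample events, the `conversion_paths` helper and the "buy" goal event are all invented for this example, and in a real deployment the equivalent counting would run natively inside the cluster at far larger scale.

```python
from collections import Counter

# Hypothetical clickstream: (customer_id, event) pairs in time order.
# A small in-memory sample stands in for data that would live in Hadoop.
events = [
    ("c1", "email"), ("c1", "search"), ("c1", "product"), ("c1", "buy"),
    ("c2", "ad"),    ("c2", "product"), ("c2", "buy"),
    ("c3", "email"), ("c3", "search"), ("c3", "product"), ("c3", "buy"),
    ("c4", "ad"),    ("c4", "search"),
]

def conversion_paths(events, goal="buy"):
    """Group events by customer and keep only journeys ending in `goal`."""
    journeys = {}
    for customer, event in events:
        journeys.setdefault(customer, []).append(event)
    return [tuple(path) for path in journeys.values() if path[-1] == goal]

# Rank the most common journeys that led to a conversion.
top_paths = Counter(conversion_paths(events)).most_common()
print(top_paths)
# → [(('email', 'search', 'product', 'buy'), 2), (('ad', 'product', 'buy'), 1)]
```

Here the most frequent converting journey (email, then search, then product page, then purchase) surfaces immediately; the same question asked of billions of events is what makes running the analysis inside the cluster, rather than on an extracted sample, so valuable.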
In AWS, the quick provisioning of analytic sandboxes gives businesses the opportunity to use pre-built SQL-based analytics to speed up development and to shift models that deliver results into production in the cloud. Analysts gain hugely in agility, since they can employ multi-genre analytics on large data volumes for as long as they require. The business, on the other hand, can experiment without the costs associated with new hardware, set-up and implementation.
New approach, new opportunities
These significant new options are opening up all the major revenue-boosting opportunities as enterprises move to a hybrid approach, storing data across Hadoop and the cloud.
It means any enterprise that needs a super-fast data analytics solution on Hadoop or in AWS, can easily and cost-effectively deploy multi-genre capabilities to solve real-world business challenges in ways that were not possible even a short while ago.
The author of this blog is Mike Whelan, director of Technology – International Big Data Centre of Excellence at Teradata.
Comment on this article below or via Twitter: @VanillaPlus or @jcvplus