Teradata Takes Bigger Approach to Big Data

Teradata continues to expand its information management and analytics technology for big data to meet growing demand. My analysis last year discussed Teradata’s approach to big data in the context of its distributed computing and data architecture. I recently got an update on the company’s strategy and products at the annual Teradata analyst summit. Our big data analytics research finds that a broad approach to big data is wise: Three-quarters of organizations want analytics to access data from all sources and not just one specific to big data. This inclusive approach is how Teradata has designed its architecture and technology for managing the access, storage and use of data and analytics.

Teradata has advanced its data warehouse appliance and database technologies to unify in-memory and distributed computing with Hadoop, other databases and NoSQL in one architecture; this enables it to move to center stage of the big data market. Teradata Intelligent Memory provides optimal accessibility to data based on usage characteristics for DBAs, analysts and business users consuming data from Teradata’s Unified Data Architecture (UDA). Teradata also introduced QueryGrid technology, which virtualizes distributed access to and processing of data across many sources, including the Teradata range of appliances, Teradata Aster technology, Hadoop through its SQL-H, other databases including Oracle’s, and languages including SAS, Perl, Python and even R. Teradata can push processing down to where the data resides, so that data retrieval and analytics run in parallel across its UDA, including on data from Hadoop. The QueryGrid data virtualization layer can dynamically access data and compute analytics as needed, making it versatile enough to meet a broadening scope of big data needs.
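To make the idea of push-down processing concrete, here is a minimal Python sketch. It is not Teradata’s QueryGrid API and makes no claim about how QueryGrid is implemented; the `Source` class, the `federated_total` function and the sample rows are all hypothetical. The point it illustrates is that each source filters and aggregates its own rows locally, in parallel, so only small partial results travel back to the coordinating layer.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

Row = dict


@dataclass
class Source:
    """Hypothetical stand-in for one system (a warehouse, a Hadoop cluster, etc.)."""
    name: str
    rows: list

    def pushed_down_sum(self, predicate: Callable[[Row], bool], column: str) -> float:
        # Filtering and aggregation happen "inside" the source;
        # only a single number leaves it.
        return sum(row[column] for row in self.rows if predicate(row))


def federated_total(sources, predicate, column):
    # Run the pushed-down work on every source in parallel,
    # then combine the partial results.
    with ThreadPoolExecutor() as pool:
        partials = list(
            pool.map(lambda s: s.pushed_down_sum(predicate, column), sources)
        )
    return sum(partials)


# Invented sample data for two sources.
warehouse = Source("warehouse", [
    {"region": "east", "sales": 100},
    {"region": "west", "sales": 40},
])
hadoop = Source("hadoop", [
    {"region": "east", "sales": 25},
    {"region": "east", "sales": 5},
])

east_total = federated_total(
    [warehouse, hadoop], lambda r: r["region"] == "east", "sales"
)
print(east_total)
```

The alternative, pulling every row to one place before filtering, would move far more data; that trade-off is what push-down is meant to avoid.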

Teradata has embraced Hadoop through a strategic relationship with Hortonworks. Its commercial distribution, Teradata Open Distribution for Hadoop (TDH) 2.1, originates from Hortonworks. It recently announced Teradata Portfolio for Hadoop 2, which has many components. There is also a new Teradata Appliance for Hadoop; this is its fourth-generation machine and includes preintegrated and preconfigured software along with the hardware and services. Teradata has integrated Hadoop into its UDA to ensure it is a unified part of its product portfolio; this is essential because Hadoop is still maturing and is not yet ready to operate in a fully managed and scalable environment.

Teradata has enhanced its existing portfolio of workload-specific appliances. The portfolio includes the Integrated Big Data Platform 1700, which handles up to 234 petabytes; the Integrated Data Warehouse 2750, which handles up to 21 petabytes for scalable data warehousing; and the 6750 for balanced active data warehousing. Each appliance is configured for enterprise-class needs, works in a multisystem environment and supports balancing and shifting of workloads with high availability and disaster recovery. They are available in a variety of ratios of disks, arrays and nodes, which makes them well suited to enterprise use. The appliances run version 15 of the Teradata database with Teradata Intelligent Memory and interoperate through integrated workload management. In a virtual data warehouse the appliances can provide maximum compute power, capacity and concurrent user potential for heavy work such as connecting to Hadoop and Teradata Aster. UDA enables distributed management and operations of workload-specific platforms to use data assets efficiently. Teradata Unity now is more robust in moving and loading data, and Ecosystem Manager now supports monitoring of Aster and Hadoop systems across the entire range of data managed by Teradata.

Teradata is entering the market for legacy SAP applications with Teradata Analytics for SAP, which provides integration and data models across lines of business to use logical data from SAP applications more efficiently. Teradata acquired this product from a small company last year; it uses an approach common among data integration technologies today and can make data readily available through new access points to SAP HANA. The product can help organizations that have not committed to SAP and its technology roadmap, which proposes using SAP HANA to streamline processing of data and analytics from business applications such as CRM and ERP. For others that are moving to SAP, Teradata Analytics for SAP can provide interim support for existing SAP applications.

Teradata continues to advance JavaScript Object Notation (JSON) integration for support of document-oriented databases that are schemaless and semistructured. JSON has become a critical tool as more applications need to store and access data efficiently. NoSQL databases have become more popular recently: 25 percent of organizations in our big data analytics research are using them today, 20 percent plan to use them within two years, and another 23 percent are evaluating NoSQL. With this focus Teradata gives its customers application and operational support that goes beyond supporting data for analytic purposes.
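As an illustration of what schemaless and semistructured mean in practice, the Python sketch below uses only the standard `json` module. It does not use Teradata’s JSON support, and the sample documents and the `get_path` helper are invented for illustration. It shows two documents of different shapes sitting side by side, with missing fields handled at read time rather than by a fixed schema.

```python
import json

# Two invented documents in the same collection with different shapes.
raw_docs = [
    '{"id": 1, "customer": {"name": "Ana", "city": "Austin"}, "tags": ["vip"]}',
    '{"id": 2, "customer": {"name": "Raj"}, "last_order": {"total": 42.5}}',
]


def get_path(doc, path, default=None):
    """Hypothetical helper: follow a dotted path, returning default if a key is absent."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


docs = [json.loads(raw) for raw in raw_docs]

# Fields that exist in only some documents are resolved at read time.
cities = [get_path(d, "customer.city", "unknown") for d in docs]
totals = [get_path(d, "last_order.total", 0.0) for d in docs]
print(cities, totals)
```

A relational table would force both records into the same columns up front; here each document carries its own structure, which is the flexibility that document-oriented stores trade against stricter validation.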

Teradata continues expansion of its Aster Discovery Platform to process analytics for discovery and exploration and also advances visualization and interactivity with analytics, which could encroach on partners that provide advanced analytics capabilities like discovery and exploration. Organizations looking for analytic discovery tools should consider this technology overlap. Teradata provides a broad and integrated big data platform and architecture with advanced resource management to process data and analytics efficiently. In addition it provides archiving, auditing and compliance support for enterprises. It can support a range of data refining tasks including fast data landing and staging, lower workload concurrency, and multistructured and file-based data.

Teradata’s efforts also extend to what I call big data or data warehouse as a service, which the company calls Teradata Cloud. It can operate across and be accessed from a multitenant environment, where Teradata makes its portfolio of Teradata, Aster and Hadoop available in what it calls cloud compute units. This can be used in a variety of cloud computing approaches including public, private and hybrid, and for backup and discovery needs. It has gained brand-name customers such as BevMo and Netflix, which have been public references for Teradata Cloud. Utilizing this cloud computing approach eliminates the need for placing Teradata appliances in the data center while providing maximum value from the technology. Teradata’s advancements in cloud computing come at a perfect time: our information optimization research finds that a quarter of organizations now prefer a cloud computing approach, with eight percent preferring it to be hosted by a supplier in a specific private cloud approach.

What makes Teradata’s direction unique is moving beyond its own appliances to embrace the enterprise architecture and existing data sources; this makes it more inclusive in access than other big data approaches like those from Hadoop providers and in-memory approaches that focus more on themselves than their customers’ actual needs. Data architectures have become more complex with Hadoop, in-memory, NoSQL and appliances all in the mix. Teradata has gathered this broad range of database technology into a unified approach while integrating its products directly with those of other vendors. This inclusive approach is timely as organizations are changing how they make information available, and our information optimization benchmark research finds improving operational efficiency (for 67%) and gaining a competitive advantage (63%) to be the top two reasons for doing that. Teradata’s approach to big data helps broaden data architectures, which will help organizations in the long run. If you have not considered Teradata and its UDA and new QueryGrid technologies for your enterprise architecture, I recommend looking at them.

Regards,

Mark Smith

CEO & Chief Research Officer

Cloudera Makes Hadoop a Big Player in Big Data

I had the pleasure of attending Cloudera’s recent analyst summit. Presenters reviewed the work the company has done since its founding six years ago and outlined its plans to use Hadoop to further empower big data technology to support what I call information optimization. Cloudera’s executive team includes the co-founders of Hadoop, who worked at Facebook, Oracle and Yahoo when they developed and used Hadoop. Last year they brought in CEO Tom Reilly, who led successful organizations at ArcSight, HP and IBM. Cloudera now has more than 500 employees, 800 partners and 40,000 users trained in its commercial version of Hadoop. The Hadoop technology has brought to the market an integration of computing, memory and disk storage; Cloudera has expanded the capabilities of this open source software for its customers by extending and commercializing it for enterprise use. The importance of big data is undisputed now: For example, our latest research in big data analytics finds it to be very important in 47 percent of organizations. However, we also find that only 14 percent are very satisfied with their use of big data, so there is plenty of room for improvement. How well Cloudera moves forward this year and next will determine its ability to compete in big data over the next five years.

Cloudera’s technology supports what it calls an enterprise data hub (EDH), which ties together a series of integrated components for big data that include batch processing, analytic SQL, a search engine, machine learning, event stream processing and workload management; this is much like the way relational databases and tools evolved in the past. These features also can deal with the types of big data most often used, according to our research: 40 percent or more use five types, from transactional data (60%) to machine data (42%). Hadoop combines layers of the data and analytics stack from collection, staging and storage to data integration and integration with other technologies. For its part Cloudera has a sophisticated focus on both engineering and customer support. Cloudera also seeks to facilitate converged analytics. Its goal is to enable enterprise big data management that can connect and integrate with other data and applications from its range of partners. One of these partners, Zoomdata, demonstrated the potential of big data analytics in analytic discovery and exploration through its visualization on the Cloudera platform; its integrated and interactive tool can be used by business people as well as professionals in analytics, data management and IT.

Cloudera’s latest major release, Cloudera Enterprise 5, brought a range of enterprise advancements including in-memory processing, resource management, data management and data protection, to name a few. Cloudera also announced a range of product options to make it easier to adopt its Hadoop technology. Cloudera Express is its free version of Hadoop, and it provides three editions licensed through subscription: Basic, Flex and Data Hub. The Flex Edition of Cloudera Enterprise has support for analytic SQL, search, machine learning, event stream processing and online NoSQL through the Hadoop components HBase, Impala, Spark and Navigator; a customer organization can have one of these per Hadoop cluster. The Enterprise Data Hub (EDH) Edition enables use of any of the components in any configuration. Cloudera Navigator is a product for managing metadata, discovery and lineage, and in 2014 it will add search, annotation and registration on metadata. Cloudera uses Apache Hive to support SQL through HiveQL, and Cloudera Impala provides a unique interface to the Hadoop file system HDFS using SQL. This is in line with what our research shows organizations prefer: More than half (52%) use standard SQL to access Hadoop. This range of choices in getting to data within Hadoop helps Cloudera’s customers realize a broad range of uses that include predictive customer care, market risk management, customer experience and other areas where very large volumes of information can be applied for applications that were not cost-effective before. With EDH Edition Cloudera can compete directly with large players IBM, Oracle, SAS and Teradata, all of which have ambitions to provide the hub of big data operations for enterprises.

Because Hadoop has open source roots, community is especially important to it. Part of building a community is providing training to certify and validate skills. Cloudera has enrolled more than 50,000 professionals in its Cloudera University and works with online learning provider Udacity to increase the number of certified Hadoop users. It also has developed academic relationships to promote Hadoop skills being taught to computer science students. Our research finds that this sort of activity is necessary: The most common challenge in big data analytics processes for two out of three (67%) organizations is not having enough skilled resources; we have found similar issues in the implementation and management of big data. The other aspect of a community is to enlist partners that offer specific capabilities. I am impressed with Cloudera’s range of partners, from OEMs and system integrators to channel resellers such as Cisco, Dell, HP, NetApp and Oracle to support in the cloud from Amazon, IBM, Verizon and others.

To help it keep up, Cloudera announced it has raised another $160 million from the likes of T. Rowe Price, Michael Dell Ventures and Google Ventures to add to financing from venture capital firms. With this funding Cloudera outlined its investment focus for 2014, which will concentrate on advancing database and storage, security, in-memory computing and cloud deployment. I believe that it will need to go further to meet the growing needs for integration and analytics and prove that it can provide a high-value integrated offering directly as well as through partners. Investing in its Navigator product also is important, as our research finds that quality and consistency of data is the most challenging aspect of the big data analytics process in 56 percent of organizations. At the same time, Cloudera should focus on optimizing its infrastructure for the four types of data discovery that are required according to our analysis.

Cloudera’s advantage is being the focal point in the Hadoop ecosystem while others are still trying to match its numbers in developers and partners to serve big data needs. Our research finds substantial growth opportunity here: Hadoop will be used in 30 percent of organizations through 2015 and another 12 percent are planning to evaluate it. Our research also finds a significant lead for Cloudera in Hadoop distributions, but other options like Hortonworks and MapR are growing. The research finds that most of these organizations are seeking the ability to respond faster to opportunities and threats; to do that they will need to have a next generation of skills to apply to big data projects. Our research in information optimization finds that over half (56%) of organizations are planning to use big data, and Hadoop will be a key focus for those efforts. Cloudera has a strong position in the expanding big data market because it focuses on the fundamentals of information management and analytics through Hadoop. But it faces stiff competition from the established providers of RDBMSs and data appliances that are blending Hadoop with their technology as well as from a growing number of providers of commercial versions of Hadoop. Cloudera is well managed and has the finances to meet these challenges; now it needs to be able to show many high-value production deployments in 2014 as the center of businesses’ big data strategies. If you are building a big data strategy with Hadoop, Cloudera should be a priority in your evaluation.

Regards,

Mark Smith

CEO & Chief Research Officer