Big Data is Broken Without Integration

Big data involves interplay between different data management approaches and business intelligence and operational systems, which makes it imperative that all sources of business data be integrated efficiently and that organizations be able to easily adapt to new data types and sources. Our recent big data benchmark research confirmed that big data storage technologies continue to follow many approaches, including appliances, Hadoop, and in-memory and specialized DBMSes. With the variety, velocity and volume of big data being part of today’s information architecture, and the potential for big data to be a source to feed other systems, integration should be a top priority.

Many organizations that have already deployed big data technology now struggle to access, transform, transport and load information using conventional technology. Even replication or migration of data from existing sources can be troublesome, requiring custom programming and manual processing, which are always a tax on resources and time. Barriers such as having data spread across too many applications and systems, which our benchmark research found in 67 percent of organizations, do not go away just because an organization is using big data technology; in fact, they get more complicated. However, big data also creates opportunities to use information to innovate and to improve business processes. To avoid the risks and take advantage of the opportunities, organizations need efficient processes and effective technology that makes information drawn from big data available to all people who need it.

Organizations need integration technology flexible enough to handle big data regardless of whether it originates in the enterprise or across the Internet. For this reason, tools for big data integration must be able to work with a range of underlying architectures and data technologies, including appliances, flat files, Hadoop, in-memory computing and conventional databases, and move data seamlessly between relational and non-relational structures. They must be able to adapt to events or streams of data, and they must harvest data from transactional systems and business applications in enterprise data warehouses. Supporting data quality and master data management needs is also part of supporting big data with data integration.
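To make the idea of moving data between relational and non-relational structures concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite table standing in for a conventional database and JSON documents standing in for a non-relational store; the table name `orders`, its columns and the helper names are hypothetical and chosen only for illustration. Real integration tools add scheduling, error handling, data quality checks and much more.

```python
import json
import sqlite3


def rows_to_documents(conn, table):
    """Return every row of `table` as a dict keyed by column name."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


def documents_to_rows(conn, table, documents):
    """Insert documents into `table`, keeping only the columns the table defines."""
    columns = [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        # Fields the table does not know about are dropped; missing ones become NULL.
        [tuple(doc.get(col) for col in columns) for doc in documents],
    )


def demo():
    # Hypothetical relational source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    conn.execute("INSERT INTO orders VALUES (1, 'Acme', 250.0)")

    # Relational -> non-relational: rows become JSON documents.
    as_json = json.dumps(rows_to_documents(conn, "orders"))

    # Non-relational -> relational: a document with an extra field is loaded back.
    incoming = json.loads(
        '[{"id": 2, "customer": "Globex", "total": 99.5, "channel": "web"}]'
    )
    documents_to_rows(conn, "orders", incoming)
    return as_json, rows_to_documents(conn, "orders")
```

The sketch silently drops the `channel` field because the table has no such column. That is exactly the kind of decision (drop, extend the schema, or reject the record) that an integration approach has to make explicit.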

Selecting the right approach to big data integration is difficult when organizations lack knowledge of the functional requirements and best practices relevant to their industries, lines of business and IT. Deficiencies in existing software and data environments can further complicate the ability to choose wisely and so should be factored into the deployment decision-making process. Organizations must identify the types of integration being used or under consideration to handle data other than that formatted for relational databases, and evaluate processing capabilities and techniques to handle the proliferation of big data. IT professionals therefore must understand how to work with analysts and business management to deliver timely, benefit-based big data deployments.

IT should evaluate whether it can use existing skills to shorten the time it takes to get big data to users. Since our research has found lack of resources to be the top barrier to using innovative technology, according to 51 percent of organizations, businesses should make sure their IT staff does everything possible to maximize skills and resources internally and not waste them on custom, manual silos of effort. Having the right data integration processes and data management methods can help IT work more efficiently and partner better with the business units.

Not having a dialogue about what information management competencies a business needs is a mistake. I have seen most IT industry analyst firms’ content deal with just a portion of the big data picture, discussing for example just the technologies for storing and accessing data, with a fixation on variety, velocity and volume. However, decision-makers must consider the efficient flow of data across its entire path of travel, from its origins to user systems, to ensure the effective functioning of any big data project. Failure to do that means failing to optimize information across its life cycle for business value. Without the ability to see the entire big data value chain, a business may find its initiatives exceed available limits of cost and time and damage a business case built on time-to-value metrics. According to our research, the most important benefits of big data technologies include retaining and analyzing more data (74%) and increasing the speed of analysis (70%). Organizations need to make sure they do not increase the number of manual processes they run and the time spent on them, thus impairing the value of big data.

We have begun research to assess the latest big data integration technologies and best practices to help advance these efforts, as we outlined in our research agenda on big data and information optimization for 2013. We will document emerging best practices in big data integration to meet business needs, from basic access and replication to transformational migration. Until we can share our results, be sure to consider big data integration as part of your business case and project, because it is essential to gaining the most value from your big data investments.


Mark Smith

CEO & Chief Research Officer

Big Data Search is Getting Better with LucidWorks

LucidWorks addresses the growing volume of information now being stored in the enterprise and in big data with two search products aimed at the enterprise. Though you may not be familiar with LucidWorks (previously known as Lucid Imagination), the company has for many years contributed to Apache Lucene, an open source search project, and has commercialized and supported it for business.

Search is a necessary enterprise application, but it is often not deployed successfully, sometimes because IT lacks focus on it or gives it low priority, and in some cases because a given application is not designed to work well across an organization. Our technology innovation benchmark research finds that search is a critical capability for 38 percent of organizations, and was a top-five capability for business intelligence, ranked as very important in 29 percent of organizations in our next-generation business intelligence research. But few of today’s solutions in BI and business analytics have well-integrated search capabilities that can work against the analytics, let alone with a broader set of information sources. Inadequate search capabilities hurt applications’ usability, and our latest research finds that usability is the top technology consideration in 64 percent of organizations.

LucidWorks has two product offerings in the search market. LucidWorks Search provides the ability to rapidly set up search and index content using Apache Solr. The company not only provides full commercial-grade support and services and a security framework, but has also improved on Solr’s usability for developers and business users. Solr, built on top of Lucene, is an enterprise platform that provides full-text search, dynamic clustering, geospatial search and other enterprise-class capabilities.
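For readers who want a feel for what full-text search involves, the sketch below builds a toy inverted index in Python, the general data structure that Lucene-style engines are built around. It is an illustration only, not the Solr or LucidWorks API: the function names and sample documents are hypothetical, and real engines add relevance ranking, stemming, faceting and distributed indexing.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample content.
documents = {
    1: "Big data integration with Hadoop",
    2: "Enterprise search on big data",
    3: "Hadoop search appliances",
}
index = build_index(documents)
```

A query such as `search(index, "big data")` looks up each term once and intersects the results, which is why index-based search stays fast as the number of documents grows.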

Second, LucidWorks Big Data, which was released late last year, uses the compute and storage capabilities of Hadoop to support larger-scale deployments. LucidWorks also announced more formalized integration with MapR to support it from within its software. LucidWorks has built a big data framework to take advantage of key Apache technologies such as HBase for storage and access, Kafka for distributed publish and subscribe, Mahout for scalable machine learning, Pig for MapReduce scripting and ZooKeeper for distributed coordination. Using Hadoop technologies to support big data search could be the tipping point that brings enterprise and Internet-class search to every organization.

LucidWorks provides its products via either on-premises or cloud computing deployment. LucidWorks has embraced cloud computing through Amazon Elastic Compute Cloud (EC2) and Microsoft Windows Azure, making it simple to integrate LucidWorks with existing approaches. As big data deployments evolve, our technology innovation research finds that 43 percent of organizations prefer big data on-premises, while 26 percent prefer it via cloud computing and 19 percent have no preference. Providing choices for organizations is essential to meeting the broadest range of needs. Our big data research found that searching for data is one of the top big data capabilities that are needed but not available today, in 20 percent of organizations. In addition, our research found the need for search on mobile technology to be important in 29 percent of organizations, so demonstrations of how LucidWorks can operate across smartphones and tablets will be essential.

According to our research, the market potential for providing search to business is quite high. Business understands the value of time and needs to get information. Unfortunately, the market for enterprise search software has been hampered by lack of interest from IT and from analyst firms that follow the IT industry but do not measure the needs of business. Historically, search has been relegated to IT systems and not seen as software to help business be more productive.

As organizations begin to build a new generation of applications to leverage big data they should not forget that search is a required component to get to information not only from within an application but also from any related information beyond it. LucidWorks has priced its offering to be affordable for any midsize to large organization and made it easy to set up through its cloud computing offering. Keeping it simple to set up is important, as lack of resources is the highest barrier for use of innovative technology in 51 percent of organizations.

Last year LucidWorks added a new CEO, Paul Doscher, a veteran in the enterprise software and search technology industry, to join its chief technology officer and founder, Grant Ingersoll. Going forward I expect to see LucidWorks search results further integrated into business applications and even business analytics. I hope that the company expands its big data offering beyond just Hadoop. LucidWorks should also get more of its customers to promote the software by talking about their deployments. I will be further researching the need for search as part of our big data and information optimization research in 2013. LucidWorks is definitely a vendor to examine if you are looking to bring enterprise-class search to your organization and big data deployments.


Mark Smith

CEO & Chief Research Officer