Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph PDF

Pages: 130
File size: 3.77 MB
Language: English


Big Data Analytics
From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph

David Loshin

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

Copyright © 2013 Elsevier Inc. All rights reserved

First published 2013

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-417319-4

For information on all MK publications visit our website at www.mkp.com

Printed in the United States of America
13 14 15 16 17 10 9 8 7 6 5 4 3 2 1

FOREWORD

In the summer of 1995, I attended my first conference for information technology professionals. The event, called Interex, was an annual convention for users of the HP 3000, Hewlett-Packard’s midrange business computer system known at the time for its reliability and for a devoted user base. More than 10,000 of these users gathered in Toronto that August to swap tips and pester HP executives for information about the future of their beloved system.

After spending several days talking with these IT managers, I came away with two observations:

1. The managers were grappling with rising technology demands from business executives and office workers who wanted more out of their IT investments.
2. While everyone in the Toronto Convention Center was talking about the HP 3000, there was a 100-foot-tall “WINDOWS ‘95” banner hanging from the nearby CN Tower, a prominent landmark in the city’s skyline.

That banner was not part of the Interex event. But it would not be long before Microsoft’s new desktop operating system would influence the work of just about everyone who used a PC.

It has always been this way. Innovations in computer hardware, communications technology, and software development regularly enter to challenge IT professionals to adapt to new opportunities and associated challenges.

The latest edition of this recurrent dynamic is big data analytics, which takes advantage of advances in software programming, open source code, and commodity hardware to promise major gains in our ability to collect and analyze vast amounts of data (and new kinds of data) for fresh insights.
The kinds of techniques that allow Google to index the Web, Facebook to build social graphs, and Netflix to recommend movies can be applied to functions like marketing (what’s the next best offer for Marjorie?), risk management (the storm is tracking near our warehouse, better move the goods today), and equipment maintenance (the sensor says it’s time to replace that engine part).

Those possibilities and many others have generated much interest. Venture capital is flowing to startups as database architects are cool again. Leaders in health care, finance, insurance, and other industries are racing to hire talented “data scientists” to develop algorithms to discover competitive advantages. Universities are launching master’s programs in analytics in response to corporate demands and a projected skills gap. Statisticians are joining celebrity ranks, with one sparking cable news debates about presidential election predictions and another starring in TED Talk videos on data visualization design.

There is so much going on, in fact, that a busy IT professional looking for relevant help could use a personal guide to explain the issues in a style that acknowledges some important conditions about their world: They likely have a full list of ongoing projects. Their organization has well-defined IT management practices. They stipulate that adopting new technologies is not easy.

This is the kind of book you are reading now. David Loshin, an experienced IT consultant and author, is adept at explaining how technologies work and why they matter, without technical or marketing jargon. He has years of practice both posing and answering questions about data management, data warehousing, business intelligence, and analytics.

I know this because I have asked him. I first met David in 2012 at an online event he moderated to explain issues involved in making big data analytics work in business.
I sought him out to discuss the issues in more detail as the editor of Data Informed (http://data-informed.com/), an online publication that chronicles these trends and shares best practices for IT and business professionals. Those early conversations led to David writing a series of articles for Data Informed that forms the basis for this book, on issues ranging from the market and business drivers for big data analytics, to use cases for these emerging technologies, to strategies for assessing their relevance to your organization.

Along the way, David and I have found ourselves agreeing about a key lesson from his years of working in IT (or, in my case, reporting on it): New big data analytics technologies are exciting, and represent a great opportunity. But making any new technology work effectively requires understanding the tools you need, having the right people working together on common goals, and establishing the right business processes to create value from the work.

The teachings in this book go beyond this straightforward three-legged stool of technologies–skills–processes. At the end of each chapter, there are “thought exercises” that challenge you to consider the technology, business, and management concepts in the context of your organization. This is where David provides you the opportunity to answer the kinds of questions that will help you evaluate next steps for making the technologies covered here valuable to you.

These are like signposts to direct your work in adapting to the big data analytics field. It’s much better than a 10-story banner blaring to a city that your world is about to change. Here, the signs come with full explanations and advice about how to make that change work for you.

Michael Goldberg

PREFACE

INTRODUCTION

In technology, it seems, what comes around goes around. At least in my experience, it certainly seems that way.
Over recent times, the concepts of “big data” and “big data analytics” have become ubiquitous; it is hard to visit a web site, open a newspaper, or read a magazine that does not refer to one or both of those phrases. Yet the technologies that are incorporated into big data (massive parallelism, huge data volumes, data distribution, high-speed networks, high-performance computing, task and thread management, and data mining and analytics) are not new.

During the first phase of my career in the late 1980s and early 1990s I was a software developer for a company building programming language compilers for supercomputers. Most of these high-end systems were multiprocessor systems, employed massive parallelism, and were driven by data sets that were large, albeit by the standards of the times. My specific role was looking at code optimization, particularly focusing on increasing data bandwidth to the processors and taking advantage of the memory hierarchies upon which these systems were designed and implemented. And interestingly, many of the architectures and techniques used for designing hardware and developing software were not new either; much credit goes to early supercomputers such as the Illiac IV, the first massively parallel computing system that was developed in the early 1970s.

That is why the big data phenomenon is so fascinating to me: not the appearance of new technology, but rather how known technology finally comes into the mainstream. When the details of technology that was bleeding edge 20 years ago appear regularly in The New York Times, The Wall Street Journal, and The Economist, you know it has finally arrived.

THE CHALLENGE OF ADOPTING NEW TECHNOLOGY

Many people have a natural affinity for new technology: there is often the perception that the latest and shiniest silver bullet will not only eliminate all the existing problems in the organization but will also lead to the minting of a solid stream of gold coins enriching the entire organization. And in those organizations that are not leading the revolution to adoption, there is the lingering fear of abandonment: if they don’t adopt the technology they will be left far behind, even if there is no clear value proposition for it in the first place.

Clearly, it would be unwise to commit to a new technology without assessing the components of its value, namely the expected value driver “lift” as compared to the total cost of operations. Essentially, testing and piloting new technology is necessary to maintain competitiveness and ensure technical feasibility. But in many organizations, the processes to expeditiously mainstream new techniques and tools often bypass existing program governance and corporate best practices. The result is that pilot projects prematurely moved into “production” are really just point solutions relying on islands of data that neither scale from the performance perspective nor fit into the enterprise from an architectural perspective.

WHAT THIS BOOK IS

The goal of this book is to provide a firm grounding in laying out a strategy for adopting big data techniques. It is meant to provide an overview of what big data is and why it can add value, what types of problems are suited to a big data approach, and how to properly plan to determine the need, align the right people in the organization, and develop a strategic plan for integration.

On the other hand, this book is not meant as a “how-to” for big data application development, MapReduce programming, or implementing Hadoop.
Rather, my intent is to provide an overview within each chapter that addresses some pertinent aspect of the ecosystem or the process of adopting big data:

• Chapter 1: We consider the market conditions that have enabled broad acceptance of big data analytics, including commoditization of hardware and software, increased data volumes, growing variation in types of data assets for analysis, different methods for data delivery, and increased expectations for real-time integration of analytical results into operational processes.
• Chapter 2: In this chapter, we look at the characteristics of business problems that traditionally have required resources that exceeded the enterprises’ scopes, yet are suited to solutions that can take advantage of the big data platforms (either dedicated hardware or virtualized/cloud based).
• Chapter 3: Who in the organization needs to be involved in the process of acquiring, proving, and deploying big data solutions? And what are their roles and responsibilities? This chapter looks at the adoption of new technology and how the organization must align to integrate into the system development life cycle.
• Chapter 4: This chapter expands on the previous one by looking at some key issues that often plague new technology adoption and shows that the key issues are not new ones and that there is likely to be organizational knowledge that can help in fleshing out a reasonable strategic plan.
• Chapter 5: In this chapter, we look at the need for oversight and governance for the data, especially when those developing big data applications often bypass traditional IT and data management channels.
• Chapter 6: In this chapter, we look at specialty hardware designed for analytics and how it is engineered to accommodate large data sets.
• Chapter 7: This chapter discusses and provides a high-level overview of tool suites such as Hadoop.
• Chapter 8: This chapter examines the MapReduce programming model.
• Chapter 9: In this chapter, we look at a variety of alternative data management methods that are being adopted for big data application development.
• Chapter 10: This chapter looks at business problems suited for graph analytics, what differentiates the problems from traditional approaches, and considerations for discovery versus search analyses.
• Chapter 11: This short final chapter reviews best practices for incrementally adopting big data into the enterprise.

WHY YOU SHOULD BE READING THIS BOOK

You have probably picked up this book for one or more of these very good reasons:

• You are a senior manager seeking to take advantage of your organization’s information to create or add to corporate value by increasing revenue, decreasing costs, improving productivity, mitigating risks, or improving the customer experience.
• You are the Chief Information Officer or Chief Data Officer of an organization who desires to make the best use of the enterprise information asset.
• You are a manager who has been asked to develop a big data program.
• You are a manager who has been asked to take over a floundering big data application.
• You are a manager who has been asked to take over a successful big data program.
• You are a senior business executive who wants to explore the value that big data can add to your organization.
• You are a business staff member who desires more insight into the way that your organization does business.
• You are a database or software engineer who has been appointed a technical manager for a big data program.
• You are a software engineer who aspires to be the manager of a big data program.
• You are an analyst or engineer working on a big data framework who aspires to replace your current manager.
• You are a business analyst who has been asked to join a big data application team.
• You are a senior manager and your directly reporting managers have started talking about big data using terminology you think they expect you to understand.
• You are a middle-level manager or engineer and your manager has started talking about big data using terminology you think they expect you to understand.
• You are just interested in big data.

How do I know so much about you? Because at many times in my life, I was you: either working on or managing a project for which I had some knowledge gaps, for an organization full of people not sure of why they were doing what they were doing, with very few clear
