{"id":114,"date":"2014-04-29T07:05:32","date_gmt":"2014-04-29T07:05:32","guid":{"rendered":"http:\/\/www.excelglobalsolution.com\/?p=114"},"modified":"2015-11-19T08:56:43","modified_gmt":"2015-11-19T08:56:43","slug":"hadoop-framework","status":"publish","type":"post","link":"https:\/\/excelglobalsolution.com\/blogs\/?p=114","title":{"rendered":"Hadoop Framework"},"content":{"rendered":"<p><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\"><b>Origin of Hadoop:<\/b><\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">Nowadays,\u00a0the number of users of web applications and social networking sites is increasing, and with it the volume of data is growing vastly (into terabytes and petabytes). Doug Cutting, Cloudera\u2019s Chief Architect, helped create Apache Hadoop out of necessity as data from the web exploded and grew far beyond the ability of traditional systems to handle it.<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">Hadoop is a solution for processing and storing huge amounts of\u00a0data (structured, unstructured, log files, pictures, audio files, communications records, email). 
It runs on inexpensive, industry-standard servers and handles the processing and storage of data through distributed parallel processing.<\/span><\/span><\/p>\n<p><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\"><b>Introduction <\/b><\/span><\/span><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\"><b>of Hadoop<\/b><\/span><\/span><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">:<\/span><\/span><\/p>\n<p><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\"><b>Apache Hadoop<\/b><\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">Hadoop is a framework for the storage and processing of large-scale data sets on clusters of commodity hardware.\u00a0\u00a0It is written in Java and was originally developed by Doug Cutting.<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\"><b>Commodity Hardware <\/b><\/span><\/span><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">means using large numbers of already available computing components (e.g., hard disks or\u00a0memory storage\u00a0components)\u00a0for parallel computing to get the greatest amount of useful computation at low cost.<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\"><b>The Apache Hadoop framework is composed of the following modules:<\/b><\/span><\/span><\/p>\n<ol style=\"font-family: 'Times New Roman', serif; font-size: medium;\">\n<li style=\"text-align: justify;\"><strong>Hadoop Common<\/strong> \u2013 contains libraries and utilities needed by other Hadoop modules.<\/li>\n<li 
style=\"text-align: justify;\"><strong>Hadoop Distributed File System (HDFS)<\/strong> \u2013 a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.<\/li>\n<li style=\"text-align: justify;\"><strong>Hadoop YARN<\/strong> \u2013 a resource-management platform responsible for managing compute resources in clusters and using them to schedule users\u2019 applications.<\/li>\n<li style=\"text-align: justify;\"><strong>Hadoop MapReduce<\/strong> \u2013 a programming model for large-scale data processing.<\/li>\n<\/ol>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines or racks of machines) are common and thus should be handled automatically in software by the framework.<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">Beyond the base modules such as HDFS, YARN and MapReduce, the Apache Hadoop platform is now commonly considered to include a number of related projects as well:<\/span><\/span><\/p>\n<ul style=\"font-family: 'Times New Roman', serif; font-size: medium;\">\n<li>Apache Pig<\/li>\n<li>Apache Hive<\/li>\n<li>Apache HBase<\/li>\n<li>Apache Spark, and others.<\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\"><b>Technical Knowledge Requirements<\/b><\/span><\/span><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">: Working with Hadoop calls for knowledge of Java, as the Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts. 
Though MapReduce code is commonly written in Java, any programming language can be used with \u201cHadoop Streaming\u201d to implement the \u201cmap\u201d and \u201creduce\u201d parts of the user\u2019s program.<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\"><b>Install \/ Deploy Hadoop<\/b><\/span><\/span><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">: Hadoop can be installed in 3 modes:<\/span><\/span><\/p>\n<ol>\n<li style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\"><b>Standalone mode<\/b><\/span><\/span><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">: To deploy Hadoop in standalone mode, we only need to set the JAVA_HOME path. In this mode there is no need to start the daemons or to format the NameNode, because data is saved on the local disk.<\/span><\/span><\/li>\n<li style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\"><b>Pseudo Distributed mode<\/b><\/span><\/span><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">: In this mode all the daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) run on a single machine.<\/span><\/span><\/li>\n<li style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\"><b>Distributed mode<\/b><\/span><\/span><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">: In this mode, the NameNode, JobTracker and (optionally) SecondaryNameNode daemons run on the master node, and the DataNode and TaskTracker daemons run on the slave nodes.<\/span><\/span><\/li>\n<\/ol>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: 
medium;\"><b>Note<\/b><\/span><\/span><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">: Daemon \u2192 In multitasking computer operating systems, a daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user.<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\"><b>Architecture of Hadoop:<\/b><\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">Top Level Interfaces: This layer contains the ETL tools, BI reporting and RDBMS.<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">Top Level Abstractions: This layer contains Pig, Hive and Sqoop.<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">Data Processing: This layer contains MapReduce and HBase.<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'Times New Roman', serif;\"><span style=\"font-size: medium;\">Storage: This layer contains the HDFS cluster.<\/span><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Origin of Hadoop: Now-a-days,\u00a0users of web applications \/ social networking sites are increasing and along with that the usage of data is also getting increased vastly (Terra Bytes and Peta 
&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[49,3],"tags":[],"class_list":["post-114","post","type-post","status-publish","format-standard","hentry","category-hadoop","category-technology"],"_links":{"self":[{"href":"https:\/\/excelglobalsolution.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/114","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/excelglobalsolution.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/excelglobalsolution.com\/blogs\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/excelglobalsolution.com\/blogs\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/excelglobalsolution.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=114"}],"version-history":[{"count":17,"href":"https:\/\/excelglobalsolution.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/114\/revisions"}],"predecessor-version":[{"id":210,"href":"https:\/\/excelglobalsolution.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/114\/revisions\/210"}],"wp:attachment":[{"href":"https:\/\/excelglobalsolution.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=114"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/excelglobalsolution.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=114"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/excelglobalsolution.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=114"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}