INFRASTRUCTURE FOR BIG DATA
DATA PLATFORMS AND INFRASTRUCTURE
Institute of Technology Carlow, Ireland
STUDENT NAME: Sumit Kumar Singh
COURSE NAME: Masters in Data Science
DEPARTMENT: Department of Computing and Networking
COURSE CODE: (CW_SRDAT_M) Y5
DATE OF SUBMISSION: 03 05 2018
CONTENTS
Introduction
Cloud Computing
Cloud Service Provider: (AWS vs. Azure)
Data Platforms:
Implementation:
Conclusion
References
Introduction
Cloud Computing
According to IBM, cloud computing is “the delivery of on-demand computing resources – everything from applications to data centers – over the internet on a pay-for-use basis”.
Cloud computing gives users a metered, scalable solution that allows IT infrastructure to grow and shrink as required, easily and cost-efficiently.
The key elements provided by cloud computing services are:
Software as a Service (SaaS) – Users gain access to software solutions that are hosted in the cloud. The software is available from any computer connected to the cloud. A common example is the Microsoft Office 365 suite.
Platform as a Service (PaaS) – This provides a complete environment for developing and deploying web-based applications without the cost of purchasing the underlying hardware and software.
Infrastructure as a Service (IaaS) – This provides a complete IT platform, including hardware, storage and software, on a pay-per-use basis. These infrastructures are scalable on demand.
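The difference between the three service models can be sketched as a split of responsibilities between provider and customer. The layer names and the boundary chosen for each model below are a commonly used simplification and are assumptions for illustration; they do not come from IBM's definition, and real offerings draw the line differently.

```python
# Hypothetical, simplified technology stack, ordered from the bottom layer up.
LAYERS = [
    "networking",
    "storage",
    "servers",
    "virtualisation",
    "operating system",
    "middleware",
    "runtime",
    "data",
    "applications",
]

# Assumed top-most layer that the provider manages under each service model.
PROVIDER_MANAGED_UP_TO = {
    "IaaS": "virtualisation",
    "PaaS": "runtime",
    "SaaS": "applications",
}


def split_responsibility(model: str) -> dict:
    """Return which layers the provider and the customer manage for a model."""
    boundary = LAYERS.index(PROVIDER_MANAGED_UP_TO[model]) + 1
    return {"provider": LAYERS[:boundary], "customer": LAYERS[boundary:]}


if __name__ == "__main__":
    for model in ("IaaS", "PaaS", "SaaS"):
        parts = split_responsibility(model)
        customer = ", ".join(parts["customer"]) or "nothing"
        print(f"{model}: customer manages {customer}")
```

Under these assumptions the customer's share shrinks from IaaS to PaaS to SaaS, which is the practical trade-off between control and convenience.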
There are three main flavours of cloud computing:
Public Cloud – Public cloud services are delivered over the public internet and provide all key cloud services such as hardware, software, or supporting infrastructure. Public cloud services are provided by companies such as Microsoft (Azure) and Amazon (AWS).
Private Cloud – A private cloud is a solution provided for one organisation alone. These solutions can be hosted internally or externally to the organisation and provide the same range of services as a public cloud.
Hybrid Cloud – A hybrid cloud is a combination of data centers, private clouds and public clouds. It delivers the private cloud’s high-security features coupled with the fast connection and easy-to-access features of the public cloud.
Below is a list of some major cloud service providers (CSPs) and the service models they offer:
Cloud Service Provider   IaaS   PaaS   SaaS
Amazon                   Yes    Yes    No
CenturyLink              Yes    Yes    No
Google                   Yes    Yes    Yes
IBM                      Yes    Yes    Yes
Microsoft                Yes    Yes    Yes
Rackspace                Yes    Yes    No
Salesforce.com           No     Yes    Yes
SAP                      Yes    Yes    Yes
Verizon Terremark        Yes    Yes    No
Advantages and disadvantages of cloud computing:
Advantages:
1) Cost Savings – Cloud computing keeps capital and operational expenses to a minimum. No in-house servers, storage or applications are required. The lack of on-premises infrastructure also removes the associated operational costs, such as power and administration.
2) Reliability – Service level agreements (SLAs) guaranteeing 24/7/365 service and 99.99% availability make cloud computing more reliable than in-house IT infrastructure.
3) Manageability – Cloud computing simplifies IT management and maintenance through central administration of resources, vendor-managed infrastructure and SLA-backed agreements.
4) Strategic Edge – It removes the time needed for IT procurement and lets the organisation focus on key business activities and objectives.
Disadvantages:
1) Downtime – Technical outages do occur and can lead to temporary suspension of business processes.
2) Security – Being a public service exposes cloud service providers to security challenges on a routine basis. Although cloud service providers implement the best security standards and industry certifications, storing data and important files with an external service provider always carries risk.
3) Vendor Lock-In – Although cloud service providers promise that the cloud will be flexible to use and integrate, switching between services has not fully matured. Hosting and integrating current cloud applications on another platform may raise interoperability and support issues. For instance, applications developed on the Microsoft development framework (.NET) might not work properly on the Linux platform.
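To put the 99.99% availability figure quoted under Reliability into perspective, an availability percentage can be converted into a yearly downtime budget. The following is a minimal sketch; the function name is hypothetical and the calculation assumes a 365-day year and that availability is measured over the whole year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Compare a few common availability targets.
    for target in (99.9, 99.95, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.99% availability leaves a budget of roughly 52.6 minutes of downtime per year, while 99.95% leaves roughly 262.8 minutes (about 4.4 hours).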
The purpose of this document is to survey the tools and technologies available for big data infrastructure and to build a broad knowledge of the tools that can be used.
The report compares the services through case studies and then gives my own opinions.
Cloud Service Provider: (AWS vs. Azure)
Selecting a cloud service provider always depends on the wants and needs of each individual customer and the workloads they are running. Perhaps the best way to decide is to start a free trial with each provider to experience what each platform is like.
According to Cloud Tech and IT professionals at Spiceworks, when it comes to IaaS, Azure and AWS are both solid platforms that give organisations access to vast computing resources. Certain factors distinguish the two.
Cost: Both providers offer a variety of differently-sized instances at relatively comparable price points to fit the needs of organisations of all sizes. Costs on both platforms vary depending on the performance, capacity, amount of data you need to transfer, and whether you need advanced features such as load balancing and auto-scaling. How often you use your cloud instances also figures into cost, with one big pricing difference between Azure and AWS being that Microsoft charges for usage by rounding up to the nearest minute, while Amazon rounds up to the nearest hour.
Support Plans: Both provide different levels of tech support depending on how quickly you need issues resolved. However, Azure support plans are billed at a flat monthly fee, whereas AWS support fees vary on a sliding scale tied to monthly usage, so support costs can grow quickly if you are a very heavy user.
Reliability and uptime: Both Azure and AWS strive for greater than 99.95% service availability. While both services have been reliable, both have experienced periodic outages that affected popular services like Netflix, Office 365, and more.
Setup and user-friendliness: Azure is known for being convenient for Windows admins because they don’t have to learn a new platform. AWS, on the other hand, is known for a more highly configurable, feature-rich offering that has a bit of an initial learning curve. In return, AWS offers a lot of power, flexibility, and room for customisation, with support for a huge number of third-party integrations.
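The effect of the billing granularity described under Cost can be illustrated with a small calculation. This is a sketch only: the function name and the hourly rate are hypothetical, and the per-minute and per-hour granularities are taken from the comparison above rather than from either provider's current price list, so they are passed in as inputs.

```python
import math


def billed_cost(runtime_minutes: float, hourly_rate: float, granularity_minutes: int) -> float:
    """Cost of one instance run when usage is rounded up to the billing granularity."""
    if runtime_minutes < 0 or hourly_rate < 0 or granularity_minutes <= 0:
        raise ValueError("runtime and rate must be non-negative, granularity positive")
    # Round the runtime up to a whole number of billing units.
    billed_units = math.ceil(runtime_minutes / granularity_minutes)
    billed_minutes = billed_units * granularity_minutes
    return billed_minutes * hourly_rate / 60


if __name__ == "__main__":
    runtime = 61          # a job that runs one minute past the hour
    hourly_rate = 0.10    # hypothetical price per instance-hour

    per_minute = billed_cost(runtime, hourly_rate, granularity_minutes=1)
    per_hour = billed_cost(runtime, hourly_rate, granularity_minutes=60)
    print(f"Rounded up to the minute: {per_minute:.4f}")
    print(f"Rounded up to the hour:   {per_hour:.4f}")
```

With these assumed numbers, the 61-minute job is billed for 61 minutes under per-minute rounding but for 120 minutes under per-hour rounding, so short or irregular workloads are the ones most affected by the coarser granularity.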
Consider the following aspects before selecting a cloud service provider:
Business health and processes
Technical capabilities and processes
Data Platforms:
According to InformationWeek, data analysis is a do-or-die requirement for today’s businesses. Businesses must grapple with huge quantities and varieties of data on the one hand and process that data ever faster on the other. Vendors are responding with highly distributed architectures and new levels of memory and processing power.
Apache Hadoop leads the big data revolution. It is nine years old and was first used by internet giants such as Yahoo and Facebook. Cloudera introduced big data support for enterprises in 2008, and MapR and Hortonworks followed in 2009 and 2011. IBM and EMC introduced their own Hadoop distributions. Microsoft and Teradata offer complementary software and support for Hortonworks’ platform. Oracle supports Cloudera, while HP, SAP and others work with multiple Hadoop software providers.
Fast processors are needed for in-memory analysis, and SAP is leading the race with its HANA platform. To compete with SAP, however, Microsoft and Oracle are preparing to introduce in-memory options for their flagship databases.
More broadly, vendors such as IBM, Microsoft, Oracle and SAP offer data management and data integration software, business intelligence and analytics software, and in-memory and Hadoop options. Teradata is a step ahead, focusing more on data management, and is closely tied to analytics market leader SAS.
To conclude, plenty of vendors offer cloud platforms, but 1010data and Amazon Web Services (AWS) have built their entire business on the cloud model. Of the two, Amazon has the broadest selection of products and is an obvious choice for big workloads. 1010data offers a highly scalable database service with supporting information-management, BI, and analytics capabilities that are served up private-cloud style.
AB&R provides a checklist for choosing the right services before moving any business to the cloud. Be sure to work through these steps before you sign up for any service.
Define your project – Some applications and infrastructures should never be put on a cloud. Decide what you want to move to the cloud and whether or not it’s feasible.
Select the platform – Choose a platform that is fast, easy and safe to deploy. Ensure you have a flexible platform that scales to support your evolving business model and future growth.
Understand security policies – Many service providers believe that data security is your responsibility, not theirs. Make sure you have a clear understanding of who is responsible and ensure that the right resources are in place.
Select your cloud computing service provider – Partner with a service provider that has success with businesses similar to yours and knows your technology.
Determine service level agreements – In addition to uptime, be very clear with your service provider when it comes to SLAs and exactly what they do and do not cover, such as data availability or data protection.
Understand who owns recovery – Outages will happen, so know in advance if you or your service provider is responsible for recovery.
Migrate in phases – Roll out a phased migration that allows you to gradually increase the load and gives you time to fine tune and minimize risks while maintaining business continuity.
Think ahead – Your business requirements can change at any time, so choose a cloud solution that allows you to move between on-premise and cloud as needed, and one that allows you to move to a different cloud service provider if necessary.
Below is the comparison of four private cloud providers given by Tom’s IT Pro:
Microsoft Private Cloud
Compatibility: Single pane of glass for Hyper-V, VMware ESXi and Citrix Xen servers, although the detail of management support is not clear.
Complexity: Single product makes it easy (but expensive) to purchase. Because System Center contains so many related products, installation and upgrades can be complex.
Security: Relies on the existing Active Directory / IIS frameworks, allowing for ease of operations.
VMware vCloud Suite
Compatibility: Ability to manage the top three hypervisors, but limited information on the level of detail.
Complexity: Suite variants with add-ons can make purchasing complex. Products are separate features, making installation and upgrades more complex.
Security: Mature security in core products; integration with Active Directory requires single sign-on.
OpenStack Private Cloud
Compatibility: Primary focus is on KVM, but VMware’s hypervisor is gaining support. Limited attention to Citrix Xen and Microsoft Hyper-V.
Complexity: Fairly easy to get started with community-created documentation.
Security: Security is based on security domains and trusts. A third-party LDAP server can handle authentication; support for multi-factor authentication is available as well.
Apache CloudStack Private Cloud
Compatibility: Supports VMware ESXi, Citrix Xen, Microsoft Hyper-V and KVM. No single pane of glass for detailed management.
Complexity: Multiple products under a single open source umbrella may present challenges for installation and configuration; this, coupled with primary support being forum based, is cause for concern.
Security: While support for Active Directory exists, along with security zones, the inclusion of Java-based management agents may present additional security concerns.
Conclusion
For those who have concerns about the use of cloud solutions, a private cloud is a suitable option to consider. Private clouds provide the functionality, scalability and affordability of large data management infrastructures without the need for investment in hardware and IT personnel. This flexibility comes with the reassurance of privacy and restricted access to systems.
It is my recommendation that, for small to medium-sized organisations seeking the best service from a private cloud, an externally hosted private solution is the best fit. Businesses need to understand their requirements and choose the provider that best meets them. Amazon Web Services (AWS) currently stands out among the service providers in this league and is my personal recommendation to those who want to opt for cloud services.
References
AnytimeAnywhere, L. I. (n.d.). Advantages and Disadvantages of Cloud Computing. Retrieved from https://www.levelcloud.net/why-levelcloud/cloud-education-center/advantages-and-disadvantages-of-cloud-computing/
Azure, M. (n.d.). How do I choose a cloud service provider? Retrieved from https://azure.microsoft.com/en-us/overview/choosing-a-cloud-service-provider/
SDxCentral. (n.d.). What are Cloud Service Providers? Retrieved from https://www.sdxcentral.com/cloud/definitions/what-are-cloud-service-providers/
Henschen, Doug. (n.d.). 16 Top Big Data Analytics Platforms. Retrieved from https://www.informationweek.com/big-data/big-data-analytics/16-top-big-data-analytics-platforms/d/d-id/1113609
Kirsch, Brian. (n.d.). 5 Private Cloud Providers Compared. Retrieved from http://www.tomsitpro.com/articles/private-cloud-providers-comparison,2-899.html
Tsai, Peter. (n.d.). AWS vs. Azure: IT pros weigh the pros and cons. Retrieved from https://www.cloudcomputing-news.net/news/2016/sep/06/aws-vs-azure-it-pros-weigh-pros-and-cons/
What is Cloud Computing. (n.d.). Retrieved from https://www.ibm.com/cloud/learn/what-is-cloud-computing
What is Hybrid Cloud? A Scalable and Customizable Computing Solution. (n.d.). Retrieved from https://www.sdxcentral.com/cloud/definitions/what-is-hybrid-cloud/