Apr 4, 2015

EMC Atmos - Architecture

EMC Atmos is a cloud storage platform that enables enterprises and service providers to store, manage, and protect globally distributed, unstructured content at scale. It is the first exabyte-scale, global information management solution specifically designed to automate and manage data placement, protection, and access for rich, unstructured content as a single system across distributed storage environments.
Atmos operates as a single entity regardless of how it is physically distributed, and it distributes content in an active/active paradigm rather than in the hierarchical approach common to file system-based structures. Unlike other systems, Atmos uses customizable, value-driven metadata to drive storage placement, protection, and lifecycle policies. This ensures information gets to the right location at the right time, automatically. Atmos can operate as the foundation of a cloud infrastructure, natively serving and metering isolated tenants (multi-tenancy) from a single system to maximize utilization across multiple customers and applications.
These qualities of a cloud-optimized storage architecture increase operational efficiency, reduce management complexity, and reduce lifecycle cost. Specific Atmos features that drive these benefits include:
  • Massively scalable infrastructure into multiple petabytes with support for billions of objects across a globally distributed infrastructure.
  • Unified namespace eliminates capacity, file number, location and other file system limitations.
  • Policy-based management: Metadata and policy-based information management capabilities combine to intelligently drive information placement, protection and other information services, optimizing availability and cost based on the customer’s SLO.
  • Data Protection and Recovery: Atmos offers two flexible policy-based options to choose from. GeoMirror provides traditional synchronous or asynchronous copies that are distributed across locations. GeoParity lets you split up objects into multiple encoded fragments that are distributed across components for increased content durability.
  • Integrated Data Services: Atmos policies also allow you to set and automate data services, including compression, de-duplication, spin-down, and striping. This reduces administration time and allows Atmos to be managed efficiently on a global scale.
  • Multi-tenancy: Enables multiple applications to be securely served from the same infrastructure. Each application is securely partitioned and data is neither co-mingled nor accessible by other tenants. This feature is ideal for businesses providing cloud services for multiple customers or departments within large enterprises.
  • Flexible Access Methods: REST and SOAP web service APIs, as well as file-based access, provide convenient integration with virtually any application and easy access over the LAN or WAN. Sync and share is available for mobile devices, Windows, and Linux.
  • Storage-as-a-Service: The Atmos Cloud Delivery Platform is an add-on software product that enables enterprises and service providers to deliver and manage storage-as-a-service on an Atmos cloud. It enables self-service access and management by tenant.
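The GeoParity idea above (splitting an object into encoded fragments so that lost pieces can be rebuilt) can be illustrated with a small sketch. The actual encoding Atmos uses is not described here, so this example swaps in the simplest possible scheme, a single XOR parity fragment, which tolerates the loss of exactly one fragment. The function names `encode` and `recover` are hypothetical and are not part of any Atmos API.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments plus one XOR parity fragment.

    Assumption: single parity stands in for a real erasure code.
    """
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")  # pad so all fragments are equal length
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    repaired = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity fragment and the padding.
    return b"".join(repaired[:-1])[:length]


payload = b"unstructured content stored as an object"
pieces = encode(payload, k=4)
pieces[2] = None  # simulate a fragment lost with its storage component
assert recover(pieces, len(payload)) == payload
```

A production erasure code would use more parity fragments so that several simultaneous losses can be survived; the trade-off against full mirrored copies is lower storage overhead in exchange for reconstruction work on read.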

1. SERVICES
At its core, Atmos is delivered as a set of distributed, redundant software services that interact with one another to provide global information management. This collection of services:
  • Provides web services and file presentation interfaces
  • Tracks availability and location of all other services
  • Maintains an index of objects
  • Stores the policy for objects
  • Stores the user and system metadata
  • Responds to all I/O requests
  • Writes to physical disk
  • Manages background replication tasks
2. DATA AND METADATA
Atmos stores content as objects, and divides objects into two parts: metadata and user data.
  • Metadata, which is further divided into:
      • System metadata – This includes filename, file size, modification date, creation date, access-control lists, and object ID (“OID”).
      • User metadata – This comprises arbitrary, custom name-value pairs. Examples of user metadata are artist name (for music data) and customer type.
  • User data – This is application data, such as image files, text files, videos, and audio files.
Every object in Atmos has information associated with it that includes an object ID, system and user metadata, Atmos object layout information, and parent/child information (for objects saved through file system interfaces).
Atmos uses metadata to provide greater context for the user data. User metadata can be used to logically group objects, and data management policies can then be applied to these logical groupings. System metadata can also be used to trigger policies based on MIME type and similar system attributes, but user metadata gives the end user and the end-user application greater control in grouping objects by more abstract concepts such as user type (e.g. objects associated with a new user of a web application).
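As a rough illustration of how metadata can drive policy, the sketch below models an object with system metadata, user metadata, and user data, and picks the first policy whose selector matches. The class names, the `selector` and `replicas` fields, and the `select_policy` function are all hypothetical; they show the grouping idea only and do not reflect how Atmos implements its policy engine.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AtmosObject:
    """Illustrative object model: system metadata, user metadata, user data."""
    object_id: str
    system_metadata: Dict[str, str]
    user_metadata: Dict[str, str] = field(default_factory=dict)
    user_data: bytes = b""


@dataclass
class Policy:
    """Hypothetical policy that applies when every selector pair matches."""
    name: str
    selector: Dict[str, str]
    replicas: int

    def matches(self, obj: AtmosObject) -> bool:
        # Selectors may refer to system or user metadata; user values win on clash.
        merged = {**obj.system_metadata, **obj.user_metadata}
        return all(merged.get(k) == v for k, v in self.selector.items())


def select_policy(obj: AtmosObject, policies: List[Policy], default: Policy) -> Policy:
    """Return the first matching policy, or the default if none match."""
    for policy in policies:
        if policy.matches(obj):
            return policy
    return default


policies = [
    Policy("premium-customers", {"customer_type": "premium"}, replicas=3),
    Policy("audio-files", {"mime_type": "audio/mpeg"}, replicas=2),
]
default = Policy("default", {}, replicas=1)

song = AtmosObject(
    object_id="oid-0001",
    system_metadata={"mime_type": "audio/mpeg"},
    user_metadata={"artist": "Example Artist", "customer_type": "premium"},
)
print(select_policy(song, policies, default).name)  # premium-customers
```

Ordering the policy list from most to least specific is one simple way to resolve objects that match more than one selector.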

3. ATMOS BUILDING BLOCKS
The Atmos packaging consists of two elements: Atmos front-end nodes and Disk Array Enclosures (DAEs). As shown in the picture below, Atmos front-end nodes run the Atmos software, while the Atmos DAEs provide very dense, economical storage. Every node runs the Common Appliance Platform (“CAP”), an internal EMC Linux distribution based on a Red Hat kernel. The Atmos software is layered on top of CAP, and the whole is treated as a closed appliance model.
The picture below shows the 4 bundled physical solutions: the G3-FLEX-180, G3-FLEX-240/360, and G3-DENSE-480. Currently, each physical node is an x86 commodity server with 2 quad-core CPUs and 2 onboard NIC ports. The first NIC port, eth0, is connected to a private network through a bundled internal-only switch, which carries management, PXE, and IPMI traffic between the nodes. There are also two 10G ports, which can be connected to a single 10G switch or to dual 10G switches (HA configuration) for external network access (I/O). Each node is connected via a serial-attached SCSI (SAS) cable to a disk array enclosure (DAE). Depending on the model, one or two servers may connect to a single DAE, and each DAE may be populated with 30 or 60 SATA drives. The available drive capacities are 1TB, 2TB, 4TB, and 6TB, all at 7200 RPM spindle speed.
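The drive counts and capacities above translate into raw capacity per DAE by simple multiplication. The sketch below tabulates every combination; it assumes decimal terabytes and reports raw capacity only, before any GeoMirror or GeoParity protection overhead, and it does not claim that every combination is an orderable configuration. The function name `raw_capacity_tb` is introduced here for illustration.

```python
from itertools import product

DRIVES_PER_DAE = (30, 60)        # drive counts per enclosure, from the text
DRIVE_CAPACITY_TB = (1, 2, 4, 6)  # drive sizes in TB, from the text


def raw_capacity_tb(drives: int, capacity_tb: int) -> int:
    """Raw (unprotected) capacity of one DAE in TB."""
    return drives * capacity_tb


for drives, capacity in product(DRIVES_PER_DAE, DRIVE_CAPACITY_TB):
    total = raw_capacity_tb(drives, capacity)
    print(f"{drives} x {capacity}TB drives = {total}TB raw per DAE")
```

For example, a fully populated 60-drive DAE with 6TB drives holds 360TB raw; usable capacity will be lower once a protection policy is applied.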
4. Docs





