Jul 23, 2015

Cisco - UCS - Invicta - Invicta OS: A Strongly Differentiated Approach

Data Management Layers
Cisco UCS Invicta appliances and nodes both carefully manage data to optimize write operations for the flash devices. The OS running on both the standalone Cisco UCS Invicta C3124SA Appliance and the nodes in the Cisco UCS Invicta Scaling System optimizes data before handing it down to the drive level. This process uses dedicated computing and memory resources that are separate from those on the flash drives.
Cisco’s approach efficiently writes full erase block-aligned stripes to the flash drives. The OS works with the flash drive’s existing flash translation layer (FTL) and gives the drives only optimized data. The Cisco UCS Invicta OS always strives for determinism, requiring the drive’s FTL to perform less work. Figure 3 and the sections that follow describe how the Cisco UCS Invicta OS works at different data management layers.
  • Data is protected: As data is written to the system, the most recent 1 GB of inbound data is continuously held in the data protection buffer. The buffer is backed by a supercapacitor to protect against a power outage.
  • Data is organized: At the node or appliance level, the block translation layer (BTL) organizes data into long linear write blocks. Currently these write blocks are 22 MB in size and are stored in the appliance or node memory while being organized. An individual hash tag is also created for every 4-KB element. Metadata is stored in memory and appended to the write block in both the header and footer. The write blocks are then flushed to the RAID layer.

  • Data is spread across the drives: Also at the node or appliance level, the RAID layer provides RAID 6 distribution of data. The RAID layer splits the 22-MB write block into 22 pieces, each 1 MB in size; this size corresponds to the optimal size for the current drives used in the Cisco UCS Invicta Series. The RAID layer then sends write segments to 22 of the 24 drives, and parity segments are written to the other two drives (see the sketch after this list).
  • Data is written to all drives: The flash device is addressed as a block device, in terms of logical block addresses (LBAs). The FTL resides at the drive level and converts LBAs to actual NAND locations within each drive. The FTL also tracks used and free pages and erase blocks.
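
To make the layering above concrete, the following Python sketch packs 4-KB elements into a 22-MB write block, hashes each element, carries the hashes as header and footer metadata, and then stripes the block into 22 one-MB data segments plus parity. The sizes come from the description above; the function names, the SHA-1 hash, and the single XOR parity (real RAID 6 also writes a second, independent Reed-Solomon syndrome) are illustrative assumptions, not details of the Cisco UCS Invicta OS.

import hashlib

# Sizes taken from the description above: 4-KB elements, 22-MB write blocks,
# 1-MB RAID segments, 24 drives (22 data + 2 parity).
ELEMENT_SIZE = 4 * 1024
SEGMENT_SIZE = 1024 * 1024
DATA_SEGMENTS = 22
WRITE_BLOCK_SIZE = DATA_SEGMENTS * SEGMENT_SIZE        # 22 MB

def build_write_block(elements):
    # Pack 4-KB elements into one write block and hash each element. The
    # per-element hashes stand in for the individual hash tags the text
    # mentions; keeping them in both header and footer metadata is modeled
    # here simply as two copies of the same list.
    assert len(elements) * ELEMENT_SIZE == WRITE_BLOCK_SIZE
    hashes = [hashlib.sha1(e).hexdigest() for e in elements]
    block = b"".join(elements)
    metadata = {"header_hashes": hashes, "footer_hashes": list(hashes)}
    return block, metadata

def stripe_block(block):
    # Split the 22-MB write block into 22 x 1-MB data segments and compute
    # an XOR (P) parity segment. Real RAID 6 also writes a second,
    # independent Reed-Solomon syndrome, which this sketch omits.
    segments = [block[i * SEGMENT_SIZE:(i + 1) * SEGMENT_SIZE]
                for i in range(DATA_SEGMENTS)]
    parity = 0
    for seg in segments:
        parity ^= int.from_bytes(seg, "little")
    return segments, parity.to_bytes(SEGMENT_SIZE, "little")

# Fill a write block with dummy 4-KB elements and stripe it.
elements = [bytes([i % 256]) * ELEMENT_SIZE
            for i in range(WRITE_BLOCK_SIZE // ELEMENT_SIZE)]
block, metadata = build_write_block(elements)
segments, p_parity = stripe_block(block)
print(f"{len(segments)} x 1-MB data segments + 1-MB parity; "
      f"{len(metadata['header_hashes'])} element hashes")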

By performing long linear write operations, the Cisco UCS Invicta OS helps ensure that the flash drive FTL doesn’t have to work hard to sort data. Because write operations are performed in full write segments only, writing actually takes less time, improving write performance overall. In addition, with full write segments, the drive can see that it will overwrite all the LBAs in a specific region. In turn, the drive knows that it can mark the entire region as invalid and pre-erase it, placing the erase blocks on its free block list.
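
A toy drive-level FTL helps illustrate the point. When the host overwrites every LBA in an erase-block-sized region with one full, aligned write segment, all of the old erase block's pages become invalid at once, so the block can go straight onto the pre-erase free list with no drive-level garbage collection. The ToyFTL class, its 256-LBA erase-block geometry, and the bookkeeping are assumptions made for this sketch, not the behavior of any particular drive.

from collections import defaultdict

LBAS_PER_ERASE_BLOCK = 256        # assumed geometry for this sketch

class ToyFTL:
    # Toy drive-level FTL: maps LBAs to (erase block, page) locations and
    # tracks how many valid pages each erase block still holds.
    def __init__(self, num_blocks):
        self.map = {}                                  # LBA -> (block, page)
        self.valid_pages = defaultdict(int)            # block -> valid pages
        self.free_blocks = list(range(num_blocks))     # pre-erased blocks
        self.current = self.free_blocks.pop(0)
        self.next_page = 0

    def _place(self, lba):
        # Append the new page to the currently open erase block.
        if self.next_page == LBAS_PER_ERASE_BLOCK:
            self.current = self.free_blocks.pop(0)
            self.next_page = 0
        self.map[lba] = (self.current, self.next_page)
        self.valid_pages[self.current] += 1
        self.next_page += 1

    def write_segment(self, start_lba, count):
        # Handle a full, aligned write segment arriving from the BTL.
        reclaimed = []
        for lba in range(start_lba, start_lba + count):
            old = self.map.get(lba)
            if old is not None:                        # invalidate old page
                block, _ = old
                self.valid_pages[block] -= 1
                if self.valid_pages[block] == 0:
                    # Every page of the old block is now invalid: it can be
                    # pre-erased and reused without any garbage collection.
                    reclaimed.append(block)
                    self.free_blocks.append(block)
            self._place(lba)
        return reclaimed

ftl = ToyFTL(num_blocks=64)
ftl.write_segment(0, 1024)           # initial fill of LBAs 0-1023
freed = ftl.write_segment(0, 1024)   # full-segment overwrite of the same LBAs
print("erase blocks reclaimed without drive-level GC:", freed)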

Software-Defined Approach to Data and Flash-Memory Management
Cisco's approach is software defined, an innovation that generates considerable power and flexibility and lets Cisco UCS Invicta Series systems rely less on the capabilities of solid-state devices. This software-defined approach allows the Cisco UCS Invicta OS to intelligently manage write operations. As depicted in Figure 4, the process shifts critical tasks from the flash devices to the Cisco UCS Invicta appliances and nodes and reduces the number of actual write operations that take place on the flash media itself.
 

  • Log-structured translation layer: As with database technology, the Cisco UCS Invicta OS structures writes to the media in a linear chain, or log structure, allowing the FTL to simplify its own NAND management functions. This approach allows the drive to maintain media performance even at high fill levels.

  • Linear updates: The Cisco UCS Invicta OS also performs overwrite operations in a log-structured manner. Full segments are written in line with the new data, and never less than an entire write segment is written. This approach allows the FTL to invalidate outdated information all at once, reducing reliance on large-scale device-level overprovisioning.
  • Deletions in metadata: Like updates, deletions are performed in metadata, with the original data left in place on the flash device. The metadata structure is then updated to show that the data should no longer be referenced, and that the location on the flash device is eligible for garbage collection. This approach avoids unnecessary erase cycles, leading to better performance (as sketched below).
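
The sketch below models the host-side bookkeeping the three bullets describe: every overwrite appends a full segment to the head of a linear log, and deletions touch only metadata, leaving the stale data on the media as a garbage-collection candidate. The LogStructuredStore class and its 16-LBA segment size are hypothetical stand-ins, not the actual Cisco UCS Invicta OS data structures.

SEGMENT_LBAS = 16           # assumed write-segment size (in LBAs) for the sketch

class LogStructuredStore:
    # Host-side log-structured bookkeeping: data lives in a linear chain of
    # full segments; the mapping and the dead-segment set are pure metadata.
    def __init__(self):
        self.log = []       # the linear chain of written segments
        self.live = {}      # logical segment id -> position in the log
        self.dead = set()   # log positions eligible for garbage collection

    def write(self, seg_id, data):
        # Overwrite = append a full new segment and retire the old copy.
        assert len(data) == SEGMENT_LBAS, "never less than a full segment"
        old = self.live.get(seg_id)
        if old is not None:
            self.dead.add(old)          # invalidate the stale copy in one step
        self.live[seg_id] = len(self.log)
        self.log.append(data)           # linear, in-line placement

    def delete(self, seg_id):
        # Deletion happens purely in metadata; the media is left untouched.
        self.dead.add(self.live.pop(seg_id))

store = LogStructuredStore()
store.write("A", list(range(16)))
store.write("A", list(range(16, 32)))   # linear update: old copy marked dead
store.delete("A")                       # metadata-only delete
print("segments on media:", len(store.log), "dead positions:", sorted(store.dead))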


Virtual Garbage Collection
Log-structured write operations, linear updates, and deletions in metadata all serve to optimize write performance and reduce the actual write operations to the flash devices. As far as the device is concerned, the write process simply continues, and the FTL can place data linearly on the media. Eventually, however, garbage collection is required as the BTL runs out of free blocks on the media and overwriting begins to occur.


Cisco UCS Invicta appliances and nodes run their own virtual garbage collection, which is engaged when the appliance or node is at approximately 70 percent capacity. Some free blocks on the flash device are reserved for virtual garbage collection (vGC) purposes. However, the process uses dedicated CPU and memory, not flash drive resources.
As shown in Figure 5, the process begins by determining how much of each write block is still valid and then ranking the write blocks based on this information. The blocks with the most invalid data are targeted for virtual garbage collection, so the least amount of valid information has to be moved. The system then determines how much valid information needs to be moved to create a block of free space. That minimal amount of information is written in line to reserved garbage collection blocks, freeing the old blocks for rewrite. Invalidated blocks are noted in metadata and are ready to be reclaimed with a single overwrite. With Cisco UCS Invicta OS virtual garbage collection, there is no need to run the ATA TRIM command, which indicates to the drive that the space is free. Instead, the drive can infer that the space is free because the LBAs have already been overwritten by a full write segment as commanded by the BTL. The controller can also pre-erase blocks to enhance performance based on this behavior. Moreover, as a result of Cisco UCS Invicta OS virtual garbage collection, drive-level garbage collection has less work to perform.
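
A greedy selection pass of the kind described above might look like the following sketch: rank write blocks by how much of each is still valid, collect the ones with the most invalid data first so that the least valid data has to be relocated, and engage only past the roughly 70 percent capacity threshold. The function and block names and the way reclaimable space is counted are assumptions for illustration only.

BLOCK_SIZE = 22 * 1024 * 1024       # one 22-MB write block
GC_TRIGGER = 0.70                   # vGC engages near 70 percent capacity

def select_victims(valid_bytes, space_needed):
    # valid_bytes: dict mapping write-block id -> bytes still valid in it.
    # Rank blocks by valid data (least valid = most invalid first) and keep
    # collecting until enough invalid space can be reclaimed.
    ranked = sorted(valid_bytes, key=lambda blk: valid_bytes[blk])
    victims, reclaimable, to_move = [], 0, 0
    for blk in ranked:
        if reclaimable >= space_needed:
            break
        victims.append(blk)
        to_move += valid_bytes[blk]                   # valid data rewritten in line
        reclaimable += BLOCK_SIZE - valid_bytes[blk]  # invalid space freed
    return victims, to_move

def run_vgc(valid_bytes, capacity_used, total_capacity, space_needed):
    # Engage virtual garbage collection only past the ~70 percent threshold.
    if capacity_used / total_capacity < GC_TRIGGER:
        return [], 0
    return select_victims(valid_bytes, space_needed)

# Example: four write blocks with differing amounts of valid data.
usage = {"blk0": 2 * 1024 * 1024, "blk1": 20 * 1024 * 1024,
         "blk2": 6 * 1024 * 1024, "blk3": 11 * 1024 * 1024}
victims, moved = run_vgc(usage, capacity_used=75, total_capacity=100,
                         space_needed=30 * 1024 * 1024)
print("collect:", victims, "- valid MB to relocate:", moved // (1024 * 1024))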

