ThoughtWorks

Rise of NVMe Storage

Rajesh Tamhane

Published: Jul 22, 2020

The biggest explosion in the history of the universe

Forty kilometers to the north of Pune in western India lies the Giant Metrewave Radio Telescope (GMRT), staring into the sky in multiple frequency bands. It is not just one telescope but an array of thirty 45-meter-wide parabolic radio antennae. Scientists at the National Center for Radio Astrophysics in Pune and around the world look through this metal looking glass, searching for the secrets of the universe. How do galaxies form? What makes pulsars pulse? How exactly do supernovae explode? And closer to home, they look to the sun to understand nano-solar winds, among a myriad of other questions.

On a hot August day in 2018, GMRT spotted something – something instrumental in the discovery of the farthest galaxy known to humans. And, more recently, on another hot day in February 2020, GMRT was used to observe one of the biggest explosions in the history of the universe - the Ophiuchus Supercluster explosion.

Giant Metrewave Radio Telescope (GMRT), Pune

Looking through metal

How do scientists 'look' through GMRT? It begins with the radio antennae 'listening' in specific radio-frequency bands, from 50 MHz to 1390 MHz. Each antenna provides two analog output signals that are 180 degrees out of phase. The signals pass through an analog-to-digital converter, and the digitized data is streamed as UDP packets to a storage device. At a clock frequency of 800 MHz, a Field Programmable Gate Array (FPGA) or a programmable CPU streams the output UDP packets at a rate of 1600 MBps; at 1000 MHz the data rate is 1900 MBps. A single hour of observation will eventually generate a data volume of 7.2 TB.
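The relationship between stream rate and hourly data volume can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 1,000,000 MB) and a perfectly sustained stream; the function name `volume_tb` is illustrative and not part of any GMRT software.

```python
# Rough data-volume arithmetic for a continuous stream.
# Assumption: decimal units, 1 TB = 1,000,000 MB.

SECONDS_PER_HOUR = 3600


def volume_tb(rate_mb_per_s: float, seconds: float = SECONDS_PER_HOUR) -> float:
    """Return the data volume in TB for a sustained rate given in MB per second."""
    return rate_mb_per_s * seconds / 1_000_000


for rate in (1600, 1900, 2000):
    print(f"{rate} MBps for one hour -> {volume_tb(rate):.2f} TB")
```

Under these assumptions, 7.2 TB per hour corresponds to a sustained rate of about 2000 MBps, while 1900 MBps works out to roughly 6.84 TB per hour.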

This data is then written to disk on a Dell PowerEdge T620 equipped with dual Xeon processors, 64 GB of RAM, 2x dual 10G Ethernet adapters and 17x 6 TB SAS HDDs configured as a single RAID 0 volume. The 17 SAS HDDs are there solely to meet the write data rate of 1.9 GBps.

Scientists run their analytical algorithms on this data and 'look' at the sky through metal.

Overcoming data indigestion

While ingesting data at that rate (1.9 GBps) through 17 magnetic disks with moving parts worked, it caused trouble, from drive failures to packet loss.

This article describes how we used commodity hardware and a new type of storage to meet GMRT’s data velocity challenge.

A brief walk through memory and CPU lanes

I built my first PC in 1994. Bill Clinton was the US President and Michael Jordan had temporarily retired from basketball. My PC boasted a 486DX4 Intel CPU running at 100 MHz, 16 MB of RAM and 250 MB of space on the hard disk drive. At the time, this was the state of the art in personal computing. Today, we carry far more compute power in our mobile phones. Until recently, Moore’s law has held, and the number of transistors on a chip has doubled every couple of years. The Dell T620 at GMRT has 2 Xeon processors that run at 3 GHz and are equipped with 16 cores. That is a 30x increase in CPU clock speed and an even larger increase in performance. The clock speeds of commodity CPUs are approaching 5 GHz and those of memory have already crossed 4000 MHz.

During this period, however, the data transfer speeds of storage devices have only increased from 133 MBps to 600 MBps. That is a mere 4.5x increase in over two decades.

The express lane

A new type of flash storage called Non-Volatile Memory Express (NVMe) is closing the chasm between memory and storage speeds. At 3 GBps, it is 5x faster than a SATA SSD and 25x faster than traditional HDDs. To be precise, NVMe is a protocol used by new-generation NAND-based storage devices. It runs on the PCIe bus, which is one of the reasons it is blazingly fast.

So, when we designed the storage pods for our data-intensive computing cluster, NVMes were the go-to choice for storage. We put together several pods using AMD Ryzen CPUs and NVMe flash storage as part of the cluster. In our FIO benchmarks, a single consumer-grade NVMe delivered random read-write throughput of 3 GBps. That is 3 gigabytes per second. Combining 3 of the NVMes into a single volume using RAID 0 resulted in transfer speeds of close to 10 GBps.
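To illustrate why RAID 0 scales throughput, here is a minimal sketch of how striping maps a logical byte offset onto member disks, together with a naive upper-bound estimate. The 128 KiB stripe size and the function names are assumptions for illustration; they are not taken from the configuration used in the tests.

```python
from typing import Tuple

STRIPE_SIZE = 128 * 1024  # assumed stripe unit in bytes (hypothetical value)


def locate(offset: int, num_disks: int, stripe_size: int = STRIPE_SIZE) -> Tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    stripe_index = offset // stripe_size   # which stripe unit, counted across the volume
    disk = stripe_index % num_disks        # stripe units rotate across the disks
    row = stripe_index // num_disks        # complete rows of stripes before this one
    return disk, row * stripe_size + offset % stripe_size


def ideal_throughput(per_disk_gbps: float, num_disks: int) -> float:
    """Naive upper bound: every disk streams at full speed in parallel."""
    return per_disk_gbps * num_disks


# Consecutive stripe units land on consecutive disks, so a large
# sequential write keeps all three drives busy at once.
for unit in range(4):
    print(unit, locate(unit * STRIPE_SIZE, num_disks=3))

print(ideal_throughput(3.0, 3), "GBps naive estimate for three drives")
```

The estimate ignores controller, bus and file system effects, so measured figures can differ from it in either direction.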

Rubber meets the road

We took the findings to the National Center for Radio Astrophysics (NCRA) in Pune and they offered to let us test the storage pod at their observatory. 

We ran 4 tests to determine whether the NVMes could match the write performance of the existing storage configuration. These tests are a more realistic representation of the real-world write performance of NVMes: the FIO tests wrote data from memory to the disk, while in these tests the data was written from the network interface card to a ring buffer and then copied to the disk.

The results showed that a single NVMe was able to sustain a write speed of 1.6 GBps without filling up the buffer, but started to drop packets when the data rate was increased to 1.7 GBps. In a RAID 0 configuration with 3 NVMes, no bottlenecks were observed even at 1.9 GBps.
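The behavior described above, where a buffer absorbs data until the disk can no longer keep up, can be illustrated with a toy model of a bounded buffer. This is a simplified sketch and not the acquisition software used in the tests: the drain rate of 1650 MBps and the 4096 MB buffer size are hypothetical values, chosen only so that the disk sits between the two ingest rates mentioned.

```python
def simulate(ingest_mb_s: float, drain_mb_s: float,
             buffer_mb: float, seconds: int) -> float:
    """Return the MB dropped when a bounded buffer is filled at ingest_mb_s
    and emptied at drain_mb_s, stepping one second at a time."""
    fill = 0.0
    dropped = 0.0
    for _ in range(seconds):
        fill += ingest_mb_s              # packets arrive from the network
        fill -= min(fill, drain_mb_s)    # the writer copies data to disk
        if fill > buffer_mb:             # overflow: the excess is lost
            dropped += fill - buffer_mb
            fill = buffer_mb
    return dropped


DRAIN = 1650    # hypothetical sustained disk write rate in MBps
BUFFER = 4096   # hypothetical buffer size in MB

for ingest in (1600, 1700):
    lost = simulate(ingest, DRAIN, BUFFER, seconds=600)
    print(f"ingest {ingest} MBps -> {lost:.0f} MB dropped in 10 minutes")
```

In this model a larger buffer only delays the first drop. Once the ingest rate exceeds the sustained write rate, loss is inevitable, which is why adding drives in RAID 0 helps where a bigger buffer would not.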

Lessons learned

NVMes let you create low- to medium-density storage using commodity hardware at a very attractive price point. This can be useful in computing applications that are read-write intensive and deal with large file sizes. By choosing an NVMe SSD with an appropriate TBW (terabytes written) rating, a data acquisition system could be built with both lower cost and lower power consumption. (Side note: the NVMes used for the test were rated at 10 W.)
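To see why the TBW rating matters for a data acquisition workload, the following sketch estimates how many hours of recording it takes to reach a drive's rated endurance. The 1200 TB rating is a hypothetical figure for illustration, not the rating of the drives used in the test, and the model assumes writes are spread evenly across the drives in a RAID 0 volume.

```python
def endurance_hours(tbw_rating_tb: float, write_rate_mb_s: float,
                    duty_cycle: float = 1.0) -> float:
    """Hours until the rated TBW is reached at a given per-drive write rate.

    duty_cycle is the fraction of wall-clock time spent recording.
    """
    tb_per_hour = write_rate_mb_s * 3600 / 1_000_000 * duty_cycle
    return tbw_rating_tb / tb_per_hour


TBW = 1200.0   # hypothetical endurance rating in TB
STREAM = 1900  # stream rate in MBps, from the article

single = endurance_hours(TBW, STREAM)
striped = endurance_hours(TBW, STREAM / 3)  # three drives share the writes
print(f"single drive: {single:.0f} hours of recording")
print(f"each of 3 striped drives: {striped:.0f} hours of recording")
```

Striping across more drives therefore extends the life of each drive as well as raising throughput, because every drive receives only a fraction of the stream.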

The CPU architecture is important when designing storage nodes with NVMes. The number of PCIe lanes limits the storage node's density. Hardware RAID can free up the CPU by offloading some of the work needed to manage the RAID volume.
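The lane constraint can be expressed as a simple budget. This sketch assumes each NVMe drive uses a four-lane (x4) PCIe link, which is typical for NVMe SSDs; the lane counts in the examples are hypothetical and do not describe the CPUs used in the pods.

```python
def max_nvme_drives(cpu_lanes: int, reserved_lanes: int,
                    lanes_per_drive: int = 4) -> int:
    """Number of NVMe drives that fit once lanes for other devices
    (network cards, GPUs and so on) have been set aside."""
    usable = max(cpu_lanes - reserved_lanes, 0)
    return usable // lanes_per_drive


# Hypothetical desktop-class CPU: 24 lanes, 8 reserved for a network card.
print(max_nvme_drives(cpu_lanes=24, reserved_lanes=8))

# Hypothetical server-class CPU: 128 lanes, 16 reserved for network cards.
print(max_nvme_drives(cpu_lanes=128, reserved_lanes=16))
```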

Points to ponder over

Persistent storage is closing the gap with volatile memory. With the arrival of NVDIMM (Non-Volatile Dual In-line Memory Module), this boundary may disappear completely. We are already seeing high-bandwidth memory affect the performance of data-intensive applications. Data-intensive applications, databases, data structures and algorithms have all factored in the latency that has existed with persistent storage. How will these disappearing boundaries affect the way our algorithms are written and the way our database engines are designed?

Acknowledgements

I would like to thank Dr. Yashwant Gupta, Director, National Center for Radio Astrophysics, whose encouragement and keen questioning helped us examine our assumptions and explore further. This work has been a team effort with heavy lifting from Saurabh Mookherjee and Swapnil Khandekar. Saurabh’s deep systems experience has been crucial in architecting the compute cluster, while Swapnil’s ability to navigate across systems and code made trivial work of some of the hardest problems. My colleagues Chhaya Yadav and Prasanna Pendse have been instrumental in getting this article into shape. And finally, thanks to Harshal Hayatnagarkar, who introduced us to the 4th paradigm of computing and started us on this journey.

Further reading

There is a lot more to this journey and this paper captures the experiment in comprehensive detail.
