Hi all, my name is Jeremy Hallum, I work up in NCRC as a Biomedical Computation Research Consultant/Facilitator. I'd like to share with everyone my experiences at BioIT World in Boston this year.
Introduction
BioIT World is a conference and expo that describes itself as “a place that provides the perfect venue to share information and discuss enabling technologies that are driving biomedical research and the drug development process.”
The conference covers a wide variety of topics in biomedical research, from hard IT topics like infrastructure, software, data security, and cloud computing to science and scientific tools in bioinformatics, next-gen sequencing informatics, clinical and traditional informatics, and clinical genomics. It is held at the World Trade Center in Boston every April; this year it ran April 21-23.
This is the third year that I have attended the conference, and this year I concentrated on research IT infrastructure, cloud computing, and a bit of the Data Security track. Between this conference and the Supercomputing Conference, it’s pretty easy to stay on top of what colleagues and vendors are providing researchers at other institutions, both corporate and educational, through both formal lecture sessions and the product expo floor.
The overriding themes of the conference this year were:
- the increasing maturity of the private cloud framework,
- the emerging ability to build converged (or “hyper-converged”) architecture solutions via offerings such as OpenStack, SwiftStack, VMware’s EVO:RAIL, or HP’s Goliath, and
- the maturity of object storage frameworks and the ability to transition from a private object storage pool to either EC2 or Google storage.
Sessions
Some highlights from the sessions I attended:
- Everything now has an API on the back end. If you can’t massively automate your operational environment, you are behind the curve.
- Hyperconverged architectures are well on their way to being a reality.
- Keep an eye on high-performance networking. The Mellanox ConnectX-4 adapter includes both FDR and EDR InfiniBand ports and 10, 40, and 100 Gbit Ethernet on the same card!
- Object storage is the future: data will be accessible via software, and a rich catalog of metadata can be stored with each object to identify it and its qualities.
- Standard "building blocks" of storage, networking, and compute are key to keeping your environment easy to maintain and easy to upgrade.
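The object-storage points above (everything behind an API, a rich catalog of metadata stored with each object) can be sketched with a toy in-memory store. All class and method names here are hypothetical, purely for illustration; real systems like Swift or S3 expose similar put/get/search operations over HTTP APIs:

```python
# Toy sketch of the object-storage model: a flat namespace of keys,
# each object carrying arbitrary metadata that can be searched.
class ObjectStore:
    def __init__(self):
        self._objects = {}  # key -> (data bytes, metadata dict)

    def put(self, key, data, **metadata):
        """Store an object along with its descriptive metadata."""
        self._objects[key] = (data, metadata)

    def get(self, key):
        """Retrieve object data by key."""
        return self._objects[key][0]

    def find(self, **criteria):
        """Return keys whose metadata matches all given criteria."""
        return [k for k, (_, md) in self._objects.items()
                if all(md.get(f) == v for f, v in criteria.items())]

store = ObjectStore()
store.put("run42/sample1.bam", b"...", sequencer="HiSeq", project="glioma")
store.put("run43/sample2.bam", b"...", sequencer="NovaSeq", project="glioma")
print(store.find(project="glioma", sequencer="HiSeq"))  # ['run42/sample1.bam']
```

The point of the sketch is that finding data becomes a metadata query against the software interface rather than a crawl through a directory tree.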
Another highlight came from Dirk Petersen of the Fred Hutchinson Cancer Research Center in Seattle. His team has successfully deployed an object-based backup and economy file storage system using SwiftStack. It is fairly easy to use, with both command-line and GUI interfaces for moving data to a higher tier. So instead of mounting the data store on a server, a user copies the object to or from the cluster or compute server for analysis or storage. The team also tried a data transfer service that emulates an NFS/CIFS mount on top of the object store for researchers transferring data, but it proved up to 10x slower, so they deprecated that service after the first year.
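The copy-based workflow Petersen's team landed on — stage an object to local scratch, compute, then write results back, rather than mounting the store — might look roughly like this. The helper names are hypothetical, and a plain dict stands in for the object store; a real deployment would use the Swift or S3 client libraries:

```python
import os
import tempfile

# A dict standing in for a Swift container: key -> object bytes.
remote_store = {"raw/reads.fastq": b"ACGTACGT"}

def stage_in(store, key, scratch_dir):
    """Copy an object from the store down to local scratch; return the local path."""
    path = os.path.join(scratch_dir, os.path.basename(key))
    with open(path, "wb") as f:
        f.write(store[key])
    return path

def stage_out(store, key, path):
    """Copy a local result file back up into the object store."""
    with open(path, "rb") as f:
        store[key] = f.read()

with tempfile.TemporaryDirectory() as scratch:
    local = stage_in(remote_store, "raw/reads.fastq", scratch)
    # "Analysis" step: counting bases stands in for a real pipeline.
    with open(local) as f:
        result = str(len(f.read()))
    out = os.path.join(scratch, "counts.txt")
    with open(out, "w") as f:
        f.write(result)
    stage_out(remote_store, "results/counts.txt", out)

print(remote_store["results/counts.txt"])  # b'8'
```

The explicit copy keeps the analysis on fast local disk and avoids the overhead of emulating a POSIX mount on top of an object store, which is where the 10x slowdown came from.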
The third highlight session was an overview of the Broad Institute’s environment by Chris Dwan, Director of Research Computing.
He presented the direction they are heading: a more automated, private cloud environment, starting strictly from a compute perspective (leaving storage and SDN out of the picture). This gives them the local flexibility to hand people VMs to do their work, but also allows them to burst research workloads out of the private cloud and into Amazon's or Google's environments.
He also spoke of changes in their storage environment, where they use Avere's caching technology to connect their edge file storage with Amazon, creating something that looks like an exabyte-scale filesystem. They are also building a multi-PB object store for long-term project storage, and they are developing a series of internal standards for their data so that they can find things quickly and easily in their environments, especially given their massive intake of data (~100 TB a week from 60+ sequencers!).
Other sessions I attended were (if you have any questions about any of these sessions, just contact me):
- Introduction to EVO:RAIL by VMware
- Comparisons of Storage Efficiencies through Hadoop
- Rapid Integration of Cancer Genomics data through Hadoop and Cloudera’s Impala
- Accelerating Biomedical Research Discovery: The 100G Internet2 Network – Built and Engineered for the Most Demanding Big Data Science Collaborations
- Managing Genomic Data at Scale! - Rules Based Intelligent Data Management
- Beyond Parallel Filesystems: NVMe Storage for Genomics Workflows
- The Expanding Face of Meta Data
- Intelligent Infrastructure Approaches for Emerging Life Sciences Data Management Issues at Scale
- How Next Generation Scale-Out Storage Fuels Breakthroughs in Life Sciences
- Start Small, Collaborate Often, Grow Big – Scaling NGS Compute and Storage Solutions for Personalized Medicine
- Out of the Trenches and Into the Future: Mixing File and Object Storage Architectures
- Breaking the $1,000 Genome Sequencing Barrier with Object Storage
- OpenFDA: IT and Informatics Innovation at the FDA
- Global Developments in Privacy and Data Security Law
Vendors
On the vendor floor, even most of the traditional vendors were touting their object storage capabilities alongside their traditional block storage. There were two object-storage-only vendors on the floor, SwiftStack and Cleversafe, and then a host of storage vendors that provided either both, or a way to transition object storage to block storage: DDN, Qumulo, Avere, EMC/Isilon, HP, Thinkmate/SuperMicro, Dell, and IBM.
Other than noting some interesting conversations I had with several vendors (particularly Cleversafe and Avere), I won’t go into much detail on vendor offerings here. Please feel free to peruse the vendor floor plan and contact me if you have any questions about a particular vendor’s offerings.
Conclusion
I think the big takeaway is this: we are seeing the beginning of a transition from local, decentralized resources to private cloud infrastructure for researchers, along with the development of more specialized private clouds for researchers with HPC-specific needs, especially at research-focused institutes that trend toward the bleeding edge of HPC. Over time, this technology will move from the research setting into more standard services; it may be only a matter of time now. Because of this, it’s time to start looking into these transitional services, see what makes sense in our environment, and work with our partners to craft a service that will meet the needs of Medical School researchers.
As always, if you have any questions about the conference, don't hesitate to contact me.