Wednesday, October 10, 2018

Linux down the Memory Lane

This is the story not just of Linux down my memory lane, but also of why I fell for Linux and have remained a devout follower ever since. As folks now know, the reasons Linux is preferred these days are largely technical; for me, besides it being technically the better OS, the reason is also nostalgia.

The early 90s were a fascinating era for me. Let’s begin in 1991, now fondly known as the year of the Internet. The PC-XT (8088) and the higher-end PC-AT (80286) were just about proliferating in workplaces and some homes. Intel 80386-based systems were still uncommon. I had joined for my PhD in computer vision, also in 1991. At IIT Bombay, we had only a central computer centre (with the Cray X-MP supercomputer), from where a 64 kbps VSAT link connected us to the rest of the world for Internet access. My department (electrical engineering) wasn’t even on the local network. In fact, none of the departments were, except possibly the computer science department. I met two like-minded guys in my lab and we all started spending endless hours improving the infrastructure for the joy of doing it. First, we laid Ethernet cable from the computer centre to our department, and then set up our department server connected to the computer centre, so that we could log in to the supercomputer while physically being in our department. We learnt about Ethernet, TCP/IP, networking and routing all on the job, without attending any course!

Now that the “comfort” of accessing the Internet from the luxury of our own lab had been achieved, one of my colleagues started looking out for more stuff, and he found out about this guy called Linus Torvalds in Helsinki. While studying computer science at the University of Helsinki, Linus had begun a project that later became the Linux OS. His reasons, too, were similar to ours. In those days, a commercial UNIX operating system for Intel 386 PCs was too expensive for private users. So he wanted to build a free OS that could make the most of the 80386-based PCs of the time. He apparently once said that if either the GNU kernel or 386BSD had existed then, he might never have written his own.
Linus called his creation “Freax” (for “free”, “freak” and “x” for Unix; it later became Linux). He developed his OS on the MINIX system, for which free source code existed at that time. The MINIX source code had been released by Andrew Tanenbaum in his book “Operating Systems: Design and Implementation”. The reason Freax had to be invented, Linus argued, was that the 16-bit design of MINIX was not well adapted to the 32-bit features of the 80386-based computer architecture.

Linus announced the first version of Linux on 25th August 1991. Probably the only installation of Linux 0.01 in the world other than Linus’s own was in our lab, and I still have the source code of the first ever Linux kernel! Since the 0.01 kernel, I have used pretty much every version released (especially in the earlier days) and remain an avid user of Linux to date. It has been fascinating to watch Linux grow as I grew up.

Back in 1991, there was no Ubuntu, Red Hat or any other distribution of Linux available. The closest thing was H J Lu’s boot/root floppies: 5.25” 1.2 MB diskettes that could be used to boot a system into Linux. One booted from the boot disk and then, when prompted, inserted the root disk, and after a while one would get the prompt. Back in those days, if one wanted to boot from the hard disk, one had to use a hex editor on the master boot record of the disk, and that wasn’t for the faint-hearted! These were the days when we could predict the life of a hard disk just by listening to the sounds it made!

This was all before a real distribution came into existence. The first such thing was MCC Interim Linux (from the Manchester Computing Centre). This was still console-only Linux, with no X. Shortly after, there was a release from Texas A&M University called TAMU 1.0A, the first distribution that let one run X. The first polished distro was Yggdrasil: one could boot from a floppy and run everything else from the CD, the equivalent of today’s Live CD. Few folks know this was in the days of 1x and 2x CD-ROM drives. The distributions that followed were SLS Linux, SuSE, Debian and Slackware. Then there was SCO Linux, and after these came Red Hat and Ubuntu.

In 1992, hearing of the success of Linux, Andrew Tanenbaum wrote a Usenet article in the group comp.os.minix with the title “Linux is obsolete”. One should note that while Linus used MINIX for development, the design principles of his OS were diametrically opposite to those held by Andrew at the time and expounded in his book. Andrew’s primary reason for thinking Linux obsolete was that its kernel was monolithic and, in his view, old-fashioned. Tanenbaum predicted that Linux would soon be obsolete. The rest is history, as we know today where Linux is and where MINIX is, or for that matter GNU Hurd, whose microkernel philosophy Andrew championed.

Today, aggregate Linux server market revenues exceed those of the rest of the UNIX market. Google’s Linux-based Android claims a 75% market share of smartphones. Ubuntu claims 20,000,000+ users, and kernel 4.0 has been released.

The free and open philosophy of Linux and the enterprising nature of Linus Torvalds left an indelible mark on me during my graduate days. I continue to respect the open community and hence have hardly used any other OS. My devices of choice today are an Ubuntu-based laptop and an Android-based phone.

Monday, October 8, 2018

Deep Learning and Genomics

Deep learning at work can be seen all around us. Facebook finds and tags friends in your photos. Google DeepMind’s AlphaGo beat many champions at the ancient game of Go last year. Skype translates spoken conversations in real time. Behind all these are deep learning algorithms. But to understand the role deep learning can play in the ever-fascinating branches of Biology, one has to understand what is “deep” in deep learning. I will skip the definition of learning here for the sake of brevity. The “smart” in “smart grid”, “smart home” and the like was equally intriguing initially and eventually turned out to be a damp squib. Don’t be surprised if “deep” eventually ends up as “smart’s” ally.

There is nothing ‘deep’ in deep learning in the colloquial sense of the word (well, many may want to jump on me for saying this and try to prove just why deep learning is deep, but hold on). Deep learning is simply a term used to describe learning by a machine in a way similar to how humans learn. Now here is the dichotomy: we are still struggling to fully understand how the brain functions, yet we claim to know how deep learning should model itself after the way the brain operates! This reminds me of my PhD days in the 90s in computer vision, the branch that deals with making machines see things as humans do. Back then, David Marr of MIT had written a seminal book on vision, popularly known as “Vision by Marr”, that spent a whole lot of effort explaining the neuroscience behind vision and how computer models should mimic that behaviour. Computer vision seemed a saturated field in the 90s, though: just how much maths and how many algorithms can be invented by looking at a 2D array of numbers (the pixels in an image)? But recent developments in machine learning and deep learning have brought the focus right back to computer vision. And these days, folks don’t write the crazy low-level image processing algorithms I used to write back then! They just show the algorithm 10,000 images of dogs and cats, and after ‘learning’, the computer is given another unknown image of a dog or cat and tells which is which with incredible accuracy. Doing these tasks of learning and prediction with an assumed model of how the brain functions, namely the neural network, led to the development of the field of artificial neural networks (ANNs). So any ANN that thinks like the brain (or at least as we think it does) and produces results that are acceptable to all of us is, generally speaking, called deep learning.

There are two thoughts that I came across at different points in time that have shaped my professional career. One was from Jim Blinn. In his column in IEEE Computer Graphics and Applications in the 80s, he wrote, in the context of the maturity of computer graphics at the time, that practical solutions should not necessarily be driven by theory. One should experiment, and then use theory to explain why the best result one got should work anyway. This is the essence of machine learning and deep learning. There is data and more data. If there isn’t enough, we carry out data augmentation to add more, try multiple splits of the data into training and validation sets, then try multiple models, measure the accuracy of each, check whether it over-fits or not, and finally choose the best model. As a practicing data scientist, I can say there is no single approach at the outset that sets the path to the required results; there is exploration and experimenting. Unfortunately, Blinn’s thesis cannot be applied to deep learning thereafter, for even after one gets the desired results, there is no direct way of applying theory to figure out why it should work anyway. In fact, many researchers have dedicated their lives to figuring out why deep learning should work, and there is no consensus. Geoff Hinton and a few others doggedly kept the branch of machine and deep learning alive during the years when it seemed saturated. Meanwhile, scale became possible, and now with multi-core CPUs and, more importantly, powerful GPUs (and now TPUs), artificial neural networks yield surprisingly fast and acceptable results, without anyone quite able to explain why. Prof Naftali Tishby and his team have perhaps the most credible work to their credit: called the “information bottleneck”, it uses concepts from information theory to explain why deep learning models should work.
It is a fascinating field, still under development, and many, including Hinton, have agreed that the information bottleneck is a real mathematical tool that attempts to explain neural networks in a holistic way. But at the level of a practicing deep learner today, one tries tens of models, chooses the one that gives the best results (or chooses an ensemble), uses accuracy or some other metric to crown it the best among equals, and leaves it at that, for theory plays no further role.
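The “try many models, crown the best among equals” workflow described above can be sketched in a few lines. This is an illustrative toy only: the data is synthetic and the candidate models are deliberately trivial k-nearest-neighbour regressors, not anything from the article itself.

```python
# A sketch of model selection via a held-out validation split.
# Toy data and k-NN models are illustrative assumptions, not from the text.
import random

random.seed(42)

# Synthetic 1-D regression data: y = 2x + noise.
xs = [i / 10.0 for i in range(100)]
ys = [2 * x + random.gauss(0, 0.3) for x in xs]

# Split into training and validation sets, as the text describes.
idx = list(range(100))
random.shuffle(idx)
train_idx, val_idx = idx[:80], idx[80:]

def knn_predict(x, k):
    """Predict y at x as the mean of the k nearest training targets."""
    nearest = sorted(train_idx, key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in nearest) / k

def val_error(k):
    """Mean squared error on the held-out validation set."""
    return sum((knn_predict(xs[i], k) - ys[i]) ** 2 for i in val_idx) / len(val_idx)

# Try several "models" and crown the one with the lowest validation error.
candidates = [1, 3, 5, 10, 25, 50]
scores = {k: val_error(k) for k in candidates}
best_k = min(scores, key=scores.get)
print(best_k, scores[best_k])
```

Exactly as the text says, nothing here explains *why* the winner works; the metric on held-out data simply crowns it.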

The second thought is from Prof Eric Lander of MIT. I had taken his online class ‘7.00x: The Secret of Life’ in 2014. He has a PhD in mathematics (information theory), got interested in Biology, and became the principal face of the Human Genome Project in 2000. In one of the classes, he said that as a student one should build the skills to learn all the tools available, and then later choose among them for the problem at hand, as you never know which one will be helpful when. He used his maths training to solve many tasks in the Human Genome Project. He is singularly responsible for reviving my interest in Biology. His course was a fascinating time travel through biochemistry, molecular biology and genetics, and then an overall view of genomics. Interestingly for me, the timing was right: 2014 onwards was also when machine learning and deep learning were sweeping the technology landscape, and with my fresh perspective on Biology, I decided to work on applying deep learning to genomics.

In this article, I don’t intend to use too much technical jargon or to make it look like a review article, so I will skip many details. But I will say how I got involved in using deep learning with genomics. Genomics is a challenging application area for deep learning, entailing unique difficulties compared to vision, speech and text processing: our own ability to interpret genome information is limited, yet we expect from deep learning a superhuman intelligence that explores beyond our knowledge. Much is still in the works, and a watershed moment for deep learning in genomics is not yet around the corner. In one of his classes, Prof Lander explained Huntington’s disease, a rare neurological disease (five in 100,000) and an unusual genetic one. Most genetic diseases are caused by recessive alleles, and people fall ill only if they get two copies of the disease allele, one from each parent. But Huntington’s disease is different: the allele that causes it is dominant, and people have to receive only one copy from either parent to contract it. Most genetic diseases cause illness early in life, whereas Huntington’s sets in around midlife. Prof Lander went on to explain the work of David Botstein and James Gusella, who identified the genetic marker linked to Huntington’s disease on chromosome 4 through a series of laborious experiments. The idea was to use positional cloning and genetic markers (polymorphisms) to locate a gene when you don’t know where to look for it. This work was carried out in 1983, when the human genome had not yet been sequenced.

This introduction was good enough to get me initiated in genomics. After all, we are looking for the unknown most of the time, and for a change we now have the human genome. So the thought is: can we use markers to identify and locate a specific genetic condition? Deep learning is good at doing boring tasks with incredible accuracy and at bringing insight that may be humanly impossible. With the computational speed available at hand, doing searches down blind alleys using deep learning is incredibly powerful and may lead to insights never intended at the outset.
Genomic research targets the study of the genomes of different species. It studies the roles assumed by multiple genetic factors and the way they interact with the surrounding environment under different conditions. A study of Homo sapiens involves searching through approximately 3 billion base pairs of DNA, containing protein-coding genes, RNA genes, cis-regulatory elements, long-range regulatory elements and transposable elements. Where this field intersects with deep learning, the impact is far-reaching in medicine, pharmacy, agriculture and more. Deep learning can be very useful in exploring gene expression (including its prediction), regulatory genomics (i.e. finding promoters and enhancers), splicing, transcription factors and RNA-binding proteins, and mutations/polymorphisms and genetic variants, among others. The field is nascent, though: the predictive performance on most problems has not reached the level required for real-world applications, nor do the interpretations of these abstract models seem to elucidate insightful knowledge.
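To make the idea of feeding DNA to a network concrete: the standard first step in most deep genomics pipelines is one-hot encoding, where each base (A, C, G, T) becomes a four-element indicator vector. A minimal sketch in pure Python (real pipelines would use NumPy or a framework’s tensors):

```python
# One-hot encode a DNA string: each base -> a 4-element indicator vector.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element indicator vectors."""
    table = {b: [1 if j == i else 0 for j in range(4)]
             for i, b in enumerate(BASES)}
    return [table[base] for base in seq.upper()]

encoded = one_hot("ACGT")
print(encoded)
# → [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

A sequence of length L thus becomes an L×4 matrix, which convolutional networks can scan much as they scan image pixels, e.g. for promoter or enhancer motifs.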

As the “neural” part in Artificial Neural Network (ANN) suggests, ANNs are brain-inspired systems intended to replicate the way that we humans learn. Neural networks consist of input and output layers, as well as (in most cases) one or more hidden layers of units that transform the input into something the output layer can use. Deep learning tools, inspired by real neural networks, are hence those algorithms that use a cascade of multiple layers of neurons, each layer serving a specific task and each successive layer using the output of the previous layer as its input. While at the outset I did say that there is nothing ‘deep’ about deep learning, technically one can say that just how deep a network is depends on the number of hidden layers deployed: the more the layers, the deeper the network. ANNs are excellent tools for finding patterns that are far too complex or numerous for a human programmer to extract and teach the machine to recognize. While neural networks have existed since the 1940s (first as McCulloch-Pitts neurons, later as perceptrons), they became a serious tool only after the 80s, thanks to a technique called backpropagation, which allows a network to adjust its hidden layers of neurons in situations where the outcome does not match the expected one. There are many types of neural networks: the most basic is the feedforward type, the recurrent type is more popular, and then there are convolutional neural networks, Boltzmann machines and Hopfield networks, among others. Picking the right network depends on the data one has to train it with and the specific application in mind.
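The ideas above — a hidden layer transforming the input, and backpropagation nudging the weights when the outcome does not match the expected one — fit in a small pure-Python sketch. The task (learning logical AND), the network size and the learning rate are all arbitrary illustrative choices:

```python
# Minimal feedforward network with one hidden layer, trained by
# backpropagation on a toy task (logical AND). Pure Python; sizes,
# learning rate and epoch count are illustrative assumptions.
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy data: inputs and target outputs for logical AND.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

H = 3  # number of hidden units (arbitrary small choice)
# Weights: input->hidden (two inputs + bias) and hidden->output (+ bias).
w_ih = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]
w_ho = [random.uniform(-1, 1) for _ in range(H + 1)]

def forward(x):
    """One forward pass: hidden activations, then the output."""
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_ih]
    y = sigmoid(sum(w_ho[j] * h[j] for j in range(H)) + w_ho[H])
    return h, y

def train_epoch(lr=0.5):
    """One pass over the data, adjusting weights by backpropagation."""
    total = 0.0
    for x, t in data:
        h, y = forward(x)
        total += (y - t) ** 2
        d_out = (y - t) * y * (1 - y)          # error at the output
        for j in range(H):
            d_hid = d_out * w_ho[j] * h[j] * (1 - h[j])  # error pushed back
            w_ho[j] -= lr * d_out * h[j]
            w_ih[j][0] -= lr * d_hid * x[0]
            w_ih[j][1] -= lr * d_hid * x[1]
            w_ih[j][2] -= lr * d_hid
        w_ho[H] -= lr * d_out
    return total

loss_before = train_epoch()
for _ in range(2000):
    loss_after = train_epoch()
print(loss_before, loss_after)  # the loss drops as the network learns
```

Stacking more such hidden layers between input and output is, by the counting argument in the paragraph above, what makes a network “deep”.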

Hopefully, some day, we will be able to put all the pieces of the jigsaw puzzle together. We would then not only get good results, but also have the information bottleneck or some other tool explain why they should work anyway. And armed with that understanding, deep learning could pave the way to deeper insights (no pun intended) into just how the brain works.