Thursday, April 1, 2021

Turing award winners - 2021

Yesterday, I heard that Profs Aho and Ullman were awarded this year's Turing Award for their work in compiler design. I have never met Aho and Ullman, nor have I ever spoken with them directly. Yet they were heroes of my student life. Let me explain how and why.

I joined my PhD in 1991 at IIT Bombay, having completed my Bachelor's in Electronics in 1987 and my Master's in 1989. This was an era when the Internet was not yet a public craze and computers were mostly very basic: the PC XTs and ATs of the time, built around the Intel 8088 and 80286 processors! I had no professional training in computer science, yet computers fascinated me.

1991 was also the year Linux was born. Outside of Linus's own home, one of the earliest installations of v0.01 must have been the one in our lab at IIT Bombay! I had developed an interest in software engineering through using Linux and working with the many other server systems we had recently procured in our lab. Within months, that led me to an interest in programming languages and how compilers are built, and I came across Aho and Ullman's red book, Principles of Compiler Design. I bought my personal copy then, and it remains on my bookshelf even today!

The concept of compilers fascinated me, especially given that I had no formal training in CS. So I started experimenting with concepts from the Aho-Ullman book and writing my own experimental language parsers. I bought and read a book on Backus-Naur Form (BNF) notation, wrote a toy programming language grammar in BNF, and compiled my parser with 'cc' on one of our Sun Solaris servers. I wrote my first language parser and compiler, autodidactically, by reading Aho and Ullman's 'red book'. Since then, of course, my interests have diversified and I haven't worked much on compilers for a couple of decades now, but the mention of Aho and Ullman as rightful winners of the Turing Award took me back to the nostalgia of the 90s.
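
To give a flavour of what those early experiments looked like (this is only an illustrative sketch in Python, not the grammar or the code I actually wrote back then), here is a toy expression grammar in BNF and a minimal recursive-descent parser for it:

```python
# Toy grammar, in BNF:
#   <expr>   ::= <term> { ("+" | "-") <term> }
#   <term>   ::= <factor> { ("*" | "/") <factor> }
#   <factor> ::= NUMBER | "(" <expr> ")"
# A minimal recursive-descent evaluator for this grammar.
import re

TOKEN = re.compile(r"\s*(?:(\d+)|(.))")   # numbers or single-character operators

def tokenize(text):
    return [int(num) if num else op for num, op in TOKEN.findall(text)]

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):                       # <expr> ::= <term> { ("+" | "-") <term> }
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.next(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):                       # <term> ::= <factor> { ("*" | "/") <factor> }
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.next(), self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):                     # <factor> ::= NUMBER | "(" <expr> ")"
        tok = self.next()
        if tok == "(":
            value = self.expr()
            self.next()                   # consume ")"
            return value
        return tok                        # a number

print(Parser(tokenize("2 * (3 + 4) - 5")).expr())   # prints 9
```

A real compiler front end adds a symbol table and code generation on top of this, but the shape of a grammar-driven, recursive-descent parser is exactly what the red book teaches.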

Congratulations to Aho and Ullman on winning the Turing Award; richly deserved. You will never know just how many lives you touched with your book and your work.

Monday, March 15, 2021

mRNA vaccines: A brief history of time


The Covid-19 pandemic ravaged the world throughout 2020. Amidst public health responses that varied across the globe, from full compliance to calling Covid-19 a hoax, the scientific community was quietly working on the development of vaccines. Here is a summary of vaccines for SARS-CoV-2, both approved and in development.

Vaccines Approved | Vaccines in Development
Pfizer, BioNTech’s BNT162b2 | Bharat Biotech’s Covaxin
Moderna’s mRNA-1273 | Univ of Oxford-AstraZeneca’s AZD1222
Sinovac’s CoronaVac | ...
Russia’s Sputnik-V | ...
Russia’s EpiVacCorona | China’s BBIBP-CorV

mRNA vaccines give our immune system genetic instructions to recognize the virus, without ever introducing the virus itself (live, dead, weakened or in part)! An mRNA sequence is synthesized for the virus’s spike protein (S-protein) and packaged in a lipid nanoparticle that delivers it into our cells. Once inside, the cellular machinery follows the mRNA instructions to produce the viral spike protein, which then induces an immune response. This is not science fiction; this is real life! We hand our cells instructions to make a protein that looks like a virus protein, and the body builds an immune response to it, so that if you are ever exposed to the virus in future, the immune system recognizes that spike protein on the virus and destroys it before it can enter the cells!
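
To make the idea of "genetic instructions" concrete, here is a toy sketch, nothing to do with the actual vaccine sequence (which runs to thousands of bases): it translates a short, made-up mRNA fragment into amino acids using a few entries of the standard codon table, the same decoding our ribosomes perform.

```python
# Toy illustration of how ribosomes read mRNA three bases (one codon) at a time.
# Only a handful of codons from the standard genetic code are included here;
# the sequence below is made up and is NOT the vaccine's spike-protein mRNA.

CODON_TABLE = {
    "AUG": "Met (start)",
    "UUC": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "AAG": "Lys",
    "UAA": "STOP",
}

def translate(mrna):
    protein = []
    for i in range(0, len(mrna) - 2, 3):              # step through codons
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

toy_mrna = "AUGUUCGGCGCUAAGUAA"                        # start, 4 amino acids, stop
print(translate(toy_mrna))
# ['Met (start)', 'Phe', 'Gly', 'Ala', 'Lys']
```

In the vaccine, the synthetic sequence additionally swaps U for ψ, which the ribosome reads the same way but which does not alarm the innate immune system (more on that trick in the timeline below).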

I decided to put together a timeline history of the two most popular vaccines today, Moderna’s and Pfizer’s, both of which are mRNA-based.

What you may have heard or read:

Time | Description
64 days | Time it took Moderna to develop their vaccine and launch the phase I trial
Jul 2020 | Phase III trials began
Dec 2020 | The vaccines were ready for deployment, and around 3 million people worldwide had already been vaccinated.


While everything you have heard and read is correct, the devil is in the details: there is a need to understand and appreciate the background work carried out by many scientists, particularly Katalin Kariko, under the most adverse conditions. In any scientific endeavour, hundreds and thousands of researchers give their everything and largely go unnamed, but at the heart of the success of both Moderna and Pfizer is one individual: Katalin Kariko!

I have tried to take descriptive narration out and summarize the history in a tabular form captured as a timeline. Hopefully this is useful, readable and informative.

Time | Description
1961 | Messenger RNA (mRNA for short) was discovered by nine scientists, including Francis Crick (of Watson-Crick double-helix fame), Jacob, Brenner and Meselson (of the famous Meselson-Stahl experiment).
1976 | Kariko first learned about mRNA in detail after attending a seminar in Hungary and became inspired to use it for therapeutics.
1985 | She moved to the US from Hungary and joined Temple University as faculty.
1990 | Kariko moved to the University of Pennsylvania (UPenn) following a dispute with her boss at Temple University, who had threatened to have her deported.
1995 | Through the early nineties she continued her work on using mRNA for drugs and therapeutics but could not secure funding; all her grant applications were rejected. Eventually, UPenn gave her two options: leave, or prepare for a demotion.
1995 | The same year, she was diagnosed with cancer. Given her circumstances and her desire to pursue mRNA therapeutics, she decided to stay on and take the humiliation of a demotion at UPenn.
1997 | At a largely dysfunctional copier machine, she met Drew Weissman, who had recently joined UPenn and had approved grants. He became interested in her work, decided to partly fund her experiments, and a partnership began.
2005 | Kariko and Weissman published a paper announcing a modified form of mRNA that the immune system accepts far more readily. DNA normally has four bases: A, T, C and G. In RNA, T is replaced by U. Their paper described replacing the U in synthetically created mRNA with 1-methyl-3’-pseudouridylyl, generally denoted by the Greek letter ψ. For the next five years, little additional funding came and not much interest was generated.
2010 | Derrick Rossi, inspired by their paper, co-founded Moderna.
2010 | Kariko and Weissman licensed their technology to the small German company BioNTech.
2012 | UPenn refused to renew her faculty contract (after the demotion) and told her she was “not faculty quality”.
2013 | Kariko accepted a senior VP role at BioNTech.
2017 | Moderna began developing an mRNA-based Zika virus vaccine.
2018 | BioNTech and Pfizer began working together on an mRNA vaccine for influenza. The landmark 2005 paper and the use of ψ are integral to Pfizer’s vaccine.
Jan 2020 | Within weeks, Chinese scientists had sequenced the SARS-CoV-2 virus, and a synthetic mRNA sequence corresponding to the spike protein was designed. Vaccine’s trick #1: a clever lipid packaging system delivers this synthetic mRNA into our cells.
Feb 2020 | Pfizer’s vaccine development revolved around the use of ψ in the mRNA sequence. Vaccine’s trick #2: cells are extremely unenthusiastic about foreign RNA and try hard to destroy it before it does anything, but the vaccine needs to get past the immune system. The use of ψ placates the immune system, yet ψ is treated as a normal U by the relevant parts of the cell.
Mar 2020 | In 64 days, Moderna had completed the development of their mRNA vaccine, and BioNTech had reached a similar stage.
Jul 2020 | Phase III trials began.
Nov / Dec 2020 | The world was ready for vaccination with two leading mRNA vaccines, both around 95% efficacious!

In scientific pursuit, never follow only the news in the media; it serves us better to find the whole truth. The vaccines were not developed in one year, as claimed. They are largely the result of the tireless pursuit of one woman over three decades, along with other inspired scientists, which ensured that the recipe was ready come Jan 2020. Technically speaking, we had waited 59 years since 1961 for this day! Yes, the arrival of the SARS-CoV-2 virus definitely accelerated the final stretch of development.

Katalin Kariko is directly responsible for the Pfizer vaccine and was the inspiration for Rossi and the reason Moderna was created! Remember the name; she might just feature in the news as a future Nobel Laureate!

Monday, March 8, 2021

FC1 and FC2 - a tale of two genes

This short article is largely a fictional essay of mine. When the scientific community first assembled the human genome in the early 2000s and subsequently published it, we suddenly discovered that there are more than 20,000 genes in our genome. We are still discovering new genes and their functions through rigorous scientific protocols. These genes are named either after their location or after their function, and a number is appended when there are more genes doing the same thing with a small difference. Metaphorically speaking, humans are also discovering newer genes based on human behaviour. While there is a MAGA gene doing the rounds lately in the other half of the world, in our own backyard two genes were unearthed during these last 6 months of the pandemic. Based on location and function, I choose to call them FC1 and FC2. Their resemblance to our WhatsApp group names is purely coincidental. Other than location and function, these two genes also have another attribute: behaviour!

Some of the most fundamental questions concerning our evolutionary origins, our social relations, and the organization of society are centred around the issues of altruism and selfishness. Experimental evidence indicates that human altruism is a powerful force and is unique in the animal world. However, there is much individual heterogeneity and the interaction between altruists and selfish individuals is vital to human cooperation. Depending on the environment and circumstances, a minority of altruists can force a majority of selfish individuals to cooperate or, conversely, a few egoists can induce a large number of altruists to defect. Current gene-based evolutionary theories cannot explain important patterns of human altruism, pointing towards the importance of both theories of cultural evolution as well as gene–culture co-evolution.

In evolutionary biology, an organism is said to behave altruistically when its behaviour benefits other organisms at a cost to itself. Altruistic behaviour is generally considered the rarer and nobler of the two, whereas its opposite, selfish behaviour, is more common in the animal world. In everyday parlance, an action would only be called ‘altruistic’ if it was done with the conscious intention of helping another, but in the biological sense there is no such requirement. Indeed, some of the most interesting examples of biological altruism are found among creatures that are (presumably) not capable of conscious thought at all, e.g. insects.

Altruistic behaviour is common throughout the animal kingdom, particularly in species with complex social structures. There are plenty of examples: vampire bats, vervet monkeys, helper birds and meerkats volunteer an individual to watch out for predators, essentially putting its own life at risk. Such behaviour is maximally altruistic. From a Darwinian viewpoint, the existence of altruism is puzzling, since natural selection leads us to expect animals to behave in ways that increase their own chances of survival.

Human societies represent a huge anomaly in the animal world. They are based on a detailed division of labour and cooperation between genetically unrelated individuals in large groups. This is obviously true for modern societies like ours. Why are humans so unusual among animals in this respect? Human altruism goes far beyond that which has been observed in the animal world. Among animals, fitness-reducing acts that confer fitness benefits on other individuals are largely restricted to kin groups. On the other hand, humans have the unique ability to form and cooperate within large social groups, which include many genetic strangers. For example, humans invest time and energy in helping other members in their neighborhood and make frequent donations to charity. They come to each other’s rescue in crises and disasters. They respond to appeals to sacrifice for their country during a war, and they put their lives at risk by helping complete strangers in an emergency.

Plato argues in his treatise ‘The Republic’ that the soul comprises three parts: the rational, the appetitive and the spirited. He says that for a community to be just, every element has to perform its role to the best of its ability. He built on the concept of the soul as defined by Socrates and Pythagoras before him.

Sigmund Freud presented an alternative theory of ego, superego and id. The id is trying to get you to do things and the superego is trying to get you to make good decisions and be an upstanding person. So the id and superego are always fighting with each other and the ego steps in between the two.

Both of the above abstractions try to explain human behaviour at the individual and community level.

In the literature, two eminent theories of altruism are discussed, both mathematically founded and backed by overwhelming empirical evidence: ‘kin selection theory’ and ‘reciprocal altruism theory’. Kin selection theory says that natural selection favours behaviours that benefit organisms who share one's genes, e.g. close kin. Reciprocal altruism, on the other hand, involves altruism between neighbours as a reciprocal act of kindness, returned either immediately or at some point in the future. ‘Competitive altruism theory’ explains forms of altruism that cannot be explained by these two, for example acts of volunteering and charity towards non-kin groups.

To some extent, the idea that kin-directed altruism is not ‘real’ has been fostered by the ‘selfish gene’ terminology used by Richard Dawkins in his famous book of the same name. A ‘selfish gene’ story can, by definition, be told about any trait, including a behavioural trait.

The origin of “The Selfish Gene” is intriguing. Dawkins revealed in the first volume of his memoirs, “An Appetite for Wonder”, that the idea of selfish genes was born ten years before the book was published. The Dutch biologist Niko Tinbergen asked Dawkins, then a research assistant with a new doctorate in animal behaviour, to give some lectures in his stead. Inspired by Hamilton, Dawkins wrote in his notes (reproduced in An Appetite for Wonder): “Genes are in a sense immortal. They pass through the generations, reshuffling themselves each time they pass from parent to offspring ... Natural selection will favour those genes which build themselves a body which is most likely to succeed in handing down safely to the next generation a large number of replicas of those genes ... our basic expectation on the basis of the orthodox, neo-Darwinian theory of evolution is that Genes will be 'selfish'.”

As an example of how the book changed science as well as explained it, a throwaway remark by Dawkins led to an entirely new theory in genomics. In the third chapter, he raised the then-new conundrum of excess DNA. It was dawning on molecular biologists that humans possessed 30–50 times more DNA than they needed for protein-coding genes; some species, such as lungfish, had even more. About the usefulness of this “apparently surplus DNA”, Dawkins wrote that “from the point of view of the selfish genes themselves there is no paradox. The true 'purpose' of DNA is to survive, no more and no less. The simplest way to explain the surplus DNA is to suppose that it is a parasite.” Four years later, two pairs of scientists published papers in Nature magazine formally setting out this theory of “selfish DNA”.

So, as a corollary to the competitive altruism theory, I can think of a theory that explains selfishness rather than altruism, which I can best describe as ‘competitive selfishness theory’.

I think we carry these altruist and selfish genes together in our DNA. They get expressed depending on the environment and circumstances. Let’s say FC1 is the altruist gene. It gets expressed routinely for our near and dear ones and those in the immediate family (as per all the theories mentioned above: kin selection, reciprocal altruism and competitive altruism). FC2, let’s say, is the selfish gene. It gets expressed routinely for community-related issues. Clearly, we have both of them; it is just that how much each is expressed depends on environment and context. Largely, the empirical evidence witnessed during the last 6 months of the pandemic speaks volumes about which gene is expressed more. It is also true that we have seen both get expressed simultaneously. What else explains the behaviour of an individual who is altruistic in his or her own circle but clearly selfish on the same issue when it comes to the community?

If humans are the most evolved form in the animal kingdom and the only one capable of cognitive thinking, then we need to do much better than expressing the FC2 gene most of the time. That is a behaviour genetically coded in us simply by virtue of being members of the animal kingdom; every other species does it too. While a noble, just, fair and charitable community is clearly a Utopian idea, should the FC1 in us not get expressed at least as much as FC2, if not more? I think it should. What do you think?

Sunday, March 7, 2021

New beginning

March 2021 is not just a new month; it is a year since the pandemic set foot on this planet and changed all of us, whether we liked it or not, whether we agreed with it or not.

So, my thought is to revive my blog series. You will see I have made many attempts before (since 2010) and have largely got caught up in the following dilemmas:

  1. Should I publish only after I write a meaningful post? The problem with this is that it can't be periodic, but I have control over what I publish.
  2. Should I publish periodically, even with not-so-great content? The problem with that is obvious.

I have decided to find the middle ground: publish periodically, not wait for long articles, and post one-liners, paragraphs, pages or full articles as they seem important to me at the time. So here is another start, along the lines of Hugh Prather's "Notes to Myself".

Monday, September 30, 2019

The Elusive Moon


ISRO’s recent mission to the moon, Chandrayaan-2, aroused interest in space science among the masses. By itself, this is no mean achievement, and probably the biggest contribution of the mission besides its scheduled objectives; it is rare in our country that people in every nook and corner talk about science! This article, though, is neither a report on the mission nor on the events that surrounded it; it is more about sharing my own journey through the couple of weeks that followed the early hours of Sep 7, 2019. There are many questions, but fewer answers, at least yet.
Many of us were glued to the TV at around 1:50 AM on Sep 7 when it became obvious that something had gone wrong, and the world waited with bated breath to hear the official word from ISRO. All that was said was that communication with the lander had been lost at 2.1 km above the lunar surface. My curiosity needed more information, but unfortunately, none was forthcoming.
That is when I started getting in touch with a few radio astronomers around the world on Twitter. I casually started interacting, specifically, with Dr Cees Bassa, Scott Tilley, Edgar Kaiser and Richard Stephenson. Dr Cees Bassa is based out of Netherlands and is an astronomer at The Netherlands Institute for Radio Astronomy (ASTRON_NL) working with LOFAR, the low frequency radio telescope. Scott Tilley, in his own words, is an amateur astronomer with his own dishes and antennas at home that he uses to track radio signals from deep space. He is based out of Roberts Creek in British Columbia in Canada. Scott Tilley is a citizen scientist, who became famous for an accidental discovery he made of NASA’s lost spacecraft, IMAGE. Edgar Kaiser is based out of Kiel in Germany and has varied interests in Physics, computer modeling, maritime life besides radio astronomy and amateur radio. Richard Stephenson is responsible for the operations at the Canberra Deep Space Network complex and is based out of Canberra in Australia.
For the most part, these “conversations” were monologues, for I was the student and I was learning from each of them. Each had a theory, based on his own radio measurements, of what exactly happened to the Chandrayaan-2 lander, and each was convincing. This article is an account of those theories, and of a few of the more believable conspiracy theories. I capture below the episodes that contributed to my better understanding of radio astronomy.
Episode 1: Dr Cees Bassa published a plot comparing the line-of-sight velocity of the Chandrayaan-2 lander predicted from NASA JPL Horizons with the measured Doppler data of the lander. He also had Doppler data from the Dwingeloo observatory to corroborate the findings. In fact, Dr Bassa published his Python code for this, and a few of us independently ran it to generate the same plot. A couple of others contributed to his original version to make it more impactful. The resulting Doppler plot is shown in Fig 1. While it may be difficult to read, a few salient points stand out.
  1. The orange curve shows the planned trajectory of the Chandrayaan-2 lander from NASA JPL Horizons. It shows that the scheduled landing was close to 20:21 UTC (1:51 AM IST). Recall that ISRO had always said the scheduled landing was at 20:23 UTC.
  2. The black curve shows the approximate Doppler data of the lander from the Dwingeloo radio telescope. It closely matches the orange curve during the rough braking phase, when everything was going fine.
  3. As can be seen from the figure, things started going wrong about 15 seconds into the fine braking phase.
  4. The black curve suddenly stops around 20:20 UTC (1:50 AM IST), indicating loss of signal. From what was known till then, this was supposed to have coincided with ISRO's announcement that contact was lost at 2.1 km above the lunar surface.

Figure 1: Dr Cees Bassa’s Doppler predictions
  5. Since the black (measured) curve coincided with the orange (planned) curve during the rough braking phase, the Doppler data can be trusted, and so can the timings of the orange curve. With these two in mind, one can conclude that the landing time was around 20:21 UTC (as per the orange curve) rather than 20:23 UTC as suggested by ISRO.
The above points pointed to only one plausible explanation: contact was not lost at 2.1 km above the lunar surface but upon impact with it. That meant a hard landing, and it was logical to conclude that the loss of communication was due to equipment damage from the hard landing.
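
For readers who want a feel for how such a Doppler curve relates to velocity, here is a minimal sketch of the underlying relation (Doppler shift is roughly minus the carrier frequency times the line-of-sight velocity divided by the speed of light). The carrier frequency and velocity values below are placeholders of my own, not the actual Chandrayaan-2 figures used by Dr Bassa.

```python
# Minimal sketch: converting line-of-sight (radial) velocity to Doppler shift.
# Placeholder values only; the real carrier frequency and velocity profile come
# from the radio observations and JPL Horizons, not from this script.

C = 299_792_458.0          # speed of light, m/s
F_CARRIER = 2.2e9          # assumed S-band carrier, Hz (placeholder)

def doppler_shift(v_los_mps):
    """Doppler shift in Hz for a line-of-sight velocity in m/s (positive = receding)."""
    return -F_CARRIER * v_los_mps / C

# A made-up braking profile: the lander slows from ~1.7 km/s toward touchdown.
for v in (1700.0, 1000.0, 500.0, 100.0, 0.0):
    print(f"v_los = {v:7.1f} m/s  ->  Doppler shift = {doppler_shift(v) / 1e3:8.2f} kHz")
```

A sudden flattening or disappearance of the measured shift, as in the black curve, is exactly what loss of signal at (or on) the surface looks like.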
Episode 2: ISRO announced on Sep 10, 2019 that the Chandrayaan-2 orbiter had taken a thermal image of the area near the proposed landing site and that the image proved the lander was lying on the lunar surface. Many unfounded statements quoting sources within ISRO started doing the rounds, and one suggestion was that the lander was lying on its side (not on its 4 legs) and that scientists could see the elongated shadow of two of its legs! ISRO, unfortunately, never released this thermal image, so all such speculations and conclusions remain within the realm of fantasy.
Episode 3: Another statement attributed to ISRO during those days suggested that, to take a better image of the lander, ISRO was considering lowering the altitude of the Chandrayaan-2 orbiter from 100 km to 50 km! Thankfully, that was never attempted: the orbiter was already collecting scientific data as per schedule, and there was no point putting that part of the mission in jeopardy just to take a better picture of a lander that was by now presumed crashed. Scott Tilley and Edgar Kaiser would tune their radio antennas towards the moon and collect Doppler data of the orbiter, over a period of 24 days. Day after day, they would independently measure and report the altitude of the Chandrayaan-2 orbiter to be around 101-105 km. That was reassurance that the orbit had not been lowered, and they would put in more hours the next day to reassure themselves and the rest of us that the orbiter hadn't been touched. During these measurements, both would share their assumptions, publish the data, adjust their setups and run other controlled experiments, each time verifying that the dreaded decision had not been implemented. In the process, I learnt many new things about practical radio astronomy. Now I know where to source Doppler data from, how to analyze it, and how to work with the azimuth and altitude of an orbiter, a radio telescope and so on.
Episode 4: In the meanwhile, Richard Stephenson would help us understand the plans of the Deep Space Network (DSN). The DSN is NASA’s international array of giant radio telescopes that supports interplanetary spacecraft missions; it also provides radar and radio astronomy observations. The DSN is operated by NASA JPL and consists of three facilities spaced roughly equidistant from each other, approximately 120 degrees apart in longitude: Goldstone in California, near Madrid in Spain, and near Canberra in Australia. Stephenson works at the Canberra complex. The strategic placement of these sites permits constant communication with spacecraft as our planet rotates: before a distant spacecraft sinks below the horizon at one DSN site, another site can pick up the signal and carry on communicating. Richard would tell us when one of the giant antennas at one of the three sites was about to send a probe signal towards the moon, in the hope of reviving the communication equipment on board the Chandrayaan-2 lander. These were powerful signals (approx. 11 kW) transmitted towards the moon, with a round-trip time of about 2.5 seconds to the lunar surface and back. If the lander had received these signals and had been able to activate its communication, it would have. One of the 4 antennas at each of the 3 sites would transmit these “hello” signals to the lander round the clock, every day! Sadly, there was no response. Richard Stephenson summed it up thus: “If at the end of the 14-day recovery window, nothing is heard. You have to accept that @isro has attempted everything humanly possible from earth to recover their Lander.”
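
That 2.5-second figure is simply the two-way light travel time to the moon; a quick back-of-the-envelope check using the average Earth-Moon distance looks like this:

```python
# Two-way (Earth -> Moon -> Earth) light travel time, back-of-the-envelope.
C = 299_792.458            # speed of light, km/s
EARTH_MOON_KM = 384_400    # average Earth-Moon distance, km

round_trip_s = 2 * EARTH_MOON_KM / C
print(f"Round-trip light time: {round_trip_s:.2f} s")   # ~2.56 s
```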
Episode 5: Finally, word came that NASA’s LRO (Lunar Reconnaissance Orbiter), which also orbits the moon at about 100 km above the surface, would be passing over the landing site and would try to take a picture. Time wasn’t on our side: the LRO was to cross the area on Sep 19, close to the end of the lunar day, after which 14 earth-days of lunar night would set in. As it turned out, on the day of the fly-over the long shadows had already started creeping in, and LRO couldn’t take a picture of the Vikram lander. I created an image (Fig 2), appreciated by many on Twitter, showing the lunar shadows and the suspected landing site of the lander.

Figure 2: The blue dot is where the lander is, and the area was in the shadow region during the LRO pass on Sep 19
Episode 6: There are many other theories doing the rounds. Some of them are preposterous, some are very believable. Here is a summary.
  1. It was clear from Dr Bassa’s chart that, 15 seconds into the fine braking phase, something had gone wrong. At that time, the lander was nowhere near the lunar surface. So, what was it? Was a command wrongly issued? Was the configuration data wrongly fed in? Was the testing not fully done? Were some failed tests waived? Worse still, was there a bug in the code? Until ISRO answers, any of these is possible.
  2. The 4 legs of the lander were supposed to be horizontally aligned during the rough braking phase and were to be turned by 90 degrees to orient them towards the lunar surface during the fine braking phase. Is it possible that instead of -90, a +90 was entered, causing the lander to flip upside down? The automatic lander program (ALP) would then have spent its time trying to recover a stable configuration, lost precious time, and crashed. Again, possible.
  3. If indeed the orientation was wrongly commanded, then it is possible that the thrusters meant to brake the lander accelerated it further. This would explain the change in line-of-sight velocity around 15 seconds into the fine braking phase.
In conclusion, this is not our domain. ISRO has access to all the data, will conduct its own investigations, and will hopefully publish the findings. The moon has remained elusive, not just to the Chandrayaan-2 lander on Sep 7 but to Israel’s Beresheet on Apr 11 as well. It is quite ironic that even 50 years after Neil Armstrong set foot on the lunar surface, it is still not easy to land there.
But for me personally, these 2 weeks of learning were almost real-time, and they got me sufficiently excited that Scott Tilley has offered to help me design dishes and antennas that I can put up on my home terrace to start my own radio astronomy in the future. In the words of Scott and Edgar, they don't have someone in this part of the world who regularly collects this data, so it could be me! All I was told I needed are 1 m dishes for S- and X-band, crossed Yagis for VHF and UHF, an LPDA for L- and S-band and an LPDA for VHF/UHF. Moon and other celestial folks, here I come 😊

Wednesday, October 10, 2018


Linux down the Memory Lane


This is the story not just of Linux down my memory lane, but also of why I fell for Linux and have remained a devout follower of it ever since. As folks know, the reasons Linux is preferred these days are largely technical, but for me, besides it being technically a better OS, the reason is also nostalgia.

The early 90s were a fascinating era for me. Let’s begin in 1991, now fondly known as the year of the Internet. The PC-XT (8088) and the higher-end PC-AT (80286) were just about proliferating in workplaces and some homes; systems based on the Intel 80386 were still not common. I had joined for my PhD in computer vision in 1991. At IIT Bombay, we had only a central computer centre (with the Cray X-MP supercomputer), from where a 64 kbps VSAT link connected us to the rest of the world for Internet access. My department (electrical engineering) wasn't even on the local network. In fact, none of the departments were, except possibly the computer science department. I met two like-minded guys in my lab and we started spending endless hours improving the infrastructure for the joy of doing it. First we laid an Ethernet cable from the computer centre to our department, and then set up our department server connected to the computer centre, so that we could log in to the supercomputer while physically in our department. We learnt about Ethernet, TCP/IP, networking and routing on the job, without attending any course!

Now that the “comfort” of accessing the Internet from the luxury of our own lab had been achieved, one of my colleagues started looking for more, and he found out about this guy called Linus Torvalds in Helsinki. While studying computer science at the University of Helsinki, Linus had begun a project that later became the Linux OS. His reasons, too, were similar to ours: in those days a commercial UNIX operating system for Intel 386 PCs was too expensive for private users, so he wanted to build a free OS that could make the most of the 80386-based PCs of the time. He apparently once said that if either the GNU kernel or 386BSD had existed then, he might never have written his own.
Linus developed what he called “Freax” (for “free freak unix”; it later became Linux). He developed his OS on a MINIX system, for which source code was available at the time; MINIX had been released by Andrew Tanenbaum alongside his book “Operating Systems: Design and Implementation”. The reason Freax had to be invented, Linus argued, was that the 16-bit design of MINIX was not well adapted to the 32-bit features of 80386-based computer architectures.

Linus announced Linux on 25th August 1991 and released the first version, 0.01, shortly after. Probably the only installation of Linux 0.01 in the world other than Linus's own was in our lab, and I still have the source code of the first ever Linux kernel! Since the 0.01 kernel, I have used pretty much every version released (especially in the earlier days) and remain an avid user of Linux to date. It has been fascinating to watch Linux grow as I grew up.

Back in 1991 there was no Ubuntu, Red Hat or any other Linux distribution available. The closest thing was H J Lu's boot/root floppies: 5.25", 1.2 MB diskettes that could be used to boot a system into Linux. One booted from the boot disk and then, when prompted, inserted the root disk, and after a while one would get the prompt. In those days, if one wanted to boot from the hard disk, one had to use a hex editor on the master boot record of the disk, and that wasn't for the faint-hearted! These were the days when we could predict the life of a hard disk just by listening to the sounds it made!

This was all before a real distribution came into existence. The first was MCC Interim Linux (from the Manchester Computing Centre); it was still console-only, with no X. Shortly after, there was a release from Texas A&M University called TAMU 1.0A, the first distribution that let one run X. The first polished distro was Yggdrasil: one could boot from a floppy and run everything else from the CD (the equivalent of today's Live CD). Few folks remember that this was in the days of 1x and 2x CD-ROM drives. Then came the distributions that followed: SLS Linux, SuSE, Debian and Slackware, later SCO Linux, and after these came Red Hat and Ubuntu.

In 1992, hearing of the success of Linux, Andrew Tanenbaum wrote a Usenet post in the group comp.os.minix titled “Linux is obsolete”. One should note that while Linus used MINIX for development, the design principles of his OS were diametrically opposite to those held by Tanenbaum at the time and described in his book. Tanenbaum's primary reason for calling Linux obsolete was that its kernel was monolithic and, in his view, old-fashioned, and he predicted that Linux would soon fade away. The rest is history: we know today where Linux is and where MINIX is, or for that matter GNU Hurd, whose microkernel approach Tanenbaum championed.

Today, aggregate Linux server market revenues exceed those of the rest of the UNIX market. Google's Linux-based Android claims around 75% of the smartphone market. Ubuntu claims 20,000,000+ users, and kernel 4.0 has been released.

The free and open philosophy of Linux and the enterprising nature of Linus Torvalds left an indelible mark on me during my graduate days. I continue to respect the open-source community and have hence hardly used any other OS. My devices of choice today are an Ubuntu-based laptop and an Android-based phone.

Monday, October 8, 2018


Deep Learning and Genomics

Deep learning at work can be seen all around us. Facebook finds and tags friends in your photos. Google DeepMind’s AlphaGo beat champions at the ancient game of Go last year. Skype translates spoken conversations in real time. Behind all of these are deep learning algorithms. But to understand the role deep learning can play in the ever-fascinating umbrella branches of Biology, one has to understand what is ‘deep’ in deep learning. I will skip the definition of learning here for the sake of brevity. The ‘smart’ in ‘smart grid’, ‘smart home’ and the like was equally intriguing initially and eventually turned out to be a damp squib. Don’t be surprised if ‘deep’ eventually ends up as ‘smart’s’ ally.

There is nothing ‘deep’ in deep learning in the colloquial sense of the word (well, many will want to jump on me for saying this and try to prove just why deep learning is deep, but hold on). Deep learning is simply a term for learning by a machine in a way similar to how humans learn. Now here is the dichotomy: we are still struggling to fully understand how the brain functions, yet we expect deep learning to model itself after the way the brain operates! This reminds me of my problem during my PhD days in the late 90s in computer vision, the branch that deals with making machines see things as humans do. Back then, David Marr of MIT had written a seminal book on vision, popularly known as “Vision by Marr”, that spent a great deal of effort explaining the neuroscience behind vision and how computer models should mimic that behaviour. Computer vision seemed a saturated field in the 90s, though: just how much maths and how many algorithms can be invented by looking at a 2D array of numbers (the pixels of an image)? But recent developments in machine learning and deep learning have brought the focus right back to computer vision. And these days folks don't write the crazy low-level image processing algorithms I used to write back then! They just show the algorithm 10,000 images of dogs and cats, and after ‘learning’, the computer is given another, unknown image of a dog or a cat and tells which is which with incredible accuracy. Doing these tasks of learning and prediction with an assumed model of how the brain functions, namely the neural network, led to the development of the field of artificial neural networks (ANNs). So any ANN that ‘thinks’ like the brain (at least as we believe the brain thinks) and produces results acceptable to all of us is, generally speaking, called deep learning.
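
As an aside for the curious, the "show it 10,000 images" workflow really is this mundane in code. Here is a minimal, illustrative Keras sketch of a dog-vs-cat style binary classifier; the directory name, image size and layer sizes are placeholders of mine, and a real model would need far more care with data and architecture:

```python
# Minimal sketch of a binary image classifier (e.g. dogs vs cats) in Keras.
# 'data/train' is a placeholder directory containing one sub-folder per class.
import tensorflow as tf
from tensorflow.keras import layers, models

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(128, 128), batch_size=32, label_mode="binary")

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Rescaling(1.0 / 255),                   # normalize pixel values to [0, 1]
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),         # probability of one class vs the other
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```

The low-level feature engineering I used to hand-craft in the 90s is now learned by the convolutional layers themselves.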

There are two thoughts that I came across at different points in time that have shaped my professional career. One was from Jim Blinn. In his column in IEEE Computer Graphics and Applications in the 80s, he wrote, in the context of the maturity of computer graphics at the time, that practical solutions need not necessarily be driven by theory: one should experiment, and then use theory to explain why the best result one got should work anyway. This is the essence of machine learning and deep learning. There is data and more data. If there isn't enough, we carry out data augmentation to add more, try multiple splits of the data into training and validation sets, evaluate multiple models for accuracy and over-fitting, and then choose the best one. As a practicing data scientist, I can say there is no single approach that sets the path to the required results at the outset; there is exploration and experimentation. Unfortunately, Blinn's thesis cannot be applied to deep learning thereafter, for even after one gets the desired results, there is no direct way of applying theory to figure out why it should work anyway. In fact, many researchers have dedicated their lives to figuring out why deep learning should work at all, and there is no consensus. Geoff Hinton and a few others valiantly kept the branch of machine and deep learning alive during the years when it seemed saturated; meanwhile, scale became possible, and now with multi-core CPUs and, more importantly, powerful GPUs (and now TPUs), artificial neural networks yield surprisingly fast and acceptable results, without anyone quite able to explain why. Prof Naftali Tishby and his team have some of the most credible work here to their credit: the 'information bottleneck', which uses concepts from information theory to explain why deep learning models should work. It is a fascinating field, still under development, and many, including Hinton, have agreed that the information bottleneck is a real mathematical tool that attempts to explain neural networks in a holistic way. But at the level of a practicing deep learner today, one tries tens of models, chooses the one that gives the best results (or chooses an ensemble), uses accuracy or some other metric to crown it the best among equals, and leaves it at that, for theory plays no further role.
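
In practice, that "try many models and crown the best" loop looks something like the sketch below, with synthetic data and an arbitrary pair of candidate models chosen purely to illustrate the workflow:

```python
# Sketch of the everyday model-selection loop: split, cross-validate, compare, pick.
# Synthetic data stands in for a real dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in candidates.items():
    # 5-fold cross-validation: each model is trained and validated on multiple splits
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {scores[name]:.3f}")

best = max(scores, key=scores.get)
print("Chosen model:", best)   # crowned 'best among equals'; theory plays no further role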

The second thought is from Prof Eric Lander of MIT. I had taken his online class 'The Secret of Life (7.00x)' in 2014. He has a PhD in Mathematics (information theory), got interested in Biology, and became the principal face of the Human Genome Project in 2000. In one of the classes he said that, as a student, one should build skills in all the tools available and later choose among them for the problem at hand, as you never know which one will be helpful when. He used his maths training to solve many tasks in the Human Genome Project. He is singularly responsible for reviving my interest in Biology. His course was a fascinating time travel through biochemistry, molecular biology and genetics, followed by an overall view of genomics. Interestingly for me, the timing was right: 2014 onwards was also when machine learning and deep learning were sweeping the technology landscape, and with my fresh perspective on Biology, I decided to work on applying deep learning to genomics.

In this article I don't intend to use too much technical jargon or make it look like a review article, so I will skip many details. But I will say how I got involved in applying deep learning to genomics. Genomics is a challenging application area for deep learning and entails unique difficulties compared with vision, speech and text processing, since we ourselves have limited ability to interpret genome information, yet we expect from deep learning a superhuman intelligence that explores beyond our knowledge. There is still much in the works, and a watershed revolution in "deep genomics" is not yet around the corner. In one of the classes, Prof Lander explained Huntington's disease, a rare neurological disease (about five in 100,000). It is an unusual genetic disease. Most genetic diseases are caused by recessive alleles, and people fall ill only if they get two copies of the disease allele, one from each parent. Huntington's disease is different: the allele that causes it is dominant, and people only have to receive one copy from either parent to contract it. Most genetic diseases cause illness early in life, whereas Huntington's sets in around midlife. Prof Lander went on to explain the work of David Botstein and Gusella, who identified a genetic marker linked to Huntington's disease on chromosome 4 through a series of laborious experiments. The idea was to use positional cloning and genetic markers (polymorphisms) to locate a gene when you don't know where to look for it. This work was carried out in 1983, when no human genome sequence existed.

This introduction was enough to get me initiated into genomics. After all, we are looking for the unknown most of the time, and for a change we now have a reference human genome. So the thought is: can we use markers to identify and locate a specific genetic condition? Deep learning is good at doing boring tasks with incredible accuracy and at bringing insights that may be humanly impossible. With the computational speed available at hand, searching blind alleys with deep learning is incredibly powerful and may lead to insights never intended at the outset.
Genomic research targets the study of the genomes of different species: the roles played by multiple genetic factors and the way they interact with the surrounding environment under different conditions. A study of Homo sapiens involves searching through approximately 3 billion base pairs of DNA containing protein-coding genes, RNA genes, cis-regulatory elements, long-range regulatory elements and transposable elements. Where this field intersects deep learning, it has far-reaching impact on medicine, pharmacy, agriculture and more. Deep learning can be very useful in exploring and predicting gene expression, in regulatory genomics (i.e. finding promoters and enhancers), and in studying splicing, transcription factors and RNA-binding proteins, mutations/polymorphisms and genetic variants, among others. The field is nascent, though: predictive performance on most problems has not reached the level needed for real-world applications, nor do the interpretations of these abstract models yet elucidate much insightful knowledge.
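
To make that intersection concrete, here is a hedged sketch of the most common recipe in this space: one-hot encode a DNA window and feed it to a small 1-D convolutional network that predicts a binary regulatory label (say, promoter vs non-promoter). The sequences and labels below are random placeholders, so this particular model learns nothing meaningful; it only shows the shape of the pipeline.

```python
# Sketch: one-hot encoding of DNA plus a small 1-D CNN, a workhorse pattern in
# regulatory genomics (e.g. promoter/enhancer detection). The data here is random.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq):
        mat[i, BASES.index(base)] = 1.0
    return mat

# Random placeholder dataset: 1000 windows of 200 bp with random 0/1 labels.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list(BASES), size=200)) for _ in range(1000)]
X = np.stack([one_hot(s) for s in seqs])           # shape (1000, 200, 4)
y = rng.integers(0, 2, size=1000)

model = models.Sequential([
    layers.Input(shape=(200, 4)),
    layers.Conv1D(32, 12, activation="relu"),      # filters act like motif detectors
    layers.GlobalMaxPooling1D(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),         # e.g. promoter vs non-promoter
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=64, validation_split=0.2)
```

With real labelled sequences in place of the random ones, the learned convolutional filters often end up resembling known binding motifs, which is part of why this architecture became the default starting point.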

As the “neural” in Artificial Neural Network (ANN) suggests, ANNs are brain-inspired systems intended to replicate the way we humans learn. Neural networks consist of input and output layers, as well as (in most cases) one or more hidden layers of units that transform the input into something the output layer can use. Deep learning tools, being inspired by real neural networks, are algorithms that use a cascade of multiple layers of neurons, each layer serving a specific task, with each successive layer using the output of the previous layer as its input. While I did say at the outset that there is nothing ‘deep’ about deep learning, technically one can say that how deep a network is depends on the number of hidden layers deployed: the more the layers, the deeper the network. These are excellent tools for finding patterns that are far too complex or numerous for a human programmer to extract and teach the machine to recognize. While neural networks have existed since the 1940s, in the form of simple artificial neurons and later perceptrons, they became a serious tool only after the 80s, thanks to a technique called backpropagation, which allows a network to adjust its hidden layers of neurons when the outcome does not match what was expected. There are many types of neural networks. The most basic is the feedforward type, the recurrent type is more popular, and then there are convolutional neural networks, Boltzmann machines and Hopfield networks, amongst others. Picking the right network depends on the data one has to train it with and the specific application in mind.
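
For readers who like to see the moving parts, below is a minimal NumPy sketch of a feedforward network with one hidden layer trained by backpropagation on the toy XOR problem; this is exactly the "adjust the hidden layers when the outcome does not match what was expected" loop described above, stripped to its essentials.

```python
# Minimal feedforward network (one hidden layer) trained with backpropagation on XOR.
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)     # XOR targets

# Weights: 2 inputs -> 4 hidden units -> 1 output
W1 = rng.normal(size=(2, 4));  b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1));  b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for epoch in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)                        # hidden layer activations
    out = sigmoid(h @ W2 + b2)                      # network output

    # Backward pass (gradients of squared error)
    err = out - y                                   # how far the outcome is from expected
    d_out = err * out * (1 - out)                   # sigmoid derivative at the output
    d_h = (d_out @ W2.T) * h * (1 - h)              # error propagated back to hidden layer

    # Gradient-descent updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

final = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(np.round(final, 3))   # typically close to [[0], [1], [1], [0]] after training
```

Stack more hidden layers and swap the hand-written gradients for automatic differentiation, and you have, in essence, every deep learning framework in use today.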

Hopefully, some day, we will be able to put all the pieces of this jigsaw puzzle together. We will then not only get good results but also have the information bottleneck, or some other tool, explain why they should work anyway. And hopefully, thus substantiated, deep learning could pave the way to deeper insights (no pun intended) into just how the brain works.