Sunday, July 18, 2010

I Write Like Isaac Asimov!

"I Write Like" is an online tool that helps you find your inner author. The website "I Write Like" (http://iwl.me) has erupted online and scores of writers are tempted to go and check it online to see just who they write like. I Write Like is both entertainment and education. I have read Charles Dickens a lot in my life and he may have influenced a writing style subconsciously. So I was determined to find who I write like. The way the site works is simple. You go to the website and cut-and-paste your writings and press "analyze" button. And the website, without any explanations, tells you, that you write like ABC or XYZ. I pasted one of my older blog articles and the analysis had it that I write like "Arthur C Clarke". Hmm.. I thought I wrote some serious thought provoking proses and not science fiction! So I submitted a few of my other paragraphs from other older articles. The analysis indicated that I wrote, at times, like Isaac Asimov, at other times like Dan Brown and still at other times like Stephen King!

Who would not like to be an Arthur C Clarke, Isaac Asimov and Dan Brown all in one! I would not mind a bit ;-) But then, being the curious one, I started looking for a pattern, and it was obvious: not once did the IWL analysis say I wrote like a famous English littérateur. I was never close to Charles Dickens, never close to Ernest Hemingway, D H Lawrence or Forsyth, not even Robin Cook. The pattern started emerging. All of my blog articles are about technology and science, and maybe that is why names like Arthur C Clarke and Isaac Asimov sprang up. Just to test this notion, I pasted a paragraph from a letter I had written to my parents some time back (not about technology and science) and, lo and behold, it said I wrote like Charles Dickens!

So much for entertainment. The concept is certainly catchy and offers interesting insights to anyone curious enough. Equally surely, it cannot be an exact science, and it is not. But the simple idea of an algorithm that can detect traces of influence in writing has proven wildly popular.

Who is behind IWL? Though the site might seem the idle dalliance of an English professor on summer break, it was created by Dmitry Chestnykh, a 27-year-old Russian software programmer currently living in Montenegro. Though he speaks English reasonably well, it's his second language. In his own words, Dmitry wanted it to be educational. Chestnykh modeled the site on software for e-mail spam filters. This means that the site's text analysis is largely keyword based. Even if you write in short, declarative, Hemingwayesque sentences, it's your word choice that may determine your comparison. Most writers will tell you, though, that the most telling signs of influence come from punctuation, rhythm and structure. I Write Like does account for some elements of style, such as the number of words per sentence.

Chestnykh says “Actually, the algorithm is not a rocket science, and you can find it on every computer today. It’s a Bayesian classifier, which is widely used to fight spam on the Internet. Take for example the “Mark as spam” button in Gmail or Outlook. When you receive a message that you think is spam, you click this button, and the internal database gets trained to recognize future messages similar to this one as spam. This is basically how “I Write Like” works on my side: I feed it with “Frankenstein” and tell it, “This is Mary Shelley. Recognize works similar to this as Mary Shelley.” Of course, the algorithm is slightly different from the one used to detect spam, because it takes into account more stylistic features of the text, such as the number of words in sentences, the number of commas, semicolons, and whether the sentence is a direct speech or a quotation.”
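To make the spam-filter analogy concrete, here is a minimal sketch of a naive Bayes text classifier in Python. It is not the actual I Write Like code, which has not been published here; the class name `AuthorClassifier`, the tokenizer, the labels "Author A" and "Author B" and the tiny training texts are all made up for illustration. Punctuation marks are kept as tokens as a crude stand-in for the stylistic features Chestnykh mentions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split into lowercase words, keeping , ; ! ? as separate tokens."""
    return re.findall(r"[a-z']+|[,;!?]", text.lower())


class AuthorClassifier:
    """Hypothetical naive Bayes classifier; not the real IWL implementation."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # author -> token counts
        self.doc_counts = Counter()              # author -> number of training texts
        self.vocab = set()                       # every token seen in training

    def train(self, author, text):
        """'This is Mary Shelley': add one labelled text to the counts."""
        tokens = tokenize(text)
        self.word_counts[author].update(tokens)
        self.doc_counts[author] += 1
        self.vocab.update(tokens)

    def scores(self, text):
        """Return log P(author) + sum of log P(token | author) per author."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        result = {}
        for author, counts in self.word_counts.items():
            total_tokens = sum(counts.values())
            log_p = math.log(self.doc_counts[author] / total_docs)
            for tok in tokens:
                # Add-one (Laplace) smoothing so unseen tokens do not zero out the score.
                log_p += math.log((counts[tok] + 1) / (total_tokens + len(self.vocab)))
            result[author] = log_p
        return result

    def classify(self, text):
        """Pick the author with the highest score."""
        s = self.scores(text)
        return max(s, key=s.get)


clf = AuthorClassifier()
clf.train("Author A", "the robot obeyed the three laws of robotics")
clf.train("Author B", "the orphan boy asked for more gruel in the workhouse")
print(clf.classify("a robot must obey the laws"))  # prints "Author A"
```

Because the score is driven by word counts, a text about robots lands with whichever training author wrote about robots, which would be consistent with technology articles being matched to science fiction writers.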

Chestnykh has uploaded works by about 50 authors — three books for each, he said. That, too, explains some of its shortcomings. Melville, for example, isn't in the system. But Chestnykh never expected the sudden success of the site and he plans to improve its accuracy by including more books and adding a probability percentage for each result. He hopes it can eventually be profitable.

Whatever the deficiencies of I Write Like, it does exude a love of writing and its many techniques. The site's blog updates with inspiring quotations from writers, and Chestnykh — whose company, Coding Robots, is also working on blog editing and diary writing software — shows a love of literature. He counts Gabriel Garcia Marquez and Agatha Christie among his favorites.

Whatever the strengths and weaknesses of IWL, the algorithm clearly does work, and works well, for almost any writing you submit. It analyzes the text and, with a certain probability, brackets you, the author, with someone well known. Each article we write can be expected to have a somewhat different style, so what is probably required is another, meta-level algorithm that takes several articles from one author and, rather than saying that one reads like Arthur C Clarke, another like Isaac Asimov and a third like Dan Brown, says that the set of articles as a whole has a writing style like Isaac Asimov's (I would like to hear it that way ;-)
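One very simple form such a meta-level step could take is a majority vote over the per-article results. The sketch below is only an assumption about how it might look, not anything the site offers; the function name `overall_style` and the sample list of results are hypothetical.

```python
from collections import Counter


def overall_style(per_article_results):
    """Return the most frequent author and the share of articles it won."""
    if not per_article_results:
        raise ValueError("need at least one per-article result")
    counts = Counter(per_article_results)
    author, hits = counts.most_common(1)[0]
    return author, hits / len(per_article_results)


# Hypothetical results from five separate submissions.
results = ["Arthur C Clarke", "Isaac Asimov", "Dan Brown", "Isaac Asimov", "Isaac Asimov"]
print(overall_style(results))  # prints ('Isaac Asimov', 0.6)
```

A more careful version would add up the classifier's per-author probabilities across articles instead of counting only the winners, but the vote is enough to show the idea.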

Be that as it may, the educational value is there. This is by far the best-known example of Bayesian classification I have come across, and another point in the case for making the teaching of quantitative methods in probability and statistics more interesting than it is!


Monday, July 5, 2010

Theory and practice

The FIFA World Cup and schools reopening in India after the summer were both partially responsible for the slump in the frequency of my blog posts over the last month. Coming out of hibernation of sorts, I felt that this time I should touch upon a topic that spans all my technology domain areas. While I have written earlier about the role of innovation, this time around I want to focus on whether, in any domain, theory indeed precedes practice; that is, whether the theoretical foundations of a technology are worked out before it is put into practice. This is a highly debatable topic, which is all the more reason I thought I should share my viewpoint on it.

When computer graphics was still evolving as an area, I happened to read a column titled "Jim Blinn's Corner" that appeared in IEEE Computer Graphics and Applications in the 1980s. Jim Blinn was considered a father figure in the area, known for his simulations of the Voyager project at NASA's JPL, for the 3-D simulations for Carl Sagan's TV series Cosmos, and for his research into many areas of computer graphics algorithms, including shading models.

In one of his articles (I don't recall specifically which one), he discussed the very topic of this article's title: whether theory should be developed first and algorithms only afterwards. Since rasterization and the implications of mapping the continuous domain onto a discrete one were not fully understood then, his primary goal was to solve the problem at hand, which meant getting one simulation or another to work. This required him to experiment a lot, and developing the theory first was not really an option for him at the time. His explanation, that one should experiment a lot and, once happy with an algorithm, use all the governing laws and principles of the area to explain why it should work anyway, had an impact on me that has shaped my later years. This runs counter to the premise that theory precedes its applications. It puts the cart before the horse, so to speak, and argues that even the theoretical development of a domain is aided when it is supplemented by practical products in the area.

While Jim Blinn was talking about the graphics of that era, his comment is clearly a generic one that applies to all evolving domains that need practical solutions. Let us look at some of the domains I am working on and see how it can help:
  1. Computer vision is much like computer graphics and derives many of its first principles from there, so surely algorithmic development in image processing and computer graphics can happen first, followed by a theoretical explanation of why it should work anyway.
  2. Mobile handsets are another area. In an era of the Apple iPhone, Android phones and many other intuitive designs, it is difficult to evolve the technology first. Solutions are built, and then theory is used to explain why they work anyway.
  3. In my last article I talked of harnessing solar energy (and other renewable energy forms such as wind) and addressed why research in the area is not yet complete. There is a case for developing products first, intuitive or counter-intuitive, and then using our knowledge of physics and semiconductors to explain why they should work anyway.
While I am completely aware that theoretical physicists frown upon their experimental counterparts and are unlikely to be impressed by the thesis of this article, the idea really is to take the debate beyond the boundaries of theory and experimentation, to the point where the only concern is solving a problem. More often than not, innovation operates in technological domains where the theoretical groundwork is still in its infancy, and one needs an approach that develops the domain regardless. Computer graphics is richer because of Jim Blinn's thought process then, and many areas will benefit similarly if we step out of the traditional thought process.

Technology, by definition, applies concepts evolved in science and engineering to day-to-day use in such a way that the human race benefits overall. In such a scenario, for a technology to succeed, solving people's problems becomes the stated problem. That problem can be solved either by developing theory first (if we are lucky) or by developing products first and then explaining in theory why they should work anyway.

In the larger scheme of things, theory and practice are both mere tools, and they need to be used intelligently and judiciously. It can then be left as a matter of personal opinion whether one approach is right and the other wrong.