Who would not like to be Arthur C Clarke, Isaac Asimov and Dan Brown all in one! I would not mind a bit ;-) But being the curious one, I started looking for a pattern, and one thing was obvious: not once did the IWL analysis say I wrote like a famous English littérateur. I was never close to Charles Dickens for sure, never close to Ernest Hemingway, not D H Lawrence, not Forsyth, not even Robin Cook. The pattern started emerging. All of my blog articles are about technology and science, and maybe that is why names like Arthur C Clarke and Isaac Asimov sprang up. Just to test this notion, I pasted a paragraph from a letter I had written to my parents some time back (not about technology and science) and, lo and behold, it said I wrote like Charles Dickens!
So much for entertainment. Surely the concept is catchy and offers interesting insights to anyone curious enough. Equally surely, it cannot be an exact science, and it is not. But the simple idea of an algorithm that can detect traces of influence in writing has proven wildly popular.
Who is behind IWL? Though the site might seem the idle dalliance of an English professor on summer break, it was created by Dmitry Chestnykh, a 27-year-old Russian software programmer currently living in Montenegro. Though he speaks English reasonably well, it's his second language. In his own words, Dmitry wanted it to be educational. Chestnykh modeled the site on software for e-mail spam filters. This means that the site's text analysis is largely keyword based. Even if you write in short, declarative, Hemingwayesque sentences, it's your word choice that may determine your comparison. Most writers will tell you, though, that the most telling signs of influence come from punctuation, rhythm and structure. I Write Like does account for some elements of style through features such as the number of words per sentence.
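To make the "elements of style" idea concrete, here is a rough sketch in Python of how such measurements could be computed. This is my own guess at what such features might look like, not the site's actual code: the function name `style_features`, the crude sentence splitting and the exact set of counts are all assumptions.

```python
import re


def style_features(text):
    """Compute a few simple style measurements for a text (hypothetical helper)."""
    # Split on sentence-ending punctuation; crude, but enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n = max(len(sentences), 1)  # avoid dividing by zero on empty input
    return {
        "words_per_sentence": len(words) / n,
        "commas_per_sentence": text.count(",") / n,
        "semicolons_per_sentence": text.count(";") / n,
    }


print(style_features("He came. He saw; he left. It was, all told, a short trip."))
```

Numbers like these could be fed to a classifier alongside the word counts, so that two writers with the same vocabulary but different rhythm do not look identical.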
Chestnykh says “Actually, the algorithm is not a rocket science, and you can find it on every computer today. It’s a Bayesian classifier, which is widely used to fight spam on the Internet. Take for example the “Mark as spam” button in Gmail or Outlook. When you receive a message that you think is spam, you click this button, and the internal database gets trained to recognize future messages similar to this one as spam. This is basically how “I Write Like” works on my side: I feed it with “Frankenstein” and tell it, “This is Mary Shelley. Recognize works similar to this as Mary Shelley.” Of course, the algorithm is slightly different from the one used to detect spam, because it takes into account more stylistic features of the text, such as the number of words in sentences, the number of commas, semicolons, and whether the sentence is a direct speech or a quotation.”
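For the curious, the "feed it a book and tell it the author" idea can be sketched in a few lines of Python. This is only a toy word-count naive Bayes classifier of the kind used in spam filters, written under my own assumptions; the class name, the tiny training sentences and the add-one smoothing are mine, and the real site also uses stylistic features that are left out here.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesAuthorClassifier:
    """Toy multinomial naive Bayes over word counts (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # author -> word -> count
        self.doc_counts = Counter()              # author -> number of training texts
        self.vocabulary = set()

    def train(self, author, text):
        """Label a text with its author, much like clicking 'Mark as spam'."""
        words = tokenize(text)
        self.word_counts[author].update(words)
        self.doc_counts[author] += 1
        self.vocabulary.update(words)

    def log_scores(self, text):
        """Return log P(author) + sum of log P(word | author) per author."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for author, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[author] / total_docs)
            for word in words:
                # Add-one smoothing so an unseen word does not zero out the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[author] = score
        return scores

    def classify(self, text):
        """Pick the author with the highest score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


classifier = NaiveBayesAuthorClassifier()
# Made-up training snippets standing in for whole books.
classifier.train("Mary Shelley", "the monster wandered the frozen mountains in misery and solitude")
classifier.train("Isaac Asimov", "the robot obeyed the three laws and the positronic brain computed")
print(classifier.classify("a robot with a positronic brain"))
```

Because the scores are built from word counts, a text full of robots and positronic brains lands on the science fiction author regardless of sentence shape, which matches my experience with technology articles.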
Chestnykh has uploaded works by about 50 authors — three books for each, he said. That, too, explains some of its shortcomings. Melville, for example, isn't in the system. But Chestnykh never expected the sudden success of the site and he plans to improve its accuracy by including more books and adding a probability percentage for each result. He hopes it can eventually be profitable.
Whatever the deficiencies of I Write Like, it does exude a love of writing and its many techniques. The site's blog updates with inspiring quotations from writers, and Chestnykh — whose company, Coding Robots, is also working on blog editing and diary writing software — shows a love of literature. He counts Gabriel Garcia Marquez and Agatha Christie among his favorites.
Whatever the strengths and weaknesses of IWL, it is clear that the algorithm does work, and works well, for almost any writing you submit. It analyzes the text and, with a certain probability, brackets you, the author, with someone well known. Each article we write is likely to have a somewhat different style, so what is probably needed is another, meta-level algorithm that takes several articles by one author and, rather than saying that one reads like Arthur C Clarke, another like Isaac Asimov and a third like Dan Brown, says that your set of articles as a whole has a writing style like Isaac Asimov's (I would like to hear it that way ;-)
Be that as it may, the educational value is there. This is by far the best-known example of Bayesian classification I have come across, and another point in the case for making the teaching of quantitative methods in probability and statistics more interesting than it is!
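The meta-level idea could be as simple as a majority vote over the per-article verdicts. The Python sketch below is my own illustration and not something IWL offers; the function name `overall_style` and the sample results are made up.

```python
from collections import Counter


def overall_style(per_article_results):
    """Combine per-article author labels into one verdict by majority vote."""
    votes = Counter(per_article_results)
    author, count = votes.most_common(1)[0]
    # Also report what share of the articles agreed with the winner.
    return author, count / len(per_article_results)


# Hypothetical verdicts for five articles by the same blogger.
results = ["Isaac Asimov", "Arthur C Clarke", "Isaac Asimov", "Dan Brown", "Isaac Asimov"]
author, share = overall_style(results)
print(f"Your articles read most like {author} ({share:.0%} of them)")
```

A more careful version might add up the classifier's per-author scores across articles instead of counting votes, so that a near miss on one article still contributes to the overall verdict.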