Thursday, October 20, 2011

Sentiment analysis using Naive Bayes Algorithm

Experimented with simple Naive Bayes for sentiment classification.

Naive Bayes code is available here (chapter6/docclass.py) and training data is available here

Changed the getwords() function in docclass.py:

- to remove special characters such as single quotes, commas and full stops from the text,
- to split on whitespace instead of non-word characters, because the non-word split discarded emoticons, and
- to filter out words found in the nltk stopwords corpus.

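[sourcecode language="python"]
import re
from nltk.corpus import stopwords

# Build the stopword set once instead of reloading the corpus for every word
STOPWORDS = set(stopwords.words('english'))

def getwords(doc):
    # Remove full stops, commas, exclamation marks and single quotes
    doc = re.sub(r"[.,!']+", '', doc)
    # Split on whitespace (not on non-word characters) so emoticons survive
    splitter = re.compile(r'\s+')
    # Lowercase, drop empty strings and drop stopwords
    words = [s.lower() for s in splitter.split(doc)
             if s and s.lower() not in STOPWORDS]
    print(words)
    # Return the unique set of words only
    return dict([(w, 1) for w in words])
[/sourcecode]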
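To see why the choice of splitter matters for emoticons, here is a small standalone comparison. The sample sentence is made up for illustration and does not come from the training data.

```python
import re

# Hypothetical sample text, not taken from the training data
text = "Loving the new phone :) but the battery is awful :("

# Splitting on non-word characters discards the emoticons entirely
nonword_tokens = [t for t in re.split(r'\W+', text) if t]

# Splitting on whitespace keeps each emoticon as its own token
space_tokens = [t for t in re.split(r'\s+', text) if t]

print(nonword_tokens)
print(space_tokens)
```

With the whitespace split, tokens such as `:)` and `:(` reach the classifier as features, which is likely useful for sentiment.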

For the training data, converted the ';;'-separated data file to a '\t'-separated file, because csv.reader() only accepts a single-character delimiter.
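The conversion can be done in a few lines. This is only a sketch: the function name `convert_delimiter` and the file paths are hypothetical, and it assumes the tweet text itself contains neither ';;' nor tab characters.

```python
def convert_delimiter(src_path, dst_path, old=';;', new='\t'):
    """Copy src_path to dst_path, replacing every `old` separator with `new`.

    Hypothetical helper; assumes the field values never contain `old` or `new`.
    """
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(line.replace(old, new))
```

After the conversion, the output file can be read with csv.reader(..., delimiter='\t').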

Changed sampletrain function to train classifier on training data file "testdata.manual.2009.05.25".

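[sourcecode language="python"]
import csv

def sampletrain(cl):
    # Python 2 style: csv.reader expects the file opened in binary mode
    read = csv.reader(open('pos 1', 'rb'), delimiter='\t')
    cnt = 0
    for row in read:
        # csv yields strings, so compare with '0' rather than the integer 0
        if row[0] == '0':
            sent = 'bad'
        else:
            sent = 'pos'
        data = row[5]
        cl.train(data, sent)
        cnt = cnt + 1
    # Number of rows used for training
    print(cnt)
[/sourcecode]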
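For readers without docclass.py at hand, here is a rough, self-contained sketch of what a Naive Bayes classifier with the same train() shape does. The class name `TinyNaiveBayes`, the smoothing choice and the toy sentences are all assumptions made for illustration; this is not the docclass.py implementation.

```python
import math
from collections import defaultdict

class TinyNaiveBayes(object):
    """Hypothetical minimal classifier, not the docclass.py implementation."""

    def __init__(self, getfeatures):
        self.getfeatures = getfeatures
        # category -> feature -> number of documents containing the feature
        self.feature_counts = defaultdict(lambda: defaultdict(int))
        # category -> number of training documents
        self.doc_counts = defaultdict(int)

    def train(self, doc, cat):
        self.doc_counts[cat] += 1
        for f in self.getfeatures(doc):
            self.feature_counts[cat][f] += 1

    def score(self, doc, cat):
        total_docs = float(sum(self.doc_counts.values()))
        n = self.doc_counts[cat]
        # Start from the log prior, log P(cat)
        logp = math.log(n / total_docs)
        for f in self.getfeatures(doc):
            # Add-one smoothing (an assumption): P(f | cat) = (count + 1) / (n + 2)
            logp += math.log((self.feature_counts[cat][f] + 1.0) / (n + 2.0))
        return logp

    def classify(self, doc):
        # Pick the category with the highest log score
        return max(self.doc_counts, key=lambda cat: self.score(doc, cat))

# Toy usage with a whitespace tokenizer and made-up sentences
cl = TinyNaiveBayes(lambda doc: set(doc.lower().split()))
cl.train('i love this phone :)', 'pos')
cl.train('what a great day', 'pos')
cl.train('i hate this battery :(', 'bad')
cl.train('awful service', 'bad')
print(cl.classify('love this great phone'))
```

The scores are summed in log space to avoid underflow when a document has many words.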

Wednesday, October 5, 2011

Socially adept programmers

The programmer stereotype, as described in "personality traits of great programmer":
The stereotypical programmer is a shy young man, either scrawny or overweight, who works by himself in an 8’x8’ cubicle in a bigger room of dozens of cubicles, each holding someone just like him. He intensely concentrates on writing cryptic instructions to coax a computer to do what is needed. He devotes his evenings, weekends, and summers to work. He has no social life and any hobbies he may have resemble his work. In some companies he is regarded as an indispensable genius; in others he is tolerated as an eccentric artist. (McConnell, 1999)

Programmers who value their social image present themselves so as to conform to society's preferred personality type. Some of the ways in which they manage society's perception are:

  • When asked to stay a little longer, they decline, saying that they have personal commitments or need to spend time with family when, in fact, they will be working on some open source project or breaking into a high-profile government network.

  • They don't use Facebook or Twitter often, so they develop a program that autonomously posts status updates and comments on others' feeds periodically (using some NLP and ML techniques) to show that they spend a lot of time on social networks and are social.

  • They are aware that in social conversation what matters is not the correctness of an argument, but how much laughter it provokes and how interesting it is.

  • They don't use IT jargon in social conversations, even if they know everything about it, and in fact make fun of people who do (making fun of others is the most frequently used technique in social conversations).

  • They identify themselves with hippie programmers instead of nerd programmers.

  • They have a girlfriend, or at least that is what they tell others.

  • They never mention programming as their hobby, even if it is at the top of the list.