Lazzy Scientist: Sentiment analysis using Naive Bayes Algorithm

Thursday, October 20, 2011

Sentiment analysis using Naive Bayes Algorithm

Experimented with a simple Naive Bayes classifier for sentiment classification.

The Naive Bayes code is available here: chapter6/docclass.py, and the training data is available here

Changed the getwords() function in docclass.py:

- to remove special characters such as single quotes, commas and full stops from the text,
- to split on whitespace instead of non-word characters, because splitting on non-word characters discarded emoticons, and
- to check each word against the NLTK stopwords corpus.

[sourcecode language="python"]

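import re
import nltk

def getwords(doc):
    # Remove full stops, commas, exclamation marks and single quotes
    doc = re.sub(r"\.+|,+|!+|'", '', doc)
    # Split on whitespace (not on non-word characters) so emoticons survive
    splitter = re.compile(r'\s+')
    # Build the stopword set once instead of reloading the list for every word
    stopwords = set(nltk.corpus.stopwords.words('english'))
    # Lowercase each token; drop empty tokens and stopwords
    words = [s.lower() for s in splitter.split(doc)
             if s and s.lower() not in stopwords]
    print(words)  # debug output
    # Return the unique set of words only
    return dict([(w, 1) for w in words])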
[/sourcecode]
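The effect of the splitting change can be seen in a small standalone sketch. The sample sentence is made up for illustration, and the snippet deliberately leaves out the stopword check so that it runs without NLTK.

```python
import re

# Made-up example text containing two emoticons
text = "loved the movie :-) but hated the ending :("

# Splitting on non-word characters throws the emoticons away
nonword_tokens = [t for t in re.split(r'\W+', text) if t]

# Splitting on whitespace keeps each emoticon as its own token
whitespace_tokens = [t for t in re.split(r'\s+', text) if t]

print(nonword_tokens)
print(whitespace_tokens)
```

With the whitespace split, tokens like ":-)" and ":(" stay available as features for the classifier, which is presumably useful for sentiment.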

For the training data, converted the ';;' separated data file to a '\t' separated file, because csv.reader() only accepts a single-character delimiter.
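A rough sketch of that conversion is shown here. The helper name convert_delimiter and the two sample rows are made up for illustration, and it assumes the text itself never contains a tab or ';;'.

```python
import csv

def convert_delimiter(lines, old=';;', new='\t'):
    """Yield each line with the two-character separator replaced by a tab."""
    for line in lines:
        yield line.replace(old, new)

# Made-up rows with six columns: the label in column 0, the text in column 5
sample = ['0;;1;;Mon May 11;;query;;user;;I hate this :-(\n',
          '4;;2;;Mon May 11;;query;;user;;Love it :-)\n']

# csv.reader refuses a delimiter that is longer than one character
rejected = False
try:
    csv.reader(sample, delimiter=';;')
except TypeError:
    rejected = True

# After the conversion, the rows parse with a plain tab delimiter
rows = list(csv.reader(convert_delimiter(sample), delimiter='\t'))
print(rejected, rows[0][0], rows[1][5])
```

For a real file, the same generator could be fed the lines of the opened source file, with the output written to a new file.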

Changed the sampletrain function to train the classifier on the training data file "testdata.manual.2009.05.25".

[sourcecode language="python"]
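import csv

def sampletrain(cl):
    # 'rb' suits Python 2's csv module; on Python 3 use open('pos 1', newline='')
    with open('pos 1', 'rb') as f:
        read = csv.reader(f, delimiter='\t')
        cnt = 0
        for row in read:
            # csv.reader yields strings, so compare the label with '0', not 0
            if row[0] == '0':
                sent = 'bad'
            else:
                sent = 'pos'
            # Column 5 holds the text to train on
            data = row[5]
            cl.train(data, sent)
            cnt = cnt + 1
    # Number of rows used for training
    print(cnt)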
[/sourcecode]
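The classifier passed to sampletrain only needs a train(text, label) method. As a rough illustration of what happens behind that call, here is a minimal Naive Bayes sketch. The class TinyNaiveBayes, its add-one smoothing, the simple whitespace feature function and the four training sentences are all assumptions made for this example; it is not the implementation in docclass.py.

```python
import math
from collections import defaultdict

class TinyNaiveBayes:
    """Hypothetical stand-in with the same train() interface."""

    def __init__(self, getfeatures):
        self.getfeatures = getfeatures
        # category -> feature -> number of documents containing the feature
        self.feature_counts = defaultdict(lambda: defaultdict(int))
        # category -> number of training documents
        self.category_counts = defaultdict(int)

    def train(self, item, cat):
        for f in self.getfeatures(item):
            self.feature_counts[cat][f] += 1
        self.category_counts[cat] += 1

    def score(self, item, cat):
        total = sum(self.category_counts.values())
        n_cat = self.category_counts[cat]
        # log P(cat) plus the sum of log P(feature | cat), assuming independence
        logp = math.log(float(n_cat) / total)
        for f in self.getfeatures(item):
            count = self.feature_counts[cat].get(f, 0)
            # Add-one smoothing so unseen words do not zero out the score
            logp += math.log((count + 1.0) / (n_cat + 2.0))
        return logp

    def classify(self, item):
        return max(self.category_counts, key=lambda c: self.score(item, c))

# Simple feature function for the sketch: unique lowercase whitespace tokens
cl = TinyNaiveBayes(lambda doc: set(doc.lower().split()))
cl.train('I love this phone :-)', 'pos')
cl.train('what a great day', 'pos')
cl.train('I hate this traffic :-(', 'bad')
cl.train('terrible awful service', 'bad')

print(cl.classify('love this great phone'))
print(cl.classify('hate awful traffic'))
```

Working in log space avoids multiplying many small probabilities together, which matters once the texts get longer than a few words.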
