Oracles Demo mail package was available on OTN site previously, but this time when I searched I got 404 error. So, I thought of keeping a copy for future references.
Its a wrapper program to send mail using pl/sql. It hides the low level SMTP protocol details.
gist link of package
Friday, December 30, 2011
Monday, November 7, 2011
Unbiased Performance Evaluation
Sunday, November 6, 2011
Padovan String Sequence Problem
Problem Find number of occurrences of input string in nth Padovan String P(n)
where P(0) = 'X', P(1)='Y' and P(2)='Z'
Input Parameters: needle string str to search for and number n
Conditions to handle:
if number n is >= 40 return -1
if string str has any other characters then X,Y and Z return -1
Solution
Solved this problem for preliminary round of one programming contest.
PadovanString.java
[code language="java"]
public class PadovanString {
public int stringOccurrences(int n, String str){
if (n >= 40)
return -1;
if (str.replaceAll("X|Y|Z", "").length() > 0)
return -1;
String res= pad(n);
/* print nth Padovan String
System.out.println(res);*/
/*replace search string with null and caculate number of occurences*/
return (res.length() - res.replaceAll(str, "").length())/(str.length());
}
public String pad(int n){
if (n == 0) return "X";
if (n == 1) return "Y";
if (n == 2) return "Z";
else return pad(n-2) + pad(n-3);
}
}
[/code]
Test.java
[code language="java"]
public class Test {
public static void main(String []args)
{
PadovanString p = new PadovanString();
System.out.println(p.stringOccurrences(20,"YZ"));
}
}
[/code]
Also, Length of Nth padovan string = Nth Padovan number
where P(0) = 'X', P(1)='Y' and P(2)='Z'
Input Parameters: needle string str to search for and number n
Conditions to handle:
if number n is >= 40 return -1
if string str has any other characters then X,Y and Z return -1
Solution
Solved this problem for preliminary round of one programming contest.
PadovanString.java
[code language="java"]
public class PadovanString {
public int stringOccurrences(int n, String str){
if (n >= 40)
return -1;
if (str.replaceAll("X|Y|Z", "").length() > 0)
return -1;
String res= pad(n);
/* print nth Padovan String
System.out.println(res);*/
/*replace search string with null and caculate number of occurences*/
return (res.length() - res.replaceAll(str, "").length())/(str.length());
}
public String pad(int n){
if (n == 0) return "X";
if (n == 1) return "Y";
if (n == 2) return "Z";
else return pad(n-2) + pad(n-3);
}
}
[/code]
Test.java
[code language="java"]
public class Test {
public static void main(String []args)
{
PadovanString p = new PadovanString();
System.out.println(p.stringOccurrences(20,"YZ"));
}
}
[/code]
Also, Length of Nth padovan string = Nth Padovan number
Labels:
java,
padovan java,
padovan sequence,
padovan string
Saturday, November 5, 2011
Urban dictionary and Youtube search for cool slang's and their usage using Python
Urban dictionary is a user maintained dictionary of slang's. A thesaurus search for a word gives all the related slang's, related words could be antonym also.
Youtube, not long ago, started video comment search which is still in beta. Here, comments on all videos are searched for entered search term.
Requirement : To get related slang's from urban dictionary for a particular word, and then query youtube usage of those slang's.
[sourcecode language="python" wraplines="false"]
from BeautifulSoup import BeautifulSoup,NavigableString
import nltk
import os
import re
import urllib2
import webbrowser
import time
def get_soup(url):
#get soup object for the url
try:
page = urllib2.urlopen(url)
except urllib2.URLError, e:
print 'Failed to fetch ' + url
raise e
try:
soup = BeautifulSoup(page)
except HTMLParser.HTMLParseError, e:
print 'Failed to parse ' + url
raise e
return soup
def get_related(word):
word_list=[]
print 'Fetching related words for '+word+'........'
soup=get_soup('http://www.urbandictionary.com/thesaurus.php?term='+word)
for td in soup.findAll('td', {'class':'word'}): #each row
rel_word=td.find('a').contents[0].encode()
print rel_word
word_list.append(rel_word)
return word_list
def get_comment(word,npages):
print 'Fetching comments for '+word+'........'
comments =[]
for i in range(1,npages+1):
soup= get_soup('http://www.youtube.com/comment_search?q='+str(word)+'&ld=1&comment_only=1&hl=en&so=pagerank&page='+str(i))
for span in soup.findAll('span', {'class':'comment-result-comment'}): #each row
comment =''
for text in span.findAll(text=True):
comment = comment+ ' '+ text
comment=re.sub('[ ]+|\n|\r',' ',comment.strip())
comment=re.sub('^[0-9]+[ ]+','',comment)
comments.append(comment.encode('utf-8'))
time.sleep(6)
return comments
def main():
word='spooky'
npages=2
word_list=get_related(word)
#word_list = ['spooky']
file1=open('word_list','w')
for w in word_list:
for comment in get_comment(w,npages):
file1.write(comment+'\n')
file1.close()
if __name__ == '__main__':
main()
[/sourcecode]
Youtube, not long ago, started video comment search which is still in beta. Here, comments on all videos are searched for entered search term.
Requirement : To get related slang's from urban dictionary for a particular word, and then query youtube usage of those slang's.
[sourcecode language="python" wraplines="false"]
from BeautifulSoup import BeautifulSoup,NavigableString
import nltk
import os
import re
import urllib2
import webbrowser
import time
def get_soup(url):
#get soup object for the url
try:
page = urllib2.urlopen(url)
except urllib2.URLError, e:
print 'Failed to fetch ' + url
raise e
try:
soup = BeautifulSoup(page)
except HTMLParser.HTMLParseError, e:
print 'Failed to parse ' + url
raise e
return soup
def get_related(word):
word_list=[]
print 'Fetching related words for '+word+'........'
soup=get_soup('http://www.urbandictionary.com/thesaurus.php?term='+word)
for td in soup.findAll('td', {'class':'word'}): #each row
rel_word=td.find('a').contents[0].encode()
print rel_word
word_list.append(rel_word)
return word_list
def get_comment(word,npages):
print 'Fetching comments for '+word+'........'
comments =[]
for i in range(1,npages+1):
soup= get_soup('http://www.youtube.com/comment_search?q='+str(word)+'&ld=1&comment_only=1&hl=en&so=pagerank&page='+str(i))
for span in soup.findAll('span', {'class':'comment-result-comment'}): #each row
comment =''
for text in span.findAll(text=True):
comment = comment+ ' '+ text
comment=re.sub('[ ]+|\n|\r',' ',comment.strip())
comment=re.sub('^[0-9]+[ ]+','',comment)
comments.append(comment.encode('utf-8'))
time.sleep(6)
return comments
def main():
word='spooky'
npages=2
word_list=get_related(word)
#word_list = ['spooky']
file1=open('word_list','w')
for w in word_list:
for comment in get_comment(w,npages):
file1.write(comment+'\n')
file1.close()
if __name__ == '__main__':
main()
[/sourcecode]
Thursday, October 20, 2011
Sentiment analysis using Naive Bayes Algorithm
Experimented with simple Naive Bayes for sentiment classification.
Naive Bayes code is available here chatper6/docclass.py and training data is available here
Changed the getwords() function in docclass.py
- to remove special characters like single-quote, comma, full stop from text
- to split based on white spaces instead of non word character because it ignored emots with non word character split and
- included nltk stopwords corpus check.
[sourcecode language="python"]
def getwords(doc):
doc=re.sub('\.+|,+|!+|\'','',doc)
splitter=re.compile('\\s+')
#print doc
# Split the words by non-alpha characters
words=[s.lower().strip() for s in splitter.split(doc)
if s.lower().strip() not in nltk.corpus.stopwords.words('english') ]
print words
# Return the unique set of words only
return dict([(w,1) for w in words])
[/sourcecode]
For training data, converted ';;' separated data file to '\t' separated file because csv.reader() function
was not accepting two symbol delimiters.
Changed sampletrain function to train classifier on training data file "testdata.manual.2009.05.25".
[sourcecode language="python"]
def sampletrain(cl):
read = csv.reader(open('pos 1', 'rb'), delimiter='\t')
cnt = 1
for row in read:
if row[0] == 0:
sent = 'bad'
else:
sent = 'pos'
data = row[5]
cl.train(data,sent)
cnt = cnt+1
print cnt
[/sourcecode]
Naive Bayes code is available here chatper6/docclass.py and training data is available here
Changed the getwords() function in docclass.py
- to remove special characters like single-quote, comma, full stop from text
- to split based on white spaces instead of non word character because it ignored emots with non word character split and
- included nltk stopwords corpus check.
[sourcecode language="python"]
def getwords(doc):
doc=re.sub('\.+|,+|!+|\'','',doc)
splitter=re.compile('\\s+')
#print doc
# Split the words by non-alpha characters
words=[s.lower().strip() for s in splitter.split(doc)
if s.lower().strip() not in nltk.corpus.stopwords.words('english') ]
print words
# Return the unique set of words only
return dict([(w,1) for w in words])
[/sourcecode]
For training data, converted ';;' separated data file to '\t' separated file because csv.reader() function
was not accepting two symbol delimiters.
Changed sampletrain function to train classifier on training data file "testdata.manual.2009.05.25".
[sourcecode language="python"]
def sampletrain(cl):
read = csv.reader(open('pos 1', 'rb'), delimiter='\t')
cnt = 1
for row in read:
if row[0] == 0:
sent = 'bad'
else:
sent = 'pos'
data = row[5]
cl.train(data,sent)
cnt = cnt+1
print cnt
[/sourcecode]
Labels:
naive bayes,
python,
sentiment analysis,
Text Mining
Wednesday, October 5, 2011
Socially adept programmers
The programmer stereotype as described in personality traits of great programmer
Programmers who value social image present themselves so to conform to a perception of society's preferred type of personality. Some ways in which they manipulate perception of society are
The stereotypical programmer is a shy young man, either scrawny or overweight, who works by himself in an 8’x8’ cubicle in a bigger room of dozens cubicles, each holding someone just like him. He intensely concentrates on writing cryptic instructions to coax a computer to do what is needed. He devotes his evenings, weekends, and summers to work. He has no social life and any hobbies he may have resemble his work. In some companies he is regarded as an indispensable genius; in others he is tolerated as an eccentric artist. (McConnell, 1999)
Programmers who value social image present themselves so to conform to a perception of society's preferred type of personality. Some ways in which they manipulate perception of society are
- When asked to stay little longer they decline by saying that they have some personal commitments, they need to spend time with family, when, in fact they will be working on some open source project or breaking into high profile gov network.
- They don't use Facebook or twitter often, so they develop a program that autonomously posts status and comment on others feed periodically(using some NLP and ML techniques) to show that they spend a lot of time on social networks and are social.
- They are aware that in social conversation what matters is not the correctness of the argument, but how much laughter it provokes and how much interesting it is.
- They don't use IT jargon in social conversations, even if they know everything about them and in fact, make fun of people who use them(making fun of others is most frequently used technique in social conversations).
- They identify themselves with hippie programmers instead of nerd programmers.
- They have a girlfriend or at least this is what they tell to others.
- They never tell programming as their hobby even if programming is at the top on the list.
Thursday, September 22, 2011
Funny Incident @Work
Today, my manager asked me to send him my profile in one-page ppt. After I sent him the document he replied back saying 'You haven't included number of years of experience in it. Update it & send again' .
But I included that in document. So, to clarify, I call him.
Me: Hi <manager> , its regarding the profile. Years of experience, I have mentioned it in the document.
Manger: Where is it? I don't see it.
Me: Its in first sentence of document itself.
Manager: You mentioned it in letters. ah.. you should have mentioned it in digits.
Me: Okay.Anything else.
Manager: No, Nothing else. Just update it, change it to digit and resend me the document.
He wanted me to replace the word with a single digit and resend him the document. Amazing!!
Such incidents really provoke laughter and provide amusement at work.
But I included that in document. So, to clarify, I call him.
Me: Hi <manager> , its regarding the profile. Years of experience, I have mentioned it in the document.
Manger: Where is it? I don't see it.
Me: Its in first sentence of document itself.
Manager: You mentioned it in letters. ah.. you should have mentioned it in digits.
Me: Okay.Anything else.
Manager: No, Nothing else. Just update it, change it to digit and resend me the document.
He wanted me to replace the word with a single digit and resend him the document. Amazing!!
Such incidents really provoke laughter and provide amusement at work.
Monday, September 5, 2011
Extracting movie title from torrent file name using Regular Expression
Movie files downloaded from torrent sites has file name which contains format types (like dvdrip, dvdscr, xvid etc), year, comments , user names and of course movie name. We want to extract movie name from this file names.
Based on the observation that
'(.*?)(dvdrip|xvid| cd[0-9]|dvdscr|brrip|divx|[\{\(\[]?[0-9]{4}).*'
This regular expression will find file names where we have dvdrip, brrip, xvid(we can specify any number of values here) or year and finds first of any one of the appearing patterns because we have used lazy parsing here using .*?. We then extract the first back referenced part \1.
Secondly, to remove the part within brackets we use
'(.*?)\(.*\)(.*)' regular expression.
Following code snippet gets the movie names(to an extent) from file name.
import re
fr = open('filenameslist.txt', 'r')
fw = open('movienames.txt', 'w')
for line in fr:
text = line.strip()
text1 = re.search('([^\\\]+)\.(avi|mkv|mpeg|mpg|mov|mp4)$', text)
if text1:
text = text1.group(1)
text = text.replace('.', ' ').lower()
text2 = re.search('(.*?)(dvdrip|xvid| cd[0-9]|dvdscr|brrip|divx|[\{\(\[]?[0-9]{4}).*', text)
if text2:
text = text2.group(1)
text3 = re.search('(.*?)\(.*\)(.*)', text)
if text3:
text = text3.group(1)
# print text
fw.write(text + '\n')
fr.close()
fw.close()
Output can be improved further observing things like we can replace characters like underscore with space, we can check for only four digits where next character is non word character ..
Input File names | Output Achieved | |
---|---|---|
countdown.to.zero.2010.xvid-submerge.avi | countdown to zero | |
DrJn.2010.BRRip_mediafiremoviez.com.mkv | drjn | |
Nim's.Island[2008]DvDrip-aXXo.avi | nim's island | |
Invictus.DVDSCR.xViD-xSCR.CD1.avi | invictus | |
Invictus.DVDSCR.xViD-xSCR.CD2.avi | invictus | |
20000 Leagues Under The Sea.avi | ||
Across The Universe.MoZinRaT CD1.avi | across the universe mozinrat | |
Adoration 2008 DvdRip ExtraScene RG.avi | adoration | |
Amelie(English Dubbed).avi | amelie | |
America.2009.STV.DVDRip.XviD-ViSiON.avi | america | |
VTS_02_1.avi | vts_02_1 | |
VTS_02_2.avi | vts_02_2 | |
Antibodies.2005.GERMAN.DVDRip.XviD.AC3.CD1-AFO.avi | antibodies | |
arranged.xvid-reserved.avi | arranged | |
badder.santa.dvdrip.xvid-deity.avi | badder santa | |
Balls of Fury[2007]DvDrip[Eng]-FXG.avi | balls of fury | |
Bruno (2009) DVDRip-MAXSPEED www.torentz.3xforum.ro.avi | bruno | |
Defiance DvDSCR[2009] ( 10rating ).avi | defiance | |
Down With Love (cute romantic comedy).avi | down with love | |
Einstein.And.Eddington.2008.DVDRip.XviD.avi | einstein and eddington | |
ENEMY_OF_THE_STATE..DVDrip(vice).avi | enemy_of_the_state |
Based on the observation that
- Most of the file name contains the format like 'dvdrip', 'xvid', 'brrip','dvdscr' or other words like 'CD1','(<year>)','[<year>]' specified in the name and everything after any of this words doesnot contains any useful data.
- Sometimes extra information is added to file name inside bracket like
Defiance DvDSCR[2009] ( 10rating ).avi
Down With Love (cute romantic comedy).avi
so we can also ignore the part including and after brackets as movie names doesn't have brackets in them and have no useful information after it.
'(.*?)(dvdrip|xvid| cd[0-9]|dvdscr|brrip|divx|[\{\(\[]?[0-9]{4}).*'
This regular expression will find file names where we have dvdrip, brrip, xvid(we can specify any number of values here) or year and finds first of any one of the appearing patterns because we have used lazy parsing here using .*?. We then extract the first back referenced part \1.
Secondly, to remove the part within brackets we use
'(.*?)\(.*\)(.*)' regular expression.
Following code snippet gets the movie names(to an extent) from file name.
import re
fr = open('filenameslist.txt', 'r')
fw = open('movienames.txt', 'w')
for line in fr:
text = line.strip()
text1 = re.search('([^\\\]+)\.(avi|mkv|mpeg|mpg|mov|mp4)$', text)
if text1:
text = text1.group(1)
text = text.replace('.', ' ').lower()
text2 = re.search('(.*?)(dvdrip|xvid| cd[0-9]|dvdscr|brrip|divx|[\{\(\[]?[0-9]{4}).*', text)
if text2:
text = text2.group(1)
text3 = re.search('(.*?)\(.*\)(.*)', text)
if text3:
text = text3.group(1)
# print text
fw.write(text + '\n')
fr.close()
fw.close()
Output can be improved further observing things like we can replace characters like underscore with space, we can check for only four digits where next character is non word character ..
Sunday, September 4, 2011
Extracting IMDB data using Python
Sample code for extracting IMDB data using python BeautifulSoup package.
Requirement : To extract all the feature movie names and their ratings from IMDB database for a particular year.
Parameters used in query were identified using IMDB advanced search function. Start, count and year parameters were used in this case for querying.The url is queried for 100 records at a time since more than that is not allowed. After extracting movie names and rating for 100 records, the url is queried for next 100 records and so on.
Two files 'imdb_conf' and 'ratings' are created by the code. 'imdb_conf' file keeps track of the record number last read and 'ratings' file stores the movie name, rating and year.
Web scrapping the IMDB website for required data.
[sourcecode language="python"]
from BeautifulSoup import BeautifulSoup
import os
import re
import urllib2
def get_start_pos_yr(fimdb_config):
#for starting after last fetched record
#last line contains the last record fetched
nlines = fimdb_config.readlines()
startfrom = -1
year = None
if len(nlines) > 1:
list_num =re.search('[^\t]+',nlines[-1])
if list_num:
startfrom = int(list_num.group())+1
year =re.search('\t[0-9]+',nlines[-1]).group().strip()
return startfrom,year
def get_soup(url):
#get soup object for the url
try:
page = urllib2.urlopen(url)
except urllib2.URLError, e:
print 'Failed to fetch ' + url
raise e
try:
soup = BeautifulSoup(page)
except HTMLParser.HTMLParseError, e:
print 'Failed to parse ' + url
raise e
return soup
def get_ntotal(soup):
#fetch total number of records present for particular query
total_count=1
for div in soup.findAll('div', {'id':'left'}):
#print ivd.contents[0]
total_count = re.search('[ ]+[0-9,]+',div.contents[0])
if total_count:
total_count=total_count.group().replace(',','').strip()
#print "total"+total_count
return total_count
def set_rating(soup,fwimdb_config,frating,year,startfrom):
cond = True
count_rec=0
total_res=get_ntotal(soup)
year=str(year)
#total_res=100
while cond:
for tr in soup.findAll('tr', {'class':re.compile('(odd|even)[ a-zA-Z]*')}): #each row
for td in tr.findAll('td', {'class':'title'} ):
for link in td.findAll('a',{'href':re.compile('/title/tt[^/]+/$')}):
movie_name=link.contents[0] #title name
for rating in td.findAll('div',{'class':'rating rating-list'}):
count_rec=count_rec+1
if rating.has_key('title'):
#print "hurray"
rt = re.search('[0-9]+[^(]+',rating['title']) #rating
if rt:
frating.write(movie_name+"\t"+rt.group().strip()+"\t"+year+"\n")
else:
frating.write(movie_name+"\t--\t"+year+"\n")
else:
frating.write(movie_name+"\t--\t"+year+"\n")
#print movie_name+"\t"+rt.group()
fwimdb_config.write(str(count_rec)+"\t"+year+"\n")
if startfrom == 0:
startfrom = 101 #second run
else:
startfrom = startfrom + 100
if startfrom >= int(total_res):
cond=False
fwimdb_config.write("-1"+"\t"+str(int(year)-1)+"\n")
print str(startfrom)+" "+str(total_res)
soup=get_soup("http://www.imdb.com/search/title?languages=en&title_type=feature&count=100&sort=num_votes,desc&start="+str(startfrom)+"&year="+year)
def main():
fwimdb_conf=open("imdb_conf","r+")
frating = open("ratings","a") #ratings
fwimdb_conf.write("LastreadLine\tYear\n")
startfrom,year = get_start_pos_yr(fwimdb_conf)
if startfrom == -1:
startfrom = 0
if year == None:
year="2010"
print startfrom
soup=get_soup("http://www.imdb.com/search/title?languages=en&title_type=feature&sort=num_votes,desc&count=100&start="+str(startfrom)+"&year="+year)
set_rating(soup,fwimdb_conf,frating,year,startfrom)
frating.close()
fwimdb_conf.close()
if __name__ == '__main__':
main()
[/sourcecode]
For demonstration purposes only. If you plan to use IMDB data beyond personal usage, you should contact IMDB Licensing department.
Requirement : To extract all the feature movie names and their ratings from IMDB database for a particular year.
Parameters used in query were identified using IMDB advanced search function. Start, count and year parameters were used in this case for querying.The url is queried for 100 records at a time since more than that is not allowed. After extracting movie names and rating for 100 records, the url is queried for next 100 records and so on.
Two files 'imdb_conf' and 'ratings' are created by the code. 'imdb_conf' file keeps track of the record number last read and 'ratings' file stores the movie name, rating and year.
Web scrapping the IMDB website for required data.
[sourcecode language="python"]
from BeautifulSoup import BeautifulSoup
import os
import re
import urllib2
def get_start_pos_yr(fimdb_config):
#for starting after last fetched record
#last line contains the last record fetched
nlines = fimdb_config.readlines()
startfrom = -1
year = None
if len(nlines) > 1:
list_num =re.search('[^\t]+',nlines[-1])
if list_num:
startfrom = int(list_num.group())+1
year =re.search('\t[0-9]+',nlines[-1]).group().strip()
return startfrom,year
def get_soup(url):
#get soup object for the url
try:
page = urllib2.urlopen(url)
except urllib2.URLError, e:
print 'Failed to fetch ' + url
raise e
try:
soup = BeautifulSoup(page)
except HTMLParser.HTMLParseError, e:
print 'Failed to parse ' + url
raise e
return soup
def get_ntotal(soup):
#fetch total number of records present for particular query
total_count=1
for div in soup.findAll('div', {'id':'left'}):
#print ivd.contents[0]
total_count = re.search('[ ]+[0-9,]+',div.contents[0])
if total_count:
total_count=total_count.group().replace(',','').strip()
#print "total"+total_count
return total_count
def set_rating(soup,fwimdb_config,frating,year,startfrom):
cond = True
count_rec=0
total_res=get_ntotal(soup)
year=str(year)
#total_res=100
while cond:
for tr in soup.findAll('tr', {'class':re.compile('(odd|even)[ a-zA-Z]*')}): #each row
for td in tr.findAll('td', {'class':'title'} ):
for link in td.findAll('a',{'href':re.compile('/title/tt[^/]+/$')}):
movie_name=link.contents[0] #title name
for rating in td.findAll('div',{'class':'rating rating-list'}):
count_rec=count_rec+1
if rating.has_key('title'):
#print "hurray"
rt = re.search('[0-9]+[^(]+',rating['title']) #rating
if rt:
frating.write(movie_name+"\t"+rt.group().strip()+"\t"+year+"\n")
else:
frating.write(movie_name+"\t--\t"+year+"\n")
else:
frating.write(movie_name+"\t--\t"+year+"\n")
#print movie_name+"\t"+rt.group()
fwimdb_config.write(str(count_rec)+"\t"+year+"\n")
if startfrom == 0:
startfrom = 101 #second run
else:
startfrom = startfrom + 100
if startfrom >= int(total_res):
cond=False
fwimdb_config.write("-1"+"\t"+str(int(year)-1)+"\n")
print str(startfrom)+" "+str(total_res)
soup=get_soup("http://www.imdb.com/search/title?languages=en&title_type=feature&count=100&sort=num_votes,desc&start="+str(startfrom)+"&year="+year)
def main():
fwimdb_conf=open("imdb_conf","r+")
frating = open("ratings","a") #ratings
fwimdb_conf.write("LastreadLine\tYear\n")
startfrom,year = get_start_pos_yr(fwimdb_conf)
if startfrom == -1:
startfrom = 0
if year == None:
year="2010"
print startfrom
soup=get_soup("http://www.imdb.com/search/title?languages=en&title_type=feature&sort=num_votes,desc&count=100&start="+str(startfrom)+"&year="+year)
set_rating(soup,fwimdb_conf,frating,year,startfrom)
frating.close()
fwimdb_conf.close()
if __name__ == '__main__':
main()
[/sourcecode]
For demonstration purposes only. If you plan to use IMDB data beyond personal usage, you should contact IMDB Licensing department.
Tuesday, August 30, 2011
String Aggregation in Oracle SQL using XMLAGG
Requirement
We have table test5 with following data
c1
-----
ADD
DELETE
MODIFY
Required output
col
------
ADD,DELETE.MODIFY
i.e we want column value to be aggregated with comma.
Solution
In real scenario we will have another column on which we need to group and then aggregate corresponding values.
In oracle 11g there is one function wm_concat which concatenates column values with comma. We can write a procedure to perform this thing but here we will try to do it with sql query alone.
This solution uses XMLELEMENT, XML_AGGR functions
[sourcecode language="sql"]
SELECT replace (replace (replace (XMLELEMENT ("col" , XMLAGG(XMLELEMENT("col", c1))), '</col><col>' , ',' ),'<col><col>' , '' ),'</col></col>', '')
as "col" FROM test5
[/sourcecode]
Output
col
------
ADD,DELETE,MODIFY
We have table test5 with following data
c1
-----
ADD
DELETE
MODIFY
Required output
col
------
ADD,DELETE.MODIFY
i.e we want column value to be aggregated with comma.
Solution
In real scenario we will have another column on which we need to group and then aggregate corresponding values.
In oracle 11g there is one function wm_concat which concatenates column values with comma. We can write a procedure to perform this thing but here we will try to do it with sql query alone.
This solution uses XMLELEMENT, XML_AGGR functions
[sourcecode language="sql"]
SELECT replace (replace (replace (XMLELEMENT ("col" , XMLAGG(XMLELEMENT("col", c1))), '</col><col>' , ',' ),'<col><col>' , '' ),'</col></col>', '')
as "col" FROM test5
[/sourcecode]
Output
col
------
ADD,DELETE,MODIFY
Dynamic column name in Oracle SQL Query using dbms_xmlgen
Consider scenario where we need to use dynamic column name in query.
You might have tried using select query in place of column list something like
select (select distinct column_name from table1 where column_name2='some_value') from table2
Based on some condition we will select column name from one table(test3 table here) and query another table(test 4) for the fetched column.
Consider table test4 with following data
[sourcecode language="sql"]
create table test4(c1 varchar2(20), c2 varchar2(20), c3 varchar2(20))
[/sourcecode]
c1 c2 c3 c4
-----------------------------------------------
col1_val col2_val col3_val col4_val
[sourcecode language="sql"]
select to_char (extractvalue (xmltype (dbms_xmlgen.getxml('select '||
to_char(extractvalue(xmltype(dbms_xmlgen.getxml('select col_name from test3 where flag =1')),'/ROWSET/ROW/COL_NAME')) || ' col from test4'
)),'/ROWSET/ROW/COL')) col_value from dual[/sourcecode]
You might have tried using select query in place of column list something like
select (select distinct column_name from table1 where column_name2='some_value') from table2
Based on some condition we will select column name from one table(test3 table here) and query another table(test 4) for the fetched column.
Consider table test4 with following data
[sourcecode language="sql"]
create table test4(c1 varchar2(20), c2 varchar2(20), c3 varchar2(20))
[/sourcecode]
c1 c2 c3 c4
-----------------------------------------------
col1_val col2_val col3_val col4_val
And another table test3 with column names of test4 as rows here.
[sourcecode language="sql"]
create table test3(col_name varchar2(2), flag number)
[/sourcecode]
[sourcecode language="sql"]
create table test3(col_name varchar2(2), flag number)
[/sourcecode]
col_name flag
-------------------
c1 0
c2 1
c3 0
-------------------
c1 0
c2 1
c3 0
based on flag value we want to select the column name. We want to achieve something like
select (select col_name from test3 where flag = 1) from test4 [this doesn't works]
We will use oracle xml functions getxml and extractvalue to achieve result. dbms_xmlgen.getxml takes SQL query as its argument and returns result in xml format.This function allows us to dynamically create sql query, we can use dynamic column or table name in it.
[sourcecode language="sql"]
select to_char (extractvalue (xmltype (dbms_xmlgen.getxml('select '||
to_char(extractvalue(xmltype(dbms_xmlgen.getxml('select col_name from test3 where flag =1')),'/ROWSET/ROW/COL_NAME')) || ' col from test4'
)),'/ROWSET/ROW/COL')) col_value from dual[/sourcecode]
SQL Problem: Item falls in which group boolean result
Scenario
Consider scenario where students data is present in two tables
1) Students in A group have entry in table A_group
student_id
101
102
103
108
2)Students in B group have entry in table B_group
student_id
101
103
109
111
Condition : Student can belong to either or both groups.
Requirement
Result is required in the form
101 1 1
102 1 0
103 1 1
108 1 0
109 0 1
111 0 1
which shows student_id and Boolean values indicating if the student is present in that group or not.
Solution
Here is one of possible solution that I thought of on first shot
First we combine results from both tables without filtering any
SELECT DISTINCT student_id, 1 AS flag
FROM a_group
UNION ALL
SELECT DISTINCT student_id, 4 AS flag
FROM b_group
This query will give all the student ids from both tables with a flag indicating student's group
*flag value 1 indicates A group
flag value 4 indicates B group
Step 2 :
Now we have data of the form
student_id flag
101 4
103 4
109 4
111 4
Now comes the logic part
If we group data by student_id and perform aggregate sum on flag like
SELECT student_id,
SUM(flag) AS newflag
FROM (SELECT DISTINCT student_id, 1 AS flag
FROM a_group
UNION ALL
SELECT DISTINCT student_id, 4 AS flag
FROM b_group)
GROUP BY student_id
We will get following values of new flag
1 for students present in A group
4 for students present in B group
5 for student present in both groups
Result of above query would be of form
student_id newflag
101 5
102 1
103 5
108 1
109 4
111 4
Step 3 :
Next we need to use 'IF conditionals' to show required data with query like
SELECT student_id,
( CASE
WHEN newflag = 1
OR newflag = 5 THEN 1
ELSE 0
END ) a_group_flag,
( CASE
WHEN newflag = 4
OR newflag = 5 THEN 1
ELSE 0
END ) b_group_flag
FROM
(SELECT student_id,
SUM(flag) AS newflag
FROM (SELECT DISTINCT student_id, 1 AS flag
FROM a_group
UNION ALL
SELECT DISTINCT student_id, 4 AS flag
FROM b_group)
GROUP BY student_id )
This would give us required result of the form
student_id A_group_flag B_group_flag
101 1 1
102 1 0
103 1 1
108 1 0
109 0 1
111 0 1
Consider scenario where students data is present in two tables
1) Students in A group have entry in table A_group
student_id
101
102
103
108
2)Students in B group have entry in table B_group
student_id
101
103
109
111
Condition : Student can belong to either or both groups.
Requirement
Result is required in the form
student_id A_group_flag B_group_flag
101 1 1
102 1 0
103 1 1
108 1 0
109 0 1
111 0 1
which shows student_id and Boolean values indicating if the student is present in that group or not.
*flag value 1 indicates student is present in that group
*flag value 0 indicates student is not present in that group
Solution
Here is one of possible solution that I thought of on first shot
Step 1 :
First we combine results from both tables without filtering any
SELECT DISTINCT student_id, 1 AS flag
FROM a_group
UNION ALL
SELECT DISTINCT student_id, 4 AS flag
FROM b_group
This query will give all the student ids from both tables with a flag indicating student's group
*flag value 1 indicates A group
flag value 4 indicates B group
Step 2 :
Now we have data of the form
student_id flag
101 1
102 1
103 1
108 1
101 4
103 4
109 4
111 4
Now comes the logic part
If we group data by student_id and perform aggregate sum on flag like
SELECT student_id,
SUM(flag) AS newflag
FROM (SELECT DISTINCT student_id, 1 AS flag
FROM a_group
UNION ALL
SELECT DISTINCT student_id, 4 AS flag
FROM b_group)
GROUP BY student_id
We will get following values of new flag
1 for students present in A group
4 for students present in B group
5 for student present in both groups
Result of above query would be of form
student_id newflag
101 5
102 1
103 5
108 1
109 4
111 4
Step 3 :
Next we need to use 'IF conditionals' to show required data with query like
SELECT student_id,
( CASE
WHEN newflag = 1
OR newflag = 5 THEN 1
ELSE 0
END ) a_group_flag,
( CASE
WHEN newflag = 4
OR newflag = 5 THEN 1
ELSE 0
END ) b_group_flag
FROM
(SELECT student_id,
SUM(flag) AS newflag
FROM (SELECT DISTINCT student_id, 1 AS flag
FROM a_group
UNION ALL
SELECT DISTINCT student_id, 4 AS flag
FROM b_group)
GROUP BY student_id )
This would give us required result of the form
student_id A_group_flag B_group_flag
101 1 1
102 1 0
103 1 1
108 1 0
109 0 1
111 0 1
Session variable and Context in Oracle, Using dbms_session.set_context
Oracle 'Context' can be used to capture session variables
Let's test it
For creating and using context we would need below privileges
GRANT CREATE SESSION, CREATE ANY context TO USER;
create a context 'c_userAttr' associated with package 'p_userAttr'
Syntax
create context context-namespace using package-name;
CREATE context c_userattr USING p_userattr;
Step 3:
Now we will create package definition
CREATE OR replace PACKAGE p_userattr
IS
PROCEDURE set_contexts;
END;
Write package body that sets the attribute for context
We can set as many attributes we want using DBMS_SESSION.SET_CONTEXT statement
In this case 'username' is the attribute we will set here
In application we can set context with procedure call like this
BEGIN
p_userattr.Set_contexts('crazy');
END;
and in the same session we can query value in that context namespace like
SELECT Sys_context('C_USERATTR', 'USERNAME')
FROM dual;
- A context is associated with a package and in that package's procedure we can use this context
- It holds attribute value pair.
- Attribute value is set using DBMS_SESSION.SET_CONTEXT
- Attribute value can be set inside that package only to which it is associated.
- Attribute value can be queried with 'sys_context' function.
Let's test it
Step 1:
For creating and using context we would need below privileges
GRANT CREATE SESSION, CREATE ANY context TO USER;
Step 2:
create a context 'c_userAttr' associated with package 'p_userAttr'
Syntax
create context context-namespace using package-name;
CREATE context c_userattr USING p_userattr;
Step 3:
Now we will create package definition
CREATE OR replace PACKAGE p_userattr
IS
PROCEDURE set_contexts;
END;
Write package body that sets the attribute for context
We can set as many attributes we want using DBMS_SESSION.SET_CONTEXT statement
In this case 'username' is the attribute we will set here
CREATE OR replace PACKAGE BODY p_userattr
IS
c_context CONSTANT VARCHAR2(30) := 'c_userAttr';
PROCEDURE Set_contexts(v_uname IN VARCHAR2)
IS
BEGIN
dbms_session.Set_context (c_context, 'username', v_uname);
END;
END;
IS
c_context CONSTANT VARCHAR2(30) := 'c_userAttr';
PROCEDURE Set_contexts(v_uname IN VARCHAR2)
IS
BEGIN
dbms_session.Set_context (c_context, 'username', v_uname);
END;
END;
Last Step
In application we can set context with procedure call like this
BEGIN
p_userattr.Set_contexts('crazy');
END;
and in the same session we can query value in that context namespace like
SELECT Sys_context('C_USERATTR', 'USERNAME')
FROM dual;
Variable Table name in Oracle SQL using dbms_xmlgen
Requirement
To find count of all tables
The interesting stuff in oracle is not just writing queries that preforms the required task but to write some single liners that could do the same work as lengthy procedures
so another requirement is to make it interesting
Solution
Solution 1
A procedural way to achieve this through looping for cursor is
DECLARE
CURSOR c1 IS
SELECT table_name
FROM user_tables;
tname VARCHAR2(200);
count_stmt VARCHAR2(5000);
count1 NUMBER;
count2 NUMBER;
BEGIN
OPEN c1;
FETCH c1 INTO tname;
WHILE c1%found LOOP
count_stmt := 'select count(1) from '
|| tname;
EXECUTE IMMEDIATE count_stmt INTO count1;
count_stmt := tname;
dbms_output.Put(count_stmt);
count2 := 30 - Length(tname);
WHILE count2 > 0 LOOP
dbms_output.Put(' ');
count2 := count2 - 1;
END LOOP;
dbms_output.Put_line(count1);
FETCH c1 INTO tname;
END LOOP;
END;
set serveroutput on to see the result
Solution 2
I was struggling to find some way to run dynamic sql query inside the select phrase something like
select table_name, (select count(*) from var_table_name) from user_tables
where var_table_name is variable that should hold value of table_name for each row processed from user_tables
and I found this
SELECT table_name,
To_number (Extractvalue (Xmltype (
dbms_xmlgen.Getxml('select count(*) c from '
||table_name)),
'/ROWSET/ROW/C')) AS COUNT
FROM user_tables;
dbms_xmlgen.getxml : Converts the results from the SQL query string to XML format, and returns the XML as a temporary CLOB
extractvalue : gets the child value from the node.XML result from oracle is in the form
Labels:
dbms_xmlgen,
dynamic,
extractvalue,
getxml,
Oracle SQL,
xmltype
Oracle "create table as .. select " to add new column
I came across this requirement when one of my friend wanted to create new table from existing table but with new columns (few extra column for future use) added to it.
He doesn't wanted to write 'Alter' script
Requirement:
Add new column 'comments varchar(30)' with default value as null to new_table
while creating copy of existing table old_table
Solution:
create table new_table as select old_column1 as new_column1 ,old_column2 as new_column2, cast('' as varchar2(30)) as comments from old_table;
Reason:
? Why we need to perform cast of null here
If I don't perform cast here and add new column as
, '' as new_column1
Oracle gives error like 'column cannot be of zero length...' because it creates datatype of that column based on the value provided
Like if I give new column as
'null' as comments it considers datatype as char(4) for comments column and successfully creates new column
So, when I try to give
'' as comments it assumes we are trying to give datatype as char(0), which is not acceptable to Oracle
So we need to cast value here as varchar2 here.
He doesn't wanted to write 'Alter' script
Requirement:
Add new column 'comments varchar(30)' with default value as null to new_table
while creating copy of existing table old_table
Solution:
create table new_table as select old_column1 as new_column1 ,old_column2 as new_column2, cast('' as varchar2(30)) as comments from old_table;
Reason:
? Why we need to perform cast of null here
If I don't perform cast here and add new column as
, '' as new_column1
Oracle gives error like 'column cannot be of zero length...' because it creates datatype of that column based on the value provided
Like if I give new column as
'null' as comments it considers datatype as char(4) for comments column and successfully creates new column
So, when I try to give
'' as comments it assumes we are trying to give datatype as char(0), which is not acceptable to Oracle
So we need to cast value here as varchar2 here.
Oracle Pivot Query using max decode, Row to Column Conversion
Requirement 1:
Consider scenario where we have table with following data. Here an id can belong to number of groups(grp)
[sourcecode language="sql"]
create table test1(id number,grp varchar2(1));
[/sourcecode]
id grp
----------------
1 A
1 B
2 B
2 C
3 C
Required Output:
Distinct groups are displayed as column and a symbol x indicates that corresponding id is mapped to that group.
id A B C
-------------------
1 x x
2 x x
3 x
There are fixed number of groups {A B C}, i.e cardinality of column grp is known and also all distinct values.
Solution
We will use the max(decode(..)) technique. This technique is used to convert rows to column and column to rows.
First we perform decode operation to convert the rows value in to column.
We need to know number of distinct values and also their name in this case. For each distinct group, we write a decode as follows
[sourcecode language="sql"]
<pre>SELECT id,
Decode(grp, 'A', 'x',
NULL) a,
Decode(grp, 'B', 'x',
NULL) b,
Decode(grp, 'C', 'x',
NULL) c
FROM test1
ORDER BY id
[/sourcecode]
Result:
id A B C
----------------
1 x
1 x
2 x
2 x
3 x
Output will have same number of rows with x indicating mapping with corresponding id value.
Now we need to group rows by id, since we have distinct values for {id,group} only one x will be appearing for particular combination of id and grp in above result.
[sourcecode language="sql"]
SELECT id,
MAX(Decode(grp, 'A', 'x',
NULL)) a,
MAX(Decode(grp, 'B', 'x',
NULL)) b,
MAX(Decode(grp, 'C', 'x',
NULL)) c
FROM test1
GROUP BY id
ORDER BY id
[/sourcecode]
Result:
id A B C
---------------
1 x x
2 x x
3 x
Requirement 2 : Consider same table and data but now required result is
Required Output:
id grp1 grp2 grp3
--------------------------
1 A B
2 B C
3 C
that is we know distinct number of groups(we don't care about values or don't know values). So, all the groups to which id is mapped we need to show them in columns
Solution:
We will use the same max(decode(..)) technique but with slight modification.First for each id we will give a unique value to each grp value using analytic function row_number()
[sourcecode language="sql"]
SELECT id,
grp,
Row_number() over (PARTITION BY id ORDER BY grp) rn
FROM test1
[/sourcecode]
Result
ID GRP RN
----------------------
1 A 1
1 B 2
2 B 1
2 C 2
3 C 1
Now we use decode on rn field
[sourcecode language="sql"]SELECT id,
Decode(rn, 1, grp,
NULL) grp1,
Decode(rn, 2, grp,
NULL) grp2,
Decode(rn, 3, grp,
NULL) grp3
FROM (SELECT id,
grp,
Row_number() over (PARTITION BY id ORDER BY grp) rn
FROM test1)
ORDER BY id
[/sourcecode]
ID GRP1 GRP2 GRP3
--------------------
1 A - -
1 - B -
2 B - -
2 - C -
3 C - -
[sourcecode language="sql"]
SELECT id,
MAX(Decode(rn, 1, grp,
NULL)) grp1,
MAX(Decode(rn, 2, grp,
NULL)) grp2,
MAX(Decode(rn, 3, grp,
NULL)) grp3
FROM (SELECT id,
grp,
Row_number() over (PARTITION BY id ORDER BY grp) rn
FROM test1)
GROUP BY id
ORDER BY id
[/sourcecode]
Output:
ID GRP1 GRP2 GRP3
1 A B -
2 B C -
3 C - -
Consider scenario where we have table with following data. Here an id can belong to number of groups(grp)
[sourcecode language="sql"]
create table test1(id number,grp varchar2(1));
[/sourcecode]
id grp
----------------
1 A
1 B
2 B
2 C
3 C
Required Output:
Distinct groups are displayed as column and a symbol x indicates that corresponding id is mapped to that group.
id A B C
-------------------
1 x x
2 x x
3 x
There are fixed number of groups {A B C}, i.e cardinality of column grp is known and also all distinct values.
Solution
We will use the max(decode(..)) technique. This technique is used to convert rows to column and column to rows.
First we perform decode operation to convert the rows value in to column.
We need to know number of distinct values and also their name in this case. For each distinct group, we write a decode as follows
[sourcecode language="sql"]
<pre>SELECT id,
Decode(grp, 'A', 'x',
NULL) a,
Decode(grp, 'B', 'x',
NULL) b,
Decode(grp, 'C', 'x',
NULL) c
FROM test1
ORDER BY id
[/sourcecode]
Result:
id A B C
----------------
1 x
1 x
2 x
2 x
3 x
Output will have same number of rows with x indicating mapping with corresponding id value.
Now we need to group rows by id, since we have distinct values for {id,group} only one x will be appearing for particular combination of id and grp in above result.
[sourcecode language="sql"]
SELECT id,
MAX(Decode(grp, 'A', 'x',
NULL)) a,
MAX(Decode(grp, 'B', 'x',
NULL)) b,
MAX(Decode(grp, 'C', 'x',
NULL)) c
FROM test1
GROUP BY id
ORDER BY id
[/sourcecode]
Result:
id A B C
---------------
1 x x
2 x x
3 x
Requirement 2 : Consider same table and data but now required result is
Required Output:
id grp1 grp2 grp3
--------------------------
1 A B
2 B C
3 C
that is we know distinct number of groups(we don't care about values or don't know values). So, all the groups to which id is mapped we need to show them in columns
Solution:
We will use the same max(decode(..)) technique but with slight modification.First for each id we will give a unique value to each grp value using analytic function row_number()
[sourcecode language="sql"]
SELECT id,
grp,
Row_number() over (PARTITION BY id ORDER BY grp) rn
FROM test1
[/sourcecode]
Result
ID GRP RN
----------------------
1 A 1
1 B 2
2 B 1
2 C 2
3 C 1
Now we use decode on rn field
[sourcecode language="sql"]SELECT id,
Decode(rn, 1, grp,
NULL) grp1,
Decode(rn, 2, grp,
NULL) grp2,
Decode(rn, 3, grp,
NULL) grp3
FROM (SELECT id,
grp,
Row_number() over (PARTITION BY id ORDER BY grp) rn
FROM test1)
ORDER BY id
[/sourcecode]
ID GRP1 GRP2 GRP3
--------------------
1 A - -
1 - B -
2 B - -
2 - C -
3 C - -
[sourcecode language="sql"]
SELECT id,
MAX(Decode(rn, 1, grp,
NULL)) grp1,
MAX(Decode(rn, 2, grp,
NULL)) grp2,
MAX(Decode(rn, 3, grp,
NULL)) grp3
FROM (SELECT id,
grp,
Row_number() over (PARTITION BY id ORDER BY grp) rn
FROM test1)
GROUP BY id
ORDER BY id
[/sourcecode]
Output:
ID GRP1 GRP2 GRP3
1 A B -
2 B C -
3 C - -
Subscribe to:
Posts (Atom)