Lazzy Scientist: 2011

Friday, December 30, 2011

Oracles demo mail Package for sending mail

Oracles Demo mail package was available on OTN site previously, but this time when I searched I got 404 error. So, I thought of keeping a copy for future references.

Its a wrapper program to send mail using pl/sql. It hides the low level SMTP protocol details.

gist link of package

Monday, November 7, 2011

Unbiased Performance Evaluation

Sunday, November 6, 2011

Padovan String Sequence Problem

Problem Find number of occurrences of input string in nth Padovan String P(n)
where P(0) = 'X', P(1)='Y' and P(2)='Z'
Input Parameters: needle string str to search for and number n
Conditions to handle:
if number n is >= 40 return -1
if string str has any other characters then X,Y and Z return -1

Solution

Solved this problem for preliminary round of one programming contest.

PadovanString.java

[code language="java"]
public class PadovanString {

public int stringOccurrences(int n, String str){
if (n >= 40)
return -1;
if (str.replaceAll("X|Y|Z", "").length() > 0)
return -1;
String res= pad(n);

/* print nth Padovan String
System.out.println(res);*/

/*replace search string with null and caculate number of occurences*/
return (res.length() - res.replaceAll(str, "").length())/(str.length());
}

public String pad(int n){

if (n == 0) return "X";
if (n == 1) return "Y";
if (n == 2) return "Z";
else return pad(n-2) + pad(n-3);

}
}
[/code]

Test.java

[code language="java"]
public class Test {
public static void main(String []args)
{
PadovanString p = new PadovanString();
System.out.println(p.stringOccurrences(20,"YZ"));
}
}
[/code]

Also, Length of Nth padovan string = Nth Padovan number

Saturday, November 5, 2011

Urban dictionary and Youtube search for cool slang's and their usage using Python

Urban dictionary is a user maintained dictionary of slang's. A thesaurus search for a word gives all the related slang's, related words could be antonym also.
Youtube, not long ago, started video comment search which is still in beta. Here, comments on all videos are searched for entered search term.

Requirement : To get related slang's from urban dictionary for a particular word, and then query youtube usage of those slang's.

[sourcecode language="python" wraplines="false"]
from BeautifulSoup import BeautifulSoup,NavigableString
import nltk
import os
import re
import urllib2
import webbrowser
import time

def get_soup(url):
#get soup object for the url
try:
page = urllib2.urlopen(url)
except urllib2.URLError, e:
print 'Failed to fetch ' + url
raise e

try:
soup = BeautifulSoup(page)
except HTMLParser.HTMLParseError, e:
print 'Failed to parse ' + url
raise e
return soup

def get_related(word):
word_list=[]
print 'Fetching related words for '+word+'........'
soup=get_soup('http://www.urbandictionary.com/thesaurus.php?term='+word)
for td in soup.findAll('td', {'class':'word'}): #each row
rel_word=td.find('a').contents[0].encode()
print rel_word
word_list.append(rel_word)
return word_list

def get_comment(word,npages):

print 'Fetching comments for '+word+'........'
comments =[]
for i in range(1,npages+1):
soup= get_soup('http://www.youtube.com/comment_search?q='+str(word)+'&ld=1&comment_only=1&hl=en&so=pagerank&page='+str(i))
for span in soup.findAll('span', {'class':'comment-result-comment'}): #each row
comment =''
for text in span.findAll(text=True):
comment = comment+ ' '+ text
comment=re.sub('[ ]+|\n|\r',' ',comment.strip())
comment=re.sub('^[0-9]+[ ]+','',comment)
comments.append(comment.encode('utf-8'))
time.sleep(6)
return comments

def main():
word='spooky'
npages=2
word_list=get_related(word)
#word_list = ['spooky']
file1=open('word_list','w')
for w in word_list:
for comment in get_comment(w,npages):
file1.write(comment+'\n')
file1.close()

if __name__ == '__main__':
main()
[/sourcecode]

Thursday, October 20, 2011

Sentiment analysis using Naive Bayes Algorithm

Experimented with simple Naive Bayes for sentiment classification.

Naive Bayes code is available here chatper6/docclass.py and training data is available here

Changed the getwords() function in docclass.py

- to remove special characters like single-quote, comma, full stop from text
- to split based on white spaces instead of non word character because it ignored emots with non word character split and
- included nltk stopwords corpus check.

[sourcecode language="python"]

def getwords(doc):
doc=re.sub('\.+|,+|!+|\'','',doc)
splitter=re.compile('\\s+')
#print doc
# Split the words by non-alpha characters
words=[s.lower().strip() for s in splitter.split(doc)
if s.lower().strip() not in nltk.corpus.stopwords.words('english') ]
print words
# Return the unique set of words only
return dict([(w,1) for w in words])
[/sourcecode]

For training data, converted ';;' separated data file to '\t' separated file because csv.reader() function
was not accepting two symbol delimiters.

Changed sampletrain function to train classifier on training data file "testdata.manual.2009.05.25".

[sourcecode language="python"]
def sampletrain(cl):
read = csv.reader(open('pos 1', 'rb'), delimiter='\t')
cnt = 1
for row in read:
if row[0] == 0:
sent = 'bad'
else:
sent = 'pos'
data = row[5]
cl.train(data,sent)
cnt = cnt+1
print cnt
[/sourcecode]

Wednesday, October 5, 2011

Socially adept programmers

The programmer stereotype as described in personality traits of great programmer

The stereotypical programmer is a shy young man, either scrawny or overweight, who works by himself in an 8’x8’ cubicle in a bigger room of dozens cubicles, each holding someone just like him. He intensely concentrates on writing cryptic instructions to coax a computer to do what is needed. He devotes his evenings, weekends, and summers to work. He has no social life and any hobbies he may have resemble his work. In some companies he is regarded as an indispensable genius; in others he is tolerated as an eccentric artist. (McConnell, 1999)

Programmers who value social image present themselves so to conform to a perception of society's preferred type of personality. Some ways in which they manipulate perception of society are

When asked to stay little longer they decline by saying that they have some personal commitments, they need to spend time with family, when, in fact they will be working on some open source project or breaking into high profile gov network.

They don't use Facebook or twitter often, so they develop a program that autonomously posts status and comment on others feed periodically(using some NLP and ML techniques) to show that they spend a lot of time on social networks and are social.

They are aware that in social conversation what matters is not the correctness of the argument, but how much laughter it provokes and how much interesting it is.

They don't use IT jargon in social conversations, even if they know everything about them and in fact, make fun of people who use them(making fun of others is most frequently used technique in social conversations).

They identify themselves with hippie programmers instead of nerd programmers.

They have a girlfriend or at least this is what they tell to others.

They never tell programming as their hobby even if programming is at the top on the list.

Thursday, September 22, 2011

Funny Incident @Work

Today, my manager asked me to send him my profile in one-page ppt. After I sent him the document he replied back saying 'You haven't included number of years of experience in it. Update it & send again' .
But I included that in document. So, to clarify, I call him.

Me: Hi <manager> , its regarding the profile. Years of experience, I have mentioned it in the document.
Manger: Where is it? I don't see it.
Me: Its in first sentence of document itself.
Manager: You mentioned it in letters. ah.. you should have mentioned it in digits.
Me: Okay.Anything else.
Manager: No, Nothing else. Just update it, change it to digit and resend me the document.

He wanted me to replace the word with a single digit and resend him the document. Amazing!!
Such incidents really provoke laughter and provide amusement at work.

Monday, September 5, 2011

Extracting movie title from torrent file name using Regular Expression

Movie files downloaded from torrent sites has file name which contains format types (like dvdrip, dvdscr, xvid etc), year, comments , user names and of course movie name. We want to extract movie name from this file names.

	Input File names	Output Achieved
	countdown.to.zero.2010.xvid-submerge.avi	countdown to zero
	DrJn.2010.BRRip_mediafiremoviez.com.mkv	drjn
	Nim's.Island[2008]DvDrip-aXXo.avi	nim's island
	Invictus.DVDSCR.xViD-xSCR.CD1.avi	invictus
	Invictus.DVDSCR.xViD-xSCR.CD2.avi	invictus
	20000 Leagues Under The Sea.avi
	Across The Universe.MoZinRaT CD1.avi	across the universe mozinrat
	Adoration 2008 DvdRip ExtraScene RG.avi	adoration
	Amelie(English Dubbed).avi	amelie
	America.2009.STV.DVDRip.XviD-ViSiON.avi	america
	VTS_02_1.avi	vts_02_1
	VTS_02_2.avi	vts_02_2
	Antibodies.2005.GERMAN.DVDRip.XviD.AC3.CD1-AFO.avi	antibodies
	arranged.xvid-reserved.avi	arranged
	badder.santa.dvdrip.xvid-deity.avi	badder santa
	Balls of Fury[2007]DvDrip[Eng]-FXG.avi	balls of fury
	Bruno (2009) DVDRip-MAXSPEED www.torentz.3xforum.ro.avi	bruno
	Defiance DvDSCR[2009] ( 10rating ).avi	defiance
	Down With Love (cute romantic comedy).avi	down with love
	Einstein.And.Eddington.2008.DVDRip.XviD.avi	einstein and eddington
	ENEMY_OF_THE_STATE..DVDrip(vice).avi	enemy_of_the_state

Based on the observation that

Most of the file name contains the format like 'dvdrip', 'xvid', 'brrip','dvdscr' or other words like 'CD1','(<year>)','[<year>]' specified in the name and everything after any of this words doesnot contains any useful data.

Sometimes extra information is added to file name inside bracket like
Defiance DvDSCR[2009] ( 10rating ).avi
Down With Love (cute romantic comedy).avi
so we can also ignore the part including and after brackets as movie names doesn't have brackets in them and have no useful information after it.

'(.*?)(dvdrip|xvid| cd[0-9]|dvdscr|brrip|divx|[\{$\[]?[0-9]{4}).*'

This regular expression will find file names where we have dvdrip, brrip, xvid(we can specify any number of values here) or year and finds first of any one of the appearing patterns because we have used lazy parsing here using .*?. We then extract the first back referenced part \1.

Secondly, to remove the part within brackets we use

'(.*?)\(.*$(.*)' regular expression.

Following code snippet gets the movie names(to an extent) from file name.

import re
fr = open('filenameslist.txt', 'r')
fw = open('movienames.txt', 'w')
for line in fr:
    text = line.strip()
text1 = re.search('([^\\\]+)\.(avi|mkv|mpeg|mpg|mov|mp4)$', text)
if text1:
    text = text1.group(1)
text = text.replace('.', ' ').lower()
text2 = re.search('(.*?)(dvdrip|xvid| cd[0-9]|dvdscr|brrip|divx|[\{$\[]?[0-9]{4}).*', text)
if text2:
    text = text2.group(1)
text3 = re.search('(.*?)\(.*$(.*)', text)
if text3:
    text = text3.group(1)
# print text
fw.write(text + '\n')
fr.close()
fw.close()

Output can be improved further observing things like we can replace characters like underscore with space, we can check for only four digits where next character is non word character ..

Sunday, September 4, 2011

Extracting IMDB data using Python

Sample code for extracting IMDB data using python BeautifulSoup package.

Requirement : To extract all the feature movie names and their ratings from IMDB database for a particular year.

Parameters used in query were identified using IMDB advanced search function. Start, count and year parameters were used in this case for querying.The url is queried for 100 records at a time since more than that is not allowed. After extracting movie names and rating for 100 records, the url is queried for next 100 records and so on.

Two files 'imdb_conf' and 'ratings' are created by the code. 'imdb_conf' file keeps track of the record number last read and 'ratings' file stores the movie name, rating and year.

Web scrapping the IMDB website for required data.

[sourcecode language="python"]
from BeautifulSoup import BeautifulSoup
import os
import re
import urllib2

def get_start_pos_yr(fimdb_config):
#for starting after last fetched record
#last line contains the last record fetched
nlines = fimdb_config.readlines()
startfrom = -1
year = None
if len(nlines) > 1:
list_num =re.search('[^\t]+',nlines[-1])
if list_num:
startfrom = int(list_num.group())+1
year =re.search('\t[0-9]+',nlines[-1]).group().strip()
return startfrom,year

def get_soup(url):
#get soup object for the url
try:
page = urllib2.urlopen(url)
except urllib2.URLError, e:
print 'Failed to fetch ' + url
raise e

try:
soup = BeautifulSoup(page)
except HTMLParser.HTMLParseError, e:
print 'Failed to parse ' + url
raise e
return soup

def get_ntotal(soup):
#fetch total number of records present for particular query
total_count=1
for div in soup.findAll('div', {'id':'left'}):
#print ivd.contents[0]
total_count = re.search('[ ]+[0-9,]+',div.contents[0])
if total_count:
total_count=total_count.group().replace(',','').strip()
#print "total"+total_count
return total_count

def set_rating(soup,fwimdb_config,frating,year,startfrom):
cond = True
count_rec=0
total_res=get_ntotal(soup)
year=str(year)
#total_res=100
while cond:
for tr in soup.findAll('tr', {'class':re.compile('(odd|even)[ a-zA-Z]*')}): #each row
for td in tr.findAll('td', {'class':'title'} ):
for link in td.findAll('a',{'href':re.compile('/title/tt[^/]+/$')}):
movie_name=link.contents[0] #title name
for rating in td.findAll('div',{'class':'rating rating-list'}):
count_rec=count_rec+1
if rating.has_key('title'):
#print "hurray"
rt = re.search('[0-9]+[^(]+',rating['title']) #rating
if rt:
frating.write(movie_name+"\t"+rt.group().strip()+"\t"+year+"\n")
else:
frating.write(movie_name+"\t--\t"+year+"\n")
else:
frating.write(movie_name+"\t--\t"+year+"\n")
#print movie_name+"\t"+rt.group()
fwimdb_config.write(str(count_rec)+"\t"+year+"\n")
if startfrom == 0:
startfrom = 101 #second run
else:
startfrom = startfrom + 100

if startfrom >= int(total_res):
cond=False
fwimdb_config.write("-1"+"\t"+str(int(year)-1)+"\n")
print str(startfrom)+" "+str(total_res)
soup=get_soup("http://www.imdb.com/search/title?languages=en&title_type=feature&count=100&sort=num_votes,desc&start="+str(startfrom)+"&year="+year)

def main():

fwimdb_conf=open("imdb_conf","r+")
frating = open("ratings","a") #ratings
fwimdb_conf.write("LastreadLine\tYear\n")
startfrom,year = get_start_pos_yr(fwimdb_conf)

if startfrom == -1:
startfrom = 0
if year == None:
year="2010"
print startfrom

soup=get_soup("http://www.imdb.com/search/title?languages=en&title_type=feature&sort=num_votes,desc&count=100&start="+str(startfrom)+"&year="+year)
set_rating(soup,fwimdb_conf,frating,year,startfrom)

frating.close()
fwimdb_conf.close()

if __name__ == '__main__':
main()
[/sourcecode]

For demonstration purposes only. If you plan to use IMDB data beyond personal usage, you should contact IMDB Licensing department.

Tuesday, August 30, 2011

String Aggregation in Oracle SQL using XMLAGG

Requirement

We have table test5 with following data
c1
-----
ADD
DELETE
MODIFY

Required output
col
------
ADD,DELETE.MODIFY

i.e we want column value to be aggregated with comma.

Solution
In real scenario we will have another column on which we need to group and then aggregate corresponding values.

In oracle 11g there is one function wm_concat which concatenates column values with comma. We can write a procedure to perform this thing but here we will try to do it with sql query alone.

This solution uses XMLELEMENT, XML_AGGR functions

[sourcecode language="sql"]
SELECT replace (replace (replace (XMLELEMENT ("col" , XMLAGG(XMLELEMENT("col", c1))), '</col><col>' , ',' ),'<col><col>' , '' ),'</col></col>', '')
as "col" FROM test5
[/sourcecode]

Output
col
------
ADD,DELETE,MODIFY

Dynamic column name in Oracle SQL Query using dbms_xmlgen

Consider scenario where we need to use dynamic column name in query.
You might have tried using select query in place of column list something like

select (select distinct column_name from table1 where column_name2='some_value') from table2

Based on some condition we will select column name from one table(test3 table here) and query another table(test 4) for the fetched column.

Consider table test4 with following data

[sourcecode language="sql"]
create table test4(c1 varchar2(20), c2 varchar2(20), c3 varchar2(20))
[/sourcecode]

c1 c2 c3 c4
-----------------------------------------------
col1_val col2_val col3_val col4_val

And another table test3 with column names of test4 as rows here.

[sourcecode language="sql"]
create table test3(col_name varchar2(2), flag number)
[/sourcecode]

col_name flag
-------------------
c1              0
c2              1
c3            0

based on flag value we want to select the column name. We want to achieve something like

select (select col_name from test3 where flag = 1) from test4 [this doesn't works]

We will use oracle xml functions getxml and extractvalue to achieve result. dbms_xmlgen.getxml takes SQL query as its argument and returns result in xml format.This function allows us to dynamically create sql query, we can use dynamic column or table name in it.

[sourcecode language="sql"]
select to_char (extractvalue (xmltype (dbms_xmlgen.getxml('select '||
to_char(extractvalue(xmltype(dbms_xmlgen.getxml('select col_name from test3 where flag =1')),'/ROWSET/ROW/COL_NAME')) || ' col from test4'
)),'/ROWSET/ROW/COL')) col_value from dual[/sourcecode]

SQL Problem: Item falls in which group boolean result

Scenario
Consider scenario where students data is present in two tables

1) Students in A group have entry in table A_group
student_id
101
102
103
108

2)Students in B group have entry in table B_group

student_id
101
103
109
111
Condition : Student can belong to either or both groups.

Requirement

Result is required in the form

student_id A_group_flag B_group_flag

101        1     1
102    1       0
103          1     1
108                      1           0
109          0        1
111          0        1

which shows student_id and Boolean values indicating if the student is present in that group or not.

*flag value 1 indicates student is present in that group

*flag value 0 indicates student is not present in that group

Solution

Here is one of possible solution that I thought of on first shot

Step 1 :

First we combine results from both tables without filtering any

SELECT DISTINCT student_id, 1 AS flag
FROM   a_group
UNION ALL
SELECT DISTINCT student_id, 4 AS flag
FROM   b_group

This query will give all the student ids from both tables with a flag indicating student's group

*flag value 1 indicates A group
flag value 4 indicates B group

Step 2 :
Now we have data of the form

student_id   flag

101 1

102 1

103 1

108 1

101 4
103       4
109 4
111    4

Now comes the logic part
If we group data by student_id and perform aggregate sum on flag like

SELECT student_id,
SUM(flag) AS newflag
FROM   (SELECT DISTINCT student_id, 1 AS flag
FROM   a_group
UNION ALL
SELECT DISTINCT student_id, 4 AS flag
FROM   b_group)
GROUP BY student_id

We will get following values of new flag

1 for students present in A group
4 for students present in B group
5 for student present in both groups

Result of above query would be of form

student_id   newflag
101          5
102          1
103    5
108    1
109          4
111    4

Step 3 :
Next we need to use 'IF conditionals' to show required data with query like

SELECT student_id,
( CASE
WHEN newflag = 1
OR newflag = 5 THEN 1
ELSE 0
END ) a_group_flag,
( CASE
WHEN newflag = 4
OR newflag = 5 THEN 1
ELSE 0
END ) b_group_flag
FROM
(SELECT student_id,
SUM(flag) AS newflag
FROM   (SELECT DISTINCT student_id, 1 AS flag
FROM   a_group
UNION ALL
SELECT DISTINCT student_id, 4 AS flag
FROM   b_group)
GROUP BY student_id )

This would give us required result of the form

student_id A_group_flag B_group_flag
101             1        1
102           1       0
103                 1               1
108       1               0
109       0               1
111       0                  1

Session variable and Context in Oracle, Using dbms_session.set_context

Oracle 'Context' can be used to capture session variables

A context is associated with a package and in that package's procedure we can use this context

It holds attribute value pair.

Attribute value is set using DBMS_SESSION.SET_CONTEXT

Attribute value can be set inside that package only to which it is associated.

Attribute value can be queried with 'sys_context' function.

Let's test it

Step 1:

For creating and using context we would need below privileges

GRANT CREATE SESSION, CREATE ANY context TO USER;

Step 2:

create a context 'c_userAttr' associated with package 'p_userAttr'

Syntax
create context context-namespace using package-name;

CREATE context c_userattr USING p_userattr;

Step 3:
Now we will create package definition

CREATE OR replace PACKAGE p_userattr
IS
PROCEDURE set_contexts;
END;

Write package body that sets the attribute for context

We can set as many attributes we want using DBMS_SESSION.SET_CONTEXT statement
In this case 'username' is the attribute we will set here

CREATE OR replace PACKAGE BODY p_userattr
IS
c_context CONSTANT VARCHAR2(30) := 'c_userAttr';
PROCEDURE Set_contexts(v_uname IN VARCHAR2)
IS
BEGIN
dbms_session.Set_context (c_context, 'username', v_uname);
END;
END;

Last Step

In application we can set context with procedure call like this

BEGIN
p_userattr.Set_contexts('crazy');
END;
and in the same session we can query value in that context namespace like

SELECT Sys_context('C_USERATTR', 'USERNAME')
FROM dual;

Variable Table name in Oracle SQL using dbms_xmlgen

Requirement

To find count of all tables

The interesting stuff in oracle is not just writing queries that preforms the required task but to write some single liners that could do the same work as lengthy procedures

so another requirement is to make it interesting

Solution

Solution 1

A procedural way to achieve this through looping for cursor is
DECLARE
CURSOR c1 IS
SELECT table_name
FROM   user_tables;
tname      VARCHAR2(200);
count_stmt VARCHAR2(5000);
count1     NUMBER;
count2     NUMBER;
BEGIN
OPEN c1;

FETCH c1 INTO tname;

WHILE c1%found LOOP
count_stmt := 'select count(1) from '
|| tname;

EXECUTE IMMEDIATE count_stmt INTO count1;

count_stmt := tname;

dbms_output.Put(count_stmt);

count2 := 30 - Length(tname);

WHILE count2 > 0 LOOP
dbms_output.Put(' ');

count2 := count2 - 1;
END LOOP;

dbms_output.Put_line(count1);

FETCH c1 INTO tname;
END LOOP;
END;

set serveroutput on to see the result

Solution 2

I was struggling to find some way to run dynamic sql query inside the select phrase something like

select table_name, (select count(*) from var_table_name) from user_tables

where var_table_name is variable that should hold value of table_name for each row processed from user_tables

and I found this

SELECT table_name,
To_number (Extractvalue (Xmltype (
dbms_xmlgen.Getxml('select count(*) c from '
||table_name)),
'/ROWSET/ROW/C')) AS COUNT
FROM user_tables;

dbms_xmlgen.getxml : Converts the results from the SQL query string to XML format, and returns the XML as a temporary CLOB
extractvalue : gets the child value from the node.XML result from oracle is in the form

Oracle "create table as .. select " to add new column

I came across this requirement when one of my friend wanted to create new table from existing table but with new columns (few extra column for future use) added to it.
He doesn't wanted to write 'Alter' script

Requirement:
Add new column 'comments varchar(30)' with default value as null to new_table
while creating copy of existing table old_table

Solution:
create table new_table as select old_column1 as new_column1 ,old_column2 as new_column2, cast('' as varchar2(30)) as comments from old_table;

Reason:

? Why we need to perform cast of null here
If I don't perform cast here and add new column as
, '' as new_column1
Oracle gives error like 'column cannot be of zero length...' because it creates datatype of that column based on the value provided

Like if I give new column as
'null' as comments it considers datatype as char(4) for comments column and successfully creates new column

So, when I try to give
'' as comments it assumes we are trying to give datatype as char(0), which is not acceptable to Oracle

So we need to cast value here as varchar2 here.

Oracle Pivot Query using max decode, Row to Column Conversion

Requirement 1:

Consider scenario where we have table with following data. Here an id can belong to number of groups(grp)

[sourcecode language="sql"]
create table test1(id number,grp varchar2(1));
[/sourcecode]

id grp
----------------
1      A
1      B
2    B
2      C
3      C

Required Output:
Distinct groups are displayed as column and a symbol x indicates that corresponding id is mapped to that group.

id A B C
-------------------
1 x   x
2    x x
3 x

There are fixed number of groups {A B C}, i.e cardinality of column grp is known and also all distinct values.

Solution

We will use the max(decode(..)) technique. This technique is used to convert rows to column and column to rows.

First we perform decode operation to convert the rows value in to column.
We need to know number of distinct values and also their name in this case. For each distinct group, we write a decode as follows

[sourcecode language="sql"]
<pre>SELECT id,
Decode(grp, 'A', 'x',
NULL) a,
Decode(grp, 'B', 'x',
NULL) b,
Decode(grp, 'C', 'x',
NULL) c
FROM test1
ORDER BY id
[/sourcecode]

Result:

id A B   C
----------------
1 x
1      x
2      x
2           x
3    x

Output will have same number of rows with x indicating mapping with corresponding id value.

Now we need to group rows by id, since we have distinct values for {id,group} only one x will be appearing for particular combination of id and grp in above result.

[sourcecode language="sql"]
SELECT id,
MAX(Decode(grp, 'A', 'x',
NULL)) a,
MAX(Decode(grp, 'B', 'x',
NULL)) b,
MAX(Decode(grp, 'C', 'x',
NULL)) c
FROM test1
GROUP BY id
ORDER BY id
[/sourcecode]

Result:

id A   B C
---------------
1 x x
2    x    x
3    x

Requirement 2 : Consider same table and data but now required result is

Required Output:

id   grp1 grp2    grp3
--------------------------
1    A B
2    B     C
3    C

that is we know distinct number of groups(we don't care about values or don't know values). So, all the groups to which id is mapped we need to show them in columns

Solution:

We will use the same max(decode(..)) technique but with slight modification.First for each id we will give a unique value to each grp value using analytic function row_number()

[sourcecode language="sql"]
SELECT id,
grp,
Row_number() over (PARTITION BY id ORDER BY grp) rn
FROM test1
[/sourcecode]

Result

ID GRP RN
----------------------
1    A    1
1    B 2
2    B 1
2    C 2
3    C 1

Now we use decode on rn field

[sourcecode language="sql"]SELECT id,
Decode(rn, 1, grp,
NULL) grp1,
Decode(rn, 2, grp,
NULL) grp2,
Decode(rn, 3, grp,
NULL) grp3
FROM (SELECT id,
grp,
Row_number() over (PARTITION BY id ORDER BY grp) rn
FROM test1)
ORDER BY id

[/sourcecode]

ID GRP1 GRP2 GRP3
--------------------
1     A      -    -
1     -       B -
2 B -    -
2     -    C    -
3     C -    -

[sourcecode language="sql"]
SELECT id,
MAX(Decode(rn, 1, grp,
NULL)) grp1,
MAX(Decode(rn, 2, grp,
NULL)) grp2,
MAX(Decode(rn, 3, grp,
NULL)) grp3
FROM (SELECT id,
grp,
Row_number() over (PARTITION BY id ORDER BY grp) rn
FROM test1)
GROUP BY id
ORDER BY id
[/sourcecode]

Output:

ID GRP1 GRP2 GRP3
1    A    B -
2       B    C -
3       C - -