Wednesday, March 31, 2010

Deleting duplicate rows in Sybase

I had to delete a set of duplicate rows in Sybase and that had me in a quandry. I had done that many-a-times in Oracle using the ROWID psuedo-column but nothing similar is available in Sybase or for that matter in SQL Server. Here's how to go about this daunting task in Sybase!

The deletion of duplicate rows cannot be done in a single SQL query like it can be done in Oracle. Hence it needs to be done in 3 steps:
  1. Create temporary table with one instance of each distinct row
  2. Delete all rows having multiple instances
  3. Move back the unique data from temporary table to actual table

Step 1:
Create temporary table with one instance of each distinct row with the below query.
select distinct t.* into temp_table_name
from data_table_name t
group by keycol1, keycol2, keycol3 
having count(0) > 1
Note: that this step creates the temp_table_name table so it will fail if a table by that name already exists.

This step puts one instance of each distinct row into the temporary table. However, some of the rows may contain identical values in the key columns but different data in the remaining columns. At this point you need to check the data in the temp_table_name and reconcile which of the rows you wish to keep for a given duplicate key value. You may even need to do these three steps on the temporary table to achieve a set of unique records in the temporary table.

Step 2:
Delete all rows having multiple instances with the below query.
delete data_table_name from temp_table_name 
where data_table_name.keycol1 = temp_table_name.keycol1 
and data_table_name.keycol2 = temp_table_name.keycol2 
and data_table_name.keycol3 = temp_table_name.keycol3

Step 3:
Move back the unique data from temporary table to actual table with the below query.
insert into data_table_name 
select * from temp_table_name

Remember to change the table & column names in the above queries.
Replace data_table_name with your actual table name,
temp_table_name with your choice of name for temporary table and
keycol1..n with the actual table column names!

Check out my previous post here for doing this same thing in Oracle.

Deleting duplicate rows in Oracle

Deleting duplicate rows in Oracle seems to be the simplest to me (may be because I have used it more or may be its quite simple really!). Lets get down straight to the code and do a recap of how to delete duplicate rows in Oracle.

Simple Solution
Note that this simple solution assumes that there are no primary or unique keys on the table but the combination of data in a set of columns defines the duplicates. The example shows three key columns, but you can use more or less columns to identify unique rows based on your actual data.

DELETE FROM data_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM data_table
GROUP BY keycol1, keycol2, keycol3

However, there could be cases where the table has a sequence / auto-generated column as the primary key. In such cases too, you can use the above query after substituting rowid with the id column name.

Advanced Solution
If you are looking for an advanced and faster solution with more features & flexibility, you can use the following solution that I found on devx. This solution uses RANK() function's capabilities to provide the extra features.

FROM data_table
(SELECT tmp_key
(SELECT tmp_key, keycol1, keycol2,
RANK() OVER (PARTITION BY keycol1, keycol2 ORDER BY tmp_key) AS seq_num
(SELECT rowid as tmp_key, keycol1, keycol2
FROM data_table
WHERE (keycol1, keycol2) IN
(SELECT keycol1, keycol2
FROM data_table
GROUP BY keycol1, keycol2
HAVING COUNT(0) > 1)))
WHERE seq_num > 1)

Let me break up this complicated query into parts and explain in detail.
  1. The innermost sub-query (lines 12 - 15) finds those records that have duplicates. This gives us the sub-set of data that are only duplicate rows.
  2. The next outer sub-query (lines 9 - 11) uses ROWD psuedo-column to create a single key column. This adds a key column to the set of data rows we are interested in.
  3. The next outer sub-query (lines 6 - 8) uses RANK() function to dynamically assign sequence numbers to rows in the set of duplicate rows and orders them. Note that there are a set of sequence numbers each for each set of duplicate rows. This assigns a sequence number to and orders the rows inside each set of duplicates.
  4. The next outer sub-query (lines 4, 5 & 16) selects the duplicate rows based on the sequence number assigned by the RANK function is above step.
  5. The outer-most query (lines 1 - 3) deletes the duplicate rows.

Note: In case the table has a unique / primary key, substitute that instead of the ROWID psuedo-column in the above solution to get the same results.

Now, the extra features that using RANK provides are that you can choose the number of rows you want to keep. For example, you want to keep only the top 10 rows, you can change the numbers in lines 15 & 16 of the above query from 1 to 10 and add a proper ORDER BY to line 7 to get the data sorted in the required manner to determine the "top 10 rows".

Similarly, if you want to keep only the latest row, you can add a proper ORDER BY clause to line 7 to get the data sorted in the required manner to determine the latest row.

Remember to change the table & column names in the above queries.
Replace data_table with your actual table name and
keycol1..n with the actual table column names!

Check out my next post here for doing this same thing in Sybase.

Monday, March 22, 2010

Install WordPress on IIS7

A friend of mine wanted me help him install WordPress on his website - no big deal as WP is famous for it's 5 minute installs and so I figured this should be a cake walk for me. Little did I know that my friend's web-server was IIS7 on Windows Server 2008 and this would translate to a 5 hour install :(

Google, pointed me to a very good site that helped me out -

Here are the components that you need to download:

Here are the Installation steps:
  1. Install & configure PHP
  2. Install & configure MySQL
  3. Install & configure PHPMyAdmin
  4. Install & configure URL Rewrite Extension
  5. Install & configure WordPress
  6. Configure Search Engine Friendly URLs on WordPress

However, here are a few "gotchas" that need to be taken care of:
  • For PHPMyAdmin to work, you may need to change in
    $cfg[$i]['host'] = ‘localhost’; to
    $cfg[$i]['host'] = ‘127.0.01’; or
    $cfg['Servers'][$i]['host'] = ‘′
  • For PHPMyAdmin to work, you may need to change in
    controluser = ‘pma’ to
    controluser = ‘root’
  • For PHPMyAdmin to work, you may need to change windows\system32\drivers\etc\hosts file:
    Remove the comment (#) in front of the line
    Comment out the line ::1 localhost
  • Read the comments in all the pages as they have solutions to a lot of common problems

Note: I am assuming that IIS7 is already installed. If not, then there's a good tutorial for that as well here!

Wednesday, March 17, 2010

SQuirreL behaving weird on Ubuntu

I needed to access MySQL & MS SQL databases from my Ubuntu 9.04 laptop and was at a total loss. After a bit of research, SQuirreL sql client looked like the best choice.
However, since this is not available through the Synaptics Package Manager so it had to be downloaded and installed manually - which is the way software used to be installed on *nix boxes originally.

At that time, SQuirreL 3.0.2 was the latest version and all versions starting from 3.0 require Java6 - this I was able to install from the Synaptics Package Manager. Installing SQuirreL was simple: Run the command sudo java -jar squirrel-sql-3.0.2-install.jar and it was installed.

However I was unable to access MS SQL Server DB yet as I did not have the proper JDBC drivers for that. After narrowing down my choices, I found I was left with jTDS which is an open source 100% pure Java (type 4) JDBC 3.0 driver for Microsoft SQL Server (6.5, 7, 2000, 2005 and 2008) and Sybase (10, 11, 12, 15). Just put the jtds jar file in the squirrel install lib folder and then add the new driver from the UI.

After this, I was able to access MS SQL server but often, the program would just freeze or become unresponsive after a few minutes of use (or lack thereof). I would need to kill the java process and restart it. Also, the automatic updates never worked and so the software remained the same version till I decided to take things into my hands.

At this time the latest version available was 3.1 which was good news indeed - I was expecting a full solution to my problems! However, the website offered no documentation on how to upgrade the software, only how to install it. So I went ahead and installed it in a similar fashion - but this time it installed in a different folder. The previous install was in the folder /usr/local/SQuirrel SQL Client/ and the new one was in the folder /usr/local/squirrel-sql-3.1/

I then tried to uninstall the older version by running the uninstall.jar from the Unintall folder of the previous install. However that did not do much and so I just removed that whole folder - just like we used to remove software from *nix in the "good old days"! The good news was that the configuration was stored in my home folder in the .squirrel-sql folder so all aliases/connections/drivers etc were intact and I was able to use them immediately.

But alas! Even after all this effort, the problem of SQuirreL freezing after a few minutes of use still did not go away!

Please let me know if you see any solution to this problem or if you come across a better SQL Client for Ubuntu which can access MySQL & MS SQL!


Related Posts Plugin for WordPress, Blogger...