Quick, why are these do these sql statements take wildly different times to execute? Both are being run on a SQL Server Express 2008 database with DWPROD being an Oracle 10g database.
--Query 1 --takes 6 minutes 23 seconds to produce 575,424 rows select * from [DWPROD]..[DISC].[DW_SPD_DETAIL] x where x.FISCAL_YEAR = 2010;
vs.
--Query 2 --takes 1 minute 17 seconds to produce 575,424 rows select * from openquery(DWPROD,'select * from disc.dw_spd_detail where fiscal_year = 2010 ');
Query 1 is using a “linked server” to make the query. A linked server is basically an ODBC connection to another server built into SQL Server and you call it using fully qualified name. In this case the where clause is not being passed to the remote server because the SQL Server engine decided against it for some reason (I think it’s either the engine found a type mismatch or it’s a bug). Instead the entire DW_SPD_DETAIL table is being dumped into a temp table locally. Then it filters on the FISCAL_YEAR and gives the result. The whole table is millons of records and the server is only running on a 100 megabit port.

Time is money.
Photo by Search Engine People Blog
Query 2 is using OpenQuery to execute the query on the remote server. The difference is that the query string is not being parsed by the SQL Server engine, it’s being passed directly to the remote server. The remote server can filter on it’s side (using any indexes to speed it up) and only return the filtered results.
Query 2 is much faster because it’s only retrieving 575,424 rows from the database where query 1 is retrieving the entire table and then giving the results.
Time is money. The faster we can load all of our tables for the day, the faster we can start analyzing and reporting on the data. Plus, it’s good resume fodder to say “Reduced time for nightly database loads by 66%”.