I ranted a bit about SQL in general in my previous post, which was supposed to be a quick intro to the SQL training I'm putting myself through at home. Instead, I rambled on about developers being scared of SQL and bending over backwards in order to avoid it.
Anyway, my boss asked me last week to go to a SQL training. He correctly identified my lack of knowledge and understanding about the database tools we use at work and wanted to correct that. But instead of going to a training that just covers the basics in a breath-first manner, I bought two books: Inside SQL Server 2005: T-SQL Querying and Inside SQL Server 2005: T-SQL Programming. Oh my God... The amount and depth of knowledge in those books is staggering. I have four days to go from "just getting by" SQL developer to a "data can't hide from him" database guru. Of course, I told my boss that I'd be blogging about my discoveries in order to cement the knowledge in my head but now I find that I will not have enough of my four days of reclusion at home to learn everything I want to learn. I need to focus on what I need for work.
For this training, I dove deep into two books and started with the fundamentals...
Fundamentals
What steps SQL Server goes through to evaluate a query is a very useful thing to know and helps explain why certain things can or can't be done inside a query.
Take the following query for the Northwind database, which select the total number of orders made by all american employees where that number is greater than 100:
SELECT COUNT(Orders.OrderID) AS TotalOrderCount, Employees.EmployeeID
FROM OrdersJOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
WHERE Employees.Country = 'USA'
GROUP BY Employees.EmployeeIDHAVING COUNT(Orders.OrderID) > 100ORDER BY TotalOrderCount
SQL Server starts by executing the FROM and JOIN statements by performing a cross-join between the Orders table and the Employees table and storing the resulting rows in a virtual table that we'll call VT1. Then the ON statement is executed and only the rows that match the condition in the query are kept in another virtual table: VT2. The WHERE statement is then examined and SQL Server gets all the rows in VT2 where Country = 'USA' and stores them in VT3. VT4 is created by grouping together the rows based on employeeID, meaning that we are keeping only one row per employeeID and that we have to somehow aggregate the rest of the data from now on.
The HAVING clause indicates that we only want to keep the rows where the count of orderIDs per employeeID (aggregating the data) is greater than 100. The result is stored in VT5.
Then, we select the employeeID and the count of orders (remember, the rest of the data after executing the GROUP BY statement needs to be aggregated somehow) and assign, optionally, column aliases, which you can now use in the query, into VT6. Finally, VT7 is created when we use the ORDER BY clause.
Interstingly enough, VT7 is not returned. Instead, a cursor is what the query gives us back because of the ORDER BY clause. Without it, a virtual table would indeed by returned. This is why you cannot use ORDER BY inside subqueries, which are expected to return a table. After an ORDER BY statement, only functions that take cursors as input, like TOP(n), can be used.
So let's go back over the order in which SQL Server executes the query (I'll include every possible SQL statement this time):
1- FROM .. JOIN
2- ON
3- WHERE
4- GROUP BY
5- WITH
6- HAVING
7- SELECT
8- DISTINCT
9- ORDER BY
10- TOP
Any alias that you define in a query, be it a column alias, a table alias or a subquery alias can only be used in subsequent steps in the list or SQL Server will not recognize them.
On a side note, it is important to know about the UNKNOWN value in SQL. Usually, TRUE or FALSE is return when comparing values. But since some values can be NULL, comparing such a value with another one results in UNKNOWN. When a filter is applied to a query (WHERE, HAVING, ON), unknown values are considered like FALSE. But inside a check constraint, like making sure the salary in a column is greater than zero, unknown values are considered like TRUE.

2 comments:
SQL sure can do a lot, esp. T-SQL (I've written many a logic-filled sproc in my time) but you do have to be careful with that... as I'm sure you're reading, T-SQL has a fairly full set of loops, branching, and even some exception tools, and it's can be tempting to end up with business logic in your DB. It just needs to be said that logic is much harder to test in the DB, and may even be outside of your standard versioning/change control system-this is a risky set of affairs, and something to avoid if possible.
I will say that it's certainly still a good idea to get very familiar with what SQL can do, esp. if you have no DBA. If a query is running slowly, it's important to know why, and what can be done. (take a look at indexing and understanding query plans) Besides, even when using an ORM that can make queries, you'll want to know how to work with views and triggers to ease working with ugly tables while transitioning to better structures, or if you end up with a query that your ORM just isn't writing efficiently (rare, but I hear it happens), you might be able to handcraft something better.
Thanks for your comment, Anne.
You are absolutely right when you say that business logic has no place on the database side of of your application, and I should've mentioned it!
I am just amazed at the amount of information we can get out of a properly designed database and how fast SQL can get it for us.
Post a Comment