November 2011 – the Void

A while back I wrote about Stanford’s online DB course. Many of my friends who expressed interest unfortunately couldn’t afford the time to invest in an online course. Luckily, that wasn’t their last chance. But before I get to the upcoming courses, let me reflect on online courses in general and why Stanford, deservedly, got a wide coverage and following.

Online Lectures

I’ve been watching (and listening) to university lectures since circa 2004. Back then there weren’t too many available. In 2005 I discovered UC Berkeley’s WebCasts. These were RealVideo lectures from early 2000s on many hard sciences and some humanities. The quality of the courses were as expected high. The main issues were that RealVideo only allowed streaming, so there was no way of downloading. Besides that, the video was in rather low-resolution, low-quality and recorded from a single angle and reading the blackboard was unnecessary strain on the eye. Back in 2006 I wrote a python script using MPlayer to parse the pages, play the video with MPlayer and dump the resultant stream to files. The main issue with this was that a single hiccup and the lecture had to be downloaded from start again. Using this slow and painful method I downloaded dozens of courses and filled about 250GB worth of media and shared to colleagues and friends. Berkeley improved this by introducing Mp4 downloadable links in higher resolution starting in 2009 (I think) and now moved completely to iTunes and YouTube as platforms.

During that time I discovered MIT and Yale but neither could equal Berkeley in the number of courses or the topics. Berkeley’s Physics for Future Presidents is perhaps the best example that I can think of to show off what Berkeley had to offer. In 2008/2009 Stanford’s Leonard Susskind’s Modern Physics was available online, which is another top-notch lecture series.

As MIT, Yale, Harvard and Princeton made available more courses, I discovered Academic Earth, which I think at this point is perhaps the single best site for high-quality, diverse and highly usable video lectures.

Why University Lectures?

The process of learning isn’t linear nor comes in one flavor. Different people have different preferences on how they rather best learn or study something. Indeed, different topics might be best learnt in dissimilar methods. From books, tutorials, hands-on examples to demonstrations, labs and homework assignments. But university lectures place one’s mind into the classroom state. With all the students, rigor in subject treatment, questions from students, they all contribute to the state of mind that is very important in taking the information seriously. The lectures are also divided in such a manner that’s expected to be reasonably-paced. There are also review sessions and sometimes quizzes.

At this time I’m enjoying Justice on political philosophy and morality (by Michael Sandel of Harvard) and Physics I: Classical Mechanics (by Walter Lewin of MIT) on Academic Earth, besides Stanford’s DB, ML and AI courses, of course.

With tutorials, books and other forms of teaching most of the above is lost. Not to mention the caliber of the teachers in these universities are expectedly quite high.

The Stanford Model

Stanford introduced 3 “experimental” online courses in Database, Machine Learning and Artificial Intelligence in the Fall of 2011 (it’s mid-term exam week as I write this). Besides the fact that these courses are highly sought-after, the Artificial Intelligence course is taught by Google rock-star Sebastian Thrun who’s behind the Driverless Google car (video featuring Prof. Thrun).

Unlike all other online courses, Stanford’s approach was more course-like and less of a video recording of a lecture, as all others are. MIT might offer notes, slides and transcripts, but Stanford’s courses have forums where TA’s and teachers participate, sidecasts where teachers do online video chat with students, online and interactive quizzes that pop-up during the lectures, assignments and exams with automated grading. In addition, at the end of a course, the teacher will give a signed statement of accomplishment to the students who participated. This is in addition to prerequisite and preparatory lectures, external resource links, books and reading materials, transcripts and translations, downloadable video and lecture slides.

So there is not only much more interactivity with thousands of other students with study groups gathering in person and online, but also there is interactivity with the teachers and with online tests with immediate feed-back. These features make the courses much more than the sum of its parts. They exploit the internet and multimedia to their true potential and deliver a remarkable package for a globally available learning experience.

For those who can’t, or don’t want to, take up the assignments and exams, they can choose the Basic track. The course material is promised to be available to all during and after the end of the courses.

More Courses!

With the success that the DB, ML and AI classes saw, with well over 300,000 students enrolled in total, it’s no surprise that they are expanding this to other subjects as well.

CS 101 by Nick Parlante.
Natural Language Processing by Dan Jurafsky and Chris Manning.
Software Engineering for SAAS by Armando Fox and David Patterson.
Human-Computer Interfaces by Scott Klemmer.
Game Theory by Matthew Jackson and Yoav Shoham.
Probabilistic Graphical Models by Daphne Koller.
Machine Learning by Andrew Ng.
The Lean Launchpad by Steve Blank.
Technology Entrepreneurship by Chuck Eesley.
Design and Analysis of Algorithms I by Tim Roughgarden.
Cryptography by Dan Boneh.

These courses will start in January or February 2012. I expect they will announce others, most probably ~~Database and~~ (prof. Widom announced that the course will be available next fall) Artificial Intelligence, or so I hope.

Final Thoughts

Needless to say, Stanford’s online courses is a very welcome project and one that will change the face of education, e-learning and especially distance learning and autodidactism by raising the bar and setting new standards. US universities aren’t the only player here. Perhaps the best example is India’s National Programme on Technology Enhanced Learning (NPTEL) which already has hundreds of courses and plans to expand to over a thousand. Their Artificial Intelligence course, by prof. P. Dasgupta, is highly acclaimed. From the other end of the spectrum there is Khan Academy, which is a not-for-profit educational organization led by a very enthusiastic and charismatic figure, Salman Khan, who’s behind the 2700+ videos on most all topics. Khan Academy also has practice problems with scoring method and graphs tracking progress over time and in each subject.

Education and learning in general has never been this accessible before. With the internet, Wikipedia, free books and video tutorials and university lectures available to anyone with an internet connection. We no longer have an excuse for ignorance but our lack of will. Not looking up things we don’t know, suspect to know correctly or completely or want to learn more about, is practically inexcusable. Of course with all that information and availability also comes a sizable amount of chaff that one must weed-out from wheat. The signal-to-noise ration can be quite disappointing on some topics. But at least that can be done faster and much easier than it was only a decade or two ago, and not searching for criticism or opponents to get a more balanced picture is as inexcusable as not looking up in the first place. At least now one can also reach experts and hear what they have to say. We can even attend the best universities from the comfort of our armchairs.

I can’t help but wonder if the next generation will look at physically attending lectures as this generation does to putting pen to paper.

Transactions are the seat-belts of SQL servers. They protect our data by promising “all or nothing”. Data consistency on any level couldn’t be guaranteed without it. In a sense transactions work because they guarantee atomicity. They work because they allow you to abandon your changes and walk away at will, completely secure in the knowledge that not only your changes thus far aren’t visible to anyone, but also that your changes didn’t go anywhere near your valuable data.

Any reasonably-sized database code, therefore, is bound to have transactions peppered all about. Yet having a flawed implementation could be as worse as not having any at all. Perhaps even worse, given the false confidence. To my shock and horror, we just discovered a similar situation in one of our products. The opportunity to get to the bottom of transactions was pretty much forced, but it was a welcome opportunity nonetheless.

Here I’ll limit this investigation to MS SQL (2005 and 2008 used for testing).

Here is what we’re trying to achieve:

Understand how MS SQL transactions work.
Transactions must support nesting and rollback should abort all.
Develop a template that use transactions correctly with error handling.

Test Bed

Let’s start by a boilerplate database as a test-bed.

-- Create an uncreatively named database.
CREATE DATABASE DeeBee
GO
USE DeeBee
GO

-- Create a table to use for data modification.
CREATE TABLE Tee(
	-- We'll generate overflow exception at will using this field.
	Num TINYINT
)
GO

Now let’s see what happens if we do multiple inserts and one fails…

-- Sproc to do multiple inserts with failure.
CREATE PROCEDURE [Multi_Insert]
AS
	-- Normal case
	INSERT INTO Tee VALUES(1);
	-- Overflow case
	INSERT INTO Tee VALUES(2000);
	-- Normal case again
	INSERT INTO Tee VALUES(3);
GO

To execute the above sproc, run the following (repeat after each change to the sproc above):

-- Clean slate.
DELETE FROM Tee
GO

-- Execute complex statements.
EXEC Multi_Insert;
GO

-- Check the results
SELECT * FROM Tee
GO

Results (for the exec alone):

(1 row(s) affected)
Msg 220, Level 16, State 2, Procedure Multi_Insert, Line 8
Arithmetic overflow error for data type tinyint, value = 2000.
The statement has been terminated.

(1 row(s) affected)

Two things to observe here: First, unsurprisingly the first insert went as planned as can be seen in the first “1 row(s) affected” message, followed by the overflow error. Second, which is important here, is that this didn’t abort or change the flow of the sproc, rather, the 3rd insert was executed as well.

Selecting everything in the Tee gives us:

1
3

Error Handling

In many cases we wouldn’t want to resume with our operations when one fails. A quick solution might look something like this:

-- Sproc to do multiple inserts with failure.
ALTER PROCEDURE [Multi_Insert]
AS
	-- Normal case
	INSERT INTO Tee VALUES(1);
	if @@ERROR <> 0
		RETURN

	-- Overflow case
	INSERT INTO Tee VALUES(2000);
	if @@ERROR <> 0
		RETURN

	-- Normal case again
	INSERT INTO Tee VALUES(3);
GO

This would work, but it only prevents further operations after the initial error, it doesn’t undo previous changes. In addition, it’s very tedious, especially with complex sprocs and looks ugly. Let’s see what we can do with exception handling…

-- Sproc to do multiple inserts with failure.
ALTER PROCEDURE [Multi_Insert]
AS
BEGIN TRY
	-- Normal case
	INSERT INTO Tee VALUES(1);
	-- Overflow case
	INSERT INTO Tee VALUES(2000);
	-- Normal case again
	INSERT INTO Tee VALUES(3);
END TRY
BEGIN CATCH
	PRINT ERROR_MESSAGE();
END CATCH
GO

This is functionally equivalent to the previous version, however here we have a clean control-flow and no longer do we need to taint our code with error checking. The print statement isn’t necessary, I just added it to show that the same error occurs. The result of this sproc is a single insert into Tee with the value 1.

Transactions

Let’s do the same, this time using transactions only.

-- Sproc to do multiple inserts with failure.
ALTER PROCEDURE [Multi_Insert]
AS
BEGIN
	BEGIN TRANSACTION
		-- Normal case
		INSERT INTO Tee VALUES(1);
		-- Overflow case
		INSERT INTO Tee VALUES(2000);
		-- Normal case again
		INSERT INTO Tee VALUES(3);
	COMMIT TRANSACTION
END
GO

The above sproc will result in the following, familiar, output:

(1 row(s) affected)
Msg 220, Level 16, State 2, Procedure Multi_Insert, Line 9
Arithmetic overflow error for data type tinyint, value = 2000.
The statement has been terminated.

(1 row(s) affected)

This is the exact same behavior as without the transactions! This means transactions don’t automatically add protection, recovery and undo. Not only didn’t we revert the first insert after the error, but we went on to execute the remainder of the transaction. The only way to fix this is to actually issue a ROLLBACK TRANSACTION statement. Of course we must do this by first detecting that something went wrong. Since we already know that we must do this manually, we should either check for errors or wrap the code in TRY/CATCH clauses.

-- Sproc to do multiple inserts with failure.
ALTER PROCEDURE [Multi_Insert]
AS
BEGIN TRY
	BEGIN TRANSACTION
		-- Normal case
		INSERT INTO Tee VALUES(1);
		-- Overflow case
		INSERT INTO Tee VALUES(2000);
		-- Normal case again
		INSERT INTO Tee VALUES(3);
	COMMIT TRANSACTION
END TRY
BEGIN CATCH
	ROLLBACK TRANSACTION
	PRINT ERROR_MESSAGE();
END CATCH
GO

Results:

(1 row(s) affected)

(0 row(s) affected)
Arithmetic overflow error for data type tinyint, value = 2000.

Finally, modifications are rolled back and the table is back to a clean slate, as it was before executing the sproc.

Nested Transactions

Now that we have a working transaction with correct exception handling, let’s try to develop a version that is symmetric and can be nested. We could use the above try/transaction/catch/rollback pattern, except, it’s not robust. To see why that is so, let’s take a little bit more complex case. Suppose you used this pattern in the sprocs that do modification, and in one case you called one sproc from another.

-- Sproc that does complex operations.
CREATE PROCEDURE [SP_Complex]
AS
BEGIN TRY
	BEGIN TRANSACTION

		-- Normal case
		INSERT INTO Tee VALUES(0);

		-- Execute a bad sproc.
		EXEC Multi_Insert;

		-- Normal case again
		INSERT INTO Tee VALUES(5);
	COMMIT TRANSACTION
END TRY
BEGIN CATCH
	ROLLBACK TRANSACTION
	PRINT ERROR_MESSAGE();
END CATCH
GO

The above sproc is written using the same pattern we used in Multi_Insert. We know the inner Multi_Insert will have an exception, it’ll rollback the transaction and return. But what about the outer transaction in SP_Complex? Would the first insert be rolled back? Will the last insert get executed?

Results:

(1 row(s) affected)

(1 row(s) affected)

(0 row(s) affected)
Arithmetic overflow error for data type tinyint, value = 2000.
Msg 3903, Level 16, State 1, Procedure SP_Complex, Line 19
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.
Transaction count after EXECUTE indicates that a COMMIT or ROLLBACK TRANSACTION statement is missing. Previous count = 1, current count = 0.

Things didn’t go as planned. Checking for modifications in the Tee table shows that indeed the transaction worked as expected and all changes were rolled back. We still got an error though! To understand what’s going on, let’s see what the error was about.

First, notice that there were two inserts, one happened in the outer transaction and the second from the inner one, yet both were rolled back (the table should be empty). Then we print the overflow error, which indicates that the second insert in Multi_Insert threw. So far, so good. Then we hit an unanticipated error message. The clue to what happened is in “Transaction count after EXECUTE […]”. It seems that SQL engine is checking for transaction count after EXEC statements. Since we called ROLLBACK in Multi_Insert it seems that this has rolled back not only the inner transaction, but also the outer. This is confirmed by two facts. First, the table hasn’t been changed and all our inserts have been undone. Second, the inner ROLLBACK does match the BEGIN TRANSACTION in Multi_Insert and so does the pair in SP_Complex. So the only way we could get a mismatch error is if something in Multi_Insert had an effect larger than its scope. (Also, what’s with “Transaction count” references?)

To see what actually happened, let’s trace the execution path:

-- Sproc to do multiple inserts with failure.
ALTER PROCEDURE [Multi_Insert]
AS
BEGIN TRY
	BEGIN TRANSACTION
		PRINT 'IN [Multi_Insert]. Transactions: ' + Convert(varchar, @@TRANCOUNT);
		-- Normal case
		INSERT INTO Tee VALUES(1);
		-- Overflow case
		INSERT INTO Tee VALUES(2000);
		-- Normal case again
		INSERT INTO Tee VALUES(3);
	COMMIT TRANSACTION
END TRY
BEGIN CATCH
	PRINT 'ERR [Multi_Insert]: ' + ERROR_MESSAGE();
	ROLLBACK TRANSACTION
	PRINT 'Rolled back. Transactions: ' + Convert(varchar, @@TRANCOUNT);
END CATCH
GO

-- Sproc that does complex operations.
ALTER PROCEDURE [SP_Complex]
AS
BEGIN TRY
	BEGIN TRANSACTION
		PRINT 'IN [SP_Complex]. Transactions: ' + Convert(varchar,@@TRANCOUNT);

		-- Normal case
		INSERT INTO Tee VALUES(0);

		-- Execute a bad sproc.
		EXEC Multi_Insert;

		-- Normal case again
		INSERT INTO Tee VALUES(5);
	COMMIT TRANSACTION
END TRY
BEGIN CATCH
	PRINT 'ERR [SP_Complex]: ' + ERROR_MESSAGE();
	ROLLBACK TRANSACTION
	PRINT 'Rolled back. Transactions: ' + Convert(varchar, @@TRANCOUNT);
END CATCH
GO

Results:

IN [SP_Complex]. Transactions: 1

(1 row(s) affected)
IN [Multi_Insert]. Transactions: 2

(1 row(s) affected)

(0 row(s) affected)
ERR [Multi_Insert]: Arithmetic overflow error for data type tinyint, value = 2000.
Rolled back. Transactions: 0
ERR [SP_Complex]: Transaction count after EXECUTE indicates that a COMMIT or ROLLBACK TRANSACTION statement is missing. Previous count = 1, current count = 0.
Msg 3903, Level 16, State 1, Procedure SP_Complex, Line 21
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.
Rolled back. Transactions: 0

To get the transaction count we use @@TRANCOUNT. Now if we look at the output from the trace, we’ll see the exact same thing, except now it’s obvious that the transaction count is at 2 within the inner transaction, yet after catching the overflow exception and rolling back, the transaction count has dropped to 0! This confirms it. ROLLBACK works not only on the current transaction, but escalates all the way to the top-most transaction. The references to the “Previous count = 1, current count = 0.” is regarding the transaction count before the EXEC statement and after it. The ROLLBACK in Multi_Insert made both the COMMIT and ROLLBACK statements in SP_Complex obsolete, or rather invalid.

Correct Nesting

-- Sproc to do multiple inserts with failure.
ALTER PROCEDURE [Multi_Insert]
AS
BEGIN TRY
	BEGIN TRANSACTION
		PRINT 'IN [Multi_Insert]. Transactions: ' + Convert(varchar, @@TRANCOUNT);
		-- Normal case
		INSERT INTO Tee VALUES(1);
		-- Overflow case
		INSERT INTO Tee VALUES(2000);
		-- Normal case again
		INSERT INTO Tee VALUES(3);
	COMMIT TRANSACTION
END TRY
BEGIN CATCH
	PRINT 'ERR [Multi_Insert]: ' + ERROR_MESSAGE();
	IF (@@TRANCOUNT > 0)
		ROLLBACK TRANSACTION
	PRINT 'Rolled back. Transactions: ' + Convert(varchar, @@TRANCOUNT);
END CATCH
GO

-- Sproc that does complex operations.
ALTER PROCEDURE [SP_Complex]
AS
BEGIN TRY
	BEGIN TRANSACTION
		PRINT 'IN [SP_Complex]. Transactions: ' + Convert(varchar, @@TRANCOUNT);

		-- Normal case
		INSERT INTO Tee VALUES(0);

		-- Execute a bad sproc.
		EXEC Multi_Insert;

		-- Normal case again
		INSERT INTO Tee VALUES(5);
	COMMIT TRANSACTION
END TRY
BEGIN CATCH
	PRINT 'ERR [SP_Complex]: ' + ERROR_MESSAGE();
	IF (@@TRANCOUNT > 0)
		ROLLBACK TRANSACTION
	PRINT 'Rolled back. Transactions: ' + Convert(varchar, @@TRANCOUNT);
END CATCH
GO

By checking whether or not an inner transaction has rolled back already we avoid any mismatches. This works because using try/catch errors always execute the catch clauses, preventing any statements after the error to execute. If our transactions catch an exception, yet the @@TRANCOUNT is 0, then we must conclude that an inner sproc has made more commits than the transactions it created, or it rolled back the transaction(s). In both cases we must not ROLLBACK unless we have a valid transaction, and we must always ROLLBACK if we’re in a CATCH clause if we have a valid transaction. Notice that we don’t check for @@TRANCOUNT to COMMIT TRANSACTION. The reason is because the only case where we could get to it is when the execution of the previous statements are successful. True that if somewhere a mismatched COMMIT is done, we’ll get a mismatch, but that’s a programmer error and must not be suppressed. That is, by suppressing that case, our code will not work as expected (in a transaction) because all subsequent statements will execute outside transactions, and therefore can’t be rolled back. So we let that mismatch blow up.

The Template

So, now that we have a good understanding of how transactions and error handling works, we can finally create a template that we can reuse, safe in the knowledge that it’s nestable and will behave as expected in all cases.

BEGIN TRY
	BEGIN TRANSACTION

	--
	-- YOUR CODE HERE
	--

	-- Don't check for mismatching. Such an error is fatal.
	COMMIT TRANSACTION
END TRY
BEGIN CATCH
	-- An inner sproc might have rolled-back already.
	IF (@@TRANCOUNT > 0)
		ROLLBACK TRANSACTION
END CATCH

Full test-bed for reference:

IF EXISTS(SELECT * FROM sys.sysdatabases where name='DeeBee')
DROP DATABASE DeeBee
GO

-- Create an uncreatively named database.
CREATE DATABASE DeeBee
GO
USE DeeBee
GO

-- Create a table to use for data modification.
CREATE TABLE Tee(
	-- We'll generate overflow exception at will using this field.
	Num TINYINT
)
GO


-- Sproc to do multiple inserts with failure.
CREATE PROCEDURE [Multi_Insert]
AS
BEGIN TRY
	BEGIN TRANSACTION
		PRINT 'IN [Multi_Insert]. Transactions: ' + Convert(varchar, @@TRANCOUNT);
		-- Normal case
		INSERT INTO Tee VALUES(1);
		-- Overflow case
		INSERT INTO Tee VALUES(2000);
		-- Normal case again
		INSERT INTO Tee VALUES(3);
	COMMIT TRANSACTION
END TRY
BEGIN CATCH
	PRINT 'ERR [Multi_Insert]: ' + ERROR_MESSAGE();
	IF (@@TRANCOUNT > 0)
		ROLLBACK TRANSACTION
	PRINT 'Rolled back. Transactions: ' + Convert(varchar, @@TRANCOUNT);
END CATCH
GO

-- Sproc that does complex operations.
CREATE PROCEDURE [SP_Complex]
AS
BEGIN TRY
	BEGIN TRANSACTION
		PRINT 'IN [SP_Complex]. Transactions: ' + Convert(varchar, @@TRANCOUNT);

		-- Normal case
		INSERT INTO Tee VALUES(0);

		-- Execute a bad sproc.
		EXEC Multi_Insert;

		-- Normal case again
		INSERT INTO Tee VALUES(5);
	COMMIT TRANSACTION
END TRY
BEGIN CATCH
	PRINT 'ERR [SP_Complex]: ' + ERROR_MESSAGE();
	IF (@@TRANCOUNT > 0)
		ROLLBACK TRANSACTION
	PRINT 'Rolled back. Transactions: ' + Convert(varchar, @@TRANCOUNT);
END CATCH
GO

-- Clean slate.
DELETE FROM Tee
GO

-- Execute complex statements.
--EXEC Multi_Insert;
EXEC SP_Complex;
GO

-- Check the results
SELECT * FROM Tee
GO

Conclusion

Error handling is often tricky and full of gotcha’s. We should be careful not to make any unwarranted assumptions, especially when working cross-languages. The error handling, and more importantly transactions, in SQL don’t necessarily behave like your favorite programming language. By carefully reading documentation, experimenting and adding traces we can get insight and develop robust solutions.

However the above isn’t the last word on error handling or transactions. Your needs might be different and your use-cases might dictate other priorities. Perhaps my approach misses some subtle case that you can’t afford to ignore. Or may be this is good enough for a boilerplate template. Either way, comments, corrections and suggestions are more than welcome.