There are many reasons but this is what I ran into:
The query was simple as follows:
select top 1 stopid from busstops where code = cast('836' as int)
Relevant portions of the table schema is as follows:
CREATE TABLE [dbo].[BusStops]( [ID] [int] IDENTITY(1,1) NOT NULL, [StopID] [varchar](20) NULL, [Code] [varchar](20) NULL, [Name] [varchar](100) NULL, [Summary] [varchar](200) NULL, [Lat] [float] NULL, [Lon] [float] NULL, [the_geom] [geography] NULL, CONSTRAINT [PK_BusStops] PRIMARY KEY CLUSTERED ( [ID] ASC ))
When the specified query was executed, SQL Server returned the following response:
Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value 'xxxx' to data type int.
However, if an equivalent query was executed
select top 1 stopid from busstops where code = cast('8789' as int)
this returned the correct result. What is going on here? Are my eyes planning tricks on me?
Apparently, there is a bad value in the code column of the busstops table that is not evaluating to an integer. But is the value entry literally ‘xxxx’ or is this some SQL Server error code? To answer this question, I ran the following query:
select code from busstops where isnumeric(code) = 0
and sure enough, there were a bunch of NULLs and several entries with literal values of ‘xxxx’ as shown below:
There you go, bad data causing the problem. Presented with a string of ‘xxxx’, SQL Server was unable to convert it to an int, hence the error. Just out of curiosity, I also wanted to know why the first query failed with code of 836 while second query with code of 8789 work?
When you select a top N, SQL Server will stop evaluating the query as soon as it has accumulated N records that meet this criteria. There is a possibility, therefore, that the entire table will not be scanned or seeked, if we are dealing with indexed rows. The theory I formulated was the row with code = 8789 definitely appears before any rows with code = xxxx. To validate this hypothesis, I ran this query:
select id, code from busstops where code in ('8789', '836', 'xxxx')
and the result was as follows;
select id, code from busstops where code in (‘8789’, ‘836’, ‘xxxx’)
thus validating the hypothesis. To get to the row containing code = 836, SQL Server has to evaluate the rows containing code = xxxx and it trips here.
I then did a search in the original text file for ‘xxxx’ and sure enough, there were a couple of lines containing this string literal such as :
WH367,xxxx,”RICHARDSON / ALL SAINTS H.S.”,,latitide,longitude,,,0
which is the root cause. To fix this, I correct the data file, re-imported by data into SQL Server and was back to business. However, to make this query more robust and to be able to withstand further discrepancies in the data, further considering this field is of type varchar, I decided to change the query to
select top 1 stopid from busstops where code = '836'