SSIS export to Excel – Data type issue – “Number Stored as Text”

ISSUE: SQL export to Excel using SSIS
We want to export numeric data directly to Excel but the rows come out as “Number Stored as Text”

Why is this and how do we get around it so that the numeric data is exported as numberic in Excel?

The first thing to understand is that Excel is not a database!
In Excel each cell is autonomous. That is, it is an entity in its own right and not part of a row or column other than by inclusion. This means each cell can have a different type within the same row or columm. By contrast all rows in a column within a database have the same type.

The connection provider (JET or ODBC) exposes the spread sheet workbook to look like it is a database with each sheet looking like a table, therefore we have rows of data organised in to columns which contain meaningful names (denoted by the text in the first row of the sheet) but this does not mean that the it supports all the properties of a database.

There are two major limitations affecting us right now.

• You cannot define the data type in Excel

• You cannot delete rows from the sheet

i.e. Columns therefore do not have a defined schema (As each cell in the column can be of a different type).

When you read data from Excel using a provider such as Jet or ODBC then the type is inferred from the first 8 rows of data in each column. The column header (name) is assumed to be in the first row so the (column) type will be inferred from rows 2 to 9 (unless you specify otherwise in the connection string).

Sadly when you write data to Excel, MS saw fit to use the same system which doesn’t really make sense. It would be nice if they had looked at the data type of the data you want to put in to the sheet but they didn’t so tough!

Because the type is inferred from the Sheet you cannot specify the data type anywhere in SSIS as this would be pointless.

  • You can change the type of the data you are exporting at the source but it will still be exported in the format dictated by the already existing Excel sheet.
  • You can change the column type in the advanced properties of the Excel destination control in SSIS but this will just result in an error in the connection until you re-bind the sheet and the value will be put back to what the Excel provider thinks it should be.

What are the rules for inferring type from the column?

  • If there are less than 8 rows then it uses the data available
  • If there are more than 8 rows it uses the top 8 rows (not including the header)
  • If the top 8 rows have different types then the majority rule
  • If the top 8 rows have different types but the same number of each e.g. 4 dates and 4 integers then the type follows type precedence which in Excel is numeric first (so integer wins in this case)
  • Type is not affected by Excel cell formatting settings.
  • If there is no data in the rows then no type can be inferred so when writing the data, the provider will always write as text as this is the safer option. You can use the VBA VAL() function to convert the text to a number.

If reading data from a column with mixed types then type is inferred as above and cells with alternate types will be returned as NULL

The best way to export data to Excel is use a template file (blank Excel sheet that you copy to a new location and then fill) and define a single hidden row in the Excel Spread sheet just under the column headings which contains a value of the correct data type.

When we connect SSIS to the sheet it will infer its type from this hidden rows and therefore populate additional rows with the correct type, however, as we cannot delete rows from the spread sheet via SSIS we cannot remove this hidden data prior to sending to the client so this hidden data must be innocuous. i.e. It must not be seen as a result or have any effect on the results.

The following bugs should also be noted when using the ODBC driver. This is not the case for the ADO connection that uses the JET driver. The following was taken from a Microsoft knowledgebase article….

Column headings: By default, it is assumed that the first row of your Excel data source contains columns headings, which can be used as field names. If this is not the case, you must turn this setting off, or your first row of data “disappears” to be used as field names. This is done by adding the optional FirstRowHasNames= setting to the connection string. The default, which does not need to be specified, is FirstRowHasNames=1, where 1 = True. If you do not have column headings, you need to specify FirstRowHasNames=0, where 0 = False; the driver names your fields F1, F2, and so forth. This option is not available in the DSN configuration dialog box.

However, due to a bug in the ODBC driver, specifying the FirstRowHasNames setting currently has no effect. In other words, the Excel ODBC driver (MDAC 2.1 and later) always treats the first row in the specified data source as field names. For additional informationon the Column Heading bug, click the article number below to view the article in the Microsoft Knowledge Base:
288343 BUG: Excel ODBC Driver Disregards the FirstRowHasNames or Header Setting

Rows to Scan: Excel does not provide ADO with detailed schema information about the data it contains, as a relational database would. Therefore, the driver must scan through at least a few rows of the existing data in order to make an educated guess at the data type of each column. The default for “Rows to Scan” is eight (8) rows. You can specify an integer value from one (1) to sixteen (16) rows, or you can specify zero (0) to scan all existing rows. This is done by adding the optional MaxScanRows= setting to the connection string, or by changing the Rows to Scan setting in the DSN configuration dialog box.

However, due to a bug in the ODBC driver, specifying the Rows to Scan (MaxScanRows) setting currently has no effect. In other words, the Excel ODBC driver (MDAC 2.1 and later) always scans the first 8 rows in the specified data source in order to determine each column’s datatype.

Advertisements

About davidbridge

I am a contractor for David Bridge Technology Limited specialising in database design and Web development using Microsoft technologies such as c#, MVC .net and SQL Server (full stack)
This entry was posted in Developer stuff, SQL Stuff and tagged , , , , , , , . Bookmark the permalink.

3 Responses to SSIS export to Excel – Data type issue – “Number Stored as Text”

  1. Peter Midgley says:

    This is very useful, thanks.
    Regarding the hidden template row, I accept we cannot delete it, and it all depends on how much of an issue it is for the end user, but 2 rows could be used, with values that net to 0; however if there needs to be nothing there, I have found (using xlsx and the oledb.ace driver) is that the ‘hidden template’ row can be updated to contain nulls, if you really don’t want to pass it on to the end user. This is done via a sql task, connecting to the same xlsx connection and running an update statement such as
    update [Sheet1$] set [mycol1]=null, [mycol2]=null where [mycol1]=’deletethis’

  2. bluenectar says:

    This post very helpful. Thanks David for the tricks. I’ve already googling for hours and finally got this blog.

  3. zlatenika says:

    This post is really helpful. Thank you!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s