System.Data.SQLite

Login
This project makes use of Eagle, provided by Mistachkin Systems.
Eagle: Secure Software Automation
Ticket Hash: e122d26e70f7d9d4d05f06d2f72ac71d4b03711c
Title: Is the call to ColumnAffinity in SQLiteDataReader.GetSQLiteType(SQLiteConnectionFlags flags, int i) needed?
Status: Closed Type: Performance
Severity: Important Priority: Blocker
Subsystem: Data_Reader Resolution: Fixed
Last Modified: 2015-03-04 19:27:30
Version Found In: trunk
User Comments:
anonymous added on 2014-09-22 17:09:05:
The virtual function call _activeStatement._sql.ColumnAffinity(_activeStatement, i) in SQLiteDataReader.GetSQLiteType(SQLiteConnectionFlags flags, int i) showed up in our perf profiles.
Now, I'm wondering why the call is needed here. Shouldn't the type affinity already be set if the type is being 
retrieved from the _fieldTypeArray cache? IOW, why is typ.Affinity reset if it is != TypeAffinity.Uninitialized?

If the call is not needed an alternative implementation could look like this (totally untested code, may erase all your data)?

		private SQLiteType GetSQLiteType(SQLiteConnectionFlags flags, int i)
		{
			// Initialize the field types array if not already initialized
			if (_fieldTypeArray == null)
				_fieldTypeArray = new SQLiteType[VisibleFieldCount];

			if (_fieldTypeArray[i] != null)
				return _fieldTypeArray[i];

			var type = new SQLiteType();
			_fieldTypeArray[i] = type;
			var connection = GetConnection(this);
			var columnType = _activeStatement._sql.ColumnType(_activeStatement, i, out type.Affinity);
			type.Type = SQLiteConvert.TypeNameToDbType(connection, columnType, flags);
			return type;
		}

mistachkin added on 2014-09-22 19:42:05:
The call is present as far back as the check-in [67f18e8aaa] on 2007-10-01.

anonymous (claiming to be Alexander Sieb) added on 2014-09-23 14:22:38:
Some background. Our program is reading ~1.5M rows using the SQLiteDataReader which is 100% cpu bound. So we are
looking for ways to decrease the time spent in the data reader.

The costs are roughly like this:
GetDouble(100%)
  get_VisibleFieldCount(~25% of caller)
  GetDouble(~25%)
  VerifyType(~50%)
    CheckClosed(~35%)
    GetSQLiteType(~50%)
      ColumnAffinity(~80%)

So removing ColumnAffinity() should result in better perf. Alternatively, VerifyType() could cache its results per column index.

Btw, the call to CheckClosed() in VerifyType can be removed since all callers of VerifyType() seem to call it via VisibleFieldCount prior to calling VerifyType(). Also, CheckDisposed() is called twice in GetDouble(), GetInt64(), etc: First line and through VisibleFieldCount.

Thanks for your help!

mistachkin added on 2014-09-24 05:56:33:
Experimentation reveals that the call to ColumnAffinity in the
SQLiteDataReader.GetSQLiteType method is absolutely required as
its value can change on a per-row basis.

mistachkin added on 2014-09-24 05:57:27:
Several other minor optimizations have been checked in on the 'experimental'
branch.  Can you rebuild System.Data.SQLite from that branch and re-run your
benchmarks against it?

anonymous (claiming to be Alexander Sieb) added on 2014-09-25 17:37:25:
With System.Data.SQLite-8aa9c8c9e4 the perf profile is as follows (x86, i7 3770K):

GetDouble(100%)
  GetDouble(~27%)
  CheckClosed(~21%)
    ?JIT_GetField64@@..(~52%) - Access to _version which is of type long.
  VerifyType(~42%)
    GetSQLiteType(~91%)
      ColumnAffinity(~82%)

Overall perf increase seems to be about 3% compared to 1.0.94.0, but other changes
in sqlite itself could be a factor too. I'll try to compare 8aa9c8c9e4 to trunk without your
improvements get a clearer picture.

Is there a way to set _throwOnDisposed to false? It's internal and set to true in the ctor. 
CheckClosed() would return early without the 64 bit access to _version if we could set it to false.

FYI, inside SQLite the top five functions wrt. cpu load are:
 sqliteVdebExec
 memcpy
 sqlite3BtreeMovetoUnpacked
 _RtlLeaveCriticalSection
 _RtleEnterCriticalSection

mistachkin added on 2014-09-26 03:39:43:
Trust me when I say that the code is CheckClosed has been carefully tested to
make sure that native resources are not accessed after being freed and that
the previously fixed issues with resource cleanup stay fixed.

That being said, I *think* the _version field can be changed to simply "int",
which should speed things up a bit (thanks for pointing out it was a "long",
as that had not occurred to me before this point).

Given the highly dynamic nature of SQLite, the GetSQLiteType method basically
must call ColumnAffinity each time to make sure the correct type affinity
handling is used for a given column (and row).

mistachkin added on 2014-09-26 04:02:48:
I've updated the 'experimental' branch with the change of '_version' from long
to int.  Can you try this change and see if it improves the performance even
further?

anonymous (claiming to be Alexander Sieb) added on 2014-09-29 16:31:13:
So I made the following additional changes. 

1. I changed the last if in CheckClosed() to:
  SQLiteConnection cn = _command.Connection;
  if (cn._version != _version || cn.State != ConnectionState.Open)
    throw new InvalidOperationException("Connection was closed, statement was terminated");

 a) I assigned _command.Connection to a variable to remove the double access. The compiler/JIT won't (aren't allowed to) do that and it was visible in the profiler.
 b) I changed the condition to test _version first. It looks more fluent to me since _version is used
in the previous if statement. Also, _version should still be *hot* because of that.

With those changes the runtime of CheckClosed() is reduced from 202 Inclusive Samples (trunk) to 60 Inclusive Samples (experimental + above changes). So the change from long to int and the other changes certainly help.

2. I believe that double is used more often than float, so I made the test for DbType.Double the first test in VerifyType 'case TypeAffinity.Double:' to help branch prediction and to execute less code on average. The difference is small, but visible in my tests on an Intel 3770K.
            case TypeAffinity.Int64:
                if (typ == DbType.Int16) return affinity;
                if (typ == DbType.Int32) return affinity;
                if (typ == DbType.Int64) return affinity;
                if (typ == DbType.DateTime) return affinity;
                if (typ == DbType.Double) return affinity;
                if (typ == DbType.Single) return affinity;
                if (typ == DbType.Decimal) return affinity;
                if (typ == DbType.Boolean) return affinity;
                if (typ == DbType.SByte) return affinity;
                if (typ == DbType.Byte) return affinity;
                break;
            case TypeAffinity.Double:
 -->            if (typ == DbType.Double) return affinity;
                if (typ == DbType.Single) return affinity;
...

Compared to trunk I get about 100000 rows/s better read throughput (1000000 instead of 900000 rows/s).

mistachkin added on 2014-09-30 14:05:31:
I've made the changes on the experimental branch.

But now I'm wondering if there are more opportunities for optimization of the
VerifyType method, e.g. for the Int64 affinity.

anonymous (claiming to be Alexander Sieb) added on 2014-10-02 10:23:47:
I believe DbType.Int16 is less likely to occur in real world scenarios, so I would move it after DbType.Int64. 

   case TypeAffinity.Int64:
     if (typ == DbType.Int32) return affinity;
     if (typ == DbType.Int64) return affinity;
-->  if (typ == DbType.Int16) return affinity;
     if (typ == DbType.DateTime) return affinity;
     if (typ == DbType.Double) return affinity;
     if (typ == DbType.Single) return affinity;
     if (typ == DbType.Decimal) return affinity;
     if (typ == DbType.Boolean) return affinity;
     if (typ == DbType.SByte) return affinity;
     if (typ == DbType.Byte) return affinity;


Looking at GetSQLiteType again. The lazy init of _fieldTypeArray:

     // Initialize the field types array if not already initialized
     if (_fieldTypeArray == null)
       _fieldTypeArray = new SQLiteType[PrivateVisibleFieldCount];

could be moved to where _fieldCount is initialized in NextResult() (near line 1395), so that it isn't executed
for every single field access. This should be safe since _fieldCount has to be initialized before GetSQLiteType
is called. Also _fieldTypeArray is set to null in NextResult(), so instead we could initialize the array there:

     // Ahh, we found a row-returning resultset eligible to be returned!
     _activeStatement = stmt;
     _fieldCount = fieldCount;
     _fieldIndexes = null;
-->  _fieldTypeArray = new SQLiteType[fieldCount];


The other thing I was thinking about is if the following two if statements mean same thing:
    // Initialize this column's field type instance
    if (_fieldTypeArray[i] == null) _fieldTypeArray[i] = new SQLiteType();
    typ = _fieldTypeArray[i];

    // If not initialized, then fetch the declared column datatype and attempt to convert it
    // to a known DbType.
    if (typ.Affinity == TypeAffinity.Uninitialized)
    {

If typ is constructed as result of the first if statement typ.Affinity is TypeAffinity.Uninitialized.
TypeAffinity.Uninitialized is 0 and SQLiteBase.ColumnAffinity should never return 0. If that's true 
  if (typ.Affinity == TypeAffinity.Uninitialized) 
can be eliminated and its body joined with the first if statement. Plus, VerifyType would throw if typ.Affinity
is left uninitialized. 

My simplified version would look like this (untested):

    private SQLiteType GetSQLiteType(SQLiteConnectionFlags flags, int i)
    {
        SQLiteType typ;

        // _fieldTypeArray is initialized at the of NextResult()

        // If not initialized, then fetch the declared column datatype and attempt to convert it
        // to a known DbType.
        if (_fieldTypeArray[i] == null)
        {
            typ = new SQLiteType();
            _fieldTypeArray[i] = typ;
            typ.Type = SQLiteConvert.TypeNameToDbType(
                GetConnection(this), _activeStatement._sql.ColumnType(
                _activeStatement, i, out typ.Affinity), flags);
        }
        else
        {
            typ = _fieldTypeArray[i];
            typ.Affinity = _activeStatement._sql.ColumnAffinity(
                _activeStatement, i);
        }
        return typ;
    }

A more defensive version retaining the TypeAffinity.Uninitialized check would be:

    private SQLiteType GetSQLiteType(SQLiteConnectionFlags flags, int i)
    {
        SQLiteType typ;

        // Initialize this column's field type instance
        if (_fieldTypeArray[i] == null)
        {
            typ = new SQLiteType();
            // If not initialized, then fetch the declared column datatype and attempt to convert it
            // to a known DbType.
            typ.Type = SQLiteConvert.TypeNameToDbType(
                GetConnection(this), _activeStatement._sql.ColumnType(
                _activeStatement, i, out typ.Affinity), flags);
-->         if (typ.Affinity != TypeAffinity.Uninitialized)
-->             _fieldTypeArray[i] = typ;
        }
        else
        {
            typ = _fieldTypeArray[i];
            typ.Affinity = _activeStatement._sql.ColumnAffinity(
                _activeStatement, i);
        }
        return typ;
    }

And IMHO I would invert the if to have the common case in the if branch:

    private SQLiteType GetSQLiteType(SQLiteConnectionFlags flags, int i)
    {
        SQLiteType typ;
        if (_fieldTypeArray[i] != null)
        {
            typ = _fieldTypeArray[i];
            typ.Affinity = _activeStatement._sql.ColumnAffinity(_activeStatement, i);
        }
        else
        {
            typ = new SQLiteType();
            // If not initialized, then fetch the declared column datatype and attempt to convert it
            // to a known DbType.
            typ.Type = SQLiteConvert.TypeNameToDbType(
                GetConnection(this), _activeStatement._sql.
                ColumnType(_activeStatement, i, out typ.Affinity), flags);
            if (typ.Affinity != TypeAffinity.Uninitialized)
                _fieldTypeArray[i] = typ;   
        }
        return typ;
    }

I'll try to test the changes in the next days with our setup.

mistachkin added on 2014-10-06 18:21:54:
I'll try to evaluate these changes this week.

mistachkin added on 2014-10-07 17:55:06:
I've adjusted the code on the 'experimental' branch (see check-in [53e8310fda])
in order to simplify things.  I think this is the most optimization that can be
done to the GetSQLiteType method while still retaining the existing semantics.

mistachkin added on 2014-10-08 20:56:55:
The changes have now been merged to trunk.