System.Data.SQLite

Login
This project makes use of Eagle, provided by Mistachkin Systems.
Eagle: Secure Software Automation
2015-03-04
19:27 Closed ticket [e122d26e70]: Is the call to ColumnAffinity in SQLiteDataReader.GetSQLiteType(SQLiteConnectionFlags flags, int i) needed? plus 2 other changes artifact: 2d19ff718c user: mistachkin
2014-11-18
22:12 Fixed ticket [e122d26e70]. artifact: 22319572f7 user: mistachkin
22:08 Ticket [e122d26e70]: 4 changes artifact: e567a35f34 user: mistachkin
2014-10-08
20:56 Ticket [e122d26e70]: 3 changes artifact: 2eb926d64a user: mistachkin
18:49
Various minor performance enhancements to the SQLiteDataReader class. Pursuant to [e122d26e70]. check-in: e6cc5d000b user: mistachkin tags: trunk
2014-10-07
18:06 Pending ticket [e122d26e70]: Is the call to ColumnAffinity in SQLiteDataReader.GetSQLiteType(SQLiteConnectionFlags flags, int i) needed? plus 4 other changes artifact: d020fdfed8 user: mistachkin
17:55 Ticket [e122d26e70]: 3 changes artifact: 668a4c8e8e user: mistachkin
17:52
More performance adjustments to the data reader class. Pursuant to [e122d26e70]. Closed-Leaf check-in: 53e8310fda user: mistachkin tags: experimental
2014-10-06
18:21 Ticket [e122d26e70] Is the call to ColumnAffinity in SQLiteDataReader.GetSQLiteType(SQLiteConnectionFlags flags, int i) needed? status still Open with 3 other changes artifact: 80887650b4 user: mistachkin
2014-10-02
10:23 Ticket [e122d26e70]: 4 changes artifact: 8948e652c8 user: anonymous
2014-09-30
14:05 Ticket [e122d26e70]: 3 changes artifact: 9954618b3d user: mistachkin
14:03
More minor performance enhancements. Pursuant to [e122d26e70]. check-in: 684f381cd4 user: mistachkin tags: experimental
2014-09-29
16:31 Ticket [e122d26e70] Is the call to ColumnAffinity in SQLiteDataReader.GetSQLiteType(SQLiteConnectionFlags flags, int i) needed? status still Open with 4 other changes artifact: abb633082d user: anonymous
2014-09-26
04:02 Ticket [e122d26e70]: 3 changes artifact: 7c27eb00e6 user: mistachkin
03:39 Ticket [e122d26e70]: 3 changes artifact: 5dc7e34bb1 user: mistachkin
2014-09-25
17:37 Ticket [e122d26e70]: 4 changes artifact: 4d8cf6f615 user: anonymous
2014-09-24
05:57 Ticket [e122d26e70]: 3 changes artifact: 7c430af946 user: mistachkin
05:56 Ticket [e122d26e70]: 3 changes artifact: 8c06191c41 user: mistachkin
04:15
Experimental changes to remove superfluous calls from the SQLiteDataReader class. These changes are not yet completely correct. Pursuant to [e122d26e70]. check-in: ab425e79d7 user: mistachkin tags: experimental
2014-09-23
14:22 Ticket [e122d26e70] Is the call to ColumnAffinity in SQLiteDataReader.GetSQLiteType(SQLiteConnectionFlags flags, int i) needed? status still Open with 4 other changes artifact: bc4a2d084a user: anonymous
2014-09-22
19:42 Ticket [e122d26e70]: 6 changes artifact: 6359a77cd4 user: mistachkin
17:09 New ticket [e122d26e70]. artifact: 65fdaa03a7 user: anonymous

Ticket Hash: e122d26e70f7d9d4d05f06d2f72ac71d4b03711c
Title: Is the call to ColumnAffinity in SQLiteDataReader.GetSQLiteType(SQLiteConnectionFlags flags, int i) needed?
Status: Closed Type: Performance
Severity: Important Priority: Blocker
Subsystem: Data_Reader Resolution: Fixed
Last Modified: 2015-03-04 19:27:30
Version Found In: trunk
User Comments:
anonymous added on 2014-09-22 17:09:05:
The virtual function call _activeStatement._sql.ColumnAffinity(_activeStatement, i) in SQLiteDataReader.GetSQLiteType(SQLiteConnectionFlags flags, int i) showed up in our perf profiles.
Now, I'm wondering why the call is needed here. Shouldn't the type affinity already be set if the type is being 
retrieved from the _fieldTypeArray cache? IOW, why is typ.Affinity reset if it is != TypeAffinity.Uninitialized?

If the call is not needed an alternative implementation could look like this (totally untested code, may erase all your data)?

		private SQLiteType GetSQLiteType(SQLiteConnectionFlags flags, int i)
		{
			// Initialize the field types array if not already initialized
			if (_fieldTypeArray == null)
				_fieldTypeArray = new SQLiteType[VisibleFieldCount];

			if (_fieldTypeArray[i] != null)
				return _fieldTypeArray[i];

			var type = new SQLiteType();
			_fieldTypeArray[i] = type;
			var connection = GetConnection(this);
			var columnType = _activeStatement._sql.ColumnType(_activeStatement, i, out type.Affinity);
			type.Type = SQLiteConvert.TypeNameToDbType(connection, columnType, flags);
			return type;
		}

mistachkin added on 2014-09-22 19:42:05:
The call is present as far back as the check-in [67f18e8aaa] on 2007-10-01.

anonymous (claiming to be Alexander Sieb) added on 2014-09-23 14:22:38:
Some background. Our program is reading ~1.5M rows using the SQLiteDataReader which is 100% cpu bound. So we are
looking for ways to decrease the time spent in the data reader.

The costs are roughly like this:
GetDouble(100%)
  get_VisibleFieldCount(~25% of caller)
  GetDouble(~25%)
  VerifyType(~50%)
    CheckClosed(~35%)
    GetSQLiteType(~50%)
      ColumnAffinity(~80%)

So removing ColumnAffinity() should result in better perf. Alternatively, VerifyType() could cache its results per column index.

Btw, the call to CheckClosed() in VerifyType can be removed since all callers of VerifyType() seem to call it via VisibleFieldCount prior to calling VerifyType(). Also, CheckDisposed() is called twice in GetDouble(), GetInt64(), etc: First line and through VisibleFieldCount.

Thanks for your help!

mistachkin added on 2014-09-24 05:56:33:
Experimentation reveals that the call to ColumnAffinity in the
SQLiteDataReader.GetSQLiteType method is absolutely required as
its value can change on a per-row basis.

mistachkin added on 2014-09-24 05:57:27:
Several other minor optimizations have been checked in on the 'experimental'
branch.  Can you rebuild System.Data.SQLite from that branch and re-run your
benchmarks against it?

anonymous (claiming to be Alexander Sieb) added on 2014-09-25 17:37:25:
With System.Data.SQLite-8aa9c8c9e4 the perf profile is as follows (x86, i7 3770K):

GetDouble(100%)
  GetDouble(~27%)
  CheckClosed(~21%)
    ?JIT_GetField64@@..(~52%) - Access to _version which is of type long.
  VerifyType(~42%)
    GetSQLiteType(~91%)
      ColumnAffinity(~82%)

Overall perf increase seems to be about 3% compared to 1.0.94.0, but other changes
in sqlite itself could be a factor too. I'll try to compare 8aa9c8c9e4 to trunk without your
improvements get a clearer picture.

Is there a way to set _throwOnDisposed to false? It's internal and set to true in the ctor. 
CheckClosed() would return early without the 64 bit access to _version if we could set it to false.

FYI, inside SQLite the top five functions wrt. cpu load are:
 sqliteVdebExec
 memcpy
 sqlite3BtreeMovetoUnpacked
 _RtlLeaveCriticalSection
 _RtleEnterCriticalSection

mistachkin added on 2014-09-26 03:39:43:
Trust me when I say that the code is CheckClosed has been carefully tested to
make sure that native resources are not accessed after being freed and that
the previously fixed issues with resource cleanup stay fixed.

That being said, I *think* the _version field can be changed to simply "int",
which should speed things up a bit (thanks for pointing out it was a "long",
as that had not occurred to me before this point).

Given the highly dynamic nature of SQLite, the GetSQLiteType method basically
must call ColumnAffinity each time to make sure the correct type affinity
handling is used for a given column (and row).

mistachkin added on 2014-09-26 04:02:48:
I've updated the 'experimental' branch with the change of '_version' from long
to int.  Can you try this change and see if it improves the performance even
further?

anonymous (claiming to be Alexander Sieb) added on 2014-09-29 16:31:13:
So I made the following additional changes. 

1. I changed the last if in CheckClosed() to:
  SQLiteConnection cn = _command.Connection;
  if (cn._version != _version || cn.State != ConnectionState.Open)
    throw new InvalidOperationException("Connection was closed, statement was terminated");

 a) I assigned _command.Connection to a variable to remove the double access. The compiler/JIT won't (aren't allowed to) do that and it was visible in the profiler.
 b) I changed the condition to test _version first. It looks more fluent to me since _version is used
in the previous if statement. Also, _version should still be *hot* because of that.

With those changes the runtime of CheckClosed() is reduced from 202 Inclusive Samples (trunk) to 60 Inclusive Samples (experimental + above changes). So the change from long to int and the other changes certainly help.

2. I believe that double is used more often than float, so I made the test for DbType.Double the first test in VerifyType 'case TypeAffinity.Double:' to help branch prediction and to execute less code on average. The difference is small, but visible in my tests on an Intel 3770K.
            case TypeAffinity.Int64:
                if (typ == DbType.Int16) return affinity;
                if (typ == DbType.Int32) return affinity;
                if (typ == DbType.Int64) return affinity;
                if (typ == DbType.DateTime) return affinity;
                if (typ == DbType.Double) return affinity;
                if (typ == DbType.Single) return affinity;
                if (typ == DbType.Decimal) return affinity;
                if (typ == DbType.Boolean) return affinity;
                if (typ == DbType.SByte) return affinity;
                if (typ == DbType.Byte) return affinity;
                break;
            case TypeAffinity.Double:
 -->            if (typ == DbType.Double) return affinity;
                if (typ == DbType.Single) return affinity;
...

Compared to trunk I get about 100000 rows/s better read throughput (1000000 instead of 900000 rows/s).

mistachkin added on 2014-09-30 14:05:31:
I've made the changes on the experimental branch.

But now I'm wondering if there are more opportunities for optimization of the
VerifyType method, e.g. for the Int64 affinity.

anonymous (claiming to be Alexander Sieb) added on 2014-10-02 10:23:47:
I believe DbType.Int16 is less likely to occur in real world scenarios, so I would move it after DbType.Int64. 

   case TypeAffinity.Int64:
     if (typ == DbType.Int32) return affinity;
     if (typ == DbType.Int64) return affinity;
-->  if (typ == DbType.Int16) return affinity;
     if (typ == DbType.DateTime) return affinity;
     if (typ == DbType.Double) return affinity;
     if (typ == DbType.Single) return affinity;
     if (typ == DbType.Decimal) return affinity;
     if (typ == DbType.Boolean) return affinity;
     if (typ == DbType.SByte) return affinity;
     if (typ == DbType.Byte) return affinity;


Looking at GetSQLiteType again. The lazy init of _fieldTypeArray:

     // Initialize the field types array if not already initialized
     if (_fieldTypeArray == null)
       _fieldTypeArray = new SQLiteType[PrivateVisibleFieldCount];

could be moved to where _fieldCount is initialized in NextResult() (near line 1395), so that it isn't executed
for every single field access. This should be safe since _fieldCount has to be initialized before GetSQLiteType
is called. Also _fieldTypeArray is set to null in NextResult(), so instead we could initialize the array there:

     // Ahh, we found a row-returning resultset eligible to be returned!
     _activeStatement = stmt;
     _fieldCount = fieldCount;
     _fieldIndexes = null;
-->  _fieldTypeArray = new SQLiteType[fieldCount];


The other thing I was thinking about is if the following two if statements mean same thing:
    // Initialize this column's field type instance
    if (_fieldTypeArray[i] == null) _fieldTypeArray[i] = new SQLiteType();
    typ = _fieldTypeArray[i];

    // If not initialized, then fetch the declared column datatype and attempt to convert it
    // to a known DbType.
    if (typ.Affinity == TypeAffinity.Uninitialized)
    {

If typ is constructed as result of the first if statement typ.Affinity is TypeAffinity.Uninitialized.
TypeAffinity.Uninitialized is 0 and SQLiteBase.ColumnAffinity should never return 0. If that's true 
  if (typ.Affinity == TypeAffinity.Uninitialized) 
can be eliminated and its body joined with the first if statement. Plus, VerifyType would throw if typ.Affinity
is left uninitialized. 

My simplified version would look like this (untested):

    private SQLiteType GetSQLiteType(SQLiteConnectionFlags flags, int i)
    {
        SQLiteType typ;

        // _fieldTypeArray is initialized at the of NextResult()

        // If not initialized, then fetch the declared column datatype and attempt to convert it
        // to a known DbType.
        if (_fieldTypeArray[i] == null)
        {
            typ = new SQLiteType();
            _fieldTypeArray[i] = typ;
            typ.Type = SQLiteConvert.TypeNameToDbType(
                GetConnection(this), _activeStatement._sql.ColumnType(
                _activeStatement, i, out typ.Affinity), flags);
        }
        else
        {
            typ = _fieldTypeArray[i];
            typ.Affinity = _activeStatement._sql.ColumnAffinity(
                _activeStatement, i);
        }
        return typ;
    }

A more defensive version retaining the TypeAffinity.Uninitialized check would be:

    private SQLiteType GetSQLiteType(SQLiteConnectionFlags flags, int i)
    {
        SQLiteType typ;

        // Initialize this column's field type instance
        if (_fieldTypeArray[i] == null)
        {
            typ = new SQLiteType();
            // If not initialized, then fetch the declared column datatype and attempt to convert it
            // to a known DbType.
            typ.Type = SQLiteConvert.TypeNameToDbType(
                GetConnection(this), _activeStatement._sql.ColumnType(
                _activeStatement, i, out typ.Affinity), flags);
-->         if (typ.Affinity != TypeAffinity.Uninitialized)
-->             _fieldTypeArray[i] = typ;
        }
        else
        {
            typ = _fieldTypeArray[i];
            typ.Affinity = _activeStatement._sql.ColumnAffinity(
                _activeStatement, i);
        }
        return typ;
    }

And IMHO I would invert the if to have the common case in the if branch:

    private SQLiteType GetSQLiteType(SQLiteConnectionFlags flags, int i)
    {
        SQLiteType typ;
        if (_fieldTypeArray[i] != null)
        {
            typ = _fieldTypeArray[i];
            typ.Affinity = _activeStatement._sql.ColumnAffinity(_activeStatement, i);
        }
        else
        {
            typ = new SQLiteType();
            // If not initialized, then fetch the declared column datatype and attempt to convert it
            // to a known DbType.
            typ.Type = SQLiteConvert.TypeNameToDbType(
                GetConnection(this), _activeStatement._sql.
                ColumnType(_activeStatement, i, out typ.Affinity), flags);
            if (typ.Affinity != TypeAffinity.Uninitialized)
                _fieldTypeArray[i] = typ;   
        }
        return typ;
    }

I'll try to test the changes in the next days with our setup.

mistachkin added on 2014-10-06 18:21:54:
I'll try to evaluate these changes this week.

mistachkin added on 2014-10-07 17:55:06:
I've adjusted the code on the 'experimental' branch (see check-in [53e8310fda])
in order to simplify things.  I think this is the most optimization that can be
done to the GetSQLiteType method while still retaining the existing semantics.

mistachkin added on 2014-10-08 20:56:55:
The changes have now been merged to trunk.