System.Data.SQLite

View Ticket
Login
Ticket Hash: 2bfd91266ad9202517f7cb798c54c680f85e6278
Title: CPU is burned when parallel work is done
Status: Open Type: Performance
Severity: Important Priority: Medium
Subsystem: Integration_Via_PInvoke Resolution: Under_Review
Last Modified: 2021-05-04 07:21:20
4.05 years ago
Created: 2021-05-02 22:24:56
4.05 years ago
Version Found In: newest and several years back
User Comments:
anonymous added on 2021-05-02 22:24:56:

The problem is hard to reproduce. It seems it gets easier the faster the CPU is. While a 10 years old i7 just doesn't scale well, a modern Ryzen or Xeon cpu falls back below single core performance on parallelism of 4 or higher! But still the full CPU is getting burned: The more CPUs you insert into the system the slower the total gets.

I hope I can attach the test case later - currently there is no button to do so.

The test case in abstract:

  • take any relational DB (with foreign keys) of size 50-100 MB and make 10 copies on a fast SSD (on a system where everything will fit into disc cache) *)
  • with increasing parallelism (concurrent threads): create a new thread and a private connection per thread to DB and execute "pragma foreign_key_check" (or a multi-join SQL with WHERE conditions on multiple tables) in a loop till 2 seconds have passed - count total loops

I compared many versions of System.Data.SQLite (SDS) and Microsoft.Data.Sqlite (MDS): All time the same: SDS has problems and MDS just works like a charm - so I'm pretty sure the problem is not located in native SQLite but somewhere in this wrapper; maybe the nested while(true)->for(;;)-loops - but I don't know for sure.

But this doesn't help me because my app consists of thousands of queries going through EntityFramework6 - there is no fast transition to EFcore. And SDS provides many additional features like Module for virtual tables, which I also use.

Concrete values (on an i7 with 4 cores (8 HT)) using a 65GB database using SDS 1 concurrent thread : 13 loops per thread -> 13 total (2s CPU time) 2 concurrent threads: 7-8 loops per thread -> 15 total (4s CPU time) 3 concurrent threads: 5 loops per thread -> 15 total (5.5s CPU time)

In contrast when using MDS: 1 concurrent thread : 13 loops per thread -> 13 total (2s CPU time) 2 concurrent threads: 13 loops per thread -> 26 total (4s CPU time) 3 concurrent threads: 12 loops per thread -> 36 total (6s CPU time)

(the i7 doesn't fall below single thread performance; I'll post values for a Ryzen7 tomorrow)

Note: Not every query performs bad, a simple "select * from Table limit 10" might scale well.

Feel free to contact me by mail. This is a very catastrophic issue for my app and solving this may be useful for many others, too.

*) I noticed this 4 thread-limit years ago, but only worked against one single DB at that time, where it wasn't a real problem. To my shame I never investigated further till now: The test case uses one private connection to one private DB(copy) and it still doesn't scale.. well, the CPU usage scales, keeps at 100% while the total work done decreases the more parallel threads are working (each using its own db); I assume there is some spin-lock which spins far too much.


mistachkin added on 2021-05-02 23:30:57:
Do you know where the excess CPU is being consumed?  Is it for some busy
waits, e.g. in SQLite3.Prepare, etc?

What is the version of Microsoft.Data.Sqlite are you using?  Was it obtained
from NuGet?

I assume, based on the version of System.Data.SQLite, that you are using the
version from NuGet verbatim?  Is that correct?  What is the target framework
for your project?  Also, is the processor architecture x86 or x64?

mistachkin added on 2021-05-02 23:41:55:

Since this is a longstanding issue, it will NOT block the 1.0.114.0 release.


anonymous added on 2021-05-03 07:53:33:
All threads are always at the same position when I pause the debugger:
- SDS.UnsafeNativeMethods.sqlite3_step (stmt)
- (Unmanaged code)
- SDS.SQLite3.Step (stmt)
- SDS.SQLiteDataReader.NextResult()

This is also true for the queries which run fast.

I tried multiple versions of each MDS (6-preview3, 5.0.5, 3.1.14) and SDS downto 1.0.111 : 3.1.14 and 1.0.111 even are using the same SQLite 3.28.0 version.
All from NuGet.org (just see the attached .linq files) - architecture is always x64 and I tried both netcore3.1 and net5.

I'll attach two screenshots which show the difference between SDS and MDS on a Ryzen7 8core/16HT machine (Intel Xeons behaving the same) - exactly same DBs and query, everything loaded into RAM before.

anonymous added on 2021-05-03 11:28:22:

When I break into MDS while all threads are busy, all have this callstack, where "(Unmanaged code)" is always on top (while SDS always has one stack frame on top of "(Unmanaged code)" - this might be either more debug symbols, or some kind of callback?

  • (Unmanaged code)
  • M.D.S.SqliteDataReader.NextResult()
  • M.D.S.SqliteCommand.ExecuteReader (behavior)

mistachkin added on 2021-05-03 20:31:15:
The output from "PRAGMA compile_options" may be revealing here as it sounds
like the core native library may have been compiled with different options.

anonymous added on 2021-05-03 21:42:52:
SDS: msvc-1900
MDS: msvc-1922 (3.1.14) and msvc-1928 (6.0.0-preview3)

mistachkin added on 2021-05-04 01:26:17:
Is that really the complete lists of compile options?  Perhaps that is only
the first one?  There should be many more than that, even in the simplest of
build configurations.

anonymous added on 2021-05-04 07:21:20:
Oops, sorry, I only executed a scalar and thought the number is a bit-flag combination.

latest MDS (SQLitePCLRaw.lib.e_sqlite3 2.0.5-pre20210119130047):
COMPILER=msvc-1928 
DEFAULT_FOREIGN_KEYS 
ENABLE_COLUMN_METADATA 
ENABLE_FTS3_PARENTHESIS 
ENABLE_FTS4 
ENABLE_FTS5 
ENABLE_JSON1 
ENABLE_RTREE 
THREADSAFE=1 

SDS 1.0.113.7:
COMPILER=msvc-1900 
ENABLE_API_ARMOR 
ENABLE_COLUMN_METADATA 
ENABLE_DBSTAT_VTAB 
ENABLE_FTS3 
ENABLE_LOAD_EXTENSION 
ENABLE_MEMORY_MANAGEMENT 
ENABLE_PREUPDATE_HOOK 
ENABLE_RTREE 
ENABLE_SESSION 
ENABLE_STAT4 
ENABLE_STMTVTAB 
MAX_ATTACHED=30 
SOUNDEX 
THREADSAFE=1 
USE_URI 
WIN32_MALLOC