rbUpdate small problem

I have now rbUpdate “work”.

When i run it DelayedWrites false, all records are processed, but no changes are written to the Words table.
When i run it DelayedWrites true, the SQL cursor is busy in the loop. But i get a duplicate index exception from the DB for the RBWords index.

Also, after the loop is finished, i get FireDAC exceptions when trying to close the application. I have missed something after the loop has finished?

Here is the code:
When we come here, the DB has been opened.
No other operations has been done.

      rbFLUpdate.Initialize;
      FDQueryText.SQL.Text :=
        'select id, search_meta_text, search_content_text ' +
        'from documents where id >= ' + IntToStr(FStartLoc);
        // Don't call Query1.Open, Open via the TextLink!!
      rbTextFireDACLink.Open;
      FDQueryText.First;
      while (not FDQueryText.EOF) do
      begin
        rbFLUpdate.BatchAdd;
        cxMemo1.Lines.Add(FDQueryText.Fields[0].AsString);  // Test log
        Application.ProcessMessages; // Only for test
        FDQueryText.Next;
      end;
          // if rbUpdate1.DelayedWrites is True, flush the cache
      if rbFLUpdate.DelayedWrites then
        rbFLUpdate.FlushCache;

Any ideas?

I have tried using a parameter like in one of the samples. No difference.

TIA,

/Dany

Hi Dany

Do you see any reason that we cannot duplicate this problem using a different database than “documents” ? What field types are search_meta_text and search_content_text? We have RailAccident handy in Firebird SQL format and it has plenty of long character fields.

I have run into Duplicate Index errors recently with FireDAC and SQLIte3 when I had invalid charset issues, where two words that I saw as different turned out the same to SQLite3. I would not expect that with Firebird if you are connecting with charset UTF8.

What controls the transaction object? Is there any chance this is acting with two independent transactions?

Sorry not more helpful yet.

Ann

PS. Lajos has tested BatchAdd with SQLite, Mysql and Postgresql using these sample datasets: City and the big Railaccident.
Sample data for Firebird RailAccident can be found here :
https://data.rubicon.href.com/RBSampleData/FirebirdSQL/index.html

PPS. Lajos has been working on making his test with your data quickly and easily reproducible. This is i process, will take a few more days I think.

I suspect it is related to character set too. Two reasons; Because logically a duplicate index should not be possible from the DB. If i feed the exact same record to batchAdd, worst case is that the words will be incorrectly ranked or that the bit engine throws an error. And; i do not see the same word in a Made index with that data included, as shown in the DB exception.

So, if my clients leave me some time today, i’ll switch everything to IBObjects where i have the sources. That way if the problem persists i can trace into the DAC and see exactly why.

Ok. We have source to FireDAC with our version of Delphi. I didn’t realize you did not. Good point.

No, unfortunately FireDAC sources starts at Enterprise (?). I am on Professional. Previously you could by it as an add-on, but Emba removed that option. :frowning:

I have now tried the rbBatchAdd sample using the IBO bridge.
I immediately get “Cannot modify a read-only dataset”, EDatabaseError exception.

FYI: The IBO link it seems the SQL is

SELECT * FROM documents_fts
WHERE rbWord = :rbWord

When the exception is raised from TrbCustomUpdate.iUpdInsDel in the call to WordLink.Edit.

I’ll put some more time into FireDAC but w/o sources it is very frustrating work.
Also embas docwiki is down/flaky atm.

The fields are text blobs (memos). Plain text, but very noisy and international. UTF8 charset. The DB charset is UTF8.

I have no expirience of FireDAC transactions. I removed the transaction object, no change.
I am not sure about the issue have something to do with character sets.
Error appear on words with “special” chars only, but not 100% consistently.
I mean, yes, the same places each run, but these chars exists in the table in other places.

Should i try to do something with the FD bridge “transaction” property?

Usually I try to make sure that all the queries use the same transaction instance, whichever that is. If your project creates its own query for the text or anything, hook it up to the same one as the query-to-words.

What are the params on your TFDConnection object? Are you sure the charset has been set?

Our RecipeText table has plenty of non-english content as varchar(4096).

Here is the latest utility for downloading master copies of the sample data:
https://data.rubicon.href.com/RbcResetSampleData.exe

Use that to replace any prior copy in r_setup.

Please also find your copy of FDConnectionDefs.ini located under the r_samples\config_ini folder. You will need to add this or svn update to get the adjustments from subversion:

[FB_3_0.Recipe]
DriverID=FB_3_0
Database=D:\Rubicon_Sample_Data\FB_3_0\Recipe.FDB
User_Name=sysdba
Password=masterkey
CharacterSet=UTF8
ExtendedMetadata=False

If you have CodeSite installed, run CSDispatcher first so that you’ll get more clues as to activity.

RbcResetSampleData.exe was compiled for win64. Run it, choose FB_3 and choose the 4th sample which is the Recipe database. Tick [x] Download First. It will take a while to run. Please note that there will be one extra confirmation dialog after the binary files (*.fdmem) download, to make sure you have FB_3 running. The CSLiveViewer window can cover that dialog.

Let me know if you run that and successfully end up with a large RecipeText file in FB_3.

1. Some of the questions above

Using IBObjects my transaction handling is a bit more convoluted than “just one transaction”. For test projects, however, i only place transaction objects if i am testing multiple transaction handling. So no extra transactions besides the one(s) FireDAC will auto-create.

In IBObjects the connection setting of “CharSet” is only used when creating databases. When opening a database, this information is present and read from the DB. I have no idea about FireDAC but of course i have looked through every single property of every component.

You have pointed me at CodeSite previously. However, i rather trace along, it is as complicated as trying to locate inconsistencies in huge logs that i am not familiar with. What would be better using logging is that the Alexandria debugger chokes a lot of the times. If you have breakpoints and just skipped 121 hits to get to the right instance, it’s enormously frustrating…

To download sample data and set it up i need to check everything around that data too, so i do not feel i will save any time.

2. Problem solved for me

After tracing and switching FireDAC/IBObjects/FireDAC i have found the offending line. In TrbCustomWordsFireDACLink.GotoWord at the 16th line (@626) there is this:

        Params[0].DataType := ftString;  // D21 wants to know the DataType

in a couple of other places, the rbWord field’s DataType is overwritten, but with ftWideString. When tracing to the next line i could see in the debugger that the word had been “normalized” after that assignment.

This was logical because i could see (after catching the DB exception, and logging the words) that some words where only missing while some had MORE words in the rbBatchAdd:ed list. That rbBatchAdd would find more words than rbMake was a conundrum.

So i changed to

        Params[0].DataType := ftWideString;  // D21 wants to know the DataType

and now i do not get any duplicate index error when running batchAdd!

Also running make for location 1-1000 gives axactly the same checksums as running rmMake on 1-899 and rbBatchAdd on 900-1000. Finally!

So to answer the last question, no i do not think you would have been able to reproduce the error with other data.

In order to really pin down what is going on here one would need FireDAC sources and/or acceptable documentation (like the docwiki). I would suspect a bug in FireDAC, perhaps something to do with the client library.

The offending Characters did not offend at every character position, some only first and some only in the middle. I do not have time to do speculate, but it smells charset conversion. Here are some of the erroneously “normalized” words:

ČLANOVI, POVEĆALI, IZMEĐU, POMOĆ, ODREĐENIH, ŠESTOMESEČNOM

Hi Dany, As I see You have issues with Serbian Latin chars
Have I right?

Are those Č, Đ, Č… chars in documents table OK? Please write me some rows, when I can find in blob fields word with Serbian chars. My mother language is Hungarian, but I live in Serbia and speak almost fluently Serbian too.

Lajos,

Yes, i know, you wrote about your location previously.
My father had many Hungarian friends when i grew up!
Hungarian is a very interesting language, IMHO.

It seems to start at 19916 if the make has been done up to 19899.
If you run rbBatchUpdate on a DB where rmMake is at another “location”, then maybe those words was not included, and a “duplicate” will not happen.

When at FDQuery.Params[0].AsString := InWord; my debugger shows the InWord correctly. If i hover over FDQuery.Params[0].AsString after F8, the debugger shows the word w/o diacrits. Very very odd.

I see FireDAC has an “embedded” property, i cannot find documentation atm.
I am using a 32 bit fb 3.0 client library. This might be pertinent information.

One could speculate as to whether these characters are in a different UTF8 “interval” thus having more bytes per character but again, w/o sources it is not much use.

Hi Dany,

Did You download Recipe.fdb database what Ann sent? Please write me some rows number in Rubicon_Append db, where in blob fields are Serbian chars (Č,Š,Đ…). In Recipe.fdb is CharSet for fdb UTF8, but Hungarian and Russian chars are bad. What to do. In Your Index files are those Serbian chars right?
In SQLite, Mysql, Postgresql is all right with Serbian, Hungarian, Russian and other UTF8 chars, only in Firebird is not good, what to do?

No, i have not downloaded other databases. I do not see how that would be beneficial to me.

Hi Dany,
Which Charset You are using for Firebird and which database Manager program ?

UTF8. This is still all done on the MVP. Exactly the same.

I use several. But not for this job.
For this job i do not need a manager.

I have used:

FalmeRobin, DatabaseWorkbench, IBExpert, IBObjects Manager, Firebird Editor Pro and a few more. But i do not need a manager for this project.

This is not the first time you ask, so i have to ask, what does it matter?

Hi Dany,

I asked because when I with Edit Blob I went into blob field and add some Unicode words like “Lajos”, “Dianna”…it is all right with Rubicon make and it is founded with Rubicon search. If I try to add Serbian word like “Đorđević” the blob field is crashed, there is some error.

I have not edited any fields manually like that.

So you mean that you have reproduced my problem on your side?
That is great news!
Did my fix above work?

Hi Dany,
I reproduced Your problem. First I have to talk about with Ann. She wanted to work with Recipe db. I wanted to see, how Rubicon Search works with Your database with Serbian chars, but I can’t write into blob field those words like Đorđević, Žužič or alike to see after Rubicon Make, how works Rubicon Search. I have also issues with FB3 Recipe db with UTF8 Hungarian, Russian and other no English words. I wanted test BatchAdd with Firebird too. In SQLite, Postgresql it works fine, but with FB I have issues. More tomorrow