Meego Wiki
Views

Architecture/planning/evolution-data-server/eds-improvements

From MeeGo wiki
< Architecture | planning/evolution-data-server(Difference between revisions)
Jump to: navigation, search
(client parsing in libical is not actually necessary. While mplementing KCal-EDS, Patrick Ohly found that the parsing of iCalendar 2.0 data into libical can be done in libecal without causing overhead)
(More Efficient Access to Meta Data for Change Tracking)
 
(32 intermediate revisions not shown)
Line 1: Line 1:
This page describes improvements that are desirable or even essential for good performance of Evolution Data Server (EDS) as PIM storage in MeeGo.
This page describes improvements that are desirable or even essential for good performance of Evolution Data Server (EDS) as PIM storage in MeeGo.
-
All of these changes should be developed for the EDS master branch, reviewed by upstream maintainers and then back-ported to EDS 2.32, the version in MeeGo.
+
All of these changes should be developed for the evolution-data-server gnome-2-32 branch, as used by MeeGo 1.2, then ported to master for review by upstream maintainers. Most changes will not be accepted in upstream's gnome-2-32 branch, because is is stable, but Meego will apply them as patches to its package.
-
== content protection ==
+
See the list of [https://bugzilla.gnome.org/buglist.cgi?status_whiteboard_type=allwordssubstr;query_format=advanced;field0-0-0=product;status_whiteboard=meego;bug_status=UNCONFIRMED;bug_status=NEW;bug_status=ASSIGNED;bug_status=REOPENED;bug_status=NEEDINFO;type0-0-0=substring;value0-0-0=Evolution-Data-Server "meego" bugs in the upstream evolution-data-server bugzilla].
-
Only authorized applications (= processes) with the necessary permissions should be able to read/write data. Exact permissions and mechanisms for verifying them to be defined as part of the (revised?) MeeGo architecture.
+
== Change Tracking ==
-
 
+
-
This can be added as follows:
+
-
# lock down file permissions
+
-
# add permission checks to EDS D-Bus servers
+
-
 
+
-
== change tracking ==
+
EDS is based on the model/view/controller principle: an application opens a database (calendar or addressbook) and requests a view with server-side filtering, then the application is sent:
EDS is based on the model/view/controller principle: an application opens a database (calendar or addressbook) and requests a view with server-side filtering, then the application is sent:
Line 23: Line 17:
TODO:
TODO:
 +
* https://bugs.meego.com/show_bug.cgi?id=18760 - "QtContacts-EDS + EDS: more flexible EBookView (UID only, only future changes)"
-
=== Suppress unnecessary data transfer ===
+
=== Suppress Unnecessary Data Transfer ===
EDS currently sends all data when a view is requested, which is unnecessary when the client is only interested in change notifications. An extension to the EDS query language could optionally suppresses that sending of existing data when the view is opened, to be implemented only for the local contacts and calendar backends. This would allow easier integration with the QContact API.
EDS currently sends all data when a view is requested, which is unnecessary when the client is only interested in change notifications. An extension to the EDS query language could optionally suppresses that sending of existing data when the view is opened, to be implemented only for the local contacts and calendar backends. This would allow easier integration with the QContact API.
Line 34: Line 29:
* in the server: "if (query is OR-Query and first clause is exists("X-EDS-QUERY-FLAG-SKIP-DATA-DUMP")) { skip_data_dump = TRUE; query = second clause; }
* in the server: "if (query is OR-Query and first clause is exists("X-EDS-QUERY-FLAG-SKIP-DATA-DUMP")) { skip_data_dump = TRUE; query = second clause; }
-
=== Return UIDs only ===
+
[https://bugzilla.gnome.org/show_bug.cgi?id=652171 Upstream evolution-data-server bug]
-
The e_book_get_book_view() function takes a list of requested fields, to avoid returning unnecessary data, but this is apparently ignored for the local contacts backend. It would anyway require extra parsing and generation of smaller VCards, losing some possible advantages from delayed parsing of VCards.
+
=== Notify with UIDs Only ===
-
However, the UID is already stored separately for use as the primary key of the BDB database, so it would be worthwhile to support that particular requested field. This would allow easier integration with the QContact API. It would also offer much better performance in cases where the client already has the data but just wants to discover what subset of that data meets a certain criteria.
+
EBookView emits signals when contacts are changed, added, or removed. But those signals sends lists of entire EContacts, containing the entire contact data, though we often have all the data already and just want to know which contacts have changed.
-
== optional parsing of ebook data in client ==
+
[http://developer.gnome.org/libebook/stable/EBook.html#e-book-get-book-view e_book_get_book_view()] takes an [http://developer.gnome.org/libebook/stable/libebook-e-book-query.html#EBookQuery EBookQuery] which could be extended to take a new flag to make the EBookView send only the UID in the signals' EContacts. For instance, "X-EDS-QUERY-FLAG-NOTIFY-UID-ONLY". The UID is already stored separately for use as the primary key of the local-storage backend's BDB database, so no VCard parsing should be necessary to provide it.
 +
 
 +
This would allow easier integration with the QContact API, greatly improving performance.
 +
 
 +
Note that users of this feature should probably use just the base EVCard API on the provided EContact objects, to avoid extra processing and caching in the EContact code.
 +
 
 +
Note: We previously considered implementing e_book_get_book_view()'s requested fields parameter, but that changes the whole view, though we only really want to change what is sent for changed/new contacts.
 +
 
 +
Note: Chris discovered a dedicated e_book_get_contact_uids() API in the Fremantle EDS fork. The new API call was accepted upstream (in master) and [https://bugzilla.gnome.org/show_bug.cgi?id=651446#c11 backported as a patch] (used in the Meego package) for the gnome-2-32 branch, but this is not a full solution.
 +
 
 +
[https://bugzilla.gnome.org/show_bug.cgi?id=652172 Upstream evolution-data-server bug] (DONE)
 +
 
 +
== Optional Parsing of ebook Data in Client ==
 +
 
 +
The EDS client-side APIs currently parse and (temporarily) store the parsed details of VCards. However, this is often unnecessary because APIs such as e_book_get_book_view() cause it to return the whole VCard anyway. There will be some performance gain from delaying VCard parsing until it is really needed by calls to functions such as e_contact_new_from_vcard().
libebook provides different functionality:
libebook provides different functionality:
Line 59: Line 68:
# Should delayed parsing be done always (e_vcard_to_string())) or only if requested? How would that be configured by a libebook user?
# Should delayed parsing be done always (e_vcard_to_string())) or only if requested? How would that be configured by a libebook user?
-
== more efficient access to meta data for change tracking + atomic updates ==
+
An [http://build.meego.com/package/view_file?file=EBookView-Implement-delayed-vcard-parsing.patch&package=evolution-data-server&project=eds&srcmd5=977d2b885bed818cb616599749407e79 e_book_view_set_parse_vcards() patch], already used in the Meego package, partly implements this already. This patch was taken from an old fremantle EDS patch that was never actually used, and is generally considered to be a bad hack that should not be accepted upstream.
 +
 
 +
[https://bugzilla.gnome.org/show_bug.cgi?id=652173 Upstream evolution-data-server bug]
 +
 
 +
== Asynchronous API for storing items in libecal ==
 +
 
 +
libecal should allow the caller to request storing of data without blocking on processing of the data. It should notify the caller when the data has been stored, or when processing has failed.
 +
 
 +
These functions may require async versions:
 +
* icalcomponent_new_from_string()
 +
* e_cal_modify_object[_with_mod]
 +
* e_cal_create_object()
 +
 
 +
Making libical asynchronous is almost impossible without going multithreaded. Let's focus on e_cal D-Bus communication first.
 +
 
 +
Rob started some work on this in the rbradford-wip-ecal-async-api branch. Perhaps you can reuse some of that work. It is not functional, but might be a source of inspiration or code fragments.
 +
 
 +
[https://bugzilla.gnome.org/show_bug.cgi?id=652174 Upstream evolution-data-server bug]
 +
 
 +
== More Efficient Access to Meta Data for Change Tracking ==
This was discussed a while ago: [[http://www.mail-archive.com/evolution-hackers@gnome.org/msg03029.html  
This was discussed a while ago: [[http://www.mail-archive.com/evolution-hackers@gnome.org/msg03029.html  
Line 73: Line 101:
TODO:
TODO:
-
# extend EDS query language with a flag that limits delivered data to just UID+RECURRENCE-ID+LAST-MODIFIED resp. UID+REV
 
-
# implement this query for local contact and calendar storage, reject with error in others
 
-
# later: implement atomic changes as discussed in the mail thread
 
-
== quick check for "data changed" ==
+
The Maemo Fremantle port of EDS, which is of course already open source, apparently has some solution for this, according to Mathias Hasselman, so that should be investigated first.
-
Even quicker than listing items and comparing their revisions strings would be a comparison of a revision string for the whole database. The semantic would be this:
+
However, this should probably be made possible by:
-
* If the revision string is empty or different, then the content of the database '''might''' have changed. A more detailed check is necessary to determine that.
+
-
* If it is non-empty and the same as before, nothing has changed. No further check necessary.
+
-
The modification time stamp of the .ics or .db files could be used. Probably need a new "query property" API. e_cal_get_static_capability() only returns bool. Same with e_book_get_static_capabilities()/e_book_check_static_capability().
+
# Extending the EDS query language with a flag that limits the delivered data to just:
 +
## UID and REV for libebook (REV is in VCard 3.0)
 +
## UID+RECURRENCE-ID+LAST-MODIFIED, for libecal
 +
# Implementing these queries for the local contact and calendar backends. The query would be rejected, with an error, for other backends.
 +
[https://bugzilla.gnome.org/show_bug.cgi?id=652179 libebook: Upstream evolution-data-server bug] (DONE)<BR>
 +
[https://bugzilla.gnome.org/show_bug.cgi?id=652180 libecal: Upstream evolution-data-server bug]
 +
== Atomic Updates ==
 +
Later, implement atomic changes as discussed in the mail thread: [http://www.mail-archive.com/evolution-hackers@gnome.org/msg03029.html]
-
== contacts: store PHOTO data as plain files ==
+
== Quick check for "data changed" ==
 +
 
 +
The libebook and libical APIs should offer a way to discover if any data whatsoever has changed. This would, for instance, help SyncEvolution to avoid unnecessary detailed checking. This could just check the modification dates of local storage files, such as the BDB .db file, or .ics file, as long as we do not use those files to store permanent caches of parsed data. This would be much quicker than listing items for the whole database and comparing their revisions strings, as is currently necessary.
 +
 
 +
This would require small API additions, to check that the opaque revision strings are unchanged, such as
 +
 
 +
gchar* e_book_get_revision(EBook *book)
 +
gboolean e_book_check_revision(EBook *book , const gchar* revision)
 +
 
 +
and
 +
 
 +
gchar* e_cal_get_revision(ECal *ecal)
 +
gboolean e_cal_check_revision(ECal ecal, const gchar revision)
 +
 
 +
The semantic would be this:
 +
* If the revision string is empty or different, then the content of the database '''might''' have changed. A more detailed check is necessary to determine if it has really changed. This will be the case for non-local storage.
 +
* If it is non-empty and the same as before, the nothing has changed. No further check will be necessary.
 +
 
 +
[https://bugzilla.gnome.org/show_bug.cgi?id=652175 libebook: Upstream evolution-data-server bug]<BR>
 +
[https://bugzilla.gnome.org/show_bug.cgi?id=652177 libecal: Upstream evolution-data-server bug]
 +
 
 +
== Contacts: Store PHOTO Data as Plain Files ==
Photos should be transferred over D-Bus via a filepath to a local file rather than transferring the actual photo data over D-Bus. The VCard format already allows this option (See the PHOTO property). We must be careful to keep these local files in sync and to remove them when contacts are removed, and to restrict access to the files from other processes.  
Photos should be transferred over D-Bus via a filepath to a local file rather than transferring the actual photo data over D-Bus. The VCard format already allows this option (See the PHOTO property). We must be careful to keep these local files in sync and to remove them when contacts are removed, and to restrict access to the files from other processes.  
Line 98: Line 149:
# Apps should be able to create photo files without ever passing the data to libebook.
# Apps should be able to create photo files without ever passing the data to libebook.
# External PHOTO file and corresponding contact must be kept in sync:
# External PHOTO file and corresponding contact must be kept in sync:
-
** If a contact is deleted, the photo also needs to be removed.
+
## If a contact is deleted, the photo also needs to be removed.
-
** If a photo file is created and then storing the contact fails, who removes the file?
+
## If a photo file is created and then storing the contact fails, who removes the file?
# Access to photo data must be limited to processes which have the right permissions.
# Access to photo data must be limited to processes which have the right permissions.
Line 115: Line 166:
* Should there be automatic garbage collection of stale photos in case something went wrong and a photo wasn't properly deleted?
* Should there be automatic garbage collection of stale photos in case something went wrong and a photo wasn't properly deleted?
* Trust TYPE=JPEG/PNG/... or look into file content to determine type?
* Trust TYPE=JPEG/PNG/... or look into file content to determine type?
 +
 +
[https://bugzilla.gnome.org/show_bug.cgi?id=652178 Upstream evolution-data-server bug]
 +
 +
== Future: Content Protection ==
 +
 +
Only authorized applications (= processes) with the necessary permissions should be able to read/write data. Exact permissions and mechanisms for verifying them to be defined as part of the (revised?) MeeGo architecture.
 +
 +
This can be added as follows:
 +
# lock down file permissions
 +
# add permission checks to EDS D-Bus servers
 +
 +
Currently not planned. Depends on the future MeeGo security infrastructure.

Latest revision as of 06:53, 6 October 2011

This page describes improvements that are desirable or even essential for good performance of Evolution Data Server (EDS) as PIM storage in MeeGo.

All of these changes should be developed for the evolution-data-server gnome-2-32 branch, as used by MeeGo 1.2, then ported to master for review by upstream maintainers. Most changes will not be accepted in upstream's gnome-2-32 branch, because is is stable, but Meego will apply them as patches to its package.

See the list of "meego" bugs in the upstream evolution-data-server bugzilla.

Contents

Change Tracking

EDS is based on the model/view/controller principle: an application opens a database (calendar or addressbook) and requests a view with server-side filtering, then the application is sent:

  1. all data covered by the view
  2. the "all data sent" signal
  3. changes (add/update/remove) while the view exists

Applications modify data with asynchronous requests (controller) and keep showing the old data until the change notification comes in. The exact changes made to the data are determined by the server (for example when meeting invitations are sent).

QtContacts and KCalCore do not have such a strict model. With QtContacts the app simply loads and stores contacts and is responsible for maintaining its own view. Changes are sent for changes made after the database was opened. KCalCore is based on the concept of loading a calendar (partially) into the client, modifying it there, then writing it back. Changes are also sent while the storage is open.

TODO:

Suppress Unnecessary Data Transfer

EDS currently sends all data when a view is requested, which is unnecessary when the client is only interested in change notifications. An extension to the EDS query language could optionally suppresses that sending of existing data when the view is opened, to be implemented only for the local contacts and calendar backends. This would allow easier integration with the QContact API.

The Fremantle EDS fork also had a “freezable” (static) capability for libebook, which should probably be investigated. See, for instance: http://maemo.org/api_refs/5.0/5.0-final/libebook/EBookView.html#e-book-view-set-freezable

One possibility for adding the flag is to (ab)use the existing query language with a match for special fields:

  • in the client: query = e_book_query_orv(e_book_query_field_exists("X-EDS-QUERY-FLAG-SKIP-DATA-DUMP"), <real query>, NULL);
  • in the server: "if (query is OR-Query and first clause is exists("X-EDS-QUERY-FLAG-SKIP-DATA-DUMP")) { skip_data_dump = TRUE; query = second clause; }

Upstream evolution-data-server bug

Notify with UIDs Only

EBookView emits signals when contacts are changed, added, or removed. But those signals sends lists of entire EContacts, containing the entire contact data, though we often have all the data already and just want to know which contacts have changed.

e_book_get_book_view() takes an EBookQuery which could be extended to take a new flag to make the EBookView send only the UID in the signals' EContacts. For instance, "X-EDS-QUERY-FLAG-NOTIFY-UID-ONLY". The UID is already stored separately for use as the primary key of the local-storage backend's BDB database, so no VCard parsing should be necessary to provide it.

This would allow easier integration with the QContact API, greatly improving performance.

Note that users of this feature should probably use just the base EVCard API on the provided EContact objects, to avoid extra processing and caching in the EContact code.

Note: We previously considered implementing e_book_get_book_view()'s requested fields parameter, but that changes the whole view, though we only really want to change what is sent for changed/new contacts.

Note: Chris discovered a dedicated e_book_get_contact_uids() API in the Fremantle EDS fork. The new API call was accepted upstream (in master) and backported as a patch (used in the Meego package) for the gnome-2-32 branch, but this is not a full solution.

Upstream evolution-data-server bug (DONE)

Optional Parsing of ebook Data in Client

The EDS client-side APIs currently parse and (temporarily) store the parsed details of VCards. However, this is often unnecessary because APIs such as e_book_get_book_view() cause it to return the whole VCard anyway. There will be some performance gain from delaying VCard parsing until it is really needed by calls to functions such as e_contact_new_from_vcard().

libebook provides different functionality:

  1. exchange data as vCard with EDS daemon
  2. parse items and wrap them in a C API

QtContacts replaces the second part. An obvious performance improvement is to only do the second part if the additional C APIs are really called. The following calls then transfer data without any additional parsing in libebook:

  • e_book_get_contact()/asynchronous read/view + e_vcard_to_string()
  • e_contact_new_from_vcard() + e_book_add_contact()/e_book_commit_contact()/asynchronous storing

Note that libecal (for iCalendar) can already do this, though libebook can't yet.

TODO:

  1. add delayed parsing to libebook, triggered by API call which needs parsed data

PROBLEM:

  1. If delayed parsing fails, how can error be reported?
  2. Should delayed parsing be done always (e_vcard_to_string())) or only if requested? How would that be configured by a libebook user?

An e_book_view_set_parse_vcards() patch, already used in the Meego package, partly implements this already. This patch was taken from an old fremantle EDS patch that was never actually used, and is generally considered to be a bad hack that should not be accepted upstream.

Upstream evolution-data-server bug

Asynchronous API for storing items in libecal

libecal should allow the caller to request storing of data without blocking on processing of the data. It should notify the caller when the data has been stored, or when processing has failed.

These functions may require async versions:

  • icalcomponent_new_from_string()
  • e_cal_modify_object[_with_mod]
  • e_cal_create_object()

Making libical asynchronous is almost impossible without going multithreaded. Let's focus on e_cal D-Bus communication first.

Rob started some work on this in the rbradford-wip-ecal-async-api branch. Perhaps you can reuse some of that work. It is not functional, but might be a source of inspiration or code fragments.

Upstream evolution-data-server bug

More Efficient Access to Meta Data for Change Tracking

This was discussed a while ago: [[http://www.mail-archive.com/evolution-hackers@gnome.org/msg03029.html [Evolution-hackers] concurrent modifications of items in GUI and EDS database]]

Data synchronization only needs to know which items exist (list of local IDs) and which revision of each item is currently stored. Revisions can be identified with a string which changes for each modification. Restoring an older revision should (but doesn't have to) restore the revision string.

For calendar, UID+RECURRENCE-ID and LAST-MODIFIED can be used. For contacts, UID + REV.

By comparing a stored list against the current one, changes can be detected without relying on journals in the daemon or comparison of time stamps. Journals are hard to maintain (how many of them, how long?). Comparison of time stamps has problems when changes are allowed while a sync runs and assumes that system time is linear (which is not always true).

As an additional use case, local ID + revision string could be used to prevent overwriting stale data in the server, as in HTTP PUT eTag checks.

TODO:

The Maemo Fremantle port of EDS, which is of course already open source, apparently has some solution for this, according to Mathias Hasselman, so that should be investigated first.

However, this should probably be made possible by:

  1. Extending the EDS query language with a flag that limits the delivered data to just:
    1. UID and REV for libebook (REV is in VCard 3.0)
    2. UID+RECURRENCE-ID+LAST-MODIFIED, for libecal
  2. Implementing these queries for the local contact and calendar backends. The query would be rejected, with an error, for other backends.


libebook: Upstream evolution-data-server bug (DONE)
libecal: Upstream evolution-data-server bug

Atomic Updates

Later, implement atomic changes as discussed in the mail thread: [1]

Quick check for "data changed"

The libebook and libical APIs should offer a way to discover if any data whatsoever has changed. This would, for instance, help SyncEvolution to avoid unnecessary detailed checking. This could just check the modification dates of local storage files, such as the BDB .db file, or .ics file, as long as we do not use those files to store permanent caches of parsed data. This would be much quicker than listing items for the whole database and comparing their revisions strings, as is currently necessary.

This would require small API additions, to check that the opaque revision strings are unchanged, such as

gchar* e_book_get_revision(EBook *book)
gboolean e_book_check_revision(EBook *book , const gchar* revision)

and

gchar* e_cal_get_revision(ECal *ecal)
gboolean e_cal_check_revision(ECal ecal, const gchar revision)

The semantic would be this:

  • If the revision string is empty or different, then the content of the database might have changed. A more detailed check is necessary to determine if it has really changed. This will be the case for non-local storage.
  • If it is non-empty and the same as before, the nothing has changed. No further check will be necessary.

libebook: Upstream evolution-data-server bug
libecal: Upstream evolution-data-server bug

Contacts: Store PHOTO Data as Plain Files

Photos should be transferred over D-Bus via a filepath to a local file rather than transferring the actual photo data over D-Bus. The VCard format already allows this option (See the PHOTO property). We must be careful to keep these local files in sync and to remove them when contacts are removed, and to restrict access to the files from other processes.

There should be utility code available to libebook users to convert between the different representations.

Goals:

  1. Apps should be able to create photo files without ever passing the data to libebook.
  2. External PHOTO file and corresponding contact must be kept in sync:
    1. If a contact is deleted, the photo also needs to be removed.
    2. If a photo file is created and then storing the contact fails, who removes the file?
  3. Access to photo data must be limited to processes which have the right permissions.

Proposal:

  • For each ~/.local/share/evolution/foo calendar, store photos in <photo path>=~/.local/share/evolution/foo/photos, with same permissions.
  • New photo files are created with a UUID as base name and suffix according to the file content.
  • If a contact is deleted and has a PHOTO;VALUE=uri:file://<photo path>/*, delete that file.
  • For apps which want to create the photo file directly:
    • Apps get a file descriptor and the path from a new libebook API.
    • They own the file until the EDS server acknowledges the successful storing of a contact with a matching PHOTO;VALUE=uri:file property. If that fails, the app must remove the file.
  • Add new API calls to libebook which split out resp. include data inline.
    • Must track ownership of file after splitting it out and before storing - might never get stored!

Open:

  • Should there be automatic garbage collection of stale photos in case something went wrong and a photo wasn't properly deleted?
  • Trust TYPE=JPEG/PNG/... or look into file content to determine type?

Upstream evolution-data-server bug

Future: Content Protection

Only authorized applications (= processes) with the necessary permissions should be able to read/write data. Exact permissions and mechanisms for verifying them to be defined as part of the (revised?) MeeGo architecture.

This can be added as follows:

  1. lock down file permissions
  2. add permission checks to EDS D-Bus servers

Currently not planned. Depends on the future MeeGo security infrastructure.

Personal tools