SQL Server Everywhere – how big is that data?

As i mentioned in my previous post regarding SQL/e we have encountered a number of issues during our development phase.  One of the issues we encountered early on was that, for a number of different reasons, a reinitialisation maybe required for a merge subscription.  When a SQL/e database is set up as a subscriber to a merge subscription the initial dataset (this is usually a subset of the full database, partitioned according to device/user/other information) is replicated – we refer to this as a full sync.  From then on each sync only uploads/downloads changes that have been made to ensure the database is up to date – we refer to this as a partial sync.  Occasionally a sync was taking much longer than expected.  It turns out that it was essentially reinitialising the subscription, forcing a full sync.

Reinitialisation can result from a number of reasons, such as:

  • SQL/e subscription is reinitialised (directly) by the client
  • Subscriptions are reinitialised (directly) on the server
  • Changes are made to the database schema that require all subscriptions to be reinitialised
  • A subscription has been inactive for too long – effectively a timeout

After spending quite a bit of time reading documentation there appears to be no way to predict (from the client) whether a sync will be a full or partial, prior to it starting.  Take the example that a user is travelling and is connecting using their cellula network (expensive connectivity in most locations).  Before commiting to a large download it would be great for the user to be notified and presented with an option to cancel the sync.  This is also relevant from a timing perspective.  The user may just want to access some data, instead they have to wait 10 minutes for a full sync to complete.

Giving up on the documentation we went hunting for what information resides on the server regarding each subscription.  It turns out that for every subscription the server tracks information about when the subscription was last sync’d, the schema version they have, and much more.  By querying this information it is possible to determine whether the next sync to be carried out will be a full or partial sync.  For example we could run a query like (-1 indicates a reinitialisation is required):

select schemaversion
from dbo.sysmergesubscriptions
where pubid = ‘4875EFFF-BA73-4660-B7A0-781CAC97384E’ –Publication GUID
and subid = ‘9990AD5A-6A39-4234-A0B4-26BDB29B4C21’ –Subscription GUID

Now that we can query the database on the server side we need a way for the client to be able to access this information.  This is done via a webservice which runs with enough privileges to be able to query this information on the server.  The only remaining trick is to work out what the publication GUID and subscription GUIDs are.  On the SQL/e database, once the subscription has been setup, there is a __sysMergeSubscription table that can be queried for this information.

In summary the process is quite simple.  Prior to syncing we query the local database for the publication/subscription ID (if these are not there then it have to be a full sync).  This information is sent to the webservice that queries the subscription information on the server.  If a full sync is required we prompt the user to confirm that they are willing to proceed with the sync.

Hope this information helps others with this issue as we had a tough time trying to locate a good information source on how all these bits and pieces work (or more specifically all the things which can – and inevitably will – go wrong with merge replication).  While I would definitely still recommend using replication to push data out to an occasionally connected client, it is not something you can just “assume it will work”.

Leave a comment