Concurrency and scaling in a scenario involving Azure Table storage and external APIs

The scenario is a background processor that does the following:

  1. A method writes a new entity into Table storage with status “unprocessed”
  2. Get unprocessed records from Table storage
  3. Call service A
  4. Call service B (if A succeeds)
  5. Update Table storage with the results of A and B (regardless of whether A and B succeeded)

A and B are idempotent, so even when we scaled out the app we did not see any issues: the Table storage updates used no concurrency control (last write wins), and a record that was processed twice simply ended up with the same results.
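To make the last-write-wins behaviour concrete, here is a minimal in-memory sketch. It does not use the Azure SDK: the `table` dict stands in for Table storage, and `call_service_a` / `call_service_b` are hypothetical placeholders for the two idempotent services. Two instances fetch the same record before either writes back, and the second write just repeats the first.

```python
# In-memory stand-in for Table storage: row key -> entity (a plain dict).
table = {}

def write_new_entity(key):
    table[key] = {"status": "unprocessed", "a": None, "b": None}

def get_unprocessed():
    # Return copies, the way each instance would hold its own fetched entity.
    return [(k, dict(v)) for k, v in table.items() if v["status"] == "unprocessed"]

def call_service_a(key):
    return f"A({key})"  # hypothetical idempotent service

def call_service_b(key):
    return f"B({key})"  # hypothetical idempotent service

def process(key, entity):
    a = call_service_a(key)
    b = call_service_b(key) if a is not None else None
    entity.update(status="processed", a=a, b=b)
    table[key] = entity  # unconditional overwrite: last write wins

write_new_entity("order-1")

# Two instances both fetch before either one writes its result back.
batch_1 = get_unprocessed()
batch_2 = get_unprocessed()

for key, entity in batch_1:
    process(key, entity)
snapshot = dict(table["order-1"])

for key, entity in batch_2:
    process(key, entity)

# The second write changed nothing, so the duplicate work was harmless.
assert table["order-1"] == snapshot
```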

The new scenario adds one step at the end:

  1. A method writes a new entity into Table storage with status “unprocessed”
  2. Get unprocessed records from Table storage
  3. Call service A
  4. Call service B (if A succeeds)
  5. Update Table storage with the results of A and B (regardless of whether A and B succeeded)
  6. Finally, send an email

The last-write-wins strategy no longer works here, because sending a mail is not idempotent. With a single instance running this is still fine, but if we scale out to, say, two instances, both can pick up the same unprocessed record and users will receive the mail twice.
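The duplicate mail can be reproduced with a small in-memory sketch. Again, nothing here touches Azure: `table` stands in for Table storage and `outbox` for the mail server, and the calls to services A and B are left out because they are idempotent and do not affect the outcome.

```python
# In-memory stand-ins: `table` for Table storage, `outbox` for the mail server.
table = {"order-1": {"status": "unprocessed"}}
outbox = []

def get_unprocessed():
    return [k for k, v in table.items() if v["status"] == "unprocessed"]

def process(key):
    # Services A and B are omitted; repeating them would be harmless.
    table[key] = {"status": "processed"}   # last write wins, no conflict raised
    outbox.append(f"mail for {key}")       # not idempotent: every call sends

# Two instances poll at the same moment and see the same record.
batch_1 = get_unprocessed()
batch_2 = get_unprocessed()

for key in batch_1:
    process(key)
for key in batch_2:
    process(key)

print(len(outbox))  # 2: the user gets the same mail twice
```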

Solution 1: Using a blob lease

We can create a blob and have each background processor try to acquire a lease on it; a processor retrieves records only while it holds the lease. You can compare this with a SQL stored procedure that fetches records under a lock, using some flag such as “IsProcessed”.

This solution works, but even if we scale out the processors, only one of them is working at any point in time; the rest are just waiting for the lease to become available.
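A rough sketch of the lease idea follows, assuming an in-memory `Lease` class (hypothetical, backed by a `threading.Lock`) in place of a real blob lease. Real blob leases also have a duration and need renewing, which this sketch leaves out. It shows both properties described above: no duplicate mails, but a second worker that polls while the lease is held does nothing.

```python
import threading

class Lease:
    """In-memory stand-in for a blob lease: at most one holder at a time."""
    def __init__(self):
        self._lock = threading.Lock()

    def try_acquire(self):
        return self._lock.acquire(blocking=False)

    def release(self):
        self._lock.release()

table = {"order-1": {"status": "unprocessed"},
         "order-2": {"status": "unprocessed"}}
outbox = []
lease = Lease()

def run_once(worker):
    """One polling cycle: fetch and process records only while holding the lease."""
    if not lease.try_acquire():
        return 0  # another instance holds the lease, so this one stays idle
    try:
        keys = [k for k, v in table.items() if v["status"] == "unprocessed"]
        for key in keys:
            table[key]["status"] = "processed"  # services A and B omitted
            outbox.append((worker, key))        # send the mail
        return len(keys)
    finally:
        lease.release()

# Simulate worker-1 being mid-batch: it holds the lease while worker-2 polls.
assert lease.try_acquire()
idle = run_once("worker-2")    # lease unavailable: nothing processed
lease.release()

done = run_once("worker-2")    # lease free: processes both records
again = run_once("worker-1")   # nothing left, so no second mail
```

Here `idle` and `again` are both 0 and `done` is 2, so each record produces exactly one mail, at the cost of one worker sitting idle.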

Solution 2: Using an Azure Storage queue

  1. A method writes a new entity into Table storage with status “unprocessed”
  2. After writing to Table storage, the same call puts a message on the storage queue
  3. A processor reads a message from the queue, gets the matching unprocessed record from Table storage, and
  4. Calls service A
  5. Calls service B (if A succeeds)
  6. Updates Table storage with the results of A and B (regardless of whether A and B succeeded)
  7. Finally, sends an email

The advantages of this over the blob approach are:

  1. All the scaled-out processors are always working
  2. No concurrency issues
  3. Only one mail is sent per record, because each queue message is handed to one processor at a time.
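
The queue-based flow can be sketched in memory as follows. This assumes Python's `queue.Queue` as a stand-in for the storage queue, with `submit` and `worker` as hypothetical names. One caveat the sketch does not model: a real storage queue hides a message for a visibility timeout rather than removing it on read, so the processor has to delete the message after it finishes, and a message can reappear if a processor crashes or runs past the timeout.

```python
import queue
import threading

table = {}                     # stand-in for Table storage
work_queue = queue.Queue()     # stand-in for the storage queue
outbox = []                    # stand-in for the mail server
outbox_lock = threading.Lock()

def submit(key):
    """Steps 1 and 2: write the entity, then enqueue a message for it."""
    table[key] = {"status": "unprocessed"}
    work_queue.put(key)

def worker(name):
    """Steps 3 to 7: take one message at a time until the queue is empty."""
    while True:
        try:
            key = work_queue.get_nowait()  # each message goes to one worker only
        except queue.Empty:
            return
        table[key] = {"status": "processed"}  # services A and B omitted
        with outbox_lock:
            outbox.append(key)                # send the mail
        work_queue.task_done()

for i in range(20):
    submit(f"order-{i}")

# Four scaled-out processors drain the queue in parallel.
threads = [threading.Thread(target=worker, args=(f"worker-{n}",)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one mail per record, with no lease and no idle waiting.
assert sorted(outbox) == sorted(table)
```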