Today I encountered an interesting REST API. It looked like this:
GET /objects/{objectId} % Object resource: returns 302 (Found), redirecting to the latest version
GET, PUT /objects/{objectId}/versions/{versionId} % Version resource: getting and changing the current state of the object
We have two connected resources. I will call the first one the object resource. If you call GET on the object resource /objects/123, you always get redirected to the other resource, the version resource, which contains the most recent version: /objects/123/versions/1. This resource is immutable; a new version is created every time someone updates the object.
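To make the mechanics concrete, here is a minimal in-memory sketch of the two GET endpoints in Python. It does not run a real server: each handler simply returns a (status, headers, body) tuple. The store layout, the function names get_object and get_version, and the 404 returned for a superseded version are assumptions for illustration. Only the latest state is kept, which is the situation discussed further on.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store: objectId -> (versionId, representation).
# Only the latest state is kept.
Store = Dict[str, Tuple[int, str]]
# A response is modelled as (status, headers, body) instead of real HTTP.
Response = Tuple[int, Dict[str, str], Optional[str]]


def get_object(store: Store, object_id: str) -> Response:
    """GET /objects/{objectId}: always redirect to the latest version."""
    version, _ = store[object_id]
    return 302, {"Location": f"/objects/{object_id}/versions/{version}"}, None


def get_version(store: Store, object_id: str, version_id: int) -> Response:
    """GET /objects/{objectId}/versions/{versionId}: the version itself."""
    current, body = store[object_id]
    if version_id != current:
        # The old representation is no longer stored (the status is an assumption).
        return 404, {}, None
    return 200, {}, body
```

With store = {"123": (1, "state A")}, get_object(store, "123") answers with a 302 pointing at /objects/123/versions/1; once the store moves on to version 2, that old version URI answers 404.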
At first glance, such an API has a lot of advantages. First of all, if you use a good HTTP library, the redirect from the object resource to its most recent version happens automatically. You do not have to care about it; you just send a request to the object URI and get the response. Moreover, you can cache the most recent version of the document. The next time you want to retrieve it, you just send a request to /objects/123, get the version resource URI and, if the resource has not changed, get the most recent version from the cache. The caching can be provided by standard HTTP proxies, the server or the client. The version resource is immutable, so it can be cached indefinitely.
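The caching argument can be sketched the same way. Below, a hypothetical FakeVersionedServer stands in for the API and counts round trips, and a hypothetical CachingClient keeps version representations forever because a version URI never changes. Both classes are illustrations under those assumptions, not code from the API itself.

```python
from typing import Dict, Tuple


class FakeVersionedServer:
    """Hypothetical stand-in for the versioned API; counts round trips."""

    def __init__(self) -> None:
        self.state: Dict[str, Tuple[int, str]] = {}  # objectId -> (versionId, body)
        self.requests = 0

    def get(self, path: str) -> Tuple[int, str]:
        """Returns (302, location) for object URIs, (200, body) for versions."""
        self.requests += 1
        parts = path.strip("/").split("/")  # ["objects", id] or [..., "versions", v]
        version, body = self.state[parts[1]]
        if len(parts) == 2:
            return 302, f"/objects/{parts[1]}/versions/{version}"
        if int(parts[3]) != version:
            return 404, ""
        return 200, body


class CachingClient:
    """Caches version resources forever, since a version URI never changes."""

    def __init__(self, server: FakeVersionedServer) -> None:
        self.server = server
        self.cache: Dict[str, str] = {}  # version URI -> representation

    def read(self, object_uri: str) -> str:
        # This round trip is always needed just to learn the latest version URI.
        _, version_uri = self.server.get(object_uri)
        if version_uri not in self.cache:
            status, body = self.server.get(version_uri)
            if status != 200:
                raise RuntimeError(f"unexpected status {status} for {version_uri}")
            self.cache[version_uri] = body
        return self.cache[version_uri]
```

Reading an unchanged object still costs one round trip for the redirect; reading a changed one costs two.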
Not only that, you get optimistic locking for free. You send the PUT request to the URI of the most recent version. If someone else has modified the object in the meantime, that version is no longer the latest and you get an error. Cool.
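Optimistic locking falls out of the same structure: a PUT is accepted only when it targets the latest version URI, and the accepted update creates a new version. In this hypothetical sketch, the 409 (Conflict) for a stale version and the Location of the new version in the result are my assumptions; the API itself only promises "an error".

```python
from typing import Dict, Tuple

# Hypothetical store: objectId -> (versionId, representation).
Store = Dict[str, Tuple[int, str]]


def put_version(store: Store, object_id: str, version_id: int, body: str) -> Tuple[int, str]:
    """PUT /objects/{objectId}/versions/{versionId}.

    Succeeds only when the client writes to the latest version. The update
    creates a new version, so every other writer now holds a stale URI.
    Returns (status, location of the new version or "").
    """
    current, _ = store[object_id]
    if version_id != current:
        return 409, ""  # stale version; the exact error code is an assumption
    store[object_id] = (current + 1, body)
    return 200, f"/objects/{object_id}/versions/{current + 1}"
```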
The problem is that such an API does not make sense if you do not need access to old versions of the object. If you do not keep track of changes and have only the current version of the resource, this API is plainly wrong.
Why? The first problem is what to do if someone wants to access an older version of the object. Return an error? We do not have the old value any more, so it has to be an error. But then the version resource is no longer immutable: it is inconsistent and unpredictable. We can get an error one time and an old version from a cache another time.
What’s worse, because we need two HTTP calls to get the state of the resource, we can run into a race condition. Imagine the following scenario. Both client A and client B call GET /objects/123. Both of them get redirected to /objects/123/versions/1. Client A gets the resource representation, modifies it and sends PUT to /objects/123/versions/1, thus changing its state. Client B is a bit slow, so its redirected call GET /objects/123/versions/1 is sent after client A has changed the value. Therefore client B gets an error, because it is trying to access an old version that is no longer accessible. The redirect is no longer transparent! The client has to be prepared for such situations and manually implement a retry mechanism just to get the resource representation!
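The race can be replayed step by step against a simulated version of the API. This is a sketch under assumptions: a module-level dict stands in for the server, and client A's PUT is applied directly to it. The hypothetical read_with_retry helper at the end shows the kind of loop every client is forced to write just to read an object.

```python
from typing import Dict, Tuple

# Hypothetical store: objectId -> (versionId, representation); only the latest is kept.
store: Dict[str, Tuple[int, str]] = {"123": (1, "state A")}


def get(path: str) -> Tuple[int, str]:
    """Simulated GET: (302, location) for the object, (200 or 404, body) for versions."""
    parts = path.strip("/").split("/")
    version, body = store[parts[1]]
    if len(parts) == 2:
        return 302, f"/objects/{parts[1]}/versions/{version}"
    return (200, body) if int(parts[3]) == version else (404, "")


# Both clients GET /objects/123 and receive the same redirect.
_, location_a = get("/objects/123")
_, location_b = get("/objects/123")

# Client A reads version 1 and updates it, which creates version 2.
status_a, _ = get(location_a)
store["123"] = (2, "state B")  # effect of A's successful PUT

# Client B follows its redirect only now and hits a version that no longer exists.
status_b, _ = get(location_b)
print(status_a, status_b)  # 200 404


def read_with_retry(object_uri: str, attempts: int = 3) -> str:
    """Follow the redirect again whenever the version vanished in between."""
    for _ in range(attempts):
        _, version_uri = get(object_uri)
        status, body = get(version_uri)
        if status == 200:
            return body
    raise RuntimeError(f"{object_uri} kept changing while being read")
```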
There are other shortcomings. The main motivation for such a solution is to leverage HTTP caching. But we always have to call the object resource first to get the URI of the latest version; only after that can a cache be used. So whenever the latest version is not cached yet, we end up with two HTTP calls instead of one. Had the resource been implemented in the simple way, a single HTTP request would have been enough.
The biggest problem with this solution is that it attempts to solve, in a clever way, something the HTTP standard solved long ago.
I am talking about ETags. We do not need the version resource; we can return the resource representation directly from the object resource /objects/123. If we want to leverage caching and optimistic locking, we can return an ETag header that describes the current state of the object. Basically, it can carry exactly the same value as the versionId we used in our version resource.
Now, if the client or a proxy has a cached value, it can send GET to /objects/123 with the If-None-Match: ETagValue HTTP header. If the resource state has not changed, the server sends 304 (Not Modified) and no HTTP body. If the resource has changed, the new representation is returned. Either way, only one HTTP request is needed. When the server value matches the cached value, no response body is sent, which saves some bandwidth and processing power.
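Here is a minimal sketch of the conditional GET, again as an in-memory Python model with hypothetical names. Following the idea above, the ETag is simply the quoted version counter. Real If-None-Match headers may carry a list of ETags or "*" and use weak comparison; this sketch only compares a single value.

```python
from typing import Dict, Optional, Tuple

# Hypothetical store: objectId -> (version counter, representation).
Store = Dict[str, Tuple[int, str]]
Response = Tuple[int, Dict[str, str], Optional[str]]


def etag_of(version: int) -> str:
    # Same information as the old versionId; ETag values are quoted strings.
    return f'"{version}"'


def get_object(store: Store, object_id: str, if_none_match: Optional[str] = None) -> Response:
    """GET /objects/{objectId}, optionally with an If-None-Match header."""
    version, body = store[object_id]
    etag = etag_of(version)
    if if_none_match == etag:
        # The cached copy is still current: answer 304 without a body.
        return 304, {"ETag": etag}, None
    return 200, {"ETag": etag}, body
```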
Optimistic locking (a.k.a. conditional update) can be done in a similar way. Just send PUT to /objects/123 with the If-Match: ETagValue header. If the ETag value corresponds to the current resource state, the update is performed. If the resource has been changed in the meantime, the ETag value no longer matches and the server returns 412 (Precondition Failed).
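The conditional update can be sketched in the same style (hypothetical names, in-memory store, ETag derived from a version counter). A real server would also handle If-Match: * and lists of ETags; here only a single value is checked, and a PUT without If-Match is treated as unconditional.

```python
from typing import Dict, Optional, Tuple

# Hypothetical store: objectId -> (version counter, representation).
Store = Dict[str, Tuple[int, str]]


def put_object(
    store: Store, object_id: str, body: str, if_match: Optional[str] = None
) -> Tuple[int, Dict[str, str]]:
    """PUT /objects/{objectId}, optionally guarded by an If-Match header."""
    version, _ = store[object_id]
    current_etag = f'"{version}"'
    if if_match is not None and if_match != current_etag:
        # Someone changed the object since this client read it.
        return 412, {}
    store[object_id] = (version + 1, body)
    return 200, {"ETag": f'"{version + 1}"'}
```

The losing client gets 412 on the PUT, re-reads the object (one GET) and decides whether to retry; reads themselves never fail.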
It does everything the versioned API does, but better. What’s more, it is described in the HTTP standard, so there is a high chance that it will be supported by HTTP client libraries, proxy servers and application frameworks.
Not that I want to defend the broken versioning scheme you describe here, but isn’t the race condition you describe in the paragraph starting with “What’s worse…” exactly what optimistic locking is about? After all, the version with ETags does exactly the same: the PUT ends up with a client error (4xx), and the client has to restart the operation with the new version if it needs to update the new value.
The problem with the versioned API is that you can get a 4xx error on GET. On PUT you have to expect it; on GET it’s confusing.
Right.