Archive for the ‘Articles in English’ Category

CompletableFutures – why to use async methods?

Wednesday, December 25th, 2013

I have been playing with Java 8 CompletableFutures and one thing has been bothering me – shall I use methods with Async suffix or the variant without it. Let's take a look at the following example:

CompletableFuture.supplyAsync(() -> getUser(userId))
                .thenApply(CreditRatingService::getCreditRatingSystem1)
                .thenAccept(System.out::println);

private static User getUser(int id) {...}

private static CreditRating getCreditRatingSystem1(User user) {...}

Here I want to get some data about a user from a remote system, then call another system to get his credit rating and once it is finished, I want to print the result. All of the tasks may take some time, so I do not want to block the main thread, hence the use of completable futures.

If you are not yet familiar with Java 8 syntax, () -> getUser(userId) is a lambda calling getUser method. CreditRatingService::getCreditRatingSystem1 and System.out::println are method references. Underneath, the supplyAsync method creates a task which gets the user details. The task is submitted to fork-join thread pool and when it finishes, the result is passed to the next task. When the next task finishes, its result is sent further and so on. It's quite neat and simple.

The question is, whether I should use the previous variant or this one.

CompletableFuture.supplyAsync(() -> getUser(userId))
                .thenApplyAsync(CreditRatingService::getCreditRatingSystem1)
                .thenAcceptAsync(System.out::println);

The difference is in the async suffix on the method names. The methods without async execute their task in the same thread as the previous task. So in the first example, all getUser, getCreditRating and println are executed in the same thread. It's OK, it's still a thread from the fork-join pool, so the main thread is not blocked.

The second variant always submits the succeeding task to the pool, so each of the tasks can be handled by different thread. The result will be the same, the first variant is a bit more effective due to less thread switching overhead. It does not make any sense to use the async variant here, so what is it good for? It was not clear to me, until extended the example to download the credit rating data from two different systems.

CompletableFuture<User> user = CompletableFuture.supplyAsync(() -> getUser(userId));

CompletableFuture<CreditRating> rating1 = 
    user.thenApplyAsync(CreditRatingService::getCreditRatingSystem1);
CompletableFuture<CreditRating> rating2 = 
    user.thenApplyAsync(CreditRatingService::getCreditRatingSystem2);

rating1
    .thenCombineAsync(rating2, CreditRating::combine)
    .thenAccept(System.out::println);

Here I have two actions waiting for user details. Once the details are available, I want to get two credit ratings from two different systems. Since I want to do it in parallel, I have to use at least one async method. Without async, the code would use only one thread so both credit rating tasks would be executed serially.

I have added combine phase that waits for both credit rating tasks to complete. It's better to make this async too, but from different reason. Without async, the same thread as in rating1 would be used. But you do not want to block the thread while waiting for the rating2 task. You want to return it to the pool and get a thread only when it is needed.

When both tasks are ready, we can combine the result and print it in the same thread, so the last thenAccept is without async.

As we can see, both method variants are useful. Use async, when you need to execute more tasks in parallel or you do not want to block a thread while waiting for a task to finish. But since it is not easy to reason about such programs, I would recommend to use async everywhere. It might be little bit less effective, but the difference is negligible. For very small small price, it gives you the freedom to delegate your thinking to the machine. Just make everything async and let the fork-join framework care about the optimization.

REST Fire

Saturday, August 17th, 2013

Let me introduce you to my latest pet project – REST Fire. It's basically a thin wrapper around Apache HTTP client, providing simple DSL for REST service testing. You can test your REST APIs like this:

        fire().get()
                .to("https://www.google.com/search?q=rest-fire")
                .withHeader("Accept", "text/html")
        .expectResponse()
                .havingStatusEqualTo(200)
                .havingHeader("Content-Type", hasItem(startsWith("text/html")))
                .havingBody(containsString("rest-fire"));

And that's basically all. If you know REST Assured library, you may be asking why I just do not use REST Assured and write something new instead. Apart from NIH syndrome, I have two reasons.

First of all, I do not like REST Assured syntax, where you define the request, then you validate the response and then again you send the request. REST Fire is more straightforward, just fire a requst and validate the response. The syntax is strongly inspired by Jadler, so if you want to mock backends and test your API at the same time, you do not have to switch between two different styles.

Secondly, REST fire is built on top of Apache HTTP client and lets you configure whatever you want. It's even capable to use our exotic authentication mechanism.

Usage of Apache HTTP client has its downsides too. It is hard to simulate invalid requests. Apache HTTP client tries to make sure that the requests are valid and if you need to test reactions on invalid requests, you are out of luck.

So if you want to try REST fire, visit its GitHub page for more details and let me know what you think about it.

Asynchronous servlet processing

Sunday, April 7th, 2013

Welcome to new episode of fun with threads. Today we will cover the exciting topic of asynchronous servlet processing. In the last episode we have learned that it's not a problem to start more than 10,000 threads in modern JVM, today we will play with asynchronous “threadless” model.

The first important thing thing to note is that we still need a thread everytime our code is doing something. If you have your code in front of you, you always need a thread to move between the lines of the code. But if the code is waiting for something like IO, you do not need a thread and we can leverage that. We have to realize that the vast majority of Java applications act as a middleman – they just toss data from one side to another. From database to HTML, from proprietary back-end to SOAP WS, from REST service to another REST service. You get the picture. I usually do not write chat applications that need to keep a connection open and wait for an event. I usually end-up writing a classical request-response applications that just need to handle lot of requests.

Let's examine how such applications work. What are the characteristics of such applications? First thing to notice is that they are waiting for most of the time. They are waiting for an incoming request. When the request comes, they do some processing, validation and transformation and then call a back-end. And than they wait again, this time for the response. When the response arrives, our application does some transformation and sends the response back. And then it waits again.

Request processing

Of course it's application dependent, but in most cases there are few milliseconds of work followed by tens or hundreds of milliseconds of waiting time. You do not have to block a thread when you are waiting. Even though the threads are pretty cheap, we do not want to waste them.

But as is usually the case, there are some tradeoffs. The traditional threaded model is easier to reason about, the asynchronous model is harder to grasp and easier to mess-up. I am not talking about shared state and similar stuff, I am just talking about the fact that my brain works better when I can read the code in a linear way an not thinking about jumping here and there.

Take the following example of a naive proxy using HTTP client 4

log("Servlet no. {} called.", number);
HttpGet backendRequest = createBackendRequest(req, number);
HttpResponse result = client.execute(backendRequest);
copyResultToResponse(result, resp);
log("Servlet no. {} processed.", number);
log("Servlet no. {} returning.", number);

Full source code

It's evident what's going on. I create a back-end request, then I execute it and then process the result. It's straightforward.

Now let's take a look at the asynchronous version. The easiest way is to delegate the asynchronous processing to libraries. On the top we will use Servlet 3 asynchronous processing, on the back-end side we will use HttpClient asychronous capabilities. Luckily both features play well together, so I can use this code.

log("Servlet no. {} called.", number);
HttpGet backendRequest = createBackendRequest(req, number);
//start async processing
final AsyncContext asyncContext = req.startAsync(req, resp);
client.execute(backendRequest, new FutureCallback<HttpResponse>() {
       @Override
       public void completed(HttpResponse result) {
              ServletResponse response = asyncContext.getResponse();
              copyResultToResponse(result, response);
              log("Servlet no. {} processed.", number);
              asyncContext.complete();
      }
      // error handling removed for brevity
           
});
log("Servlet no. {} returning.", number);

Full source code

Well it's kind of readable. I can imagine that readability could be better in modern languages. What I do not like is that's hard to grasp the code. Try it for yourself, what will be the order of log messages? It can be

a) Servlet no. 1 called. - Servlet no. 1 processed. - Servlet no. 1 returning.
b) Servlet no. 1 called. - Servlet no. 1 returning. - Servlet no. 1 processed.
c) Servlet no. 1 returning. - Servlet no. 1 called. - Servlet no. 1 processed.

What's your answer? It's of course the second one. The first one is for the synchronous version and the third is just made-up. Why the second version? It's easy. By calling req.startAsync(req, resp) I say that I am starting the asynchronous processing. Then I execute back-end call but I put the response processing to a callback. So it's processed after the back-end response arrives. Now I have finished the request execution and the request processing thread provided by the servlet container finishes the execution. Of course it has to unroll the whole stack. So it goes through all your filters as if the processing has already finished.

But it has not. It's still waiting for the back-end response. Once the response arrives, the HttpClient calls the callback using its own thread. It has to, we have already returned the servlet thread to the container.

In the callback we process the response and by calling asyncContext.complete() we say to the servlet container that we have really finished. When we leave the callback, HttpClient will use the same thread to process another back-end response.

So in reality, the log from asynchronous call is

ServletThread - Servlet no. 1 called.
ServletThread - Servlet no. 1 returning.
HttpClientThread - Servlet no. 1 processed.

It's easy to understand it once you get used to it. But you have to be aware of few downsides. The first is, that servlet filters do not work as you are used to. Do you want to do a response postprocessing in a filter? You are out of luck. You can use asyncContext.dispatch() method, but it adds even more complexity. Better solution may be to use framework support like the one in Spring MVC that can help you with the task.

The other downside is that you have to think more about the threads and sizing of thread pools. For example, if we have more time-consuming response processing, we have to add more threads to the HttpClient thread pool.

The third downside is that if you do not have support in the client library as we had in HttpClient, you have to take care about the threading yourself. Not only the library needs to support asynchronous calls, it has to support callbacks. If it uses Futures, you have to figure out how to get the result from the future and process it. Again, you will end up with some threading work.

So, if the asynchronous model is so complicated why would we want to use it? Why do not we just use the old synchronous model? The truth is that the synchronous model make sense for most of the use-cases. It is able to process few thousands of parallel requests and it is usually enough for most sites.

But if you have really massive load and you have to utilize you hardware as much as possible, asynchronous processing performs much better. In my test, I recursively call a servlet which in turns calls itself which calls itself until some limit is reached. Jetty is able to process 20,000 connections in less than 15 seconds (after some warm-up when it creates all the connections needed and then just reuses them). Keep in mind that under this test all connections are kept open. So effectively, in the middle of the test I have 20,000 connections open on servlet side and 20,000 connections open on the HTTP client side. All of this with 30 threads including JVM housekeeping threads. All of this on my five-year-old laptop with less than 1.5GB of heap. Pretty cool.

You can try it for yourself the source code is available here.