Tuesday, March 24, 2015

Is multi-threading really worth it?

"Premature optimization is the root of all evil" - Donald E. Knuth

This article is in no way against the usage of multithreading if we really need it. This is just an attempt to explain the cost multithreading comes with as there ain't no such thing as a free lunch. I have seen many examples where multithreading is used in real life projects in name of making things faster. We should always benchmark these things to conclude whether concurrent code has really made our code faster or not. It is quite possible that sometimes it may introduce a lot of delay and its better to avoid then. There is always a context switch overhead and a thread needs more resources to run.

I will explain it with one very simple example I am running on my machine with Intel Core i5 processor with 4 GB of DDR3 Ram. The example is regarding a very simple counter class having only variable count as:
public class Counter {
    long count;
    public Counter(long count) {
        this.count = count;
    }
    void  incrementByOne() { count++; }
    long getCount() { return count; }
}

This class will be used to increment the counter to 1 billion (1,000,000,000) as shown below:
public static void main(String[] args) {
    Counter counter = new Counter(0);
    long startTime = System.currentTimeMillis();
    for (long index = 0; index<1000000000L; index++) {
        counter.incrementByOne();
    }
    long endTime = System.currentTimeMillis();
    System.out.println("Time taken: " + (endTime - startTime) + " ms");
    System.out.println("Value is: " + counter.getCount());
}

There is only one thread which is going to increment the counter to one billion. The time taken on average is 2450 ms. I have also printed the value of count to ensure that count is incremented to one billion successfully and nothing went wrong during it.Next thing I am going to try is to mark the count variable in Counter class volatile and will again run the same code and note the time taken.
public class Counter {
    volatile long count;
    public Counter(long count) {
        this.count = count;
    }
    void  incrementByOne() { count++; }
    long getCount() { return count; }
}

This takes on average time of 8680 ms, so time is increased by a factor of 3.54. We also have atomic classes to use to increment a counter. Lets write another counter class making use of AtomicLong as:
public class AtomicCounter {
    private AtomicLong count;

    public AtomicCounter() {
        count = new AtomicLong(0);
    }
    void  incrementByOne() { count.incrementAndGet(); }
    long getCount() { return count.get(); }
}

We will now use this class for single thread only to check the time taken.
public static void main(String[] args) {
        AtomicCounter counter = new AtomicCounter();
        long startTime = System.currentTimeMillis();
        for (long index = 0; index<1000000000L; index++) {
            counter.incrementByOne();
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken: " + (endTime - startTime) + " ms");
        System.out.println("Value is: " + counter.getCount());
    }




The time taken on average while using atomic long is 8720 so time is increased by a factor of 3.55. Next example we will try is to use Lock to implement the similar counter. We will use the same Counter class but we will make use of Lock while increasing the counter as:
public static void main(String[] args) {
        Counter counter = new Counter(0);
        Lock lock = new ReentrantLock();
        long startTime = System.currentTimeMillis();
        for (long index = 0; index < 1000000000L; index++) {
            lock.lock();
            try {
                counter.incrementByOne();
            } finally {
                lock.unlock();
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken: " + (endTime - startTime) + " ms");
        System.out.println("Value is: " + counter.getCount());
    }

This time it takes an average time of 28340 which means time is increased by a whopping multiple of 11.5 and the fun part is we are still using single thread. Lets now move to two threads sharing the same instance of AtomicCounter as:
public static void main(String[] args) {
        AtomicCounter counter = new AtomicCounter();

        Thread thread1 = new Thread(() -> {
            for (long index = 0; index < 500000000L; index++) {
                counter.incrementByOne();
            }
        });
        Thread thread2 = new Thread(() -> {
            for (long index = 0; index < 500000000L; index++) {
                counter.incrementByOne();
            }
        });
        long startTime = System.currentTimeMillis();

        thread1.start();
        thread2.start();
        try {
            thread1.join();
            thread2.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken: " + (endTime - startTime) + " ms");
        System.out.println("Value is: " + counter.getCount());
    }

When we were having single thread time taken was 8720 and when we test it for two threads it took an average time of 20687 almost 2.3 times more time. Now what time will be taken when using two threads with locks.
public static void main(String[] args) {

        Counter counter = new Counter(0);
        Lock lock = new ReentrantLock();

        Thread thread1 = new Thread(() -> {
            for (long index = 0; index < 500000000L; index++) {
                lock.lock();
                try {
                    counter.incrementByOne();
                } finally {
                    lock.unlock();
                }
            }
        });

        Thread thread2 = new Thread(() -> {
            for (long index = 0; index < 500000000L; index++) {
                lock.lock();
                try {
                    counter.incrementByOne();
                } finally {
                    lock.unlock();
                }
            }
        });

        long startTime = System.currentTimeMillis();

        thread1.start();
        thread2.start();
        try {
            thread1.join();
            thread2.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken: " + (endTime - startTime) + " ms");
        System.out.println("Value is: " + counter.getCount());
    }

Here is the summary of the result:

Threads Time Taken (in milliseconds)
Single Thread2450
Single Thread using volatile8680 (3.54 times)
Single Thread using AtomicLong8720 (3.55 times)
Single Thread using Lock28340 (11.5 times)
Two Threads using AtomicLong20687
Two threads using Lock90245

So moral of the story is that concurrency comes with its own cost and we need to use it when we really need it. That's all for now. Enjoy!!

9 comments:

Anonymous said...

Add some sleep in the inner task. Small tasks can be parallelized in big bulks.

Anonymous said...

The example is bad.

When your code do something on the network, call other services, waits for response, suddenly you wouldn't like to do things using only one thread...

Anonymous said...

When your code do something on the network, call other services, waits for response, suddenly you wouldn't like to do things using only one thread...


That's exactly what NodeJS does. Single thread, but using evented calls. You can do the same thing in Java and have a minimal set of threads (or even one) using an async HTTP client.

Eduard de Jong said...

This example is a nice example of when not to use concurrency. With Amdahl's law it is easy to see that the problem of incrementing a single counter cannot be improved by paralel computing as essentially 100% of the code is shared between each thread.
However, the example does nicely show that synchronization has a cost.
A much beter example would be to compute a histogram of all te letter isn the alphabet found on all te files on a harddisk. In a single thread this would be a very long task, multiple threads would be much faster. This could then be done with and without reporting intermediate results after each file in order to show the costs of synchronizing the thread when they merge results. In this case the cost difference between lock and atomic long would likely depend on the number of threads.

Akhil Mittal said...

Anonymous: I understand this is not the right example for using concurrency and that's why I have named the title "Is concurrency really worth it"?

The objective of this post is to demonstrate that blindly using concurrency can cause a lot of harm.

Another thing I wanted to demonstrate is the cost incurred with various synchronization options.

Akhil Mittal said...

Eduard: Thanks.

Gopinadh D said...
This comment has been removed by the author.
Gopinadh D said...

Hi Akhil.. Nice blog, all your articles are really interesting reads but I just have few questions. When you say "The time taken on average is 2450 ms", how exactly are you calculating the average value ? Is it a calculated average of values from multiple runs or you're doing multiple loops in a single run ? Another important to consider while doing such benchmarks is JIT Compilation. We need to make our code hot first so that JIT compiler will kick-in and do all the optimizations and then we can start measuring the duration. Have a look at JMH for these kind of micro benchmarks, it's very promising. In case you're already considering all these things while measuring, please add that as well so that it's clear to everyone. Thanks again for all these nice articles :)

Inderpal Singh Dhami said...

We should avoid working for small efficiencies for non-critical parts of the application. Developers need to avoid Multi-threading if its being added just for sake of it, or if they feel its fancy enough. This article showcased the results clearly. Adding muti-threading into an application needs a very good understanding of Java Threading Model, how VM handles threads, which hardware its running on, and not just the coding.