I'm testing concurrency in Java to determine whether using multiple threads is actually beneficial, but I'm getting results that don't add up. I'm trying to optimize a factorial function. For this test I compute 1e9! modulo 1e9+7 so that the result does not overflow.
First, I divide the range into number_of_threads parts and assign each thread its share of the work. I then run the plain single-loop version and compare the times. With number_of_threads = 4 I get better results than the version without threads, which makes sense since my CPU has 4 cores. As expected, any number of threads greater than 4 is slower than exactly 4.
However, with fewer than 4 threads the times become way too big. For instance, with 1 thread I expect it to take about as long as the version without threads, plus some overhead. Without threads I get 6.2 seconds, and with 1 thread I get 19.3 seconds, which is far too large a difference to be just overhead.
To find out why, I put a timing counter in the run method, and it seems that sometimes a single iteration of the for loop inside it takes more than a millisecond. It shouldn't, since each iteration is just two operations plus the timer call.
public class Calc implements Runnable {
    long min, max, mod, res;
    Res r;

    public Calc(long min, long max, long mod, Res r) {
        this.min = min;
        this.max = max;
        this.mod = mod;
        res = 1;
        this.r = r;
    }

    // Multiplies every integer in [min, max] modulo mod,
    // then merges the partial product into the shared Res.
    public void run() {
        for (long i = min; i <= max; i++) {
            res *= i;
            res %= mod;
        }
        r.addup(res);
    }
}
public class Res {
    long result;
    long mod;

    public Res(long mod) {
        result = 1;
        this.mod = mod;
    }

    // Called once per thread; synchronized so partial products are combined safely.
    public synchronized void addup(long add) {
        result *= add;
        result %= mod;
    }

    public long getResult() {
        return result;
    }
}
public class Main {
    public static void main(String args[]) {
        long startTime = System.nanoTime();
        final long factorial = 1000000000L;
        final long modulo = 1000000007L;
        Res res = new Res(modulo);
        int number_of_threads = 1;
        Thread[] c = new Thread[number_of_threads];
        // Split [1, factorial] into number_of_threads consecutive ranges;
        // the first (factorial % number_of_threads) ranges get one extra element.
        long min = 1, max = factorial / (long) number_of_threads;
        long cant = max;
        for (int i = 0; i < number_of_threads; i++) {
            if ((long) i < (factorial % number_of_threads)) max++;
            c[i] = new Thread(new Calc(min, max, modulo, res));
            c[i].start();
            min = max + 1;
            max += cant;
        }
        // Wait for every worker before reading the combined result.
        for (int i = 0; i < number_of_threads; i++) {
            try {
                c[i].join();
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        System.out.println(res.getResult());
        long endTime = System.nanoTime();
        long totalTime = endTime - startTime;
        System.out.println((double) totalTime / 1000000000L);
    }
}
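The range splitting used in the loop above can be checked in isolation on a small input. The following is only a sketch: the class name SplitCheck, the helper methods split and product, and the input n = 20 are assumptions made for illustration and are not part of the original program. It copies the same min/max/cant arithmetic and compares the combined partial products against a single sequential loop.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitCheck {
    // Same splitting arithmetic as the loop in Main: returns {min, max} pairs covering [1, n].
    static List<long[]> split(long n, int threads) {
        List<long[]> ranges = new ArrayList<>();
        long min = 1, max = n / threads;
        long cant = max;
        for (int i = 0; i < threads; i++) {
            if ((long) i < (n % threads)) max++;
            ranges.add(new long[] {min, max});
            min = max + 1;
            max += cant;
        }
        return ranges;
    }

    // Product of all integers in [min, max] modulo mod, as in Calc.run().
    static long product(long min, long max, long mod) {
        long res = 1;
        for (long i = min; i <= max; i++) {
            res *= i;
            res %= mod;
        }
        return res;
    }

    public static void main(String[] args) {
        final long n = 20;             // small input, chosen only for illustration
        final long mod = 1000000007L;
        long expected = product(1, n, mod);
        for (int threads = 1; threads <= 4; threads++) {
            long combined = 1;
            // Combine the partial products sequentially, mirroring Res.addup().
            for (long[] r : split(n, threads)) {
                combined = combined * product(r[0], r[1], mod) % mod;
            }
            System.out.println(threads + " part(s): " + combined + " (expected " + expected + ")");
        }
    }
}
```

If every line prints the same value as the expected one, the splitting covers the whole range exactly once, so any timing difference comes from how the work runs and not from doing a different amount of it.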
With number_of_threads = 1 I get 19.3 seconds, with 2 I get 10.1 seconds, with 3 I get 7.1 seconds, and with 4 I get 5.4 seconds. Without threads I get 6.2 seconds (timed the same way).
There shouldn't be that much difference between 1 thread and no threads, and 2 or 3 threads should be faster than no threads. Why does this happen, and is there any way to fix it? Thanks.
Edit: adding the version without threads:
public class Main {
    public static void main(String args[]) {
        long startTime = System.nanoTime();
        final long factorial = 1000000000L;
        final long modulo = 1000000007L;
        long res = 1;
        for (long i = 1; i <= factorial; i++) {
            res *= i;
            res %= modulo;
        }
        System.out.println(res);
        long endTime = System.nanoTime();
        long totalTime = endTime - startTime;
        System.out.println((double) totalTime / 1000000000L);
    }
}
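The per-iteration timing check described earlier is not included in the code above. A minimal sketch of what such a check might look like follows, assuming System.nanoTime() is read around each iteration. The class name SlowIterationCounter, the method timedProduct, the 1 ms threshold constant, and the shortened range are all hypothetical and chosen only for illustration.

```java
public class SlowIterationCounter {
    // Hypothetical threshold: iterations slower than this many nanoseconds are counted.
    static final long THRESHOLD_NANOS = 1_000_000L; // 1 ms

    // Runs the same loop body as Calc.run() and returns {product, slowIterations}.
    static long[] timedProduct(long min, long max, long mod) {
        long res = 1;
        long slow = 0;
        for (long i = min; i <= max; i++) {
            long t0 = System.nanoTime();
            res *= i;
            res %= mod;
            // Count iterations whose measured duration exceeds the threshold.
            if (System.nanoTime() - t0 > THRESHOLD_NANOS) {
                slow++;
            }
        }
        return new long[] {res, slow};
    }

    public static void main(String[] args) {
        // Shorter range than the real test so the sketch finishes quickly.
        long[] out = timedProduct(1, 1_000_000L, 1000000007L);
        System.out.println("product = " + out[0] + ", iterations over 1 ms = " + out[1]);
    }
}
```

Reading the clock twice per iteration adds its own cost to every pass through the loop, so the count from a check like this is only a rough indicator and the total run time with the timer in place is not comparable to the numbers above.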