Use clustering with NodeJS to utilize multi-core processors

NodeJS is one of the most popular and versatile JavaScript runtime environments, and it has become a fundamental tool for server-side and network application development. The NodeJS runtime uses the V8 JavaScript engine, originally developed by Lars Bak and his team for the Chrome browser while they were working at Google. (The same team later gave life to the Dart programming language and founded one of my favorite IoT platforms, Toit.)

NodeJS was initially released in 2009 by Ryan Dahl and gained widespread popularity in web development. At its core, NodeJS enables developers to write server-side applications using JavaScript, a language primarily associated with client-side web scripting.

Its event-driven, non-blocking architecture, coupled with a vast ecosystem of packages, has made it a top choice for building scalable, high-performance web applications and a wide range of other software solutions. Whether you’re a web developer, system administrator, or IoT enthusiast, NodeJS offers a versatile and efficient platform for your projects.

NodeJS took developers’ attention by storm due to the innovative design of its runtime, which offered several key advantages over the conventional platforms of that era.

  1. Asynchronous and Event-Driven: NodeJS is designed around a non-blocking, event-driven architecture. It leverages JavaScript’s single-threaded nature to handle multiple concurrent connections efficiently, making it ideal for applications with high levels of concurrency, such as real-time applications and web servers.

  2. High Performance: Powered by the V8 engine, NodeJS boasts exceptional performance. It compiles JavaScript code into machine code, making it fast and suitable for real-time data processing applications like chat applications or online gaming platforms.

  3. Cross-Platform: NodeJS is cross-platform and runs on various operating systems, including Windows, macOS, and Linux. This allows developers to write code that works consistently across different environments.

  4. Large Ecosystem: NodeJS has a vast ecosystem of open-source packages and modules available through npm (Node Package Manager). This extensive repository of reusable code makes it easy for developers to add functionality to their applications and speeds up development considerably.

This novel event-driven, asynchronous approach, built around an event loop, gave NodeJS a significant advantage over its competition.

NodeJS Event loop

The NodeJS event loop is a critical part of how NodeJS manages asynchronous operations, making it efficient and suitable for handling a large number of concurrent connections. It’s at the core of NodeJS’s non-blocking, event-driven architecture. This design has the following components to achieve optimal execution.

Event Queue: The event loop starts by processing events from the event queue. Events can include I/O operations (e.g., reading from a file, making a network request), timers (e.g., setTimeout), or custom events generated by your code.

Non-Blocking Operations: When you initiate an asynchronous operation, like reading a file or making an HTTP request, NodeJS doesn’t block the main execution thread. Instead, it starts the operation and continues to execute other code.

Callback Functions: Asynchronous operations are associated with callback functions. When an operation completes (e.g., a file has been read, or a network request receives a response), NodeJS places the associated callback function in the event queue.

Event Loop Iteration: The event loop continually checks the event queue for pending events. It processes events one at a time, in a loop. When an event is processed, its associated callback function is executed.

Concurrency and Non-Blocking: NodeJS’s event loop efficiently handles many concurrent connections without creating a separate thread or process for each. Because it doesn’t block the main thread while waiting for I/O operations to complete, it can efficiently switch between tasks, making the most of CPU resources.

Timers: The event loop also manages timers, such as those created with setTimeout and setInterval. When a timer expires, its associated callback is placed in the event queue for execution.

The following example illustrates the processing of I/O operations asynchronously using callbacks.

const fs = require('fs');

console.log("Start");

setTimeout(() => {
  console.log("Timer callback executed");
}, 1000);

fs.readFile("sample.txt", (err, data) => {
  if (err) throw err;
  console.log("File read callback executed");
});

console.log("End");

When this code is executed, “Start” and “End” are logged first. “Timer callback executed” appears after approximately one second, while “File read callback executed” appears as soon as the read completes. The relative order of the two callbacks depends on how long the file I/O takes; for a small file, the read callback will usually fire well before the one-second timer.

So, NodeJS processes I/O operations and callbacks asynchronously, allowing it to handle multiple tasks concurrently without blocking the main thread. This event-driven architecture makes NodeJS well-suited for building highly scalable and responsive applications, particularly those that involve a lot of I/O operations, such as web servers and real-time applications.

Utilising multi-core CPUs while running NodeJS applications


To achieve the non-blocking execution without synchronization locks and waits, NodeJS is built with a single-threaded, event-driven runtime environment utilizing the V8 JavaScript engine.

The idea behind this NodeJS async processing model is that a single-threaded asynchronous processing model can provide far better performance and scalability than conventional synchronization-based multi-threaded execution models.

Managing thread-locks, deadlock prevention, resource utilization, priority management, lock acquisition and release mechanisms, and thread pool management are some challenges conventional multi-threaded-based runtimes face, which introduce significant overhead for their execution cycles.

This single-threaded nature enables NodeJS to spend CPU cycles on other tasks while time-consuming I/O operations are in flight, instead of blocking a thread until each operation completes. This non-blocking nature enables a NodeJS process to serve hundreds of thousands of concurrent requests without being stalled by external I/O delays.
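The flip side is that CPU-bound work does block the single thread. The following small sketch illustrates this: a timer due in 10 ms cannot fire until a long synchronous loop finishes, which is exactly the problem that worker threads and clustering address.

```javascript
const start = Date.now();

setTimeout(() => {
  // Scheduled for 10 ms, but it only runs once the loop below has finished.
  console.log(`Timer fired after ${Date.now() - start} ms`);
}, 10);

// Simulate CPU-bound work that monopolises the event loop thread.
let total = 0;
for (let i = 0; i < 100_000_000; i++) {
  total += i;
}
console.log(`Loop finished after ${Date.now() - start} ms, total = ${total}`);
```

On a typical machine, the "Timer fired" line is printed long after its 10 ms deadline, because the event loop only gets back to the timer queue once the loop completes.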

However, it’s essential to utilize the full potential of multi-core processors for improved performance and scalability in NodeJS applications. Here are some approaches and techniques to achieve this.

Utilising CPU cores with NodeJS worker threads

The worker_threads module was introduced as an experimental feature in Node 10 and became stable in Node 12. It provides a way to take advantage of multiple CPU cores by running JavaScript code concurrently in separate worker threads. The module allows developers to create new threads using the Worker class.

Each worker thread runs its own V8 instance with its own event loop, mirroring the architecture of a NodeJS process. Unlike separate processes, however, workers live in the same process and can share memory with the main thread (for example, through SharedArrayBuffer).

These threads can help to improve the performance and scalability of NodeJS applications as they run in parallel with the main thread. They help to offload CPU-intensive tasks and free up the main thread for other processing requests.

Following is a simple script to run CPU-intensive tasks on additional cores with the help of worker threads. These tasks will run in parallel on different CPU cores.

const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
const os = require('os');

// Function to perform a CPU-intensive task
function performTask(workerId) {
  let result = 0;
  for (let i = 0; i < 1000000000; i++) {
    result += Math.random();
  }
  return `Worker ${workerId}: Result is ${result}`;
}

if (isMainThread) {
  // This is the main thread

  // Get the number of CPU cores
  const numCores = os.cpus().length;

  // Create an array to hold the worker instances
  const workers = [];

  // Create a simple function to distribute tasks to workers
  const distributeTasks = () => {
    for (let i = 0; i < numCores; i++) {
      const worker = new Worker(__filename, {
        workerData: { workerId: i + 1 },
      });

      // Handle messages from worker threads
      worker.on('message', message => {
        console.log(message);
      });

      workers.push(worker);
    }
  };

  // Distribute tasks to workers
  distributeTasks();

  // Perform any other tasks in the main thread
  console.log(`Main thread: Number of CPU cores: ${numCores}`);
} else {
  // This is a worker thread

  // Access the workerData passed from the main thread
  const workerId = workerData.workerId;

  // Perform the CPU-intensive task
  const result = performTask(workerId);

  // Send the result back to the main thread
  parentPort.postMessage(result);
}

NodeJS Clustering for utilizing additional CPU cores

NodeJS clusters create additional processes to utilize CPU cores. Conceptually, it is like running multiple NodeJS instances to distribute the workload.

Clustering differs from worker_threads in that each cluster worker is a fully isolated process. Despite this process-level separation, cluster workers share server ports, as the name implies.

NodeJS clustering creates child V8 runtime processes, each with its own event loop and memory space. Since the workers share the same port, incoming requests are distributed among the processes in a load-balanced manner. As a result, throughput increases and CPU resources are utilized efficiently.

With multiple processes, blocking I/O is handed over to the system kernel more frequently and processed efficiently by the operating system’s own multithreading. Clustering also helps with high availability: when one process crashes, the others can absorb the additional requests until that process recovers. Clustering solutions such as PM2 use this pool of processes for rolling restarts.

Since clustering is a prominent approach for most use cases and will need minimal changes (or no changes in some instances) to our application logic, let’s explore clustering approaches and the real-life benefits of clustering.

Available options for NodeJS to implement clustering

There are several ways to utilize the clustering of NodeJS applications. Let’s discuss the following three approaches, which are popular options to run clustering-enabled production workloads.

  1. NodeJS clustering with the help of the built-in NodeJS cluster module
  2. NodeJS clustering with the help of the throng package
  3. NodeJS clustering with the help of the PM2 process manager

All three implementations are equally good ways to do clustering. The built-in cluster module is a straightforward implementation using only the modules NodeJS provides.

While throng makes clustering easy by providing many configuration options and wrapped functionality, PM2 provides clustering with zero code-level changes.

The NodeJS cluster module and throng provide more flexibility, which may be helpful when you need manual process cleanup or connection monitoring in certain scenarios, such as socket-based communication.

Web server for running jobs in different CPU cores using clustering

Let’s use the following simple express server implementation to generate the required workload on demand, so we can measure the utilization of CPU cores and overall server resources.

We will reuse this implementation as-is for the non-clustered baseline benchmark later, as well as for the PM2 approach, since PM2 requires zero code changes.

const express = require("express");
const app = express();
const port = 3000;

// generate workload with given iterations on demand
app.get("/api/counter/:n", function (req, res) {
  let n = parseInt(req.params.n);
  let count = 0;
 
  // More than 5000000000 can generate unwanted workload
  if (n > 5000000000) n = 5000000000;
 
  for (let i = 0; i <= n; i++) {
    count += i;
  }
 
  res.send(`Final count is ${count}`);
});
 
app.listen(port, () => {
  console.log(`App listening on port ${port}`);
});

NodeJS Clustering with NodeJS Cluster module

To set up a cluster in NodeJS, let us use the built-in cluster module of the NodeJS runtime.

We use the cluster module to fork processes and run our express server as a set of worker processes. The master process is responsible for forking, running, and respawning workers as needed.

These child processes (workers) each run their own instance of the NodeJS event loop. Workers can share incoming network connections and distribute the load across CPU cores. It is a straightforward way to parallelize NodeJS applications.

const express = require('express');
const port = 3010;
const cluster = require("cluster");
const totalCPUs = require("os").cpus().length;
 
if (cluster.isMaster) { // named cluster.isPrimary in Node 16+
  console.log(`Number of CPUs is ${totalCPUs}`);
  console.log(`Master ${process.pid} is running`);
 
  // Fork workers to match with cpu cores.
  for (let i = 0; i < totalCPUs; i++) {
    cluster.fork();
  }
 
  // respawning workers on possible crashes
  cluster.on("exit", (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
    console.log("Let's fork another worker!");
    cluster.fork();
  });
} else {
  // This is a worker process, run the express server
  const app = express();
  console.log(`Worker ${process.pid} started`);
 
  app.get("/api/counter/:n", function (req, res) {
    let n = parseInt(req.params.n);
    let count = 0;
 
    if (n > 5000000000) n = 5000000000;
 
    for (let i = 0; i <= n; i++) {
      count += i;
    }
 
    res.send(`Final count is ${count}`);
  });
 
  app.listen(port, () => {
    console.log(`App listening on port ${port}`);
  });
}

NodeJS clustering using the throng npm package

Throng is a NodeJS package that simplifies creating a cluster of worker processes for a NodeJS application. It provides a simple API for setting up a cluster and automatically respawning worker processes as needed.

Scaling a NodeJS application across several CPU cores is simple with throng. By automatically restarting worker processes if they crash or terminate for whatever reason, throng helps ensure that the application remains available.

Let’s install the throng npm package to use it in the NodeJS application.

npm install throng

Let’s use it in the NodeJS application to create and manage the cluster. You can set a couple of configuration options to control the cluster’s behavior; the throng documentation describes them in detail.

const express = require("express");
const throng = require('throng');
const totalCPUs = require("os").cpus().length;
const app = express();
const port = 3020;

throng({
    master: () => {},
    worker: startserver,
    count: totalCPUs,
    lifetime: Infinity,
    grace: 5000,
    signals: ['SIGTERM', 'SIGINT']
  })

function startserver() {
    app.get("/api/counter/:n", function (req, res) {
        let n = parseInt(req.params.n);
        let count = 0;

        if (n > 5000000000) n = 5000000000;

        for (let i = 0; i <= n; i++) {
            count += i;
        }

        res.send(`Final count is ${count}`);
    });

    app.listen(port, () => {
        console.log(`App listening on port ${port}`);
    });
}

Throng is particularly useful for web servers and other CPU-bound NodeJS applications. It helps your application handle more concurrent requests by distributing the workload across multiple processes running on multiple CPU cores.

Clustering NodeJS app with the help of PM2 process manager

PM2 is a process manager for NodeJS workloads. It is designed to manage the lifecycle of NodeJS applications, including starting, stopping, and restarting, in addition to monitoring performance and resource usage.

One of the main benefits of using PM2 is that it provides a simple and convenient way to manage NodeJS applications in production setups. It can help to ensure that your applications are always running and can automatically restart them if they crash or exit for any reason. PM2 also provides many features for monitoring and logging the performance of your applications, including real-time application metrics and error tracking.
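As a quick illustration of that management and monitoring tooling (assuming PM2 is installed and an app registered under the name server), these are some everyday PM2 commands:

```shell
pm2 list            # show all managed processes and their status
pm2 monit           # live dashboard with CPU and memory usage
pm2 logs server     # stream logs for the app named "server"
pm2 restart server  # restart the app
pm2 stop server     # stop the app
```

The app name "server" here is only an example; PM2 defaults to the script file name unless you pass --name.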

Using PM2, you can start your NodeJS application in cluster mode without changing the existing code. The process manager creates the necessary processes and manages their uptime accordingly.

First, install the PM2 process manager npm package. Since this is not a dependency package, installing it as a global package in the runtime environment is advisable.

npm install -g pm2

Let’s use our original non-clustered server implementation from the beginning of this section and start the express server as a cluster using the PM2 process manager.

pm2 start server.js -i max

Here, the -i option asks PM2 to create cluster instances; the max parameter tells it to use the maximum instance count, which is equal to the number of CPU cores available in the runtime environment.

Benchmarking of NodeJS clustering implementations

I am using the bombardier load generator tool to put a significant load on these REST API implementations and to record the statistics the tool reports, such as requests per second and latency.

For the execution environment, I am using my Windows laptop with a Ryzen 7 5800H processor, which has eight cores and 16 threads.

We are using 50,000,000 loop iterations per request, which generates a considerable workload, and we are emulating 1000 user requests with a maximum of 100 concurrent connections at any given time.
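For reference, a bombardier invocation along these lines produces that load (the port and path match the baseline server above; the exact flags used in the original runs were not included, so this is an assumption):

```shell
# 1000 total requests, at most 100 concurrent connections
bombardier -n 1000 -c 100 http://localhost:3000/api/counter/50000000
```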

Running the baseline NodeJS server without clustering support

For the baseline, let’s start our NodeJS REST API without any clustering options and measure the performance and resource utilisation.

[Figure: baseline benchmark run]

Here, we can see that the process was overloaded by the incoming traffic: only 72 requests were processed successfully, 226 timed out, and 702 were refused.

Following are the measured statistics.

Statistic             Measurement
Requests per second   15.39
Latency               6.20s
Success               72
Timeouts              226
Refused               702
Throughput            2.53 KB/s

Following is the CPU resource utilisation for the same run.

[Figure: baseline CPU utilisation]

We can see that the other CPU cores sit idle while requests time out under the load.

Running NodeJS server with NodeJS cluster module support

Let’s run the application with the NodeJS cluster module this time, which leverages the additional CPU cores, and benchmark it with the same load configuration using the bombardier tool.

[Figure: cluster module benchmark run]

Here, we can see significant improvement over the baseline statistics.

Statistic             Measurement
Requests per second   280.37
Latency               420.89ms
Success               1000
Timeouts              0
Refused               0
Throughput            74.55 KB/s

Following is the CPU utilisation for the same run.

[Figure: cluster module CPU utilisation]

We can see that the load is evenly distributed across the CPU cores here. All requests were processed in this run, and there was a significant improvement in every statistic.

Running NodeJS server with support of throng npm package for clustering

This time, we are using the throng npm package to enable clustering support. We are using the exact throng-based implementation mentioned above. Let’s use the same configuration for the load generation as well.

[Figure: throng benchmark run]

Here, we can see the same level of improvement we had with the cluster module implementation.

Statistic             Measurement
Requests per second   241.82
Latency               426.83ms
Success               1000
Timeouts              0
Refused               0
Throughput            73.67 KB/s

Following is the CPU utilisation for the same run.

[Figure: throng CPU utilisation]

Here also, we can see the same kind of load distribution among different CPU cores.

Running NodeJS server with PM2 process manager

Here, let’s run the same NodeJS server code we ran for the baseline benchmark, with no clustering support implemented, but this time using the PM2 process manager to add clustering support at runtime.

[Figure: PM2 benchmark run]

Although there is no code change (this is the same code we used in the baseline run), we can see the same level of improvement with this execution. Compared with the other benchmarks, PM2 even shows slightly better latency and throughput.

Statistic             Measurement
Requests per second   268.01
Latency               395.55ms
Success               1000
Timeouts              0
Refused               0
Throughput            78.56 KB/s

Following is the CPU utilisation for this load test.

[Figure: PM2 CPU utilisation]

As expected, the load is distributed among available CPU cores in this case as well.

Summary

Benchmark                Requests Per Second   Latency    Success   Refused   Throughput
Without clustering       15.39                 6.20s      72        702       2.53 KB/s
NodeJS cluster module    280.37                420.89ms   1000      0         74.55 KB/s
Clustering with Throng   241.82                426.83ms   1000      0         73.67 KB/s
Clustering with PM2      268.01                395.55ms   1000      0         78.56 KB/s

We can clearly see the massive advantage of clustering when the runtime environment has a multi-core processor. While the service could process only 72 requests in the baseline setup, the clustered setups processed all 1000 requests with much better response times. You do not even need to refactor your code: the PM2 process manager enables this feature with the same level of optimized execution. Please do not waste your CPU cores if you are paying for them 🫢

Cheers!

This post is licensed under CC BY 4.0 by the author.