
Inside the Infostealer: How CybaVerse Uncovered a PyInstaller Packed Threat

Written by Admin | Mar 31, 2026 1:22:44 PM

CybaVerse was recently engaged in an incident response (IR) investigation in which a threat actor gained a foothold in the victim’s network and moved laterally to multiple endpoints. On one of these endpoints, CybaVerse found a suspicious binary named “edge-chrome.exe”, which was later confirmed to be an infostealer left by the threat actor. This infostealer does not match any known malware family identified through OSINT or existing sample repositories, pointing to a newly observed or previously unreported threat. As our analysis shows, it demonstrates unusual functionality, including the use of SMB to remotely extract browsing history, suggesting that it is bespoke rather than a commodity infostealer.

Figure 1 – The infostealer that was observed by CybaVerse.

What immediately stands out about this binary is the file size at 14.696 MB. Such a large file size is unusual for C/C++ or .NET compiled binaries, suggesting that this binary may have been compiled from a different language. Additionally, the file icon is the default icon assigned to an executable when it is compiled using PyInstaller.

Extracting human-readable text from the binary revealed multiple strings that are indicative of a PyInstaller-built binary.

Figure 2 – Python strings in the infostealer.
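As a rough illustration of this triage step, the sketch below searches a file's bytes for markers associated with PyInstaller bundles. The archive magic value is the one PyInstaller uses for its embedded archive; the additional strings are assumptions about what commonly appears in such binaries and will vary between PyInstaller versions, and the function names are hypothetical.

```python
from pathlib import Path

# Magic bytes of the archive that PyInstaller appends to the executable.
PYINSTALLER_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"

# Illustrative strings only (an assumption, not an exhaustive signature).
INDICATOR_STRINGS = [b"PYZ-00.pyz", b"pyi-runtime-tmpdir", b"_MEIPASS"]


def pyinstaller_indicators(data: bytes) -> list[str]:
    """Return the names of the PyInstaller indicators present in data."""
    hits = []
    if PYINSTALLER_MAGIC in data:
        hits.append("archive magic")
    for marker in INDICATOR_STRINGS:
        if marker in data:
            hits.append(marker.decode("ascii"))
    return hits


def scan_file(path: str) -> list[str]:
    """Read a binary from disk and report which indicators it contains."""
    return pyinstaller_indicators(Path(path).read_bytes())
```

An empty result does not rule out PyInstaller, and a hit is only a prompt to attempt extraction, not proof of how the binary was built.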

 

What is PyInstaller?

PyInstaller is a tool used in the Python ecosystem to convert Python applications into standalone executable files [1]. It analyses a Python script, collects all the required dependencies (including libraries, modules and resources) and bundles them together with a Python interpreter into a single executable file. This enables Python programs to run on machines that do not have Python installed. While Python is an interpreted language, PyInstaller allows programmers to distribute Python code as an executable, much like the output of a compiled language such as C or C++. The code is not compiled to native machine code, however: the executable carries the Python bytecode and an interpreter to run it, which is why the original scripts can be recovered from it.

 

Reverse Engineering the Infostealer

After confirming that the binary had been built using PyInstaller, CybaVerse used a tool called pyinstxtractor to extract the Python scripts and their dependencies from the executable [2].

Figure 3 – Running pyinstxtractor against the infostealer.

Reviewing the output, a large number of files can be seen. Most of these files are dependencies for the Python script that the threat actor compiled. The file of interest in this output is the Python bytecode file ‘edge.pyc’.

Figure 4 – Extracted Python scripts and dependencies from the infostealer.

When a Python script is executed by the Python interpreter, it is first compiled to an intermediate representation called Python bytecode before being executed by the Python Virtual Machine (PVM) [3]. Python bytecode files (identifiable by their .pyc extension) are not human-readable and therefore must be decompiled to recover the original Python code. This can be done using tools such as Uncompyle6 for Python versions 1.0 to 3.8 and PyLingual for Python versions 3.6 and above [3] [4].
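Because decompiler support depends on the Python version, it can help to check which interpreter produced a .pyc file before choosing a tool. A .pyc file begins with a version-specific magic number followed by the bytes "\r\n". The sketch below, with hypothetical function names, reads that value and compares it with the running interpreter's own; it compiles a harmless script purely for demonstration.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def read_pyc_magic(pyc_path: str) -> int:
    """Return the 16-bit magic number stored at the start of a .pyc file."""
    header = Path(pyc_path).read_bytes()[:4]
    if len(header) < 4 or header[2:4] != b"\r\n":
        raise ValueError("not a recognisable .pyc header")
    return int.from_bytes(header[:2], "little")


def current_interpreter_magic() -> int:
    """Magic number the running interpreter writes into its own .pyc files."""
    return int.from_bytes(importlib.util.MAGIC_NUMBER[:2], "little")


# Demonstration: compile a harmless script and read the header back.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "hello.py"
    src.write_text("print('hello')\n")
    pyc = py_compile.compile(str(src), cfile=str(Path(tmp) / "hello.pyc"))
    demo_magic = read_pyc_magic(pyc)
```

If the value read from an extracted file differs from the analyst's interpreter, the bytecode was produced by a different Python version, which is the detail that determines whether Uncompyle6 or PyLingual is the appropriate decompiler.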

After decompiling the Python bytecode, the recovered Python code can be viewed and analysed.

 

Analysis of the Decompiled Python Bytecode

The most immediately apparent finding from the decompiled Python code is the use of docstrings written in Russian.

Figure 5 – Decompiled Python bytecode from the infostealer.

Figure 6 – Translation of the Russian comments seen in the decompiled Python bytecode.

This would suggest that the Python script was written by a Russian-speaking threat actor.

 

Imported Libraries

Looking at the imported libraries, two stand out: smbclient, which enables interaction with the Server Message Block (SMB) protocol, and sqlite3, which enables interaction with SQLite databases, the format in which Chrome and Edge store much of their browser data.

Figure 7 – Libraries imported by the Python script.

It is also notable that importing the getuser function from getpass (which returns the username of the currently logged-in user) is redundant given that the os library has already been imported. This, combined with the generic and non-descriptive docstrings, could suggest that this code was generated using AI. However, passing the script through AI code detectors gives mixed results. While it cannot be definitively proven that this script was AI generated, there are indicators that this could be the case.

The script uses the Python logging library to collect error and informational output, saving it to the file “browser_collector.log” in the directory the Python script is executed from.

Figure 8 – Functionality within the script to log informational level logs.

The threat actor left this log file behind on the endpoint, giving an indication of the time the executable was run and the output of the program.

Figure 9 – Log file left on an endpoint the infostealer had been run on.
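Since the log file is written to the working directory and was not cleaned up, its name is a useful hunting artefact. The following is a minimal sketch of how a responder might search a mounted disk image or directory tree for it; the function name is hypothetical, and only the file name comes from the analysis above.

```python
from datetime import datetime, timezone
from pathlib import Path

LOG_NAME = "browser_collector.log"  # file name observed during the investigation


def find_log_artifacts(root: str) -> list[tuple[str, str]]:
    """Return (path, last-modified time in UTC) for each matching file under root."""
    found = []
    for path in Path(root).rglob(LOG_NAME):
        if path.is_file():
            mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            found.append((str(path), mtime.isoformat()))
    return sorted(found)
```

The modification time gives only an approximate indication of when the tool last ran; the timestamps inside the log itself, where present, are the better source for a timeline.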

 

Main Function

Moving on to the main function where code flow begins, the script uses sys.argv to accept command line arguments used as configuration options.

These are:

  • ip_file

    • A file containing the list of IPs to target

  • browser_type

    • The browser type to target, with the only options being Chrome or Edge.

  • OUTPUT_DIR

    • The directory to store the output of the script.

  • THREADS

    • The number of threads for the script to use.

Figure 10 – Arguments accepted by the Python script.
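For readers unfamiliar with sys.argv, positional arguments of this kind are typically unpacked as in the sketch below. The argument order, the validation rules and the function name are assumptions made for illustration; only the four option names come from the decompiled script.

```python
VALID_BROWSERS = {"chrome", "edge"}


def parse_cli(argv: list[str]) -> dict:
    """Map positional arguments onto the four options named above (assumed order)."""
    if len(argv) != 5:
        raise ValueError("expected: <ip_file> <browser_type> <output_dir> <threads>")
    _, ip_file, browser_type, output_dir, threads = argv
    browser_type = browser_type.lower()
    if browser_type not in VALID_BROWSERS:
        raise ValueError("browser_type must be 'chrome' or 'edge'")
    if not threads.isdigit() or int(threads) < 1:
        raise ValueError("threads must be a positive integer")
    return {
        "ip_file": ip_file,
        "browser_type": browser_type,
        "OUTPUT_DIR": output_dir,
        "THREADS": int(threads),
    }


# In a real script the list would be sys.argv; a literal is used here for clarity.
config = parse_cli(["tool.py", "targets.txt", "Edge", "out", "8"])
```

Command lines of this shape are also worth looking for in process-creation logs, as the arguments reveal which hosts and which browser were targeted.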

The script then sets up the smbclient configuration to use the current user’s Windows credentials for authentication. The Server Message Block (SMB) protocol is a network file-sharing protocol that allows computers on a network to share files, printers and serial ports. This enables the Python script to connect to and access shared network resources on the local network.

Figure 11 – Use of the smbclient library within the Python script.

The script then uses the Python threading library to create the user-specified number of threads. Each thread executes the worker function with two parameters: the IP to target (retrieved from the user-specified IP file) and the browser type (Edge or Chrome). Threading allows the worker function to run concurrently across targets rather than sequentially, which speeds up execution of the program.
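The general pattern can be sketched as a pool of threads draining a shared queue of targets. This is a generic illustration rather than the threat actor's code: process_target is a placeholder that performs no network activity, and the results list, the lock and the function names are assumptions. The queue-based design is inferred from the task_queue argument of the worker function.

```python
import queue
import threading


def process_target(ip: str, browser_type: str) -> str:
    """Placeholder for per-host work; here it only formats a string."""
    return f"{ip}:{browser_type}"


def worker(task_queue: queue.Queue, browser_type: str,
           results: list, lock: threading.Lock) -> None:
    """Pull IPs from the shared queue until it is empty."""
    while True:
        try:
            ip = task_queue.get_nowait()
        except queue.Empty:
            return
        outcome = process_target(ip, browser_type)
        with lock:  # protect the shared list from concurrent appends
            results.append(outcome)
        task_queue.task_done()


def run_pool(ips: list[str], browser_type: str, threads: int) -> list[str]:
    """Start the requested number of threads and wait for the queue to drain."""
    task_queue: queue.Queue = queue.Queue()
    for ip in ips:
        task_queue.put(ip)
    results: list[str] = []
    lock = threading.Lock()
    pool = [
        threading.Thread(target=worker, args=(task_queue, browser_type, results, lock))
        for _ in range(threads)
    ]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return sorted(results)
```

Threads suit this workload because the time is spent waiting on network I/O rather than on computation, so the interpreter's global lock is not the limiting factor.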

 

Worker Function

The worker function takes two arguments: task_queue and browser_type. Each thread created in the main function will run this function concurrently, passing the IP it was assigned and the browser type (Edge or Chrome) as parameters to satisfy the above arguments.

Two variables are then retrieved from the get_browser_config function:

  • browser_path
    • The path of the directory that stores all browser data for the respective browser (Edge or Chrome).
  • target_files
    • The target files to be retrieved from the browser’s data directory.

Figure 12 – The worker function in the Python script.

Within the get_browser_config function, the target files are stored in a dictionary as key/value pairs.

Figure 13 – Browser data files targeted for collection by the Python script.

The script targets the same files whether the target browser is Edge or Chrome.

The target files are:

  1. History

  2. Bookmarks

  3. Cookies

  4. WebData (Contains autofill data)

  5. Extensions

Back in the worker function, the SMB client connects to the administrative share of the given IP address (C$, a hidden share that exposes the C: drive) and checks whether the Users directory exists on that machine. If the connection fails or the Users directory does not exist, an error message for that IP is logged and the thread terminates.

Figure 14 – Functionality to connect to the administrative share using the Python smbclient library.

The script then filters out default directories within C:\Users (e.g. Public, Default) and looks for the path to the target browser.

Figure 15 – Functionality to filter out user directories that are not relevant and to look for the browser directory for each user on the target endpoint.

The script then enumerates the profiles within the user’s browser directory. A browser profile is a distinct, isolated container within a browser that separates user data. Some users may use multiple browser profiles and for this reason the script will retrieve the default profile and any other profiles that start with ‘Profile’.

Figure 16 – Functionality to find all browser profiles for a user.
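The two filtering steps described above amount to simple name checks on directory listings. The sketch below operates on plain lists of names; "Public" and "Default" come from the analysis, while the other skipped names and the function names are assumptions.

```python
# Directory names skipped under the Users folder. "Public" and "Default" are
# taken from the analysis; the remaining entries are illustrative assumptions.
SKIPPED_USER_DIRS = {"public", "default", "default user", "all users"}


def real_user_dirs(names: list[str]) -> list[str]:
    """Drop built-in directories that do not belong to an interactive user."""
    return [name for name in names if name.lower() not in SKIPPED_USER_DIRS]


def browser_profiles(names: list[str]) -> list[str]:
    """Keep 'Default' plus any directory whose name starts with 'Profile'."""
    return [name for name in names if name == "Default" or name.startswith("Profile")]
```

For example, a browser data directory containing "Default", "Profile 1", "Crashpad" and "System Profile" would yield only "Default" and "Profile 1", because "System Profile" does not begin with "Profile".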

For each identified profile, the script will create a new directory under temp_files in the format {ip}_{username}_{profile_name}, before using the SMB client to retrieve all the target files (history, cookies, autofill data etc.) in the profile.

Figure 17 – Functionality to retrieve all the target files and copy them to the local system using the Python smbclient library.

Finally, the process_user_data function is called, which uses the sqlite3 library to extract the relevant data from the retrieved browser files. For example, the following function that is called by the process_user_data function extracts the name, value, date created and last used date from the browser’s autofill data.

 Figure 18 – Functionality to use the sqlite3 Python library to extract data from browser files. 
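The same approach is useful to responders who need to establish what data was exposed. The sketch below runs a comparable query against an in-memory database with a simplified, assumed schema: the table and column names are modelled on the fields listed above rather than copied from the script, and the timestamps are assumed to be Unix seconds.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed, simplified stand-in for the autofill table in a browser database.
SCHEMA = (
    "CREATE TABLE autofill "
    "(name TEXT, value TEXT, date_created INTEGER, date_last_used INTEGER)"
)


def to_iso(unix_seconds: int) -> str:
    """Convert Unix seconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).isoformat()


def read_autofill(conn: sqlite3.Connection) -> list[dict]:
    """Return autofill entries, most recently used first."""
    rows = conn.execute(
        "SELECT name, value, date_created, date_last_used "
        "FROM autofill ORDER BY date_last_used DESC"
    )
    return [
        {"name": name, "value": value,
         "created": to_iso(created), "last_used": to_iso(last_used)}
        for name, value, created, last_used in rows
    ]


# Demonstration with synthetic data only.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO autofill VALUES (?, ?, ?, ?)",
    ("email", "user@example.com", 1700000000, 1700086400),
)
records = read_autofill(conn)
```

When examining real evidence, work on a copy of the database file, as the browser may hold a lock on the original and opening it can alter it.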

This extracted data is then written to the user-specified output directory, along with a summary count of the amount of information retrieved per profile (e.g. 23 cookies, 14 autofill items). All the files and sub-directories within temp_files are then deleted and the program terminates.

Figure 19 – Functionality to output the extracted browser data.

 

Summary

Through the use of open-source tools, CybaVerse was able to reverse engineer and analyse the PyInstaller binary left by the threat actor. It was determined that the suspicious executable is a Chrome / Edge infostealer that uses SMB to collect sensitive browser data from other hosts on the network. This tool significantly aids the threat actor in exfiltration activities, and the sensitive information retrieved could support further malicious activity such as access to other accounts and resources, or identity theft. Furthermore, this information could be held for ransom in a double-extortion ransomware attack.

Python is often used to develop tools such as this due to its myriad library options and its rapid development speed. While Python’s suitability for this purpose is limited by it being an interpreted language, which normally requires an interpreter on the target machine, tools such as PyInstaller or Nuitka offer threat actors a method of turning Python scripts into portable executables. This makes Python a practical choice for writing malicious tools or malware.

In recent years, with the advancement of Large Language Models (LLMs), threat actors often use AI to develop malware and tools. For the same reasons Python is chosen by developers, it is often the language of choice for threat actors when using AI to generate these malicious tools. While it cannot be definitively determined that this infostealer was generated by AI, there are multiple artefacts within the script that suggest it, such as the generic, non-descriptive and standardised format of the docstrings. There are also redundant imports and redundant code throughout the script. Combined with the overall structure of the code, this points towards AI generation rather than the structured, well-thought-out and logical approach that would be expected of a human developer.
