Binary Atlas — Static Analysis for Windows PE Files

Most malware analysis tutorials teach you what to look for. Very few force you to build something that actually breaks when you’re wrong.

Binary Atlas started as exactly that kind of project. I wanted to understand how static analysis really works—not from textbooks, but by building a tool that had to make real decisions about real binaries. And it broke in interesting ways.


Here’s what I learned: simple projects don’t stay simple.

I started with just a couple of detector modules: parse the PE, extract strings, check entropy. Maybe 200 lines of code. Easy. Then I added a third module, then a fourth, and then realized I needed to rethink the whole architecture.
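Those first two heuristics are small enough to sketch in a few lines. This is a minimal, stdlib-only illustration of the idea (a real tool would use a library like pefile to walk sections first); the threshold value is illustrative, not Binary Atlas's actual setting.

```python
import math
import re
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte: ~0 for constant data, approaching 8.0 for random/encrypted data."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())

def extract_strings(data: bytes, min_len: int = 5) -> list[str]:
    """Pull printable-ASCII runs, the classic `strings`-style heuristic."""
    return [m.group().decode("ascii")
            for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data)]

# Packed or encrypted sections cluster near 8 bits/byte; plain code sits lower.
PACKED_THRESHOLD = 7.2  # illustrative value only
```

Running `shannon_entropy` over each section's raw bytes and comparing against a threshold is essentially the whole first version of the packer check.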

By the time I hit eight modules, the code was a mess. Detectors overlapped, and duplicated logic was everywhere. So I rebuilt it: each detector became completely independent, like a little plugin. That worked better, but now I had fourteen separate modules, each with its own logic, thresholds, and output format.
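The plugin shape can be captured with a small shared interface: every detector exposes a name and a scan method that returns findings in one common type. This is a sketch of that pattern; the names (`Detector`, `Finding`, `NullByteRunDetector`) are my own illustrations, not Binary Atlas's actual classes.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Finding:
    detector: str
    severity: str  # "info" | "low" | "medium" | "high"
    message: str

class Detector(Protocol):
    """Every plugin satisfies this: a name plus a scan() returning findings."""
    name: str
    def scan(self, data: bytes) -> list[Finding]: ...

class NullByteRunDetector:
    """Toy plugin: flags long zero runs (often padding before an appended overlay)."""
    name = "null-run"

    def scan(self, data: bytes) -> list[Finding]:
        if b"\x00" * 64 in data:
            return [Finding(self.name, "info", "64+ byte zero run found")]
        return []
```

The payoff is that a driver only ever talks to the interface, so adding a fifteenth detector means writing one class, not touching the others.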

Then came the reports. At first, the tool just printed to the console: tables of data, walls of text, unreadable gibberish. I realized other malware analysis tools suffer from the same problem — they dump everything to the console or a boring text file.

So I thought: what if the report was actually readable? What if it was HTML? Styled, organized, with collapsible sections and color-coded severity?

That decision changed everything, and HTML output was harder than I expected: templating issues, formatting nightmares, and a different output structure for every detector. HTML handling got completely out of hand.
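The collapsible, color-coded sections can be done with plain HTML `<details>` elements and no JavaScript at all. This sketch shows the idea using only the stdlib; the severity palette and function names are illustrative, and escaping every finding matters because malware strings are attacker-controlled input to your own report.

```python
import html

SEVERITY_COLORS = {"high": "#c0392b", "medium": "#e67e22",
                   "low": "#f1c40f", "info": "#7f8c8d"}

def render_section(title: str, severity: str, rows: list[str]) -> str:
    """One collapsible, color-coded report section built on a native <details> element."""
    color = SEVERITY_COLORS.get(severity, "#7f8c8d")
    # Escape everything: strings pulled from a binary must never reach raw HTML.
    items = "\n".join(f"<li>{html.escape(r)}</li>" for r in rows)
    return (
        f'<details open>\n'
        f'  <summary style="color:{color}">{html.escape(title)} '
        f'({severity}, {len(rows)} findings)</summary>\n'
        f'  <ul>\n{items}\n  </ul>\n'
        f'</details>'
    )
```

Concatenating one such section per detector gives a report that collapses to a readable summary by default.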

Binary Atlas is the result of all this. It’s not perfect—and it never will be. It’s not meant to replace real security tools.

But it forced me to confront how malware detection actually works. And it exposed the gap between what looks clever in theory and what actually survives contact with real code.

The Architecture: 14 Detectors Working Together

The guiding idea stayed simple: keep each detector independent. The system evolved into 14 of them, each targeting a specific class of signals:

  • Packer Detector — Looks for high entropy in sections, known packer signatures
  • Anti-Analysis Detector — Finds anti-debug APIs, anti-VM instructions, timing checks
  • Shellcode Detector — Searches for ROP gadgets, NOP sleds, call/pop sequences
  • DLL Hijacking Detector — Detects relative path loading and suspicious side-loading
  • Import Anomaly — Looks for suspicious API combinations and dynamic loading
  • Persistence Detector — Finds registry paths, startup folders, scheduled tasks
  • Overlay Analysis — Detects appended encrypted data beyond the declared PE size
  • YARA Scanner — Runs 35+ custom YARA rules for pattern matching
  • Mutex Detector — Extracts and matches mutex patterns against known malware families
  • COM Hijacking — Detects CLSID manipulation
  • String Entropy — Finds high-entropy strings suggesting encryption
  • Resource Analysis — Examines PE resources and certificates
  • Compiler Detection — Identifies the compiler and version from metadata
  • Security Checks — Analyzes DEP, ASLR, SEH, and code signing
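To make one of these concrete, here is a sketch of the simplest signal in the list, the shellcode detector's NOP sled scan: a run of x86 `0x90` bytes long enough to suggest a landing zone. This is my own minimal illustration, not Binary Atlas's actual implementation, and the 16-byte minimum is an arbitrary example value.

```python
def find_nop_sleds(data: bytes, min_len: int = 16) -> list[int]:
    """Return the start offsets of x86 NOP (0x90) runs of at least min_len bytes."""
    offsets = []
    needle = b"\x90" * min_len
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        # Skip to the end of this entire run so one long sled counts once.
        end = start + min_len
        while end < len(data) and data[end] == 0x90:
            end += 1
        start = data.find(needle, end)
    return offsets
```

Like every heuristic here, this will also fire on legitimate compiler padding, which is exactly the false-positive problem discussed below.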

The beauty is that each detector works independently. One detector being wrong doesn’t break the others.
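That isolation is cheap to enforce in the driver: wrap each detector's run in its own try/except so a crash becomes one error finding instead of a failed analysis. A minimal sketch, with two toy detectors standing in for the real fourteen:

```python
class GoodDetector:
    name = "good"
    def scan(self, data):
        return [(self.name, "info", f"{len(data)} bytes scanned")]

class BrokenDetector:
    name = "broken"
    def scan(self, data):
        raise ValueError("unparseable section table")

def run_all(detectors, data):
    """Run every detector; a crash in one is recorded, not propagated."""
    findings = []
    for det in detectors:
        try:
            findings.extend(det.scan(data))
        except Exception as exc:  # isolate failures per detector
            findings.append((det.name, "error", f"crashed: {exc}"))
    return findings
```

A malformed binary that breaks one parser still gets thirteen detectors' worth of results, plus an explicit note about what failed.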

The False Positive Problem

This was the biggest lesson of building this tool.

The honest truth: Static analysis has inherent false positives. You can’t execute the code to understand its behavior, so you rely on heuristics. Heuristics are educated guesses. Educated guesses can be wrong.


Configuration is More Important than Code

Rather than hardcoding detection thresholds, I built 22 config files. Entropy threshold? Configurable. Packer signatures? Configurable. API patterns? Configurable. This makes the tool adaptable without recompiling.
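One pattern that keeps this manageable is merging each config file over a set of defaults, so a partial file still yields a complete configuration. This is a generic sketch of that approach (the keys and JSON format here are illustrative assumptions, not Binary Atlas's actual config schema):

```python
import json

DEFAULTS = {"entropy_threshold": 7.2, "min_string_len": 5}

def load_config(text: str) -> dict:
    """Overlay a JSON config on the defaults, so missing keys fall back safely."""
    cfg = dict(DEFAULTS)
    cfg.update(json.loads(text))
    return cfg
```

For example, a config containing only `{"entropy_threshold": 6.8}` lowers that one threshold while every other knob keeps its default.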

Reporting is Half the Battle

The best analysis is useless if no one can understand the output. The bar for the HTML reports was simple: readable enough to be useful.


The Current State

Honest status: This is a work-in-progress learning project. It works. The analysis completes, the reports generate, the detectors run. But it’s not production-ready.

Known issues:

  • False positives on legitimate system binaries
  • Limited YARA rule sophistication
  • No behavioral verification (can’t tell if detected APIs are actually used)
  • Edge cases cause crashes
  • No caching (re-analyzing is slow)
  • HTML reports lack polish

What it’s good for:

  • Learning how PE parsing works
  • Understanding static malware detection
  • Experimenting with detection techniques
  • Teaching binary analysis concepts

What it’s NOT:

  • A production security tool
  • A replacement for established malware scanners
  • Suitable for automated decisions
  • Reliable for incident response

The Takeaway

Static analysis doesn’t give you answers—it gives you suspicions.

Building Binary Atlas taught me how unreliable those suspicions can be. How many legitimate applications do things that look suspicious. How hard it is to draw lines confidently.

That’s why the professionals exist. And that’s why building this tool mattered.


You can find the full source on GitHub: bilal0x0002-sketch/Binary-Atlas