λ1 | SCTF-Write-ups - Q2

λ1 - (Reversing)

Writeup by Jonathan S

Created: 2015-12-12

Problem

I've got this Haskell program here, but for some reason, it just won't complete. Can you figure out the correct output for me?

Hint

I think it has something to do with BPF_IF_PW^^VSNXAPCZL^XKM...?

Answer

Overview

Reverse engineer the given program to find what it is computing in a very unoptimized way and calculate that value more efficiently.

Details

Intro

The binary, which we know came Haskell source code, was compiled with the industry standard Haskell compiler, the Glasgow Haskell Compiler, as evidenced by the numerous occurances of the string GHC in symbol names and the fact that basically no one uses anything but GHC these days.

Now, before we begin, there are some important facts to realize about Haskell:

Haskell is a functional programming language. This means that functions are treated as first-class objects that can be passed around and manipulated just like any piece of data. Moreover, instead of programming by writing down a list of commands that the computer will execute, functions are defined in terms of what functions are applied to pieces of data (in fact, only having the ability to define and apply functions is enough to compute anything computable, as shown by lambda calculus, a mathematical system that underpins much of functional programming).
Haskell is more specifically a pure functional programming language. This means that nearly every piece of code will not use mutable variables, will not have any side effects (like input or output), will produce the same result every time it is run, and won't care when in the program's execution it is run. This will be incredibly useful for analyzing our program, as we can just look at individual expressions and how they are computed, not caring when and where those expressions are called.
Haskell has non-strict semantics, which are implemented in GHC using lazy evaluation. This means that expressions are only evaluated once the program has an immediate need for them, delaying any calculation to the last moment possible. This, more than any of the other properties, messes up execution order enourmously and almost single-handedly renders conventional debuggers basically useless.
Haskell is incredibly generic-heavy, and for maximum flexibility (including support for polymorphic recursion), GHC stores and passes around at runtime certain information about a generic type that might be filled in at compile time or ignored in other languages.
Haskell programmers often like to use point-free style in their programs, manipulating and compsing functions instead of directly talking about data.

All these properties lead to Haskell having a dramatically different execution model from something like C and from how the CPU actually executes. As a result, GHC does many things completely differently from C, including a distinct calling convention, a manually managed stack, and a specialized memory system and garbage collector. Code produced by GHC is very distinctive, and will not look anything like something produced by GCC, and debuggers will not aid understanding at all.

Aside: The compilation process

GHC actually sends a Haskell program through 3 intermediate representations before generating any assembly. Haskell source is transformed into Core, which is essentially a desugared, explicitly typed, and name-resolved version of Haskell. Then, Core is lowered into STG, or Spineless Tagless G-machine code (a name that is somewhat incorrect now that GHC has incorporated pointer tagging). STG is still functional, but with all temporaries and thunks (expressions to be lazily evaluated) explicitly spelled out. Finally, STG is converted into C--, a low level language designed as a generic language for compilers to target, that, despite its goals, is only targeted by GHC. C--, as implied by its name, is quite like C and readily translates to assembly.

When optimizations are disabled, all of these transformations (except possibly from STG to C--) just involve logical desugarings and don't introduce any random new complexity. As such, we will ignore any intermediate representations for the sake of this write-up, acting as if GHC went directly from Haskell to assembly.

For more details, see the GHC Wiki here.

Entering the program

With all of that out of the way, let's start on analyzing the acutal binary. Load it up in your favorite disassembler (note that IDA, at least, balks at the foreign character in a file name), and you should find that the programs starts much as a C program might: after a short few steps in _start and __libc_start_main, the main function is called. We inspect it:

    public main
main:          ; DATA XREF: _start+17 o
    push    ebp
    mov     ebp, esp
    and     esp, -16
    sub     esp, 48
    mov     eax, ds:defaultRtsConfig
    mov     edx, ds:dword_80BEE64
    mov     [esp+40], eax
    mov     [esp+44], edx
    mov     dword ptr [esp+40], 1
    mov     eax, [esp+40]
    mov     edx, [esp+44]
    mov     [esp+12], eax
    mov     [esp+16], edx
    mov     dword ptr [esp+8], offset ZCMain_main_closure
    mov     eax, [ebp+12]
    mov     [esp+4], eax
    mov     eax, [ebp+8]
    mov     [esp], eax
    call    hs_main

We might be initially tempted to continue down the rabbit hole looking at hs_main. However, this would be a mistake, as the only code we actually care about is behind ZCMain_main_closure. Remember that Haskell is implemented lazily, so to execute anything, the unevaluated expression needs to be stored, then demanded later by something that needs the expression. In this case, ZCMain_main_closure is the unevaluated expression corresponding to the whole program, and hs_main is just responsibly for demanding that expression. In general, anything labeled _closure is a reference to an unevaluated, top-level expression.

Therefore, we dive into ZCMain_main_closure and find that it is in fact only a wrapper around the actual implementation, ZCMain_main_info:

ZCMain_main_closure dd offset ZCMain_main_info ; DATA XREF: .text:0804CA7B o

All objects in Haskell are represented as pointers to pointers to code that evaluates the expression. This extra layer of indirection allows for a crucial optimization where, after an expression is evaluated, it is replaced by a simplified expression that just returns the evaluated answer directly. Since we don't have to care about when things are evaluated, we will just skip over any details pretaining to this, and just go staight into ZCMain_main_info.

Our first Haskell expression

Disassembling ZCMain_main_info gives the following, which we will take chunk by chunk:

    lea     eax, [ebp-12]
    cmp     eax, [ebx+84]
    jb      short loc_804CA42
    add     edi, 8
    cmp     edi, [ebx+92]
    ja      short loc_804CA3B
    mov     dword ptr [edi-4], offset stg_CAF_BLACKHOLE_info
    mov     eax, [ebx+100]
    mov     [edi], eax
    lea     eax, [edi-4]
    push    eax
    push    esi
    push    ebx
    call    newCAF
    add     esp, 12
    test    eax, eax
    jz      short loc_804CA45
    mov     dword ptr [ebp-8], offset stg_bh_upd_frame_info
    lea     eax, [edi-4]
    mov     [ebp-4], eax
    mov     esi, offset base_GHCziTopHandler_runMainIO_closure
    mov     dword ptr [ebp-12], offset Main_main_closure
    add     ebp, -12
    jmp     stg_ap_p_fast

We start at the top.

    lea     eax, [ebp-12]
    cmp     eax, [ebx+84]
    jb      short loc_804CA42

This is a stack overflow check. As we mentioned before, GHC manages its own stack, and the tip of this stack is stored in the register ebp. This short section of assembly, which comes at the beginning of nearly every section in the whole program, just checks that the stack pointer does not run past the end of the stack, the value stored in [ebx+84].

    add     edi, 8
    cmp     edi, [ebx+92]
    ja      short loc_804CA3B

Looking very similar to the stack overflow check, this is the allocation code and heap overflow check. It doesn't look very similar to what you might see in many other languages, with a call to a fairly complex malloc function, instead only adding 8 to edi. This is because GHC's garbage collector is a copying, and therefore compacting collector, which makes finding free memory trivial. Instead of having some complicated division tracked through allocator data structures of free and allocated memory, everything less than edi is allocated, while everything greater than edi is free. Therefore, the add edi, 8 is in fact allocating 8 bytes. The other two instructions, similiarly to the stack overflow check, are just checking for heap overflow and possibly initiating a collection.

    mov     dword ptr [edi-4], offset stg_CAF_BLACKHOLE_info
    mov     eax, [ebx+100]
    mov     [edi], eax

This code is putting some values into the heap spots that were just allocated. We don't really have to worry about either value, but the stg_CAF_BLACKFOLE_info is interesting. When evaluating an expression, GHC temporarily replaces that expression with what is known as a black hole. This has two helpful side effects. First, if the expression is actually an infinite loop, referring to itself (as in loop = loop), then this black hole will get evaluated and, instead of hanging in a hard to debug way, the program will print out <<loop>> and crash. Second, if two threads attempt to evalutate the same expression at the same time, one thread will see the black hole and not do the same work that the other thread is doing. The storing of stg_CAF_BLACKHOLE_info is most of this mechanism.

    lea     eax, [edi-4]
    push    eax
    push    esi
    push    ebx
    call    newCAF
    add     esp, 12
    test    eax, eax
    jz      short loc_804CA45

Here, a new Constant Applicative Form is being allocated. A CAF, for most purposes, is just a top level expression, and specifically one that isn't a function. These are special because they want to be evaluated once throughout the whole lifetime of a program. The details that make CAFs different from other expressions aren't really important to us. The one thing of note is that the address of the just allocated black hole is passed to newCAF so that it can finish initialization.

    mov     dword ptr [ebp-8], offset stg_bh_upd_frame_info
    lea     eax, [edi-4]
    mov     [ebp-4], eax

Remember that GHC's stack is tracked using ebp, so this code is putting some stuff onto the stack. In particular, since GHC manages its own stack, it also needs to put addresses that function calls will return to on the stack manually. This is what this piece of code is doing. When the function call coming up returns, it will jump to stg_bh_upd_frame_info. This function is called an update (stack) frame, hence its name.

As mentioned before, while an expression is being evaluated, it gets replaced by a blackhole. However, for laziness to work properly, it needs to remove the blackhole and put the final, evaluated expression in its place. This is exactly what stg_bh_upd_frame_info does. Note that immediately above this update function is the address where stg_CAF_BLACKHOLE_info was stored, giving the update frame the information it needs to replace that blackhole. Once the evaluated expression is stored, stg_bh_upd_frame_info will return to the address immediately preceding it, at [ebp] - in general, the place an expression needs to return to will be at [ebp].

    mov     esi, offset base_GHCziTopHandler_runMainIO_closure
    mov     dword ptr [ebp-12], offset Main_main_closure

Here we set up the function we want to call and its argument. base_GHCziTopHandler_runMainIO_closure is stored in esi, the location for the function to be called, while its argument, Main_main_closure, like most languages is put at the bottom of the stack.

At this point is starts being useful to decode these cryptic names of expressions. Every closure comes in 4 parts, separated by underscores. The first part, base in this case, refers to the package that the expression came from. In this case, base is the core standard library for Haskell. The second part, GHCziTopHandler, is the module that the expression came from. The zi in there is actually just code for a period, used because periods in symbol names cause issues (compiled C++ files will also have this slightly cryptic mess). Therefore, the module in this case is GHC.TopHandler. For the full list of rules for decoding symbols, see here. The third part of a name is the actual name of the expression being referenced, in this case runMainIO. The final part is just closure or info, depending on whether the symbol is referring to the actual code implementing the expression (info) or just a memory location pointing to the info.

Sometimes, as in Main_main_closure, not every part of the name is present. Since Main_main_closure is user code, it doesn't have a package associated to it, and is just main in the module Main. This allows us to very easily identify library code.

    add     ebp, -12

Here we actually update the stack, moving its tip past the 12 bytes we used.

    jmp     stg_ap_p_fast

Finally, we jump into the actually function that will perform function application. This function is part of GHC's generic apply mechanism, used when GHC isn't certain at compile time that a function is fully evaluated and called on exactly as many arguments as it needs. Because this file was compiled without optimization, almost every single function call will use this mechanism.

stg_ap_p_fast takes the function to apply in esi (hence why runMainIO was stored there earlier) and the argument it needs on the stack. As denoted by the single p in its name, we are passing a single pointer argument (main) to runMainIO. In the future, we will often see, for example, stg_ap_pp_fast, which takes 2 pointer arguments.

In total, this whole section boils down the following statement:

:Main.main = runMainIO Main.main

Entering user code: `Main.main`

We next enter Main.main (or Main_main_closure), the first expression we see that the programmer actually wrote. runMainIO is basically useless to us because it is just a library function, and one that basically just starts up the program.

    lea     eax, [ebp-16]
    cmp     eax, [ebx+84]
    jb      short loc_804C9D9
    add     edi, 8
    cmp     edi, [ebx+92]
    ja      short loc_804C9D2
    mov     dword ptr [edi-4], offset stg_CAF_BLACKHOLE_info
    mov     eax, [ebx+100]
    mov     [edi], eax
    lea     eax, [edi-4]
    push    eax
    push    esi
    push    ebx
    call    newCAF
    add     esp, 12
    test    eax, eax
    jz      short loc_804C9DC
    mov     dword ptr [ebp-8], offset stg_bh_upd_frame_info
    lea     eax, [edi-4]
    mov     [ebp-4], eax
    mov     esi, offset base_GHCziBase_zd_closure
    mov     dword ptr [ebp-12], offset sBr_closure
    mov     dword ptr [ebp-16], offset sBs_closure
    add     ebp, -16
    jmp     stg_ap_pp_fast

We see the same stack and heap checks, blackhole creation, CAF initialization, and update frame creation as we did in :Main.main. The change is the meat of what we are looking for - what function is applied to what. In this case, we have base's GHC.Base.($) applied to sBs and sBr. Note that since $ (a convienience function that applies a function to an argument, useful for avoiding parentheses) takes two arguments, we are using stg_ap_pp_fast, not stg_ap_p_fast. In total, we add the line

Main.main = sBs $ sBr

`sBs`

Jumping into sBs_info, we see much of the same, but some interesting things happen when filling the stack and heap:

    mov     dword ptr [edi-12], offset sB8_info
    mov     dword ptr [edi-4], offset sBm_info
    lea     eax, [edi-12]
    mov     [edi], eax
    mov     esi, offset base_GHCziBase_zi_closure
    lea     eax, [edi-3]
    mov     [ebp-12], eax
    mov     dword ptr [ebp-16], offset base_SystemziIO_putStrLn_closure
    add     ebp, -16
    jmp     stg_ap_pp_fast

Following the patterns we saw before, GHC.Base.(.) is being called on two arguments, the first being System.IO.putStrLn, but the second is a little stange. For the second argument, there is a pointer into the heap, specifically [edi-3].

Now, we didn't store anything at [edi-3] - every store done was at an aligned address, at multiple of four. However, as an optimization, GHC uses the lower 2 bits (3 on a 64 bit machine) to tag pointers. Having zeroed out low bits means that the pointed-to expression might be unevalutated, but any other tag bits mean that the expression is evaluated, the specific tag value telling us about what the evaluated expression. In this case, since (.) takes two functions as arguments, we know that the tag bits are tellings us the arity (number of arguments) of the function passed, so [edi-4], or sBm, must be a function of 1 argument.

The remaining interesting part of sBs is the 3 other heap words allocated - we store sB8_info and a pointer to that slot, and [edi-8] is left empty, essentially giving sBs an extra word to store information about its evaluation status.

In total, we end up with something like

:Main.main = runMainIO Main.main
Main.main = sBs $ sBr
sBs = let sB8 = ???
          sBm = ???
      in putStrLn . sBm

`sB8`

sB8_info looks a little bit different from the previous expressions we have seen.

    lea     eax, [ebp-12]
    cmp     eax, [ebx+84]
    jb      short loc_804C6DE
    mov     dword ptr [ebp-8], offset stg_upd_frame_info
    mov     [ebp-4], esi
    mov     esi, offset ghczmprim_GHCziCString_unpackCStringzh_closure
    mov     dword ptr [ebp-12], offset cDX_str ; "BPF_IF_PW^^VSNXAPCZL^XKM"
    add     ebp, -12
    jmp     stg_ap_n_fast

Since it does no allocation, it only has a stack overflow check, and, since we have left the realm of top level expressions, we don't do any of the CAF business that previous things had to deal with. We still set up the update frame, but the function we call and its argument are slightly strange. We are now using stg_ap_n_fast, which signifies a function (here, unpackCString#) that takes a single non-pointer argument. Calling this argument non-pointer is a little stange, as it is in fact a pointer, but looking at the signature of unpackCString#, we see that this function takes an Addr#, a special unboxed value that isn't like a normal Haskell object.

This unpackCString# business is automatically inserted by GHC every time there is a string literal in the Haskell source file, as an optimization. Instead of storing it in Haskell format (a horribly inefficient linked list of pointers to characters), GHC just emits the more compact C-style string, then inserts a call to unpackCString#. In particular, the C-style encoded string that is being unpacked into a Haskell-style string, stored at cDX_str, is "BPF_IF_PW^^VSNXAPCZL^XKM", the string mentioned in the hint to the problem. We're on the right track! The result after inspecting sB8 is the following:

:Main.main = runMainIO Main.main
Main.main = sBs $ sBr
sBs = let sB8 = "BPF_IF_PW^^VSNXAPCZL^XKM"
          sBm = ???
      in putStrLn . sBm

`sBm`

sBm_info has the normal stack and heap check, but, as it is a function, does some interesting things after them:

    mov     dword ptr [edi-20], offset sDJ_info
    mov     eax, [ebp+0]
    mov     [edi-12], eax
    mov     dword ptr [edi-8], offset sDK_info
    mov     eax, [esi+3]
    mov     [edi], eax
    mov     esi, offset base_GHCziBase_zd_closure
    lea     eax, [edi-20]
    mov     [ebp+0], eax
    lea     eax, [edi-8]
    mov     [ebp-4], eax
    add     ebp, -4
    jmp     stg_ap_pp_fast

sDJ and sDK are stored on the heap, with empty words adjacent to them, as we saw before in sBs. The application code also works as we saw before, applying GHC.Base.($) to sDK and sDJ. However, we are now using [ebp+0] as a valid stack slot to overwrite. This is because sBm was passed its argument in [ebp+0] and needs to return to [ebp+4]. Previously, we were only dealing with simple expressions, not functions, so [ebp+0] held to return address and could not be overridden, but now we have the extra space left for our argument. Now, just overwriting our argument would be bad, so before doing so we copy it to the heap, putting it at [edi-12].

After this, the one unexplained allocation is putting [esi+3] at [edi]. Crucial to this is that when a function is called, esi holds a pointer to the function itself (this is related to why stg_ap_??_fast takes the function to apply in esi). This is useful, because, going back to sBs, sBm was allocated on the heap just 4 bytes away from the pointer to sB8. The function location, plus 4 bytes, holds sB8, and sBm wants to reference that string. Now, we don't use [esi+4], instead using [esi+3]. This is because, as mentioned before, pointers are tagged, and pointers to functions hold the arity of that function in the tag bits. Therefore, esi is actually pointing at sBm plus 1 byte (since sBm is running, it must be fully evaluated), and [esi+3] is the correct location.

In conclusion, we now have:

:Main.main = runMainIO Main.main
Main.main = sBs $ sBr
sBs = let sB8 = "BPF_IF_PW^^VSNXAPCZL^XKM"
          sBm x = let sDJ = ???
                      heap_x = x
                      sDK = ???
                      heap_sB8 = sB8
                  in sDK $ sDJ
      in putStrLn . sBm

`sDJ`

Once again in sDJ we see the stack check and the update frame, but the rest is a little different from what we've seen before:

    mov     eax, [esi+8]
    mov     [ebp-12], eax
    mov     dword ptr [ebp-16], offset stg_ap_p_info
    mov     dword ptr [ebp-20], offset base_GHCziShow_zdfShowInteger_closure
    add     ebp, -20
    jmp     base_GHCziShow_show_info

Instead of using the generic apply mechanism, this code jumps directly to GHC.Show.show, the function in Haskell that produces human readable versions of values. Importantly, this function is generic, and can be applied to many different types. To implement these generic functions and tell show which type of object it is being passed and how to display it, GHC uses a typeclass dictionary. For every type that implements the type class (like an interface in other languages) Show, GHC generates a wrapper data structure that just contains every function that is part of Show. the function show is then just an instance accessor function - it takes a single argument, the dictionary, and returns the function that actually does the displaying of the type we care about.

Therefore, show, which we jump to, takes a single argument, $fShowInteger (this is the mangled name for the implementation of Show for the type Integer). show stores its return value (the actual function for displaying Integers) in esi, then jumps to the return address given to it, stg_ap_p_info. This leads to another interesting thing about sDJ. By putting stg_ap_p_info on the stack, we make it so that whatever show returns is then called on one more argument, stored just beyond it in [ebp-12]. Through this double-application process, we all in all display the integer stored at [ebp-12] and return that result.

Now, just like in sBm, we have a reference to esi, this time pulling from [esi+8]. Note that since we are running an thunk (unevaluated expression), not a function, we know we have no tag bits, and esi points directly at the heap location containing sDJ. Going back to sBm where sDJ was allocated, we find that [esi+8] refers to the argument of sBm, copied into the heap. This reference to sBm is then copied into [ebp-12], the place for the integer that we will display.

All together, we end up with:

:Main.main = runMainIO Main.main
Main.main = sBs $ sBr
sBs = let sB8 = "BPF_IF_PW^^VSNXAPCZL^XKM"
          sBm x = let sDJ = show heap_x
                      heap_x = x
                      sDK = ???
                      heap_sB8 = sB8
                  in sDK $ sDJ
      in putStrLn . sBm

`sDK`

For the first time, sDK introduces nothing new, and following through everything done in previous sections, we see that zipWith is called on two arguments, the first being a newly allocated sBh and the second being the previously allocated reference to sB8

Therefore, we have:

:Main.main = runMainIO Main.main
Main.main = sBs $ sBr
sBs = let sB8 = "BPF_IF_PW^^VSNXAPCZL^XKM"
          sBm x = let sDJ = show heap_x
                      heap_x = x
                      sDK = let sBh = ???
                            in zipWith sBh heap_sB8
                      heap_sB8 = sB8
                  in sDK $ sDJ
      in putStrLn . sBm

Going deeper into this rabbit hole is similarly straightforward (moreso, even, as there are no esi references), so we finish that up and result in:

:Main.main = runMainIO Main.main
Main.main = sBs $ sBr
sBs = let sB8 = "BPF_IF_PW^^VSNXAPCZL^XKM"
          sBm x = let sDJ = show heap_x
                      heap_x = x
                      sDK = let sBh = let sBf = let sBc = xor
                                                    sBd = fmap chr
                                                in sBd . sBc
                                      in sBf `on` ord
                            in zipWith sBh heap_sB8
                      heap_sB8 = sB8
                  in sDK $ sDJ
      in putStrLn . sBm

Inlining a bunch of stuff to simplify, we get that

:Main.main = runMainIO Main.main
Main.main = sBs $ sBr
sBs = putStrLn . (\x -> zipWith ((fmap chr . xor) `on` ord) "BPF_IF_PW^^VSNXAPCZL^XKM" $ show x)

Essentially, sBs takes a number, displays it (in base 10), and xors the ascii values of that decimal representation with the string "BPF_IF_PW^^VSNXAPCZL^XKM", one byte at a type, then finally it prints the result.

`sBr`

After all the stadard stuff (stack check, heap check, blackhole and CAF stuff, and update frame), we have something new and interesting.

    mov     dword ptr [edi-4], offset integerzmgmp_GHCziIntegerziType_Szh_con_info
    mov     dword ptr [edi], 120
    lea     eax, [edi-3]
    mov     [ebp-12], eax
    add     ebp, -12
    jmp     rlz_info

Here, following that patterns established previously, we are calling the function rlz, providing one argument, the pointer to [edi-3]. However, we don't know for certain how many arguments rlz expects. It looks like 1, since we only provide one, but it would be good to check. The answer comes right before the code of rlz_info. GHC stores various pieces of information about a function right next to the code, including the arity. The 16 bit integer 10 bytes before the code holds this number, and sure enough, 10 bytes before rlz_info, we see a 1.

Now we look at the argument passed to rlz. This, as previously done with functions, is misaligned, because it is pointing at an already evaluated expression. This time, however, we are looking at a data structure. GHC represents algebraic data types very simply, storing one word that contains a constructor, to distinguish between different variants, followed immediately afterwards by any arguments that constructor takes. In this case, we are dealing with an Integer. Since Integers are the default commonly used numeric type in Haskell, GHC (at least by default, when relying on GMP for big integer support), not wanting to go through the overhead of full arbitrary precision, defines an Integer variant, created by S#, that just wraps a machine word. This is used any time an integer fits inside a machine word, and results in a useful speedup. Here, we just see the construction of a literal Integer 120. Since S# is variant number 1, the tag on our pointer is also one.

Therefore, we have:

:Main.main = runMainIO Main.main
Main.main = sBs $ sBr
sBs = putStrLn . (\x -> zipWith ((fmap chr . xor) `on` ord) "BPF_IF_PW^^VSNXAPCZL^XKM" $ show x)
sBr = rlz 120

`rlz`

This function is the heart of the whole problem. It is mostly staightforward, but introduces one new thing:

    mov     dword ptr [edi-4], offset integerzmgmp_GHCziIntegerziType_Szh_con_info
    mov     dword ptr [edi], 1
    lea     eax, [edi-3]
    mov     [ebp-8], eax
    mov     eax, [ebp+0]
    mov     [ebp-12], eax
    mov     dword ptr [ebp-16], offset stg_ap_pp_info
    mov     dword ptr [ebp-20], offset integerzmgmp_GHCziIntegerziType_zdfEqInteger_closure
    mov     dword ptr [ebp-4], offset sBx_info
    add     ebp, -20
    jmp     ghczmprim_GHCziClasses_zeze_info

Following the patterns seen before, this calls == on its argument and the constant 1. However, there is no update frame following this. Instead, we see sBx_info, and the remaining stack slot ([ebp+0], where the argument was) is left empty. This means that when == returns, it will (like everything) put its result in esi, then jump into sBx_info, which will be responsible for finishing up anything that needs to be done. This is the general pattern we will see any time the return value of a function is scrutinized using a case expression.

`sBx`

This, as a case statement, has a start that we haven't seen at all yet:

    mov     eax, esi
    and     eax, 3
    cmp     eax, 2
    jnb     short loc_804C5A1

Remember that == put its result in esi before jumping to sBx. Therefore, esi holds an object of type Bool that we know to be evaluated. Since we know that it is evaluated, we can just check the tag bits, masking out all but the bottom 2 bits and doing a comparison. False is the first variant, so has tag value 1, while True has tag value 2. Therefore, this will jump to loc_804C5A1 if == returned True.

The False case, coming immediately after this prelude, is fairly straight forward, following all the patterns we have seen before. Note, however, that the argument to rlz is now stored at [ebp+4], since [ebp+0] is where sBx_info was stored. Going into sBu and sBv (which are identical), we find that the false case returns rlz (arg - 1) + rlz (arg - 1).

The True case, at the target of the prelude's jump, is something new. After a quick allocation an heap check, we have:

    mov     dword ptr [edi-4], offset integerzmgmp_GHCziIntegerziType_Szh_con_info
    mov     dword ptr [edi], 2
    lea     esi, [edi-3]
    add     ebp, 8
    jmp     dword ptr [ebp+0]

This allocates a literal integer 2 on the heap, stores a pointer to it in esi, and then jumps to a pointer further down the stack. This is exactly how we said that all functions will eventually return. This case is just returning 2.

All together, this results in the following:

:Main.main = runMainIO Main.main
Main.main = sBs $ sBr
sBs = putStrLn . (\x -> zipWith ((fmap chr . xor) `on` ord) "BPF_IF_PW^^VSNXAPCZL^XKM" $ show x)
sBr = rlz 120
rlz x = case x == 1 of
    False -> rlz (x - 1) + rlz (x - 1),
    True -> 2

Optimization and conclusion

Once we have this source code, getting the flag isn't too hard. rlz x is just 2 to the power of x, calculated stupidly slowly. A quick calculation or search on Wolfram Alpha gives that 2^120 is 1329227995784915872903807060280344576, which, when xored character by character in ascii values with "BPF_IF_PW^^VSNXAPCZL^XKM", gives the flag.

Flag

sctf{thinkingwiththunks}