I have a working ALU V1 but it's not that easy to see how it works.   So I'm having another go.

An additional constraint from the architecture and LEDs and wiring page is that both inputs have to have LEDs, and the ALU output goes to a memory register which also shows its state.  This means that it doesn't matter too much if the input or output is inverted.

It's easy to see how diodes work: they pass current in one direction and not the other.   It's also easy to implement ASR, OR and AND with diodes in an easy-to-understand way (with a fairly low component count).   So the idea is to build a standard 2T memory cell where, instead of having one write and one feedback/sustain, there is one write per ALU function plus the normal feedback/sustain.

All inputs from the bus go through a transistor which both inverts the input and shows it on an LED.   So, in reality, the ALU has not A and not B available to it.

AND, OR and ASR will work with the negated values as input, so the input to the 2T memory cell is also negated.   That's fine, the other half will have the needed value, so it will be possible to use and show the correct value.  This only leaves the ADD/SUB/XOR chain, and the output of that can be easily inverted with the INV ALU input.  The only question is, can I get the right LED lights on the two XORs needed to compute ADD/SUB?
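To make the effect of the inverted inputs concrete, here is a small Python sketch of the logic only (not the circuit). The names MASK, inv and gates_on_inverted_inputs are made up for illustration, and the 16-bit width is taken from the ALU size mentioned further down. It checks De Morgan's laws: a gate fed with not A and not B produces the inverse of the opposite function of A and B.

```python
MASK = 0xFFFF  # assumed 16-bit word width


def inv(x: int) -> int:
    """Bitwise NOT limited to 16 bits, standing in for the input inverters."""
    return ~x & MASK


def gates_on_inverted_inputs(a: int, b: int) -> dict:
    """Apply AND and OR to the inverted inputs, as the ALU sees them."""
    na, nb = inv(a), inv(b)
    return {"and_of_inverted": na & nb, "or_of_inverted": na | nb}


a, b = 0x0F0F, 0x00FF
out = gates_on_inverted_inputs(a, b)

# De Morgan: (not A) AND (not B) == not (A OR B)
assert out["and_of_inverted"] == inv(a | b)
# De Morgan: (not A) OR (not B) == not (A AND B)
assert out["or_of_inverted"] == inv(a & b)
```

So the result written into the cell is inverted, and the true value is available from the other half of the cell, as described above.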

By far the slowest part of this ALU is the carry propagation mechanism for ADD/SUB, as we use a ripple-carry adder (there are more efficient adders but these are way out of scope).  This is because a one-bit change in a low-order bit can change all the high-order bits (consider adding 1 to 0xFFFF), which is a very long data path.   So that's what I've spent time on.
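The 0xFFFF + 1 case can be illustrated with a short Python model of a ripple-carry adder. This is a sketch of the arithmetic only, with a hypothetical function name (ripple_add); it also counts the longest run of bit positions a single carry travels through, which is what makes the data path long.

```python
def ripple_add(a: int, b: int, width: int = 16):
    """Add two words bit by bit.

    Returns (sum, carry_out, longest_chain), where longest_chain is the
    largest number of consecutive stages one carry passes through,
    counting the stage that generated it.
    """
    carry = 0
    total = 0
    chain = 0
    longest = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        if ai & bi:                # a new carry starts at this stage
            chain = 1
        elif (ai | bi) and carry:  # an incoming carry is passed on
            chain += 1
        else:                      # no carry leaves this stage
            chain = 0
        longest = max(longest, chain)
        carry = (ai & bi) | ((ai | bi) & carry)
    return total, carry, longest


# The worst case from the text: the carry runs through every stage.
print(ripple_add(0xFFFF, 1))  # (0, 1, 16)
```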
Just from the inputs, Ai and Bi, we can construct two signals at every bit position:
  • Gi = Ai·Bi is the Generate signal: if both Ai and Bi are set then we generate a carry at this position.
  • Pi = Ai + Bi is the Propagate signal: if either Ai or Bi is set then we will propagate the carry from Ci-1 to Ci. In practice we use the inverse of this, the Kill signal, Ki.
Now we can express the carry out as Ci = Gi + Pi·Ci-1, so a circuit is needed which implements this without the long times needed to switch off a transistor.   Diodes work much faster, especially Schottky diodes, which both switch fast and have a low voltage drop.
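As a sanity check on the carry recurrence, here is a minimal Python sketch that builds an adder from nothing but the Generate and Propagate signals and compares it with ordinary integer addition. The function name add_via_gp and the small width used in the check are my own choices for illustration.

```python
def add_via_gp(a: int, b: int, width: int = 16, c_in: int = 0) -> int:
    """Add using Ci = Gi OR (Pi AND Ci-1); the final carry is bit `width`."""
    carry = c_in
    result = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        g = ai & bi                       # Generate: both inputs set
        p = ai | bi                       # Propagate: either input set
        result |= (ai ^ bi ^ carry) << i  # sum bit uses the incoming carry
        carry = g | (p & carry)           # carry out of this position
    return result | (carry << width)


# Exhaustive check for a 4-bit adder against Python's own addition.
for a in range(16):
    for b in range(16):
        assert add_via_gp(a, b, width=4) == a + b
```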

Here is a (probably minimal) design where the ripple carry only goes through diodes:

The carry in signal can be killed by the transistor that takes it to ground.  The carry out signal is the OR of the generate and the (potentially killed) carry in.
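A single stage of this, written as logic rather than as diodes and a transistor, might look like the Python sketch below. It assumes the Kill signal is the inverse of Propagate, i.e. K = not (A or B), as stated earlier; the function name carry_stage is hypothetical. The check confirms that this matches the carry of a normal full adder (the majority of A, B and carry in).

```python
def carry_stage(a: int, b: int, c_in: int) -> int:
    """Carry out of one bit position using Generate and Kill."""
    g = a & b                # Generate: both inputs set
    k = 1 - (a | b)          # Kill: neither input set (assumed inverse of Propagate)
    surviving = c_in & (1 - k)  # carry in, pulled to ground when killed
    return g | surviving     # OR of generate and the surviving carry in


for a in (0, 1):
    for b in (0, 1):
        for c_in in (0, 1):
            majority = 1 if a + b + c_in >= 2 else 0
            assert carry_stage(a, b, c_in) == majority
```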

This is good but I worry about signal loss.  I see 0.83V drop across 5 diodes, so 2.49V drop for the full 16-bit ALU.   That's very marginal with a 3.3V power supply.  I have no problem adding an oscillator and a voltage doubler to up the voltage of Cin if need be.
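The voltage budget can be roughed out in a few lines of Python. This sketch assumes the drop scales linearly with the number of diodes in the chain, which is only an approximation since a real diode's forward drop depends on the current through it. The constant and function names are made up for illustration. Under that assumption, 2.49V corresponds to 15 diodes in series; a chain of 16 would come to about 2.66V.

```python
MEASURED_DROP_V = 0.83   # measured drop from the text
MEASURED_DIODES = 5      # number of diodes in that measurement
SUPPLY_V = 3.3           # supply voltage from the text


def chain_drop(n_diodes: int) -> float:
    """Estimated total drop across n diodes, assuming linear scaling."""
    return MEASURED_DROP_V / MEASURED_DIODES * n_diodes


def remaining(n_diodes: int) -> float:
    """Voltage left from the supply at the end of the chain."""
    return SUPPLY_V - chain_drop(n_diodes)


for n in (5, 15, 16):
    print(f"{n:2d} diodes: drop {chain_drop(n):.2f}V, remaining {remaining(n):.2f}V")
```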