Week 1 – First steps

Author

Luiz Garcia

Published

December 3, 2025

Abstract

Describing the baseline architecture, the repository, and the first toy experiments with results.

Introduction

After refreshing my knowledge of Weightless Neural Networks (WNNs) and learning more about Large Language Models (LLMs), I started developing a small RAM-based architecture and a toy project to test it out, create some traction, and build understanding.

Architecture

First, I started with a RAMNeuron, a RAMLayer, and a RAMAutomaton. Then, as I learned more about PyTorch and how much vectorization can improve things, I tweaked the standard RAM neurons a little.

So, today, the RAM neurons are actually implemented by a vectorized class called Memory, and RAMLayer is just a wrapper around Memory. Memory also has a MemoryVal IntEnum representing what it stores. Finally, the RAMAutomaton was replaced by a RAMTransformer, which is not yet a Transformer as known from the LLM perceptron-based architecture, but its goal is to become a weightless Transformer.
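The post names the Memory class and its MemoryVal IntEnum but does not show the implementation, so here is a minimal sketch of how such a vectorized bank of RAM neurons could look. The method names, enum members, and tensor layout are my assumptions, not the repository's actual code:

```python
from enum import IntEnum

import torch


class MemoryVal(IntEnum):
    """Possible contents of a RAM cell (member names are assumed)."""
    EMPTY = 0   # never written
    FALSE = 1   # taught to answer 0
    TRUE = 2    # taught to answer 1


class Memory:
    """Vectorized bank of RAM neurons: one lookup table per neuron."""

    def __init__(self, num_neurons: int, bits_per_neuron: int):
        # One row of 2**bits cells per neuron, all initially EMPTY.
        self.cells = torch.full(
            (num_neurons, 2 ** bits_per_neuron),
            int(MemoryVal.EMPTY),
            dtype=torch.int64,
        )
        # Powers of two used to turn each neuron's bit tuple into an address.
        self.weights = 2 ** torch.arange(bits_per_neuron - 1, -1, -1)

    def address(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_neurons, bits) binary tensor -> (num_neurons,) addresses.
        return (x * self.weights).sum(dim=1)

    def read(self, x: torch.Tensor) -> torch.Tensor:
        rows = torch.arange(self.cells.shape[0])
        return self.cells[rows, self.address(x)]

    def write(self, x: torch.Tensor, value: MemoryVal) -> None:
        rows = torch.arange(self.cells.shape[0])
        self.cells[rows, self.address(x)] = int(value)
```

The point of the single `cells` tensor is that all neurons in a layer are read and written with one advanced-indexing operation instead of a Python loop over individual RAMNeuron objects.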

The RAMTransformer has three layers, based on research I did during my undergrad years:

1. Input layer - linked directly to the input bits.
2. State layer - linked to its own previous output and to part of the input layer's output. This feedback provides context and lets the network learn over time.
3. Output layer - linked to the input layer and the state layer, providing the network output.
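The data flow of those three layers can be sketched as a single recurrent step. The layer callables below are placeholders for the actual RAM layers, and the function name is mine; the sketch only shows the wiring described above:

```python
import torch


def ram_transformer_step(x_bits, state, input_layer, state_layer, output_layer):
    """One time step of the three-layer wiring.

    Each *_layer argument is any callable mapping a binary tensor to a
    binary tensor (e.g. a RAM layer); these are stand-ins, not the
    repository's classes.
    """
    h_in = input_layer(x_bits)                      # 1. input layer: reads the raw bits
    # 2. state layer: sees its own previous output plus the input layer's output
    state = state_layer(torch.cat([state, h_in]))
    # 3. output layer: combines input-layer and state-layer outputs
    y = output_layer(torch.cat([h_in, state]))
    return y, state
```

Carrying `state` from one call to the next is what gives the network its window over time.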

Repository

The code is stored on GitHub, where a more detailed explanation of the architecture is laid out.

Toy problem

To test the architecture, I used a simple problem, simple enough to initially not even need a state layer.

The problem is the parity check.

So I defined three phases, where all hidden neurons were connected to the single neuron on the output layer (the parity check is a one-bit classifier, True or False for parity):

1. Testing the network with only the input layer (no state layer):

   a. x neurons with y connections each, where y was smaller than the number of input bits n, but x * y > n, so that every input bit was linked to at least one input neuron.

   b. 1 neuron with n connections, where n was the input size. The output layer was connected to this single hidden neuron.

2. Testing the network with input and state layers, so we can have a window. The limitation of #1 is that the larger the parity check, the larger the number of input neurons/bits. Having a state carrying the context of the previous parity checks allows the network to do a parity check on m bits at a time, enabling a parity check over an unlimited number of bits.
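The windowed scheme of phase 2 is easy to state in code: the state only needs to carry one bit, the running parity, so the input size no longer grows with the total length. A small reference implementation of the target function (mine, for illustration only):

```python
def parity(bits):
    """Parity of a bit sequence: 1 if the number of ones is odd."""
    return sum(bits) % 2


def windowed_parity(bits, m):
    """Parity computed m bits at a time, carrying one bit of state.

    This is the scheme phase 2 aims for: each step only sees m bits plus
    the previous state, yet the result equals the parity of the whole
    sequence, however long it is.
    """
    state = 0
    for i in range(0, len(bits), m):
        window = bits[i:i + m]
        state = (state + sum(window)) % 2
    return state
```

Since `windowed_parity` matches `parity` for any window size m, a network that learns the per-window update can check parity over arbitrarily many bits.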

Results

1.a. For parity checks up to 4 bits, 100% accuracy was common, if not the norm. Between 5 and 6 bits it became harder and harder: the network started to cluster some neurons at nearby Hamming distances, with 90% and 98-99% accuracy being common, but 100% rare. Above that, up to 12 bits, 92-99% was the norm, depending on the parameters. As this is a toy problem used as a proof of concept, I won't go further into analyzing those parameters, as that could be a complete paper in itself and out of focus with respect to what I want to achieve. It provided enough clarity for me to theorize 1.b.

1.b. One input neuron linked to all input bits, with the output neuron linked to that input neuron. Since the parity check is a deterministic problem, if the learning algorithm for the WNN was correct, this architecture should make the network converge to the answer. I tested parity checks from 1 to 12 bits, all at 100% accuracy, and fast. Then I jumped to parity checks of 20 and 25 bits, which also reached 100% accuracy, but took 6h and 27h respectively.

2. Phase 1.b showed me that the network and the algorithm were correct and could properly converge, and also their limitation, which brought me to the idea of a window. The state layer enables this window, but it requires an improvement to the learning algorithm, since training will now need to backpropagate through many iterations along the paths taken through the network and its hidden layers.
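The reason phase 1.b must converge, and the reason it gets slow, can both be seen in a toy version of a single n-input RAM neuron. This is not the repository's learning algorithm, just an illustration of the underlying mechanism:

```python
from itertools import product


def train_single_ram_neuron(n):
    """Fill a 2**n lookup table with the parity of each address.

    A RAM neuron with n connections sees every distinct input pattern at
    a distinct address, so a deterministic target like parity is simply
    memorized and the network converges to 100% accuracy. The catch is
    visible in the same line: the table (and the number of training
    examples needed to fill it) grows as 2**n, which is consistent with
    the sharp jump in training time from 12 to 20 and 25 bits.
    """
    table = {}
    for bits in product([0, 1], repeat=n):
        table[bits] = sum(bits) % 2  # teach the correct parity for this address
    return table


def predict(table, bits):
    """Read the memorized answer back out."""
    return table[tuple(bits)]
```

The windowed architecture of phase 2 exists precisely to replace this 2**n table with a small per-window table plus one bit of carried state.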

That is where I am now.

Reuse

CC-BY-NC-SA-4.0