Weightless Neural Networks for Language Modeling
This research explores whether Weightless Neural Networks (WNNs), specifically RAM-based neurons, can serve as a foundation for language modeling, a task currently dominated by weighted transformer architectures.
Current Focus
Building a complete RAM-based language model pipeline:
- BitwiseRAMLM: 16-cluster per-bit output architecture with log-product token reconstruction
- Connectivity optimization: Genetic Algorithm (Holland 1975; Goldberg 1989) + Tabu Search (Glover 1989, 1990) over neuron connectivity patterns
- Rust+Metal acceleration: Full pipeline on Apple Silicon (16 CPU + 40 GPU cores)
- Gating mechanisms: Content-based filtering inspired by DeepSeek’s Engram (DeepSeek-AI and Wenfeng 2026) architecture
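The per-bit output scheme above can be sketched as follows. This is an illustrative assumption of how "log-product token reconstruction" might work, not the repository's actual API: each of the 16 bit-clusters emits a probability that one bit of the token id is 1, and a token's score is the sum of its per-bit log-probabilities (equivalently, the log of the product of the bit probabilities).

```rust
// Hypothetical sketch: 16 bit-clusters each output p_b = P(bit b = 1)
// for one bit of the 16-bit token id; a token's score is the sum of
// per-bit log-probabilities. Function names are illustrative.
fn token_log_score(bit_probs: &[f64; 16], token_id: u32) -> f64 {
    let eps = 1e-12; // guard against log(0)
    (0..16)
        .map(|b| {
            let p = bit_probs[b].clamp(eps, 1.0 - eps);
            // Use p if this bit of the token id is 1, else 1 - p.
            if (token_id >> b) & 1 == 1 { p.ln() } else { (1.0 - p).ln() }
        })
        .sum()
}

// Reconstruct a token by picking the id with the highest log-product score.
fn argmax_token(bit_probs: &[f64; 16], vocab_size: u32) -> u32 {
    (0..vocab_size)
        .max_by(|&a, &b| {
            token_log_score(bit_probs, a)
                .partial_cmp(&token_log_score(bit_probs, b))
                .unwrap()
        })
        .unwrap()
}

fn main() {
    // If every bit-cluster strongly favors 1, the best id is all ones.
    let probs = [0.9_f64; 16];
    let best = argmax_token(&probs, 1 << 16);
    println!("{}", best); // 65535
}
```

With a 50,257-token vocabulary, 16 bits are sufficient (2^16 = 65,536), which is why 16 clusters cover the full GPT-2 token space.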
Key Results
All models are evaluated on WikiText-2 with the GPT-2 tokenizer (50,257-token vocabulary). CE = cross-entropy in nats, PPL = perplexity (exp of CE), Acc = top-1 next-token accuracy.
| Architecture | CE | PPL | Acc | Details |
|---|---|---|---|---|
| Random baseline | 10.82 | 50,257 | 0.002% | Uniform prediction (methodology) |
| Tiered RAMLM (50K clusters) | ~10.20 | ~27,000 | ~4.9% | 5-tier, EMPTY=0.0 (results) |
| BitwiseRAMLM (16 clusters) | ~9.11 | ~9,000 | ~6.4% | Per-bit prediction (results) |
| *Target: GPT-2 (Radford et al. 2019)* | | | | |
| GPT-2 Small (124M) | 3.38 | 29.41 | – | Zero-shot |
| GPT-2 Medium (355M) | 3.12 | 22.76 | – | Zero-shot |
| GPT-2 Large (774M) | 2.99 | 19.93 | – | Zero-shot |
| GPT-2 XL (1.5B) | 2.91 | 18.34 | – | Zero-shot |
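The CE and PPL columns are related by PPL = exp(CE) when cross-entropy is measured in nats, which the random-baseline row confirms: ln(50,257) ≈ 10.82. A minimal sanity check of that relation against the table's figures:

```rust
// Verify PPL = exp(CE) against the table: the uniform baseline's CE is
// ln(vocab_size), and its PPL is the vocabulary size itself.
fn main() {
    let vocab = 50257.0_f64;
    let uniform_ce = vocab.ln();
    println!("{:.2}", uniform_ce);       // 10.82 (random-baseline CE)
    println!("{:.0}", uniform_ce.exp()); // 50257 (random-baseline PPL)
    // GPT-2 Small: exp(3.38) ≈ 29.4, consistent with its reported PPL.
    println!("{:.1}", 3.38_f64.exp());
}
```

The small gap between exp(3.38) and the reported 29.41 is rounding in the two-decimal CE figure.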
Weekly Updates
Follow the research progress in the Blog.
Repository
All code is at github.com/lacg/wnn.
References
DeepSeek-AI, and Liang Wenfeng. 2026. “Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models.” https://arxiv.org/abs/2601.07372.
Glover, F. 1989. “Tabu Search – Part I.” ORSA Journal on Computing 1 (3): 190–206. https://doi.org/10.1287/ijoc.1.3.190.
———. 1990. “Tabu Search – Part II.” ORSA Journal on Computing 2 (1): 4–32. https://doi.org/10.1287/ijoc.2.1.4.
Goldberg, David E. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley. https://archive.org/details/geneticalgorithm0000gold.
Holland, John H. 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press. https://archive.org/details/adaptationinnatu0000holl.
Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. “Language Models Are Unsupervised Multitask Learners.” OpenAI. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.