Week 0 – Preparation

Author

Luiz Garcia

Published

November 11, 2025

DOI
Abstract

This project explores meta-heuristic optimization for large language models: a weightless neural network (RAM-net) serves as a fast surrogate so Simulated Annealing, Tabu Search, and Genetic Algorithms can search huge discrete spaces without calling the LLM for every candidate. This post launches the blog and lays out the research plan, timeline, and novelty argument.

Introduction

The LLM Optimizer will start small. The main idea is to leverage my previous experience with Weightless Neural Networks (WNNs; Ludermir et al. 1999), Simulated Annealing (SA; Kirkpatrick, Gelatt, and Vecchi 1983), Tabu Search (TS; Glover 1989), and Genetic Algorithms (GAs; Holland 1975) on optimization tasks, and apply it to a modern problem: Large Language Models (LLMs; Vaswani et al. 2017; Liu et al. 2021; OpenAI 2023).

That is how meta-heuristic prompt optimization came to be: use SA/TS/GA to discover high-quality prompts for a target task. A tiny RAM-net (a weightless neural network) is trained as a surrogate that quickly predicts how good a prompt will be, so the meta-heuristic can explore millions of candidates without calling the LLM each time.
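As a concrete sketch, the loop below runs simulated annealing over a toy prompt space scored by a stand-in surrogate. Everything here is illustrative: the slot vocabulary, the `surrogate_score` stand-in (a fixed toy preference instead of a trained RAM-net), and the cooling schedule are assumptions, not the project's actual implementation.

```python
import math
import random

random.seed(0)

# Toy candidate space: a prompt is a tuple of slot choices.
SLOTS = [
    ["Answer briefly:", "Think step by step:", "You are an expert."],
    ["Question: {q}", "Task: {q}"],
    ["Reply with one word.", "Explain your reasoning."],
]

TARGET = ("Think step by step:", "Task: {q}", "Explain your reasoning.")

def surrogate_score(prompt):
    """Stand-in for the RAM-net surrogate: a fixed toy preference.
    In the real pipeline this would be a trained weightless model."""
    return sum(a == b for a, b in zip(prompt, TARGET)) / len(SLOTS)

def neighbor(prompt):
    """SA neighborhood move: re-draw one slot at random."""
    i = random.randrange(len(SLOTS))
    p = list(prompt)
    p[i] = random.choice(SLOTS[i])
    return tuple(p)

def anneal(steps=2000, t0=1.0, cooling=0.995):
    cur = tuple(random.choice(s) for s in SLOTS)
    cur_score, t = surrogate_score(cur), t0
    best, best_score = cur, cur_score
    for _ in range(steps):
        cand = neighbor(cur)
        s = surrogate_score(cand)
        # Accept improvements always; worse moves with Boltzmann probability.
        if s >= cur_score or random.random() < math.exp((s - cur_score) / t):
            cur, cur_score = cand, s
        if cur_score > best_score:
            best, best_score = cur, cur_score
        t *= cooling
    return best, best_score

best, score = anneal()
```

The same skeleton generalizes to TS or GA by swapping the neighborhood/acceptance logic for tabu lists or crossover; the surrogate is the piece that keeps LLM calls out of the inner loop.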

Timeline

Week    Milestone
0       Blog launch and research plan
1‑3     Literature review and hypothesis
4‑8     Architecture design and simulation
9‑15    Prototype and baseline
16‑24   Experiments and ablations
25‑33   Paper writing

Novelty

Based on the literature, these are the load-bearing facts:

  1. Weightless / LUT-style layers are being revisited for energy-efficient inference and have been integrated into transformer/ViT variants (Quasi-Weightless / LUT-based layers). This shows feasibility of weightless modules inside transformer-style networks (Nag et al. 2024).
  2. Differentiable variants of weightless networks (DWN) exist, enabling gradient-based training of lookup-table models via surrogates. That provides an option to either keep discrete WNNs and evolve parameters, or use differentiable surrogates (Bacellar et al. 2024).
  3. Memory-augmented transformer families and associative-memory transformer variants (for long context) have been proposed — e.g., ARMT and other memory transformer works — showing the research appetite for better memory mechanisms. This confirms the area is active, but not dominated by WNN approaches yet (Rodkin et al. 2025).
  4. Neuroevolution / evolutionary NAS has been applied to transformer architectures to search over operations, attention variants, and hybrid blocks. This makes applying GA/TS/SA to discrete WNN addressing/routing a natural and novel crossover (Yang et al. 2023).
  5. Edge/energy papers and recent 2024–2025 WNN+Transformer/ViT work indicate momentum, but the space is still sparse. That leaves room for a focused contribution that (a) integrates a WNN as a KV-cache replacement, (b) uses neuroevolution for discrete addressing/routing, and (c) evaluates long-context and efficiency trade-offs. In short: people have tried LUT layers and memory modules, but the specific combination I can propose (WNN as KV cache + GA/TS/SA-optimized addressing + interpretability/energy evaluation) looks novel and publishable (Yang et al. 2023).

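To make the weightless-module idea concrete, here is a minimal WiSARD-style RAM discriminator in Python. This is a toy sketch of how LUT-based, gradient-free modules work, not the proposed KV-cache replacement; all names are illustrative.

```python
import random

random.seed(1)

# Minimal WiSARD-style weightless discriminator: each RAM node maps an
# n-bit tuple of the binarized input to a lookup-table entry. Training
# writes addresses; scoring counts how many nodes recognize the input.
# No weights, no gradients: all "learning" is LUT writes, which is what
# makes LUT-style layers attractive for low-energy inference.
class Discriminator:
    def __init__(self, input_bits, tuple_bits, mapping=None):
        assert input_bits % tuple_bits == 0
        # The input-to-tuple mapping is the discrete "addressing" that a
        # GA/TS/SA search could optimize.
        self.mapping = mapping or random.sample(range(input_bits), input_bits)
        self.tuple_bits = tuple_bits
        self.rams = [set() for _ in range(input_bits // tuple_bits)]

    def _addresses(self, bits):
        scrambled = [bits[i] for i in self.mapping]
        for n in range(len(self.rams)):
            chunk = scrambled[n * self.tuple_bits:(n + 1) * self.tuple_bits]
            yield n, int("".join(map(str, chunk)), 2)

    def train(self, bits):
        for n, addr in self._addresses(bits):
            self.rams[n].add(addr)

    def score(self, bits):
        return sum(addr in self.rams[n] for n, addr in self._addresses(bits))

d = Discriminator(input_bits=8, tuple_bits=2)
d.train([1, 1, 1, 1, 0, 0, 0, 0])
print(d.score([1, 1, 1, 1, 0, 0, 0, 0]))  # 4: every RAM node fires
```

One discriminator per class gives a classifier (argmax of scores); the same lookup-and-count primitive is what a WNN memory module inside a transformer would build on.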
Bottom line on novelty: prior work demonstrates the feasibility of weightless modules and memory-augmented transformers, and of neuroevolution for NAS. The exact combination, however (WNN modules substituting or supplementing the KV cache, plus evolutionary search for discrete addressing/routing, plus a detailed M-series energy/latency evaluation and interpretability analysis), appears to be a fresh contribution space with low prior saturation and good publishability.
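A sketch of the evolutionary side: a toy (1+λ) loop that evolves a discrete input-to-RAM mapping (here a permutation of bit positions) under elitist selection. The fitness is a stand-in (matches against a hypothetical ideal ordering); in the proposed work it would be the validation quality of the WNN module using that addressing.

```python
import random

random.seed(3)

N = 16
IDEAL = list(range(N))  # hypothetical best ordering, for the toy fitness

def fitness(perm):
    """Stand-in fitness: positions already matching the ideal mapping."""
    return sum(a == b for a, b in zip(perm, IDEAL))

# (1+λ) evolutionary loop: keep the parent, spawn λ mutants by swapping
# two positions, and keep the best of parent + children. Elitism (parent
# listed first, so ties keep it) guarantees monotone progress.
parent = random.sample(range(N), N)
for _ in range(400):
    children = []
    for _ in range(8):
        c = parent[:]
        i, j = random.sample(range(N), 2)
        c[i], c[j] = c[j], c[i]
        children.append(c)
    parent = max([parent] + children, key=fitness)
```

Swapping this inner loop for a population with crossover (GA), a tabu list over recent swaps (TS), or temperature-based acceptance (SA) covers the three meta-heuristics in the plan.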

References

Bacellar, Alan T. L., Zachary Susskind, Maurício Breternitz Jr., Eugene B. John, Lizy K. John, Priscila M. V. Lima, and Felipe M. G. França. 2024. “Differentiable Weightless Neural Networks.” https://doi.org/10.48550/arXiv.2410.11112.
Glover, F. 1989. “Tabu Search – Part I.” ORSA Journal on Computing 1 (3): 190–206. https://doi.org/10.1287/ijoc.1.3.190.
Holland, John H. 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press. https://archive.org/details/adaptationinnatu0000holl.
Kirkpatrick, S., C. D. Gelatt, and M. P. Vecchi. 1983. “Optimization by Simulated Annealing.” Science 220 (4598): 671–80. https://doi.org/10.1126/science.220.4598.671.
Liu, Pengfei, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2021. “Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.” arXiv Preprint. https://arxiv.org/abs/2107.13586.
Nag, Shashank, Alan T. L. Bacellar, Zachary Susskind, Anshul Jha, Logan Liberty, Aishwarya Sivakumar, Eugene B. John, et al. 2024. “Shrinking the Giant: Quasi-Weightless Transformers for Low Energy Inference.” https://doi.org/10.48550/arXiv.2411.01818.
OpenAI. 2023. “GPT‑4 Technical Report.” arXiv Preprint. https://arxiv.org/abs/2303.08774.
Rodkin, Ivan, Yuri Kuratov, Aydar Bulatov, and Mikhail Burtsev. 2025. “Associative Recurrent Memory Transformer.” https://doi.org/10.48550/arXiv.2407.04841.
Ludermir, Teresa Bernarda, Antônio P. Braga, André de Carvalho, and Marcílio C. P. de Souto. 1999. “Weightless Neural Models: A Review of Current and Past Works.” Neural Computing Surveys 2: 41–61. Berkeley, USA: ICSI. https://www.cin.ufpe.br/~tbl/artigos/vol2_2-ncs-1999.pdf.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” In Advances in Neural Information Processing Systems, 30:5998–6008. https://arxiv.org/abs/1706.03762.
Yang, Shangshang, Xiaoshan Yu, Ye Tian, Xueming Yan, Haiping Ma, and Xingyi Zhang. 2023. “Evolutionary Neural Architecture Search for Transformer in Knowledge Tracing.” https://doi.org/10.48550/arXiv.2310.01180.

Reuse

CC-BY-NC-SA-4.0