# Week 0 – Preparation

A short description of what this project is about, its goals, and the workflow I plan to follow.
## Introduction
The LLM Optimizer will start small. The main idea is to leverage my previous experience with Weightless Neural Networks (WNNs; Ludermir and Souto 1999), Simulated Annealing (SA; Kirkpatrick, Gelatt, and Vecchi 1983), Tabu Search (TS; Glover 1989), and Genetic Algorithms (GA; Holland 1975) for optimization tasks and apply it to a modern problem: Large Language Models (LLMs; Vaswani et al. 2017; Liu et al. 2021; OpenAI 2023).
That is how meta-heuristic prompt optimization came to be: use SA, TS, and GA to discover high-quality prompts for a target task. A tiny RAM-net (a weightless neural network) is trained as a surrogate that quickly predicts how good a prompt will be, so the meta-heuristic can explore millions of candidates without calling the LLM for each one.
## Timeline
| Week | Milestone |
|---|---|
| 0 | Blog launch and research plan |
| 1‑3 | Literature and hypothesis |
| 4‑8 | Architecture design and simulation |
| 9‑15 | Prototype and baseline |
| 16‑24 | Experiments and ablations |
| 25‑33 | Writing paper |
## Novelty
Based on the literature, these are the load-bearing facts:
- Weightless / LUT-style layers are being revisited for energy-efficient inference and have been integrated into transformer/ViT variants (Quasi-Weightless / LUT-based layers). This shows feasibility of weightless modules inside transformer-style networks (Nag et al. 2024).
- Differentiable variants of weightless networks (DWN) exist, enabling gradient-based training of lookup-table models via surrogates. That provides an option to either keep discrete WNNs and evolve parameters, or use differentiable surrogates (Bacellar et al. 2024).
- Memory-augmented transformer families and associative-memory transformer variants (for long context) have been proposed, e.g., ARMT and other memory-transformer works, showing the research appetite for better memory mechanisms. This confirms the area is active but not yet dominated by WNN approaches (Rodkin et al. 2025).
- Neuroevolution / evolutionary NAS has been applied to transformer architectures for locating operations, attention variants, and hybrid operations. This means applying GA/TS/SA to discrete WNN addressing/routing is a natural and novel cross-over (Yang et al. 2023).
- Edge/energy papers and recent 2024–2025 WNN+Transformer/ViT works indicate momentum but show the space is still sparse. There is room for a focused contribution that (a) integrates a WNN as a KV-cache replacement, (b) uses neuroevolution for discrete addressing/routing, and (c) evaluates long-context and efficiency trade-offs. In short: people have tried LUT layers and memory modules, but the specific combination I can propose (WNN as KV cache, GA/TS/SA-optimized addressing, and interpretability/energy evaluation) looks novel and publishable (Yang et al. 2023).
Bottom line on novelty: prior work demonstrates the feasibility of weightless modules inside transformers, of memory-augmented transformers, and of neuroevolution for NAS. However, the exact combination (WNN modules substituting or supplementing the KV cache, evolutionary search for discrete addressing/routing, and a detailed M-series energy/latency and interpretability evaluation) appears to be a fresh contribution space with low prior saturation and good publishability.