I spent the better part of a month teaching a quantum computer to classify microscopic lake creatures after crushing each image into a 4x4 grayscale grid. The full sevenâphase investigation shows how a compressed quantum model behaves, how it compares to a parameterâmatched classical baseline, and how circuit structure maps to learning dynamics. This post combines the core results with the lessons I learned along the wayâwhat held up, what didnât, and where this research might go next.
The Keyhole Problem: Input Compression Shapes Everything
Quantum machine learning is often presented through glossy benchmarks and broad claims, but nearly all of those results ride on the same hidden assumption: you can feed the circuit enough information for it to learn something real.
In this project, you canât. The 4x4 encoding exists because simulating 17 qubits on a classical machine is already expensive. The bottleneck is not theoretical capacity; it is simulation. A 16âpixel input is a keyhole that narrows every inference you draw. That keyhole shapes the entire story.
The core question becomes: if you force both quantum and classical models into the same brutal information bottleneck, what happens?
From MNIST Reproduction to Plankton Classification
The repository progresses in phases, moving from reproduction to controlled comparison and then to interpretability and circuitâlevel analysis.
Phase 1: MNIST Reproduction
Recreate a published quantum MNIST demo to validate the stack.
Phase 2: Binary Quantum Classification
Plankton images are downsampled to 4x4, flattened to 16 features, and angleâencoded via (Ry(\pi x_i)) rotations. A TensorFlow Quantum pipeline trains a binary classifier on plankton pairs using 5âfold stratified crossâvalidation with bootstrap confidence intervals.
The initial result: 38.44% mean accuracy, with a 95% CI of ([35.49\%, 42.50\%]). On a binary task. That is worse than a coin flip.
That wasnât a failure of quantum circuits; it was a failure of configuration. The defaults were wrong.
Phase 3: Hyperparameter Optimization
Nested crossâvalidation tuned the configuration. The best model used:
- Angle encoding
- One PQC layer
- Learning rate 0.01
- Batch size 16
- Hinge loss
Under this setup, the QNN surpassed 60% on multiple pairs, and hit 95.3% on diaphanosoma vs diatom_chain. Same architecture. Same data. Different choices. Bad results did not mean the approach was deadâonly that it wasnât tuned yet.
Phase 4: Classical Comparison
This is the scientific center. Instead of comparing against modern CNNs (which would be absurd at 4x4), the QNN is matched against a classical model with the same compressed inputs and similar parameter budgets. We tested 25 binary plankton pairs, equalized sample counts, and used perâpair metrics plus aggregate inference.
Aggregate results across pairs:
- Mean delta (QNN â classical): +6.23%
- Cohenâs d: 0.705
- Wilcoxon p: 0.0007
- QNN wins: 20 / 25
Under the compression regime, the QNN was competitive and often slightly better.
Phase 5: MultiâClass Scaling
Both models degrade as the number of categories increases. The QNN edges ahead at (k = 2), but the classical model overtakes by (k = 5). The crossover happens around (k = 3). This is the most honest outcome: the circuitâs advantage appears only in narrow, lowâclass regimes under severe compression.
Phases 6â7: Saliency, Expressibility, Entanglement
- Saliency maps show the QNN attends to localized morphology even at
4x4. - Expressibility and entanglement increase with depth, but deeper circuits overfit the tiny input. One layer was the right inductive bias.
A Calculus Studentâs Guide to PQC (via Groverâs Search)
To understand how a âQuantum Neural Networkâ works, you first need to understand Groverâs Search.
The Primer: Groverâs as Geometric Rotation
In calculus, youâre used to functions (f(x)) that map numbers to numbers. In quantum, we map vectors to vectors.
Imagine a 16-dimensional space (for our 4x4 grid). Every possible image is a unit vector in this space. Groverâs Algorithm is a fixed sequence of rotations. You start with a âuniformâ vector (pointing equally toward all possibilities) and you apply a âReflectionâ and a âRotation.â
- The Oracle: Flips the sign of the âcorrectâ vector (Reflection).
- The Diffusion: Rotates the entire state toward that flipped vector.
After (\approx \sqrt{N}) steps, the vector points almost perfectly at the marked item. Groverâs is a hardâcoded geometric search. Itâs like a compass that is preâprogrammed to find North.
From Grover to PQC: The Learnable Compass
A Parametric Quantum Circuit (PQC) is Groverâs Search with adjustable rotation angles. Instead of a fixed compass, you have knobs (\theta_1, \theta_2, \ldots, \theta_n) that change the circuitâs behavior.
We define a function (f(\theta)) where the output is the expectation value after many measurements:
[ f(\theta) = \langle \psi | U(\theta)^\dagger M U(\theta) | \psi \rangle ]
For a calculus student, this is just a composite multivariable function:
- Input: 16 pixel intensities (initial rotations).
- Layers: A series of rotation matrices (R(\theta_i)).
- Output: A scalar between (-1) and (1).
Our goal is to find the (\theta) that minimizes a loss function (L(f(\theta))). For that, we need the gradient (\nabla f).
The ParameterâShift Rule: Quantum Calculus
You canât directly inspect the middle of a quantum circuit without collapsing the state. Instead, you use the parameterâshift rule. For many gates, the derivative of the expectation value is exactly:
[ \frac{\partial f}{\partial \theta_i} = \frac{1}{2} \left( f\left(\theta_i + \frac{\pi}{2}\right) - f\left(\theta_i - \frac{\pi}{2}\right) \right) ]
This looks like the difference quotient you learned in calculus, except itâs not an approximation. Itâs exact. Run the circuit twiceâshifted forward and backwardâand you get the precise slope. That is how the QNN learns.
Mapping the Math to the Source Code
If you look at phase2/binary_quantum_classifier.py, you can see exactly where the calculus meets the qubits.
- The Knobs (
sympy.Symbol)
symbol = sympy.Symbol(prefix + "-" + str(i))
circuit.append(gate(qubit, self.readout) ** symbol)
These symbols are the (\theta) variables. They define the degrees of freedom that the optimizer twists to minimize error.
- The Geometric Transformation (
XX,ZZ,RX,RY)
The create_quantum_model function defines the structure of the circuit using entangling gates (XX, ZZ) and rotation gates (RX, RY). Together they form a trainable landscape the model walks through during optimization.
- The Readout Preparation (
X -> H)
circuit.append(cirq.X(readout))
circuit.append(cirq.H(readout))
This prepares the readout qubit in a specific state. A final H at the end turns phase information into a measurable probability.
- The Bridge (
tfq.layers.PQC)
tfq.layers.PQC(model_circuit, model_readout)
This layer implements the parameterâshift rule under the hood. When the optimizer asks for gradients, TFQ runs shifted circuits, computes exact derivatives, and passes them back into the classical training loop.
How the Quantum Model Is Built
At the architectural level, the classifier is intentionally simple but scientifically motivated.
- Classical preprocessing
- Start from
16x16grayscale plankton images. - Downsample to
4x4. - Apply minâmax normalization.
- Flatten to a 16âdimensional feature vector.
- Start from
- Angle encoding and entanglement
- Encode each normalized pixel intensity (x_i) as an (Ry(\pi x_i)) rotation.
- Use a linear chain of
CZgates to capture spatial correlations. - Add parameterized
XX,ZZ, andYYinteractions tying data qubits to a readout qubit.
- Readout and loss
- Measure the readout qubit in the (Z) basis and interpret the expectation value as a continuous score.
- Optimize with hinge loss, which matches the ([-1, 1]) output range and works naturally with binary labels.
This structure runs on a classical simulator in TensorFlow Quantum. The later expressibility and entanglement analyses use the same production circuit so the performance numbers and circuit metrics align.
Statistical Design, Power, and Reproducibility
The project treats experimental design as part of the experiment.
CrossâValidation and Sampling
- Phases 2, 4, 6 use stratified 5âfold crossâvalidation with bootstrap 95% confidence intervals.
- Phases 3, 5 use nested crossâvalidation (5 outer, 3 inner folds).
- The
Q_SAMPLESparameter enforces equalized sample budgets across models.
Majorityâclass and random baselines are reported so every accuracy number has a meaningful reference.
Statistical Testing and Power Analysis
Perâpair tests are underpowered at (n = 5) folds, so the primary inference aggregates across the 25 class pairs. A dedicated utils/power_analysis.py script explores power for the Wilcoxon signedârank test at the observed effect size ((d \approx 0.65)), showing the design reaches about 88% power with 25 pairs.
Reproducibility Infrastructure
Reproducibility is supported by deterministic file ordering, consistent seeding, pinned dependencies in the Dockerfile, and an 82âtest verification suite that runs at build time.
Why Deeper Was Worse
I expected deeper circuits to help. The expressibility analysis showed more depth increased entanglement and expanded the reachable state space. But on 16 features, that capacity was wasted; it fitted noise. One layer, 32 parameters, was already generous. The circuit should match the information content, not your ambition.
The Statistics Nearly Killed the Story
Most quantum ML papers report a single split and claim advantage. This project did the opposite: validation, power analysis, baseline comparisons, and perâpair statistical reporting.
The perâpair Wilcoxon tests were underpowered with 5 folds; the minimum achievable pâvalue is 0.0625. If you only looked at perâpair stats, you would conclude no difference.
The correct inference treats pairs as replication units. That is where the signal appears. Itâs a cautionary lesson: the design is the experiment.
Practical Reality: TFQ Only Works in Docker
TensorFlow Quantum depends on a fragile constellation of pinned versions (tensorflow==2.7.0, cirq==0.13.1, sympy==1.8, numpy==1.21.6). It does not install cleanly on modern machines without Docker. On Apple Silicon it runs under AMD64 emulation and needs thermal pacing to survive long runs.
So reproducibility lives in the Dockerfile, not the README. The image runs an 82âtest suite at build time. If the tests fail, the image doesnât build. That is the only reproducibility that matters.
What the Project Actually Tells Us
This is not quantum advantage. The comparison is against a deliberately hobbled classical model under severe compression. A ResNet at full resolution would obliterate both.
What it does show is how parameterized quantum circuits behave under extreme information bottlenecks:
- They can extract structure when tuned carefully.
- They compete in lowâclass regimes under compression.
- Their inductive bias changes with depth, and overâcapacity appears quickly.
- They are interpretable via standard gradient tools.
That is scientifically useful, even if it is not headlineâworthy.
Where This Goes Next
Hardware Changes the Game
On real quantum hardware, 17 qubits is trivial. With 64 or 128 qubits, you can encode real spatial structure. The keyhole widens. The central question becomes: does the QNN advantage at (k=2) persist when compression is relaxed?
Better Circuit Design
The circuits here are first drafts. Quantum conv nets, data reâuploading, attentionâlike entanglement, and deeper inductive biases are all open directions. The design space is huge and barely explored.
Generative Quantum Models
Discriminative classification is only one angle. Quantum generative models could synthesize new examples for rare species. Thatâs where quantum sampling could matter.
Hybrid Pipelines
One promising path: use the quantum circuit as a feature extractor feeding a classical head. Let the quantum model do lowâdimensional feature interactions; let the classical model handle multiâclass scaling.
Closing
I built a sevenâphase experimental pipeline, a power analysis framework, and a reproducible Docker stack to classify plankton at 4x4 resolution. The QNN won 20 of 25 binary comparisons. It lost the multiâclass scaling contest. It produced interpretable saliency maps. It ran on a noiseless simulator because hardware isnât ready.
Is this quantum advantage? No. Is it scientifically informative? Yes.
The future of quantum ML is not a single paper. Itâs slow, careful accumulationâcircuit by circuit, dataset by datasetâuntil the hardware catches up and we find out what the exponential promise actually buys.
Iâm betting it buys something. But Iâm keeping my classical baselines close.