arXiv preprint · MIT · ML research

NeuroLens

Adversarial ML toolkit with a cross-modal CLIP → ResNet transfer study.

ResNet-18 clean acc.: ≥93% (CIFAR-10 target)

CLIP-lite R@1: ≥60% (Flickr8k retrieval target)

Status: arXiv preprint (under review)

The problem

Adversarial robustness research has focused on unimodal classifiers, but CLIP-style vision-language models create a new attack surface. If perturbations crafted against a multimodal model transfer to a downstream unimodal classifier, deploying CLIP in a pipeline could silently compromise everything downstream.

Architecture

Models built from scratch in PyTorch (ResNet-18 on CIFAR-10, Transformer on AG News, CLIP-lite on Flickr8k; no pretrained torchvision models) →
optimize perturbation δ against the CLIP-lite contrastive loss →
apply δ to the standalone ResNet-18 →
measure transfer rate against a direct PGD attack as the upper bound →
evaluate defenses with certified guarantees as baselines →
track experiments in W&B →
write up as an arXiv preprint.
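
The perturbation step in this pipeline can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes an L∞ threat model, a symmetric InfoNCE-style contrastive loss, and illustrative values for `eps`, `alpha`, `steps`, and `temperature`. The tiny linear `image_encoder` and the random `txt_emb` are hypothetical stand-ins for CLIP-lite's image tower and its text embeddings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    # The i-th image is paired with the i-th caption.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)


def pgd_contrastive(image_encoder, images, txt_emb,
                    eps=8 / 255, alpha=2 / 255, steps=10):
    """L-inf PGD that maximizes the contrastive loss with respect to the images."""
    # Random start inside the eps-ball, kept within the valid pixel range.
    delta = torch.empty_like(images).uniform_(-eps, eps)
    delta = (images + delta).clamp(0, 1) - images
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = contrastive_loss(image_encoder(images + delta), txt_emb)
        (grad,) = torch.autograd.grad(loss, delta)
        # Signed gradient ascent step, then project back onto the eps-ball
        # and onto the [0, 1] pixel range.
        delta = (delta.detach() + alpha * grad.sign()).clamp(-eps, eps)
        delta = (images + delta).clamp(0, 1) - images
    return delta.detach()


# Hypothetical stand-ins for the CLIP-lite image tower and text embeddings.
torch.manual_seed(0)
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
images = torch.rand(8, 3, 32, 32)
txt_emb = torch.randn(8, 64)

delta = pgd_contrastive(image_encoder, images, txt_emb)
clean_loss = contrastive_loss(image_encoder(images), txt_emb).item()
adv_loss = contrastive_loss(image_encoder(images + delta), txt_emb).item()
print(f"contrastive loss: clean={clean_loss:.3f}, adversarial={adv_loss:.3f}")
```

The returned `delta` is what would then be added to the same images before they are passed to the standalone ResNet-18, with no further optimization against that classifier.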

Key decisions

DECISION | CHOICE | WHY

Implementation | Every model and attack from scratch | Reading a paper is not understanding it. Implementing forces every assumption and hyperparameter into the open.

Attack target | CLIP-lite joint embedding space | CLIP's joint embedding is the bridge between modalities; attacking the bridge corrupts both sides simultaneously.

Hypothesis | Multimodal → unimodal transfer | Both models share similar low-level features; if perturbations transfer, pipeline security has to account for it.
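
One way the transfer hypothesis could be quantified is sketched below. The definitions are assumptions rather than the project's stated metric: attack success is counted only over examples the classifier gets right on clean inputs, and the transferred attack is reported relative to a direct PGD attack on the same classifier. The function names and the toy predictions are hypothetical.

```python
def attack_success_rate(clean_preds, adv_preds, labels):
    """Fraction of originally correct examples that the perturbation flips to a wrong label."""
    correct = [i for i, (p, y) in enumerate(zip(clean_preds, labels)) if p == y]
    if not correct:
        return 0.0
    flipped = sum(1 for i in correct if adv_preds[i] != labels[i])
    return flipped / len(correct)


def relative_transfer(transfer_asr, direct_asr):
    """Transferred attack success as a fraction of the direct-attack upper bound."""
    return transfer_asr / direct_asr if direct_asr > 0 else 0.0


# Toy predictions for ten examples (illustrative values, not measured results).
labels         = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
clean_preds    = [0, 1, 2, 3, 4, 5, 6, 7, 0, 0]  # 8 of 10 correct on clean inputs
transfer_preds = [0, 1, 2, 3, 4, 5, 1, 2, 0, 0]  # perturbation crafted on the multimodal model
direct_preds   = [0, 1, 2, 3, 9, 9, 1, 2, 0, 0]  # PGD run directly on the classifier

transfer_asr = attack_success_rate(clean_preds, transfer_preds, labels)
direct_asr = attack_success_rate(clean_preds, direct_preds, labels)
ratio = relative_transfer(transfer_asr, direct_asr)
print(transfer_asr, direct_asr, ratio)  # 0.25 0.5 0.5
```

Restricting the count to clean-correct examples keeps ordinary classifier errors from being credited to the attack.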

Python · PyTorch · FGSM · PGD · CLIP-lite · ResNet-18