Vision Language Models are Biased
Large language models (LLMs) memorize a vast amount of prior knowledge from the Internet that helps them on downstream tasks...
Multi-head self-attention (MHSA) is a key component of Transformers, a widely popular architecture in both language and vision. Multiple heads...
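As a refresher on the mechanism this work analyzes, the sketch below is a minimal NumPy forward pass of multi-head self-attention (random weights, a single sequence, no masking or dropout); it follows the standard Transformer formulation and is not code from the paper.

```python
import numpy as np

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Minimal multi-head self-attention forward pass (no masking, no dropout).

    X:  (seq_len, d_model) input token embeddings.
    Wq, Wk, Wv: (d_model, d_model) projection matrices, split across heads.
    Wo: (d_model, d_model) output projection.
    """
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    # Project inputs to queries, keys, values and split them into heads.
    Q = (X @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    K = (X @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    V = (X @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    # Scaled dot-product attention, computed independently per head.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)       # (heads, seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)                  # softmax over keys
    heads = weights @ V                                         # (heads, seq, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 16, 5, 4
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
print(multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads).shape)  # (5, 16)
```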
Detecting object-level changes between two images across possibly different views is a core task in many applications that involve visual...
Generative AI (GenAI) holds significant promise for automating everyday image editing tasks, especially following the recent release of GPT-4o on...
Most face identification approaches employ a Siamese neural network to compare two images at the image embedding level. Yet, this...
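For illustration, here is a minimal sketch of the image-embedding comparison that a Siamese pipeline performs; the `embed` function and the 0.5 threshold are hypothetical stand-ins, not the system studied in the paper.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(embed, img_a, img_b, threshold=0.5):
    """Decide whether two face crops show the same identity by comparing
    their image-level embeddings, as a typical Siamese pipeline does.

    `embed` is any function mapping an image array to a feature vector
    (e.g., the shared backbone of a Siamese network); the threshold is a
    placeholder that would normally be tuned on a validation set.
    """
    return cosine_similarity(embed(img_a), embed(img_b)) >= threshold

# Toy "embedder" standing in for a trained backbone.
embed = lambda img: img.reshape(-1)[:128].astype(float)
a = np.random.rand(112, 112, 3)
print(same_person(embed, a, a))  # True: identical images give similarity 1.0
```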
Nearest neighbors (NN) are traditionally used to compute final decisions, e.g., in Support Vector Machines or k-NN classifiers, and to...
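A bare-bones example of the k-NN decision rule mentioned here, on toy 2-D data; it is only meant to make the "nearest neighbors compute the final decision" step concrete, not to reproduce any experiment from the paper.

```python
import numpy as np

def knn_predict(query, train_X, train_y, k=3):
    """Classify `query` by majority vote over its k nearest training
    examples in Euclidean distance -- the classic k-NN decision rule."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy 2-D example: two clusters, one per class.
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
print(knn_predict(np.array([0.05, 0.1]), train_X, train_y, k=3))  # 0
```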
While large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini-1.5 Pro, are powering various image-text applications and scoring...
CLIP-based classifiers rely on the prompt containing a {class name} that is known to the text encoder...
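To make this prompt-based setup concrete, here is a minimal zero-shot CLIP classification sketch using the Hugging Face transformers wrappers; the checkpoint, image path, and class names are placeholders, and this shows the standard CLIP recipe rather than the method proposed in the paper.

```python
# pip install transformers torch pillow
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# The prompt embeds the {class name}; the text encoder must "know" these names.
class_names = ["tabby cat", "golden retriever"]
prompts = [f"a photo of a {name}" for name in class_names]

image = Image.open("example.jpg")  # any local test image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Image-text similarity scores, softmaxed into per-class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
for name, p in zip(class_names, probs[0].tolist()):
    print(f"{name}: {p:.3f}")
```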
Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities, such as visual inputs....
Image classifiers are information-discarding machines, by design. Yet, how these models discard information remains mysterious. We hypothesize that one way...
Face identification (FI) is ubiquitous and drives many high-stakes decisions made by law enforcement. State-of-the-art FI approaches compare two images...
Large-scale, multimodal models trained on web data such as OpenAI’s CLIP are becoming the foundation of many applications. Yet, they...
Explaining artificial intelligence (AI) predictions is increasingly important and even imperative in many high-stakes applications where humans are the ultimate...
Three important criteria of existing convolutional neural networks (CNNs) are (1) test-set accuracy; (2) out-of-distribution accuracy; and (3) explainability. While...
Recent research in adversarially robust classifiers suggests their representations tend to be aligned with human perception, which makes them attractive...
Explaining the decisions of an Artificial Intelligence (AI) model is increasingly critical in many real-world, high-stakes applications. Hundreds of papers...
Interpretability methods often measure the contribution of an input feature to an image classifier’s decisions by heuristically removing it via...
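As a concrete instance of such heuristic removal, the sketch below grays out one patch at a time and records the drop in the classifier's confidence (occlusion-style attribution); the toy classifier and patch size are placeholders, not the setup evaluated in the paper.

```python
import numpy as np

def occlusion_importance(predict, image, patch=16, fill=0.5):
    """Heuristic feature-removal attribution: gray out one patch at a time
    and record how much the classifier's top-class confidence drops.

    `predict` maps an (H, W, C) float image in [0, 1] to class probabilities;
    here it is any callable standing in for a trained classifier.
    """
    base = predict(image)
    target = int(np.argmax(base))
    h, w = image.shape[:2]
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, h - h % patch, patch):
        for j in range(0, w - w % patch, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill  # "remove" the patch
            heatmap[i // patch, j // patch] = base[target] - predict(occluded)[target]
    return heatmap  # higher value = larger confidence drop when the patch is removed

# Toy classifier: "confidence" proportional to the mean brightness of the top-left corner.
toy_predict = lambda img: np.array([img[:16, :16].mean(), 1 - img[:16, :16].mean()])
img = np.random.rand(64, 64, 3)
print(occlusion_importance(toy_predict, img).round(2))
```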
Adversarial training has been the topic of dozens of studies and a leading method for...
Attribution methods can provide powerful insights into the reasons for a classifier’s decision. We argue that a...
Despite excellent performance on stationary test sets, deep neural networks (DNNs) can fail to generalize to out-of-distribution (OoD) inputs, including...
Large, pre-trained generative models have been increasingly popular and useful to both the research and wider communities. Specifically, BigGANs a...
Motion-sensor cameras in natural habitats offer the opportunity to inexpensively and unobtrusively gather vast amounts of data on animals in...
Training deep neural networks on images represented as grids of pixels has brought to light an interesting phenomenon known as...
Generating high-resolution, photo-realistic images has been a long-standing goal in machine learning. Recently, Nguyen et al. (2016) showed one interesting...
Deep neural networks (DNNs) have demonstrated state-of-the-art results on many pattern recognition tasks, especially vision classification problems. Understanding the inner...
We can better understand deep neural networks by identifying which features each of their neurons have learned to detect. To...
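One standard way to identify what a neuron detects is activation maximization: gradient ascent on the input to maximize that unit's activation. The sketch below is a bare-bones PyTorch version (the torchvision ResNet-18, unit index, learning rate, and step count are arbitrary choices, and the image priors and regularizers discussed in this line of work are omitted).

```python
import torch
import torchvision

# Activation maximization: synthesize an input that maximally activates one unit.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
unit = 123                              # arbitrary output unit (ImageNet class logit)

x = torch.zeros(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([x], lr=0.05)
for step in range(200):
    optimizer.zero_grad()
    loss = -model(x)[0, unit]           # negative activation -> gradient ascent on x
    loss.backward()
    optimizer.step()
    x.data.clamp_(0, 1)                 # keep pixels in a valid image range

print(f"final activation: {model(x)[0, unit].item():.3f}")
```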
Deep neural networks (DNNs) have recently been achieving state-of-the-art performance on a variety of pattern-recognition tasks, most notably visual classification...
The Achilles Heel of stochastic optimization algorithms is getting trapped on local optima. Novelty Search avoids this problem by encouraging...
Recent years have produced great advances in training large, deep neural networks (DNNs), including notable successes in training convolutional neural...