jvparidon@gmail.com
github.com/jvparidon
linkedin.com/in/jp-van-paridon
jvparidon.io/resume
I am a computational cognitive scientist and data scientist with a background in language sciences. I use online and in-person experiments and large public datasets to inform statistical and computational models of human behavior and cognition.
For this project, we developed a sequential testing framework based on Bayesian multilevel regression models, which let us sample participants efficiently without the inflated false-positive rates that optional stopping causes under standard frequentist testing. I also integrated a Python interface for a MIDI drumkit into our behavioral experiment protocol as a cheap and effective way to track motor activity in participants' hands and feet.
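A minimal sketch of what such a MIDI interface can look like in Python, assuming the mido library; the port name, pad-to-limb mapping, and polling loop are illustrative stand-ins, not the actual experiment code:

```python
# Log MIDI drumkit hits as timestamped motor responses.
# Assumes mido (pip install mido python-rtmidi); the port name and
# note-to-limb mapping below are hypothetical examples.
import time
import mido

# Hypothetical mapping from MIDI note numbers to drum pads/limbs
PAD_TO_LIMB = {36: "right foot", 38: "left hand", 44: "left foot", 48: "right hand"}

def log_hits(port_name="MIDI Drumkit", duration=60.0):
    """Record (timestamp, limb, velocity) tuples for `duration` seconds."""
    hits = []
    start = time.monotonic()
    with mido.open_input(port_name) as inport:
        while time.monotonic() - start < duration:
            for msg in inport.iter_pending():
                # 'note_on' messages with nonzero velocity signal a pad strike
                if msg.type == "note_on" and msg.velocity > 0:
                    limb = PAD_TO_LIMB.get(msg.note, f"pad {msg.note}")
                    hits.append((time.monotonic() - start, limb, msg.velocity))
            time.sleep(0.001)  # avoid busy-waiting between polls
    return hits
```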
For this project, I used a variety of NLP and statistical techniques to demonstrate that color knowledge in both blind and sighted people can be predicted from word embeddings. I also reworked the original word2vec algorithm to gain access to word embeddings during model training and track how specific sentences in the training corpus affect the final state of the embedding model.
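The prediction side of this project follows a general pattern worth sketching: map each word to its embedding, then fit a cross-validated regularized regression from embedding dimensions to human ratings. The sketch below is illustrative (file and column names are hypothetical placeholders), not the paper's exact pipeline:

```python
# Predict perceptual (e.g., color) ratings from word embeddings via
# cross-validated ridge regression. Paths and the rating column are
# hypothetical placeholders.
import numpy as np
import pandas as pd
from gensim.models import KeyedVectors
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

vectors = KeyedVectors.load_word2vec_format("embeddings.vec")  # placeholder path
ratings = pd.read_csv("color_ratings.csv")  # columns: word, rating (hypothetical)

# Keep only words that have an embedding
ratings = ratings[ratings["word"].isin(set(vectors.key_to_index))]
X = np.stack([vectors[word] for word in ratings["word"]])
y = ratings["rating"].to_numpy()

# Cross-validated R^2 indicates how much variance in the ratings is
# linearly recoverable from the embedding space
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=10, scoring="r2")
print(f"mean cross-validated R^2: {scores.mean():.2f}")
```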
In this project, I used Bayesian multilevel regression models to analyze behavioral data from illiterate participants, showing that acquiring literacy does not degrade performance on other visual tasks (a claim that had been made repeatedly in the prior literature).
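A minimal sketch of this kind of Bayesian multilevel model, written with the bambi library; the variable names and data file are hypothetical stand-ins for the actual study design:

```python
# Bayesian multilevel regression sketch using bambi; assumes a tidy
# dataset with (hypothetical) columns accuracy, literacy (numeric/binary),
# participant, and item.
import arviz as az
import bambi as bmb
import pandas as pd

data = pd.read_csv("visual_task_data.csv")  # hypothetical placeholder

# Literacy as a population-level effect, with varying intercepts by
# participant and by item to respect the repeated-measures structure
model = bmb.Model("accuracy ~ literacy + (1|participant) + (1|item)", data)
results = model.fit(draws=2000, chains=4)

# If literacy degraded visual task performance, the posterior for the
# literacy coefficient would sit clearly below zero; a posterior
# concentrated near zero supports the opposite conclusion
print(az.summary(results, var_names=["literacy"]))
```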
In this project, we produced a novel set of word embeddings in 55 languages, trained with the fastText algorithm on a large archive of film and television subtitles. Using several classic evaluation metrics (predicting similarity ratings from cosine distances; solving lexical analogies) and a novel lexical norm prediction task (implemented using ridge regression), we demonstrated that embeddings trained on subtitles contain information that is not well represented in embeddings trained on, for example, Wikipedia text, such as how offensive a given word is. The subs2vec package I developed alongside this project also provides a lightweight framework for working with word embeddings in Python.
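Two of the classic evaluations are easy to sketch with gensim (the lexical norm prediction follows the same ridge-regression pattern sketched earlier); file and column names are placeholders, not the subs2vec package's actual API:

```python
# (1) Similarity benchmark: correlate embedding cosine similarities with
#     human similarity ratings. (2) Analogy benchmark: vector arithmetic.
# Assumes all benchmark words are in the embedding vocabulary.
import pandas as pd
from gensim.models import KeyedVectors
from scipy.stats import spearmanr

vectors = KeyedVectors.load_word2vec_format("subs.en.vec")  # placeholder path

# Similarity: how well do cosine similarities track human ratings?
pairs = pd.read_csv("similarity_ratings.csv")  # columns: word1, word2, rating
cosines = [vectors.similarity(w1, w2)
           for w1, w2 in zip(pairs["word1"], pairs["word2"])]
print("Spearman r:", spearmanr(cosines, pairs["rating"]).correlation)

# Analogy: king - man + woman should land near "queen"
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```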
In this project, I built a computational model that simulates the temporal coordination of speaking and listening during simultaneous interpreting, demonstrating that a significant limit on speech rate in this task is imposed by the need to access lexical networks for both speech production and speech comprehension.
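To give a flavor of the modeling logic (this is a toy illustration, not the published model): if production and comprehension must take turns accessing a single lexical network, the share of time comprehension occupies the lexicon directly caps the achievable speech rate.

```python
# Toy illustration: lexical access as a shared, exclusive resource.
# Production must wait whenever comprehension holds the lexicon.
import random

random.seed(1)

def simulate(minutes=1.0, comprehension_load=0.5, access_ms=200):
    """Count words produced per minute under a given comprehension load.

    comprehension_load: fraction of time comprehension occupies the lexicon.
    access_ms: time production needs the lexicon for each word.
    """
    t, words = 0.0, 0
    total_ms = minutes * 60_000
    while t < total_ms:
        if random.random() < comprehension_load:
            t += access_ms  # lexicon busy with comprehension; production waits
        else:
            t += access_ms  # production retrieves and articulates a word
            words += 1
    return words / minutes  # words per minute

for load in (0.0, 0.25, 0.5, 0.75):
    print(f"comprehension load {load:.2f}: ~{simulate(comprehension_load=load):.0f} wpm")
```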
In this paper, we make recommendations for improving EEG analyses by using robust models that account for outliers without requiring data to be rejected at arbitrary thresholds. We provide example analyses in both frequentist (using a robust estimator) and Bayesian (using a heavy-tailed likelihood) frameworks.
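Both flavors can be sketched on simulated data with outliers: a Huber-type robust estimator via statsmodels, and a Student-t likelihood via PyMC. This is illustrative only, not the paper's exact analyses:

```python
# Robust regression two ways on simulated data with injected outliers.
import numpy as np
import statsmodels.api as sm
import pymc as pm

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)
y[:5] += 15  # inject outliers

# Frequentist: iteratively reweighted least squares with a Huber norm
X = sm.add_constant(x)
huber_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print("robust slope estimate:", huber_fit.params[1])

# Bayesian: a Student-t likelihood downweights outliers via its heavy tails
with pm.Model():
    intercept = pm.Normal("intercept", 0, 5)
    slope = pm.Normal("slope", 0, 5)
    sigma = pm.HalfNormal("sigma", 5)
    nu = pm.Exponential("nu", 1 / 30)  # degrees of freedom of the t-likelihood
    pm.StudentT("y", nu=nu, mu=intercept + slope * x, sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2)
```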
In this paper, we explain that the various measures of word frequency and transitional probability used in psycholinguistic linear regression models are highly multicollinear, and that their coefficients are therefore often misinterpreted. It is hard to make blanket recommendations for analytical decisions, but we argue that in many cases a theoretically motivated choice of a single predictor leads to the most interpretable outcome.
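A small simulated demonstration of the problem: when two predictors are nearly collinear, as frequency and transitional probability measures typically are, ordinary regression splits their shared variance unpredictably, and variance inflation factors flag the issue.

```python
# Demonstrate coefficient instability under near-collinearity (simulated data).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
log_freq = rng.normal(size=500)
trans_prob = log_freq + rng.normal(scale=0.1, size=500)  # nearly collinear
rt = 1.0 - 0.5 * log_freq + rng.normal(scale=0.5, size=500)

X = sm.add_constant(pd.DataFrame({"log_freq": log_freq, "trans_prob": trans_prob}))
fit = sm.OLS(rt, X).fit()
print(fit.params)  # the two coefficients split the shared variance unpredictably

# Variance inflation factors far above ~10 flag severe collinearity
for i, name in enumerate(X.columns[1:], start=1):
    print(name, variance_inflation_factor(X.values, i))
```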
In this project, my role was to set up an analysis pipeline of cross-validated SVM classifiers: trained on fMRI images from an eye-saccade task, these classifiers predicted the spatial orientation of words the same participants processed in a separate task, using the fMRI images recorded during that task.
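In outline, the pipeline looks like the sketch below; the arrays are placeholders for the preprocessed fMRI feature matrices, and the actual pipeline differs in detail:

```python
# Cross-task decoding sketch: cross-validate an SVM within the saccade task,
# then train on all saccade data and predict word orientation in the word task.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: n_trials x n_voxels feature matrices and labels
X_saccade, y_saccade = np.load("saccade_X.npy"), np.load("saccade_y.npy")
X_words = np.load("words_X.npy")

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))

# Within-task cross-validation establishes that orientation is decodable
print("CV accuracy:", cross_val_score(clf, X_saccade, y_saccade, cv=5).mean())

# Cross-task generalization: train on saccades, predict word orientations
clf.fit(X_saccade, y_saccade)
predicted_orientations = clf.predict(X_words)
```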