The transition from Physics to Mathematics, combined with a period of exploring diverse academic interests, meant my undergraduate years were marked by experimentation across a wide range of disciplines. While this phase lacked a clear direction at times, the process of trial and error proved formative — it clarified my genuine passion for data-driven, computational, and quantitative research, laying the groundwork for the focus and rigor that later defined my graduate studies.
Despite that rocky start, I never lost my love for mathematics — the rigor, abstraction, and purity that first drew me to the field still feel like sheer beauty to me. What changed was my understanding of its purpose: rather than treating math as an isolated pursuit, I began to see it as a language that gives structure to complexity and meaning to noise. This realization reshaped my trajectory — toward applying mathematical reasoning to domains that are deeply human, from language and cognition to clinical and neural data — and taught me that intellectual resilience is not about perfection, but about redefinition.
Upon receiving my B.Sc. in mathematics, I entered the Graduate Institute of Linguistics, fascinated by the complexity and structure of language — not only as a symbolic system derived from formal logic, but also as real-world data containing both signal (patterns, structure) and noise (exceptions, errors). I was particularly intrigued by how “neural networks,” both biological (the brain) and artificial (language models), process and generate language. This perspective positioned linguistics not merely as a humanities discipline, but as a data science of human communication — bridging formal theory, empirical data, and computational modeling.
My research foundation was built during my first year of graduate study in the Brain and Language Processing Lab, where I worked on a regression-based ERP (rERP, a cutting-edge ERP method) reanalysis of a psycholinguistic dataset. At the time, I re-learned Python from scratch — through trial and error, without the convenience of modern AI tools — by implementing regression and time-series analysis pipelines via MNE. The project was later accepted for presentation at the 2023 Society for Psychophysiological Research Annual Meeting (though not attended due to timing conflicts and financial considerations).
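The core of an rERP analysis reduces to an ordinary least-squares fit at every time point: single-trial amplitudes are regressed on trial-level predictors, and the coefficient time courses become "regression ERPs." As a minimal illustrative sketch (not the thesis pipeline, and independent of MNE; the word-frequency covariate and N400-like effect here are invented for the demo):

```python
import numpy as np

def rerp_betas(eeg, design):
    """Per-time-point least-squares fit of single-trial EEG on a design matrix.

    eeg    : (n_trials, n_times) single-channel epochs
    design : (n_trials, n_predictors) predictor matrix
    returns: (n_predictors, n_times) coefficient waveforms ("rERPs")
    """
    betas, *_ = np.linalg.lstsq(design, eeg, rcond=None)
    return betas

# Toy demo: a hypothetical "word frequency" covariate modulating amplitude near 0.3 s
rng = np.random.default_rng(0)
n_trials, n_times = 200, 100
times = np.linspace(0.0, 0.8, n_times)
freq = rng.standard_normal(n_trials)                    # invented covariate
effect = np.exp(-((times - 0.3) ** 2) / 0.005)          # N400-like temporal profile
eeg = freq[:, None] * effect[None, :] + 0.1 * rng.standard_normal((n_trials, n_times))
design = np.column_stack([np.ones(n_trials), freq])     # intercept + frequency
betas = rerp_betas(eeg, design)
print(betas.shape)  # (2, 100): intercept rERP and frequency rERP across time
```

MNE wraps the same idea for multichannel data in `mne.stats.linear_regression`; the sketch above only shows the underlying least-squares step.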
Beyond data analysis, I gained hands-on experience in EEG data collection and experimental design. During the stimuli construction stage, I taught myself Google Apps Script (JavaScript-based) to automate large-scale emailing for participant ratings on linguistic attributes. This system replaced what would otherwise have been days of manual labor — my first real taste of how computational tools can streamline real-world workflows.
My technical growth continued as a teaching assistant for BrainHack School Taiwan 2023, where I helped bridge neuroscience and computation for students from different fields. In the following year, I introduced Python into the ERP course curriculum for the first time, developing my own preprocessing and visualization scripts to complement the lab’s traditional MATLAB and ERPLAB pipelines. These experiences solidified my conviction that computational thinking can deepen experimental rigor in neuroscience, and that programming fluency is integral to modern scientific literacy.
GitHub:
“See branches preprocessing and hw3-tutorial; branch installation is for, well, student installation guides.”
My path into data science was forged through reconstruction — intellectual, emotional, and methodological. During my master’s thesis, I was intrigued by how LLMs make decisions and turned to A Course in Game Theory by Osborne and Rubinstein to build the mathematical foundation. Derivations like the proof of Nash equilibrium via Kakutani’s fixed point theorem grounded my understanding of rationality as a formal construct.
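For concreteness, the pure-strategy case of Nash equilibrium can be checked by brute force: a cell of a bimatrix game is an equilibrium exactly when each player's action is a best response to the other's. A small illustrative sketch (a textbook exercise in that spirit, not material from the thesis):

```python
import numpy as np

def pure_nash(A, B):
    """Pure-strategy Nash equilibria of a bimatrix game.
    A[i, j]: row player's payoff; B[i, j]: column player's payoff."""
    eqs = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            # (i, j) is an equilibrium iff i is a best response to j and vice versa
            if A[i, j] >= A[:, j].max() and B[i, j] >= B[i, :].max():
                eqs.append((i, j))
    return eqs

# Prisoner's dilemma: mutual defection (index 1) is the unique equilibrium
A = np.array([[3, 0], [5, 1]])
B = A.T
print(pure_nash(A, B))  # [(1, 1)]
```

Kakutani's theorem is needed precisely because mixed-strategy equilibria cannot be found by such finite enumeration: it guarantees a fixed point of the best-response correspondence over the continuum of mixed strategies.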
To move beyond prompt-response analysis, I explored XAI and model interpretability, but quickly hit dead ends: saliency maps require gradients, which are unavailable for black-box LLMs, and model-agnostic methods like LIME perturb the local input space, which is hard to define for discrete (tokenized) text. After weeks of failed attempts, I landed on Shapley values, whose origin in cooperative game theory made them both theoretically elegant and technically viable. Integrating SHAP with custom clustering and token-level analysis became an important, though still evolving, part of my thesis.
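The cooperative-game definition is simple to state: a token's attribution is its average marginal contribution to the model score over all coalitions, weighted by ordering. A toy exact-enumeration sketch (illustrative only; the value function here is a hypothetical stand-in for a model score, and real SHAP usage relies on sampling approximations rather than this exponential loop):

```python
import itertools, math

def shapley_values(tokens, value):
    """Exact Shapley values for a small token set.

    value: maps a frozenset of token indices to a score.
    Each token's value is its average marginal contribution
    across all coalitions of the remaining tokens.
    """
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for subset in itertools.combinations(others, r):
                s = frozenset(subset)
                weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical value function: score = number of "content" tokens present
tokens = ["the", "model", "reasons"]
content = {1, 2}
value = lambda s: len(s & content)
print([round(v, 6) for v in shapley_values(tokens, value)])  # [0.0, 1.0, 1.0]
```

For this additive game the attributions land exactly on each token's standalone contribution, which is the efficiency property that made Shapley values attractive for token-level analysis.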
An early version of this work, submitted to an ACL workshop, was rejected. That rejection became an inflection point on my learning curve: the reviewers’ critiques sharpened my thinking and pushed me to reconstruct my experiments from the ground up, expanding sample sizes, adding statistical justification, grounding my analysis in model architecture, and aligning results with the cognitive psychology literature.
This trajectory taught me that sometimes failures reveal more than successes: in LLMs, flaws in strategic reasoning expose representational limits; in neuroscience, anomalies in biosignals illuminate cognition and psychopathology. I now see research as advancing not only by optimizing performance, but also by interpreting breakdowns — whether in artificial neural networks or in the noisy, complex signals of the human brain. This philosophy motivates me to transfer the interpretability and data science skills I developed in my thesis to EEG research and psychopathological prediction, and to contribute to the broader field of science and clinical medicine.
GitHub:
“I plan to further refine this study for international conference submissions.”
Beyond formal coursework and research, I have developed a range of independent machine learning and data science side projects — covering RAG and agent systems, FastICA and neural networks built from scratch, Seq2Seq with attention in PyTorch, CNN for ECG classification in Keras, as well as algorithmic explorations in computer science, mathematics, and cryptography.
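As one example of that from-scratch spirit, FastICA fits in a few dozen lines of NumPy. The sketch below (an illustrative reimplementation under simplifying assumptions — tanh nonlinearity, symmetric decorrelation, square mixing — not any specific project's code) whitens the mixtures and then iterates the fixed-point update:

```python
import numpy as np

def fastica(X, n_iter=200, seed=0):
    """Minimal symmetric FastICA with a tanh nonlinearity.
    X: (n_signals, n_samples) mixed observations; returns unmixed components."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten: rotate and rescale so the covariance becomes the identity
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])
    Z = (E @ np.diag(d ** -0.5) @ E.T) @ X
    W = rng.standard_normal((Z.shape[0], Z.shape[0]))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[g(wZ)Z] - E[g'(wZ)] w, row-wise
        W = G @ Z.T / Z.shape[1] - np.diag((1 - G ** 2).mean(axis=1)) @ W
        # Symmetric decorrelation: W <- (W W^T)^{-1/2} W, via SVD
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt
    return W @ Z

# Toy demo: unmix a sine and a square wave from two linear mixtures
t = np.linspace(0, 8 * np.pi, 2000)
S = np.vstack([np.sin(t), np.sign(np.sin(1.3 * t))])
X = np.array([[1.0, 0.5], [0.5, 1.0]]) @ S
Y = fastica(X)  # recovers the sources up to sign, permutation, and scale
```

The recovered components match the sources only up to sign and ordering, which is the usual ICA ambiguity.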
GitHub: data-science-side-projects
Currently, I am a part-time student researcher at Academia Sinica. My ongoing work involves integrating facial expression features and EEG signals to predict psychopathological factors using machine learning models (reference: NeurIPS EEG Challenge). Looking ahead, I aim to deepen this line of research at the intersection of brain data, AI, and computational modeling — bridging data science and neuroscience to advance both scientific understanding and clinical applications.
Updated 2025 · GitHub: @amandalin047