I used to think I wasn’t a good researcher because I didn’t thrive in wet lab.
I’m a night owl. I work best at my own pace, late at night, when the world is quiet and I can think deeply about a problem. The wet lab simply doesn’t care about that. Experiments run on their own schedule. Cells don’t wait for you to be a morning person. Protocols demand precision at 8 AM whether you slept well or not.

An AI depiction showing my transition
I spent time in a hybrid wet lab–dry lab project early in my career, and while I could do the bench work, I never felt like it was where I belonged. I wasn’t bad at it, but I wasn’t lighting up. The part that excited me was always what came after: looking at the data, asking what it meant, synthesizing ideas across experiments. I just didn’t know yet that there was a whole career built around that.
Then, at the beginning of 2024, I moved fully into dry lab work, started working with population genetics and large-scale biobank analysis, and something clicked. This post is about that transition, the mistakes I made, and the things I wish someone had told me on day one.
The Beginning Was Still Overwhelming
Let me be clear: just because dry lab was a better fit for me didn’t mean it was easy. The first months were genuinely overwhelming. I didn’t know what tools to use, what language to write in, or even how to organize my files. Everything felt like it had a learning curve stacked on top of another learning curve.
I remember being overwhelmed working with a full Linux system in the FinnGen sandbox environment. I had taken many bioinformatics courses, but hands-on work turned out to be a different ball game.
One of my early mistakes was trying to go deep on everything at once. I’d encounter a new concept or tool and feel like I needed to master it completely before moving on. That meant I spent far more time on certain things than I actually needed to. I was learning, but not efficiently. It took a while to develop the instinct for knowing when you understand something well enough to keep going, and when you actually need to dig deeper. I remember being stuck on the concept of principal component analysis for far longer than was actually needed.
I think there’s a specific term for this: “time blindness.” It’s common for beginners to suffer from it.
That instinct doesn’t come from reading; it comes from doing the work, making mistakes, and slowly building judgment. Give yourself permission to learn unevenly. Not everything deserves the same depth at the same time.
Finding My Rhythm With FinnGen-Scale Data
The work that really shaped me was working with biobank-scale data, specifically FinnGen, where you’re dealing with over 500,000 individuals, health record data with close to a million data points, and the sheer volume of information that comes with population-level genetics.
The first time I tried to work with data at that scale, I hit a wall. My usual approaches were too slow. Loading everything into memory wasn’t an option. I needed to learn how to be efficient, not just correct.
That’s where tools like DuckDB and BigQuery became essential. DuckDB was a revelation for working with huge datasets locally — being able to run SQL-style queries directly on large files without spinning up a database server changed how I approached data extraction. BigQuery was critical for working with the phenotype data efficiently at scale. Instead of pulling entire datasets and filtering afterward, I learned to query smartly: extract exactly what I needed, filter early, and keep my working data manageable.
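The filter-early principle is easy to sketch. In practice this would be a DuckDB or BigQuery SQL query, but the idea is the same in plain Python: stream the file and keep only the rows and columns you need, never loading the whole thing into memory. The file contents and column names below are invented for illustration.

```python
import csv
import io

# Toy stand-in for a huge tab-separated phenotype file.
TSV = "sample_id\tphenotype\tage\nS1\tT2D\t54\nS2\tcontrol\t61\nS3\tT2D\t47\n"

def extract(fh, keep_cols, predicate):
    """Stream rows one at a time, keeping only the needed columns and
    matching rows, so the working set stays small regardless of file size."""
    reader = csv.DictReader(fh, delimiter="\t")
    for row in reader:
        if predicate(row):
            yield {c: row[c] for c in keep_cols}

cases = list(extract(io.StringIO(TSV), ["sample_id", "age"],
                     lambda r: r["phenotype"] == "T2D"))
print(cases)  # [{'sample_id': 'S1', 'age': '54'}, {'sample_id': 'S3', 'age': '47'}]
```

With a real file you would pass an open file handle instead of the StringIO buffer; because it is a generator, memory use stays flat no matter how large the input is.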
These aren’t glamorous skills. Nobody puts “learned to write efficient SQL queries for phenotyping” on a conference poster. But they’re the skills that let you actually do the science instead of waiting for your computer to finish choking on a file.
R and Python: It’s Not a Philosophy — It’s Driven by the Task
One of my earliest sources of confusion was the classic question: R or Python? I wasted time going back and forth, starting things in one language and switching to the other, never feeling settled.
What eventually resolved it wasn’t a blog post or a debate — it was just doing enough real work that the answer became obvious from context.
R became my primary tool for analysis and visualization. When I needed to explore GWAS results, make Manhattan plots, generate QQ plots, or do any kind of interactive data exploration, R was where I went. The Bioconductor ecosystem is built for genomics, and ggplot2 made it possible to create publication-quality figures that actually communicated what I wanted to show. When the output from a pipeline was messy — and it often was — R was where I cleaned it up and turned it into something presentable.
Python, on the other hand, became my tool for automation and backend tasks. When I needed to convert RSIDs to genomic positions, or run a script in the terminal as part of a larger pipeline, Python was more practical. It’s better suited for scripting, for batch processing, for anything that needs to run unattended on a server.
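As a minimal sketch of that kind of automation task, here is what an RSID-to-position conversion can look like with a lookup table. The table contents and format are hypothetical; in a real pipeline the mapping would come from dbSNP or a cohort-specific annotation file.

```python
import csv
import io

# Hypothetical RSID -> position lookup table (tab-separated).
MAPPING_TSV = "rsid\tchrom\tpos\nrs123\t1\t1000\nrs456\t2\t2000\n"

def load_rsid_map(fh):
    """Read the mapping once into a dict for constant-time lookups."""
    reader = csv.DictReader(fh, delimiter="\t")
    return {row["rsid"]: (row["chrom"], int(row["pos"])) for row in reader}

rsid_map = load_rsid_map(io.StringIO(MAPPING_TSV))

def rsid_to_position(rsid):
    """Return (chrom, pos), or None if the RSID is not in the table."""
    return rsid_map.get(rsid)

print(rsid_to_position("rs456"))  # ('2', 2000)
```

A script like this drops neatly into a larger pipeline: it reads its table once, runs unattended, and fails visibly (returns None) on unknown identifiers instead of guessing.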
The choice was never ideological. It was always driven by the specific task in front of me.
Once I stopped thinking about it as an identity question and started thinking about it as a practical one, the anxiety disappeared.
The Clumsy Outputs Nobody Warns You About
Here’s something that surprised me: the tools we rely on in genomics — the GWAS pipelines, FUMA, the various annotation and enrichment software — they work, but their output is often a mess.
You run an analysis, you get results, and then you open the output file and it’s formatted in a way that’s not immediately usable. Columns are named inconsistently. Files have weird delimiters. Summary statistics come in formats that don’t match what the next tool expects. You end up spending significant time just wrangling output from one tool into input for another.
This is where R became indispensable for me. Not for the analysis itself, but for the cleanup afterward. Reading in clumsy output files, renaming columns, filtering, reformatting, merging datasets, and then visualizing the results — that whole post-processing workflow lived in R, and ggplot was at the center of it.
Learning the basics of how GWAS pipelines work, how to interpret FUMA output, how to trace a signal from summary statistics through to functional annotation — that all takes time. But what takes even more time, and what nobody teaches you, is building the data wrangling muscle to handle the mess in between.
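One recurring piece of that wrangling is simply normalizing column names between tools. I do this kind of cleanup in R, but the logic is language-agnostic; here is a Python sketch with an illustrative (not exhaustive) alias table:

```python
# Different GWAS tools name the same columns differently; map the
# common variants to one convention before handing files downstream.
# This alias table is illustrative, not a complete catalogue.
ALIASES = {
    "CHR": "chrom", "chromosome": "chrom",
    "BP": "pos", "position": "pos",
    "P": "pval", "p_value": "pval", "PVAL": "pval",
    "SNP": "rsid", "MarkerName": "rsid",
}

def harmonize_header(columns):
    """Rename known aliases; lowercase anything unrecognized."""
    return [ALIASES.get(c, c.lower()) for c in columns]

print(harmonize_header(["SNP", "CHR", "BP", "P", "BETA"]))
# ['rsid', 'chrom', 'pos', 'pval', 'beta']
```

The same few lines, applied consistently at every tool boundary, are what keep summary statistics flowing from one step to the next without manual surgery.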
Discovering ggplot Changed How I Think About Data
I remember my early attempts at plotting — base R graphics that technically showed the data but looked awful and were painful to customize. Then I found ggplot2, and it reshaped not just my plots, but how I think about communicating results.
The grammar of graphics forces you to be intentional. What variable maps to which axis? What does color represent? Is a boxplot or a violin plot the right choice for this distribution? When you’re presenting PCA plots of population structure or allele frequency distributions across cohorts, these decisions matter. A careless plot can mislead. A thoughtful one can make your point before you say a word.
I invested real time in learning ggplot well — axis labels, themes, color palettes that are colorblind-friendly, faceting for multi-panel figures — and it has paid off in every presentation, every paper, every time a collaborator says “that’s a clear figure.”
Data Cleaning Is the Real First Step
This is the unsexy truth of bioinformatics: most of your time is spent cleaning data. Mismatched sample IDs. Missing phenotype fields. Inconsistent chromosome naming. Duplicated entries. Numeric columns with hidden character values buried in row 40,000.
When you’re working at biobank scale, these problems are amplified. A small inconsistency in 500,000 records can silently corrupt an entire analysis. I learned, the hard way, to treat data cleaning as the foundation rather than a chore. If your input is wrong, everything downstream is wrong. Every time.
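A concrete example of the hidden-character-value problem: scan a supposedly numeric column and report the offending rows, instead of letting them be silently coerced later. A minimal sketch, with invented values:

```python
def find_non_numeric(values):
    """Return (index, value) pairs for entries that fail numeric parsing,
    so the bad rows can be inspected rather than silently dropped."""
    bad = []
    for i, v in enumerate(values):
        try:
            float(v)
        except ValueError:
            bad.append((i, v))
    return bad

ages = ["54", "61", "NA", "47", "6o"]  # "NA" and the typo "6o" hide here
print(find_non_numeric(ages))  # [(2, 'NA'), (4, '6o')]
```

Run a check like this on every numeric column before analysis; at 500,000 rows, eyeballing the data will never catch row 40,000.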
The Small Habits That Changed Everything
Some of the most impactful things I learned had nothing to do with code.
Date your files. Not final_results.csv. Not results_v3_FINAL.csv. Instead: 2025-02-15_gwas_results_chr22.csv. When you’re juggling dozens of output files, timestamps are how you keep your sanity.
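A tiny helper makes the convention automatic; ISO dates (YYYY-MM-DD) sort chronologically as plain strings, and the label is whatever you would have named the file anyway:

```python
import datetime

def dated_name(label, ext="csv", when=None):
    """Prefix an output filename with an ISO date so files sort by day."""
    when = when or datetime.date.today()
    return f"{when.isoformat()}_{label}.{ext}"

print(dated_name("gwas_results_chr22", when=datetime.date(2025, 2, 15)))
# 2025-02-15_gwas_results_chr22.csv
```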
Use R Projects. For months, I was setting working directories manually and losing track of which script belonged to which analysis. R Projects give you a self-contained workspace with relative paths, an isolated environment, and automatic organization. It’s the single easiest thing you can do to bring order to your work.
Keep a dry lab notebook. This one hit me hardest, because it’s so obvious in retrospect. In wet lab, we document everything — every protocol, every gel, every result. Then we move to dry lab and somehow abandon that discipline entirely. I’d run an analysis, get a result, move on, and two weeks later have no idea what parameters I used or which version of the data I was working with.
Now I keep a running document in each project: what I did, why I did it, what the input was, what I changed from last time. When a collaborator asks “how did you get this number?” or when I revisit something six months later, the answer is there. Organization isn’t a soft skill in bioinformatics. It’s a survival skill.
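The notebook needs no special tooling; even a small helper that produces a dated, structured entry covers the fields above. The entry format here is just one possible sketch:

```python
import datetime

def log_entry(what, why, inputs, changes, when=None):
    """Format one dated notebook entry: what was run, why, on which
    input, and what changed since last time."""
    when = when or datetime.date.today()
    return (f"## {when.isoformat()}\n"
            f"- What: {what}\n"
            f"- Why: {why}\n"
            f"- Input: {inputs}\n"
            f"- Changed: {changes}\n")

entry = log_entry("GWAS chr22 rerun", "new phenotype definition",
                  "pheno_v2.tsv", "MAF filter 0.01 -> 0.005",
                  when=datetime.date(2025, 2, 15))
print(entry)
```

Appending each entry to one file per project is enough; six months later, the answer to “how did you get this number?” is a text search away.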
I Started Enjoying It — And That Changed Everything
The turning point wasn’t a single moment. It was gradual. One day I realized I was excited to open my laptop. I was staying up late not because I had to, but because I wanted to figure something out. The night owl in me had finally found a schedule that worked.
I started enjoying the puzzle of it — taking a messy dataset and making it tell a story. Learning how different tools fit together. Getting faster at the things that used to take me days. Building intuition for when something in the data looked off.
That enjoyment is what carried me through the steep parts of the learning curve. The beginning is hard, and it stays hard for a while. But if you’re the kind of person who gets satisfaction from synthesizing ideas, from seeing patterns in data, from turning a wall of numbers into a clean figure that makes a point — you’ll find your stride.
What I’d Tell Someone Just Starting Out
If you’re a wet lab person staring at a terminal for the first time, here’s what I want you to know.
You don’t have to learn everything deeply all at once. Build judgment about what needs depth and what just needs familiarity. Let R and Python serve different roles — don’t agonize over choosing one forever. Invest early in ggplot; it’ll pay off in every figure you make. Learn to query your data efficiently before you try to analyze it, especially at biobank scale. Expect messy output from bioinformatics tools and build the skills to wrangle it. Date your files, use R Projects, and for the love of science, keep a notebook.
The transition from wet lab to dry lab isn’t about becoming a different kind of scientist. It’s about finding the version of science that fits how your brain actually works. For me, that meant late nights, big datasets, and the quiet satisfaction of a clean pipeline producing a clear result.
You’ve already proven you can learn hard things. This is just the next one.