I am a biochemist & computational biologist by training, have worked as a software engineer for a startup, and am now in training to become a data scientist.
On the technical side I am enthused by data analysis, predictive analytics, and engineering clean code. I am also enthusiastic about understanding the business opportunities that data-driven decision making can produce.
In my work approach I am proactive and enjoy both autonomous and collaborative work.
As part of the Science 2 Data Science summer school I worked for Singular Intelligence, a startup that helps businesses develop data- and analytics-driven corporate strategies.
As part of my work I did data exploration and visualization in
I further employed scikit-learn (linear models and random forests) to develop a predictive model of sales performance as a function of business-relevant features.
WalletSaver is a startup in the space of personalized, data-driven recommendation systems. WalletSaver develops an app that securely measures parameters of mobile phone usage and recommends the best mobile phone plan based on cost, signal coverage, and phone carriers called.
I engineered the recommendation engine behind WalletSaver in
SQLAlchemy (a SQL object-relational mapper library) and
Memcached to speed up data queries.
I followed a test-driven development methodology for this project and performed timing-based optimization such as precomputing certain data.
I actively sought customer feedback and amended the recommendation engine in response.
I also engineered and maintained geolocation code, critical for signal coverage analysis, in Python with the
Research and preparation of lecture notes, preparation of exercise questions, preparation of exam questions, grading exams.
Science 2 Data Science (S2DS) is an industry-sponsored summer school that leads graduates with numerical backgrounds into the field of data science.
At this course I learned about
NoSQL databases, natural language processing (NLP), statistics with
machine learning with scikit-learn, and
I further learned about business opportunities that arise from data-enhanced and data-driven business models, economics, marketing, finance, project management, and corporate strategy.
I summarize my experience at this school at http://georg.io/s2ds.
My research revolved around modeling cell polarity under a number of complex conditions. The formation of poles (e.g. front vs. back) is important for cells to know in which direction to grow or migrate. Specifically, I studied these models in extreme settings such as exceptionally large and small cells.
I wrote unit-tested code in Python with NetworkX that implements a mathematical framework for bifurcation analysis of large reaction networks.
Here I had to rephrase a number of combinatorial problems as classical graph-theoretic problems to reduce computational cost.
One particular graph-theoretic algorithm, the enumeration of all cliques, was not available in NetworkX at the time.
I implemented a published algorithm for this problem and my implementation has since been added to NetworkX.
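As a brief illustration of what enumerating all cliques means, here is a minimal sketch. It assumes a NetworkX release that provides `enumerate_all_cliques`; the toy graph is hypothetical and is not taken from the research code.

```python
import networkx as nx

# Hypothetical toy graph: a triangle (1, 2, 3) plus a pendant edge (3, 4).
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (1, 3), (3, 4)])

# enumerate_all_cliques yields every clique (single nodes, edges, triangles, ...),
# in order of increasing size, not only the maximal ones.
all_cliques = [sorted(c) for c in nx.enumerate_all_cliques(G)]

# For contrast, find_cliques yields only the maximal cliques.
maximal_cliques = sorted(sorted(c) for c in nx.find_cliques(G))

print(len(all_cliques), "cliques in total")
print("maximal cliques:", maximal_cliques)
```

The number of cliques can grow exponentially with graph size, which is why both functions are generators rather than returning full lists.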
I further implemented variants of Crank-Nicolson in Python with NumPy to numerically integrate my specific convection-diffusion-reaction systems.
I performed timing-supported optimization of my code and added inline C code where necessary for further speed improvement.
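To illustrate the Crank-Nicolson scheme mentioned above, the following is a minimal sketch for the pure diffusion equation u_t = D u_xx only. It is not the convection-diffusion-reaction code from the thesis; the function name, grid, and parameter values are assumptions chosen for the example, and dense matrices are used for brevity rather than speed.

```python
import numpy as np

def crank_nicolson_diffusion(u0, D, dx, dt, steps):
    """Integrate u_t = D * u_xx, holding both boundary values fixed (Dirichlet)."""
    u = np.asarray(u0, dtype=float).copy()
    n = u.size
    r = D * dt / (2.0 * dx**2)
    # Second-difference operator on interior points; boundary rows stay zero
    # so the boundary values are left unchanged at every step.
    lap = np.zeros((n, n))
    for i in range(1, n - 1):
        lap[i, i - 1], lap[i, i], lap[i, i + 1] = 1.0, -2.0, 1.0
    A = np.eye(n) - r * lap  # implicit half of the time step
    B = np.eye(n) + r * lap  # explicit half of the time step
    for _ in range(steps):
        u = np.linalg.solve(A, B @ u)
    return u

# Example: decay of a sine mode on [0, 1], compared with the exact solution.
x = np.linspace(0.0, 1.0, 51)
D, dt, steps = 1.0, 1e-3, 100
u = crank_nicolson_diffusion(np.sin(np.pi * x), D, x[1] - x[0], dt, steps)
exact = np.exp(-np.pi**2 * D * dt * steps) * np.sin(np.pi * x)
print("max abs error:", np.max(np.abs(u - exact)))
```

In production code the tridiagonal structure would normally be exploited with a banded or sparse solver instead of a dense solve.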
During my PhD I also wrote numerical simulations of a high-dimensional dynamical system in pure C with support of the GNU Scientific Library.
Research and write-up of Master's thesis: Stochastic Simulations of Cell Polarization and Wave Pinning.
For this project I implemented Gillespie’s stochastic simulation algorithm, writing the simulation setup in MATLAB and time-critical code in C.
For the same project I made extensive use of numerical bifurcation packages such as
Self-designed major in computational biology and biological chemistry with strong focus on numerical analysis and data analytics.
Specific mathematical and computational courses that were part of my curriculum:
Major in biochemistry.
|Programming||Python||Advanced, numerical simulations, database-driven backend development, data analysis, predictive analytics (research projects and work for startups)|
|Programming||C||Advanced, numerical simulations, stochastic simulations (research projects)|
|Programming||C++||Familiar and learning (education and self-taught)|
|Operating Systems||Linux, Windows||Advanced user, Python package development and distribution, scripting|
|Databases||PostgreSQL, MySQL||Working knowledge, data modeling, querying, and data integration into time-critical code (work for a startup)|
|Databases||Neo4j, Rexster||Working knowledge (personal side projects)|
|Project Management and Collaboration||git||Advanced, have worked on a shared codebase in a startup|
|Project Management and Collaboration||Continuous Integration||Working knowledge, have used Travis CI to test academic code on Linux and OSX|
|Project Management and Collaboration||Test-Driven Development||Working knowledge, have used TDD in a startup setting|
|Communication||Work communication||I am comfortable presenting my work in informal and formal settings|
|Communication||Scientific writing||I have authored two scientific articles as lead author|
|Communication||Technical writing||I write technical articles about my research and side projects at http://georg.io|
Walther GR, Hartley M, Mincheva M (2014): GraTeLPy: Graph-Theoretic Linear Stability Analysis, BMC Systems Biology (software available at http://github.com/gratelpy)
I implemented a graph-theoretic framework for the analysis of dynamical systems in Python with NetworkX. I designed the algorithm, implemented it along with auxiliary methods that were necessary, and combined graph-theoretic approaches in novel ways to reduce combinatorial blowup.
I further responded to user feedback with extensive testing on Windows, OSX, and Linux and simplification of the installation process.
Walther GR, Marée AFM, Edelstein-Keshet L, Grieneisen VA (2012): Deterministic Versus Stochastic Cell Polarisation Through Wave-Pinning, Bulletin of Mathematical Biology
I implemented Gillespie’s stochastic simulation algorithm in MATLAB and time-critical parts of the code in C. I further made extensive use of numerical bifurcation packages XPP/auto and MatCont.
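For readers unfamiliar with Gillespie’s stochastic simulation algorithm, here is a minimal sketch in Python for a hypothetical two-state system A <-> B. The published work used MATLAB and C and a different model; the function name, rate constants, and molecule counts below are assumptions made purely for illustration.

```python
import random

def gillespie_two_state(n_a, n_b, k_ab, k_ba, t_end, seed=0):
    """Simulate A <-> B with the direct method; returns a list of (t, n_a, n_b)."""
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, n_a, n_b)]
    while True:
        a1 = k_ab * n_a  # propensity of A -> B
        a2 = k_ba * n_b  # propensity of B -> A
        a0 = a1 + a2
        if a0 == 0.0:
            break  # no reaction can fire
        # Waiting time to the next reaction is exponential with rate a0.
        t += rng.expovariate(a0)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its propensity.
        if rng.random() * a0 < a1:
            n_a, n_b = n_a - 1, n_b + 1
        else:
            n_a, n_b = n_a + 1, n_b - 1
        trajectory.append((t, n_a, n_b))
    return trajectory

# Hypothetical example: 100 molecules, all initially in state A.
traj = gillespie_two_state(n_a=100, n_b=0, k_ab=1.0, k_ba=1.0, t_end=5.0)
print("reactions fired:", len(traj) - 1, "final state:", traj[-1][1:])
```

Each step draws an exponentially distributed waiting time and then selects one reaction in proportion to its propensity, so the total molecule count is conserved throughout the trajectory.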