The following is a list of my recommended books which I find of general use for computational modeling, bioinformatics, teaching or mere entertainment.

The Fundamentals

Fundamental reading for understanding these fields from the modern, computer-science end of the spectrum.

Matrix Analysis and Applied Linear Algebra (Carl Meyer, 2000) - What it says on the tin. Properties of vectors and matrices.

Matrix Computations (Gene Golub and Charles Van Loan, 1983) - Detailed implementation recipes for various algorithms for matrix decompositions and approximations.

Introductory Functional Analysis with Applications (Erwin Kreyszig, 1978) - Basic definitions from functional analysis; inner product-, metric-, normed- and various more specific spaces, theory of operators, boundedness, etc. Essential for understanding machine learning among other things.

Convex Optimization (Stephen Boyd and Lieven Vandenberghe, 2004) - Understanding mathematical optimization, building from basic set theory and vector spaces. Many optimization problems can be solved if written or approximated as optimization over convex sets or optimization of convex functions. Contains the theory, algorithms and applications, and is complemented by well-made video lectures on YouTube.

Introduction to Algorithms (Thomas Cormen et al., 2001) - Perhaps a computer science one-oh-one book and a bible for any computer scientist or engineer. It is probably not found on reading lists related to any kind of modeling, but I nevertheless find it useful, especially for computational complexity analysis.

Machine Learning and Statistics

Books that satisfy some of my primary interests in machine learning, namely kernel methods and various algorithms based on matrix decompositions.

Kernel Methods for Pattern Analysis (John Shawe-Taylor and Nello Cristianini, 2004) - I prefer this book to other books about kernel methods because of the right ratio between reader intuition and mathematical rigour. On top of that, it starts from the very basics and carefully takes the reader through the wonders of Hilbert spaces. It contains little material on applications, but luckily those are abundant elsewhere in this now very mature field.

Pattern Recognition and Machine Learning (Chris Bishop, 2006) - Machine learning models from a probabilistic (Bayesian statistics) perspective. Contains little information on algorithms and applications, which I think is good. More importantly, it provides intuition and derivation of different models, with the right level of mathematical rigor. The exercises complement the main content, with some having solutions on the web. Especially useful are the chapters on sampling methods and various latent variable models. Less clear is the description of kernel methods and Gaussian processes, so I would advise the reader to look elsewhere for these specific topics.

Deep Learning (Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016-) - Despite initial skepticism towards what seems to be a hype topic, this is an excellent book. Unlike many other books on neural networks, this one tries to explain how and why the nuts and bolts of neural nets make them powerful function approximators. Most of the book is written in plain English, and casually skips theoretical proofs. Even if one is not interested in neural nets, it concisely introduces important developments in optimization of high-dimensional, non-convex loss functions.

"To understand the intuition behind this behavior, observe that the Hessian matrix at a local minimum has only positive eigenvalues. The Hessian matrix at a saddle point has a mixture of positive and negative eigenvalues. Imagine that the sign of each eigenvalue is generated by flipping a coin. In a single dimension, it is easy to obtain a local minimum by tossing a coin and getting heads once. In n-dimensional space, it is exponentially unlikely that all n coin tosses will be heads."
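
The coin-flip argument in the passage above is easy to check numerically. The following is only a minimal sketch of that heuristic, assuming a fair coin and independent tosses (a simplification, not a property of real loss surfaces); the function names are made up for illustration.

```python
import random


def exact_all_positive(n):
    """Probability that n independent fair coin tosses all come up heads."""
    return 0.5 ** n


def simulated_all_positive(n, trials=20_000, seed=0):
    """Monte Carlo estimate of the same probability (assumed fair coin)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        # One toss per eigenvalue sign; a "minimum" needs all n to be positive.
        if all(rng.random() < 0.5 for _ in range(n)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(n, exact_all_positive(n), simulated_all_positive(n))
```

Under these assumptions the chance of a critical point being a minimum halves with every added dimension, which is the sense in which saddle points come to dominate in high dimensions.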

Bioinformatics / Computational Biology

This heterogeneous field grows in a rather rapid, conference-driven manner. The following books are useful for teaching and focus on the more computational/algorithmic aspects of bioinformatics/computational biology.

Introduction to Computational Genomics (Nello Cristianini and Matthew Hahn, 2006) - A very friendly book, split into self-contained chapters. Worth reading from start to finish even if only for its well-written chapter introductions. The go-to book for a beginner in the field.

Biological Sequence Analysis (Richard Durbin et al., 1998) - Sequence analysis algorithms, presented with more mathematical rigor and firmer foundations, especially in the sections on hidden Markov models and clustering.

An Introduction to Bioinformatics Algorithms (Pavel Pevzner and Neil Jones, 2004) - Complements the above books with problems that are less data analysis oriented and more bulk computation-type, such as genome assembly. At times written in a cowboy style, which, judging by the cover, seems to be the authors' favourite outfit.

Popular Science

The Signal and the Noise: Why Most Predictions Fail but Some Don't (Nate Silver, 2012) - It reads like a collection of very detailed essays from one of the most famous practitioners in data science. Contains absolutely zero maths, but provides important intuition and very detailed examples of distinguishing correlation from causation in fields such as economics, sports betting, moneyball, gambling and even machine intelligence.

"A butterfly flapping its wings in Brazil can theoretically cause a tornado in Texas. But in loosely the same way, a tsunami in Japan or a longshoreman’s strike in Long Beach can affect whether someone in Texas finds a job."

How Not to Be Wrong: The Power of Mathematical Thinking (Jordan Ellenberg, 2014) - A book that focuses on intuition, this time about mathematics in general and how mathematical thinking can improve daily lives. It uses examples like the US tax system, sacrificing sheep, missing airplanes, and the question of the existence of God to point out how basic concepts like (non-)linearity, statistical hypothesis testing, expectation and Bayesian inference manifest themselves in practice.

"It’s easy to make arithmetic mistakes when doing a problem like this under time pressure. And sometimes that leads to a student arriving at a ridiculous result, like a jug of water whose weight is −4 grams. If a student arrives at −4 grams and writes, in a desperate, hurried hand, “I screwed up somewhere, but I can’t find my mistake,” I give them half credit. If they just write “−4g” at the bottom of the page and circle it, they get zero—even if the entire derivation was correct apart from a single misplaced digit somewhere halfway down the page. Working an integral or performing a linear regression is something a computer can do quite effectively. Understanding whether the result makes sense — or deciding whether the method is the right one to use in the first place — requires a guiding human hand. When we teach mathematics we are supposed to be explaining how to be that guide. A math course that fails to do so is essentially training the student to be a very slow, buggy version of Microsoft Excel."

The Code Book (Simon Singh, 2001) - Beautiful overview of the history of cryptography, from the Caesar cipher, to the Enigma machine, to public-key cryptography and the internet (a bit early for blockchains and cryptocurrencies, though). Contains short, worked examples of cryptosystems and how these evolved in parallel with statistics and later computer systems, interleaved with contemporary military history.

"Now picture the following situation. As before, Alice wants to send an intensely personal message to Bob. Again, she puts her secret message in an iron box, padlocks it and sends it to Bob. When the box arrives, Bob adds his own padlock and sends the box back to Alice. When Alice receives the box, it is now secured by two padlocks. She removes her own padlock, leaving just Bob's padlock to secure the box. Finally she sends the box back to Bob. And here is the crucial difference: Bob can now open the box, because it is secured only with his own padlock, to which he alone has the key."

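The double-padlock exchange in the passage above can be sketched in code. The quote does not tie the analogy to any particular cipher; the sketch below swaps in modular exponentiation (the idea behind Shamir's three-pass protocol), because the "padlocks" must be removable in any order. The modulus, the exponents and the function names are all assumptions chosen for illustration, and the parameters are far too small for real use.

```python
from math import gcd

P = 2**61 - 1  # a Mersenne prime; toy modulus, not a secure choice


def make_padlock(e):
    """Return (lock, unlock) exponents; e must be invertible modulo P - 1."""
    if gcd(e, P - 1) != 1:
        raise ValueError("exponent must be coprime to P - 1")
    return e, pow(e, -1, P - 1)  # modular inverse, needs Python 3.8+


def three_pass(message, alice_e, bob_e):
    """Run the exchange for an integer message with 0 < message < P.

    Returns what Bob recovers and the three values sent over the wire.
    """
    a_lock, a_unlock = make_padlock(alice_e)
    b_lock, b_unlock = make_padlock(bob_e)
    box1 = pow(message, a_lock, P)      # Alice adds her padlock, sends to Bob
    box2 = pow(box1, b_lock, P)         # Bob adds his padlock, sends back
    box3 = pow(box2, a_unlock, P)       # Alice removes hers, sends to Bob
    recovered = pow(box3, b_unlock, P)  # Bob removes his and reads
    return recovered, (box1, box2, box3)


if __name__ == "__main__":
    msg = int.from_bytes(b"hi Bob", "big")
    recovered, wire = three_pass(msg, alice_e=65537, bob_e=10007)
    print(recovered.to_bytes(6, "big"))
```

Neither party ever learns the other's exponent, which mirrors the point of the padlock story: no key has to be exchanged beforehand.
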
We Have No Idea (Jorge Cham and Daniel Whiteson, 2017) - An introduction to particle physics and the theory of the universe and everything, highlighting the various aspects of the world we have no idea about. Despite the cartoon-like presentation (Cham is behind the webcomic PHD Comics), it contains a surprisingly comprehensive treatment of relativity theory, dark matter, wormholes, unperceivable extra dimensions and more.

"For example, if a baboon throws a baseball very gently at your car, you would see the ball bounce off, and you and the baboon might conclude that your car is one giant single particle."

What If? (Randall Munroe, 2014) - Scientific answers to absurd hypothetical questions, written by a former NASA engineer, more famous for the webcomic xkcd. Makes for a perfect summer read. We have always wondered about questions like "From what height would you need to drop a steak for it to be cooked when it hit the ground?", "What if a rainstorm dropped all of its water in a single giant drop?" and of course, "What would happen if you tried to hit a baseball pitched at 90 percent the speed of light?"

"Suppose you’re watching from a hilltop outside the city. The first thing you would see would be a blinding light, far outshining the sun. This would gradually fade over the course of a few seconds, and a growing fireball would rise into a mushroom cloud. Then, with a great roar, the blast wave would arrive, tearing up trees and shredding houses. Everything within roughly a mile of the park would be leveled, and a firestorm would engulf the surrounding city. The baseball diamond, now a sizable crater, would be centered a few hundred feet behind the former location of the backstop. Major League Baseball Rule 6.08(b) suggests that in this situation, the batter would be considered “hit by pitch,” and would be eligible to advance to first base."

Freakonomics: A Rogue Economist Explores the Hidden Side of Everything (Steven Levitt & Stephen Dubner, 2005) - A renowned economist and a journalist write a series of essays on how to understand the world through human incentives, which have underpinned economics from day one. From sumo wrestlers to cheating school teachers, to how naming a child affects her chances later in life. In simple words, whenever we start assigning numbers to the actions we take, we humans will undoubtedly be inclined to optimize that number. It outlines some interesting global cause-and-effect stories, a particularly interesting one being the chapter on legalized abortion supposedly leading to a significant drop in crime all over the US. Interestingly enough, it also foreshadowed (in 2005) today's massive hysteria over Orwellian control through information on individuals, well before the rise of social media.

"So if crack dealing is the most dangerous job in America, and if the salary was only $3.30 an hour, why on earth would anyone take such a job? Well, for the same reason that a pretty Wisconsin farm girl moves to Hollywood. For the same reason that a high-school quarterback wakes up at 5 a.m. to lift weights. They all want to succeed in an extremely competitive field in which, if you reach the top, you are paid a fortune (to say nothing of the attendant glory and power)."

Weapons of Math Destruction (Cathy O'Neil, 2016) - Written by a former quant at a large hedge fund, this one somewhat builds upon Freakonomics, but ten years later, when our incentives and behaviour are leveraged via massive information systems. We are monitored on social media, in our shopping habits, driving, political activity, media and entertainment consumption and more. This gives an edge to the conglomerates controlling these systems, but also promotes the rich-get-richer paradigm. A much-needed call to action for societies to be critical about our seemingly innocent digital footprint.

"I wondered what the analogue to the credit crisis might be in Big Data. Instead of a bust, I saw a growing dystopia, with inequality rising. The algorithms would make sure that those deemed losers would remain that way. A lucky minority would gain ever more control over the data economy, raking in outrageous fortunes and convincing themselves all the while that they deserved it. After a couple of years working and learning in the Big Data space, my journey to disillusionment was more or less complete, and the misuse of mathematics was accelerating. In spite of blogging almost daily, I could barely keep up with all the ways I was hearing of people being manipulated, controlled, and intimidated by algorithms. It started with teachers I knew struggling under the yoke of the value-added model, but it didn’t end there. Truly alarmed, I quit my job to investigate the issue in earnest."