
Book: Programming Collective Intelligence by Toby Segaran

Book: Programming Collective Intelligence by Toby Segaran (2007, 368 pages)
First, I should say that I have always been interested in machine learning. I believe that a lot of information can emerge from this technology, so I am always eager to learn about statistics, open data and machine learning. I would like to have more time to dedicate to this field.
So, about this book. First, the examples are in Python. Admittedly, I am more of a "Java guy", if that makes any sense, but I am glad that the subject forced me into reading Python code. Now I understand better why Python is used in domains like biology, genetics and data manipulation. Python is really not only about indentation! It is great at manipulating data structures: multi-dimensional arrays, maps ... Short and powerful. But even if Python is great, I felt that there could have been more diagrams and pictures, just to give the reader a break from certain code-intensive sections, especially those dealing with text parsing and word counting.
A last word about the code: the focus is not on optimizing it, but there is advice on which algorithm suits a specific use case. Still, I would be eager to read another volume on the subject, especially one covering concurrency, Scala and GridGain. And even more algorithms! There is a lot left to cover, especially time series, streams ...
The use cases are well chosen and interesting. Each one introduces an algorithm and its limitations, and the author then moves on to another use case that requires another algorithm. The algorithms are "classical", but he covers a wide range, from the Bayesian filter to SVM and even genetic programming. He also avoids the "recipe collection" approach: he outlines the constant principles behind optimisation, for example.
The subtitle, "Building Smart Web 2.0 Applications", is very limiting. But maybe having "Web 2.0" in the title is required to sell a decent number of books. The range of applications and domains covered is far wider. Incidentally, the author works at a biology company!
It is rare that I read a book, feel that it covers a lot, and still want to know even more! Clearly he makes the subject interesting. There is a lot of data available that is only waiting to be mined. I followed the recent "Strata" thread from O'Reilly with great expectations for "data journalism".

