Every day, all around the world, programmers have to recycle legacy data, translate from one vendor's proprietary format into another's, check that configuration files are internally consistent, and search through web logs to see how many people have downloaded the latest release of their product. This kind of "data crunching" may not be glamorous, but knowing how to do it efficiently is essential to being a good programmer.
This book describes the most useful data crunching techniques, explains when you should use them, and shows how they will make your life easier. Along the way, it will introduce you to some handy, but under-used, features of Java, Python, and other languages. It will also show you how to test data crunching programs, and how data crunching fits into the larger software development picture.
Review By: Garry Archer 01/04/2008
Data Crunching is a good reference book for any developer, programmer, Web designer, or database administrator. I found that the book slants more towards Python applications; however, it touches regularly on Java and lightly on the C and Pascal languages. Although the book is recommended for anyone from the beginner to the experienced user, I don't think it is suited to the average computer user. Rather, it is aimed at the more experienced user who must solve problems, resolve issues, and improve business logic. It is also aimed at the developer/programmer. The book does touch on beginner areas in the Relational Databases and Binary Data chapters, but overall I feel that a more experienced user would find the code in this book more intuitive.
I would definitely recommend the book to several of my fellow workers who are involved in database and Web development. I'm sure they would find snippets of code and ideas to make their lives a little easier.
Even though most computer software has a shelf life of weeks to months, the ideas and samples in the book will be very useful for some years to come. The programming code may change slightly but the reasoning and thought processes will remain constant.
Data crunching has become a way of life in my current job, and I have been able to use portions of Greg Wilson’s code in my day-to-day work crunching data from large corporate databases.
The author deals in examples that readers can immediately put to the test themselves. I have always found this to be the best way to demonstrate one’s code. It allows the user not only to learn but also to put into practice what he/she has learned. He does not shy away from proving his claims, and he makes many references to Web sites, including his own, to back them up. He makes good use of footnotes to substantiate and reference his material. I even detected a nice dab of humour aptly peppered throughout the chapters.
If I were to rewrite the book, I think I would add code examples for other languages like Pascal, C#, VB, even some good old DOS batch files. I might also add a further chapter on databases to cover MS SQL Server and Oracle.
I found the Horseshoe Nails chapter very refreshing. It hits the nail on the head, so to speak, especially the part on Test-Driven Development. Too often programs are not tested fully enough, and the end user pays the price in downtime and frustration while waiting for the bug fix or update to arrive. All in all, the book gives the reader the recipe for success: take baby steps, err on the side of caution, test rigorously, use common sense, and think of the end user when coding.