|
| In Episode
3 we discussed several types of code review. In this
Episode we give the results of the largest-ever case study on peer review.
Forget theory — what's it really like? The ACM and IEEE archives are replete with papers on code review and inspection. But after a casual perusal you notice that most studies are done with fewer than 50 reviews, usually in a university setting with students and contrived code samples. That's fine for academics, but what's it like in the real world? With real software developers (ranging from junior to senior) and real software projects with 1000's of files and real deadlines? 2500 reviews. 50 developers. Real software In May 2006 we wrapped up the largest case study of peer code review ever published, done at Cisco Systems®. The software was MeetingPlace® — Cisco's computer-based audio and video teleconferencing solution. Over 10 months, 50 developers on three continents reviewed every code change before it was checked into version control.
We collected data from 2500 reviews of a total of 3.2 million lines of code.
This article summarizes our findings.
The Code Collaborator software displayed before/after side-by-side views
of the source code under inspection with differences highlighted in color.
Everyone could comment by clicking on a line of code and typing. As shown
above, conversations and defects are threaded by file and line number.
If defects were found, the author would have to fix the problems and re-upload
the files for verification. Only when all reviewers agreed that no more
defects existed (and previously found defects were fixed) would be review be complete
and the author allowed to check in the changes.
The length of this article prevents a proper treatment of the data. The
patient reader is referred to Chapter 5 of
Best Kept Secrets of Peer Code Review
for a detailed account.
By "defect density" we mean the number of defects found per amount of code, typically per 1000 lines of code as shown on this graph. Typically you expect at least 50 defects per kLOC for new code, perhaps 20-30 for mature code. Of course these types of "rules" are easily invalidated depending on the language, the type of development, the goals of the software, and so forth. A future article will discuss the interpretation of such metrics in more detail. In this case, think of defect density as a measure of "review effectiveness." Here's an example to see how this works. Say there are two reviewers looking at the same code. Reviewer #1 finds more defects than reviewer #2. We could say that reviewer #1 was "more effective" than reviewer #2, and the number of defects found is a decent measure of exactly how effective. To apply this logic across different reviews you need to normalize the "number of defects" in some way — you'd naturally expect to find many more defects in 1000 lines of code than in 100. So this chart tells us that as we put more and more code in front of a reviewer, her effectiveness at finding defects drops. This is sensible — the reviewer doesn't want to spend weeks doing the review, so inevitably she won't do as good a job on each file. This conclusion may seem obvious, but this data shows exactly where the boundary is between "OK" and "too much." 200 LOC is a good limit; 400 is the absolute maximum. Conclusion #2: Take your time (<500 LOC/hour) This time we compare defect density with how fast the reviewer went through the code.
Again, the general result is not surprising: If you don't spend enough time
on the review, you won't find many defects.
Reviews with author preparation have barely any defects compared to
reviews without author preparation.
For a full explanation and evidence for all these results, refer to
Best Kept Secrets of Peer Code Review. Jason Cohen, for the last five years, has been developing tools and mentoring programs for lightweight peer code review through real-world case studies and experiments. In 2003 he founded Smart Bear Software as the only commercial provider of software that saves developer time during review while automatically collecting metrics and enforcing process workflow. jason.cohen@smartbearsoftware.com
Set as favorite
Bookmark
Email this
Hits: 8749 Trackback(0)Comments (4)
|
|
... I think you have muddled the graphs so they appear in the wrong place. But allowign for that - very interesting. |
|
Lidor Wyssocky
said:
|
... Hi Jason, More on the same matter (the relationship between reviewing and mentoring) can be found at: http://lucidquality.qualityaspect.com/ See also: http://blog.qualityaspect.com/...effective/ Lidor |
|
Tom Harris
said:
|
... Hi Jason, I'm not sure why you found author preparation benefit to be such a surprise. Improving a work product in advance of a review is simply professional behavior. I'm glad that your study found some! Meanwhile, if the reviewer could also be a coding expert and a professional at reviewing, then the code author would have a regular mentee-mentor relationship, and improving the code between reviews would be the primary review-related activity. Probably could do without multiple reviewers as well. -- Tom (http://talkaboutquality.wordpress.com) p.s. I have now read your book too. Understandable emphasis on the tool. I would be happy to see your thoughts, on reviewer qualifications and mentoring aspects, in a future article! |
|



