Coupon collecting the computing way

Coupon collecting is a very Dutch thing to do, though I never made a serious hobby out of it (nevertheless, I still have a great Brio Koekjesboek thanks to that), but I did collect stamps for a while, which was more interesting than cutting slips off of margarine wrappings. What does any of this have to do with computing, or math, for that matter? A lot. I mean, think of it: how much margarine must we have bought just to have enough slips to order the Brio cookie-baking booklet ‘for free’? Same story for the coffee packet wrappings. Post stamp collecting is harder: you’d want the whole series of a given edition. The Italian company Panini made a business out of it, enticing people to collect all stickers of all team members playing in a world cup. And that’s what got me into this post’s topic.

Coaching for the next ACM ICPC, which includes training sessions, made me surf on the web for some interesting problems to solve, so as not to have only previous ICPC regional’s and finals problems to train the students with. Simon Whitehouse has a great blog post on what it would cost to complete the whole Panini sticker book for the 2010 Soccer World Cup in South Africa, without swapping cards with friends, i.e.: how many packets of five stickers would you need to buy to get the whole series of 638 stickers (pictures of soccer players) to put in the sticker book? Answering this question sounded like fun. I reworked a bit the problem description from his post so as to generalize it to finding a way to be able to calculate what it would cost for any world cup—rugby and cricket are important in South Africa, too—and any cost of a packet of stickers (there’s some 6% inflation/year here); read the full problem description (pdf), on the first page.

In solving this, first, there are three variables: N for the number of unique stickers, P for the price of a packet of 5 stickers, and C for the total cost we want to know. To calculate C, we thus have \frac{total\_no\_of\_stickers}{5}*P , and we’ll round it up to the nearest integer. The crux is how to get to the total number of stickers.

Whitehouse’s post has a very readable explanation. In short, when you get the first sticker, it is guaranteed to be a new one, the second card has a \frac{638}{637} chance of being new, and so on to the last card \frac{638}{1} , wich follows from some basic notions of probabilities, which you can/will/have come across in a statistics intro course. Generalising this to the arbitrary number of N cards, we obtain

\frac{N}{N} + \frac{N}{N-1} + \frac{N}{N-2} + \ldots + \frac{N}{2} + \frac{N}{1}

to calculate the total amount of stickers you need to buy to have the N ones complete. This is as much as you really need from a computational viewpoint. Here’s a simple python code snippet that gets the job done:

def panini(n):
     tns = 0
     for i in range(1,n):
          tns = n/i + tns
     return tns

But why keep it simple when one can complicate matters…

This problem is an instance of the Coupon Collector’s Problem (CCP). The above formula is an harmonic series, and with some math on the CCP page, and the Euler-Mascheroni constant {\sf \gamma} (from number theory, with lots and lots of mathematics), one somehow obtains that the above-mentioned series is n*H_n , with H_n the harmonic number, and the whole thing equalling also

n \mbox{log} n + \gamma n + \frac{1}{2} + o(1) \mbox{ as } n \rightarrow \infty

according to the Wikipedia entry; there is a lot more online about it, e.g., here [course-level] and here [research], anong many resources. If that’s not enough, \gamma \approx 0.5772156649 , with the decimal digits computed now to over 119 billion decimal digits (it is a major question in mathematics whether it is an irrational number). Somewhere in the whole gamut of formula on Wikipedia and Whitehouse’s clean but unexplained jump (main text and a comment further down on that page), it boils down to, roughly,

total\_no\_of\_stickers = N*\mbox{ln}(N) + \gamma

The latter is easy to plug into a spreadsheet to obtain the answer. But lo and behold, what’s computed with the math-approach and natural log depends on what you plug in for \gamma , i.e., how many decimal digits, and only \gamma or the whole of Eq.2. The series with the simple algorithm does not have that problem with the approximations. And you don’t have to do all the math. I didn’t exactly record the time it took to create the spreadsheet versus typing up the simple algorithm, but the latter may even have been faster to do.

Besides the observation that the computing way made it simpler to solve the problem with respect to the design, there’s still a remark to be made on computing the total cost. With R10 per packet and the soccer world cup sticker book, you’ll end up paying R8977 to complete the soccer world cup book if you’d do it all by yourself! For many a South African, that’s more than the monthly salary. Completing a 400-sticker world cup for R35/packet is going to cost you R18389 (about €1268 with the current exchange rate). You’d be a lot better off swapping doubles with family and friends rather than buying new packets. Then again, mot people probably won’t calculate how much money they’d be spending on collecting things, so, here’s a basis for a business model for you.

2 responses to “Coupon collecting the computing way

  1. Pingback: Reblogging 2014: Coupon collecting the computing way | Keet blog

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.