The team:
Wojtek Kruszewski wojt.eu @wojt_eu
Daniel Sokołowski github.com/dsokolowski
Get in touch: wojtek@oxos.pl
If you have ever wanted to travel across Europe by train, you know how frustrating searching for connections and booking tickets can be. I’m excited to be part of the team that took this on as a challenge.
Our main battlefield is Journey Search - a place for you to search for train connections from different national rail companies and book tickets online.
But that’s not the end of it. I was fortunate to fall into the company of train junkies thrilled by the sheer prospect of traveling by rail, without the need for any destination; Open Source Software advocates who give back to the community; and environmental activists.
[UPDATE 2012] OMG did I write that? I’ll add my new opinion at the end.
Up until recently I’ve been testing Rails callbacks like this:
class Post < ActiveRecord::Base
  after_save :send_notification

  # Delivers the notification; body omitted here.
  def send_notification
  end
end
post = Post.new
post.expects(:send_notification)
post.save!
…and thought I was so cool because I was testing things in isolation and testing the implementation of send_notification in a separate test.
Wrong! I was testing the framework (checking that the appropriate callbacks fire when the post is saved), not to mention that performing an actual save is wasteful. So now I have:
post = Post.new
post.expects(:send_notification)
post.run_callbacks(:save) # the callback is declared with after_save, so run the :save chain
Much better! I’ve seen this idea taken further: tests that only assert that a callback is declared, without triggering it. Good idea in principle but too radical for me for now. I’ll stay with the above method.
And now the pleasant surprise. run_callbacks triggers observers as well, so when I moved notifications where they belong I got:
post = Post.new
PostObserver.any_instance.expects(:notify)
post.run_callbacks(:create)
[UPDATE]
Nowadays (late 2012) I wouldn’t use ActiveRecord for this purpose at all. Instead I would probably create a separate model, like PublishPostUseCase which saves the post record and sends notification. It would capture knowledge about all things that need to happen when user hits “Publish”. I’d leave poor overused and abused ActiveRecord callbacks for things closely related to attribute values and persistence.
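A minimal sketch of what such a use case object might look like. Only the name PublishPostUseCase comes from the paragraph above; the call method, the notifier interface (post_published) and the fake classes are made up so the example runs without Rails.

```ruby
# Hypothetical use case object: it owns everything that has to happen
# when the user hits "Publish".
class PublishPostUseCase
  def initialize(post, notifier)
    @post = post
    @notifier = notifier
  end

  # Saves the post and, only if that succeeded, sends the notification.
  def call
    return false unless @post.save

    @notifier.post_published(@post)
    true
  end
end

# Stand-ins for an ActiveRecord model and a mailer (assumptions, not real app code).
FakePost = Struct.new(:title, :saved) do
  def save
    self.saved = true
  end
end

class FakeNotifier
  attr_reader :delivered

  def initialize
    @delivered = []
  end

  def post_published(post)
    @delivered << post.title
  end
end

post = FakePost.new("Hello", false)
notifier = FakeNotifier.new
PublishPostUseCase.new(post, notifier).call
puts notifier.delivered.inspect
```

The nice part is that the test no longer needs callbacks at all: you pass in a fake notifier and check what it received.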
Regular IMAP provides us with a list of emails but we’re left with the task of grouping them into threads. This is not as simple as looking up “RE: ” and “FWD: ” in subjects, not even close. Mail-Followup-To and Mail-Reply-To headers are not reliable either. Gmail does a good job grouping emails into threads and we want to tap into this data.
Gmail’s IMAP implementation does offer this information through the X-GM-THRID extension.
Unfortunately, Ruby’s Net::IMAP throws an exception (“unknown attribute”) when parsing a response with such an attribute.
I resorted to monkey-patching the IMAP library. Here’s a snippet from my app:
I first filtered emails by mailbox and date, then took the UIDs of the first and last email in the results to form a range. Then I fetched thread IDs for this range of UIDs.
I limited the monkey patching to the singleton class of my ResponseParser instance - we don’t want to mess with Ruby’s standard library too much.
I needed only the thread IDs themselves (to perform statistical analysis on thread lengths), so the result is not associated with emails. Using a UID range is also brittle: with more interesting filters you might want to pass an array of UIDs instead of a range.
BTW those thread IDs are what Gmail uses to construct URLs, which means you can build links to those threads too. There is also X-GM-MSGID, which could be used to build links that expand a specific email in a thread, but I haven’t tried that.
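Once you have the thread IDs, the analysis itself is plain Ruby. A rough sketch, assuming thread_ids is an array with one X-GM-THRID value per message (the sample numbers are invented, and both helper names are hypothetical). Gmail documents the thread ID as the decimal form of the hex ID that appears in web interface URLs, hence the conversion helper; the exact URL pattern is not shown here because I have not verified it.

```ruby
# Counts how many messages belong to each thread.
def thread_lengths(thread_ids)
  counts = Hash.new(0)
  thread_ids.each { |id| counts[id] += 1 }
  counts
end

# Converts the decimal X-GM-THRID into the hex form used in Gmail URLs.
def gmail_thread_hex(thread_id)
  thread_id.to_s(16)
end

# Made-up sample: two messages in one thread, one message in another.
sample = [1266894439832287888, 1266894439832287888, 1266894370493583880]
lengths = thread_lengths(sample)
average = lengths.values.sum.to_f / lengths.size
puts "threads: #{lengths.size}, average length: #{average}"
puts gmail_thread_hex(sample.first)
```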
I’m not following Tumblr community so this may be obvious.
<div id='widget_container'>
</div>
<script src='http://ajax.googleapis.com/ajax/libs/jquery/1.5.1/jquery.min.js'></script>
<script type='text/javascript'>
var widget_url = 'http://yourblog.tumblr.com/widgets_page_url';
$('#widget_container').load(widget_url);
</script>
It loads your page’s content and puts it into the container.
Currently you can see an example at http://wojt.eu - sidebar is loaded from http://wojt.eu/sidebar_widgets
What are the benefits? In my case it was just extracting content I’ve been working on a lot at the time and being able to edit it separately. It might be good for heavy content you don’t want people to wait for (it will start loading only after main content is rendered). I’m sure there are more uses.
4,958 hours of work history (that’d be nearly 3 years of working full time). 10 contracts with average feedback 4.9/5 and some glowing recommendations. Account featured in provider spotlight. And yet I’m deleting it.
Now, to be precise, I’m going to delete it and create a new one, a blank slate. It’s because of a legacy mess in oDesk internals.
Back to 2008. Being an oDesk newbie I opted into a new and shiny feature: having multiple “company” accounts. I already had my “consulting company” (quotation marks because the company is two developers strong).
I used the feature to create a company profile for two of my awesome colleagues who had their own company. It worked, I would find them a contract, help with negotiations and communication, but the payments would go into a separate financial account and they worked under their own brand.
Then, after a while, the mess started. I uploaded a logo for my company, but my developer profile would display the logo of my colleagues’ company. At some point I was unable to edit my own profile. Then oDesk phased out this feature, but left it in place for users who already had multiple accounts. This legacy feature was apparently not a priority when testing the system before releases, so every now and then something would break.
My colleagues and I no longer collaborate this way (although I’ve had the pleasure of working with them since then). If they use oDesk, they don’t use the old company account I created for them. I asked support several times to delete the account so that the bugs would stop, but the only answer was that I could “hide” it from most reports.
Fast forward to 2011. The bugs are back. It turned out that documents I had exported from oDesk reports and sent to my accountant included incorrect data. Guess what… it was data from my colleagues’ account! Response from oDesk:
Unfortunately, we do not have an estimate timeframe on when we can fix the bug yet. However, our Engineers are already working on this one and for sure it will be resolved. I will close this support ticket.
I’m not sure how long it took because after a month I stopped checking.
But the best one was to come: I received an automated email from oDesk, containing a report on a project. The thing is that the email was not addressed to me, and it contained personal information I definitely should not receive. And this is when I decided I have had it.
I don’t blame oDesk - I had overcomplicated my company setup from the beginning. But now the only way I can think of to make those issues stop once and for all is to delete the current account and create another one - this time not tied in any way to my colleagues.
Geepivo.com introduction - watch on Vimeo
I say, I say, my first ever screencast in English. To be honest - my first ever screencast altogether. I’m sure in a year I will look back and laugh at this. You don’t have to wait (-: BTW it’s outdated already, the app now has a proper sort-of-user-friendly interface.
The first time people look at any given ad, they don’t even see it. The second time, they don’t notice it. The third time, they are aware that it is there. The fourth time, they have a fleeting sense that they’ve seen it somewhere before. The fifth time, they actually read the ad. The sixth time they thumb their nose at it. The seventh time, they start to get a little irritated with it.
The eighth time, they start to think, “Here’s that confounded ad again.” The ninth time, they start to wonder if they’re missing out on something. The tenth time, they ask their friends and neighbors if they’ve tried it. The eleventh time, they wonder how the company is paying for all these ads. The twelfth time, they start to think that it must be a good product. The thirteenth time, they start to feel the product has value. The fourteenth time, they start to remember wanting a product exactly like this for a long time. The fifteenth time, they start to yearn for it because they can’t afford to buy it. The sixteenth time, they accept the fact that they will buy it sometime in the future. The seventeenth time, they make a note to buy the product. The eighteenth time, they curse their poverty for not allowing them to buy this terrific product. The nineteenth time, they count their money very carefully. The twentieth time prospects see the ad, they buy what it is offering.
jqGrid sports a peculiar in-place editing model. First it pulls JSON data from the server, puts cell values through “formatters” (for example, one that encodes HTML entities) and renders the table. When you trigger in-place editing it uses “unformatters” to recover the original value and uses it as the default value of the input.
jqGrid ships with a handful of formatters, for example one that turns an email value into a mailto link. For each default formatter there is an “unformatter” - to follow our email example, it would extract the email address from a mailto link.
This solution works, but smells funny. I can also think of situations where formatting is a lossy operation: let’s say you want to trim a value from the server for display, but let the user edit the full value in the input. This is something I expect in my current projects, so I set out looking for a solution.
Here’s a couple of rough ideas:
Second solution is most architecturally pleasing. I could go further in this direction and use Backbone.js-like data models and use jqGrid for presentation only. Sounds like working around jqGrid more than using it…
I went for last option and added this option to jqGrid initialization:
afterInsertRow: function(rowid, rowdata, rowelem) {
  // Stash the raw record from the data feed on the row element.
  var tr = $("#" + rowid);
  tr.data("jqgrid.record_data", rowelem);
},
“rowelem” is the array of cell values from our JSON data feed or jsonReader (http://www.trirand.com/jqgridwiki/doku.php?id=wiki:retrieving_data#jsonreader_as_function).
Then at any point I can fetch those attributes using: $(tr).data("jqgrid.record_data").
Ka-ching!
Hold on, there’s more! You can store full record data, not only columns used by jqGrid.
There’s this option under the misleading name “repeatitems” which, if set to false, makes jqGrid expect a hash of column_name: value pairs rather than a plain array of values. It’s poorly documented, but I found this little blog post that explains what the option does: “Using jqGrid with ASP.NET MVC: LINQ Extensions” (http://blogs.teamb.com/craigstuntz/2009/04/15/38212/) - just search for repeatitems in the article.
JSON data feed can contain all record’s attributes, not only those that map to a table column - jqGrid will just ignore extra attributes. But!… “rowelem” attribute in our after-insert callback will contain full record data. Nice!
Caveat emptor: I would expect this trick might slow down rendering of large grids.
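For illustration, here is roughly what the server side of such a feed could look like in Ruby. This is a sketch under assumptions: the records, the “notes” attribute and the grid_feed helper are invented, while page, total, records and rows are jqGrid’s default jsonReader field names.

```ruby
require 'json'

# Hypothetical records; "notes" has no matching grid column.
RECORDS = [
  { id: 1, name: "Alice", email: "alice@example.com", notes: "VIP" },
  { id: 2, name: "Bob", email: "bob@example.com", notes: "" }
]

# Builds a feed in which every row is a hash of column_name => value,
# the shape jqGrid reads when jsonReader.repeatitems is false.
def grid_feed(records, page: 1, per_page: 20)
  total_pages = [(records.size / per_page.to_f).ceil, 1].max
  rows = records.each_slice(per_page).to_a[page - 1] || []
  { page: page, total: total_pages, records: records.size, rows: rows }
end

puts JSON.generate(grid_feed(RECORDS))
```

The extra “notes” attribute rides along in every row, so it is available in the after-insert callback even though no column displays it.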
I wanted to get calendar entries in my spreadsheet for further processing. Seems that Google Apps are slowly catching up with their PC parents.
What’s awesome is that I can use a programming language I feel comfortable with - JavaScript. Here’s the code:
I want to handle recurring responsibilities in Pivotal Tracker. I don’t want Pivotal to implement this feature: better to keep the application lean. This seems a great fit for a third-party app instead.
In the meantime (while waiting for a Pivotal Recurring Stories App) we can get by with a simple script that adds stories, run periodically.
The Pivotal team did a great job showing simple cURL examples in their API docs:
curl -H "X-TrackerToken: TOKEN" -X POST -H "Content-type: application/xml" \
-d "<story><story_type>feature</story_type><name>Fire torpedoes</name><requested_by>James Kirk</requested_by></story>" \
http://www.pivotaltracker.com/services/v3/projects/PROJECT_ID/stories
One shell command - no frameworks used, no libraries required. Now just wrap this up in a simple script and add it to cron. Or, if you don’t want your workstation to take on server-like duties, you can use a service like http://cronless.com/
The command in the example adds a story to the icebox. To add it to the bottom of the backlog, add a “<current_state>unstarted</current_state>” element. It can then be moved to the top of the backlog with another API request (https://www.pivotaltracker.com/help/api?version=v3#move_stories), but this would be trickier to implement with Bash and cURL.
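If Bash gets too fiddly, the same request can be put together with Ruby’s standard library. A sketch only: build_story_request and xml_escape are helper names I made up, TOKEN and PROJECT_ID are placeholders as in the cURL example, and the request is built but not sent.

```ruby
require 'net/http'
require 'uri'

# Escapes the characters that would break the XML payload.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Builds (but does not send) the same POST request as the cURL example.
def build_story_request(token, project_id, name, requested_by)
  uri = URI("http://www.pivotaltracker.com/services/v3/projects/#{project_id}/stories")
  request = Net::HTTP::Post.new(uri)
  request['X-TrackerToken'] = token
  request['Content-Type'] = 'application/xml'
  request.body = "<story><story_type>feature</story_type>" \
                 "<name>#{xml_escape(name)}</name>" \
                 "<requested_by>#{xml_escape(requested_by)}</requested_by></story>"
  [uri, request]
end

uri, request = build_story_request('TOKEN', 'PROJECT_ID', 'Fire torpedoes', 'James Kirk')
puts request.body
# To actually create the story, uncomment:
# Net::HTTP.start(uri.host, uri.port) { |http| puts http.request(request).code }
```

Escaping the story name matters as soon as the text comes from somewhere other than a hard-coded string.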
I often find myself explaining basics and benefits of iterative and agile development (and one particular: Scrum). I’ll try to gather some main points. I plan to update this. I really do. No – really.
Update 11/4/2010: I don’t think I will update this.
We organize the work in “sprints” one week long (sometimes two weeks). At the end of every sprint we bring you working software with new features completed.
If you don’t fund the project out of your own pocket, consider this: investors much prefer a demonstration of something working to a “progress update”, and proofs of concept to documents.
You can launch your project with minimal set of features. This way you can start getting feedback from the users very early and this feedback could lead to new ideas and changes in the direction.
Also, according to 80-20 rule 80% of the value comes from 20% features. This means the product is perfectly usable and valuable in 1/5 of the development process. For example performance optimization is a nice thing to have, but it’s not needed until the project gets many users.
After every sprint you get code that meets software industry’s quality standards:
You can change providers whenever you want. Ruby on Rails is all about conventions and we follow them. This makes it much easier for another developer to step into our shoes.
I see no problem with that: if your colleague happens to be a skilled Rails developer and all of a sudden has a couple of months available, he should be able to continue our work. Same thing if the project turns out to be the new Facebook or Twitter – we’re not willing to scale that much, so in that case you might want to assemble a larger team.
You don’t need to take my word for it. You can hire a third party consultant (or just experienced Rails developer) to ask if the code is indeed readable. In fact there are companies that made this a standard service: Planet Argon ActionRails
This really means you own the project: at any given moment you can take it elsewhere.
Product Backlog is a list of features to build. We can work on it together, but you need to prioritize it (here I only advise). What should get the highest priority? Features that:
Usually those two criteria point at the same features.
At the beginning of each iteration we pick several tasks from the top of the Product Backlog. We discuss and specify them in detail. Then I hold a session of Planning Poker with the other developers to estimate the tasks. Yes, writing specs and estimating tasks for only a week ahead actually works. It’s simply an amount of work we can grasp.
Finally we limit the selection to features we think we can complete in one sprint. This is a short term goal we commit to. Most of us work more effectively with a short-term goal in sight.
When we finish a feature we want you to review it as soon as possible. If you accept it, it gets deployed and published at the end of the sprint. We avoid having more than two features completed without your acceptance – this would create risks for all of us. If you reject it, we work hard to get it to a state where you can accept it and it can be included in the next release. It’s fastest to include your requests at this stage, when the iron is still hot and we’re immersed in the feature (pardon the lack of a better metaphor).
No. They can be hidden initially: for example, you can first enable them for a group of beta testers. Then at some point you enable them all for the public and announce a new product release. Generally I advise publishing them immediately.
That’s great. This way we can quickly turn it into a Product Backlog simply by removing details and prioritizing. Let’s keep original specs at hand to reference before discussing a feature at the beginning of a sprint.
Recently I needed to interact with largish svn repository containing some 10 years of development history. I tried the usual:
git-svn clone http://repohost.com/proj
The problem was that it took way too long to download. I mean, I left the workstation running overnight and in the morning it wasn’t done - what’s worse, I had no idea how to estimate the remaining time.
It turned out that git-svn was too smart and noticed that /proj is only a part of the tree. It then followed the directory upwards to the root URL and started downloading years and years of history of many other projects.
The --no-follow-parent parameter made git-svn download what I needed very quickly, but then it wouldn’t see branching, so I ended up with orphaned branches:
--no-minimize-url appeared to be the solution, but the git-svn version present on my Ubuntu didn’t have this parameter.
In this particular case it turned out to be very easy to upgrade. I navigated to upcoming version packages repository, downloaded new git-svn and installed manually with dpkg -i. I anticipated dependency hell (one update forcing me to update whole system), but fortunately I was required only to download and upgrade git-core package.
With --no-minimize-url, git-svn clone went smoothly.
I hope this post will save somebody’s time.
Heroku charges $100/mo just to support one SSL certificate. They have their reasons - they need to reserve a public IP number just for your application, and their platform - Amazon EC2 - gives only one IP per virtual machine, so they need to reserve full EC2 instance just to have a public IP.
If the app is small and requires only 2 dynos, then it’s three times the price of the hosting itself. Say you got a dozen of such small apps…
SNI is an option unless people log in from IE6 or IE7/XP. Piggybacking on *.heroku.com certificate doesn’t look very professional.
There is a workaround though: you can have one ssl:custom addon that serves a multi-domain or wildcard certificate for multiple applications.
I tested this with a multi-domain certificate.
You need to:
add your wildcard certificate to one arbitrary app (ssl:add)
add ssl:custom addon to this first app
wait until you receive your assigned IP number from Heroku
configure DNS (point your domain to this IP)
Ok - so far the steps are exactly the same as when you add a normal certificate to one app. Heroku assigns a full EC2 instance just to handle requests and route them to the appropriate dynos. However, this instance is like any of Heroku’s other proxy servers that sit on the edge of the cloud - it’s capable of routing requests to any app based on the assigned domain. So to add other apps to the picture you only need to:
point their DNS entries to the same IP (it already responds with your multi-domain or wildcard cert)
and now the twist: add ssl:piggyback addon to every one of them (so that they handle SSL requests)
My workshop runs on four Linux workstations, one MacBook and one Windows box. We also manage some VPS servers. Here’s the backup solution in a nutshell:
Separate backup server (Linux), running Rsnapshot every hour, taking a snapshot of local workstations and VPS servers.
Each night latest backup is uploaded to Amazon S3. Besides that everybody archives their projects on DVDs.
If you’re not familiar with Rsnapshot it makes copies of selected directories. Each snapshot is a full regular copy of all protected data - no big compressed archives, incremental copies etc.
What’s cool is that files that do not change between snapshots become hard links. This means that an unchanged file is present in both snapshots, but it’s stored only once on the hard drive. No magic - it’s a feature of Unix file systems. You eat the cookie (saving space) and have the cookie (each snapshot is a full backup).
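You can see the hard link trick in isolation with a few lines of Ruby. This is just a demonstration of the file system feature on a Unix-like system, not what Rsnapshot runs internally; the file names and the hard_link_demo helper are made up.

```ruby
require 'tmpdir'

# Creates a file plus a hard link to it and reports what the file system sees.
def hard_link_demo
  Dir.mktmpdir do |dir|
    original = File.join(dir, 'snapshot.0.txt')
    linked   = File.join(dir, 'snapshot.1.txt')
    File.write(original, 'unchanged data')
    File.link(original, linked) # hard link: a second name, no data copied

    a = File.stat(original)
    b = File.stat(linked)
    { same_inode: a.ino == b.ino, link_count: a.nlink, content: File.read(linked) }
  end
end

p hard_link_demo
```

Both names point at the same inode, so the data sits on disk once, and it stays there until the last name is deleted.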
It’s because the backups are plain directories. I can open my file browser of choice (mine is Midnight Commander), enter a snapshot from two hours ago, find the working directory of my current project and browse it. In Midnight Commander’s other panel I can navigate to the same project’s working directory in a week-old snapshot, and see the two versions side by side. I can run diff to compare two versions of a file, or run “du -csh *” to get size stats of my project as of yesterday.
In other words I can use any regular tools on any snapshot. No need to “revert” or “restore” files from backup or run any special software. This is mighty convenient for me and my coworkers.
You need to:
This is Rsnapshot’s architecture: the backup server pulls data from the workstations and stores it locally. To set up a backup server in a remote location you’d need to make your workstations serve their data over the Internet - not the best idea.
An advantage is that you manage one central place for all backups. If you need to add a workstation you only need to share its data (set up an SSH, Samba or NFS server) and add a line to the Rsnapshot config to back up this workstation.
I also thought of keeping the backup server local and storing only the data remotely, but Rsnapshot needs to be able to create hard links. NFS supports this, and it worked fine when I mounted an NFS share on the backup server and told Rsnapshot to store backups there. I guess you could tunnel NFS through an SSH connection - but that’s overkill.
Bottom line: backup is in the same location so it gives no protection against disasters like flood or fire that affects whole office/workshop/LAN. My solution: each night duplicity is launched and backs up latest local backup to Amazon S3.
I use three cycles in Rsnapshot: hourly, daily and weekly. Each hourly snapshot is kept for 8 hours, daily backups are kept for 9 days and weekly backup is kept for 4 weeks. This means that if I delete a file locally, after a month it disappears from the backups.
Every developer is still obliged to archive his projects when they are completed, paused or reach a milestone, then burn them on a DVD and label appropriately. Backup and archiving are different things, see: know difference - backup vs archive
Let’s say you have a 20G VirtualBox image. You launch the virtual system, do some work and shut it down. There are always some minor changes in the image (e.g. in Windows virtual memory file, registry, log files). Rsnapshot would store completely new copy of the image. If the virtual image is used often, this means 20G of backup every hour - not good.
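To put a number on it, here is a back-of-the-envelope calculation using the retention counts from my setup (8 hourly, 9 daily, 4 weekly). It assumes the worst case, where the image differs in every retained snapshot; worst_case_gb is just a helper name for this example.

```ruby
# Retention counts as configured: 8 hourly, 9 daily and 4 weekly snapshots.
RETAINED_SNAPSHOTS = { hourly: 8, daily: 9, weekly: 4 }

# Worst case: the file differs in every retained snapshot, so nothing is
# shared through hard links and each snapshot holds its own full copy.
def worst_case_gb(file_gb, retained = RETAINED_SNAPSHOTS)
  file_gb * retained.values.sum
end

puts worst_case_gb(20) # 21 snapshots x 20 GB => 420
```

So a single busy 20G image could eat up to 420G on the backup server.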
Solutions:
That’s roughly it. And… where are my manners… welcome to my new shiny blog!
I don’t expect to write often - but I’ll certainly post answers to all questions I’m frequently asked.