Wednesday, January 25, 2012

Yet Another Steve Jobs Biography Book Review

First of all, I am not a hero-worshipper. Never have been. Never will be. Second, I am not an Apple fanboy. Never have been. Not sure, if I can say, "Never will be!" But, over the years, having worked in technology, having used Apple products and having closely followed product launches (Apple product launches, GoogleIO keynotes and sessions, Kindle announcements, CES etc.), I gradually began to have special regard for Steve Jobs and the products that he championed, even though I hated some of the Apple approaches to dumb down user interface and technology. However, after reading the biography, I have a new found respect for Steve Jobs - his creative talents, his visionary mind, his attention to detail and also the unsung heros at Apple behind those amazing products that they make. I think I have a better understanding of how and why they did the things the "Steve Jobs way."


Being a techie, I choose to judge only things that have "computer science" associated to them and hence I have chosen to neither judge nor comment on any of the personal, emotional or inter-personal aspects (relationships with family, friends, monetary dealings with Wozniack and the others etc.) of Steve. I believe that you have to be one of the parties really involved with the situation to be able to make a judgement and even then, I think, without actually peeking inside ones head and putting all the situational elements (circumstances, past history, personal biases etc.) together, it is really hard to judge somebody, unless you are a psychotherapist. One can argue that everything about him and his accomplishments finally boils down to his personality but since my degree is in computers and not human psychology, I would like to only address things that I can understand and glean from the book.


Some of the thoughts here may not have been explicitly mentioned in the book but may be the way I inferred or interpreted the various scenarios. At times, to drive the point I have added my own 2 cents, since this is what I learnt from the book rather than merely regurgitating sections from the book.
  1. Be persuasive.
  2. Believe in yourself. Never give up. Be obsessive.
  3. Work with and around people who share your vision. Without this, it will seem like sitting in a car with each of its wheels going in a direction that it feels like.
  4. Build an "A team". "A team" members can only with "A team" members and working with "C team" members is a huge drag on productivity and employee morale. Picture the car engine of a high performance race car - every piece in it is for a reason and every piece is giving its 100% and was hand picked to produce that level of overall performance. Even as small a thing as a loose nut can bring down that race car.
  5. Never do a sloppy job. Never. Not even if nobody is looking. Even if this is an internal tool, even if this is for personal use. Redo it how many ever times it takes to get it right. Visualize everything you do as a work of art that somebody is going to look at.
  6. Make technology less scary and more human. Nobody needs a user manual to operate most functions of a refrigerator or a washing machine. Why should technology be any different?
  7. If you build a classy product with the end user in mind they will buy it. Don't think about profits. Think only about the product and the end user. Profits will follow.
  8. Everybody judges a book by its cover. Even though we know not to do it and we are taught in school not to do it, everybody does it. Admit it. You do it at the grocery store - you buy only products that are packaged in an appealing manner. In fact, grocery stores sell you stuff that you don't need, merely by placing it in an appealing manner. It is very essential to make a great impression. Apple products, starting from the Apple store, to the buying experience there, to the packaging of the product,  to the feeling when you hold the product, to the experience when you use the product all culminate to give you the "Apple Experience".
  9. Iterate on your product. Do it multiple times. But do it internally. Don't make your end users your beta testers. Treat products like other consumer electronic devices. Imagine how it would be if your washing machine was released in beta and you would have to work through its water leaks, poor washing capabilities, keep taking upgrades hoping that the next release would give you all the features and finally start working perfectly as advertised? Using the washing machine should not entail having to read through a mass of documents or having to search the web for specific problems or having to ask questions on news groups on how to perform its basic operations.
  10. Anybody can construct a building but not everybody can build the Taj Mahal. When you build something, make it grand. Make it a work of art. Make it something that people will appreciate and remember.
  11. Focus on a few things. Have laser focus. Do a few things and do them well.
  12. Rally your troops by being hands on. Unlike leaders of today, who sit in the luxury of their office and send orders to the poor soldiers who fight in the front, be like the kings of historic times - fight the battle, shoulder to shoulder. Motivate your team. Push them to show them that they can excel.
  13. Don't build products like an assembly line - each group (chassis, hardware and software) building their piece in isolation and throwing it over the wall once they are done and the next team picking it up and repeating the same process. Rather than that, build it as one cohesive piece. Once built, it will work like that too. Again, analogy with the automobile. The entire car is envisioned as one piece - otherwise you might land up with a strange car having the chassis of a boat, the wheels of a tractor and a joystick instead of a steering.
  14. Don't build things that merely replicate functionality that is already out there. Leapfrog your competition by building something orders of magnitude better.
  15. Don't compromise on your key beliefs. Even if it will allow you to get an initial not-so-fully-baked version of your product out sooner. Not even if it will allow you to get a foot in the door first. Compromise is a slippery slope.
When I started writing this blog post, I chose not to comment on personality. However, it is hard for me to resist the opportunity. And since this is my blog, I am going to do it anyway. I think Steve was a genius, one of a kind, passionate, "artistic engineer", who strived to build technology that was more human and resembled art rather than merely the embodiment of product features from a marketing list that were slapped together on a piece of hardware. Most successful engineers (and artists) have made one or two brilliant products but he was able to make so many of them. He was a man way ahead of his times - a visionary. Of course, he was not perfect, especially when it came to inter-personal relationships and he may have burnt a lot of bridges by his abrasive personality. But, like I said in the beginning of the blog, I choose to only absorb the aspects that relate to his geeky part and possibly learn from the mistakes of his inter-personal relationships. I think his son, Reed Jobs, sums it up really well in the book:


"... was not a cold profit-seeking businessman but was motivated by a love of what he did and a pride in the products that he was making"


If you survived reading this post all the way up to this point, you are probably thinking that I am actually a hero-worshipper and an Apple fanboy! It is hard for me to convince you otherwise and all that I can tell you is that if you haven't read the book please do so. I always remember what Dr. Mike Bailey, former Professor of Computer Graphics at UCSD, once said that dramatically changed my thinking. Bill Gates was going to give a talk at UCSD and I was in my "Microsoft is Evil" frame of mind and did not want to attend the talk, which happened to be at the same time slot as the computer graphics class. Dr. Bailey, not only canceled his class but also encouraged us to attend the talk and told us that, weather we liked it or not, Bill Gates was certainly a man who had largely influenced the digital revolution and as computer scientists it was essential for us to have an open mind and listen to what he had to say. If you are curious, yes, I stood in line for more than an hour and listened to Bill Gates but I was really underwhelmed by the contents of his talk and his passion. But that is another story ...

Tuesday, January 3, 2012

Building a Tagged and Revisioned File system

This blog post consists of discussions with vinod regarding the file system in his blog post about os.next and my mindless mumblings during the 12 hour drive to and from Phoenix for the Christmas 2011 holidays!

Version control has been around for a really long time. Tagging, however, is a fairly new concept, that became popular largely due to Flickr and Gmail. Tagging, instead of creating folders in the native file system for the desktop, seems like an extremely useful feature, especially, for documents, photos, music. Usage of tags on the web has demonstrated that it is convenient to tag/label content to categorize them under multiple buckets. Similarly, for content typically found on a desktop PC, we can apply the same concept of tagging/labeling instead of creating folders. This would allow retrieval of files with tags without having to look for them in specific folders. This is again an attribute that we use extensively on Flickr and Gmail and other cloud content storage/retrieval applications and bringing this feature to the desktop might make storage/retrieval of content as easy as that on the web. This approach of tagging in file system vs. using folders, surprisingly is not a new idea and Googling around revealed a couple of posts: tagfs-tag-semantics-for-hierarchical-file-systemsa-tagged-filesystem.

Version control, on the other hand has been used primarily for managing source code on projects and is usually an action that is manually carried out by the user and typically not used for content on the desktop. However, if files on the desktop automatically get revisioned, then it would make this a background activity in the file system and the user would not have to explicitly checkin/checkout their files. This would be an extremely useful in cases where we clicked on "Save" instead of "SaveAs"  and lost the original copy of a document that we were editing. Again, auto-revisioning is not a new idea. It has been around for a long time but surprisingly hasn't become a default feature in all desktop file systems. Mac OS X has good internal backup and versioning support with Time Machine and Lion introduced the auto-save the feature.

Now, combining the flexibility and convenience of tagging with the benefits of auto-revisioning (rather than explicitly placing a file under some version control) would be an interesting combination for the desktop environment, we think. Here are some design and implementation ideas to build a file system that supports both tagging and auto-revisioning:

Goals:
- tagging instead of folders
- auto revisioning of files and tags
- don't break make and other build tools like autoconf, automake, configure, ant etc.
- accomodate duplicate file names (for example, digital cameras auto-number photos and the file system could the potentially have duplicate file names referring to totally different content)

Design Basics:
- prototype implementation on linux using fuse
- when a file or tag is created or revisioned it is associated with an 32 (or x) bit id.
- by default a file without any tags is equivalent to being in the "root" folder.
- tags and files are revisioned objects and by default accessing either will give their latest versions.
- tags and metadata stored as json/bson (hopefully makes the metadata future proof)
- tags can be ordered or unordered
- ordered tags can be used in cases where a fixed folder hierarchy is necessary (source code, for example) and are associated with parent and children tags (similar to the  Nested Labels feature in GMail)
- tags are also be versioned. suffix @@ followed by a version number for a specific tag version. this allows the user to look at what files were tagged on a specific version, similar to what files were in a folder at a specific version
- file are auto-versioned whenever they get modified. suffix @@ followed by a version number for a specific version
- use forward diffs for revisions - so when a file gets modified to version 2, go and update version 0 in the backing store to contain the diff between v0 and v1. similarly when v3 gets created, update v1 as diff between v2 and v1. this will allow the latest version to be viewed without having to traverse the diff hierarchy and old versions can be pruned from the filesystem, if necessary, without having to go and fix the  diff hierarchy. this approach adds an extra overhead of having to update old versions when new ones are created but it can be setup as a background low priority activity as its primary purpose is to reduce the space used by the backing store. this idea of forward diffs can be applied to the scheme of branches also where there could be multiple children to a particular revision. when a branch is created, v0 on that branch is a full copy of the file on the branch point. this allows the files on the branch to get revisioned using the same approach as that on the main branch.

Tag:
- all metadata stored as json/bson
- root.tag file always present and contains the info for the "root" tag
- tag files contain following info:
* name of the tag
* parent tag id (empty string if directly under root. allows "orphaned" files to be tagged to a parent tag)
* ordered child tag ids (allows a directory structure that may be necessary for some source code stored in the file system)
* file ids that are tagged with this tag (basic means of lookup to file_id.dat and file_id.meta)

Files:
- stored in a separate common folder as file_id.dat
- metadata for a file (timestamp etc and associated tags) stored as file_id.meta in json format 

Example:
root.tag
{
  "parent": "",
  "tags": ["1234abcd", "5678abce"],
  "ordered_tags": ""
  "files": ["2345defc", "9876beef"]
}

1234abcd.tag
{
  "parent": "root",
  "tags": "",
  "ordered_tags": ""
  "files": ["2574dddd", "aeee5621"]
}

2574dddd.meta
{
  "name": "somefile.txt",
  "tags": "1234abcd",
  "last_modified": "01-01-2012 1300 PST"
  ...
  ...
  ...
}

2574dddd.dat
<contents of somefile.txt>

Versioning with Tags:
When versioning is added to the above scheme of Tag and Files, the following adjustments need to be made:
- tag files split into two. first one is versioned_tag_id.tag. this contains the information as before but is a file that gets stored as diffs. so, when a new file gets added to this tag, this versioned tag file moves to a newer version with an updated entry for "files". similar thing happens when a file gets removed from this tag.  tag_id.tag is the higher level metadata that contains the mapping between the tag_id and all its corresponding versioned_tag_id.tag files. it is basically a json file containing a list of version numbers and corresponding versioned_tag_id files. this also allows branches and "release" labels to be added to tags.
- nothing really changes with regard to files. they continue to be tracked as file_id.dat except that they are stored as diffs described in the basics section. the specific version of a file in a branch is listed in the entry of the tag and lookups occur based on that.
- to speed up lookup of the latest version of a file special tags (such as CURRENT or LATEST) could be used in the tag file.

Same example, now with versioning:
root.tag
{
  "parent": "",
  "tags": ["1234abcd", "5678abce"],
  "ordered_tags": ""
  "files": ["2345defc", "9876beef"]
}

1234abcd.tag
{
  "parent": "root",
  "tags": "",
  "ordered_tags": [
      {"0": "1235abcd"},
      {"1": "1236abcd"},
    ]
}

1235abcd.tag
{
  "parent": "root",
  "tags": "",
  "ordered_tags": ""
  "files": ["2574dddd"]
}

1236abcd.tag

{
  "parent": "root",
  "tags": "",
  "ordered_tags": ""
  "files": ["2574dddd", "aeee5621"]
}

2574dddd.meta
{
  "name": "somefile.txt",
  "tags": "1234abcd",
  "last_modified": "01-01-2012 1300 PST"
  ...
  ...
  ...
}

2574dddd.dat
<contents of somefile.txt>

Stretch:
- Add a mini webserver with REST api's for all the tag operations and file accesses (GET for operations and POST for data). This will allow a file explorer to be build possibly using javascript. Additional benefit is that access to files on the file system from anywhere on the network can be accomplished without any samba/nfs style mechanisms. They just need to open a browser and connect to the webserver of the machine they want to view the files on. There is an issue with permissions but this can be resolved in a scheme similar to htaccess file used in Apache via a scheme of login's.

Performance and Scalability:
Hard to comment on these without actually building it and carrying out measurements but it seems like the flexibility gained may outweigh the additional overhead of using json. It will also largely depend on the content that is stored in the file system. It seems like the data on a personal laptop/desktop would be an ideal candidate for a file system like this. It would certainly perform horribly if you use it as a file system for storing your database archive! I think medium sized (a wezel word) source code dev tree may perform well in this file system but mixing it other vcs like subversion may cause strange thrashing behavior between the auto-revisioning and the versioning mechanisms of that vcs.

The approach of adding versioning on top of the tagging approach would possibly allow for versioning to be disabled for certain "mount points" or file types like build objects.

In conclusion, the concept of tagged and auto-revisioned file system can be implemented as a user space file system on Linux. The next step would be to actually prototype this design, add varieties of files and tags to the files and see if the added behavior has value and the performance overhead is not unbearable.

[Update in light of vinod's discovery of fossil dvcs]:
This entire design can be re-adjusted using one of the approaches below:
- use fossil and discard the schemes proposed here
- use sqlite as backing datastore instead of pile-of-files (.meta, .tag and .dat) and possibly the scheme of hash instead of tag/file_id
- modify fossil to retrofit this tag concept instead of folders
- add tagging/versioning to a libsqlfs - a fuse file system based on sqlite