Archive for 2008

Tagging and the Semantic Web

Tuesday, May 20th, 2008

A while back I commented on a TechCrunch article quoting my CEO regarding keyword searches in the Semantic Web space. My comment was later quoted on the Faviki blog; Faviki is a Semantic Web startup that tags web pages with semantic Wikipedia data. Finding this prompted me to start writing more about the Semantic Web on my own blog. This is actually the first time I have ever posted about someone else’s post.

(The following is based on a presentation I gave on the subject in September of 2007.)

Tags the way they are implemented today

The better Web 2.0 sites implement tags using faceting. I have discussed this in a previous blog post regarding faceting with Lucene and Solr, but in a nutshell, it allows you to group together documents or objects based on attributes. For example, give me all documents about ‘George Bush’ and ‘Washington’. The problem with these attributes is that they have little or no value on their own, and they certainly are not understood by computers. They are just strings denoting some type of concept. Here is a short list of limitations which I feel the Semantic Web will address:


- Tags do not provide enough meaningful metadata to make meaningful comparisons
- More information is needed besides their origin
- Tags are essentially a full text search mechanism, although faceting helps
- Need more relationships between tags and the objects they pertain to

The solution, tags as objects

By allowing users to tag an object with another object, we can make extremely interesting comparisons; discerning a lot more information about the original object becomes simple and accurate. With this type of interrelationship we can pivot through the data like never before, not with full text search but with object graph linkages that both machines and humans can understand. Let’s go over an example.


Let’s say a user adds a note into our system ranting about a beet farmer who lives in Washington state by the name of William Gates. The user goes on to discuss his beets and farming techniques in great detail, mentioning nothing about software or Windows Vista, of course. In the current Internet model the user would tag this note with strings like ‘William Gates’, ‘Bill Gates’, ‘beets’, etc.


Now another user comes along and starts digging through documents tagged ‘Bill Gates’ to try and find new articles about Vista. Unfortunately, many searches will turn up bad results, especially if the density of the phrase ‘Bill Gates’ is high enough in the document about beets. That being said, the other direction would work more as intended: searching on the tags ‘Bill Gates’ and ‘Beets’ together would yield the expected results.


In the Semantic Web model, the document about William Gates the beet farmer would be tagged with the William Gates object, which could contain a plethora of metadata: his location, occupation, etc. Now when we look at this document there is no guessing as to what it is referring to, especially from a machine’s point of view. This is exactly what the Semantic Web was built for. In this model we are not relying on linguistics, natural language processing, or full text search. We are relying on hard links that machines can understand and relate to.
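To make the difference concrete, here is a minimal sketch in Java. It is only an illustration, not our platform’s actual code: the class names (TagSketch, TagObject, Note), the ids, the ‘Vista review’ note, and the metadata keys are all made up for the example. It shows the string tag ‘Bill Gates’ matching both notes, while a search by tag object matches only the note linked to that exact person.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagSketch {

    /** Hypothetical tag object: an identity plus machine-readable metadata. */
    public static final class TagObject {
        public final String id;
        public final Map<String, String> metadata = new HashMap<String, String>();

        public TagObject(String id, String occupation, String location) {
            this.id = id;
            metadata.put("occupation", occupation);
            metadata.put("location", location);
        }
    }

    /** A note carrying both old-style string tags and object tags. */
    public static final class Note {
        public final String title;
        public final List<String> stringTags;
        public final List<TagObject> objectTags;

        public Note(String title, List<String> stringTags, List<TagObject> objectTags) {
            this.title = title;
            this.stringTags = stringTags;
            this.objectTags = objectTags;
        }
    }

    // Two different people who can end up sharing the same string tag.
    public static final TagObject FARMER =
        new TagObject("person/william-gates-farmer", "beet farmer", "Washington");
    public static final TagObject FOUNDER =
        new TagObject("person/bill-gates-microsoft", "software executive", "Washington");

    public static List<Note> sampleNotes() {
        List<Note> notes = new ArrayList<Note>();
        notes.add(new Note("Beet farming rant",
            Arrays.asList("William Gates", "Bill Gates", "beets"), Arrays.asList(FARMER)));
        notes.add(new Note("Vista review",
            Arrays.asList("Bill Gates", "Vista"), Arrays.asList(FOUNDER)));
        return notes;
    }

    /** Web 2.0 style lookup: plain string comparison. */
    public static List<Note> findByStringTag(List<Note> notes, String tag) {
        List<Note> hits = new ArrayList<Note>();
        for (Note note : notes) {
            if (note.stringTags.contains(tag)) {
                hits.add(note);
            }
        }
        return hits;
    }

    /** Object style lookup: follows the hard link by comparing tag ids. */
    public static List<Note> findByObjectTag(List<Note> notes, TagObject tag) {
        List<Note> hits = new ArrayList<Note>();
        for (Note note : notes) {
            for (TagObject candidate : note.objectTags) {
                if (candidate.id.equals(tag.id)) {
                    hits.add(note);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Note> notes = sampleNotes();
        System.out.println(findByStringTag(notes, "Bill Gates").size()); // prints 2
        System.out.println(findByObjectTag(notes, FOUNDER).size());      // prints 1
    }
}
```

The metadata map is where the location and occupation would live, so a machine could tell the two people apart without reading the note text at all.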

The disambiguation page (was the tag page in Web 2.0)

What about regular string tags? The Semantic Web can’t possibly understand everything?! The fact of the matter is, that’s true. We still support regular string tagging. Some things are not proper nouns and are less concrete, like adjectives and verbs. They may not yet deserve their own object; however, let’s think about actual language here for a second: the semantics behind how we describe things.


Take the adjective ‘cool’. Well, first of all, what are you looking for? Nouns? A grouping of multiple nouns? Probably ‘cool’ nouns. A search on this tag could turn up anything and everything from many different levels. It could start by pulling in a definition from Wikipedia. Then it could group together a list of groups tagged ‘cool’, like the ‘Super Cars’ group or the ‘Fast Cars’ group. It would also show you users tagged ‘cool’ and documents tagged ‘cool’. But where it becomes really interesting is where you find the ‘cool’ string tag on a tag object! Now you can find proper noun tags like ‘Ferrari’ as well as ‘Super Cars’ the proper noun.


Joining these tags together in a search would yield detailed results from rich metadata, like a list of Ferraris over the years represented as objects. Each car object would contain detailed specs on engine type, weight, horsepower, etc. Then by examining the ‘Ferrari Enzo’ object we can find all the people who used this tag on their bookmarks, links, documents, or other objects they created. With this information you can connect with these people, join their groups, and further your search for whatever it is you are interested in. The point is, everything is related at many different levels. What links these things together are the adjectives and verbs that describe them.

Conclusions

To be able to come at your data from every angle is important. Everyone thinks differently and everyone searches differently. The truth is, I think it’s going to be a while before machines really understand what we humans are talking about. It’s up to us to help organize data in a format that is machine readable so the machines can share it, and in return it allows us to perform incredible searches like never before.


There are many common misconceptions around the Semantic Web. It is a very broad term which has many facets. We are going after what I feel is an attainable portion of this idea. Our platform may not try to fully comprehend and reason about what the term ‘cool’ means like a human does, but you are a human. You, the user, understand what this term means and just how to use it. If not, our platform will definitely try to help.

What’s the deal with these vacuum tubes anyway?

Tuesday, May 13th, 2008

RCA Vacuum tube and vintage box

Simply put, a basic vacuum tube is much like a light bulb. In a light bulb, negatively charged electrons travel through the filament, colliding with the surrounding atoms, but in a vacuum tube the electrons jump off the filament onto a positively charged plate called an anode. This type of vacuum tube is called a diode. What it does is convert AC to DC. This basic idea has been around since 1904 and was later perfected in 1907. Since then many variations have come and gone but the same basic concept hasn’t changed. The tube you see to the right is an RCA 5692 triode tube from the 1950s. My newest amp requires three of them on the input side, but more on that later.


Over the past 100 years they have appeared everywhere in the world of science and technology. They were in televisions, radios, radar, tanks, phone networks, stereos, and really almost everywhere else. They eventually became the backbone of the first computers. Today, most tubes have been replaced by much smaller, modern semiconductors. Nowadays tubes are mostly found among radio enthusiasts, in high-end or vintage stereos, and in guitar amps.

So what’s so great about vacuum tubes in an amplifier?

It really depends on who you ask. This has been a long, drawn-out and heavily debated issue ever since the first solid-state stereos were introduced in the 70s. For me, I like all things vintage, especially vintage technology. I love their simplicity, their nostalgic character, and the attention to detail you get from a hand built piece of equipment. For the engineering reasons alone I love them. All that being said, they sound wonderful! They are warm and rich and deep. The sound can be so real and the stage so perfect that Neil Young’s harmonica sounds like it’s next to you and Stevie Nicks is singing in your living room.


For me, it’s really the entire experience; the charm and character that they bring, along with a good piece of vinyl, is amazing. The solid-state guys say that the sound isn’t right due to the electron cloud inside the tube, which is responsible for amplifying the sound. The analog guys claim it’s reality, it’s lifelike, and that a lot of music has been recorded this way since the beginning. Either way, I seem to enjoy them.

What is an RCA 5692 tube and more importantly, what is a triode?!

Let’s not worry too much about the RCA tube you see above and look more at what a triode itself is. There were so many different ones built in so many countries over so many years that it can be dizzying. All that really matters is that they are like cars. Some were well built and some were not.


Triode example

To the left is a basic diagram of a triode vacuum tube (thank you Wikipedia). The main difference here between a diode and a triode is the grid surrounding the filament.


Remember that the filament is emitting negatively charged electrons that are picked up by the anode, but in a triode tube there is a grid or screen in between. By applying a negative charge to this grid you can control the number of electrons that reach the anode.


A lot goes into the circuit topology that dictates how to charge the grid, but that’s another story for another day. The two amplifiers I currently have are both powered by triodes, on both the input and output side, but their designs differ in the way the tubes are implemented.


Sourcing parts for the DIY record player

Monday, May 12th, 2008

A few months back I started sketching out plans for a do-it-yourself record player. Since then, not much has gone on due to my friend the electronics expert being out of town, so I decided to source some parts. I turned to a Chinese company called diyhifisupply.com. China, you say? Well, over the past few years China has really become a key player in the DIY world of high fidelity. They are creating high quality components at great prices, from the platter you see below to oil filled caps, tubes, and other various parts.

le club hifi turntable platter

This platter was laser cut out of 40mm thick acrylic at a final diameter of 298mm. It truly is a beautiful piece of work. The markings on the platter are for timing it with a strobe. Below is a photo of the bearing that I ordered along with it. It too is a very nicely machined piece of work, with a self-oiling design that uses rifling to push the oil back up to the bearing. When everything is connected together it turns very smoothly.

rifled turntable bearing

College bar

Thursday, May 8th, 2008

Completed bar

Not that kind of college bar. I built this bar in my apartment with the help of some friends my sophomore year in college. It came out pretty well considering I was 19 years old and didn’t even have a proper table saw. Literally, we had a table and a saw. Anyway, behind the bar we put in shelving in the middle and left enough room for a decent size fridge. On the floor we cracked old tiles and laid out a random mosaic pattern. The bar itself was built out of a 2×4 pine frame with beech veneer covering all sides and doubled up on top for added strength. We finished it up with a coat of stain and a few coats of urethane. It was well worth the effort and got tons of use over the years. Hopefully the new owner of the apartment is still enjoying it today.

Maven2, my first impressions

Tuesday, May 6th, 2008

Maven has been quite a departure from the usual Ant builds that I have become accustomed to. Although Ant doesn’t deal with dependency management like Maven does, I have used Savant in conjunction with it, which worked quite well. The constructs weren’t complicated; the hard part was getting accustomed to Maven, which required me to forget everything I know about Ant. Oh yeah, and a few days of pounding my head.


After the initial mental exercise I really got to liking it. It allowed me to easily handle multi-module projects and all their dependencies. It also has great plugin support for anything you can think of, one of my favorites being Jetty (developing locally with Tomcat is for suckers!). I also really like the pom.xml settings, as well as the profile settings for handling different environments. Anyway, next time I start another project from scratch I will probably take another run with Maven.

BMW M3 head machined and cleaned

Sunday, April 27th, 2008

BMW M3 cam case cleaned
More photos: BMW M3 head photo set

The M3 head just came back from the machinist about a week ago and it looks wonderful, but unfortunately I didn’t send off the cam case and other parts to be media blasted along with the head. So this Saturday I spent the day with the parts cleaner at my friend’s shop, A1 Imports Autoworks in San Rafael. It saved me a little bit of money and I had to take inventory anyway. I can’t wait to put it together!

Photography from Mount Tamalpais

Tuesday, April 22nd, 2008

Recently I took one of my favorite drives up towards Mt Tamalpais and eventually ended up at Point Reyes, where we drove through the farms. Here are some of the photos from the trip, as well as some random ones from my various trips over the years. You can see the rest of these photos on the new Flickr site for John Clarke Mills!


A rock climber up on the top



Here is one of the 2002 that I really liked. This was a beautiful back road leading down Mt Tamalpais toward Stinson Beach: fern-covered forests along redwood lined streets for miles and miles. The farm roads in Point Reyes are just as enjoyable.

The BMW 2002 near a stream

PipedInputStream and PipedOutputStream with Java

Thursday, April 10th, 2008

Today I came across an interesting concurrency problem while deleting objects from the Social Graph (Semantic Web, remember?). I have been tasked with mass deletes throughout our system, including exporting the objects in case they ever need to be reassembled. Since our graph is so large, and we could potentially be deleting tens of thousands of triples at a time, the serialized XML file would contain roughly 10 times as many lines as there are triples. To write the output to a file as fast as possible, there was no reason to hold the whole serialized XML in memory. The best thing to do was to pipe the output stream straight to our binary store.


Now in order to do this, I need two threads, one to write and one to read. If you were to do this with one thread you would most likely run into a nasty deadlock situation. Anyway, here’s what I came up with:


public InputStream openStream() throws IOException {
    final PipedOutputStream out = new PipedOutputStream();

    Runnable exporter = new Runnable() {
        public void run() {
            tupleTransformer.asXML( tuples, out );
            IOUtils.closeQuietly( out );
        }
    };

    executor.submit( exporter );

    return new PipedInputStream( out );
}


Can anyone see the problem? Well, unfortunately I couldn’t either for over an hour. My unit tests would pass sometimes and fail other times, which led me to believe I was dealing with a timing issue. It turns out that sometimes the exporter thread had already finished with the PipedOutputStream before the PipedInputStream was even instantiated, so the reader completely missed both the stream and the out.close(). The trick was to instantiate the two streams, in and out, at the same time and only then start the output on another thread. Problem solved. Here is what the finished product looks like:


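public InputStream openStream() throws IOException {
    // Connect both ends of the pipe before any writing can start.
    final PipedOutputStream out = new PipedOutputStream();
    PipedInputStream in = new PipedInputStream( out );

    Runnable exporter = new Runnable() {
        public void run() {
            try {
                tupleTransformer.asXML( tuples, out );
            } finally {
                // Always close so the reader sees end-of-stream.
                IOUtils.closeQuietly( out );
            }
        }
    };

    // Write on a separate thread; the caller reads from the returned stream.
    executor.submit( exporter );

    return in;
}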
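For anyone who wants to try the pattern outside our codebase, here is a small self-contained sketch. It is not our production code: PipeSketch, the payload parameter, and roundTrip are stand-ins invented for the example, taking the place of tupleTransformer and the binary store. The payload in main is deliberately larger than the pipe’s internal buffer, so the writer has to block until the reader catches up, which is exactly why the two ends need separate threads.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PipeSketch {

    /** Connects both pipe ends first, then starts the writer on another thread. */
    public static InputStream openStream(final byte[] payload, ExecutorService executor)
            throws IOException {
        final PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);

        executor.submit(new Runnable() {
            public void run() {
                try {
                    out.write(payload);
                } catch (IOException e) {
                    // The reader went away; there is nothing left to write to.
                } finally {
                    try {
                        out.close(); // signals end-of-stream to the reader
                    } catch (IOException ignored) {
                        // nothing useful to do if close fails
                    }
                }
            }
        });
        return in;
    }

    /** Pushes text through the pipe and reads it back on the calling thread. */
    public static String roundTrip(String text) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            InputStream in = openStream(text.getBytes(StandardCharsets.UTF_8), executor);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            in.close();
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            big.append('x');
        }
        System.out.println(roundTrip(big.toString()).length()); // prints 5000
    }
}
```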

My first vacuum tube amplifier

Tuesday, April 1st, 2008

Cary SLI-80 vacuum tube amplifier

For a long time now I have been interested in vacuum tubes. I love their simplicity, their sound, and the fact that the design hasn’t changed since the early 1900s. They were also used as switches in early computers, the job later taken over by the solid state transistor. 50 years ago it would take an entire room of these to run a computer with less memory than a Casio wristwatch.


Anyway, this is a beautifully hand made, oil-filled-cap integrated amplifier that runs in 80-watt Class A/B ultra-linear mode and 40-watt triode mode. I have it fitted with a set of Russian Electro-Harmonix tubes, with the power coming from two sets of KT88s. The sound is quite crisp with little to no hum or background noise, which is surprising considering tubes are known for these issues. Paired with a set of KEF iQ5 floor standing speakers, it sounds wonderfully crisp and clean, especially with vocals. Now, to fix the weak link: the 5 dollar record player I bought at a garage sale. More to come on the DIY record player later.

Ajax and IE 7: cache not invalidating

Friday, March 21st, 2008

For the past day or two I had been struggling at work to figure out why Internet Explorer 7 would not pay attention to the response headers telling it not to cache the response. First, I tried setting the Expires header to expire instantly, with a value of -1. QA confirmed that this solved the problem in some revisions of IE 7 but not all. After digging around the web, it turns out that you have to set a few more headers, one of which I had never even heard of. Here’s what solved the problem for me. This snippet is in Java but could apply to any language:

response.setDateHeader( "Expires", -1 );
// setDateHeader() writes a properly formatted HTTP date
response.setDateHeader( "Last-Modified", System.currentTimeMillis() );
response.setHeader( "Pragma", "no-cache" );
// setHeader() replaces any earlier value, so all Cache-Control directives go in one call
response.setHeader( "Cache-Control", "private, no-store, no-cache, must-revalidate" );
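Since the same headers apply to any language or framework, here is a rough sketch that builds them as a plain map with no servlet container involved. NoCacheHeaders, httpDate, and build are names made up for this example, and merging the Cache-Control directives into one value is an assumption about the intent of the snippet above. The one detail worth noticing is that HTTP dates have a fixed English format in GMT, which is not what Date.toString() produces.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class NoCacheHeaders {

    // HTTP dates look like "Thu, 01 Jan 1970 00:00:00 GMT": English names, always GMT.
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    /** Formats an instant the way HTTP date headers expect. */
    public static String httpDate(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    /** Builds the anti-caching headers as name/value pairs. */
    public static Map<String, String> build(Instant now) {
        Map<String, String> headers = new LinkedHashMap<String, String>();
        // One millisecond before the epoch: a date that is always in the past.
        headers.put("Expires", httpDate(Instant.ofEpochMilli(-1)));
        headers.put("Last-Modified", httpDate(now));
        headers.put("Pragma", "no-cache");
        // A single header value carrying every Cache-Control directive.
        headers.put("Cache-Control", "private, no-store, no-cache, must-revalidate");
        return headers;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> header : build(Instant.now()).entrySet()) {
            System.out.println(header.getKey() + ": " + header.getValue());
        }
    }
}
```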