Tuesday, September 22, 2009

Mathematical Problems in Artificial Intelligence

Another name for this entry could have been "how to kill time at work when your boss's half-baked financial application still doesn't have any actual specs to work from" ... I think this one is more poetic.

So for those of you who are paying attention (a few cats and a small disheveled dog), I've recently been given the opportunity to work with one of the Carleton math profs on some mathematics involving fuzzy perceptron networks - despite the fact that I only have a fuzzy idea of what a fuzzy perceptron network is. (I knew what a perceptron was beforehand, but most of my neural-network-related interests were in self-organizing neural networks or those based on biology or information theory.) I don't know where this is going yet, as I've just started taking the first participatory steps in this - but it should be interesting.

Anyway - the point here is I've started reading up on the background material that was given to me - starting with good old Marvin Minsky of MIT fame. (OK, I admit Marvin Minsky isn't exactly a household name for non-nerdy folks, but if you've ever had even a passing interest in the history of modern Artificial Intelligence research, his name probably came up.) I'm about halfway through it - and although the subject is very interesting, it turns out that good ol' fashioned perceptrons had some serious limitations that it would be nice to get around. The problem is: how could one do that?
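For the curious: the textbook example of such a limitation (not necessarily the one the background material dwells on) is XOR. A single perceptron can only split its inputs with a straight line, so it can learn AND but can never get XOR fully right. Here is a minimal Python sketch of that; the function names, the epoch cap and the training loop details are mine, picked purely for illustration.

```python
def predict(weights, bias, x1, x2):
    """Fire (1) if the weighted sum is positive, otherwise stay quiet (0)."""
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0


def train_perceptron(samples, epochs=100):
    """Classic perceptron learning rule on two binary inputs."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            delta = target - predict(weights, bias, x1, x2)
            if delta != 0:
                # Nudge the separating line towards the misclassified point.
                weights[0] += delta * x1
                weights[1] += delta * x2
                bias += delta
                mistakes += 1
        if mistakes == 0:
            break  # every sample is classified correctly
    return weights, bias


def accuracy(samples, weights, bias):
    hits = sum(1 for (x1, x2), target in samples
               if predict(weights, bias, x1, x2) == target)
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_weights, and_bias = train_perceptron(AND)
xor_weights, xor_bias = train_perceptron(XOR)

print("AND accuracy:", accuracy(AND, and_weights, and_bias))
print("XOR accuracy:", accuracy(XOR, xor_weights, xor_bias))
```

AND settles on a line that separates the classes; XOR never does, no matter how many epochs you throw at it, because no such line exists.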

---BIG SCARY MATH WORDS WILL BE USED MOSTLY INCORRECTLY HERE----

I was thinking that perhaps (and this idea is very raw - it doesn't really even count as half-baked, and is probably horribly impractical at best or very, very wrong. Also, since I don't know what direction the good doctor wishes to take with these fuzzy perceptrons yet, it may also be a completely unhelpful thought experiment) one could extend the idea by the use of homology or cohomology theory ... both of which I am highly sketchy on and wasn't exactly good at in university. (I took Algebraic Topology once ... I am still unaware what mathematical formulae were used that allowed me to pass it, but I imagine it involved rather a lot of fudging on the part of the marker :P) Specifically, the part of it involving Poincaré duality and those wonderful cell structures, which could help define the structure of the neural network. Why? Well, the very, very, very tenuous link comes from the fact that those cell structures are similar to Voronoi diagrams, which basically define the configuration of a Kohonen-style self-organizing map. So if one could generalize those diagrams to cell structures, then associate with each cell a perceptron unit, then one *might* be able to learn a bit more mathematically about how collections of learning units model abstract environments.

Or as I said before - it's also possible I'm completely full of shit. I haven't looked at any of this in a few years and am probably making connections where none exist. But me jabbering on about bullshit is the purpose of this blog remember?


---BIG SCARY MATH DONE ... YOU CAN STOP HIDING NOW---

So I apologize, but occasionally this blog will get used as a musing point for all my bad ideas, not so much for your benefit, but so that I can keep myself on track. Unfortunately it also means that all my wrong ideas will be happily aired for all to see ... but honestly, there is nothing wrong with wrong ideas so long as they are recognized as such. For example, the fact that no-one reading this has bought me a beer in the last 10 minutes is very wrong, yet I imagine you accept this and move on with your life ... waiting for the very first opportunity to correct that mistake.

(Guinness is good, so is Keith's).

Saturday, September 12, 2009

Information Pollution

I am starting to think that current search / forum technology is reaching the point where it is counterproductive to the user. The problem is that our currently implemented tech - say Google's PageRank - works more off a popularity scale (yes, I know how PageRank works mathematically, but when you think about it a bit, pages with a higher PageRank are more likely to be the more popular ones ... making PageRank just another popularity scale), which at first wasn't a problem - there was a good chance that the most popular link had the right answer.
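To make the "popularity scale" point concrete, here is a rough Python sketch of the simplified textbook PageRank iteration (definitely not whatever Google actually runs in production). The tiny link graph and the page names are invented, and the damping value of 0.85 is just the commonly quoted one. Two answer pages exist: one linked by three fans, one linked by a single expert.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links."""
    pages = list(links)
    count = len(pages)
    rank = {page: 1.0 / count for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump").
        new_rank = {page: (1.0 - damping) / count for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / count
        rank = new_rank
    return rank


# Hypothetical graph: keys are pages, values are the pages they link to.
links = {
    "popular_answer": [],
    "correct_answer": [],
    "fan1": ["popular_answer"],
    "fan2": ["popular_answer"],
    "fan3": ["popular_answer"],
    "expert": ["correct_answer"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:15s} {score:.3f}")
```

Nothing in the calculation knows or cares which answer is right. The page with more inbound links simply wins, which is fine right up until the popular page and the correct page stop being the same page.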

Now, however, we have a problem: the internet is starting to become a giant network of people's opinions about information (much like this blog - I can't stress the opinion part enough ... I make no claim that my word is gospel truth), and many of these people have the technical know-how to manipulate the system. Now, examples of PageRank manipulation can be searched for (such as web hosts selling high-PageRank sites etc.), but the phenomenon goes a bit deeper. I submit Stack Overflow.

Right now, most of the stuff on it is pretty good (I have wasted far too much work time reading this little forum site); however, as I have seen first hand, many internet users think it's funny to post useless junk on these types of forums. Now - I have no issue with these people; they can be amusing and aren't intending any harm for the most part. The problem is that through collusion, and the help of fellow like-minded individuals, they have managed to raise their "reps" through the roof.

Why is this a problem? Well, that's simple. Rep is intended as a measure of how good/useful a given user's answers are. Instead, cases are quickly cropping up where it becomes a measure of popularity. Measures of popularity are incompatible with measures of quality, which effectively nullifies these systems. If left unchecked (and not only is it unchecked for the most part - *but* - it would be a very tenuous position indeed to say that it could or should be checked), this will lead to what I am calling information pollution.

Definition - Information Pollution: A situation where correct, or at least informative, sources are made more difficult to find because they are mixed into a much larger set of opinion-based or popularity-based sources.

I am not exactly sure how one would study the spread of this form of pollution, but I suspect, based on pure observation, that on the internet it is increasing rapidly. Eventually, when searching for factual or useful information, we are going to need a pollution filter - some kind of expert system which can apply rules of inference to separate the merely popular ideas from the ideas which are actually supported. (Note that it is not always the case that a popular idea is wrong.)

Which of course ignores the core question - what is the difference between a popular idea and a right one? And can one put it in terms that a computer could understand ... or could one teach say a perceptron network or some other machine learning algorithm to tell the difference?

Feel free to submit the answer, worked in detail to me ... I promise to take all the credit so you don't feel bad about not having a popular idea ;)

Sunday, September 6, 2009

SOAP Swarms

Ok, so I've been reading my way through the WCF certification exam booklet (Windows Communication Foundation - primarily my interest here is for web servicy goodness) and my brain decided this had another use that was way cooler.

Basically we could use WCF (actually a highly shaved-down version of it) as a framework for a "swarm computer". The analogy here is to a swarm of ants - basically a bunch of independent, but not very powerful, computing elements which, when they work together, create an emergent behavior that can be incredibly complicated, i.e. the whole becomes more than the sum of its parts (in a manner of speaking).

Now it is easily argued that we have many examples of such systems working nicely in the computing and robotics fields. Heck, just the other day I was reading about tiny thumb-sized robots that worked together in a swarm, talking to a central processor and sharing info via wi-fi, so I am not claiming to have invented the idea of the "swarm computer". The problem is that most implementations basically have the 80s computer problem ... I have a Commodore, you have a Mac, and neither of us can read each other's files. (Shut up naysayers - I am sure there were ways around this - but that isn't the point ... also for the rest of you ... you know, the ones not quite old enough to be real people yet - this is a Commodore 64: http://www.commodore.ca/gallery/adverts_commodore/c64_what_you_get2_commodore_micro_feb85.jpg ). What's missing is an OS that could be used to program them.

So what would be needed in a "swarm computing" OS? Well, basically we need at minimum:

-A way to define messages.
-A way to transfer messages (the content of the message shouldn't matter - perhaps even the little guys could reprogram each other).
-A way to run programs on messages.

But wait ... that's exactly what WCF and SOA (service oriented architecture - keep up) provide us - SOA being a concept and WCF being a heavy implementation of that concept. A slimmed-down version of WCF could be used to create a two-part swarm computing OS. Part one allows you to send messages/programs to the swarm and receive them back; part two sends and receives messages within the swarm based on the currently executing programs.
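To show how little is actually needed, here is a toy Python sketch of those three ingredients: a message definition, a way to pass messages around, and a way to run programs on them. None of this is WCF; it is an in-memory stand-in, and every name in it (Message, Node, share_max and so on) is made up for the example. Five dumb nodes in a ring each start with one reading, and the swarm as a whole ends up agreeing on the largest one.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    """Ingredient 1: a way to define messages."""
    kind: str
    payload: dict = field(default_factory=dict)


class Node:
    """One small, not very powerful, member of the swarm."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()
        self.neighbours = []
        self.programs = {}   # message kind -> function to run
        self.state = {}

    def install(self, kind, program):
        self.programs[kind] = program

    def send(self, target, message):
        """Ingredient 2: a way to transfer messages."""
        target.inbox.append(message)

    def step(self):
        """Ingredient 3: a way to run programs on messages."""
        while self.inbox:
            message = self.inbox.popleft()
            program = self.programs.get(message.kind)
            if program is not None:
                program(self, message)


def share_max(node, message):
    """Keep the largest value seen so far and pass news of it along."""
    value = message.payload["value"]
    if value > node.state.get("best", float("-inf")):
        node.state["best"] = value
        for neighbour in node.neighbours:
            node.send(neighbour, Message("value", {"value": value}))


def run_swarm(nodes):
    while any(node.inbox for node in nodes):
        for node in nodes:
            node.step()


# Part one: talk to the swarm from outside (build it, load programs, inject data).
readings = [3, 9, 4, 1, 7]
nodes = [Node(f"ant{i}") for i in range(len(readings))]
for i, node in enumerate(nodes):
    node.neighbours = [nodes[i - 1], nodes[(i + 1) % len(nodes)]]
    node.install("value", share_max)
    node.inbox.append(Message("value", {"value": readings[i]}))

# Part two: the swarm talks among itself until it goes quiet.
run_swarm(nodes)
print({node.name: node.state["best"] for node in nodes})
```

No single node ever sees all the readings, yet every node finishes knowing the maximum. Swap the in-memory inbox for a real transport and you have the skeleton of the two-part idea.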

Anyway, that's my idea for the day.

Saturday, September 5, 2009

Captcha ... Crapcha's

Ah, the joyous wonder of having to decipher and type in the words so nicely obfuscated on the screen so that I can happily acknowledge that yes sir, I am indeed a human being ... but is this method effective?

Well - it's highly effective at being annoying, but as a long-term solution to determining whether the user is human or not, it's about as temporary as we can get. Why?

Most Captchas are based on printing an obfuscated word onto the screen and then asking you to type this word in. Now this is supposed to be difficult because reading obfuscated words is hard for the computer, right? Well yes - at first. However, computers have been "reading" for decades now - it's really just a matter of pattern recognition and a bit of image manipulation. Case in point - those pretty lines and pixels sometimes seen obscuring words to make it "harder" for the script are actually removable by a simple algorithm (I wrote an implementation of it for Wavelets in university based on morphological operators. Morphological operators are neat little critters whose name makes them sound far more complicated than they really are). As for obscuring the letters themselves by stretching them and squeezing them - well, that can be fixed either by transformation or by using the examples themselves as input for a machine learning algorithm.
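To back up the "far less complicated than they sound" claim, here is a small Python sketch of a morphological opening (an erosion followed by a dilation) on a tiny black and white image. This is not the university implementation mentioned above, just an illustration written from scratch; the 3x3 neighbourhood and the toy image are assumptions chosen so the effect is easy to see. The thick blob stands in for a letter stroke, the one-pixel line for the "obscuring" noise.

```python
# All nine offsets of a 3x3 neighbourhood (the "structuring element").
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def parse(rows):
    """Turn rows of '#' and '.' into a grid of 1s and 0s."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def render(image):
    return ["".join("#" if pixel else "." for pixel in row) for row in image]


def pixel_at(image, y, x):
    """Read a pixel, treating everything outside the image as background."""
    if 0 <= y < len(image) and 0 <= x < len(image[0]):
        return image[y][x]
    return 0


def erode(image):
    """Keep a pixel only if its whole 3x3 neighbourhood is set."""
    return [[int(all(pixel_at(image, y + dy, x + dx) for dy, dx in OFFSETS))
             for x in range(len(image[0]))]
            for y in range(len(image))]


def dilate(image):
    """Set a pixel if anything in its 3x3 neighbourhood is set."""
    return [[int(any(pixel_at(image, y + dy, x + dx) for dy, dx in OFFSETS))
             for x in range(len(image[0]))]
            for y in range(len(image))]


def opening(image):
    """Erosion wipes out thin features; dilation regrows what survived."""
    return dilate(erode(image))


noisy = parse([
    ".........",
    ".###.....",
    ".###.....",
    ".###.....",
    ".........",
    "#########",   # one-pixel-thick "obscuring" line
    ".........",
])

cleaned = render(opening(noisy))
print("\n".join(cleaned))
```

The line is too thin to survive the erosion, so it is gone for good, while the thick blob shrinks and then grows back to its original size. Real captcha noise is messier than this, but the principle is the same.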

So why does this matter? Captchas are supposed to provide security by posing a problem that machines are bad at and humans are good at - instead most of them actually pose a problem which machines can be made very good at, and some people have trouble with. Good job captchas!

So to solve this problem at work I came up with a new system (I came up with the idea, and my co-worker is currently implementing it ... poorly, not due to his lack of skill, but because the legacy project he's inserting it into is a nightmare which makes even doing something poorly a bloody miracle) which, while it definitely needs some work, I think (humbly ;) ) is closer to the right idea. The concept is: rather than test a person on their ability to recognize a pattern and type it in, test them on their ability to reason about a pattern. Why?

Reasoning about things is a hard problem for computers in general. Can it be done in many domains? Sure. Is it hard to make a machine that can do general reasoning? Well, to my knowledge we have no Artificially Intelligent computers wandering around - so for now I'll say yes. So show them shapes, ask questions about the relationships of those shapes, maybe get fancy and ask them to move the blue box over the red triangle, etc. There is no end to the pattern reasoning questions that can be contrived, and as long as the site keeps those questions relatively different, it's going to be infuriatingly difficult for all but the most determined cracker to break through on the Captcha vector ... and short of inventing Skynet they'll have some trouble using their powers to move on seamlessly to the next target.
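As a very rough idea of what generating such a question could look like, here is a toy Python sketch. It is not the system being built at work; the colours, shapes, question template and function names are all invented for illustration. A real version would draw the scene as an image and vary the question types a lot more, since a plain-text question like this one is trivially machine-readable.

```python
import random

COLOURS = ["red", "blue", "green", "yellow"]
SHAPES = ["triangle", "box", "circle", "star"]


def make_challenge(rng):
    """Lay out four coloured shapes left to right and ask about a neighbour."""
    colours = rng.sample(COLOURS, 4)
    shapes = rng.sample(SHAPES, 4)
    scene = list(zip(colours, shapes))        # position 0 is the leftmost shape
    index = rng.randrange(len(scene) - 1)     # never pick the rightmost shape
    colour, shape = scene[index]
    question = (f"Which shape is immediately to the right of "
                f"the {colour} {shape}?")
    answer = scene[index + 1][1]
    return scene, question, answer


def check_response(answer, response):
    """Forgive stray whitespace and capital letters in what the user typed."""
    return response.strip().lower() == answer


rng = random.Random()          # pass a seed here for repeatable scenes
scene, question, answer = make_challenge(rng)
print(scene)
print(question)
```

The point is only that the scene and the question are generated independently of any fixed word list, so a cracker has to understand the relationship being asked about rather than just read the pixels.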

Now is this easy to do ... sadly no - it basically amounts to a mini-Turing test which is highly dependent on the ingenuity of the tester. However, I am pretty confident that at the very least a framework could be built for running the test (maybe a reverse SHRDLU? Look it up - SHRDLU was cool!) and maybe such a framework would be worth looking into.

Or maybe not. Either way - the problem it won't solve is that Captchas are annoying - it'll just make them more *usefully* annoying ... for what that's worth.

Wednesday, September 2, 2009

California Crimes Hierarchy: Pedofiles < Rapists < Network Administrator ????

See, this is my blog, and since I've already declared it mostly bunk to begin with - why not use sensationalist headlines while I'm at it! (Also I get to creatively use spelling and grammar mistakes to boot - tke that grammaR nazis).

So where does this headline come from? The $5 million bail set for a network administrator who has already spent 14 months in jail for a crime that it isn't entirely clear he committed. Now I won't pretend to know if the guy is innocent or not - I don't have all the facts - but what I do know is that the bail here is completely out of sync. Stealing a link from Slashdot (it's ok - they steal links all the time right :) ) we see the following standards set out by the California court system:

Section         Description                                                 Bail ($)
667.51 (a) * #  288 PC with one prior sex offense - for each prior listed  1,000,000
667.6 (a) * #   Prior forcible sex offense                                 1,000,000
667.61 #        Prior sex offense                                          1,000,000
667.7 #         Habitual criminal with great bodily injury                 1,000,000
667.8 * #       Kidnapping for sex offense                                   500,000
667.85 *        Kidnapping of a child under age 14                           300,000
667.9 * #       Serious felony on victim disabled, under 14 or over 65       100,000

http://www.sfgov.org/site/uploadedfiles/courts/bail_schedule.pdf

So basically - the court is saying that a network admin who won't part with his password without authorization is a greater danger to the public than a child molester by a factor of 5 to 10 or so. That's right ladies and gentlemen, in California they don't warn their kids about strange men in white vans, rather they speak softly and say:

"See that guy with the laptop in Starbucks over there - watch out for him Junior - he'll do horrible things to you with his greasy socialist linux system. If he comes near you - run to the white van driven by the creepy looking guy who follows the girls to school each day ... it'll be better for you."

Damn - I had more to rant about, but I'm out of time and need to get to work - so the basic thought for the day is: exactly how well is this freedom thing working out for us if merely being disobedient to a government official is enough for 14 months of jail time and a $5 million bail without a trial?

I, for one, do not welcome our Big Brother wannabe overlords.

Sunday, August 30, 2009

Belief, Intuition, and logic

It never ceases to amaze me that the world is staggeringly full of people who insist on mistaking their feelings and beliefs for reason. Don't get me wrong - faith, belief, and even those pesky feeling things emo kids sometimes experience have their place - but despite what people believe, these are not what differentiate society from barbarians - that would be reason.

Case in point: sitting on the bus, in my normal happy-go-lucky mood ... alright, I was tired and cranky - some kid in oversized glasses decides to start loudly proclaiming how he doesn't *believe* in climate change. Which begs the question, exactly what religion is climate change a part of? Are there priests that I missed, etc.? Facts are either true or false - they are not up for "belief" status so long as they remain within observable limits of things humans can reason about. Climate change is either happening or it isn't - and how you feel or believe has no bearing on it.

Now ignoring the question of whether this fellow was right or wrong (maybe he was right - I am not a climate scientist - I suspect he is probably wrong - and I know his argument was bunk), let's go over the basic premise of his argument, which boils down to:

There is no way anyone could possibly predict what can happen with the climate, therefore it is bunk.

Really? So all this time scientists just haven't noticed they were doing impossible things? No mathematicians ever noticed climate folks performing mathematical no-nos and predicting things in the future? Clearly all those folks who spent years studying the problem on either side (and who, last I checked, agree that the climate is changing, just not that the cause is humans - which means they ARE predicting things on both sides of the argument) are just morons, while the feelings of a 20-something goth kid - those are correct.

So obviously (well, to me) I have better things to do with my time than rant about a teenager's bad argument against climate change - so what's my point? My point is that the underlying cause of this bad argument seems to spread throughout society like a pandemic. I feel X, therefore logical statement Y (which is independent of how I feel) is true or false. It's basically the "Let your Feelings Guide You" fallacy (my name, copyright me, trademark me, etc). People have taken the concept of individualism, do what's right for me, etc. to such an extreme that they seem to have somehow come to the conclusion that their very feelings are so high and mighty they need not even be questioned - and this is bad.

Human beings are not Jedi - our feelings cannot guide us in anything other than love, friendship, and religion. Our feelings are not useful for guiding our way through our finances, and they don't tell us things about the climate, about mathematics (I still remember the person who had decided he didn't believe in Chaos Theory ... but that's another story), or about science in general. Feelings are a good starting point, but the best tool we have - and the reason we aren't throwing shit at each other from the tops of trees (instead we do it from skyscrapers now - isn't progress wonderful) - has nothing to do with feelings and everything to do with reason. So screw the adage "let your feelings guide you" - try instead to do the opposite and let reason guide you when dealing with the outside world, and see what happens.

And that is today's installment of narcissistic bullshit.

Thursday, August 27, 2009

I Haz Blog?

So I've determined that I shall take this blogging concept out for a spin. What shall this blog be about? Whatever the hell my narcissistic self decides I want to post that day ... or maybe this will be the only post - who knows ... who cares? Posts will be made whenever the Magic 8-Ball tells me it's ok.

Right now the Magic 8-Ball says I don't like you very much ... why are you reading this?

What I can tell you is 90% of what I write will be opinions - 90% of that is probably of no value and should be taken with a grain of salt roughly the size of Texas. (On a side note, anyone actually able to swallow a grain of salt the size of Texas really ought to consider changing their diet a bit.) Now I am fully aware that some of you are under some weird mistaken belief that everyone's opinion should be listened to, that everyone is of equal stature and has a right to have their opinions heard blah blah blah blah.

If you can't figure out my opinion on this idea, then my opinion is you should stop reading now. Thing is, it's a basic fact that most of our opinions are worthless ... yes, even mine - but this is my self-masturbatory piece of the internet, so I declare that mine is important enough to be put here. You want to feel special too, get your own :) Why are most opinions worthless? Well - it's basically the "fallacy of the emperor's nose", which for those of you who haven't run across it runs roughly like this:

The emperor of China can never leave his palace (and we shall ignore both the sexism of assuming that this person of power must be male and what kind of shitty ruler is never allowed out of the palace or to be seen by his (or her, or she-he? or whatever sex/gender the ruler happens to be today) people) and can never be seen.

Some hapless person is assigned the job of figuring out the length of the emperor's nose ... probably the Kleenex manufacturer they hired.

Since this person can't see the emperor, they proceed to interview everyone in China (apparently Kleenex employees have a freakin' unlimited amount of time on their hands) and ask them how big they think the emperor's nose is. Essentially, asking everyone in China their opinion on the size of the ruler's nose.

Our hero takes all this information, shoves it into the mother of all Excel sheets (or some Open Source equivalent) and averages it out.

Kleenex uses this to make Kleenex, makes the wrong size tissue because they got the size of the emperor's nose completely wrong, and our poor hero is executed by way of a tainted McDonald's cheeseburger to the stomach.

So what's wrong here? Well, that's simple - averaging the opinions of a bunch of people who have no idea what they're talking about just gives you the average wrong answer - not the right one. Before you can begin finding an opinion of any value on a subject, you need to actually look at people who have a clue about it. So roughly we can say:

The value of an opinion is directly proportional to how well versed someone is on the subject at hand.
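For the spreadsheet-inclined, here is a quick Python simulation of the story. Every number in it is invented: the "true" nose length, the shared wrong hunch that the crowd's guesses cluster around, and the handful of hypothetical courtiers who have actually seen the nose. The big assumption is that the uninformed guesses share a common bias; if their errors were unbiased and independent, averaging would do a lot better.

```python
import random
import statistics

TRUE_NOSE_CM = 5.0   # made-up ground truth, unknown to everyone surveyed


def survey(rng, count, typical_guess, spread):
    """Collect `count` guesses scattered around whatever people tend to believe."""
    return [rng.gauss(typical_guess, spread) for _ in range(count)]


rng = random.Random(2009)   # fixed seed so the run is repeatable

# Ten thousand people who have never seen the emperor, all working from
# the same wrong hunch (assumed here to be about 9 cm).
crowd = survey(rng, 10_000, typical_guess=9.0, spread=3.0)

# Ten hypothetical courtiers who have actually seen the nose.
courtiers = survey(rng, 10, typical_guess=TRUE_NOSE_CM, spread=0.5)

crowd_error = abs(statistics.mean(crowd) - TRUE_NOSE_CM)
courtier_error = abs(statistics.mean(courtiers) - TRUE_NOSE_CM)

print(f"crowd of {len(crowd)} is off by {crowd_error:.2f} cm")
print(f"{len(courtiers)} courtiers are off by {courtier_error:.2f} cm")
```

Piling on more uninformed guesses only pins down the crowd's shared hunch more precisely; it does nothing to move the average towards the actual nose.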

As a side note, we should also clarify that being well versed in one subject is non-transferable to another - to wit, asking a bunch of DnD experts their opinions on the best sexual position one can experience is very much akin to the story of the emperor's nose :)

I intend to use this blog specifically to talk about things that I have no idea about on a regular basis for my own amusement. So far - I think I am achieving this goal!

Also this blog has spelling and grammatical mistakes - you will find that I don't care.