Visual search is the task of locating target objects in a visual environment, such as searching for a friend in a crowd or for food in a supermarket. Many visual search studies have used eye movements as a means to measure the degree of attention given to stimuli.
However, a substantial body of research suggests that eye movements can move independently of attention, and are therefore not a reliable way to examine the role of attention. Much of the earlier literature on visual search instead uses reaction time to measure how long it takes to detect the target among its distractors.
==Search types
Feature search. The objects can be described by their features, and the target can be picked out quickly by checking a single feature's value, for example a red circle in a set of black circles. This kind of search is very efficient and can be done in parallel.
Conjunction search. Occurs when the target and the distractors share similarities across more than one visual property, such as size, color, orientation, and shape, so no single feature identifies the target.
Image matching. Visual search via image matching can be used in a wide variety of applications. It requires one or more search libraries containing the images to be matched, associated with metadata such as descriptions, classifications, and links to further information.
==Theory
One way visual search over image regions can be accomplished is with a mathematical technique called vector quantization. The image is partitioned into kernels, similar to tokenizing a document into words or sentences, and is then stored as indexes of these kernels.
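The kernel/codebook idea can be sketched with a minimal k-means vector quantizer (NumPy only; the patch size, codebook size, and plain k-means loop are illustrative choices, not a specific published method):

```python
import numpy as np

def build_codebook(patches, k, iters=20, seed=0):
    """Learn k codeword patches with a few rounds of k-means."""
    rng = np.random.default_rng(seed)
    codebook = patches[rng.choice(len(patches), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each patch to its nearest codeword (squared distance)
        dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # move each codeword to the mean of its assigned patches
        for j in range(k):
            members = patches[labels == j]
            if len(members):
                codebook[j] = members.mean(0)
    return codebook

def quantize(image, codebook, p=4):
    """Store an image as indexes into the codebook, one per p x p patch."""
    h, w = image.shape
    patches = (image[:h - h % p, :w - w % p]
               .reshape(h // p, p, w // p, p)
               .transpose(0, 2, 1, 3)
               .reshape(-1, p * p))
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(1)
```

Once every image in the search library is reduced to codeword indexes, matching images is a matter of comparing index sequences or histograms rather than raw pixels.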
Tuesday, December 29, 2015
Thursday, December 24, 2015
Recommendation System Notes
In general, recommendation systems can be summarized in two big classes: content-based recommendation and collaborative-filtering recommendation.
==Classification
First, Non-Personalized Systems, or community-based recommendation, which do not use any personal information. They collect the most popular items in the system regardless of users' characteristics, as in hotel and restaurant recommendation sites such as Yelp. You cannot filter the results based on your personal preferences. For instance, I am in my 20s and I like restaurants with a fancy environment, and I do not care much about how tasty the food is; the system cannot reflect that.
Second, content-based recommendation. User ratings are correlated with item features: for each item, extract its features, and from the ratings we can derive a weighting of item features for each user. For a new item, we can then predict the rating based on its features. Since not all users like to provide explicit ratings, such systems can also infer user preferences from implicit information, such as browsing, clicking, and navigation behavior, as in news-feed, music, and video recommendation systems.
Third, collaborative-filtering recommendation. The fundamental assumption is that people with similar preferences will like similar items. For instance, if both of us like horror romance movies and you saw a movie that I have not seen before, it is highly likely that I would like that movie as well. The key information is the ratings, and the problem is that rating information is quite sparse in most cases. Ways to handle this issue include filling in the missing values and selecting promising cells.
==Evaluation
First, Accuracy. Precision and recall measure whether the predicted user preferences are really what the user likes.
Second, Usefulness of recommendations, such as diversity and non-obviousness.
Third, Computational performance.
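As a sketch of the accuracy metrics above, treating a recommendation run as set retrieval (the function name and list-based interface are illustrative):

```python
def precision_recall(recommended, liked):
    """Precision: fraction of recommended items the user liked.
    Recall: fraction of liked items that were recommended."""
    recommended, liked = set(recommended), set(liked)
    hits = len(recommended & liked)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    return precision, recall
```

For example, recommending {a, b, c, d} to a user who actually likes {b, c, e} yields a precision of 2/4 and a recall of 2/3.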
==Details of Content-based Recommendation
The key is to build an attribute vector for each item. TF-IDF can be used to create the profile of a document/object; for instance, a movie can be described as a weighted vector of its tags. Three main steps:
1. computing vectors to describe items
2. building users' preference profiles
3. predicting user interest in items.
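Step 1 can be sketched with a minimal TF-IDF profiler, assuming each item is represented as a list of tokens (tags or words); the log(N/DF) weighting is the standard IDF form, and the exact variant is a choice:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists; returns one {term: weight} dict per doc."""
    n = len(docs)
    # document frequency: in how many docs each term appears
    df = Counter(t for doc in docs for t in set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this doc
        out.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return out
```

Terms that occur in every item get weight zero, while terms frequent in one item but rare overall dominate that item's profile.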
From item vectors to user profiles, there are multiple possible methods.
1. Simply add the item vectors together weighted by the user's ratings; for each feature this is the dot product of the feature column with the rating vector.
For instance, the item vectors are as follows; the last column is the user's rating vector.
| doc | baseball | economics | politics | Europe | Asia | soccer | war | security | shopping | family | user_vec |
| doc1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| doc2 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | -1 |
| doc3 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | |
| doc4 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | |
| doc5 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | |
| doc6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
| doc7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | |
| doc8 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| doc9 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | |
| doc10 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | |
| doc11 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | |
| doc12 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | |
| doc13 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | |
| doc14 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | |
| doc15 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | |
| doc16 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| doc17 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | |
| doc18 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | |
| doc19 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | -1 |
| doc20 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | |
A simple way to construct the user's preference, feature by feature, is:
User_Profile#baseball = baseball * user_vec
Based on this method, we can get the results for the other features as well:
User_Profile=(3, -2, -1, 0, 0, 2, -1, -1, 1, 0)
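The column-by-column dot product can be sketched on a small hypothetical example (three docs, three features, one unrated doc; the numbers are made up for illustration, not taken from the table above):

```python
import numpy as np

# Hypothetical item-feature matrix: rows are docs, columns are
# (baseball, politics, soccer). Ratings are +1 liked / -1 disliked,
# with NaN for unrated docs.
items = np.array([
    [1, 0, 1],   # doc1
    [0, 1, 0],   # doc2
    [1, 0, 0],   # doc3
], dtype=float)
ratings = np.array([1.0, -1.0, np.nan])

rated = ~np.isnan(ratings)
# Each profile entry is the dot product of one feature column
# with the ratings of the rated docs.
profile = items[rated].T @ ratings[rated]
```

Here the profile comes out to (1, -1, 1): the user's liked doc contributes positively to baseball and soccer, and the disliked doc pushes politics negative.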
2. Normalize the vectors, so that an item with more features does not simply become more important.
3. Weight the features, e.g., by 1/DF or log(1/DF), based on how often each feature occurs across items.
The most challenging part of content-based recommendation is figuring out the right weights and factors.
==Details of Collaborative-filtering Recommendation
The fundamental assumption is that our past agreement predicts our future agreement. The major steps of a CF algorithm:
1. Selecting neighbors, i.e., finding similar users. Normally you can select the top 25-100 neighbors; the more neighbors, the higher the system's coverage. A common similarity measure is the Pearson correlation; since it already mean-centers each user's ratings, a separate normalization step can be skipped for this method.
2. Scoring the items from neighbors: predict an item's score based on similar users' ratings. Methods include the plain average, the weighted average, and multiple linear regression; the weighted average is most common.
3. Normalizing the data. Users rate items differently: some give higher numbers, some lower. Normalizing the data (e.g., subtracting each user's mean rating) reduces this difference.
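The neighbor-selection and scoring steps can be sketched as follows (a minimal sketch assuming a user-by-item rating matrix with NaN for missing ratings; the mean-centering normalization of step 3 is omitted for brevity):

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    both = ~np.isnan(u) & ~np.isnan(v)
    if both.sum() < 2:
        return 0.0
    a, b = u[both] - u[both].mean(), v[both] - v[both].mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def predict(ratings, user, item, k=2):
    """Weighted-average prediction of ratings[user, item] from the
    k most similar neighbors who have rated the item."""
    sims = [(pearson(ratings[user], ratings[v]), v)
            for v in range(len(ratings))
            if v != user and not np.isnan(ratings[v, item])]
    sims = sorted(sims, reverse=True)[:k]  # top-k neighbors
    num = sum(s * ratings[v, item] for s, v in sims)
    den = sum(abs(s) for s, v in sims)
    return num / den if den else np.nan
```

With real data the neighbor count k would sit in the 25-100 range mentioned above; a tiny k is used here only to keep the sketch readable.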