Tuesday, January 13, 2009
Power for laptops, transmitted wirelessly
The technology of the future will rely on elements of basic physics, such as an electrical coil that can transmit energy over a distance using resonant induction.
Intel representatives gave a demonstration in which they managed to light a 60 W bulb from a distance.
“The problem with wireless electricity is not whether we can do it, but whether we can make it a safe and efficient way to deliver power,” said Intel researcher Josh Smith.
Intel researchers say that wireless power transmission could come into use around 2050, since the project is only in its early stages.
Send free SMS with Google Chat
To enable the option, go to Settings - Labs and, next to Text Messaging (SMS) in Chat, select Enable. If you hover the mouse over a contact, the Send SMS option becomes available in the More menu.
To send an SMS, users only need to type the phone number into the chat “box” and press the “Send SMS” button. Recipients can answer from their mobile phone using the “Reply” option, and the reply will arrive in the chat window from which the original message was sent.
PS: Available only in the US. For now.
Why do programs like Skype manage to get through firewalls?
A growing number of computers are protected by firewalls. Ideally, the firewall function is performed by a router that also does NAT (Network Address Translation). This means that an attacker cannot access the computer directly from outside: connections have to be initiated from inside.
This is, of course, a problem when two computers behind firewalls try to "talk" directly to each other, for example when their users want to call each other using Voice over IP (VoIP). The problem is clear: whichever side calls the other, the recipient's firewall will treat the call as an apparent attack and will not forward the packets. The call does not go through. Or at least that is how a network administrator sees it.
Yet anyone who has used programs like Skype knows that they work just as well behind a firewall as on a direct Internet connection. The reason is that the inventors of Skype (and of similar applications) found a workaround.
Normally, every firewall has to let some packets into the network (the user wants to visit websites, read mail, etc.), so it must forward the relevant packets from outside. It does so when it considers a packet to be a reply to a packet previously sent from inside that network. A NAT router keeps a table of which internal computer communicated with which external computer and which ports were used.
What VoIP programs do is convince the firewall that a connection has already been established and that the incoming packets should be accepted. The fact that audio data is sent over UDP works in Skype's favor: unlike TCP, which carries additional connection-state information in every packet, with UDP the firewall sees only the source and destination addresses and ports. If, for a UDP packet, these match an entry in the NAT table, the packet is let through.
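The matching rule described above can be sketched as a toy model. This is a minimal sketch, not how any particular router or Skype works: it assumes the table is a set of (internal endpoint, external endpoint) pairs and ignores address rewriting and entry timeouts. The class name UdpNatTable and the example addresses are hypothetical.

```python
from dataclasses import dataclass, field

Endpoint = tuple[str, int]  # (IP address, UDP port)


@dataclass
class UdpNatTable:
    """Toy UDP state table of a NAT router/firewall.

    Simplifying assumptions: addresses are not rewritten and entries never expire.
    """
    entries: set = field(default_factory=set)  # {(internal endpoint, external endpoint)}

    def outbound(self, src: Endpoint, dst: Endpoint) -> None:
        # Every packet leaving the network records who talked to whom, on which ports.
        self.entries.add((src, dst))

    def allow_inbound(self, src: Endpoint, dst: Endpoint) -> bool:
        # An incoming packet passes only if it looks like a reply: it must come from
        # an external endpoint we sent to, and be addressed to the internal sender.
        return (dst, src) in self.entries


table = UdpNatTable()
table.outbound(("192.168.1.10", 5000), ("203.0.113.7", 3478))
print(table.allow_inbound(("203.0.113.7", 3478), ("192.168.1.10", 5000)))  # True
print(table.allow_inbound(("203.0.113.7", 9999), ("192.168.1.10", 5000)))  # False
```

Only the addresses and ports are compared, which is exactly why a UDP packet that merely claims the right endpoints is enough to get through.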
Say, for example, that Diana wants to call Alin. Her Skype client tells the Skype server what she wants to do. The Skype server already has some information about Diana: it sees that she has the IP address 1.1.1.1, and a quick test shows that her audio data always comes from UDP port 1414. The server passes this information to Alin's Skype client; according to the server's database, Alin has the IP address 2.2.2.2 and uses UDP port 2828.
Alin's Skype client sends a UDP packet to 1.1.1.1, port 1414. Diana's firewall rejects it, but Alin's firewall does not know that. From now on, Alin's firewall believes that anything arriving from 1.1.1.1:1414 and addressed to 2.2.2.2:2828 is a valid packet: a reply to the earlier request.
Now the Skype server sends Alin's coordinates to Diana, whose Skype client tries to reach Alin at 2.2.2.2:2828. Alin's firewall sees the address, recognizes it, and forwards the packet to Alin's computer, and the call begins.
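The exchange between Diana and Alin is the technique usually called UDP hole punching. The simulation below is a simplified sketch under the same assumptions as the story (each firewall remembers only the remote endpoints its client has sent to, with no timeouts and no port rewriting); the Firewall class and the punch_hole function are hypothetical names, not part of Skype.

```python
Endpoint = tuple[str, int]  # (IP address, UDP port)


class Firewall:
    """Stateful UDP firewall in front of one client (hypothetical toy model)."""

    def __init__(self, owner: Endpoint):
        self.owner = owner          # the protected client, e.g. ("2.2.2.2", 2828)
        self.expected: set = set()  # remote endpoints this client has sent to

    def send(self, dst: Endpoint) -> None:
        # Outgoing packets always pass and make replies from dst acceptable.
        self.expected.add(dst)

    def receive(self, src: Endpoint) -> bool:
        # Incoming packets pass only if they look like a reply to an earlier send.
        return src in self.expected


def punch_hole(fw_diana: Firewall, fw_alin: Firewall) -> tuple[bool, bool]:
    diana, alin = fw_diana.owner, fw_alin.owner
    # 1. The server gives Alin Diana's endpoint; Alin sends a UDP packet there.
    fw_alin.send(diana)
    first = fw_diana.receive(alin)   # dropped: Diana never sent anything to Alin
    # 2. The server gives Diana Alin's endpoint; Diana sends a packet to Alin.
    fw_diana.send(alin)
    second = fw_alin.receive(diana)  # accepted: Alin's firewall now expects Diana
    return first, second


print(punch_hole(Firewall(("1.1.1.1", 1414)), Firewall(("2.2.2.2", 2828))))  # (False, True)
```

The first packet is lost, but it is the one that opens the hole; only the second packet, travelling in the opposite direction, is delivered.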
;)
Opinion Spam and Analysis
To study opinion spam in the context of product reviews, which are opinion-rich and widely used by consumers and product manufacturers.
To the best of our knowledge, there is still no published study on opinion spam, although Web spam and email spam have been investigated extensively. We will see that opinion spam is quite different from Web spam and email spam, and thus requires different detection techniques.
In particular, we investigate opinion spam in reviews. Reviews contain rich user opinions on products and services. They are used by potential customers to find opinions of existing users before deciding to purchase a product. They are also used by product manufacturers to identify product problems and/or to find marketing intelligence information about their competitors.
Because there is no quality control, anyone can write anything on the Web. This results in many low-quality reviews and, worse still, review spam. Review spam is similar to Web page spam. In the context of Web search, Web page spam is widespread because of the economic and/or publicity value of the rank position of a page returned by a search engine. Web page spam refers to the use of “illegitimate means” to boost the rank positions of some target pages in search engines.
If the reviews of a product are mostly negative, a potential buyer is very likely to choose another product. Conversely, positive opinions can result in significant financial gains and/or fame for organizations and individuals. This gives strong incentives for review/opinion spam.
There are three types of spam reviews:
• Type 1, untruthful opinions: reviews that mislead readers, e.g. by giving false positive reviews to promote a product.
• Type 2, reviews on brands only: reviews that comment on the brand rather than on the product itself.
• Type 3, non-reviews: these have two main sub-types, (a) advertisements and (b) other irrelevant texts containing no opinions (e.g., questions, answers, and random texts).
Of the three types, we can manually label training examples only for type 2 and type 3 spam, because these are recognizable from the content of a review. However, recognizing whether a review is an untruthful opinion spam (type 1) by manually reading it is extremely difficult, because one can carefully craft a spam review that looks just like any other innocent review. We tried reading a large number of reviews and were unable to reliably identify type 1 spam manually. Thus, other ways have to be explored to find training examples for detecting possible type 1 spam reviews.
Amazon uses a 5-point rating scale, with 1 being the worst and 5 the best. A majority of reviews have very high ratings. Roughly 45% of products and 59% of members have an average rating of 5, which means that every review for these products, and by these members, is rated 5. On average, a review gets 7 feedback votes. The percentage of positive feedback on a review decreases rapidly from the first review of a product to the last: it falls from 80% for the 1st review to 70% for the 10th. This shows that the first few reviews can be very influential in deciding the sale of a product.
Duplicate and near-duplicate (not exact copy) reviews can be detected using the shingle method. In this work, we use 2-gram based comparison of review content. The similarity score of two reviews is the size of the intersection of their 2-gram sets divided by the size of the union of those sets, which is usually called the Jaccard similarity (Jaccard coefficient). Review pairs with a similarity score of at least 90% were chosen as duplicates.
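The shingle comparison can be sketched in a few lines. This assumes the 2-grams are word bigrams over lowercased tokens (the text does not say whether words or characters are used) and compares all pairs, which is quadratic; a large collection would need something like MinHash to avoid that. The function names are hypothetical.

```python
import re
from itertools import combinations


def word_bigrams(text: str) -> set:
    """Set of word 2-grams (shingles) of a review; assumes lowercased word tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return set(zip(tokens, tokens[1:]))


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; reviews with no shingles are never treated as duplicates."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(reviews: list, threshold: float = 0.9) -> list:
    """All review pairs (i, j, score) whose 2-gram Jaccard similarity is >= threshold."""
    shingles = [word_bigrams(r) for r in reviews]
    pairs = []
    for i, j in combinations(range(len(reviews)), 2):
        score = jaccard(shingles[i], shingles[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


reviews = [
    "This camera takes great pictures and the battery lasts all day long",
    "This camera takes great pictures and the battery lasts all day long indeed",
    "Terrible product do not buy",
]
print(near_duplicates(reviews))  # one pair, (0, 1), with score 11/12 ≈ 0.917
```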
For model building, we used logistic regression. The reason for using logistic regression is that it produces a probability estimate of each review being a spam, which is desirable. In practice, the probabilistic output of logistic regression can be used in many ways in applications.
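To illustrate the probability estimate, here is a minimal logistic regression trained by batch gradient descent on the average log-loss, in plain Python. The single feature and the tiny dataset are hypothetical (the actual feature set is not described here), and a real system would normally use a library implementation.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # dLoss/dz
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def spam_probability(w, b, x) -> float:
    """P(spam | x) under the fitted model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical data: one feature scaled to [0, 1]; label 1 = spam, 0 = not spam.
X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(spam_probability(w, b, [1.0]), spam_probability(w, b, [0.0]))
```

Because the output is a probability rather than a hard label, reviews can be ranked by how likely they are to be spam, or filtered with an application-specific threshold.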
Type 2 and type 3 spam reviews are detected by supervised learning with manually labeled training examples, and the results showed that the logistic regression model is highly effective for them. For type 1 opinion spam the story is quite different, because it is very hard to manually label training examples for it; detection of such spam therefore starts by detecting duplicate reviews.
Flickr Tag Recommendation based on Collective Knowledge
We analyze a representative snapshot of Flickr and present the results through a tag characterization focusing on how users tag photos and what information is contained in the tagging. Based on this analysis, we present and evaluate tag recommendation strategies that support the user in the photo annotation task by recommending a set of tags that can be added to the photo.
Recent user studies on this topic reveal that users do annotate their photos with the motivation of making them more accessible to the general public. Photo annotations provided by the user reflect the personal perspective and context that are important to the photo owner and her audience. This implies that if the same photo were annotated by another user, a different description might be produced. In Flickr, one can find many photos on the same subject from many different users, which are consequently described by a wide variety of tags.
The contribution of this paper is twofold. First, we analyse "how users tag photos" and "what kind of tags they provide", based on a representative snapshot of Flickr consisting of 52 million publicly available photos. Second, we present four different tag recommendation strategies that support the user when annotating photos by tapping into the collective knowledge of the Flickr community as a whole.
When developing tag recommendation strategies, it is important to analyze why, how, and what users are tagging. The focus in this section is on how users tag their photos. The tag frequency distribution follows a power law. With respect to the tag recommendation task, the head of the power law contains tags that would be too generic to be useful as tag suggestions. For example, the five most frequently occurring tags are 2006, 2005, wedding, party, and 2004. The very tail of the power law contains infrequent tags that can typically be categorized as incidentally occurring words, such as misspellings and complex phrases: for example, ambrose tompkins, ambient vector, and more than 15.7 million other tags that occur only once in this Flickr snapshot.
In this section we refer to three different types of tags:
• User defined tags U refers to the set of tags that the user assigned to a photo.
• Candidate tags $C_u$ is the ranked list of the top m most co-occurring tags for a user-defined tag $u \in U$. We use C to denote the union of the candidate tags of all user-defined tags $u \in U$.
• Recommended tags R is the ranked list of n most relevant tags produced by the tag recommendation system.
For a given set of candidate tags (C), a tag aggregation step is needed to produce the final list of recommended tags (R) whenever there is more than one user-defined tag. In this section, we define two aggregation strategies. One strategy is based on voting and does not take the co-occurrence values of the candidate tags into account, while the summing strategy uses the co-occurrence values to produce the final ranking. In both cases, we apply the strategy to the top m co-occurring tags in the list.
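The two aggregation strategies can be sketched as follows. Assumptions not stated in the text: the co-occurrence value is the asymmetric normalisation P(c | u) = |photos tagged u and c| / |photos tagged u|, ties are broken alphabetically, and tags the user already supplied are not re-suggested. The photo collection and function names are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence(photos: list) -> dict:
    """Asymmetric co-occurrence P(c | u) computed from a list of tag sets."""
    tag_count, pair_count = Counter(), Counter()
    for tags in photos:
        tag_count.update(tags)
        for u in tags:
            for c in tags:
                if u != c:
                    pair_count[(u, c)] += 1
    table = defaultdict(dict)
    for (u, c), k in pair_count.items():
        table[u][c] = k / tag_count[u]
    return table


def candidates(table, u, m):
    """C_u: the top-m tags co-occurring with user tag u, with their values."""
    return sorted(table.get(u, {}).items(), key=lambda kv: (-kv[1], kv[0]))[:m]


def recommend(table, user_tags, m=10, n=5, strategy="vote"):
    """Aggregate the candidate lists of all user tags into R (top n)."""
    scores = Counter()
    for u in user_tags:
        for c, p in candidates(table, u, m):
            if c in user_tags:
                continue  # assumption: never re-suggest a tag the user already gave
            # voting: one vote per user tag; summing: add the co-occurrence value
            scores[c] += 1 if strategy == "vote" else p
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [c for c, _ in ranked[:n]]


photos = [
    {"beach", "sea", "sunset"},
    {"beach", "sea", "sand"},
    {"sunset", "sky", "sea"},
    {"sunset", "sky", "clouds"},
    {"beach", "sand"},
]
table = cooccurrence(photos)
print(recommend(table, {"beach", "sunset"}, n=3, strategy="vote"))  # ['sea', 'clouds', 'sand']
print(recommend(table, {"beach", "sunset"}, n=3, strategy="sum"))   # ['sea', 'sand', 'sky']
```

Both strategies put "sea" first because it is a candidate for both user tags; they differ on the rest, since voting counts "clouds" as much as "sky", while summing rewards the stronger co-occurrence of "sand" and "sky".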
The assessors were asked to judge descriptiveness on a four-point scale: very good, good, not good, and don't know. The distinction between very good and good was made to make the assessment task conceptually easier for the assessor. For the evaluation of the results, however, we use a binary judgement and map both very good and good to good. In some cases, we expected that the assessor would not be able to make a good judgement, simply because there is not enough contextual information or because the assessor's expertise is not sufficient to make a motivated choice. For this purpose, we added the option don't know. The assessment pool contains 972 very good judgements and 984 good judgements. In 2811 cases the judgement was not good, and in 289 cases it was undecided (don't know).
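The binary mapping amounts to simple counting on the reported numbers. A small sketch; treating the don't know judgements as excluded from the binary evaluation is an assumption, since the text does not say how they are handled.

```python
judgements = {"very good": 972, "good": 984, "not good": 2811, "don't know": 289}

total = sum(judgements.values())
# Binary mapping: "very good" and "good" both count as good.
good = judgements["very good"] + judgements["good"]
# Assumption: "don't know" judgements are left out of the binary evaluation.
decided = total - judgements["don't know"]
print(total, good, decided, round(good / decided, 3))  # 5056 1956 4767 0.41
```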
The results of the empirical evaluation show that we can effectively recommend relevant tags for a variety of photos with different levels of exhaustiveness of the original tagging. We found that the tag frequency distribution follows a perfect power law, and we showed that the middle section of this power law contains the most interesting candidates for tag recommendation.
Discovering Key Concepts in Verbose Queries
To develop and evaluate a technique that uses query-dependent, corpus-dependent, and corpus-independent features for automatic extraction of key concepts from verbose queries.
Current search engines do not, in general, perform well with longer, more verbose queries. One of the main issues in processing these queries is identifying the key concepts that will have the most impact on effectiveness. We show that our method achieves higher accuracy in the identification of key concepts than standard weighting methods such as inverse document frequency.
In this paper, we describe an extension of automatic concept extraction methods to the task of extracting key concepts from verbose natural language queries. TREC topics illustrate the difference between a keyword query and a description query. A TREC topic consists of several parts, each of which corresponds to a certain aspect of the topic. In the example in Figure 1, we consider the title (denoted
It is better to use "Steve Jobs" than "Professor" as the key concept with respect to the statement "What Did the Professor Say? Check Your iPod". When examining the top ten documents retrieved in response to the keyword query "steve jobs" + iPod, we note that most of them discuss Steve Jobs in some relation to the iPod (e.g., a link to a video documenting an iPod introduction by Steve Jobs, which did not appear on the first page of results for the more verbose query).
Noun phrases have proven to be reliable for key concept discovery in some past work on information retrieval and natural language processing, and are flexible enough to naturally distinguish between words, collocations, entities and personal names among others.
We note that the collections vary by type (ROBUST04 is a newswire collection, while WT10g and GOV2 are web collections), by number of documents, and by number of available topics, thus providing a diverse experimental setup for assessing the robustness of our classification, weighting and retrieval methods.
The AdaBoost.M1 method was selected for several reasons. First, it consistently outperformed other classification methods, such as the C4.5 decision tree, in the preliminary experiments we conducted.
Second, its output for a single input instance $x_i$ can be interpreted not only as a binary classification decision ($c_i \in KC$, a key concept, or $c_i \in NKC$, a non-key concept), but also as a weighted combination of base hypotheses, $\sum_{j=1}^{T} w_j h_j(x_i)$ [13]. This combination naturally translates into a confidence function $h_k(c_i)$, which can be used to rank the candidate concepts. In two out of three collections, performance further increases when the second-highest ranked concept is also integrated into the base query.
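A minimal sketch of how the weighted combination becomes a ranking confidence. It assumes binary labels +1 (KC) and -1 (NKC), decision stumps as the base hypotheses $h_j$, and the AdaBoost.M1 weights $w_j = \log(1/\beta_j)$ with $\beta_j = \epsilon_j / (1 - \epsilon_j)$. The features, data, and function names are hypothetical; the paper's actual base learner and feature set are not given here.

```python
import math


def stump_predict(x, f, thr, sign):
    # Base hypothesis h(x) in {-1, +1}: a threshold on a single feature.
    return sign if x[f] >= thr else -sign


def train_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(wi for wi, x, yi in zip(w, X, y) if stump_predict(x, f, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def adaboost_m1(X, y, rounds=10):
    """Binary AdaBoost.M1; returns a list of (w_j, feature, threshold, sign)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, thr, sign = train_stump(X, y, w)
        if err >= 0.5:
            break  # stop when the base learner is no better than chance
        err = max(err, 1e-10)
        ensemble.append((math.log((1 - err) / err), f, thr, sign))  # w_j = log(1/beta_j)
        if err <= 1e-10:
            break  # a perfect stump leaves nothing to boost
        beta = err / (1 - err)
        # Shrink the weight of correctly classified instances by beta, then renormalise.
        w = [wi * (beta if stump_predict(x, f, thr, sign) == yi else 1.0)
             for wi, x, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble


def confidence(ensemble, x):
    """sum_j w_j h_j(x): positive means KC, larger means more confident."""
    return sum(wj * stump_predict(x, f, thr, sign) for wj, f, thr, sign in ensemble)


# Hypothetical candidate concepts with two made-up features each.
X = [[5.1, 0.9], [4.8, 0.7], [1.2, 0.8], [0.9, 0.1], [4.5, 0.2], [1.0, 0.3]]
y = [1, 1, -1, -1, 1, -1]
model = adaboost_m1(X, y)
ranking = sorted(range(len(X)), key=lambda i: -confidence(model, X[i]))
print(ranking)
```

Ranking the candidates by this confidence, rather than only by the sign, is what makes it possible to add the top concept, or the top two, to the base query.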
In this paper we address the issue of retrieval using verbose queries. We use several standard TREC collections and corresponding topics to demonstrate that the current retrieval methods perform better, on average, with keyword title queries than with their longer description counterparts.
Advertising Keyword Suggestion Based on Concept Hierarchy
To propose a novel keyword suggestion method that fully exploits the semantic knowledge contained in a concept hierarchy.
In this article, the authors first match a keyword with some relevant concepts. The relevant concepts, together with their hierarchy, are then used to enrich the meaning of the keyword. Finally, new keywords are suggested according to the concept information rather than the statistical co-occurrence of the keyword itself.
Advertisers create ads and bid on keywords that are related to their business. The ads are then displayed on the search result pages when users search for the corresponding keywords. Since the total number of queries is extremely large, only a small fraction of them are bid on by advertisers. Existing keyword suggestion methods have gained prevalence based on the assumption that statistical co-occurrence indicates keyword relevance. However, high co-occurrence between two keywords does not necessarily mean that they are semantically related: "Apple", for example, can refer both to a fruit and to a computer company.
To overcome this problem, keyword suggestion technology is employed to help advertisers find more appropriate keywords. It involves discovering new words or phrases related to the existing keywords. We try to suggest new keywords according to advertisers' real objectives rather than the query keywords.
To ensure the accuracy of the concept hierarchy, we derive it from a high-quality, manually defined web directory such as the Open Directory Project (ODP). This concept hierarchy involves the definition of concept relationships (i.e., the taxonomy structure of the hierarchy) and concept content (i.e., the meaning of each concept). The categories in the web directory can be treated as concepts, so the relationships between categories can be processed algorithmically to obtain the concept relationships.
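Deriving the concept relationships from the directory can be sketched as follows, assuming each category is given as a slash-separated path such as Top/Arts/Music, the style used for ODP category names. The function names are hypothetical, and this only recovers the taxonomy structure, not the concept content.

```python
def build_hierarchy(category_paths: list) -> dict:
    """Map every concept (category path) to its parent, e.g. 'Top/Arts/Music' -> 'Top/Arts'."""
    parent = {}
    for path in category_paths:
        parts = path.split("/")
        for i in range(1, len(parts)):
            parent["/".join(parts[: i + 1])] = "/".join(parts[:i])
    return parent


def ancestors(parent: dict, concept: str) -> list:
    """The concept followed by its ancestors, up to the root."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


parent = build_hierarchy(["Top/Arts/Music/Jazz", "Top/Arts/Movies"])
print(ancestors(parent, "Top/Arts/Music/Jazz"))  # ['Top/Arts/Music/Jazz', 'Top/Arts/Music', 'Top/Arts', 'Top']
```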
To ensure the accuracy of describing the meaning of a keyword, we develop a probabilistic framework to rank the similarity of different concepts, which makes it possible to find the most relevant and representative ones.
Furthermore, with the concepts associated with the phrases, we can estimate the similarity between phrases via their related concepts. Keyword suggestion can then be performed based on this similarity.
· The Jaccard coefficient is defined as the size of the intersection divided by the size of the union of the sample sets.
· One of the most relevant applications of a concept hierarchy is to categorize different word senses, which is also an important contribution of our work. A similar approach has been taken in Word Sense Disambiguation to address the data sparseness problem.
· Another relevant application is to compute the similarity of element sets in a hierarchy. Ganesan et al. introduced a generalized vector space model to calculate the cosine similarity between two document sets with smoothing along the hierarchical domain structure.
· When associating concepts with phrases, assigning weights to the phrases is also a recursive procedure, which can be performed along with the construction of the hierarchy.
· Once the concepts have been obtained, we can perform the clustering in an agglomerative way: the fusion of clusters starts from the lowest concepts in the hierarchy, and similar clusters are then merged and represented by their LCS (least common subsumer). This continues until the remaining clusters are sufficiently distinct.
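The bottom-up clustering can be sketched on slash-separated category paths. Since the text does not define them, the following are assumptions: the LCS of two concepts is their longest common path prefix, the most similar pair of clusters is the one whose LCS lies deepest in the hierarchy, and clusters count as "sufficiently distinct" once no pair shares an LCS at depth min_depth or deeper. The names lcs and agglomerate and the category paths are hypothetical.

```python
def lcs(a: str, b: str) -> str:
    """Least common subsumer of two category paths: their longest common prefix."""
    common = []
    for x, y in zip(a.split("/"), b.split("/")):
        if x != y:
            break
        common.append(x)
    return "/".join(common)


def depth(concept: str) -> int:
    # "Top" has depth 0, "Top/Arts" depth 1, and so on.
    return concept.count("/")


def agglomerate(concepts: list, min_depth: int = 2) -> dict:
    """Merge clusters bottom-up; returns {representative LCS: member concepts}."""
    clusters = {c: [c] for c in concepts}
    while len(clusters) > 1:
        # Find the pair of clusters whose LCS is deepest (the most similar pair).
        best = None
        reps = sorted(clusters)
        for i in range(len(reps)):
            for j in range(i + 1, len(reps)):
                s = lcs(reps[i], reps[j])
                if best is None or depth(s) > depth(best[0]):
                    best = (s, reps[i], reps[j])
        subsumer, a, b = best
        if depth(subsumer) < min_depth:
            break  # remaining clusters are sufficiently distinct
        merged = clusters.pop(a) + clusters.pop(b)
        clusters.setdefault(subsumer, []).extend(merged)
    return clusters


concepts = [
    "Top/Computers/Hardware/Apple",
    "Top/Computers/Software/MacOS",
    "Top/Shopping/Food/Fruit/Apples",
    "Top/Shopping/Food/Fruit/Pears",
]
print(agglomerate(concepts, min_depth=1))
```

With min_depth=1 the fruit concepts merge first (their LCS is deepest), then the two computer concepts merge under Top/Computers, and the process stops because the two remaining clusters only share the root.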
We conducted experiments to evaluate the effectiveness of the proposed keyword suggestion method and the accuracy of the categorization of relevant concepts. The experimental results show that concept-hierarchy-based keyword suggestion can perform better than co-occurrence-based methods. They also show that the categorized keyword suggestion results are acceptable. We investigated a method to derive a concept hierarchy from a web directory. Three weighting criteria were proposed to tightly associate concepts with phrases. By utilizing the domain knowledge and relationships contained in the concept hierarchy, our approach provides new phrases that are categorized into distinct groups. Experiments show that our system offers a wider and more accurate view than the traditional techniques.