
Behind Big-Data Price Discrimination Lies Our Exposed Privacy

Gelonghui ·  Mar 18, 2019 21:31

Something infuriating happened to me recently.

When I travel on business I usually stay with the same hotel brand, which I'll call Hotel A, because I find its prices reasonable and its quality fairly consistent. Over time, wherever I went, I booked a nearby Hotel A through an app on my phone. Since the prices shown in the app did not fluctuate much, I never paid them much attention.

The last time I booked a room on my phone, however, I noticed an electronic display in the hotel lobby showing real-time room rates (many hotels, you may have noticed, don't have one). The screen listed the same standard room at a much lower price than I had paid. At first I wondered whether the difference came from when I had placed the order, but after comparing repeatedly I found that the price the app quoted me was higher, while the price shown in the same app on the front-desk staff's phone was much lower.

When I asked around among my friends, I found that charging regular customers more is not uncommon. People who often take taxis through ride-hailing apps tend to be charged more than those who rarely do; after you search repeatedly for a certain type of product, its price gradually rises and your choices shrink; some shopping websites even collect data on which users rarely leave negative reviews, so that merchants can use it to send those users lower-quality goods.

There is now a term for this phenomenon: "big-data price discrimination" (literally, big data "slaughtering the familiar", that is, overcharging loyal customers).

Behind the discrimination, information leaks are increasingly common

Big-data price discrimination is only the tip of the iceberg of an increasingly serious problem: the leakage of personal information.

By collecting and analyzing key information about consumers, such as their social attributes, living habits and consumption behavior, a business can abstract a complete commercial picture of each user. This is called a user portrait, or user profile. User profiles give companies a rich information base that helps them quickly identify precise target groups, user needs and other broad feedback.

On Zhihu, the question "How terrifying can information leaks be?" has drawn more than 5,000 answers and over 60 million views.


If you have ever taken out a consumer loan through Alipay or WeChat, there is a good chance you now get text messages or calls from all sorts of small-loan companies; if you recently opened a stock trading account, you are probably receiving a flood of text messages; and even if you have forgotten when you signed up for an online English-learning account, you are still frequently harassed by sales calls of every kind.

Who on earth leaked our information?

Let me use the installation of the Momo app to illustrate what information an app obtains from us and how it does so.

After searching for Momo in the app store, the app permissions section of its details page shows that the app obtains 14 sensitive privacy-related permissions.


Of course, some will say you can simply decline these permissions during installation. In my actual installation, however, the first pop-up page was the one below, and it had no option to decline. After I tapped the one-tap enable button, prompts asking whether to accept each of three permissions appeared one by one. If you refuse, Momo will not start, so these three must be accepted. For a social app, that is admittedly reasonable.


Next comes the phone-number login page. You can also log in with WeChat or QQ, but you will still have to bind your phone number. The most important part here is the inconspicuous small print at the bottom: registering means agreeing to the Momo User Agreement and the Momo Privacy Policy.


Opening each in turn, the Momo User Agreement runs to 10,791 characters and the Momo Privacy Policy to 8,674. Here are some excerpts from each:


At the end, Momo reserves the right of final interpretation.


Momo also sets out in detail how it collects and uses user information:


For example:


Beyond these relatively easy-to-understand terms, the agreements contain many provisions that ordinary people find hard to follow. When signing up for these accounts, people pay little attention to the privacy agreements and install apps in seconds. In doing so, they tacitly consent to the collection and use of a great deal of their information.

Today the vast majority of apps carry such implicit agreements: search, shopping, social, map and music apps alike. We accept algorithms in exchange for a better experience and pay for it with part of our privacy. Some of that privacy we give up willingly; some we give up under pressure, or without even realizing it.

Besides collecting information through a single app, there is also device-based collection. Some third-party data service companies embed their software development kits (SDKs), offered as developer services, in all kinds of apps, and through those apps obtain large volumes of anonymous device behavior data related to the services they provide.

By contrast, an ordinary app collects user information only within its own domain. A car app, for example, typically collects only car-related information, and a cosmetics app only cosmetics-related information. Outside the app's target domain, the rest of the user's information remains vague.

Device-based collection, however, combined with artificial intelligence, machine learning and other algorithmic processing, can roughly outline the characteristics and behavior tags of a device's owner and build a detailed, multidimensional user profile. For example, a user whose phone has many makeup apps, a menstrual-tracking app and property apps such as Lianjia and Anjuke installed, and who often opens cross-border shopping (haitao) apps, is quite likely a high-spending woman who is looking to buy a home. The resulting profile is remarkably detailed.
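To make the idea concrete, here is a minimal Python sketch of device-based profiling. It assumes SDK reports arrive as hypothetical (device_id, app_category) pairs gathered across many apps, and it uses a hand-written, hypothetical lookup table from app categories to tags in place of the machine-learning models the article mentions; the source does not describe how real data companies do this.

```python
from collections import defaultdict

# Hypothetical mapping from an installed-app category to the profile tag it hints at.
# A real system would learn such associations; this table is purely illustrative.
CATEGORY_TAGS = {
    "makeup": "likely female",
    "period_tracker": "likely female",
    "real_estate": "house hunting",
    "cross_border_shopping": "high spending",
}

def build_device_profiles(reports):
    """Merge per-app SDK reports by device ID, then map app categories to tags."""
    apps_by_device = defaultdict(set)
    for device_id, category in reports:
        apps_by_device[device_id].add(category)
    profiles = {}
    for device_id, categories in apps_by_device.items():
        # Categories with no known tag (e.g. "car") contribute nothing here.
        tags = {CATEGORY_TAGS[c] for c in categories if c in CATEGORY_TAGS}
        profiles[device_id] = sorted(tags)
    return profiles

# Hypothetical reports: each line would come from a different app carrying the same SDK.
reports = [
    ("dev-1", "makeup"),
    ("dev-1", "period_tracker"),
    ("dev-1", "real_estate"),
    ("dev-1", "cross_border_shopping"),
    ("dev-2", "car"),
]
print(build_device_profiles(reports))
```

The point of the sketch is the join on device ID: no single app sees all four categories, but the SDK vendor does.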


How a user profile is formed

Everything described so far is only data collection, the raw material. Extracting the real gold and silver from massive data and turning it into feature data with commercial value tests a data company's big-data processing and analysis capabilities, and the accuracy that different technologies achieve varies widely.

Building a user profile consists of three main stages: basic data collection, behavior modeling and profile construction. Differences in data processing show up mainly in the behavior-modeling stage.
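The three stages can be sketched as a small pipeline. Everything here is an illustrative assumption: the (user, action, category) event format, the action weights and the tagging threshold are invented for the example and are not taken from any real company's system.

```python
from collections import Counter

def collect(raw_events):
    """Stage 1, basic data collection: keep only well-formed (user, action, category) events."""
    return [e for e in raw_events if len(e) == 3 and all(e)]

# Hypothetical weights: how strongly each action signals interest.
ACTION_WEIGHTS = {"view": 1, "search": 2, "purchase": 5}

def model_behavior(events):
    """Stage 2, behavior modeling: weighted interest score per (user, category)."""
    scores = Counter()
    for user, action, category in events:
        scores[(user, category)] += ACTION_WEIGHTS.get(action, 0)
    return scores

def build_profiles(scores, threshold=5):
    """Stage 3, profile construction: tag each user with categories scoring >= threshold."""
    profiles = {}
    for (user, category), score in scores.items():
        tags = profiles.setdefault(user, [])
        if score >= threshold:
            tags.append(category)
    return {user: sorted(tags) for user, tags in profiles.items()}

raw = [
    ("u1", "search", "hotel"),
    ("u1", "search", "hotel"),
    ("u1", "view", "hotel"),
    ("u1", "view", "cosmetics"),
    ("u2", "purchase", "car"),
    ("u2", "", "car"),  # malformed (empty action): dropped in stage 1
]
print(build_profiles(model_behavior(collect(raw))))
```

Two companies with the same raw events can end up with very different profiles depending on stage 2, which is where the article locates the gap in accuracy.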

In behavior modeling, we abstract typical features that can represent the real object, such as height, weight, skin color and eye size. Then, using machine learning, we build an algorithm of the form Y = kX + b, where X represents the known information and Y the user profile, and make Y more accurate by continually refining k and b.
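The article does not say how k and b are refined. One common way, assumed here, is gradient descent on the mean squared error between predicted and observed Y. This Python sketch fits a single-feature line to hypothetical data generated by Y = 2X + 1; real profile models use many features and far more data.

```python
def fit_line(xs, ys, lr=0.01, steps=5000):
    """Fit y = k*x + b by gradient descent on the mean squared error."""
    k, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE = (1/n) * sum((k*x + b - y)^2) with respect to k and b.
        grad_k = sum(2 * (k * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (k * x + b - y) for x, y in zip(xs, ys)) / n
        # Move k and b a small step against the gradient.
        k -= lr * grad_k
        b -= lr * grad_b
    return k, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # hypothetical observations generated by y = 2x + 1
k, b = fit_line(xs, ys)
print(round(k, 2), round(b, 2))
```

Each iteration nudges k and b toward values that reduce the prediction error, which is what "continually refining k and b" amounts to in this simplified setting. The learning rate and step count are arbitrary choices that happen to converge for this small data set.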


Take NetEase Cloud Music as an example of the modeling process. Many people know that NetEase Cloud Music recommends new songs based on your past listening habits. Behind this is an algorithm that keeps improving, so that the recommended songs increasingly match each user's taste.

Here is a simple algorithm whose mathematical core is the formula for the cosine of the angle between two vectors in a multidimensional space. The following is quoted from Lang Shimahara's answer to the Zhihu question "What is NetEase Cloud Music's playlist recommendation algorithm?":

Take three songs as an example: "The Most Dazzling National Style", "Sunny Day" and "Hero".

A: collected "The Most Dazzling National Style", but always skips "Sunny Day" and "Hero".

B: often plays "The Most Dazzling National Style" on single-song repeat, listens to "Sunny Day" all the way through, and has blocked "Hero".

C: blocked "The Most Dazzling National Style", but collected both "Sunny Day" and "Hero".

Clearly A and B have similar tastes, while C is very different from both. The question is: how similar are A and B, and how can that similarity be quantified?

Treat the three songs as the three dimensions of a three-dimensional space: "The Most Dazzling National Style" is the x-axis, "Sunny Day" the y-axis and "Hero" the z-axis. How much a person likes each song is their coordinate on that axis, with the degree of liking quantified (for example: single-song repeat = 5, share = 4, collect = 3, active play = 2, listen through = 1, skip = -1, block = -5).

Each person's overall taste is then a vector: A is (3, -1, -1), B is (5, 1, -5) and C is (-5, 3, 3). We can use the cosine of the angle between two vectors to express how similar they are. An angle of 0 degrees has a cosine of 1 (the two people's tastes are identical); an angle of 180 degrees has a cosine of -1 (their tastes are diametrically opposed).

By the cosine formula, the cosine of the included angle equals the dot product of the two vectors divided by the product of their lengths:

$$\cos\theta = \frac{x_1x_2 + y_1y_2 + z_1z_2}{\sqrt{x_1^2 + y_1^2 + z_1^2}\,\sqrt{x_2^2 + y_2^2 + z_2^2}}$$

The cosine of the angle between A and B is about 0.80, and between A and C about -0.97.
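The calculation can be checked in a few lines of Python. The vectors are our reading of the A, B and C descriptions above under the stated scoring scale, and the function is a direct transcription of the cosine formula.

```python
import math

# Axes: x = "The Most Dazzling National Style", y = "Sunny Day", z = "Hero".
# Scores: single-song repeat=5, share=4, collect=3, active play=2,
# listen through=1, skip=-1, block=-5.
A = (3, -1, -1)
B = (5, 1, -5)
C = (-5, 3, 3)

def cosine(u, v):
    """Cosine of the angle between u and v: dot product / product of lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(f"A-B: {cosine(A, B):.2f}")  # prints A-B: 0.80
print(f"A-C: {cosine(A, C):.2f}")  # prints A-C: -0.97
```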

That was three songs; the same method works for any number of songs by building an N-dimensional coordinate system for N songs. The core idea is that if someone's listening habits closely match yours, the other songs they like are quite likely songs you will like too. This approach is user-based.
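A minimal sketch of the user-based idea over N songs, assuming each listener's scores are stored as a sparse dict in which a song the listener has not scored counts as 0. The extra songs "Song X" and "Song Y" and their scores are hypothetical, and "Most Dazzling" abbreviates "The Most Dazzling National Style". The sketch picks only the single most similar listener; production recommenders typically combine many neighbors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse score dicts; a song missing from a dict counts as 0."""
    dot = sum(u[s] * v[s] for s in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(target, others):
    """Find the most similar listener and suggest songs they liked (score > 0)
    that the target has not scored yet, highest score first."""
    nearest = max(others, key=lambda name: cosine(target, others[name]))
    liked = others[nearest]
    picks = [s for s, score in liked.items() if score > 0 and s not in target]
    return nearest, sorted(picks, key=lambda s: -liked[s])

# Hypothetical listeners, scored on the same scale as the text.
others = {
    "B": {"Most Dazzling": 5, "Sunny Day": 1, "Hero": -5, "Song X": 4},
    "C": {"Most Dazzling": -5, "Sunny Day": 3, "Hero": 3, "Song Y": 5},
}
A = {"Most Dazzling": 3, "Sunny Day": -1, "Hero": -1}
print(recommend(A, others))  # B is nearest, so "Song X" is suggested
```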

Another approach is item-based. Put simply, people who buy item X generally also buy item Y. For example, suppose NetEase Cloud Music has a new user, D, and all we know is that she likes "The Most Dazzling National Style". What should we recommend to her?

In the figure below, the numbers represent how much each person likes each song. Averaging, over A, B and C, the difference between how much each of them likes "The Most Dazzling National Style" and each of the other two songs gives the typical difference in how people feel about these songs. Adding that difference to D's score for "The Most Dazzling National Style" yields an estimate of how much D will like the other two songs.

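The averaging step described above resembles what is commonly called the Slope One scheme. The sketch below reuses the A, B and C scores from the earlier example, since the figure's own numbers are not reproduced in the text, and assumes a hypothetical score of 3 (collected) for D's rating of "The Most Dazzling National Style".

```python
# Scores reconstructed from the earlier A/B/C example; the figure's numbers may differ.
# "Most Dazzling" abbreviates "The Most Dazzling National Style".
ratings = {
    "A": {"Most Dazzling": 3, "Sunny Day": -1, "Hero": -1},
    "B": {"Most Dazzling": 5, "Sunny Day": 1, "Hero": -5},
    "C": {"Most Dazzling": -5, "Sunny Day": 3, "Hero": 3},
}

def average_difference(ratings, target, known):
    """Average of (score for target - score for known) over users who scored both."""
    diffs = [r[target] - r[known] for r in ratings.values() if target in r and known in r]
    return sum(diffs) / len(diffs)

def predict(ratings, known, known_score, target):
    """Estimate a new user's score for `target` from their score for `known`."""
    return known_score + average_difference(ratings, target, known)

d_score = 3  # hypothetical: D has collected "Most Dazzling" (collect = 3)
for song in ("Sunny Day", "Hero"):
    print(song, predict(ratings, "Most Dazzling", d_score, song))
```

Under these assumptions, "Sunny Day" is predicted at 3.0 and "Hero" at 1.0, so "Sunny Day" would be the better recommendation for D.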

In practice, with far more data and more sophisticated algorithms, the final predictions are more accurate. Likewise, there are many models for predicting other kinds of user characteristics.

Can we protect our privacy?

If you search for "how to protect privacy" in a browser, you will find plenty of tips: don't connect to public Wi-Fi, clear your browsing data regularly, watch out for unreasonable permission requests when installing apps, and stay away from small websites that test your "mental age" or tell your love fortune.

Yet the whole point of the Internet and the rise of mobile apps is to serve our lives, and now we have to stay on guard just to protect our privacy. Moreover, even if you are careful, how much can these measures actually protect you?

A popular article some time ago, "Why Doesn't My Son Get Addicted to Games?", described how an experienced game designer keeps his child from becoming addicted to games. In it, the author explains that commercial online games are designed to get players hooked: behind every popular game are hundreds of experienced engineers, game designers and even psychologists working out how to make the game addictive, and the products keep iterating.

How can a lone individual resist such carefully designed temptation? In the same way, data companies rack their brains to apply new technologies to collecting your data; you and they are simply not on the same level.

A cartoon published in The New Yorker 26 years ago famously declared: "On the Internet, nobody knows you're a dog."


Today, 26 years later, big data can tell whether you are a black dog or a white one, where you live, which brand of dog food you eat and where you like to go for walks.

Faced with massive data, people increasingly resemble mere variables that supply input. The apps you use are trying to understand and define you.

The translation is provided by third-party software.


The above content is for informational or educational purposes only and does not constitute any investment advice related to Futu. Although we strive to ensure the truthfulness, accuracy, and originality of all such content, we cannot guarantee it.