I have begun learning Portuguese; my colleague is teaching me.
Because I have a lot of work every day, the study will be a very long process.
Anyway, I like to learn new things.
I chose the book Portuguese for Dummies as my learning material. One interesting thing is that my colleague said he found some pronunciation rules in the book that he has been using every day without ever realizing it.
|do homem||of the man|
|da mulher||of the woman|
|de nada||not at all, You’re welcome||Same word, same meaning in Spanish|
|saúde||health||cheers (toast when drinking); bless you (said to someone who has just sneezed)|
I learned a little about diphthong pronunciation in Portuguese.
Portuguese nouns come in two types: masculine and feminine. Masculine nouns usually end in -o, and feminine nouns usually end in -a.
My colleague and I decided to improve our spoken English by reading the book 1984 every workday. We started a small reading group in which we listen to the 1984 audiobook and read aloud. We hope to finish the book within one year.
I will share some interesting things and document what I learned.
How to order adjectives in English
It depicted simply an enormous face, more than a metre wide: the face of a man of about 45, with a heavy black mustache and ruggedly handsome features.
In many languages, adjectives denoting attributes usually occur in a specific order. Generally, the adjective order in English is:
- Quantity or number
- Quality or opinion
- Proper adjective (often nationality, other place of origin, or material)
- Purpose or qualifier
After the first two days, my colleague decided to use Quizlet (https://quizlet.com/class/5050371/) to document the new words we learn every day. However, we haven’t started using it, because it requires a lot of copy-and-paste work.
The flat was seven flights up.
flight: A series of stairs rising from one landing to another.
A flight of stairs is usually defined as an uninterrupted series of stairs. This can mean the set of stairs between floors or the set of stairs between landings. The term is used in many different ways and there’s really no set rule for what can be called a flight of stairs.
The author uses the third person to say that Winston lives on the seventh floor. Literally, it means that Winston needs to walk up seven flights of stairs to get home.
The 2017 eclipse happened one week ago. From where I live, I could see a partial solar eclipse. Actually, I am not very interested in solar eclipses, and the maximum eclipse happened during my lunch time ;). I worked as usual that day and decided not to watch the eclipse.
But I want to share the TED talk You owe it to yourself to experience a total solar eclipse. The best part of the talk is the speaker’s point that ‘duration of experience does not equal impact. One weekend, one conversation – hell, one glance – can change everything. Cherish those moments of deep connection with other people, with the natural world, and make them a priority.’ Otherwise, his main idea is about respecting nature.
Another interesting part of this eclipse is that I relearned how a pinhole camera works.
The link to vis.supstat.com is broken, and it seems they don’t have time to fix it now, so I made a mirror at yulijia.net/vistat. I hope it helps anyone who wants to access the blog.
Thanks to Zeyu Zhang, the broken link is fixed.
- Data cleansing
- Data overview
- Which are the most expensive cities in America to book a Tree Fort
- How to make a PivotTable in R
Recently, I read The Priceonomics Data Puzzle: TreefortBnb and wrote my answer here.
In general, to deal with a question like this, I use three steps:
- Data cleansing
- Data overview
- Answer the question.
At first look at the TreefortBnb data, I found that city names are given in mixed case.
TreefortBnb <- read.csv(url("https://s3.amazonaws.com/pix-media/Data+for+TreefortBnB+Puzzle.csv"), comment.char="@")
names(TreefortBnb)[4:5] <- c("Price","Reviews")
unique(TreefortBnb$City[grepl("new york",TreefortBnb$City,ignore.case = TRUE)])
##  New York new york
## 106 Levels: Albuquerque Alexandria Anchorage Ann Arbor ... West Hollywood
For example, there are 8043 tree forts listed under New York and 1 tree fort listed under new york. To avoid treating them as two different cities, I convert city names to lower case.
TreefortBnb[,"City"] <- tolower(TreefortBnb$City)
Also, some city names occur in more than one state. We need a new ‘city-state’ tag to distinguish them.
TreefortBnb[,"CityState"] <- paste(TreefortBnb$City,TreefortBnb$State,sep=", ")
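A small sketch of why both cleaning steps matter. The `toy` data frame below is made up for illustration and is not taken from the puzzle file; only `tolower` and `paste` mirror the calls used above.

```r
# Hypothetical rows: mixed-case city names and one city name shared by two states
toy <- data.frame(
  City  = c("New York", "new york", "Long Beach", "Long Beach", "long beach"),
  State = c("NY", "NY", "CA", "NY", "CA"),
  stringsAsFactors = FALSE
)

# Before cleaning, mixed case makes one city look like two
n_raw <- length(unique(toy$City))

# Normalise case, then build a city-state key
toy$City      <- tolower(toy$City)
toy$CityState <- paste(toy$City, toy$State, sep = ", ")

# Lower-casing merges the case variants; the key separates same-named cities
n_city      <- length(unique(toy$City))
n_citystate <- length(unique(toy$CityState))

print(c(raw = n_raw, city = n_city, city_state = n_citystate))
```

With these toy rows, lower-casing alone would merge long beach in CA and NY into one city, which is why the combined key is needed.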
After data cleansing, there are 124 cities in the dataset.
Let’s take a look at the overview of TreefortBnb data. Here are some questions on my mind.
- How many tree forts are in each city?
- What is the highest price for one tree fort?
- How many reviews are there in each city?
- Where is the most reviewed tree fort?
- What is the ratio of reviewed tree forts in each city?
How many Tree forts in each city?
In the figure shown below, the highest number of tree forts (8044) is in new york, NY.
The median number of tree forts per city is 85. A few big cities in the top 10, such as new york, brooklyn, los angeles and san francisco, account for more than half of all tree forts.
The top 10 cities by number of tree forts are:
|new york, NY||8044|
|san francisco, CA||3622|
|los angeles, CA||3236|
|miami beach, FL||1345|
|san diego, CA||986|
|new orleans, LA||833|
The highest price for one tree fort
As we can see from the figure above, the price of one tree fort ranges from 10 to 10000, and the highest prices are in san francisco, CA, park city, UT and miami beach, FL.
How many reviews in each city?
From the figure above, the most reviewed city is new york, NY.
The top 10 cities by number of reviews are:
|new york, NY||64177|
|san francisco, CA||30842|
|los angeles, CA||21723|
|miami beach, FL||7126|
It seems that the top 10 most reviewed cities are correlated with the top 10 cities by number of tree forts: 8 cities appear in both lists.
In conclusion, the more tree forts a city has, the more likely it is that a visitor will book one and review it.
Where is the most reviewed tree fort?
There are 208 units that have been reviewed 99 times, and they are located in denver, new york, los angeles, las vegas, washington, silver spring, chicago, long island city, austin, brooklyn, seattle, san francisco, nashville, salt lake city, philadelphia, somerville, san diego, venice, new orleans, cambridge, portland, queens, incline village, santa cruz, boston, paris, eugene, savannah, santa rosa, jersey city, albuquerque, miami beach, arlington, boulder, baltimore, honolulu, alexandria, sonoma, carmel. We can see the highest Total Reviews is in illinois.
There are no tree forts in the states shown in gray on the map of the USA.
I am not familiar with the geography and climate of the USA, so I can only guess at some reasons that may explain the result:
- maybe these states have dry climates, so there are no trees suitable for building tree forts.
- no visitors need a tree fort in these states. (no-tree-fort-demand)
- there was simply no data at that time.
By the way, I must mention here: geographic profile maps which are basically just population maps.
The ratio of reviewed tree forts in each city
We can see that many tree forts have no reviews at all. After calculating the ratio of reviewed tree forts in each city, brookline, berkeley, dallas, long beach, madison, paris, pasadena, phoenix and richmond turn out to be the cities without any tree fort reviews.
A high review ratio doesn’t mean the tree forts are popular: some cities have only one tree fort, and if a single tourist reviewed it, the ratio will be 100%.
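The small-sample caveat above can be sketched on made-up numbers. The `toy` data frame and the `min_forts` cutoff are hypothetical choices for illustration, not part of the original analysis.

```r
# Hypothetical listings: city "a" has a single reviewed fort, city "b" has four forts
toy <- data.frame(
  CityState = c("a, XX", "b, YY", "b, YY", "b, YY", "b, YY"),
  Reviews   = c(3, 0, 5, 0, 12),
  stringsAsFactors = FALSE
)

# Share of forts with at least one review, and number of forts, per city
ratio <- tapply(toy$Reviews > 0, toy$CityState, mean)
count <- tapply(toy$Reviews, toy$CityState, length)

summary_df <- data.frame(
  CityState    = names(ratio),
  FortNumber   = as.integer(count),
  ReviewsRatio = as.numeric(ratio),
  stringsAsFactors = FALSE
)

# Assumed cutoff: ignore ratios computed from very few forts
min_forts <- 3
reliable  <- summary_df[summary_df$FortNumber >= min_forts, ]

print(summary_df)
print(reliable)
```

City "a" gets a ratio of 100% from one listing, while city "b" gets 50% from four; reading the ratio together with the fort count avoids over-interpreting the first.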
Which are the most expensive cities in America to book a Tree Fort
A naive approach is to sort the cities by tree fort price and take the top 100, as below:
|CityState||FortNumber||MedianPrice||ReviewsRatio|
|park city, UT||229||299.0||0.4410480|
|laguna beach, CA||68||268.5||0.3676471|
|incline village, NV||118||259.0||0.1779661|
|manhattan beach, CA||55||209.0||0.3636364|
|long beach, NY||3||200.0||0.6666667|
|la jolla, CA||53||195.0||0.4528302|
|hermosa beach, CA||48||189.5||0.3958333|
|sunny isles beach, FL||161||180.0||0.6086957|
|new york, NY||8044||170.0||0.6957981|
|newport beach, CA||84||160.0||0.5119048|
|beverly hills, CA||74||160.0||0.5405405|
|san francisco, CA||3622||150.0||0.6778023|
|miami beach, FL||1345||150.0||0.5873606|
|new orleans, LA||833||150.0||0.6218487|
|santa monica, CA||500||150.0||0.5000000|
|marina del rey, CA||115||150.0||0.5565217|
|mill valley, CA||80||150.0||0.5375000|
|san diego, CA||986||130.0||0.5233266|
|west hollywood, CA||229||129.0||0.6026201|
|las vegas, NV||291||125.0||0.5189003|
|santa cruz, CA||127||125.0||0.6692913|
|san rafael, CA||61||125.0||0.5409836|
|long beach, NJ||1||125.0||0.0000000|
|palo alto, CA||100||120.0||0.5800000|
|santa rosa, CA||71||120.0||0.5915493|
|mountain view, CA||72||113.0||0.6944444|
|colorado springs, CO||54||113.0||0.4444444|
|los angeles, CA||3236||110.0||0.5998146|
|fort lauderdale, FL||151||100.0||0.4834437|
|long beach, CA||113||100.0||0.6637168|
|san jose, CA||104||100.0||0.5288462|
|ann arbor, MI||63||98.0||0.5396825|
|long island city, NY||191||96.0||0.5235602|
|salt lake city, UT||140||89.5||0.5785714|
|jersey city, NJ||82||85.0||0.8902439|
Back to the original question. Given the general study above, I think we cannot just sort by price to get the answer.
To define the most expensive cities for tree forts, I think we must include three aspects:
- tree fort median price;
- tree fort number;
- reviews number/ratio.
If a high-priced tree fort in Texas has no visitors at all, it is expensive but not popular. Texas may have the most expensive cities, or the cities with the least demand for tree forts.
As we can see in the figure below, most review ratios are between 0.25 and 0.75.
I also calculated the reviews z-score.
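The formula is not spelled out here; the `summarise` call at the end of this post computes `mean(Reviews)/sd(Reviews)` per city under the name `ZscoreReviews`. As a hedged comparison on made-up review counts, the sketch below shows that quantity next to the conventional per-value z-score, (x - mean) / sd. The vector `reviews` is hypothetical.

```r
# Hypothetical review counts for the forts of one city
reviews <- c(0, 2, 4, 6, 8)

m <- mean(reviews)
s <- sd(reviews)

# Conventional z-score: one standardised value per fort
z <- (reviews - m) / s

# Quantity used in the summarise call: one mean/sd ratio per city
mean_over_sd <- m / s

print(round(z, 3))
print(round(mean_over_sd, 3))
```

The standardised values have mean 0 and standard deviation 1 by construction, while the mean/sd ratio is a single number describing how large the typical review count is relative to its spread.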
Some tree forts without reviews are much more expensive than the reviewed ones in the same city. They raise the median price of tree forts in that city, but do not reflect the price that visitors are actually willing to accept.
After filtering out tree forts without reviews, I pick the top 100 cities sorted by median price in decreasing order.
That’s my answer to the question.
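A minimal sketch of this effect on made-up prices (the `toy` data frame is hypothetical): two expensive listings that nobody reviewed pull the city median up, and restricting to reviewed listings brings it back down.

```r
# Hypothetical listings in one city: the two most expensive forts have no reviews
toy <- data.frame(
  Price   = c(80, 100, 120, 900, 1000),
  Reviews = c(10, 4, 7, 0, 0)
)

# Median over all listings versus median over reviewed listings only
median_all      <- median(toy$Price)
median_reviewed <- median(toy$Price[toy$Reviews > 0])

print(c(all = median_all, reviewed = median_reviewed))
```

This is the same `median(Price[Reviews > 0])` idea used for the `MedianPricewithReviewsNumber` column later in the post.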
|CityState||FortNumber||MedianPrice||MedianPricewithReviewsNumber||ReviewsRatio|
|incline village, NV||118||259.0||200.0||0.1779661|
|laguna beach, CA||68||268.5||200.0||0.3676471|
|hermosa beach, CA||48||189.5||199.0||0.3958333|
|park city, UT||229||299.0||179.0||0.4410480|
|sunny isles beach, FL||161||180.0||177.5||0.6086957|
|long beach, NY||3||200.0||175.0||0.6666667|
|new york, NY||8044||170.0||165.0||0.6957981|
|manhattan beach, CA||55||209.0||162.0||0.3636364|
|la jolla, CA||53||195.0||157.5||0.4528302|
|san francisco, CA||3622||150.0||150.0||0.6778023|
|newport beach, CA||84||160.0||150.0||0.5119048|
|beverly hills, CA||74||160.0||150.0||0.5405405|
|marina del rey, CA||115||150.0||149.0||0.5565217|
|mill valley, CA||80||150.0||148.0||0.5375000|
|miami beach, FL||1345||150.0||130.0||0.5873606|
|santa monica, CA||500||150.0||130.0||0.5000000|
|new orleans, LA||833||150.0||125.0||0.6218487|
|west hollywood, CA||229||129.0||125.0||0.6026201|
|santa cruz, CA||127||125.0||125.0||0.6692913|
|las vegas, NV||291||125.0||120.0||0.5189003|
|san rafael, CA||61||125.0||120.0||0.5409836|
|palo alto, CA||100||120.0||117.0||0.5800000|
|san diego, CA||986||130.0||110.0||0.5233266|
|los angeles, CA||3236||110.0||108.0||0.5998146|
|mountain view, CA||72||113.0||105.0||0.6944444|
|fort lauderdale, FL||151||100.0||100.0||0.4834437|
|santa rosa, CA||71||120.0||99.5||0.5915493|
|long beach, CA||113||100.0||99.0||0.6637168|
|long island city, NY||191||96.0||91.5||0.5235602|
|ann arbor, MI||63||98.0||90.0||0.5396825|
|san jose, CA||104||100.0||85.0||0.5288462|
|jersey city, NJ||82||85.0||85.0||0.8902439|
|salt lake city, UT||140||89.5||80.0||0.5785714|
|san antonio, TX||77||80.0||80.0||0.6103896|
|new haven, CT||52||75.0||80.0||0.4807692|
|colorado springs, CO||54||113.0||79.5||0.4444444|
|silver spring, MD||53||75.0||75.0||0.5471698|
How to make a PivotTable in R
I use the dplyr package to calculate/count the numbers I need and make a PivotTable.
library(dplyr)
# One summary row per city-state, sorted by number of tree forts
TreefortBnb.df <- TreefortBnb %>%
  group_by(CityState) %>%
  summarise(FortNumber = length(CityState),
            MedianPrice = median(Price),
            TotalReviews = sum(Reviews),
            MedianReviews = median(Reviews),
            # mean/sd of the review counts within the city
            ZscoreReviews = mean(Reviews)/sd(Reviews),
            City = unique(City),
            State = unique(State),
            # share of tree forts with at least one review
            ReviewsRatio = sum(Reviews>0)/length(Reviews),
            # median price over reviewed tree forts only
            MedianPricewithReviewsNumber = median(Price[Reviews>0])
  ) %>%
  arrange(desc(FortNumber)) %>%
  as.data.frame
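For readers without dplyr, a similar pivot-style summary can be sketched in base R with `split` and `lapply`. This is an alternative under stated assumptions, not the code used for the tables above; the `toy` data frame is made up and only a subset of the columns is reproduced.

```r
# Hypothetical listings in two cities
toy <- data.frame(
  CityState = c("x, AA", "x, AA", "x, AA", "y, BB", "y, BB"),
  Price     = c(100, 200, 300, 50, 70),
  Reviews   = c(0, 3, 5, 0, 0),
  stringsAsFactors = FALSE
)

# Split by city-state, summarise each group, then stack the one-row results
pivot <- do.call(rbind, lapply(split(toy, toy$CityState), function(g) {
  data.frame(
    CityState    = g$CityState[1],
    FortNumber   = nrow(g),
    MedianPrice  = median(g$Price),
    TotalReviews = sum(g$Reviews),
    ReviewsRatio = mean(g$Reviews > 0),
    stringsAsFactors = FALSE
  )
}))

# Sort by number of forts, largest first, and drop the group names used as row names
pivot <- pivot[order(-pivot$FortNumber), ]
rownames(pivot) <- NULL

print(pivot)
```

The dplyr version is shorter and easier to extend with more columns; the base R version only avoids the extra dependency.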