A Billion Prices Can't be Wrong

It's becoming more difficult for governments to willfully miscalculate the realities of inflation. The reason is the increasingly broad availability of pricing data and our improved ability to harvest and analyze it. Even those who try in good faith to calculate CPIs and similar statistics struggle, because it's very difficult to say an iPhone today costs 45% more than it did 15 years ago--it just didn't exist then. Given enough gadgets and enough price points, these hiccups in baskets of goods and services get smoothed over.

Tim Harford: A Billion Prices Can't be Wrong

Sharing knowledge and information increases freedom and improves justice. Thoughtful analysis of openly available data might even preclude future economic calamities caused by false or obscured information.

Note, this is not an argument for you to buy gold or TIPS. Talk to your financial expert about what investments are right for you.

Gene Amdahl

Gene Amdahl has died. For decades he was, and will now forever be, one of the greatest computer technologists in history. The NYTimes has a good obituary. He was one of the primary engineers behind IBM's System/360, which became one of the most successful mainframe families ever.

http://www.nytimes.com/2015/11/13/technology/gene-amdahl-pioneer-of-mainframe-computing-dies-at-92.html 

Amdahl's Law is simple yet quite fascinating. It explores how we think about parallel computation, or more generally about performing a series of tasks, so it has relevant lessons for assembly-line processes and people management as well. It puts a theoretical limit on the speedup we can achieve in a system, and those limits are startlingly low. For example, let's say you have a task of which 50% is parallelizable (can be processed side by side). Amdahl's Law says it doesn't matter how many workers you have: you can never get a speedup beyond 2x. It makes no difference whether you have a thousand or a million workers on a task that is 90% parallelizable: you have an upper bound of a 10x speedup. It's a formulation of the classic law of diminishing returns, with a very straightforward equation to back it up.
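That equation is short enough to state here: if p is the fraction of the work that can be parallelized and n is the number of workers, the speedup is 1 / ((1 - p) + p/n), which can never exceed 1/(1 - p) no matter how large n gets. A minimal Python sketch of the numbers quoted above:

    def amdahl_speedup(p, n):
        """Speedup when a fraction p of the work is parallelizable across n workers."""
        return 1.0 / ((1.0 - p) + p / n)

    # The serial fraction dominates no matter how many workers you throw at the task.
    for p in (0.5, 0.9):
        for n in (10, 1_000, 1_000_000):
            print(f"p={p}, n={n}: {amdahl_speedup(p, n):.2f}x")
    # p=0.5 never exceeds 2x; p=0.9 never exceeds 10x, since the limit is 1/(1-p).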

How to Work a Career Fair as a University Student

The Intent Media team just returned from a trip up to Ithaca, NY to recruit Cornell Engineering students for several open junior software engineer positions. We met a lot of smart, talented, and motivated students, and we learned a lot about campus recruiting along the way. Here are a few early impressions I had that might help other university students as they navigate this unusual world of technical hiring.

What is the Company Like?

Take the time to stop at the less busy booths at a career fair. There are no lines to wait in, and those recruiters will have time for a longer, more individual discussion with you. At a larger company's booth you will be rushed, and it is unlikely they will remember your conversation.

Think about the type of company you would flourish at and specifically research those firms. Big companies have great mentorship and guidance programs, but you will likely have an impact on only a very specific feature in the product and may not be given a lot of flexibility in implementation. Small firms might have a smaller impact overall, but you will likely be required to work independently and creatively. You may be expected to design and implement entire systems without a lot of coaching. Capitalize on your strengths and find roles that will stretch your weaknesses.

What separates a good candidate from a great candidate is the incisiveness and thoughtfulness of their questions. Turn the short interview around and act confident--as an engineering student you might have several offers, and it is incumbent on the company to impress you with how great they are.

Average candidates ask more obvious questions like:

  • Do you have any jobs?

  • What does your company do?

Great candidates ask challenging questions like:

  • I know you sell ABC. How do you tackle the competition from companies like XYZ?

  • Why is your company ranked as a Top Place to Work in NYC?

  • Tell me about a current challenge you all are facing (bonus: weave in a skill you have that could help solve it!)

There is so much more to a company than its brand and the salary. Understand the leadership, the team, the culture, the side perks, the location, the technical challenges, and the business challenges. You will end up with a better job and will appear more attractive to interviewers.


What are You Like?

Think about what differentiates you from all the other people coming to the booth, and highlight that. Mention the amazing internship you had the previous summer. Call out if you are at the top of your class or win coding competitions with your friends. Remember, the smaller the company, the less GPA matters. Big firms often need a simple number to “weed out” resumes, while smaller ones will be more interested if you have a unique background, skill set, or area of interest.

Great candidates can balance honesty and salesmanship. Remember, you are a highly sought after candidate and have something of value to provide to employers. On the other hand, you might be fairly junior and not have extensive experience in a specific area. Call out your strengths loud and clear on your resume. However, if you do not really know a language, do not claim that you do. Authenticity and confidence are more desirable than exaggerated claims.


When You Meet the Team

Finally, have a conversation, not a script. The person you are talking to is a human, not a robot, and wants to hire humans, not robots. For example, this sounds scripted:

“Hello, my name is Bob Smith and I’m a senior graduating in 2016 from the X University College of Engineering with a major in Computer Science and a minor in Data Analysis from the College of Arts and Letters and I am interested in Full Time employment.”

This is a much more natural intro:

“Hi, I’m Susan! I’ll be getting my CS degree next year and after browsing your website I was really interested in learning more about the XYZ role you have open.”


Do your research, ask tough questions, sell yourself, and treat your interviewer like a human. Hopefully this has been helpful. Good luck as you interview!

An Easier Way to Consider the CAP Theorem

When talking about distributed systems, it is common to describe Eric Brewer's CAP Theorem as "Consistency, Availability, and Partition Tolerance. Pick Two." This is usually illustrated with a triangle or Venn Diagram. While that's not terribly wrong, it's not particularly helpful.

Given that network partitions are rare, but do happen, the better way to consider the theorem is: "In the event of a network partition, does your system guarantee consistency or availability?"

Let's say you have a system with incoming data.



If your system splits, do you stop accepting data (choosing consistency, sacrificing availability), or do you accept the requests and attempt to "fix" the integrity of the data at a later point (choosing availability and living with "eventual consistency")? Even from this simplified diagram, it is clear that it is impossible to have a full system with all three properties at the same instant, which is the intuition behind why the CAP theorem is provably true.
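To make the choice concrete, here is a toy sketch (not any real database; the class and its behavior are invented purely for illustration) of how a single node might respond to writes while it cannot reach its peers:

    class ToyReplica:
        """An invented, illustrative node: what happens to writes during a partition."""

        def __init__(self, prefer_consistency):
            self.prefer_consistency = prefer_consistency
            self.data = {}            # locally committed values
            self.pending = []         # writes to reconcile once the partition heals
            self.partitioned = False  # set True when peers become unreachable

        def write(self, key, value):
            if self.partitioned and self.prefer_consistency:
                # The CP choice: refuse the write rather than risk divergent replicas.
                raise RuntimeError("unavailable during partition")
            if self.partitioned:
                # The AP choice: accept the write now, reconcile later ("eventual consistency").
                self.pending.append((key, value))
            self.data[key] = value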

I have found this illustration somewhat more helpful and somewhat more correct than triangles. The theorem is certainly quite nuanced, and I plan to write on other aspects of the theorem and distributed systems in the future.

I definitely want to give a hat tip to Nathan Marz for a similar explanation which I repurposed here.

A Simple Graphical Example of Common Excel Data Assumptions

I have talked in previous posts about the inherent temptation to assume linearity in data, and perhaps more importantly, the inherent desire we have as humans to simplify interpretations of our data. Occam's Razor is a useful tool that encourages us to seek the simplest explanation for our data, but we must remember to rule out simple explanations that do not actually describe the data.

Take an example of a company with a straightforward growth rate. Each month some marketing brings in some set of new clients and growth is steady. We can run a simple regression on this dataset and use the slope of the fitted line to make projections about the next quarter or year.
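While growth stays steady, the whole exercise fits in a few lines. A sketch with made-up numbers, using numpy's least-squares line fit:

    import numpy as np

    months = np.arange(12)
    clients = 40 + 25 * months + np.random.normal(0, 5, size=12)  # steady, roughly linear growth

    slope, intercept = np.polyfit(months, clients, deg=1)  # fit y = m*x + b
    projection = slope * np.arange(12, 15) + intercept     # project the next quarter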

Let's say we hire a new marketing director who initiates a new campaign that has a viral effect. How do we extrapolate on our data now? Perhaps we should wait and see before making a new projection, but our trendline no longer fits the data at the edge. Is this an outlier or a new normal?

The campaign continues and growth is really picking up. Our trendline is clearly not helping. Someone on the team decides that from now on we should make projections with an exponential function, so we refit our data with one.

And yet that still doesn't explain all the points very well. The exponent isn't as large as it should be because we're growing faster than it predicts. The linear component of the data at the beginning is throwing off the projection. Even so, the exponential view is certainly better than what we had before.

The viral campaign finishes and things unfortunately do not stick for our imaginary company. Growth slows.

How would you add a trendline to this? A bit tougher, right? The reason is that single-dimension trendlines describe almost no real-world data well. One thing we might do to model our growth is to put the X data in a two-dimensional vector like this: [TIME, HAS_VIRAL_CAMPAIGN]. The other thing we might do is build a step function: for certain ranges of X the function fits Y = M*X + B, and for other ranges it fits Y = X^N + C. We need three such components in our step function, or three related single-dimension functions, to accurately model this data.
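A sketch of both options; the regime boundaries and coefficients below are entirely made up:

    import numpy as np

    # Option 1: describe each month with two inputs, [TIME, HAS_VIRAL_CAMPAIGN],
    # and let a single regression learn a coefficient for each (plus an intercept).
    time = np.arange(24)
    has_campaign = ((time >= 8) & (time < 16)).astype(float)
    X = np.column_stack([time, has_campaign, np.ones_like(time)])
    # betas, *_ = np.linalg.lstsq(X, monthly_clients, rcond=None)  # monthly_clients: your data

    # Option 2: a step function that switches models at the campaign boundaries.
    def predict(t):
        if t < 8:
            return 40 + 25 * t                # steady linear growth before the campaign
        if t < 16:
            return 240 * 1.3 ** (t - 8)       # exponential regime during the campaign
        return 240 * 1.3 ** 8 + 5 * (t - 16)  # slower growth after the campaign ends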

It should be clear that all these solutions are merely academic. The data will change next month and we must revise our model. Perhaps we launch another campaign, a competitor launches a similar product, an act of God hits the supply chain, or the CEO says something distasteful in a room full of journalists. These things can be modeled looking backwards but could be difficult to predict forwards. 

All models are wrong. Seek simplicity, but when making your model, make sure it is rich and complex enough to be helpful.


Three Learnings from Real World Large-Scale Machine Learning

I spend much of my day working on and tuning machine learning models at Intent Media. We use predictive analytics for a number of business tasks, but most of our models currently fall into two major categories. The first category covers our segmentation models, which predict the likelihood of a user booking or converting on a site; we use these to decide when to show ads. The second covers our quality score models, which predict the likelihood of a click; we use these to decide which ads to show. We have recently moved our logistic regression model over to Apache Spark's MLlib on Amazon EMR. In a future post I would love to show some of the code it took to do that. Until then, I will be posting different thoughts on machine learning I have picked up from empirical testing and the wisdom of my more knowledgeable colleagues. Let me know if you (dis)agree or find it useful.


1. Thinking of weights or betas as equivalent to feature value is dangerous

The absolute value of a weight is only a very loose proxy for feature contribution. Often business users will come and ask "which features in our model are most valuable?" and it's tempting to go look at the absolute values of the weights. This will yield a misunderstanding of your data, though, for a couple of reasons. When using a linear model it's really hard to understand feature interactions, and there is no visibility into how frequently a certain feature fires. We have some relatively rare features that can be predictive but require the website user to have entered via a certain path, used a promo code, purchased a certain sort of product in the past, or some similar low-frequency occurrence. We have other features which capture similar but slightly different notions--often many will 'fire' on each prediction. In aggregate they represent a valuable signal in the model, but individually they have low weights.

I have seen models with weights that have large positive or negative values and often end up canceling each other out in odd ways, yet are still quite predictive. Perhaps A (-3.5) and B (2.5) frequently occur together, so combined they contribute only -1.0 to the prediction. We cannot really say those features are more valuable than another feature with a smaller absolute weight.
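A toy illustration of that cancellation (the features and weights here are invented):

    weights = {"A": -3.5, "B": 2.5, "C": 0.8}  # invented weights for illustration

    def contribution(active_features):
        """Summed contribution of whichever features fire on a given prediction."""
        return sum(weights[f] for f in active_features)

    print(contribution(["A", "B"]))  # -1.0: two large weights mostly cancel each other out
    print(contribution(["C"]))       #  0.8: the "small" weight can matter more in practice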

Of course, if your model suffers from any overfitting, or if training got stuck in a local optimum, these issues will be even more pronounced.


2. Be careful making linearity assumptions about your data

Assuming your data is linear is a little bit like assuming a normal distribution in statistics, or assuming your random variables are independent in probability. It's really tempting because you can run simple algorithms that perform really, really well. Linear and logistic regression are well studied, perform well at scale, have some great libraries available, and are conceptually simple to work with. The problem with all this fun and games is that your data may interact in unforeseen ways.

We have attacked the linearity problem on a couple of fronts. We use a logistic regression model for our segmentation, but we know that our data interacts in a variety of ways, so we spend a lot of time developing and testing higher-order features that combine data or apply mathematical transformations such as taking the log of a value. We are also testing models that learn feature interactions better; we have had some success with decision trees and will continue exploring random forests for future models.
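A sketch of the kind of feature engineering involved; the raw field names are hypothetical, but the transformations (logs, ratios, thresholds) are representative:

    import math

    def engineer_features(row):
        price, nights = row["price"], row["nights"]       # hypothetical raw fields
        return {
            "log_price": math.log1p(price),               # tame a heavy-tailed value
            "price_per_night": price / max(nights, 1),    # a simple ratio/interaction feature
            "is_long_stay": 1.0 if nights >= 7 else 0.0,  # threshold into a binary feature
        }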


3. Regularization doesn't matter so much when you have big data

This was fun to realize empirically. I was recently trying different optimizers for our logistic regression algorithm, comparing LBFGS and SGD. As I was testing these on a large dataset I wanted to make sure I was regularizing appropriately, so I wrote a script that would train the model several times overnight with various regularization parameter values from 0.000001 to 1.0, alternating between L1 and L2 regularization.
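In outline the sweep looked something like the sketch below, shown here against the RDD-based pyspark.mllib API; training_data (an RDD of LabeledPoints) and evaluate (a holdout metric) are placeholders for your own data and scoring:

    from pyspark.mllib.classification import (LogisticRegressionWithLBFGS,
                                              LogisticRegressionWithSGD)

    reg_params = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
    results = []
    for reg_type in ("l1", "l2"):
        for reg_param in reg_params:
            for trainer in (LogisticRegressionWithLBFGS, LogisticRegressionWithSGD):
                # training_data and evaluate are assumed to already exist; not shown here.
                model = trainer.train(training_data, regParam=reg_param, regType=reg_type)
                results.append((trainer.__name__, reg_type, reg_param, evaluate(model)))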

As it turns out, when you have lots of good data, it becomes difficult to overfit. A little bit of regularization helped, but not by as much as I was expecting.



New York City Tips. Chapter 1.

I have been living in NYC for almost three years now. Like many New Yorkers, I am willing to share my city hacks. These will be shorter posts with a few words of advice for travelers or new residents. There is no order--the numbering is just to provide a monotonically increasing reference or index.

  1. Unless all subway cars are empty, an empty subway car is not your lucky day. It is empty for a reason. Follow the crowd. In general, subway cars distribute passengers fairly well and it's unusual for one to be overloaded while there are others mostly empty.
  2. The best public restrooms and seating areas are in hotels. High end boutique hotels have great lounge areas and facilities and will be much more enjoyable than a Starbucks.
  3. The line at Whole Foods moves faster than Trader Joe's. But you pay for it.
  4. Best Buy iPhone docks are emergency charging stations.
  5. Rush hour isn't bad if you're the first one to exit the train up the stairs in the station. Learn where your exit is and stand by the appropriate door accordingly.
  6. Always carry a Kindle. They're slim, cheap, the batteries last forever, and when the subway stops you'll be glad you did.
  7. Always carry headphones. When someone starts shouting while the train is stopped, you'll be glad you did.
  8. There aren't any good restaurants between 34th St and 50th St. Avoid accordingly.
  9. MoMA is free on Fridays. The Met can be free if you want.
  10. Always order the cheese pizza. It's the freshest one, and also, the cheapest.

The Art of Abstraction for Engineers

On my long President's Day weekend I took shelter from the NYC cold at the recently-reopened Cooper Hewitt Museum on the Upper East Side.

The Cooper Hewitt is a museum about design. On three floors and in multiple rooms the curators have arranged various sets of objects, often related to each other in some way other than era. For instance, a Sumerian cuneiform tablet might be juxtaposed with an original iPhone. The focus is on the art of design and the ways that design helps us as humans. It is well worth a couple-hour visit, though I believe even more additions will be made shortly.

My favorite piece was a display of pieces from the Free Universal Construction Set.

As a child, I owned pieces of most of the major construction kits. Lego was by far the most dominant, but I also used Duplo, Playmobil, Lincoln Logs, Erector sets, K'nex, Hot Wheels, Tinkertoy, and Fischertechnik. Much of my playtime was spent combining these systems. Sometimes the mismatched scales could be frustrating, but there was real satisfaction in sending a Lego train through a western town made of Lincoln Logs while it was attacked by Playmobil giants.

The FUCS is a free and open set of 3D printable adaptors for a large number of toy systems, designed to make these toys interact with each other. Finally, a child (or child at heart) with a Makerbot can construct toys and systems that cross beyond the borders of corporate imaginations.

I got to thinking about how each of these toy systems speaks its own internal language, and how, in order to work together, the appropriate universal adaptors need to be supplied to connect it to the other systems.

The same is true in data and computer science. To bridge two systems we need to agree on standard APIs, formats, and schemas. Oftentimes, when one does not control the data in a section of the pipeline, an engineer must define an appropriate adaptor to parse, transform, or display some set of information. Similarly, an interface needs to be implemented on the receiving end of a pipeline to do some sort of data load process. Good engineers live by defining appropriate, lightweight abstractions over complexity, allowing systems to communicate and leverage each other's power, features, or datasets.

One example of a parallel in the software world is Protocol Buffers from Google. If you're not familiar with the concept: JSON has become a standard data interchange format, but it's quite inefficient to read and write at high performance and, since it doesn't compress well, is expensive to transfer. Additionally, its types are limited, so we're left with a variety of add-ons and supersets to try to describe types like dates and times. Protocol Buffers are not the solution to every problem, of course, but they do tackle the issue well on some fronts. Essentially, you define a schema-like file that describes your data, and then the system can generate an adaptor for your language of choice, creating a standardized way to serialize data to a binary format from a C program and read it in with something like JavaScript. Protocol Buffers are the abstraction that each subsystem can look to in order to communicate, allowing us to build polyglot systems.
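As a rough sketch of the workflow (the schema and generated module below are hypothetical; the serialization calls are the standard generated-code API):

    # A hypothetical schema, saved as search_event.proto and compiled with protoc:
    #
    #   syntax = "proto3";
    #   message SearchEvent {
    #     string user_id = 1;
    #     int64 timestamp_ms = 2;
    #     repeated string destinations = 3;
    #   }
    #
    # protoc generates an adaptor module per target language; in Python it looks like:
    from search_event_pb2 import SearchEvent  # generated module, named after the .proto file

    event = SearchEvent(user_id="u123", timestamp_ms=1447372800000)
    event.destinations.append("JFK")

    payload = event.SerializeToString()        # compact binary any other language can parse
    roundtrip = SearchEvent.FromString(payload)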

The FUCS is a great, tangible representation of simple abstractions and interfaces for toy-engineering. Children today might get a better understanding of a system stack just by expanding their toy creations, and that is exciting.


Data Visualization: How to Make a Really Bad Graph

As a data lover I have a particular nonpolitical aversion to Fox News. Their travesties in representing data have been well documented on other sites.

I enjoy poking fun at bad graphs, but over time one comes to recognize common traps many fall into when designing a graph. Recently I came across this particularly bad graph on a social network. If you know the source I would appreciate the ability to cite where it came from, but perhaps it is just as well that we let the origin remain anonymous.


Let's take this graph one piece at a time and let it instruct us on how to make our own bad graphs.

Use Multiple Y Axes when Displaying Similar Data

You'll note that the chart is measuring two continuous measures of exactly the same type of data (employment/population ratio). However, they're plotted against two different Y axes. It can be difficult to add more than two axes, but if you can add more, it really aids incomprehensibility so people can't compare the data well.

Use a Different, Unnatural Baseline for Each Y Axis

Your readers expect each Y axis to start at zero, so make sure you don't do that. Pick an arbitrary starting value for each axis that suits the message you wish your data to give and allows you to exaggerate its effect.

Use Different Scales for Similar Data

The author of this chart was very clever. Look at the difference in scale from the top to the bottom of the chart. The axis for the male trendline spans 78-62=16 units, while the axis for the female trendline spans 60-38=22 units. The more axes with misaligned scales the better. Your readers will try to compare the relative scales, and you will have them making incorrect inferences in no time.
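If you would like to reproduce the trick yourself, a few lines of matplotlib (with made-up numbers) are all it takes:

    import numpy as np
    import matplotlib.pyplot as plt

    years = np.arange(2000, 2015)
    men = np.linspace(74, 66, len(years))    # made-up employment/population ratios
    women = np.linspace(57, 54, len(years))

    fig, left = plt.subplots()
    right = left.twinx()                     # a second Y axis: rule one of the recipe
    left.plot(years, men, color="tab:blue")
    right.plot(years, women, color="tab:red")
    left.set_ylim(62, 78)                    # 16 units of range on the left axis...
    right.set_ylim(38, 60)                   # ...22 on the right, and neither starts at zero
    plt.show()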

Use Ratios as your Data

It is well known that people do not mentally grasp ratios well. People understand absolute numbers, so whenever you need to shroud information, put it in the form of a ratio over or under some other number. In this particular case a ratio actually makes some sense, though it would be clearer if it were quoted as a percentage of a given population.

Add a Trendline, preferably Linear

Here's a big one. Graph consumers love trendlines. Adding a trendline to a graph is like putting a puppy under the Christmas tree. Trendlines reinforce the message that everyone should assume the data you're displaying is linear. All data is linear, right? They let your readers draw wonderfully ridiculous extrapolations off into eternity on either end, and let you make a strong policy case that change is needed to bend the line. Trendlines are also excellent for reinforcing the idea in your readers' minds that there is just one relevant input variable, usually time. Everyone remembers "y=mx+b". Give your readers an intercept b (see above regarding how to choose your intercepts), a comfortable slope m in the form of a trendline, and a nice range of x values along the bottom, and they will solve for truth on their own. This will keep your readers from considering other factors you didn't have time to research that might be affecting the lines.

Hopefully this has been a valuable lesson in data-obfuscation. I'll continue to bring more tips as I find good examples of bad graphs.

Junior Engineer Hiring and Compensation

Recently I have been considering the nature of the software engineering job market for junior engineers. I have a lot of peripheral experience with training, recruiting, and hiring junior programmers. I have spoken to and coached non-programmer friends and acquaintances through the programming bootcamp experience, and it has been awesome to see where they end up--great companies like GoPro, Amplify, Thoughtbot, the NYTimes, and Priceline. I have recruited bootcamp graduates from most of the major programs in NYC like Dev Bootcamp, Flatiron School, Fullstack Academy, AppAcademy, and General Assembly.

Years ago I was that junior engineer coming out of university looking for any company that would take an interest while seeing senior engineers get hired at ever-increasing rates.

The market for senior software engineers at the time of this writing could not be more insane. Passively put your resume on a site like Hired.com and you will get 10-30 companies of varying size bidding ever-higher salaries (120-160k+ is common) for your skills. The stress of telling a long series of amazing companies "no I don't have time to even talk with you" should cause a cognitive dissonance when you remember how many millions of Americans have completely given up looking for jobs.

Why is it so hard to get a job as a junior engineer? Let's look at a typical profile.

The average tech startup is building a Rails or Django app, depending on which programming language their CTO knows better. They might have some funding after an A or B round and a few senior developers on the team. Cash is king, but their investors obviously want them to spend that money on engineers' salaries so they can build product. Everyone is pretty busy and is trying to scale as fast as possible. Employees are expensive, but if they can build product fast enough, they're worth every penny.

The average junior engineer is looking for a job at a startup. They have touched all the tools and aren't afraid of code, but they know pretty much nothing. They've built an app, but only by adding 50 lines of Rails code. They have a GitHub account, but don't understand anything about snapshots or tags or why you don't force-push to master. They know some basic data structures, so they can reverse a linked list in an interview question, but they do not know what a red-black tree is. They've written tests but don't understand testing. They've queried a database but haven't written a database. Or dealt with data corruption. They know what a queue is but haven't dealt with a queue that's overloading at millions of items per minute. It's the problem with any job: as a company, you want someone who can say "I think I've dealt with this problem in the past" and then nail it.

Junior engineer salaries out of these bootcamps ballpark 60-100k. This is the attraction of the bootcamp: spend 9 weeks with us and you could get a six figure job.

Here's where I think there's a market imbalance. Junior engineers at the outset aren't worth anything close to 100k. After a year of learning, they definitely might be, but in the meantime they present a huge risk to a growing company. A junior engineer will need coaching, mentoring, and encouragement. They might merge in code that breaks a production system. They might erase your git history by accident. They will break the build. Multiple times. I know because I've done all those things and seen junior engineers do all those things after me.

After a couple of years of working, that junior engineer might be paid 100k but be producing output on that Rails app comparable to that of a senior engineer who might be paid 160k. Sure, there are those really obscure issues that only someone who has programmed for ten years will nail down, but for most product-oriented stories, those two years of experience will be paying big dividends.

My personal experience is aligned here. I made a lot of mistakes my first year from breaking the build to swallowing exceptions. A few years later, while I'm absolutely still learning and improving, I deliver more stories and merge more pull requests than most senior engineers on our team.

So here is my proposal. Let's all be honest about what we're worth. I think more companies should snatch up junior engineers, but pay them 40k--enough to pay the rent and buy groceries. But here's the catch: tell them from the outset "you're worth 40k to us now, but we'll guarantee a compensation discussion every six months, in which we'll plan to ramp your salary to 100k in 2-3 years depending on performance." That would really help stabilize the market. I think it would keep junior engineers humble but ambitious. It would de-risk companies looking to hire. It would reduce unemployment for recent grads, but guarantee a pipeline of competent programmers for startups.

I'm not sure how viable this could be in the real world, but when I launch a startup, I'd be willing to give it a try. Your best future programmer might be the one you're overlooking.