Vote your conscience

What’s happening at the Republican National Convention doesn’t feel real, but it’s real. The self-aggrandizing nominee for president claimed, “I alone can fix it.” Later, chants of “Yes you will, yes you will.” This is not about policies; it’s fear and cult of personality.

Fascism is the following (copied and pasted from here):

  • Glorification of the past (before the debasement of the nation); past seen as glorious, source of inspiration for the present.
  • Exaltation of force, strength, violence: slogans, symbols, costumes, insignias, military. Promotes discipline, sacrifice, blind obedience to the leader.
  • A reaction (defines itself through reaction to something else): against those that have debased the nation, those that disunite it, that cannot defend it against its enemies.
  • In fascism, the enemies of the nation are old corrupt politicians, foreigners, especially Jews, communism (promoted by Jews).

And let’s not forget the calls of “America First”, which is a reference to the America First Committee, whose best-known spokesman was Nazi sympathizer Charles Lindbergh.

By all means, vote your conscience. I just hope that your conscience tells you that, above all, this man and his party must be defeated.

Quantified cantillation III: sequences

First post
Second post

Earlier this year I published a couple of blog posts with some descriptive statistics of trop in the Torah. One of the biggest shortcomings of those posts was that they didn’t deal with the order of trop at all. This is a pretty big shortcoming when you consider that many trop come in pairs/groups, or that certain trop frequently or necessarily follow certain other trop. So, this time around I created an interactive tool I’m calling (for lack of creativity) the Trop Sequence Explorer. If you haven’t checked it out yet, I’d suggest playing around with it a bit; it’ll give you context for the rest of this post.

Basically, it shows each trop listed in order from most to least common. When you click one, it shows you all trop that can follow it and how often each one occurs in that sequence. In other words, it shows transition probabilities to each trop conditional on all trop that come before it in a sequence. There’s also a graph at the bottom that shows how often the selected sequence occurs in each perek of the Torah. Clicking a bar in the graph shows the text of the p’sukim in that perek that contain the current sequence.
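To make "transition probabilities conditional on all trop that come before it" concrete, here is a minimal Python sketch. It is not the app's actual code: the function names, the list-of-names representation of a pasuk, and the toy data are all assumptions for illustration, and it assumes a sequence may start anywhere within a pasuk.

```python
from collections import Counter

def next_trop_counts(psukim, prefix):
    """Count which trop immediately follow `prefix` wherever it occurs."""
    prefix = list(prefix)
    n = len(prefix)
    counts = Counter()
    for trop in psukim:
        # Slide a window over the pasuk; stop early enough that a follower exists.
        for i in range(len(trop) - n):
            if trop[i:i + n] == prefix:
                counts[trop[i + n]] += 1
    return counts

def conditional_probabilities(counts):
    """Turn follower counts into probabilities that sum to 1."""
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()} if total else {}

# Toy data (hypothetical, not real counts from the Torah).
toy_psukim = [
    ["merkha", "tipkha", "etnakhta", "merkha", "tipkha", "sof pasuk"],
    ["munakh", "katan", "merkha", "tipkha", "sof pasuk"],
]
counts = next_trop_counts(toy_psukim, ["merkha", "tipkha"])
probs = conditional_probabilities(counts)
```

Each click in the tree corresponds to extending `prefix` by one trop and recounting.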

Trop Explorer screenshot

What follows is a bit of the thought process that went into its creation, some issues I ran into, and some interesting observations. Feel free to jump to the section that’s most interesting to you.

The Jewish Nerd section

Back in the fall, I was gabbaiing and noticed two tevirs in a row. “How often does that happen?”, I wondered. Seven times, it turns out. It’s pretty well known that a zarka has to be followed by a segol or a munakh segol, but it turns out that the latter is actually more common (by a 13-point margin).

Beyond the factoids, there are other fun things to come across. Parallel sentence structures often have parallel trop, even when the trop itself is not that common. In B’midbar 26, gadol is used at a much higher rate than normal, mostly on names in a genealogy; it really pops out in the bar graph.

Gadol in B'midbar 26

One of the most surprising things for me, though, is how relatively unique each pasuk is. Once you get more than three or four levels deep in the tree, there are surprisingly few p’sukim that match that sequence. This is even true for seemingly common sequences. A pasuk that is merkha tipkha etnakhta merkha tipkha sof pasuk only happens 43 times in the entire Torah.

As I was creating the Sequence Explorer, I encountered some challenges and needed to make some decisions about how it used trop data. One question several people have raised is: Why are there ever trop following a sof pasuk? Shouldn’t a sof pasuk, by definition, be the end of a pasuk? The answer is that there are two sets of trop used for the 10 Commandments: the takhtonim, which are used for private study, and the elyonim, which are used for public readings. I chose to use the elyonim because I wanted to examine how trop are read out loud. The problem is that the two sets of trop also have different pasuk divisions. Even though I used the elyon trop, I had to use the takhton pasuk divisions, because the takhton divisions seem to be more standard, and are the ones returned by the Sefaria API, which is what I used to pull in the actual pasuk text when you click on a perek’s bar in the bar graph. Perhaps at some point I’ll add a setting so people can explore both versions.

Many authorities consider munakh legarmeh a separate trop. I decided not to count it separately for two reasons. The simple technical reason is that there is not a different Unicode character for it (distinct from munakh), so I would have to detect it based on context. The other is that, by definition, the munakh legarmeh is a munakh that precedes another munakh. Since that’s exactly the type of data this app shows, it felt both redundant and somewhat circular to distinguish a trop by what follows it. If you click the munakh, the number of munakhs that follow it should be equal to the number of munakh legarmehs.

Seeing sequences also helped me find issues in the data that I couldn’t see otherwise. For example, I found a couple instances where the data showed four pashtas in a row, but this wasn’t really the case. Trop typically indicate where the stress should fall in a word, but some trop must be placed at either the beginning or the end of a word regardless of stress. To help readers, many sources, including — I found out — the data source I used, put such trop on a word twice: once in the required position, and once where the stress falls. I cleaned out those doublings by searching for any word with two trop on it, and if the two trop were the same, I deleted one of them. Hopefully there was no collateral damage from that.
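The cleanup rule described above (a word with two trop on it, both the same, loses one of them) can be sketched like this. The function name and the one-list-of-trop-names-per-word layout are assumptions for illustration, not the original script.

```python
def drop_doubled_trop(word_trop):
    """Collapse a word carrying the same trop twice down to a single mark."""
    if len(word_trop) == 2 and word_trop[0] == word_trop[1]:
        return word_trop[:1]
    # Anything else (one trop, or two different trop) is left untouched.
    return list(word_trop)

# Hypothetical pasuk: one list of trop names per word.
words = [["pashta", "pashta"], ["munakh"], ["katan"], ["munakh", "zarka"]]
cleaned = [drop_doubled_trop(w) for w in words]
```

A word that legitimately carries the same trop twice would also be collapsed by this rule, which is the kind of collateral damage to watch for.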

Another oddity was that there were ten tsinnorits and one geresh mukdam. This was odd because those trop aren’t used in the Torah, even if their lookalikes, zarka and geresh, are. It seems like they were used for typesetting reasons (their placement on a word is slightly different), so I just lumped them in with their respective lookalikes.

There were also a number of p’sukim with no sof pasuk. I’m not sure exactly why, but I fixed them. Being able to see the bar graph across the bottom was hugely helpful in seeing that this was an issue.

Speaking of the bar graph at the bottom, aggregating by perek is somewhat arbitrary. At some point I would like to try aggregating in other ways, such as by parshah.

The Design Nerd section

I knew pretty early on that I wanted to do some sort of Markov chain–like visualization of transition probabilities, but I set the idea aside to do real work, which, fortunately, happened to involve learning D3. When I turned my attention back to this, I realized two things:

  1. Pairwise transition probabilities aren’t that interesting in isolation; sequences are much more interesting. (In other words, you need memory in your Markov chain.)

  2. As in the previous posts, we have the complete dataset. Descriptively exploring that is very different from wanting to make predictions or generate new sequences, which is a more typical use of Markov chains.

So, I settled on the basics of a design, but without a few key features. The original idea was a tree, where each level would show the conditional probability of going to a particular trop given all those that had come before it. The plan was just to show simple squares with a trop symbol, its name, conditional probability, and conditional count. And, there was no bar graph at the bottom to show where a given sequence occurred.

Original whiteboard sketch (or what's left of it)

It wasn’t until I was sketching out the visual design for the squares — well after I had it actually working — that I came up with the idea of shading them in, making them into a histogram of sorts. Since they seem to follow something not entirely unlike a Poisson distribution, I thought about log-weighting them, but decided it would be more straightforward not to since I’m also showing raw counts.

Once I could play with building sequence trees, I pretty quickly wanted to know where in the Torah those sequences were. And so, the bar graph at the bottom was born. For most of the time I was building it, clicking a bar would just open that perek on Sefaria. Using the Sefaria API to pull in the text of the actual p’sukim was one of the last features to go in.

The Programming Nerd section

When I first started thinking about how to implement this, my intuition was to have the data structure match the tree structure of the interface. It felt elegant, and it seemed like a good idea at the time. I wrote a recursive function (after fighting with mutable container objects in Python) to go through the trop strings and build a giant JSON file shaped like this:

  {
    "name": "munakh",
    "count": 5456,
    "children": [
      {
        "name": "revii",
        "count": 1410,
        "children": […]
      },
      {
        "name": "katan",
        "count": 4350,
        "children": […]
      }
    ]
  }

Well, that turned out to be 8.6 MB — way too big to download as part of a web app. A similar file that listed which prakim had which sequences was over two gigabytes. I wrote most of the UI (locally) with these two files. Thankfully, I finally realized that I could just download a 760 kB list of raw trop strings and search for sequences on demand in the browser. And that, folks, is why I’m in HCI, not real computer science. Derp.
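For a sense of how little code the search-on-demand approach needs, here is a minimal Python sketch that counts where a sequence occurs, perek by perek, straight from the raw data. The real app does this in the browser; the function names, the (perek label, list of trop names) layout, and the toy data here are assumptions.

```python
from collections import Counter

def count_occurrences(trop, seq):
    """Number of positions in `trop` at which `seq` starts."""
    n = len(seq)
    return sum(1 for i in range(len(trop) - n + 1) if trop[i:i + n] == seq)

def occurrences_by_perek(psukim, seq):
    """Map each perek label to how often `seq` occurs in its p'sukim."""
    seq = list(seq)
    totals = Counter()
    for perek, trop in psukim:
        hits = count_occurrences(trop, seq)
        if hits:
            totals[perek] += hits
    return totals

# Toy data (hypothetical): one (perek, trop list) pair per pasuk.
raw = [
    ("Bereshit 1", ["darga", "tevir", "tevir", "tipkha", "sof pasuk"]),
    ("Bereshit 1", ["merkha", "tipkha", "sof pasuk"]),
    ("Bereshit 2", ["tevir", "tevir", "merkha", "tipkha", "sof pasuk"]),
]
by_perek = occurrences_by_perek(raw, ["tevir", "tevir"])
```

Nothing is precomputed, so the only thing to download is the raw list itself.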

Finally, D3 was great to work with. Being able to define a simple linear scale like this

var x = d3.scale.linear()
    .domain([0, width])
    .range([width, 0]);

even made it easy to work right-to-left when SVG objects have their origins in the upper left-hand corner.
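What that scale does is plain linear interpolation from the domain onto a reversed range, so a position measured from the left edge becomes one measured from the right. A rough Python equivalent, for illustration only (`linear_scale` is a hypothetical helper rather than part of D3, and the width is made up):

```python
def linear_scale(domain, range_):
    """Return a function mapping `domain` linearly onto `range_`."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        t = (value - d0) / (d1 - d0)   # fraction of the way through the domain
        return r0 + t * (r1 - r0)      # same fraction of the way through the range

    return scale

width = 600  # hypothetical drawing width in pixels
x = linear_scale((0, width), (width, 0))
```

With the range reversed, `x(0)` lands on the right-hand edge and `x(width)` on the left, so everything else can be laid out as if it ran left-to-right.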

Future work

I’m a grad student, so how can I resist a Future Work section? There are a number of features I’d like to add at some point. As I hinted at earlier in this post, it would be nice to be able to aggregate the bar graph by parshah instead of just perek. Combining other aggregations, like sefer, with the ability to limit sequence queries to certain parts of the text would open the door to adding the rest of the Tanakh. (The Emet books would be outta control!) And color coding disjunctive and conjunctive trop would be a nice way to see more structure in sequences. If you want to take a stab at any of these things, have a look at the issues list for this project on GitHub.

And if you’ve made it this far without actually using the app, go play with it now!

Unlocking an iPhone: Do You Have to Restore?

When you unlock an iPhone, the official instructions say something a little odd:

If you have a SIM card from a carrier other than your current carrier, follow these steps:

  1. Remove your SIM card and insert the new SIM card.
  2. Complete the setup process.

If you don’t have another SIM card you can use, follow these steps to complete the process:

  1. Back up your iPhone.
  2. When you have a backup, erase your iPhone.
  3. Restore your iPhone from the backup you just made.

Wait, what? Why would I want to unlock the phone if I didn’t have a SIM from another carrier? And isn’t doing a full restore kind of a lot to ask?

As far as I can tell, here’s what’s going on: when you request an unlock from your original carrier, they don’t unlock your phone, they tell Apple’s activation server that your phone is now unlocked. In order to finish the unlock process, your phone has to check in with Apple’s activation server.

There are apparently only two ways to force an iPhone to re-activate with the server: put in a new SIM, or restore the phone. But why would you go the restore route? Because if you’re traveling abroad, when you arrive at your destination and install your newly acquired SIM, it’ll try to contact the activation server. But it can’t reach the activation server because you don’t have data service on your new carrier yet. At this point you’ll be dropped into Activate mode and won’t be able to do anything with your phone until you activate it. If you happen to be somewhere with wifi that doesn’t require any sort of web-based authentication (so, not most airports or hotels) you may be able to activate that way. Otherwise, you’ll have to use iTunes on your computer — assuming your computer can get wifi.

If you won’t be traveling with your computer, or may not have access to a non-cellular internet connection, you’ll want to do that restore at home before your trip. Otherwise, skip the restore and activate through iTunes or wifi.

Quantified cantillation II: word counts

First post
Third post: Trop Sequence Explorer

A lot of discussion around my last post was about the role of sentence structure. For example, there’s a heuristic that psukim with fewer than five words don’t have an etnakhta, while those with more than five words do. This visualization lets you explore these types of relationships.

By word count visualization static

We see that, indeed, etnakhtas do approximately follow this pattern, while other trops’ counts naturally vary more linearly with word count (e.g., mapakh and pashta). Other trop, though, like tipkha, quickly hit a ceiling regardless of how long a pasuk gets.

Note that I’ve cut off the x axis at 33 words. While there are much longer psukim, there aren’t enough of them to get meaningful averages.
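The averages behind a chart like this can be sketched in a few lines of Python: group p’sukim by word count, skip the ones past the cutoff, and take the mean number of a given trop in each group. The function name and the (word count, trop counts) layout are assumptions for illustration, and the toy numbers are made up.

```python
from collections import defaultdict

def mean_count_by_word_count(psukim, trop_name, max_words=33):
    """Average number of `trop_name` per pasuk, grouped by pasuk length."""
    sums = defaultdict(int)
    sizes = defaultdict(int)
    for word_count, trop_counts in psukim:
        if word_count > max_words:
            continue  # too few long p'sukim for a meaningful average
        sums[word_count] += trop_counts.get(trop_name, 0)
        sizes[word_count] += 1
    return {wc: sums[wc] / sizes[wc] for wc in sorted(sizes)}

# Toy data (hypothetical): (words in pasuk, {trop name: count}).
toy = [
    (3, {"tipkha": 1}),
    (3, {"tipkha": 1, "etnakhta": 1}),
    (8, {"tipkha": 2, "etnakhta": 1}),
    (40, {"tipkha": 2, "etnakhta": 1}),
]
means = mean_count_by_word_count(toy, "etnakhta")
```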

Wordcount distribution

Click in the legend to turn a trop on and off; double-click to solo it. As with the first post, there’s nothing revolutionary here, but I think it’s still interesting to see and explore. (Also, I'm no expert on D3/NVD3, so don’t judge me too harshly. And if you’re on IE and it doesn’t work, tough luck.)

Quantified cantillation

Second post: Word counts
Third post: Sequence Explorer

When read publicly, the Torah is often sung using a system of cantillation marks, or trop in Yiddish. There are many different cantillation marks, each of which has a name and a unique sound (or sounds), and comes in combination with other trop.

When the cycle of readings started over this year after Simchas Torah1, it seemed like there were more telisha gedolahs in Bereshit (Genesis), whereas there were more telisha ketanas in D’varim (Deuteronomy). I decided to find out whether or not this was really the case.

First, I needed a dataset. I found one that offers the entire Tanakh in XML form, including trop and vowels. I was only interested in the Torah, so I downloaded XML files for each of the five sfarim (books). I went through the XML and tabulated how many of each trop were present in each pasuk (sentence).
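The tabulation step can be sketched in Python without knowing the XML layout, since each trop is a Unicode combining character whose official name starts with "HEBREW ACCENT". This is a minimal sketch, not the original script: `count_trop` is a hypothetical name, the sample string is three made-up letter-plus-accent pairs, and the XML parsing is left out because the schema isn't shown here.

```python
import unicodedata
from collections import Counter

ACCENT_PREFIX = "HEBREW ACCENT "

def count_trop(pasuk_text):
    """Count cantillation marks in a pointed Hebrew string, keyed by Unicode name."""
    counts = Counter()
    for ch in pasuk_text:
        name = unicodedata.name(ch, "")
        if name.startswith(ACCENT_PREFIX):
            counts[name[len(ACCENT_PREFIX):]] += 1
    return counts

# Bet + merkha, gimel + tipkha, dalet + etnakhta (not a real pasuk).
sample = "\u05d1\u05a5 \u05d2\u0596 \u05d3\u0591"
tally = count_trop(sample)
```

Unicode's spellings (TIPEHA, ETNAHTA) differ from the transliterations used in this post, and sof pasuk is named as punctuation rather than as an accent, so this sketch doesn't count it.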

Aggregating by sefer to consider my original question about the relative frequencies of telisha gedolahs and telisha ketanas, we see that my intuition was somewhat correct: while there are more ketanas than gedolahs throughout, there are more ketanas overall in D’varim.

Telisha Ketana and Telisha Gedola by Sefer

However, the ratio of telisha gedola to telisha ketana is actually not substantially different in D’varim and Bereshit. So while overall counts are higher, the relative frequencies are not so different.

Ratio of Telisha Ketana to Telisha Gedola by sefer

Aggregating by sefer is interesting, but I wanted to see more continuous variations. Looking at a series of what for most trop would be zeros and ones, with an occasional two or three, isn’t that useful, but Zach (a Ph.D. student in Statistics) suggested a moving average, and that worked quite nicely. We used a 500-pasuk-wide window, which struck a balance between detail and low-pass filtering. (I come from a signal processing background, not time-series analysis.)
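For anyone who hasn't met one, a moving average just replaces each point with the mean of a window of neighboring points. A minimal Python sketch, using a window of 4 on made-up counts instead of 500 p’sukim (whether the original window was centered or trailing isn't stated here, so this one simply returns one value per full window):

```python
def moving_average(values, window):
    """Mean of each full run of `window` consecutive values."""
    if window <= 0 or window > len(values):
        return []
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages

# Hypothetical per-pasuk counts of one trop.
counts = [0, 1, 0, 0, 2, 1, 0, 0]
smoothed = moving_average(counts, window=4)
```

A wider window smooths more and shows less detail, which is the trade-off behind the choice of 500.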

As with the initial bar graph, you can really see the number of telisha ketanas explode in D’varim. But more interestingly, we can get a sense of how they track each other through the Torah.

Telisha Ketana and Telisha Gedola through the Torah

Seeing how different trop track each other is fun. There are some things that you’d expect. For example, munakh is often associated with katan, revi’i, and mapakh pashta, and we see that clearly here.

Common associated trop through the Torah

Particularly striking is the tight correlation between zarka and segol.

Zarka and segol through the Torah

Other combinations, though, like darga tevir, are more loosely correlated.

Darga and tevir through the Torah

(For more correlations, here are the pasuk by pasuk and moving window correlation tables.)
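A correlation of this kind can be computed as a plain Pearson coefficient over two equally long series of counts. Here is a minimal Python sketch with made-up numbers; it is an assumption that Pearson is the measure behind those tables, and `pearson` is a hypothetical helper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-pasuk counts, not real data.
zarka = [0, 1, 0, 2, 0, 1]
segol = [0, 1, 0, 2, 0, 1]
darga = [1, 0, 0, 1, 1, 0]
tight = pearson(zarka, segol)
loose = pearson(zarka, darga)
```

Two series that always move together score close to 1, while series that move independently score close to 0.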

While these patterns are intuitive, the fact that trop — especially common ones like merkha and tipkha — aren’t uniformly distributed across the Torah was, to me, somewhat less expected. A big reason for this is changes in sentence structure. This becomes extremely obvious when looking at etnakhta, which essentially functions as a comma.

Etnakhta through the Torah

The reason for the rather dramatic plunge toward the beginning of B’midbar seems to be a shift in sentence structure. Checking the text, I found that this part of the Torah contains quite a bit of genealogy, with many single-phrase sentences (“So-and-so begat so-and-so”2) and many occurrences of the common pasuk “וידבר ה` אל־משה לאמר”.

I did a bit of digging into this and, oddly, it looks like the drop in words per pasuk actually lags the drop in etnakhtas. I’m not sure why.

Etnakhta and wordcount through the Torah

I could imagine running a logistic regression to see whether words per pasuk predicts the presence of an etnakhta, but I’m going to cut myself off now.

If you’re interested in playing around with this yourself, everything is on GitHub. If you just want to cut to the chase, here’s a CSV file of the raw data. And here’s an IPython Notebook.

  1. Benjamin, please forgive my transliterations.

  2. No wife required, incredibly.

Web apps and needless intermediation

Histories of computing, computing culture, and the politics of computers often make this basic claim: the very purpose of the Personal Computer was so individuals could benefit from computation without having to rely on corporate- or government-controlled mainframes. But it’s pretty clear to me that we’ve come full circle and are over-reliant on remote servers. The founders of the PC movement would probably say that we’ve lost our way. (Lazyweb: can someone dig up a quote of Woz saying so?)

Of course it’s not black and white, but we’ve gone overboard. I see this from two directions:

  1. Web apps. I like to give friends on Twitter a hard time because I strongly believe in native apps over web apps. This is partially because of the superior performance and user experience, but in large part it’s because there is simply no reason for most apps to not run on my computer. A word processor or text editor does not need to run on someone else’s computers (which makes me both reliant and vulnerable). Nor does it need to run in a web browser, which, twenty-some years on, is still not particularly well suited to applications. Google’s vision of dumb terminal Chromebooks takes this needless remote execution to its (il)logical extreme.

  2. (Needless?) intermediation. In addition to needless remote code execution in an inferior UI framework and runtime environment, why should my IMs or video calls go through Google or Skype servers between me and their destination? I’m reading The Master Switch by Tim Wu, and one of his points is that the underlying architecture of the internet is helping it resist the corporate forces that have brought about consolidation in other information industries.

    That architecture, it seems to me, is one in which every machine on the network can access any other machine on the network. But that no longer seems to be the case. AIM used to have a feature called Direct Connect; years before Google Docs, SubEthaEdit allowed for collaborative editing directly between computers. So why is that not how we communicate now? It was finicky, and we (rightfully) like things that Just Work™. It was finicky because most of us don’t have public/external IP addresses. That meant having to route network traffic through IP masquerading NATs, and getting a two-way route is hard. (Even Skype’s so-called P2P protocol uses “supernodes”, which are located in Skype/Microsoft datacenters, as intermediaries between NATed/firewalled clients.)

    So because most end users do not have public IP addresses, it’s more reliable to have most traffic routed through a server. This breaks the very property of the internet that makes it so unique. Perhaps this is a natural stage of development in the network, because we’re running out of IPv4 addresses. Maybe with IPv6, everyone can have a public IP, and we’ll be able to collaboratively edit documents peer-to-peer, from my text editor to yours.

To conclude, the words of the inimitable @SwiftOnSecurity:

Let’s make XMPP fun again!

I recently had a fun conversation by Facebook Messenger, and I’m disappointed to report that it was actually a really great experience. The same goes for GroupMe.

I can’t exactly put my finger on what makes the experience so good, but that’s not the point of this post. The point of this post is to complain that not only are most of these chat services proprietary and either aren’t sustainable businesses or make their money by selling your personal data in some form or another, but they also aren’t interoperable. You like Kik but your friend is on Whatsapp? Too bad.

The thing is, it doesn’t have to be like this. There is a standard for synchronous chat with a healthy software ecosystem around it: XMPP. Google Talk is based on it (sort of), and it can even support things like geolocation and read receipts! The problem with XMPP is that users — most of whom don’t care about technology and just want to talk to their friends — are left to find their own hosts and clients.

So here’s my modest proposal: can someone make something like Facebook Messenger, Whatsapp, Kik, GroupMe, Viber, etc., (fun and social, easy to add/remove people from groups, optional read receipts, optional geotagging, easy to attach images, even cute emoticons, usable from web or mobile, with indicators of keyboard size to help others manage expectations), but make it a freemium (and maybe, but not necessarily, nonprofit) organization: host accounts for the free customers, but let people bring their own XMPP account if they pay.

It’s counterintuitive to let people pay for the privilege of having a company or organization do less for them (not hosting the account), but I think it makes sense. Those who pay are the ones who care about things like interoperability and not having their data aggregated and sold. And ideally there will be enough of them to support the free users and keep the whole thing sustainable.

With a system like this, I could use Adium on my Mac, a client of my choosing or their own killer mobile client, and maybe even be able to log into my own XMPP account via their web client.

Looking around, I think Whatsapp seems like the right company to take this on. Aside from already being extremely popular, they get that asking people to pay for a service they find valuable is a Good Thing. (I know there were rumors of a Google acquisition, but those seem to have died down and I hope it stays that way.) If they added a web client and charged, say, $10 a year for BYO XMPP, that’d be it. Maybe they could even make the $1/year tier free if they did that, although I don’t think they should.


Collaborative design

After I had talked about it a bit with a few people, my friend Cory recently1 sent me an essay from way back in 2002 by Havoc Pennington on why free software often has poorly designed UI.

One of his points is that “designers can’t submit code patches.” I have long felt this was a major problem facing not just UI designers in OSS, but any designers working collaboratively.

Basically, the reason a relatively hierarchically flat collaborative style works so well for software is that it’s atomic. Code is made up of functions and objects, which in turn are made up of lines, which in turn are made up of characters. The way most languages are designed, lines make the perfect atomic unit for versioning, making version management tools like Git work so well for large teams. When someone makes a change to someone else’s code, it’s really easy to see exactly what the change was.
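To see what line-level atomicity buys you, here is a small Python sketch using the standard library's `difflib` as a stand-in for what a tool like Git shows. The two versions of the file are made up for illustration.

```python
import difflib

# Two hypothetical versions of the same tiny file, one string per line.
before = ["def greet(name):", "    print('Hello ' + name)", ""]
after = ["def greet(name):", "    print('Hello, ' + name + '!')", ""]

diff = list(difflib.unified_diff(
    before, after, fromfile="before.py", tofile="after.py", lineterm=""))

# Keep only the added/removed lines, skipping the "---"/"+++" file headers.
changed = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

Only the one edited line appears, once as removed and once as added, so a reviewer sees exactly what changed and nothing else.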

That makes it really easy for someone who’s never contributed to a project before to fix bugs and make small changes. UI design2 is different. Like any kind of design, it requires big-picture vision. And that is exactly what top-down organizational structures or solo designers are good at.

Is this just a matter of tools that are inadequate for the task of distributed design, or is this really a fundamental aspect of design that makes it poorly suited to large, distributed teams?

  1. Apparently I started this post on April 27, 2011 despite not getting around to finishing it till now.

  2. UI design is distinct from UI implementation. Nudging something a few pixels, slightly modifying the behavior of UI elements, etc., is UI implementation, and is something that is at least somewhat atomic.

Adium Weather Sparklines

Today’s programming diversion: weather sparklines in Adium!

Weather sparklines screenshot

Get it and make it better.

In defense of Google

Social media has exploded this afternoon with people upset about Google shutting down Google Reader. Well, I’m about to do something I very rarely do: defend Google.

As you might assume by my support of, I don’t object to proprietary services; I object to proprietary data and lock-in. Even services you pay for can be shut down, though it’s more likely when providing said service isn’t aligned with a company’s business model. By letting people export their feed lists, Google is doing this responsibly. (RSS itself is, obviously, an open format.)

Even if you run something like TT-RSS, the hosting provider (which you pay) could stop operating. Host from a box in your living room? Great, until your ISP caps your upload bandwidth. Autonomy is a lovely idea, but unless you conduct all of your communication via ham radio1, you can pretty much forget about it.

Which reminds me: why are web apps such a good idea in the first place? Just use a native feed reader. (You know, local binary, the whole nine yards.)

  1. For the record, my call sign is KC8TKP. ;)