
David R. MacIver: Living on the edge of academia

  • BCox · 1 year ago
    I see that your company, Trampoline, is located in London. I'm not familiar with the policies of UK university libraries, but I know that most public and many private university libraries here in the States give visitors from the public access. If it's the same there, you should ask your boss for a day out of the office to photocopy as many papers as you can. Many of my friends have spent a few days out of the office reading journals at a university 20 miles away.

    As for the implementation issues:
    I work in a different field, but I've found overstating an algorithm's effectiveness to be a near-universal attribute of scholarly work. It's incredibly disappointing to spend a few days implementing a silver bullet only to find that it's a lump of coal when applied to real-world data. This is just the way it is when the work you're doing is pushing the boundaries of common knowledge.
  • david · 1 year ago
    Yeah, I overstated the point on access slightly: if it ever comes down to a point where I simply *have* to have one particular paper, I can just go to the British Library or somewhere similar and photocopy it instead of paying the $100. But a physical copy that isn't easy to get hold of is still massively inconvenient compared to an electronic one. So mostly I just take the path of least resistance and don't bother with the papers I don't have immediate electronic access to.

    On implementation: I understand that overstatement of effectiveness is a general problem. It would just be nice to find that out up front rather than after I've put in all the work. :-)

    I try to be charitable about it: in some cases it's probably just that the algorithm only works on a certain subset of problems. I'm sure many of the things we're doing in SONAR are similarly ineffective when applied outside our domain (e.g. our techniques are optimised for lots of short documents and probably don't do nearly as well on fewer long ones). And sometimes I just can't be charitable. A lot of published algorithms are actually complete nonsense and don't do what they claim to do, but they get away with it through data massaging and other pre/post-processing.
  • Jason Adams · 1 year ago
    This is something I have encountered and been bothered by quite a bit. It's one of the reasons I advocate using git for research. Ted Pedersen also wrote about this very problem in a recent issue of Computational Linguistics, so it is encouraging to know that it is at least being acknowledged more widely. One of his points is that even though creating distributable software takes more time and effort, it pays off: people use your software, which increases your reputation.

    It's not always enough to release the code, though. You can find plenty of bits of software out there, and sometimes they're in such a bad state that having the code is almost worse than just reimplementing it yourself (at least, that's what you tell yourself).
  • Jason Adams · 1 year ago
    Oops, meant to link to the free version of Ted's paper: Empiricism is not a matter of faith (pdf).
  • diN0bot · 1 year ago
    Spot on! Most (computer science) research is meant to be open content and open source, but in practice it isn't.

    The culture is changing, so I'm not too worried. Keep spreading the word and encouraging academics to use GitHub or whatever makes sharing knowledge most convenient and effective.
  • suman · 1 year ago
    You can become an ACM or IEEE member to access the papers. The membership fee is cheap compared to the price of a single paper. Also, if you can get into a university library, you can download the papers for free there, since the university will usually already have a subscription.
  • Xianhang Zhang · 1 year ago
    Hi David,

    I've generally had very good luck just emailing the authors. I don't think a polite request for a paper has ever been turned down, and I've certainly always been willing to send my papers to anyone who asks.

    As for not publishing a reference implementation, I'd bet a big part of that is that most academics are somewhat embarrassed about the quality of their code. It isn't always the most elegant software engineering, and with so much else to do, the effort of making it publication-worthy often gets neglected. Again, I've had good luck emailing authors and asking if they'd be willing to send the source code. I don't think I've ever had a reply that didn't include some sort of apology for the quality of the code.
  • david · 1 year ago
    Thanks all. Glad to know it's not just me. :-)

    Jason: Thanks for the link. It was an interesting read.

    On the subject of bad code: I genuinely would rather have bad code than no code. Even if I can't actually get the damn thing to work, it still gives me a source for figuring out the hidden details. To take the Punkt example: as far as I can tell, the paper never specifies its exact tokenizing scheme. It's easy to guess an appropriate one, and the one I've guessed seems to work, but if it didn't, it would be really nice to be able to go back to the source and figure out exactly what it considers a token.

    Good point about asking people for the papers and code. For some reason I never think to do that. I shall try to be better about it in future.
  • Hadley Wickham · 1 year ago
    Have you thought about joining a university library? It's usually possible to get access to a university library's electronic resources just by paying a fairly reasonable fee (I just looked at City University London, and it's only £100 a year).
  • Eugene · 1 year ago
    I've returned to academia after 30 years of professional experience. Having spent time on both sides of the great divide, my suggestion is that you identify the academics who are active in your area of interest and invite them to consult for you and/or carry out sponsored research. That way you get the advances you're looking for, and the academic community gets practical applications and case studies. Such a win-win may even qualify for R&D tax concessions and other government support. (In Australia, your company could access a 125% tax concession plus further assistance through additional grants.) Every university I'm acquainted with has a commercial research/consulting arm.
  • Adrian Kuhn · 1 year ago
    David, I can only agree with you. Even though I am in academia, I suffer from the same problems. In particular, the lack of reference implementations, and therefore of reproducible results(!!!), is very annoying. This is one of the reasons we required every submission to the WASDeTT journal issue to include both a paper and the tool with its sources! (The issue is still under review; for now, refer to http://doi.ieeecomputersociety.org/10.1109/MS.2...)

    My personal experience with asking for papers and code is as follows: papers good, code bad. I don't have free access to papers outside my research area either (e.g. Springer's LNCS). My solution: ask other researchers for an SSH account on their LAN and download the papers via a VPN tunnel. As for reference implementations, I've had very bad experiences asking for code that isn't open source. Researchers who don't open-source their code will still give it to you, but they become very possessive about everything you do with it afterwards, however unrelated it might be.
  • Robert Daland · 11 months ago
    I second the comment about an electronic subscription to a university library.
    Through Northwestern University's library I can get electronic access to almost everything worth reading that has been published since 1996. A subscription costs something on the order of 120 USD/year.

    Also, about publishing code, consider the incentive structure.
    The citation distribution follows a power law, meaning that a few papers get many citations but most papers get few or none at all. In other words, the probability is high that no one will care about your code. Since academics are often ashamed to release an inferior product, they feel they must make a cleaned-up version for public release, and the expected return on that effort is negative because it is so unlikely that anyone will actually care.
    Academics would publish their code much more freely if they were rewarded for it and those rewards were clear and tangible. For example, if released code factored into tenure decisions, you would see a step change in how much open-sourcing goes on.