Data science from the command line: a look back at 2 years of using xan
AW1.120 | Day 2 | 13:15 - 13:45 | Speakers: Béatrice Mazoyer, Benjamin Ooghe-Tabanou, Guillaume Plique
Abstract
Xan is a command-line tool designed to manipulate CSV files directly from the comfort of the terminal.
Originally developed within a sociology research lab to perform common operations on very large datasets collected from the web (exploration, sorting, computing frequency tables, joins, aggregations, etc.), it has since become its users' go-to solution for many other use cases, including lexicometric analysis, plotting histograms, time series and heatmaps, and even generating network graphs. And although the tool was initially created to handle very large CSV files, people now also use it to process small files and other file formats. Xan has thus become part of its users' daily data manipulation practices: they saw in it an opportunity to never leave their shell, without having to rely on GUIs or notebooks.
This presentation, given by a research engineer after two years of regular use, examines the reasons for this appropriation, which relates both to the constraints of research in the Humanities and Social Sciences and to the interface design choices that make xan effective.
Speakers
Béatrice Mazoyer is a research engineer in digital methods at the Sciences Po médialab. Her research focuses on how information circulates between social and traditional media, using natural language processing, automatic image analysis and data mining. She has contributed to several open-source software programs for use in humanities and social science research.
Trained as a multidisciplinary engineer, Benjamin Ooghe-Tabanou specialises in applying computer science to scientific research. After several positions in astrophysics at Johns Hopkins University in the USA and the École Normale Supérieure in France, Benjamin entered the social sciences field, first as an open data and parliamentary transparency activist. He joined Sciences Po's médialab as a research engineer in 2012, focusing on web mining and developing open-source tools for the social sciences, and has led the médialab's team of research engineers since 2020.
Guillaume Plique is a research engineer at Sciences Po's médialab. He assists social science researchers daily with their methods and maintains a variety of FOSS tools geared toward both the social sciences community and developers.
