CLJS - Read files line by line on NodeJS Part 2
Note: There are a number of things that may be unfamiliar to new cljs devs. Rather than making this a huge post, I’ll write a separate post for each concept. For example, if the lazy-cat used in this post is unclear, check out my explanation of lazy-seq and recursion.
In my last post, we had some code that could read files line by line. We then used each line, either by a callback or a channel. While the code worked, I didn’t want the asynchronous reads. Instead, I wanted to get as close to this as possible:
(with-open [rdr (open-file path)]
  (doseq [line (line-seq rdr)]
    (js/console.log line))) ;;do something with line
This has the following benefits:
- It’s exactly the same as clj
- I don’t have to think about async code that provides no benefits here
- It lazily reads in the file and handles cleanup
- It’s a sequence and I can program to that interface
So the first thing was to switch from createReadStream to readSync. readSync synchronously reads in a user-specified number of bytes.
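(def fs (js/require "fs")) ;;require nodejs lib
(defn- read-chunk [fd]
  (let [length 128
        ;;Buffer is a global nodejs lib; Buffer.alloc replaces the deprecated constructor
        b (.alloc js/Buffer length)
        ;;a nil position reads from the current file position and advances it
        bytes-read (.readSync fs fd b 0 length nil)]
    ;;returns nil at end of file
    ;;note: a multi-byte UTF-8 character split across two chunks will not decode correctly
    (when (pos? bytes-read)
      (.toString b "utf8" 0 bytes-read))))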
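To see how read-chunk behaves at the end of a file, here is a minimal sketch that keeps calling it until it returns nil. The read-all-chunks helper and the scratch file name are my own assumptions for illustration, and read-chunk is repeated (using Buffer.alloc) only so the snippet stands alone.

```clojure
(def fs (js/require "fs"))
(def os (js/require "os"))
(def path (js/require "path"))

;; Same shape as the post's read-chunk, repeated so the sketch stands alone.
(defn read-chunk [fd]
  (let [length 128
        b (.alloc js/Buffer length)
        bytes-read (.readSync fs fd b 0 length nil)]
    (when (pos? bytes-read)
      (.toString b "utf8" 0 bytes-read))))

;; Hypothetical helper: collect every chunk of a file into a vector.
(defn read-all-chunks [file]
  (let [fd (.openSync fs file "r")]
    (try
      (loop [chunks []]
        (if-let [chunk (read-chunk fd)]
          (recur (conj chunks chunk))
          chunks))
      (finally
        (.closeSync fs fd)))))

;; Write a 300-byte scratch file so reading it takes three calls (128 + 128 + 44).
(def demo-file (.join path (.tmpdir os) "read-chunk-demo.txt"))
(.writeFileSync fs demo-file (apply str (repeat 300 "x")))

(map count (read-all-chunks demo-file)) ;; => (128 128 44)
```

Each call picks up where the previous one stopped because the nil position argument tells readSync to use and advance the descriptor's current position.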
We can now read an arbitrary amount of data from the file, but we have to manage finding actual lines of text. Here’s the logic we need to implement:
- Get the next chunk of data
- If it contains a line of text, return that line of text and save the remaining text for the next call
- If it does not contain a line of text, get more data and try our line check again
- If we reach the end of the file (EOF), just return whatever data we have
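The second step, splitting off complete lines and saving the remainder, can be sketched as a small pure function. split-complete-lines is a hypothetical helper name, not something from the original code, and passing -1 as the split limit is my choice so that a trailing newline leaves an empty remainder instead of being dropped.

```clojure
(ns example.split-lines
  (:require [clojure.string :as string]))

;; Hypothetical helper: returns [complete-lines remainder].
;; The limit of -1 keeps trailing empty strings, so the last element of the
;; split is always the text after the final newline (possibly "").
(defn split-complete-lines [text]
  (let [parts (string/split text #"\n" -1)]
    [(vec (butlast parts)) (last parts)]))

(split-complete-lines "one\ntwo\nthr")   ;; => [["one" "two"] "thr"]
(split-complete-lines "one\ntwo\n")      ;; => [["one" "two"] ""]
(split-complete-lines "no newline yet")  ;; => [[] "no newline yet"]
```

The remainder is what gets prepended to the next chunk; an empty vector of lines means we need to read more data before we can return anything.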
Here’s what I ended up with:
(defn line-seq
  ([fd]
   (line-seq fd nil))
  ([fd line]
   (if-let [chunk (read-chunk fd)]
     (let [text (str line chunk)]
       (if (re-find #"\n" text)
         ;;a limit of -1 keeps the trailing "" when text ends in a newline,
         ;;so the last element is always the unfinished remainder
         (let [lines (clojure.string/split text #"\n" -1)]
           (lazy-cat (butlast lines) (line-seq fd (last lines))))
         ;;no complete line yet, so read more data
         (recur fd text)))
     ;;EOF: return whatever is left over, if anything
     (if (seq line)
       (list line)
       ()))))
line-seq takes a file descriptor and lazily reads from the file, returning a lazy sequence of lines.
The last bit is our with-open macro. Note: cljs macros have to be written in clojure.
(defmacro with-open [bindings & body]
  (assert (= 2 (count bindings)) "Incorrect with-open bindings")
  `(let ~bindings
     (try
       (do ~@body)
       (finally
         (.closeSync cljs-made-easy.line-seq/fs ~(bindings 0))))))
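To make the macro's behavior concrete, here is roughly what a call like (with-open [fd (.openSync fs "cljs-love.txt")] (js/console.log fd)) turns into. I wrote this out by hand rather than copying macroexpand output, and it assumes fs is defined in the cljs-made-easy.line-seq namespace, as the macro's fully qualified reference implies.

```clojure
;; Hand-written approximation of the expansion, not actual macroexpand output.
(let [fd (.openSync fs "cljs-love.txt")]
  (try
    (do (js/console.log fd))
    (finally
      ;; runs whether the body returns normally or throws
      (.closeSync cljs-made-easy.line-seq/fs fd))))
```

Because the binding is evaluated before the try, a failure in openSync propagates without any attempt to close a descriptor that was never opened.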
Now, we’re where we want to be:
(with-open [fd (.openSync fs "cljs-love.txt")]
  (doseq [line (line-seq fd)]
    (js/console.log line))) ;;do something with line
And since we have a seq, we can use all of our normal tools:
(with-open [fd (.openSync fs "cljs-love.txt")]
  ;;only return lines that contain "awesome"
  (doseq [line (filter #(re-find #"awesome" %) (line-seq fd))]
    (js/console.log line))) ;;nothing but awesome
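One thing to watch for: doseq consumes the lines inside with-open, so the file is still open while they are read. If you return the sequence instead, a lazy seq can end up reading from a descriptor that has already been closed. A minimal sketch of one way around that, assuming fs, with-open and line-seq are the definitions from this post (awesome-lines is just a name I picked):

```clojure
;; Assumes fs, with-open and line-seq are the definitions from this post.
(def awesome-lines
  (with-open [fd (.openSync fs "cljs-love.txt")]
    ;; doall forces every line to be read before the finally clause closes fd
    (doall (filter #(re-find #"awesome" %) (line-seq fd)))))

(count awesome-lines) ;; safe: the sequence is already fully realized
```

This gives up laziness for that one result, so it only makes sense when the filtered lines comfortably fit in memory.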
Boom. Pretty sweet.
Here’s a gist of the full code.