Scala – oletraveler.com

Favor Disjunction over Validation

OleTraveler — Sun, 20 Jul 2014 23:26:14 +0000

What is Wrong

For validating data in Scala, Scalaz is the de facto tool for the job. The applicative behavior allows the programmer to accumulate errors from various data flows. However, I believe all the video and blog learning resources I have come across are fundamentally wrong in how they explain using Validation classes. Because of these resources, I have been using Validation incorrectly for many years. Recently, I have come to the conclusion that the Validation class should only be used during the act of accumulating, that is, for the |@| operator, and that Disjunction ( \/ ) should be the main class used for validation and error checking.

An Example

Take for instance validating user input for a CreditCard. In this example, we expect the string inputs cardholder, number, expiration month and expiration year. The process should validate that the cardholder input contains only ASCII characters and has no more than 30 characters, and it should return both errors if both are found. For the number input, the process should first validate that it contains only digits and then validate that it passes the Luhn algorithm; if the number is not all digits, we short-circuit out of the Luhn check. The process should validate individually that the expiration month and expiration year are integers, converting them if they are, and that each is in a valid range. It should then validate the combined month and year to ensure the expiration date is the current month or later. Finally, if no validation errors are found, the process creates and returns a new instance of CreditCard; otherwise it returns a list of all the errors found.

Functions Return Validation, an Incorrect Approach

An incorrect approach, one I have been using for years, is to code all validation functions to return a Validation object where the failure type is a NonEmptyList[ErrorMessage]. Having NonEmptyList[ErrorMessage] as the failure type allows the calling code to easily accumulate errors. In the example below, the validate method takes four input strings and returns a CreditCard object if all the validation checks pass; otherwise it returns one or more ErrorMessages describing the failures.

  import scalaz._
  import Scalaz._

  type ErrorMessage = String
 
  def asciiOnly(str: String) : Validation[NonEmptyList[ErrorMessage],String] =
    if ("[\\x00-\\x7F]+".r.pattern.matcher(str).matches) str.success
    else "only ascii characters are allowed".failNel
 
  def maxStrLength(max: Int): (String) => Validation[NonEmptyList[ErrorMessage],String] = str =>
    if (str.length <= max) str.success
    else s"can not exceed ${max} characters".failNel
 
  def digitsOnly(str: String) : Validation[NonEmptyList[ErrorMessage], String] =
    if ("""^\d*$""".r.pattern.matcher(str).matches) str.success
    else s"only numbers are allowed".failNel
 
  def toInt(str: String) : Validation[NonEmptyList[ErrorMessage], Int] = try {
    str.toInt.success
  } catch {
    case e: NumberFormatException => "must be a number".failNel
  }
 
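  def modTen(str: String) : Validation[NonEmptyList[ErrorMessage], String] = {
    // Luhn (mod 10) check: double every second digit from the right,
    // subtract 9 from any doubled digit over 9, and require the sum to be divisible by 10.
    def passesModTen(str: String): Boolean = str.nonEmpty && {
      val sum = str.reverse.map(_.asDigit).zipWithIndex.map {
        case (d, i) if i % 2 == 1 => if (d * 2 > 9) d * 2 - 9 else d * 2
        case (d, _)               => d
      }.sum
      sum % 10 == 0
    }

    if (passesModTen(str)) str.success
    else "invalid number".failNel
  }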
 
  def validMonth(m: Int) : Validation[NonEmptyList[ErrorMessage], Int] =
    if (m >= 1 && m <= 12) m.success
    else "invalid month".failNel
 
  def positiveNumber(m: Int) : Validation[NonEmptyList[ErrorMessage], Int] =
    if (m > 0) m.success
    else "must be positive".failNel

  def validExpiration(currentMonth: Int, currentYear:Int) : (Int,Int) => Validation[NonEmptyList[ErrorMessage], (Int,Int)] = (month,year) =>
    if (year > currentYear || (year === currentYear && month >= currentMonth)) (month, year).success
    else "card has expired".failNel

  case class CreditCard(cardholder: String, number: String, expMonth: Int, expYear:Int)

  def validateCardHolder(cardholder: String): Validation[NonEmptyList[ErrorMessage], String] =
    (asciiOnly(cardholder) |@| maxStrLength(30)(cardholder)){ (s, _) => s }

  def validate(cardholder: String, number: String, expMonth: String, expYear: String) : Validation[NonEmptyList[ErrorMessage], CreditCard] = {

    /** Accumulate both */
    val cardHolderV = validateCardHolder(cardholder)

    /** Check digits, then modTen */
    val numberV = digitsOnly(number).flatMap(modTen(_))

    /** The month and year this post was written */
    val validToday = validExpiration(7,2014)

    val monthYear = for {
      m <- toInt(expMonth).flatMap(validMonth(_))
      y <- toInt(expYear).flatMap(positiveNumber(_))
      my <- validToday(m,y)
    } yield my
 
    /** If there were any errors, return them.  Otherwise create a credit card */
    (cardHolderV |@| numberV |@| monthYear) { (c,n,my) => CreditCard(c,n,my._1,my._2)}
 
  }

The most glaring coding error is the lack of separation of concerns in the methods that do the low-level validation. These methods have taken it upon themselves to return a NonEmptyList[ErrorMessage] in the failure position, which obscures what they should be doing: returning the correct String if valid, or exactly one ErrorMessage if invalid. The methods also should not return a Validation object, since the value in the failure position should not be required to have a Semigroup typeclass instance; it is just an ErrorMessage. Note: although the Validation class itself does not require the failure type to have a Semigroup instance, its most interesting operation, `|@|`, does, because Validation's Applicative instance needs a Semigroup to combine two failures. In other words, the only useful form of Validation is one whose failure type has a Semigroup instance.
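To see the Semigroup requirement in action, here is a minimal sketch assuming Scalaz 7.1 (the object and value names are illustrative, not from the post). With NonEmptyList[String] as the failure type, `|@|` keeps both failures; with plain String, whose Semigroup is concatenation, it compiles just as well but merges the two messages into a single string, which is rarely what one wants.

```scala
import scalaz._
import Scalaz._

object SemigroupRequirement {
  // NonEmptyList[String] has a Semigroup (list append), so |@| keeps both failures.
  val nelFirst: ValidationNel[String, Int]  = "first error".failureNel[Int]
  val nelSecond: ValidationNel[String, Int] = "second error".failureNel[Int]
  val nelBoth: ValidationNel[String, Int]   = (nelFirst |@| nelSecond)(_ + _)

  // String also has a Semigroup (concatenation), so this compiles too,
  // but the two messages end up glued into one String.
  val strFirst: Validation[String, Int]  = "a".failure[Int]
  val strSecond: Validation[String, Int] = "b".failure[Int]
  val strBoth: Validation[String, Int]   = (strFirst |@| strSecond)(_ + _)
}
```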

Different Approach; Default to Disjunction

Scalaz provides the disjunction type \/[+A, +B], which is isomorphic to scala.Either[A,B] but, unlike Either at the time of writing, is right biased (Either itself only became right biased in Scala 2.12) and integrates better with the other Scalaz classes we will be using, such as Validation and Kleisli. Instead of Validation[NonEmptyList[ErrorMessage],T], the return type for the validating methods should be \/[ErrorMessage,T] for any validating method that returns a single ErrorMessage, or \/[NonEmptyList[ErrorMessage],T] for any validating method that may return one or more ErrorMessage instances.
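A small sketch of what right bias means in practice, assuming Scalaz 7.1 (parseInt and doubled are hypothetical helpers): map and flatMap operate on the right-hand value, the first -\/ short-circuits the rest of the chain, and toEither / \/.fromEither convert to and from the standard library Either.

```scala
import scalaz._
import Scalaz._

object RightBiasSketch {
  type ErrorMessage = String

  // Hypothetical helper: parse a String into an Int, failing with a message.
  def parseInt(s: String): ErrorMessage \/ Int =
    try \/-(s.toInt) catch { case _: NumberFormatException => -\/("must be a number") }

  // flatMap continues only on \/-; a -\/ from parseInt skips the positivity check.
  def doubled(s: String): ErrorMessage \/ Int =
    parseInt(s).flatMap(n => if (n > 0) \/-(n * 2) else -\/("must be positive"))

  // Conversions to and from the standard library Either.
  val asEither: Either[ErrorMessage, Int] = doubled("21").toEither
  val fromEither: ErrorMessage \/ Int     = \/.fromEither(Left("boom"): Either[ErrorMessage, Int])
}
```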

  import scalaz._
  import Scalaz._

  type ErrorMessage = String
 
  def asciiOnly(str: String): ErrorMessage \/ String =
    if ("[\\x00-\\x7F]+".r.pattern.matcher(str).matches) \/-(str)
    else -\/("only ascii characters are allowed")
 
  def maxStrLength(max: Int): (String) => ErrorMessage \/ String = str =>
    if (str.length <= max) \/-(str)
    else -\/(s"can not exceed ${max} characters")
 
  def digitsOnly(str: String): ErrorMessage \/ String =
    if ( """^\d*$""".r.pattern.matcher(str).matches) \/-(str)
    else -\/(s"only numbers are allowed")
 
  def toInt(str: String): ErrorMessage \/ Int = try {
    \/-(str.toInt)
  } catch {
    case e: NumberFormatException => -\/("must be a number")
  }
 
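  def modTen(str: String): ErrorMessage \/ String = {
    // Luhn (mod 10) check: double every second digit from the right,
    // subtract 9 from any doubled digit over 9, and require the sum to be divisible by 10.
    def passesModTen(str: String): Boolean = str.nonEmpty && {
      val sum = str.reverse.map(_.asDigit).zipWithIndex.map {
        case (d, i) if i % 2 == 1 => if (d * 2 > 9) d * 2 - 9 else d * 2
        case (d, _)               => d
      }.sum
      sum % 10 == 0
    }

    if (passesModTen(str)) \/-(str)
    else -\/("invalid number")
  }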
 
  def validMonth(m: Int): ErrorMessage \/ Int =
    if (m >= 1 && m <= 12) \/-(m)
    else -\/("invalid month")
 
  def positiveNumber(m: Int): ErrorMessage \/ Int =
    if (m > 0) \/-(m)
    else -\/("must be positive")
 
  def validExpiration(currentMonth: Int, currentYear: Int): (Int, Int) => ErrorMessage \/ (Int, Int) = (month, year) =>
    if (year > currentYear || (year === currentYear && month >= currentMonth)) \/-((month, year))
    else -\/("card has expired")
 
  case class CreditCard(cardholder: String, number: String, expMonth: Int, expYear: Int)
 
  def validateCardHolder(cardholder: String): NonEmptyList[ErrorMessage] \/ String =
    (asciiOnly(cardholder).validation.toValidationNel |@| maxStrLength(30)(cardholder).validation.toValidationNel) { (s, _) => s }
      .disjunction
 
  def validate(cardholder: String, number: String, expMonth: String, expYear: String): NonEmptyList[ErrorMessage] \/ CreditCard = {
 
    /** Accumulate both */
    val cardHolderV = validateCardHolder(cardholder)
 
    /** Check digits, then modTen */
    val numberV = digitsOnly(number).flatMap(modTen(_))
 
    val validToday = validExpiration(7, 2014)
 
    val monthYear = for {
      my <- (toInt(expMonth).flatMap(validMonth(_)).validation.toValidationNel |@|
        toInt(expYear).flatMap(positiveNumber(_)).validation.toValidationNel) {
        (_, _)
      }
        .disjunction
      validMY <- validToday(my._1, my._2).leftMap(NonEmptyList(_)) //error return type of my is NonEmptyList[ErrorMessage]
    } yield validMY
 
    /** If there were any errors, return them.  Otherwise create a credit card */
    (cardHolderV.validation |@|
      numberV.validation.toValidationNel |@|
      monthYear.validation) { (c, n, my) => CreditCard(c, n, my._1, my._2)}
      .disjunction
 
  }

The resulting code, compared to the first example, is simpler and more concise in the low-level validating methods. Code that calls these methods is no longer required to use NonEmptyList as the Semigroup implementation.

Notice how in the methods validateCardHolder and validate, we convert to Validation[NonEmptyList[ErrorMessage],T] only when we expect to accumulate errors and then immediately convert back to \/[NonEmptyList[ErrorMessage],T] when we are done accumulating in that current scope.

Disjunction has a lawful Monad instance. This allows Disjunction to play nicely with scalaz.Kleisli, which lets us chain functions of type (T) => ErrorMessage \/ U so that the output of one function feeds the input of the next. I am hoping to produce another post about this subject in the future.
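A minimal sketch of that Kleisli chaining, assuming Scalaz 7.1. The Result alias, parseInt and validMonth are illustrative names: the alias gives \/ the single type hole that Kleisli's M[_] parameter expects, and >=> feeds the output of one function into the next, stopping at the first error.

```scala
import scalaz._
import Scalaz._

object KleisliSketch {
  type ErrorMessage = String

  // Single-hole alias so ErrorMessage \/ _ fits Kleisli's M[_] type parameter.
  type Result[A] = ErrorMessage \/ A

  // String => ErrorMessage \/ Int, wrapped as a Kleisli arrow.
  val parseInt: Kleisli[Result, String, Int] = Kleisli[Result, String, Int] { s =>
    try \/-(s.toInt) catch { case _: NumberFormatException => -\/("must be a number") }
  }

  // Int => ErrorMessage \/ Int, wrapped as a Kleisli arrow.
  val validMonth: Kleisli[Result, Int, Int] = Kleisli[Result, Int, Int] { m =>
    if (m >= 1 && m <= 12) \/-(m) else -\/("invalid month")
  }

  // >=> runs parseInt, then feeds its Int into validMonth; the first -\/ stops the chain.
  val parseMonth: Kleisli[Result, String, Int] = parseInt >=> validMonth
}
```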

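\/.flatMap is a lawful method, unlike Validation.flatMap, which has been deprecated in Scalaz 7.1: a monadic flatMap must stop at the first failure, which is inconsistent with the error-accumulating behavior of Validation's Applicative.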

The conversion from \/[ErrorMessage,String] to Validation[NonEmptyList[ErrorMessage], String] using validation.toValidationNel is a bit verbose, and perhaps too specific to the NonEmptyList implementation. This could be handled at the Scalaz library level; in the short term, we can add an implicit class to reduce the verbosity of the Disjunction conversions. That implicit class may look something like this:

import scalaz._

object Disjunction {
 
  implicit class DisjunctionI[A, B](d: \/[A, B]) {

    /** I would prefer a method that required F[_] to have a semigroup typeclass.
     *  That is over my head at the moment. 
     */ 
    def validationF[F[_]](f: A => F[A]): Validation[F[A],B] =
      d match {
        case -\/(a) => Failure(f(a))
        case \/-(b) => Success(b)
      }
 
    def validationNel: Validation[NonEmptyList[A], B] = validationF(NonEmptyList(_))
 
    def leftNel: \/[NonEmptyList[A],B] = d.leftMap(NonEmptyList(_))
 
  }
}
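Regarding the wish in the comment above for a Semigroup-constrained version: one possible sketch, assuming Scalaz 7.1, lifts the error with an Applicative[F] and demands a Semigroup[F[A]] as evidence, so the resulting Validation can always be accumulated with `|@|`. The names DisjunctionSemigroupSyntax and validationLift are hypothetical; this is not a Scalaz API.

```scala
import scalaz._
import Scalaz._

object DisjunctionSemigroupSyntax {
  implicit class DisjunctionLift[A, B](d: A \/ B) {
    /** Hypothetical variant: lift the error into F with Applicative[F].point, and
      * require Semigroup[F[A]] as evidence that the result can accumulate with |@|.
      */
    def validationLift[F[_]](implicit F: Applicative[F], S: Semigroup[F[A]]): Validation[F[A], B] =
      d.leftMap(a => F.point(a)).validation
  }
}
```

With the syntax imported, `err.validationLift[NonEmptyList]` plays the role of validationNel, and asking for, say, `validationLift[Option]` would fail to compile unless a Semigroup[Option[A]] is in scope.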

One must also be careful not to pass a (T) => \/[E,U] to \/[NonEmptyList[E],T].flatMap. Convert the \/[E,U] to a \/[NonEmptyList[E],U] first, as shown in the validate method: validToday(my._1, my._2).leftMap(NonEmptyList(_)). This verbosity can also be reduced with an implicit class.
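Why the leftMap matters: as far as I can tell, in Scalaz 7.1 \/.flatMap is declared roughly as flatMap[AA >: A, D](g: B => (AA \/ D)), so passing a function that returns ErrorMessage \/ U to a NonEmptyList[ErrorMessage] \/ T may compile by widening the left type to a common supertype rather than failing. Annotating the result type and lifting the single error keeps the types honest. A sketch, with notExpired and checked as hypothetical stand-ins for validToday and the accumulated month/year:

```scala
import scalaz._
import Scalaz._

object LeftNelSketch {
  type ErrorMessage = String

  // Hypothetical single-error check, shaped like validExpiration in the post (July 2014).
  def notExpired(my: (Int, Int)): ErrorMessage \/ (Int, Int) =
    if (my._2 > 2014 || (my._2 == 2014 && my._1 >= 7)) \/-(my) else -\/("card has expired")

  // Stand-in for an earlier accumulating step that already carries NonEmptyList errors.
  def checked(my: (Int, Int)): NonEmptyList[ErrorMessage] \/ (Int, Int) = \/-(my)

  // The explicit result type plus leftMap(NonEmptyList(_)) keeps the left side a NonEmptyList.
  def expiry(my: (Int, Int)): NonEmptyList[ErrorMessage] \/ (Int, Int) =
    checked(my).flatMap(t => notExpired(t).leftMap(NonEmptyList(_)))
}
```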

Code for this blog can be found in a Gist. For more examples favoring disjunction, consult my port of Scalaz Validation Contrib.

Pattern Matching vs Subtype Polymorphism

OleTraveler — Thu, 30 May 2013 00:06:15 +0000

This post is an overview of the talk Living in a Post-Functional World by Daniel Spiewak. In it he talks about the Expression Problem, which is defined as:

Define a datatype by cases.
Add new cases to the datatype.
Add new functions over the datatype.
Don’t recompile (don’t change any existing code).

There is no clear solution to this problem, but there are different strategies. In Scala, the options are pattern matching and subtype polymorphism. Pattern matching allows adding a new function without altering existing code, but requires altering existing code when adding new cases. Subtype polymorphism, on the other hand, allows adding new cases without recompiling existing code, but existing code must be altered when adding functions.

For example, here is Daniel’s example using pattern matching:

  
object Pattern {

  sealed trait Expr
  case class Add(e1: Expr, e2: Expr) extends Expr
  case class Sub(e1: Expr, e2: Expr) extends Expr
  case class Num(n: Int) extends Expr

  def value(e: Expr): Int = e match {
    case Add(e1, e2) => value(e1) + value(e2)
    case Sub(e1, e2) => value(e1) - value(e2)
    case Num(n) => n
  }
  val expr = Add(Num(5), Sub(Num(50), Num(13)))

  val result = value(expr) // 42
}

Using the above as a codebase, adding a function for pretty printing the expression would not require any of the existing code to be touched. However, adding a new case, Mult, would require the value function to be modified to match on Mult. (Note that the Visitor pattern is a poor man’s pattern matching in object-oriented languages.)
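To make the claim concrete, here is a sketch of a pretty printer added as a new function over the same datatype (repeated so the block stands alone; PatternPretty and show are illustrative names). None of the existing cases or the value function change:

```scala
object PatternPretty {
  // Same datatype as the Pattern example above.
  sealed trait Expr
  case class Add(e1: Expr, e2: Expr) extends Expr
  case class Sub(e1: Expr, e2: Expr) extends Expr
  case class Num(n: Int) extends Expr

  // A new function over Expr: nothing existing had to be edited to add it.
  def show(e: Expr): String = e match {
    case Add(e1, e2) => s"(${show(e1)} + ${show(e2)})"
    case Sub(e1, e2) => s"(${show(e1)} - ${show(e2)})"
    case Num(n)      => n.toString
  }
}
```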

Here is Daniel’s example of Subtype Polymorphism.

  
object Subtype {
  trait Expr { // not sealed, so new cases can be added from other source files
    def value: Int
  }

  case class Add(e1: Expr, e2: Expr) extends Expr {
    def value = e1.value + e2.value
  }

  case class Sub(e1: Expr, e2: Expr) extends Expr {
    def value = e1.value - e2.value
  }

  case class Num(n: Int) extends Expr {
    def value = n
  }
  val expr = Add(Num(5), Sub(Num(50), Num(13)))

  val result = expr.value // 42
}

In order to add a pretty print method, all the case classes would have to be modified. However, if we were to add a Mult class, none of the existing case classes would have to be touched.
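Conversely, here is a sketch of adding the Mult case to the subtype version (trimmed to Add and Num; SubtypeMult is an illustrative name). The trait is left unsealed, since a sealed trait can only be extended in the same source file:

```scala
object SubtypeMult {
  // Left unsealed so that new cases can be added later, even from another file.
  trait Expr { def value: Int }

  case class Add(e1: Expr, e2: Expr) extends Expr { def value = e1.value + e2.value }
  case class Num(n: Int) extends Expr { def value = n }

  // The new case: Add and Num are untouched; Mult carries its own value implementation.
  case class Mult(e1: Expr, e2: Expr) extends Expr { def value = e1.value * e2.value }
}
```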

So, when designing a system that is expected to expand with more cases, use subtype polymorphism, whereas if the system is expected to expand more with functions, use pattern matching.

Arrow[Function1]

OleTraveler — Tue, 02 Aug 2011 15:41:41 +0000

Scalaz defines Arrow as follows:

trait Arrow[A[_, _]] {

  val category: Category[A]
  def arrow[B, C](f: B => C): A[B, C]
  def first[B, C, D](a: A[B, C]): A[(B, D), (C, D)]
  def second[B, C, D](a: A[B, C]): A[(D, B), (D, C)]
}

It is easiest to see with the Function1 implementation:

  implicit def Function1Arrow: Arrow[Function1] = new Arrow[Function1] {
    val category = Category.Function1Category
    
    def arrow[B, C](f: B => C) = f

    /** takes a Function1[B,C] and returns a Function1[(B,D),(C,D)].  As you can see
    * only the _1, or the 'first' argument is modified to a different type
    * and the _2 or 'second' is not touched.
    */
    def first[B, C, D](a: B => C) =
      (bd: (B, D)) => (a(bd._1), bd._2)

    /** takes a Function1[B,C] and returns a Function1[(D,B),(D,C)].  
    * Only the _2 or second element of the Tuple is touched.
    */
    def second[B, C, D](a: B => C) =
      (db: (D, B)) => (db._1, a(db._2))
  }

And here we have some example uses.

scala> val plus: Function1[Int,Int] = (_:Int) + 1 
plus: (Int) => Int = <function1>

scala> val firstChar: Function1[String,Option[Char]] = (s:String) => Option(s.charAt(0))
firstChar: (String) => Option[Char] = <function1>

scala>  val m2 = Map(1 -> "astroman", 2 -> "boat")
m2: scala.collection.immutable.Map[Int,java.lang.String] = Map((1,astroman), (2,boat))

scala>  m2.map(plus.first)
res37: scala.collection.immutable.Map[Int,java.lang.String] = Map((2,astroman), (3,boat))

scala> m2.map(firstChar.second)                                                          
res39: scala.collection.immutable.Map[Int,Option[Char]] = Map((1,Some(a)), (2,Some(b)))

The trait MAB defines some useful functions for composing functions.

The *** operator creates a function in which each composed function works on its own element of the Tuple2:

scala> val p = plus *** firstChar
p: ((Int, String)) => (Int, Option[Char]) = <function1>

scala> m2.map(p)
res41: scala.collection.immutable.Map[Int,Option[Char]] = Map((2,Some(a)), (3,Some(b)))

&&& applies two functions to a single value and puts the results in a Tuple2:

scala> val plus2 = (_:Int) + 2
plus2: (Int) => Int = <function1>

scala> val q = plus &&& plus2
q: (Int) => (Int, Int) = <function1>

scala> q(3)
res44: (Int, Int) = (4,5)

The product method applies the same function to both elements of a Tuple2:

scala> plus.product apply (1 -> 2)
res46: (Int, Int) = (2,3)
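For readers without Scalaz at hand, the same combinators can be sketched in plain Scala with an implicit class. This is a hypothetical stand-in (ArrowOps and Function1ArrowOps are not library names), not Scalaz's MAB implementation; the extra type D is passed explicitly to first and second because it cannot always be inferred:

```scala
object ArrowOps {
  implicit class Function1ArrowOps[B, C](f: B => C) {
    // Apply f to the first element of a pair, leave the second alone.
    def first[D]: ((B, D)) => (C, D) = { case (b, d) => (f(b), d) }

    // Apply f to the second element of a pair, leave the first alone.
    def second[D]: ((D, B)) => (D, C) = { case (d, b) => (d, f(b)) }

    // Apply f to the first element and g to the second.
    def ***[B2, C2](g: B2 => C2): ((B, B2)) => (C, C2) = { case (b, b2) => (f(b), g(b2)) }

    // Apply f and g to the same input and pair the results.
    def &&&[C2](g: B => C2): B => (C, C2) = b => (f(b), g(b))

    // Apply f to both elements of a pair.
    def product: ((B, B)) => (C, C) = { case (b1, b2) => (f(b1), f(b2)) }
  }
}
```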

Hibernate, Proxy Object and the Visitor now with Scala and Pattern Matching

Ole Traveler — Wed, 20 Apr 2011 15:36:01 +0000

Hibernate implements lazy loading with proxy objects that load the underlying entity on first access. This is well known and is documented here:

https://community.jboss.org/wiki/ProxyVisitorPattern

Unfortunately, this means that pattern matching on classes may not work. For instance, suppose the following were Hibernate entities:

trait PaymentSource
class CreditCard extends PaymentSource
class Check extends PaymentSource
class User {
  val paymentSources: List[PaymentSource] = Nil
}

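A pattern match like the following fails when the loaded object is a proxy: the proxy is a runtime-generated subclass of the declared type (PaymentSource), not of CreditCard or Check, so neither case matches and a MatchError is thrown.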

user.paymentSources.map( _ match {
  case _:CreditCard => "Credit Card"
  case _:Check => "Check"
})

In Java, the workaround is to use the Visitor pattern. Here it is implemented in Scala:

trait PaymentSource {
  def accept[T](visitor: PsVisitor[T]) : T
}

class CreditCard extends PaymentSource{
  override def accept[T](visitor: PsVisitor[T]) = {
    visitor.visit(this)
  }
}

class Check extends PaymentSource{
  override def accept[T](visitor: PsVisitor[T]) = {
    visitor.visit(this)
  }
}

trait PsVisitor[T] {
  def visit(cc: CreditCard) : T
  def visit(ck: Check) : T
}

Although this works consistently, the syntax is not quite as nice.

user.paymentSources.map(_.accept(new PsVisitor[Unit] {
  def visit(cc: CreditCard) = println("CreditCard")
  def visit(ck: Check) = println("Check")
}))

A solution is to add a simple wrapper class for each concrete class, plus a visitor implementation that does the actual wrapping:

object PsFilter {
  def apply(ps: PaymentSource) = {
    ps.accept(new PsVisitor[PsResult] {
      def visit(cc: CreditCard) = CreditCardResult(cc)
      def visit(ck: Check) = CheckResult(ck)
    })
  }
}

sealed trait PsResult
case class CreditCardResult(cc: CreditCard) extends PsResult
case class CheckResult(ck: Check) extends PsResult

And now we can pattern match again:

user.paymentSources.map(PsFilter(_) match {
      case CreditCardResult(cc) => "CreditCard"
      case CheckResult(ck) => "Check"
})
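The failure and the fix can be simulated without Hibernate. In this sketch, PaymentSourceProxy is a hypothetical hand-written stand-in for the runtime-generated proxy: like Hibernate's, it extends the declared type PaymentSource and forwards calls to the real entity, so a type test sees only the proxy class while accept reaches the real one. The default case in byTypeTest is added only so the demo returns a value instead of throwing a MatchError.

```scala
object ProxyDemo {
  trait PaymentSource { def accept[T](visitor: PsVisitor[T]): T }
  class CreditCard extends PaymentSource { def accept[T](v: PsVisitor[T]): T = v.visit(this) }
  class Check extends PaymentSource { def accept[T](v: PsVisitor[T]): T = v.visit(this) }

  trait PsVisitor[T] {
    def visit(cc: CreditCard): T
    def visit(ck: Check): T
  }

  // Hypothetical proxy: a subclass of the declared type that delegates to the real entity.
  class PaymentSourceProxy(target: PaymentSource) extends PaymentSource {
    def accept[T](v: PsVisitor[T]): T = target.accept(v)
  }

  sealed trait PsResult
  case class CreditCardResult(cc: CreditCard) extends PsResult
  case class CheckResult(ck: Check) extends PsResult

  def psFilter(ps: PaymentSource): PsResult = ps.accept(new PsVisitor[PsResult] {
    def visit(cc: CreditCard): PsResult = CreditCardResult(cc)
    def visit(ck: Check): PsResult = CheckResult(ck)
  })

  // Type tests only see the proxy's own class, so a proxied CreditCard falls through.
  def byTypeTest(ps: PaymentSource): String = ps match {
    case _: CreditCard => "Credit Card"
    case _: Check      => "Check"
    case _             => "unknown (proxy?)"
  }

  // The visitor reaches the real entity through the proxy's accept.
  def byVisitor(ps: PaymentSource): String = psFilter(ps) match {
    case CreditCardResult(_) => "Credit Card"
    case CheckResult(_)      => "Check"
  }
}
```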

Source for this can be found on GitHub: https://github.com/OleTraveler/HibernateInheritance